Extract Requirements Overview

Saphira’s Extract Requirements feature uses OCR, NLP, and Vision Language Models (VLMs) to automatically extract requirements, components, and technical specifications from your source materials. Accuracy targets vary by source type and are listed under Extraction Accuracy.

Getting Started

From the Dashboard, navigate to the Seek Guidance section and click Extract Requirements. This opens the extraction method selection dialog where you choose how to extract requirements from your source material.

Choose Extraction Method

Saphira provides two extraction methods optimized for different source materials:

Extract from Document

Upload PDF, Word, or other documents to extract requirements and text content.
Best for:
  • Technical specifications
  • Requirements documents
  • Standards documents
  • Engineering reports

Extract from Image

Upload architecture diagrams, screenshots, or images to extract requirements and structure.
Best for:
  • System architecture diagrams
  • Block diagrams and flowcharts
  • Screenshots of requirements
  • Schematic drawings

Extract from Document

Upload text-based documents (PDF, Word, etc.) to extract structured requirements.

Supported Document Types

Extract from PDF files including:
  • Technical specifications and requirements documents
  • Standards documents (ISO, IEC, UL standards)
  • Engineering reports and design documents
  • Regulatory compliance documentation
Features:
  • Page-level extraction with section preservation
  • Table and figure extraction
  • Cross-reference detection and linking
  • Hierarchical structure preservation

Extract from Microsoft Word documents:
  • Structured requirements with headings
  • Tables of requirements
  • Embedded images and diagrams
  • Track changes and comments preserved

Extract from Excel/CSV files:
  • Requirements tables with ID, description, parent
  • Bill of Materials (BOM) data
  • Test matrices and traceability tables

Column headers are detected automatically and mapped to requirement fields.
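To illustrate what column header detection and field mapping can look like for a CSV export, here is a minimal Python sketch. It is not Saphira's implementation: the synonym table, the function names `map_headers` and `read_requirements`, and the sample row are all assumptions made for this example.

```python
import csv
import io

# Hypothetical synonym table. Saphira's actual mapping rules are not documented here.
HEADER_SYNONYMS = {
    "id": {"id", "req id", "requirement id", "identifier"},
    "requirement": {"requirement", "description", "text", "requirement text"},
    "parent": {"parent", "parent id", "category"},
}


def map_headers(headers):
    """Return {canonical_field: original_header} for every recognized header."""
    mapping = {}
    for header in headers:
        # Normalize case, underscores, and repeated whitespace before matching.
        normalized = " ".join(header.strip().lower().replace("_", " ").split())
        for field, synonyms in HEADER_SYNONYMS.items():
            if normalized in synonyms and field not in mapping:
                mapping[field] = header
    return mapping


def read_requirements(csv_text):
    """Parse CSV text into dicts keyed by canonical field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = map_headers(reader.fieldnames or [])
    return [
        {field: (row[column] or "").strip() for field, column in mapping.items()}
        for row in reader
    ]


# Made-up sample data for illustration only.
sample = "Req_ID,Description,Parent\nREQ-SAFETY-001,The system shall respond within 500 ms,Safety\n"
rows = read_requirements(sample)
```

Columns that match no synonym are simply dropped in this sketch; a real importer would more likely surface them for manual mapping.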

Document Extraction Workflow

  1. Select Extract from Document: From the extraction method dialog, click Extract from Document.
  2. Upload Your Document: Drag and drop or browse to select your PDF, Word, or other document file.
  3. AI Extracts Requirements: Saphira analyzes the document and extracts:
    • Requirement IDs and descriptions
    • Values, units, and logical operators
    • Parent-child relationships
    • Cross-references
  4. Classify & Validate: Review extracted requirements:
    • AI suggests requirement classification
    • INCOSE lint checks quality
    • Edit inline to refine
  5. Import to Project: Import validated requirements to your project database with full traceability.

Extracted Fields

For each requirement, Saphira extracts:
| Field | Description | Example |
| --- | --- | --- |
| ID | Unique requirement identifier | REQ-SAFETY-001 |
| Requirement | Full requirement text | "The system shall…" |
| Parent | Parent requirement or category | Safety |
| Logical Operator | Comparison operators | >=, <=, within |
| Unit | Measurement unit | ms, °C, Gauss |
| Value | Numeric threshold | 500, 0.20-0.55 |
| Requirement Type | Numeric or Textual | Numeric |
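If you post-process extracted requirements in your own scripts, the fields above map naturally onto a small record type. The Python sketch below is an assumption about how such a record could be modeled, not Saphira's export schema: the class name `ExtractedRequirement`, the attribute names, and the helper `parse_value` are hypothetical, and `parse_value` assumes non-negative numbers so that a hyphen always means a range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExtractedRequirement:
    """One extracted requirement, mirroring the fields in the table above."""
    id: str
    requirement: str
    parent: Optional[str] = None
    logical_operator: Optional[str] = None  # e.g. ">=", "<=", "within"
    unit: Optional[str] = None              # e.g. "ms", "°C", "Gauss"
    value: Optional[str] = None             # e.g. "500" or "0.20-0.55"
    requirement_type: str = "Textual"       # "Numeric" or "Textual"


def parse_value(value: str) -> Tuple[float, float]:
    """Return (low, high) bounds; a single number yields equal bounds.

    Assumes non-negative values, so "-" only ever separates a range.
    """
    low, separator, high = value.partition("-")
    if not separator:
        number = float(low)
        return (number, number)
    return (float(low), float(high))


# Illustrative record; combining these example values into one row is an assumption.
req = ExtractedRequirement(
    id="REQ-SAFETY-001",
    requirement="The system shall…",
    parent="Safety",
    logical_operator="<=",
    unit="ms",
    value="500",
    requirement_type="Numeric",
)
```

Keeping `value` as a string preserves what was extracted, while `parse_value` gives numeric bounds when a comparison is needed.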

Extract from Image

Upload visual content like architecture diagrams and screenshots to extract requirements and system structure.

Supported Image Types

Extract from system architecture diagrams:
  • Component names and their roles (ECU, function, subsystem, system, process)
  • Connections and data flows between components
  • System boundaries and interfaces
  • Hierarchical structure
The VLM automatically:
  • Detects if an image is an architecture diagram (with confidence scoring)
  • Extracts every labeled element
  • Identifies relationships between components

Extract from visual diagrams:
  • Process steps and decision points
  • Data flow arrows and connections
  • Functional blocks and modules
  • State transitions

Extract from captured screenshots:
  • Requirements from legacy systems
  • Tables and structured data
  • Form fields and values
  • Any visible text content

Extract from electrical/electronic schematics:
  • Part numbers and component symbols
  • Component values and ratings
  • Connection information
  • Reference designators

Image Extraction Workflow

  1. Select Extract from Image: From the extraction method dialog, click Extract from Image.
  2. Upload Your Image: Upload architecture diagrams, screenshots, or images (PNG, JPG, PDF).
  3. VLM Analyzes Image: Saphira’s Vision Language Model:
    • Detects document type (diagram, chart, table, screenshot)
    • Extracts all visible text
    • Identifies technical components and relationships
    • Generates searchable keywords
  4. Review Extracted Content: Review the structured extraction:
    • Components with roles and descriptions
    • Requirements derived from visual content
    • Hierarchical relationships
  5. Import to Project: Import extracted requirements and components to your project.

Architecture Diagram Processing

When you upload an architecture diagram, Saphira:
  1. Diagram Detection: Uses VLM to identify the image type with confidence scoring
  2. Component Enumeration: Extracts every labeled element:
    • Component names from diagram labels
    • Role classification (ECU, function, subsystem, system, process)
    • Brief descriptions of purpose and connections
  3. Relationship Mapping: Identifies connections and data flows
  4. Context Integration: Combines with item definition for complete system understanding
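
The output of these four steps can be pictured as a small graph of components and connections. The following Python sketch shows one way to hold that result and to sanity-check it during review. All type names, field names, and the sample component are hypothetical; the role values are taken from the list above, but Saphira's actual data model is not described in this document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    """Role classifications named in the documentation."""
    ECU = "ECU"
    FUNCTION = "function"
    SUBSYSTEM = "subsystem"
    SYSTEM = "system"
    PROCESS = "process"


@dataclass(frozen=True)
class Component:
    name: str              # taken from the diagram label
    role: Role
    description: str = ""


@dataclass(frozen=True)
class Connection:
    source: str            # component name at the start of the arrow
    target: str            # component name at the end of the arrow
    label: str = ""


@dataclass
class DiagramExtraction:
    is_architecture_diagram: bool
    confidence: float      # detection confidence, assumed to lie in [0, 1]
    components: List[Component] = field(default_factory=list)
    connections: List[Connection] = field(default_factory=list)

    def dangling_connections(self) -> List[Connection]:
        """Connections whose endpoints were not extracted as components."""
        names = {component.name for component in self.components}
        return [
            connection for connection in self.connections
            if connection.source not in names or connection.target not in names
        ]


# Made-up example: the sensor label was missed, so its connection is dangling.
extraction = DiagramExtraction(
    is_architecture_diagram=True,
    confidence=0.9,
    components=[Component("Brake ECU", Role.ECU)],
    connections=[Connection("Wheel Speed Sensor", "Brake ECU", "wheel speed")],
)
```

A check like `dangling_connections` is a cheap way to spot labels that were missed before importing the result into a project.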

Extraction Accuracy

>90% Accuracy Target

Saphira uses a multi-pass extraction approach to achieve high accuracy:
  • Document/image structure analysis
  • Primary content extraction
  • Initial requirement identification
  • Component detection from diagrams
  • Cross-validation of extracted content
  • Relationship verification
  • Missing field detection
  • Confidence scoring
  • Format normalization
  • ID standardization
  • Traceability link verification
  • Quality assurance checks

Accuracy by Source Type

| Source Type | Accuracy Target | Key Features |
| --- | --- | --- |
| PDF Requirements | >90% | Hierarchical extraction, ID preservation |
| Architecture Diagrams | >85% | VLM component detection, relationship mapping |
| BOMs/Spreadsheets | >95% | Column auto-detection, format normalization |
| Screenshots | >85% | OCR text extraction, table detection |
| Schematics | >80% | Part number extraction, symbol recognition |
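
This document does not say how accuracy is measured. If you want to check extraction results against these targets on your own material, one plausible approach is field-level accuracy against a hand-verified sample. The Python sketch below assumes that definition; the names `TARGETS`, `field_accuracy`, and `meets_target` and the sample data are hypothetical.

```python
# Targets copied from the table above, expressed as fractions.
TARGETS = {
    "PDF Requirements": 0.90,
    "Architecture Diagrams": 0.85,
    "BOMs/Spreadsheets": 0.95,
    "Screenshots": 0.85,
    "Schematics": 0.80,
}


def field_accuracy(extracted, reference):
    """Fraction of reference fields whose extracted value matches exactly.

    Both arguments map requirement ID -> {field name: value}. Exact-match
    scoring is an assumption; the source does not define its metric.
    """
    total = 0
    correct = 0
    for req_id, expected_fields in reference.items():
        got = extracted.get(req_id, {})
        for name, expected in expected_fields.items():
            total += 1
            if got.get(name) == expected:
                correct += 1
    return correct / total if total else 0.0


def meets_target(source_type, accuracy):
    """True when accuracy is strictly above the target, matching the '>' in the table."""
    return accuracy > TARGETS[source_type]


# Made-up sample: one of four fields was extracted incorrectly.
reference = {
    "REQ-1": {"unit": "ms", "value": "500"},
    "REQ-2": {"unit": "°C", "value": "85"},
}
extracted = {
    "REQ-1": {"unit": "ms", "value": "500"},
    "REQ-2": {"unit": "°C", "value": "58"},
}
accuracy = field_accuracy(extracted, reference)
```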

Maximizing Accuracy

To get the best extraction results:
  1. Use High-Resolution Images: Clear, high-resolution images yield better VLM analysis
  2. Well-Formatted Documents: Documents with consistent formatting extract more accurately
  3. Clear Labels: Diagrams with readable labels improve component detection
  4. Review and Edit: Use the inline editor to refine extracted content

Post-Extraction Workflow

After extraction, requirements flow through:
Extract → Classify → INCOSE Lint → Suggest Children → Push to RM

Classify: AI automatically suggests a requirement classification level:
  • Stakeholder
  • System
  • Subsystem
  • Component
  • Hardware/Software

INCOSE Lint: Quality check against INCOSE requirements engineering principles:
  • Atomicity (single capability per requirement)
  • Verifiability (testable with single procedure)
  • Clarity (no ambiguous terms)
  • Suggested rewrites for improvement

Suggest Children: AI suggests child requirements for decomposition:
  • Hierarchical breakdown
  • Component allocation
  • Traceability maintained

Push to RM: Import to the project requirements database:
  • Full traceability to source document
  • Version control
  • Change tracking
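
To give a feel for the kind of check the INCOSE lint step performs, here is a deliberately simple Python sketch covering two of the principles above: atomicity and clarity. It uses keyword heuristics only. The word list and the function name `lint_requirement` are assumptions for illustration and do not reflect Saphira's rules or an official INCOSE list.

```python
import re

# Illustrative vague terms. This is an assumed list, not an official one.
AMBIGUOUS_TERMS = ("adequate", "appropriate", "as needed", "fast", "user-friendly")


def lint_requirement(text):
    """Return a list of findings; an empty list means no heuristic fired."""
    findings = []
    lowered = text.lower()

    # Atomicity heuristic: more than one "shall" suggests several capabilities.
    if len(re.findall(r"\bshall\b", lowered)) > 1:
        findings.append("atomicity: more than one 'shall'")

    # Clarity heuristic: flag vague terms that cannot be verified as written.
    for term in AMBIGUOUS_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            findings.append(f"clarity: ambiguous term '{term}'")

    return findings


# Made-up requirement text for illustration.
findings = lint_requirement(
    "The system shall log faults and shall provide adequate cooling."
)
```

Heuristics like these only catch surface patterns, so findings are best treated as prompts for review and not as verdicts.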

Integration with Safety Workflows

Extracted requirements and components integrate directly with Saphira’s safety analysis:
  • FMEA: Extracted components become focus elements for failure mode analysis
  • HARA: Requirements populate hazard sources and control measures
  • TARA: Interface definitions feed cybersecurity asset identification
  • Gap Analysis: Extracted specifications enable automated standards compliance checking
  • Safety Case: Requirements become evidence in GSN safety arguments