When to use
Use fan-out extraction when your app needs to process documents that vary significantly in structure and content. A single extraction prompt cannot handle the differences between an invoice, a bill of lading, and a fuel receipt: each has different fields, layouts, and validation rules. This pattern is the right choice when:
- You receive mixed document types in a single pipeline (email attachments, bulk uploads)
- Each document type has a distinct schema with specialized fields
- You need high extraction accuracy per type rather than a generic best-effort pass
- The number of document types may grow over time without rearchitecting the workflow
Architecture
The pattern follows two phases: classification, then type-specific extraction.

Phase 1: Classification
A TEXT_UNDERSTANDING node receives the document and classifies it into one of the known types. The classification prompt constrains the output to an enumerated list, so the Condition node can route deterministically.

Phase 2: Type-specific extraction
Each branch contains a TEXT_EXTRACTION node (using the Extract Data from File capability) configured with:
- A prompt tailored to that document type, instructing the model which fields to look for
- A response schema defining the exact JSON structure expected for that type
- Extraction strategy settings optimized for the document format (text-heavy PDFs vs. scanned images)
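The two phases can be sketched as a flow (branch names match the Implementation section below):

```text
Document
 └─ TEXT_UNDERSTANDING (classify into an enumerated type)
     └─ Condition (route on document_type)
         ├─ TEXT_EXTRACTION (BOL)
         ├─ TEXT_EXTRACTION (Invoice)
         ├─ TEXT_EXTRACTION (Rate confirmation)
         ├─ TEXT_EXTRACTION (Lumper receipt)
         ├─ TEXT_EXTRACTION (Fuel receipt)
         └─ default → manual review queue
```

Whichever branch executes, its output is collected by a single Merge / End node.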
Implementation
Step 1: Configure the classification node
Add a TEXT_UNDERSTANDING node and configure it to classify the document type into one of the enumerated values.

Step 2: Add the Condition node
Add a Condition node after the classification node. Configure branches based on the document_type value:
| Branch | Condition | Target |
|---|---|---|
| BOL | document_type == "BOL" | TEXT_EXTRACTION (BOL) |
| Invoice | document_type == "INVOICE" | TEXT_EXTRACTION (Invoice) |
| Rate confirmation | document_type == "RATE_CONFIRMATION" | TEXT_EXTRACTION (Rate confirmation) |
| Lumper receipt | document_type == "LUMPER_RECEIPT" | TEXT_EXTRACTION (Lumper receipt) |
| Fuel receipt | document_type == "FUEL_RECEIPT" | TEXT_EXTRACTION (Fuel receipt) |
| Default | document_type == "UNKNOWN" | Manual review queue |
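The classification prompt from Step 1 should constrain its output to the same enumerated values the Condition node routes on. A sketch (the exact wording is illustrative, not a required format):

```text
Classify the attached document as exactly one of the following types:
BOL, INVOICE, RATE_CONFIRMATION, LUMPER_RECEIPT, FUEL_RECEIPT.
If the document does not clearly match any of these, return UNKNOWN.
Respond with JSON only: {"document_type": "<TYPE>"}
```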
Step 3: Configure type-specific extraction nodes
Each branch gets its own TEXT_EXTRACTION node with a prompt and schema optimized for that document type. The Extract Data from File capability handles PDFs, images (via OCR), and other supported file types automatically. For each type, define both a prompt and a response schema:
- BOL (Bill of Lading)
- Invoice
- Rate confirmation
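As an illustration, an Invoice branch might pair a prompt and schema along these lines (the field names are assumptions for the example, not a canonical invoice schema):

```text
Prompt:
  Extract the following fields from this invoice: invoice number,
  invoice date, vendor name, line items (description, quantity,
  unit price, amount), and total amount. Return null for any
  field that is not present in the document.

Response schema (JSON):
  {
    "invoice_number": "string",
    "invoice_date": "string",
    "vendor_name": "string",
    "line_items": [{"description": "string", "quantity": "number",
                    "unit_price": "number", "amount": "number"}],
    "total_amount": "number"
  }
```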
Configuration reference
| Component | Node type | Key settings |
|---|---|---|
| Classifier | TEXT_UNDERSTANDING | Prompt with enumerated types, constrained response schema |
| Router | Condition | Branch per document type, default branch for unknown types |
| Extractor (per type) | TEXT_EXTRACTION (Extract Data from File) | Type-specific prompt, tailored response schema, extraction strategy |
| Merge | Merge / End node | Collects output from whichever branch executed |
The TEXT_EXTRACTION node uses the Extract Data from File capability, which supports PDFs, DOCX, XLSX, PPTX, and image formats (JPG, PNG, TIFF). Images are automatically converted to PDF before processing via OCR.
Scaling to many document types
This pattern scales well because adding a new document type requires only:
- Adding the new type to the classification prompt’s enumerated list
- Adding a new branch in the Condition node
- Adding a new TEXT_EXTRACTION node with the type-specific prompt and schema
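The core classify-then-route logic, and why adding a type is cheap, can be sketched in plain Python (an illustrative model of the workflow, not the platform's API; `classify` stands in for the TEXT_UNDERSTANDING node):

```python
def classify(document: str) -> str:
    """Stand-in for the TEXT_UNDERSTANDING node: return one enumerated type."""
    keywords = {
        "bill of lading": "BOL",
        "invoice": "INVOICE",
        "rate confirmation": "RATE_CONFIRMATION",
        "lumper": "LUMPER_RECEIPT",
        "fuel": "FUEL_RECEIPT",
    }
    text = document.lower()
    for needle, doc_type in keywords.items():
        if needle in text:
            return doc_type
    return "UNKNOWN"

# One extractor per type. Adding a document type is one new entry here,
# mirroring "add a branch + add a TEXT_EXTRACTION node" in the workflow.
EXTRACTORS = {
    "BOL": lambda doc: {"document_type": "BOL", "status": "extracted"},
    "INVOICE": lambda doc: {"document_type": "INVOICE", "status": "extracted"},
    # remaining types follow the same shape
}

def process(document: str) -> dict:
    doc_type = classify(document)
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:  # default branch: manual review queue
        return {"document_type": doc_type, "status": "manual_review"}
    return extractor(document)
```

Note that `process` itself never changes as types are added; only the enumerated list and the extractor registry grow.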
| Domain | Example document types |
|---|---|
| Logistics | BOL, invoice, rate confirmation, lumper receipt, fuel receipt, proof of delivery, customs declaration |
| Mortgage | Product sheets, income statements, bank statements, tax returns, appraisal reports, title documents, regulatory disclosures |
| Insurance | Claims forms, medical records, police reports, repair estimates, coverage declarations |
Variations
Parallel extraction
Instead of classifying first, run extraction for all document types simultaneously and pick the result with the highest confidence. This trades higher compute cost for lower latency and avoids classification errors.

Parallel extraction works best when you have a small number of document types (under 5). Beyond that, the cost of running every extractor on every document becomes impractical.
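The select-by-confidence step can be sketched as follows (illustrative only; `run_all` and the extractor signature are assumptions, and each extractor is expected to report a confidence score):

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(document: str, extractors: dict):
    """Run every extractor on the document; keep the highest-confidence result.

    extractors maps doc_type -> callable returning a dict that includes
    a "confidence" float.
    """
    with ThreadPoolExecutor() as pool:
        futures = {doc_type: pool.submit(fn, document)
                   for doc_type, fn in extractors.items()}
        results = {doc_type: f.result() for doc_type, f in futures.items()}
    best = max(results, key=lambda t: results[t]["confidence"])
    return best, results[best]
```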
Hierarchical classification
For large document sets, classify in two stages: first into a broad category (financial, shipping, legal), then into a specific type within that category. This reduces the number of options the classifier evaluates at each stage.

Confidence fallback
Add a confidence threshold to the classification step. If the classifier returns a confidence below the threshold, route the document to a manual classification queue instead of risking an incorrect extraction.

Related resources
Patterns overview
All available AI patterns and how to combine them
AI node types
Reference for TEXT_UNDERSTANDING, TEXT_EXTRACTION, and other node types
Extract Data from File
Configuration guide for the Extract Data from File node
AI comparison and reconciliation
Compare extracted data against system-of-record values

