Preview: Agent Builder is currently in preview and may change before general availability.

When to use

Use fan-out extraction when your app needs to process documents that vary significantly in structure and content. A single extraction prompt cannot handle the differences between an invoice, a bill of lading, and a fuel receipt — each has different fields, layouts, and validation rules. This pattern is the right choice when:
  • You receive mixed document types in a single pipeline (email attachments, bulk uploads)
  • Each document type has a distinct schema with specialized fields
  • You need high extraction accuracy per type rather than a generic best-effort pass
  • The number of document types may grow over time without rearchitecting the workflow

Architecture

The pattern follows two phases: classification, then type-specific extraction.
Document
   │
   ▼
TEXT_UNDERSTANDING (classify type)
   │
   ▼
Condition (fork by type)
  ├──► TEXT_EXTRACTION (Type A) ──►─┐
  ├──► TEXT_EXTRACTION (Type B) ──►─┤
  ├──► TEXT_EXTRACTION (Type C) ──►─┤
  └──► TEXT_EXTRACTION (Type N) ──►─┘
                                    │
                                    ▼
                              Merge results

Phase 1: Classification

A TEXT_UNDERSTANDING node receives the document and classifies it into one of the known types. The classification prompt constrains the output to an enumerated list, so the Condition node can route deterministically.

Phase 2: Type-specific extraction

Each branch contains a TEXT_EXTRACTION node (using the Extract Data from File capability) configured with:
  • A prompt tailored to that document type, instructing the model which fields to look for
  • A response schema defining the exact JSON structure expected for that type
  • Extraction strategy settings optimized for the document format (text-heavy PDFs vs. scanned images)
A merge point downstream collects results from whichever branch executed.

Implementation

Step 1: Configure the classification node

Add a TEXT_UNDERSTANDING node and configure it to classify the document type. Example classification prompt:
You are a document classifier. Analyze the provided document and determine its type.

Respond with exactly one of the following values:
- BOL
- INVOICE
- RATE_CONFIRMATION
- LUMPER_RECEIPT
- FUEL_RECEIPT

Base your classification on the document layout, headers, and field labels.
If the document does not match any known type, respond with UNKNOWN.
Example response schema:
{
  "type": "object",
  "properties": {
    "document_type": {
      "type": "string",
      "enum": ["BOL", "INVOICE", "RATE_CONFIRMATION", "LUMPER_RECEIPT", "FUEL_RECEIPT", "UNKNOWN"]
    },
    "confidence": {
      "type": "number",
      "description": "Classification confidence between 0 and 1"
    }
  },
  "required": ["document_type", "confidence"]
}
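Downstream steps should treat the classifier's output defensively. A minimal sketch of parsing and validating a classification response against the enumerated list (the function name and fallback behavior are illustrative, not part of the platform):

```python
import json

# Allowed classifier outputs, mirroring the enum in the response schema above.
DOCUMENT_TYPES = {"BOL", "INVOICE", "RATE_CONFIRMATION",
                  "LUMPER_RECEIPT", "FUEL_RECEIPT", "UNKNOWN"}

def parse_classification(raw: str) -> tuple[str, float]:
    """Parse the classifier's JSON response and validate it.

    Falls back to ("UNKNOWN", 0.0) if the response is malformed,
    names a type outside the enumerated list, or carries an
    out-of-range confidence.
    """
    try:
        payload = json.loads(raw)
        doc_type = payload["document_type"]
        confidence = float(payload["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "UNKNOWN", 0.0
    if doc_type not in DOCUMENT_TYPES or not 0.0 <= confidence <= 1.0:
        return "UNKNOWN", 0.0
    return doc_type, confidence
```

Constraining the schema with an enum makes malformed output rare, but the fallback keeps a single bad response from breaking routing.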

Step 2: Add the Condition node

Add a Condition node after the classification node. Configure branches based on the document_type value:
| Branch | Condition | Target |
| --- | --- | --- |
| BOL | document_type == "BOL" | TEXT_EXTRACTION (BOL) |
| Invoice | document_type == "INVOICE" | TEXT_EXTRACTION (Invoice) |
| Rate confirmation | document_type == "RATE_CONFIRMATION" | TEXT_EXTRACTION (Rate confirmation) |
| Lumper receipt | document_type == "LUMPER_RECEIPT" | TEXT_EXTRACTION (Lumper receipt) |
| Default | document_type == "UNKNOWN" | Manual review queue |
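The Condition node's branching reduces to a lookup from document type to target branch. A minimal sketch (branch names are hypothetical; in Agent Builder this mapping lives in the node configuration, not in code):

```python
# Each known type maps to its extractor branch; UNKNOWN, or anything
# unexpected, falls through to the manual review queue, mirroring the
# default branch in the table above.
ROUTES = {
    "BOL": "extract_bol",
    "INVOICE": "extract_invoice",
    "RATE_CONFIRMATION": "extract_rate_confirmation",
    "LUMPER_RECEIPT": "extract_lumper_receipt",
    "FUEL_RECEIPT": "extract_fuel_receipt",
}

def route(document_type: str) -> str:
    """Return the target branch for a classified document type."""
    return ROUTES.get(document_type, "manual_review_queue")
```

Routing unexpected values to the default branch (rather than raising an error) keeps the pipeline running when the classifier drifts.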

Step 3: Configure type-specific extraction nodes

Each branch gets its own TEXT_EXTRACTION node with a prompt and schema optimized for that document type. The Extract Data from File capability handles PDFs, images (via OCR), and other supported file types automatically.
Example prompt (Bill of Lading branch):
Extract all relevant fields from this Bill of Lading document.
Pay special attention to carrier information, shipment details, and freight charges.
If a field is not present, return null for that field.
Response schema:
{
  "type": "object",
  "properties": {
    "bol_number": { "type": "string" },
    "carrier": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "mc_number": { "type": "string" },
        "scac_code": { "type": "string" }
      }
    },
    "shipper": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip": { "type": "string" }
      }
    },
    "consignee": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip": { "type": "string" }
      }
    },
    "ship_date": { "type": "string" },
    "delivery_date": { "type": "string" },
    "pieces": { "type": "integer" },
    "weight": { "type": "number" },
    "freight_charges": { "type": "number" }
  }
}
Keep schemas as flat as possible for simple document types (like fuel receipts and lumper receipts). Reserve nested objects for complex types that genuinely have grouped fields (like BOL with shipper/consignee blocks).
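For contrast with the nested BOL schema above, a flat schema for a simple type might look like the following sketch (the field names are hypothetical, chosen to illustrate a typical fuel receipt):

```python
# Hypothetical flat schema for a simple document type: every field sits
# at the top level, with no nested objects for the extractor to fill.
FUEL_RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant_name": {"type": "string"},
        "location": {"type": "string"},
        "date": {"type": "string"},
        "gallons": {"type": "number"},
        "price_per_gallon": {"type": "number"},
        "total": {"type": "number"},
    },
}
```

A flat schema gives the model a simpler extraction target and makes the output easier to validate and store.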

Configuration reference

| Component | Node type | Key settings |
| --- | --- | --- |
| Classifier | TEXT_UNDERSTANDING | Prompt with enumerated types, constrained response schema |
| Router | Condition | Branch per document type, default branch for unknown types |
| Extractor (per type) | TEXT_EXTRACTION (Extract Data from File) | Type-specific prompt, tailored response schema, extraction strategy |
| Merge | Merge / End node | Collects output from whichever branch executed |
The TEXT_EXTRACTION node uses the Extract Data from File capability, which supports PDFs, DOCX, XLSX, PPTX, and image formats (JPG, PNG, TIFF). Images are automatically converted to PDF before processing via OCR.

Scaling to many document types

This pattern scales well because adding a new document type requires only:
  1. Adding the new type to the classification prompt’s enumerated list
  2. Adding a new branch in the Condition node
  3. Adding a new TEXT_EXTRACTION node with the type-specific prompt and schema
No existing branches are modified. This makes the pattern suitable for domains with dozens of document types.
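The three steps above amount to keeping per-type configuration in a registry. A sketch of that shape (the `ExtractorConfig` class and `register` helper are illustrative, not platform APIs; schemas are elided):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractorConfig:
    """Per-type extraction settings: prompt plus response schema."""
    prompt: str
    schema: dict

# Registry keyed by classifier output. Adding a document type means one
# new entry here, one new enum value in the classifier schema, and one
# new Condition branch; existing entries are untouched.
EXTRACTORS: dict[str, ExtractorConfig] = {
    "BOL": ExtractorConfig(
        prompt="Extract all relevant fields from this Bill of Lading document.",
        schema={"type": "object"},  # full schema as shown earlier
    ),
    "FUEL_RECEIPT": ExtractorConfig(
        prompt="Extract the merchant, date, gallons, and total from this fuel receipt.",
        schema={"type": "object"},
    ),
}

def register(doc_type: str, config: ExtractorConfig) -> None:
    """Add a new document type without modifying existing branches."""
    EXTRACTORS[doc_type] = config
```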
| Domain | Example document types |
| --- | --- |
| Logistics | BOL, invoice, rate confirmation, lumper receipt, fuel receipt, proof of delivery, customs declaration |
| Mortgage | Product sheets, income statements, bank statements, tax returns, appraisal reports, title documents, regulatory disclosures |
| Insurance | Claims forms, medical records, police reports, repair estimates, coverage declarations |

Variations

Parallel extraction

Instead of classifying first, run extraction for all document types simultaneously and pick the result with the highest confidence. This trades higher compute cost for lower latency and removes the classification step as a source of misrouting.
Parallel extraction works best when you have a small number of document types (under 5). Beyond that, the cost of running every extractor on every document becomes impractical.
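The pick-the-winner logic can be sketched as follows. This assumes each extractor returns a dict carrying a `confidence` field alongside its extracted data; that contract is an assumption of the sketch, not a platform guarantee:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_extract(document: bytes, extractors: dict) -> tuple[str, dict]:
    """Run every extractor on the document concurrently and keep the
    result with the highest self-reported confidence."""
    with ThreadPoolExecutor() as pool:
        futures = {doc_type: pool.submit(fn, document)
                   for doc_type, fn in extractors.items()}
        results = {doc_type: f.result() for doc_type, f in futures.items()}
    best = max(results, key=lambda t: results[t].get("confidence", 0.0))
    return best, results[best]
```

Because every extractor runs on every document, cost grows linearly with the number of types, which is why this variation suits small type sets.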

Hierarchical classification

For large document sets, classify in two stages: first into a broad category (financial, shipping, legal), then into a specific type within that category. This reduces the number of options the classifier evaluates at each stage.
Document
   │
   ▼
TEXT_UNDERSTANDING (broad category)
   │
   ├──► TEXT_UNDERSTANDING (financial subtypes) ──► Condition ──► Extractors
   ├──► TEXT_UNDERSTANDING (shipping subtypes)  ──► Condition ──► Extractors
   └──► TEXT_UNDERSTANDING (legal subtypes)     ──► Condition ──► Extractors
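The two stages compose as a simple function of two classifiers. In this sketch the classifier callables stand in for the TEXT_UNDERSTANDING nodes in the diagram; the function and parameter names are illustrative:

```python
def classify(document: bytes, broad_classifier, subtype_classifiers: dict) -> str:
    """Two-stage classification: broad category first, then the subtype
    classifier for that category. Unrecognized categories fall back to
    UNKNOWN rather than guessing a subtype."""
    category = broad_classifier(document)
    subtype_fn = subtype_classifiers.get(category)
    return subtype_fn(document) if subtype_fn else "UNKNOWN"
```

Each second-stage classifier only needs to distinguish the handful of types within its category, which keeps every prompt's enumerated list short.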

Confidence fallback

Add a confidence threshold to the classification step. If the classifier returns a confidence below the threshold, route the document to a manual classification queue instead of risking an incorrect extraction.
Condition logic:
  confidence >= 0.8  → route to type-specific extractor
  confidence < 0.8   → route to manual classification
Start with a confidence threshold of 0.8 and adjust based on your observed accuracy. Track classification accuracy over time to identify document types that need prompt refinement.
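The threshold check combines with the UNKNOWN default into one routing decision. A minimal sketch, using the 0.8 starting threshold from the text (branch names are hypothetical):

```python
CONFIDENCE_THRESHOLD = 0.8  # starting point; tune against observed accuracy

def route_with_fallback(document_type: str, confidence: float) -> str:
    """Route to the type-specific extractor only when the classifier is
    confident enough; otherwise send to manual classification."""
    if confidence >= CONFIDENCE_THRESHOLD and document_type != "UNKNOWN":
        return f"extract_{document_type.lower()}"
    return "manual_classification"
```

A confident UNKNOWN still goes to manual classification: the classifier being sure the document matches no known type is itself a reason for human review.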

Last modified on March 16, 2026