> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flowx.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Fan-out extraction

> Classify documents by type, then route each to a specialized extraction node with tailored prompts and schemas. Scale to dozens of document types.

<Warning>
  **Preview**

  Agent Builder is currently in preview and may change before general availability.
</Warning>

## When to use

Use fan-out extraction when your app needs to process documents that vary significantly in structure and content. A single extraction prompt cannot handle the differences between an invoice, a bill of lading, and a fuel receipt — each has different fields, layouts, and validation rules.

This pattern is the right choice when:

* You receive **mixed document types** in a single pipeline (email attachments, bulk uploads)
* Each document type has a **distinct schema** with specialized fields
* You need **high extraction accuracy** per type rather than a generic best-effort pass
* The number of document types may **grow over time** without rearchitecting the workflow

***

## Architecture

The pattern follows two phases: classification, then type-specific extraction.

```text theme={"system"}
Document
  │
  ▼
TEXT_UNDERSTANDING (classify type)
  │
  ▼
Condition (fork by type)
  ├──► TEXT_EXTRACTION (Type A) ──►─┐
  ├──► TEXT_EXTRACTION (Type B) ──►─┤
  ├──► TEXT_EXTRACTION (Type C) ──►─┤
  └──► TEXT_EXTRACTION (Type N) ──►─┘
                                    │
                                    ▼
                              Merge results
```

### Phase 1: Classification

A **TEXT\_UNDERSTANDING** node receives the document and classifies it into one of the known types. The classification prompt constrains the output to an enumerated list, so the Condition node can route deterministically.

### Phase 2: Type-specific extraction

Each branch contains a **TEXT\_EXTRACTION** node (using the Extract Data from File capability) configured with:

* A **prompt** tailored to that document type, instructing the model which fields to look for
* A **response schema** defining the exact JSON structure expected for that type
* **Extraction strategy** settings optimized for the document format (text-heavy PDFs vs. scanned images)

A merge point downstream collects results from whichever branch executed.

***

## Implementation

### Step 1: Configure the classification node

Add a **TEXT\_UNDERSTANDING** node and configure it to classify the document type.

**Example classification prompt:**

```text theme={"system"}
You are a document classifier. Analyze the provided document and determine its type.

Respond with exactly one of the following values:
- BOL
- INVOICE
- RATE_CONFIRMATION
- LUMPER_RECEIPT
- FUEL_RECEIPT

Base your classification on the document layout, headers, and field labels.
If the document does not match any known type, respond with UNKNOWN.
```

**Example response schema:**

```json theme={"system"}
{
  "type": "object",
  "properties": {
    "document_type": {
      "type": "string",
      "enum": ["BOL", "INVOICE", "RATE_CONFIRMATION", "LUMPER_RECEIPT", "FUEL_RECEIPT", "UNKNOWN"]
    },
    "confidence": {
      "type": "number",
      "description": "Classification confidence between 0 and 1"
    }
  },
  "required": ["document_type", "confidence"]
}
```

### Step 2: Add the Condition node

Add a **Condition** node after the classification node. Configure branches based on the `document_type` value:

| Branch            | Condition                              | Target                               |
| ----------------- | -------------------------------------- | ------------------------------------ |
| BOL               | `document_type == "BOL"`               | TEXT\_EXTRACTION (BOL)               |
| Invoice           | `document_type == "INVOICE"`           | TEXT\_EXTRACTION (Invoice)           |
| Rate confirmation | `document_type == "RATE_CONFIRMATION"` | TEXT\_EXTRACTION (Rate confirmation) |
| Lumper receipt    | `document_type == "LUMPER_RECEIPT"`    | TEXT\_EXTRACTION (Lumper receipt)    |
| Fuel receipt      | `document_type == "FUEL_RECEIPT"`      | TEXT\_EXTRACTION (Fuel receipt)      |
| Default           | `document_type == "UNKNOWN"`           | Manual review queue                  |

### Step 3: Configure type-specific extraction nodes

Each branch gets its own **TEXT\_EXTRACTION** node with a prompt and schema optimized for that document type. The Extract Data from File capability handles PDFs, images (via OCR), and other supported file types automatically.

<Tabs>
  <Tab title="BOL (Bill of Lading)">
    **Prompt:**

    ```text theme={"system"}
    Extract all relevant fields from this Bill of Lading document.
    Pay special attention to carrier information, shipment details, and freight charges.
    If a field is not present, return null for that field.
    ```

    **Response schema:**

    ```json theme={"system"}
    {
      "type": "object",
      "properties": {
        "bol_number": { "type": "string" },
        "carrier": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "mc_number": { "type": "string" },
            "scac_code": { "type": "string" }
          }
        },
        "shipper": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "address": { "type": "string" },
            "city": { "type": "string" },
            "state": { "type": "string" },
            "zip": { "type": "string" }
          }
        },
        "consignee": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "address": { "type": "string" },
            "city": { "type": "string" },
            "state": { "type": "string" },
            "zip": { "type": "string" }
          }
        },
        "ship_date": { "type": "string" },
        "delivery_date": { "type": "string" },
        "pieces": { "type": "integer" },
        "weight": { "type": "number" },
        "freight_charges": { "type": "number" }
      }
    }
    ```
  </Tab>

  <Tab title="Invoice">
    **Prompt:**

    ```text theme={"system"}
    Extract all fields from this invoice document.
    Capture line items as an array. Extract totals, tax amounts, and payment terms.
    If a field is not present, return null for that field.
    ```

    **Response schema:**

    ```json theme={"system"}
    {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "invoice_date": { "type": "string" },
        "due_date": { "type": "string" },
        "vendor_name": { "type": "string" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "unit_price": { "type": "number" },
              "amount": { "type": "number" }
            }
          }
        },
        "subtotal": { "type": "number" },
        "tax": { "type": "number" },
        "total": { "type": "number" },
        "payment_terms": { "type": "string" }
      }
    }
    ```
  </Tab>

  <Tab title="Rate confirmation">
    **Prompt:**

    ```text theme={"system"}
    Extract all fields from this rate confirmation document.
    Focus on load details, rate amounts, pickup and delivery information.
    If a field is not present, return null for that field.
    ```

    **Response schema:**

    ```json theme={"system"}
    {
      "type": "object",
      "properties": {
        "confirmation_number": { "type": "string" },
        "load_number": { "type": "string" },
        "broker_name": { "type": "string" },
        "carrier_name": { "type": "string" },
        "rate_amount": { "type": "number" },
        "pickup_date": { "type": "string" },
        "pickup_location": { "type": "string" },
        "delivery_date": { "type": "string" },
        "delivery_location": { "type": "string" },
        "equipment_type": { "type": "string" },
        "commodity": { "type": "string" }
      }
    }
    ```
  </Tab>
</Tabs>

<Tip>
  Keep schemas as flat as possible for simple document types (like fuel receipts and lumper receipts). Reserve nested objects for complex types that genuinely have grouped fields (like BOL with shipper/consignee blocks).
</Tip>

***

## Configuration reference

| Component                | Node type                                 | Key settings                                                        |
| ------------------------ | ----------------------------------------- | ------------------------------------------------------------------- |
| **Classifier**           | TEXT\_UNDERSTANDING                       | Prompt with enumerated types, constrained response schema           |
| **Router**               | Condition                                 | Branch per document type, default branch for unknown types          |
| **Extractor (per type)** | TEXT\_EXTRACTION (Extract Data from File) | Type-specific prompt, tailored response schema, extraction strategy |
| **Merge**                | Merge / End node                          | Collects output from whichever branch executed                      |

<Info>
  The TEXT\_EXTRACTION node uses the **Extract Data from File** capability, which supports PDFs, DOCX, XLSX, PPTX, and image formats (JPG, PNG, TIFF). Images are automatically converted to PDF before processing via OCR.
</Info>

***

## Scaling to many document types

This pattern scales well because adding a new document type requires only:

1. Adding the new type to the classification prompt's enumerated list
2. Adding a new branch in the Condition node
3. Adding a new TEXT\_EXTRACTION node with the type-specific prompt and schema

No existing branches are modified. This makes the pattern suitable for domains with dozens of document types.

| Domain        | Example document types                                                                                                      |
| ------------- | --------------------------------------------------------------------------------------------------------------------------- |
| **Logistics** | BOL, invoice, rate confirmation, lumper receipt, fuel receipt, proof of delivery, customs declaration                       |
| **Mortgage**  | Product sheets, income statements, bank statements, tax returns, appraisal reports, title documents, regulatory disclosures |
| **Insurance** | Claims forms, medical records, police reports, repair estimates, coverage declarations                                      |

***

## Variations

### Parallel extraction

Instead of classifying first, run extraction for all document types simultaneously and pick the result with the highest confidence. This trades compute cost for lower latency and avoids classification errors.

<Info>
  Parallel extraction works best when you have a small number of document types (under 5). Beyond that, the cost of running every extractor on every document becomes impractical.
</Info>

### Hierarchical classification

For large document sets, classify in two stages: first into a broad category (financial, shipping, legal), then into a specific type within that category. This reduces the number of options the classifier evaluates at each stage.

```text theme={"system"}
Document
  │
  ▼
TEXT_UNDERSTANDING (broad category)
  │
  ├──► TEXT_UNDERSTANDING (financial subtypes) ──► Condition ──► Extractors
  ├──► TEXT_UNDERSTANDING (shipping subtypes)  ──► Condition ──► Extractors
  └──► TEXT_UNDERSTANDING (legal subtypes)     ──► Condition ──► Extractors
```

### Confidence fallback

Add a confidence threshold to the classification step. If the classifier returns a confidence below the threshold, route the document to a manual classification queue instead of risking an incorrect extraction.

```text theme={"system"}
Condition logic:
  confidence >= 0.8  → route to type-specific extractor
  confidence < 0.8   → route to manual classification
```

<Tip>
  Start with a confidence threshold of 0.8 and adjust based on your observed accuracy. Track classification accuracy over time to identify document types that need prompt refinement.
</Tip>

***

## Related resources

<CardGroup cols={2}>
  <Card title="Patterns overview" icon="puzzle-piece" href="./overview">
    All available AI patterns and how to combine them
  </Card>

  <Card title="AI node types" icon="diagram-project" href="../agent-builder/node-types">
    Reference for TEXT\_UNDERSTANDING, TEXT\_EXTRACTION, and other node types
  </Card>

  <Card title="Extract Data from File" icon="file-lines" href="../agent-builder/extract-data-from-file">
    Configuration guide for the Extract Data from File node
  </Card>

  <Card title="AI comparison and reconciliation" icon="code-compare" href="./ai-comparison-reconciliation">
    Compare extracted data against system-of-record values
  </Card>
</CardGroup>
