Build an AI-powered document verification pipeline that classifies uploaded documents, extracts structured data, and reconciles it against application data.
Agent Builder is currently in preview and may change before general availability.
In this tutorial, you build a document processing pipeline that verifies customer onboarding documents against application data. The pipeline receives uploaded files (ID card, proof of address, salary slip), classifies each document, extracts structured data, compares it to what the applicant declared, and routes discrepancies to a human reviewer.

What you will build:
A file upload UI that accepts multiple documents
A document classification workflow that identifies each document type using AI
A fan-out extraction pipeline that routes each type to a specialized extractor
An AI reconciliation step that compares extracted data against application data
Business rules that flag mismatches (name, address, income)
A human review task for documents with discrepancies
A summary generation step that produces a verification report
Define the following data model keys in your process. These keys hold the application data submitted by the customer and the results produced by the AI pipeline.

| Key | Purpose |
| --- | --- |
| applicant | The applicant's declared data (name, address, monthly income) |
| documents.uploadedFiles | File paths of the uploaded documents |
| extraction.classifiedDocs | Per-document classification and extraction results |
| reconciliation | Field comparison results, exceptions, and match rate |
| validation | Deterministic validation flags set by the business rule |
| review.reviewerDecision | The human reviewer's decision |
| review.reviewerNotes | The human reviewer's notes |
Step 1: Build the classification and extraction workflow
Create a workflow named classifyAndExtract. This workflow receives a single document file path, classifies the document type, and then routes to the appropriate extraction branch. This implements the fan-out extraction pattern.
Add a Text Understanding node as the first node after Start Flow. This node reads the document content and classifies it.

Operation Prompt:
```
You are a document classifier for a bank's customer onboarding process.
Analyze the provided document and determine its type.

Respond with exactly one of the following values:
- ID_CARD
- PROOF_OF_ADDRESS
- SALARY_SLIP
- UNKNOWN

Classification rules:
- ID_CARD: Government-issued identification documents (passport, national
  ID, driver's license). Contains photo, name, date of birth, document number.
- PROOF_OF_ADDRESS: Utility bills, bank statements, or government letters
  showing a name and residential address. Must be dated within the last 3 months.
- SALARY_SLIP: Employment payslips or salary certificates showing employer
  name, employee name, gross/net salary, and pay period.
- UNKNOWN: Document does not match any of the above categories.

Base your classification on the document layout, headers, field labels,
and content structure. If uncertain, return UNKNOWN.
```
Response schema:
{ "type": "object", "properties": { "document_type": { "type": "string", "enum": ["ID_CARD", "PROOF_OF_ADDRESS", "SALARY_SLIP", "UNKNOWN"] }, "confidence": { "type": "number", "description": "Classification confidence between 0 and 1" }, "reasoning": { "type": "string", "description": "Brief explanation of why this type was chosen" } }, "required": ["document_type", "confidence"]}
Add an Exclusive Gateway after the classification node, with one branch per document type and an Else branch. The Else branch handles UNKNOWN documents. Use a Script node there to return a structured error so the parent process can flag the document for manual classification.
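What the Script node returns is up to you; a minimal sketch, assuming the same output context object used by the business rule later in this tutorial:

```javascript
// Else branch: the classifier returned UNKNOWN.
// Return a structured error so the parent process can flag the
// document for manual classification instead of failing silently.
// "result" is an illustrative key; use whatever your Result Key mapping expects.
output.result = {
  documentType: "UNKNOWN",
  confidence: 0,
  extractedData: null,
  error: "Document type could not be determined automatically"
};
```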
Each branch contains an Extract Data from File node with a prompt and schema tailored to that document type.
ID card

Extraction Strategy: LLM Model (handles varied ID layouts, photos, and security features)

Operation Prompt:
```
Extract all personal identification fields from this ID document.
The document may be a passport, national ID card, or driver's license.
If a field is not present or not legible, return null for that field.
```
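Pair the prompt with a response schema. The one below is a sketch: only expiry_date is relied on later (the business rule in Step 4 reads extractedData.expiry_date); the other field names are illustrative.

```json
{
  "type": "object",
  "properties": {
    "full_name": { "type": ["string", "null"] },
    "date_of_birth": { "type": ["string", "null"], "description": "YYYY-MM-DD" },
    "document_number": { "type": ["string", "null"] },
    "expiry_date": { "type": ["string", "null"], "description": "YYYY-MM-DD" },
    "nationality": { "type": ["string", "null"] }
  },
  "required": ["full_name", "document_number", "expiry_date"]
}
```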
Proof of address

Extraction Strategy: OCR Engine (utility bills and bank statements are typically clean scans)

Operation Prompt:

```
Extract the resident's name, full address, document date, and issuing
organization from this proof of address document. The document may be a
utility bill, bank statement, or government letter.
If a field is not present, return null for that field.
```
Salary slip

Extraction Strategy: LLM Model (salary slips have complex tabular layouts with deductions)

Operation Prompt:

```
Extract salary and employment details from this payslip document.
Capture the employee name, employer, pay period, and all salary
components (gross, deductions, net).
If a field is not present, return null for that field.
```
Choose the extraction strategy based on the document characteristics. LLM Model provides the highest accuracy for complex layouts but costs more per page. See Extract Data from File for strategy comparison.
Step 2: Build the reconciliation workflow

Create a workflow named reconcileData. This workflow compares the extracted document data against the applicant's declared data. This implements the AI comparison and reconciliation pattern.
Add a Text Understanding node that receives both the extracted data and the application data.

Operation Prompt:
```
You are a document verification agent for a bank's customer onboarding
process. Compare the AI-extracted document data against the applicant's
declared data and produce a structured exception report.

Instructions:
1. Compare each field individually. Use fuzzy matching for names
   (e.g., "John Smith" vs "JOHN SMITH" is a MATCH, "Jon Smith" vs
   "John Smith" is a WARNING).
2. For addresses, compare at the component level (street, city, postal
   code). Minor formatting differences are acceptable.
3. For dates, normalize to YYYY-MM-DD before comparing.
4. For income, flag if the extracted net salary differs from the
   declared monthly income by more than 10%.
5. Compute an overall match rate as a percentage (0-100).
6. Assign a confidence score (0-100) reflecting how certain you are
   in the comparison results.
7. Flag each exception with a severity:
   - CRITICAL: Identity mismatch (different person), expired document
   - WARNING: Minor name variation, address component mismatch,
     income difference 10-25%
   - INFO: Formatting differences, abbreviations

Extracted document data:
${extraction.classifiedDocs}

Applicant's declared data:
${applicant}
```
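Later steps read reconciliation.matchRate, reconciliation.fieldResults, and reconciliation.exceptions, so give this node a response schema with at least those keys. A sketch (the nested field names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "matchRate": { "type": "number", "description": "Overall match rate, 0-100" },
    "confidence": { "type": "number", "description": "Confidence in the comparison, 0-100" },
    "fieldResults": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "field": { "type": "string" },
          "declared": { "type": ["string", "null"] },
          "extracted": { "type": ["string", "null"] },
          "status": { "type": "string", "enum": ["MATCH", "MISMATCH", "WARNING"] }
        }
      }
    },
    "exceptions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "severity": { "type": "string", "enum": ["CRITICAL", "WARNING", "INFO"] },
          "description": { "type": "string" }
        }
      }
    }
  },
  "required": ["matchRate", "fieldResults", "exceptions"]
}
```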
Step 3: Build the summary workflow

Create a workflow named generateSummary with a single Text Generation node.

Operation Prompt:
```
You are a compliance documentation assistant. Generate a document
verification summary report based on the reconciliation results.

Structure the report as follows:

1. VERIFICATION OVERVIEW
   - Applicant name
   - Number of documents processed
   - Overall match rate
   - Verification status (Approved / Review Required / Rejected)

2. DOCUMENT DETAILS
   For each document:
   - Document type and classification confidence
   - Fields extracted
   - Match/mismatch status per field

3. EXCEPTIONS
   - List all exceptions with severity and description
   - Highlight any CRITICAL issues

4. RECOMMENDATION
   - Clear recommendation based on the findings
   - Specific follow-up actions if needed

Use professional, concise language. Format the report in Markdown.

Applicant data:
${applicant}

Extraction results:
${extraction.classifiedDocs}

Reconciliation results:
${reconciliation}

Reviewer notes (if any):
${review.reviewerNotes}
```
Step 4: Build the orchestration process

Create a process named documentVerify that orchestrates the full pipeline using the workflows you built.
1. Add a User Task for file upload
Add a User Task node after the Start Event. This task presents the file upload UI to the user. Configure the task with:
Task name: Upload documents
Assignment: Assigned to the initiating user
Attach a UI Flow (built in Step 5) that allows uploading multiple files.
2. Loop through uploaded documents
For each uploaded document, trigger the classifyAndExtract workflow. Add a Send Message Task node with a Start Integration Workflow action.

Input mapping: pass the current file path (documents.uploadedFiles[index]) as the workflow's document input.
Add a Receive Message Task node to capture the extraction output. Set the Result Key to extraction.classifiedDocs[index].
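For reference, each entry in extraction.classifiedDocs combines the classification and extraction outputs. An illustrative example (all values invented). Note the documentType key: the business rule in sub-step 4 reads documentType, while the classifier schema emits document_type, so rename the field in your workflow's output mapping.

```json
{
  "documentType": "ID_CARD",
  "confidence": 0.96,
  "extractedData": {
    "full_name": "John Smith",
    "date_of_birth": "1990-07-02",
    "document_number": "X1234567",
    "expiry_date": "2027-03-14"
  }
}
```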
For multiple documents, repeat the Send/Receive pattern for each file, or use a loop structure with an exclusive gateway that iterates until all files are processed.
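If you use the loop structure, a Script node after the Receive Message Task can advance the index. A minimal sketch; documents.currentIndex and documents.hasMoreFiles are illustrative keys, not part of the tutorial's data model:

```javascript
// Advance the loop counter after each Receive Message Task.
var index = output.documents.currentIndex || 0;
output.documents.currentIndex = index + 1;

// The exclusive gateway loops back to the Send Message Task
// while this flag is true; otherwise the process moves on.
output.documents.hasMoreFiles =
  output.documents.currentIndex < output.documents.uploadedFiles.length;
```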
3. Trigger the reconciliation workflow
Add another Send Message Task with a Start Integration Workflow action pointing to the reconcileData workflow.

Input mapping: pass extraction.classifiedDocs and applicant, the two values the reconciliation prompt references.
Add a Receive Message Task node. Set the Result Key to reconciliation.
4. Add a business rule for validation
Add a Business Rule action (JavaScript) to perform deterministic validation checks that supplement the AI reconciliation.
```javascript
// Check if any CRITICAL exceptions exist
var hasCritical = output.reconciliation.exceptions.some(function (e) {
  return e.severity === "CRITICAL";
});

// Check if ID document is expired
var idDoc = output.extraction.classifiedDocs.find(function (d) {
  return d.documentType === "ID_CARD";
});
var isExpired = false;
if (idDoc && idDoc.extractedData.expiry_date) {
  var expiryDate = new Date(idDoc.extractedData.expiry_date);
  isExpired = expiryDate < new Date();
}

// Check income discrepancy
var salaryDoc = output.extraction.classifiedDocs.find(function (d) {
  return d.documentType === "SALARY_SLIP";
});
var incomeDiscrepancy = false;
if (salaryDoc && salaryDoc.extractedData.net_salary) {
  var declared = output.applicant.monthlyIncome;
  var extracted = salaryDoc.extractedData.net_salary;
  var diff = Math.abs(declared - extracted) / declared;
  incomeDiscrepancy = diff > 0.1; // More than 10% difference
}

// Check all required document types are present
var docTypes = output.extraction.classifiedDocs.map(function (d) {
  return d.documentType;
});
var missingId = docTypes.indexOf("ID_CARD") === -1;
var missingAddress = docTypes.indexOf("PROOF_OF_ADDRESS") === -1;
var missingSalary = docTypes.indexOf("SALARY_SLIP") === -1;

// Set validation result
output.validation = {
  hasCriticalExceptions: hasCritical,
  isIdExpired: isExpired,
  incomeDiscrepancy: incomeDiscrepancy,
  missingDocuments: {
    idCard: missingId,
    proofOfAddress: missingAddress,
    salarySlip: missingSalary
  },
  requiresReview: hasCritical || isExpired || incomeDiscrepancy ||
    missingId || missingAddress || missingSalary
};
```
Business rules provide deterministic, auditable checks. Use them alongside AI reconciliation to catch issues the LLM might miss, such as expired documents or missing required document types.
5. Add the routing gateway
Add an Exclusive Gateway after the business rule. Configure two branches:
| Branch | Condition | Target |
| --- | --- | --- |
| Auto-approve | validation.requiresReview == false AND reconciliation.matchRate >= 90 | Proceed to summary generation |
| Human review | (default) | Human review User Task |
6. Add the human review task
Add a User Task node for manual review. The reviewer sees:
Uploaded documents (viewable in a document viewer)
Extracted data side-by-side with declared data
The exception report from reconciliation
Validation flags from the business rule
The reviewer submits a decision:
Approve — continue to summary
Reject — end process with rejection status
Request re-upload — loop back to the upload step
Store the decision in review.reviewerDecision and any notes in review.reviewerNotes.
7. Trigger the summary generation workflow
After both the auto-approve and human-review-approve paths converge, add a Send Message Task to trigger the generateSummary workflow.

Input mapping: pass applicant, extraction.classifiedDocs, reconciliation, and review.reviewerNotes, matching the variables the summary prompt references.
Step 5: Build the upload UI

1. Create the UI Flow

Go to UI Flows in the project sidebar and create a new UI Flow named documentUpload.
2. Add an upload component
Add a File Upload component to the page. Configure it to:
Accept multiple files
Restrict file types to PDF, JPG, PNG
Map uploaded file paths to documents.uploadedFiles
3. Add applicant data display
Add form fields (read-only) that display the applicant’s declared data from applicant. This gives context to the person uploading documents.
4. Add a submit button
Add a Button component labeled Submit documents. Configure it to save the data and advance the User Task.
For the human review step, create a second UI Flow page that displays the extracted data, reconciliation results, and exception report alongside the original documents. Use a side-by-side layout so the reviewer can compare easily.
5. Create the review page

Create a second page in the UI Flow for the human review task. The review page should include:
| Section | Data source | Component |
| --- | --- | --- |
| Applicant info | applicant | Read-only form fields |
| Uploaded documents | documents.uploadedFiles | Document viewer |
| Extraction results | extraction.classifiedDocs | Data table |
| Reconciliation report | reconciliation.fieldResults | Data table with status badges |
| Exceptions | reconciliation.exceptions | List with severity highlighting |
| Validation flags | validation | Alert components for each flag |
| Decision | review.reviewerDecision | Radio buttons (Approve / Reject / Request re-upload) |
| Notes | review.reviewerNotes | Text area |
Use conditional visibility to highlight rows with MISMATCH or CRITICAL status in the reconciliation table. This draws the reviewer’s attention to the issues that need their judgment.
Step 6: Test the pipeline

1. Test classification

Open the classifyAndExtract workflow and use Run Workflow with a test file. Upload sample documents one at a time and verify the classification output.
| Test document | Expected type | Expected confidence |
| --- | --- | --- |
| Scanned passport | ID_CARD | > 0.9 |
| Electricity bill PDF | PROOF_OF_ADDRESS | > 0.9 |
| Monthly payslip | SALARY_SLIP | > 0.9 |
| Random brochure | UNKNOWN | < 0.5 |
2. Test extraction accuracy
For each document type, compare the extracted fields against the actual document content. Check that the following hold (a scripted spot-check is sketched after this list):
Names are captured correctly (including accented characters)
Dates are in the expected YYYY-MM-DD format
Numeric values (salary, postal code) are accurate
Null is returned for missing fields (not hallucinated values)
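These checks can also be scripted. A minimal standalone sketch, assuming you paste one document's extractedData into the extracted variable (the field names follow the schema sketches earlier in this tutorial):

```javascript
// Spot-check one extraction result for format and null handling.
var extracted = {
  full_name: "José Álvarez",       // accented characters should survive
  date_of_birth: "1990-07-02",
  expiry_date: null                // missing on the document, so null is correct
};

var dateFormat = /^\d{4}-\d{2}-\d{2}$/;

Object.keys(extracted).forEach(function (field) {
  var value = extracted[field];
  if (value === null) {
    console.log(field + ": null (missing field, not hallucinated - OK)");
  } else if (field.indexOf("date") !== -1 && !dateFormat.test(value)) {
    console.log(field + ": '" + value + "' is not YYYY-MM-DD - fix the prompt");
  } else {
    console.log(field + ": '" + value + "' - verify against the document");
  }
});
```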
In this tutorial, you built a document processing pipeline that demonstrates several key patterns:
Fan-out extraction — classifying documents by type and routing each to a specialized extraction node with tailored prompts and schemas
AI reconciliation — comparing AI-extracted data against application data with structured exception reports
Hybrid AI + business rules — combining AI-driven comparison with deterministic validation (expired documents, missing types, income thresholds)
Human-in-the-loop — routing edge cases to a reviewer while auto-approving clean results
Workflow composition — building modular workflows for classification, reconciliation, and summary generation, then orchestrating them from a BPMN process