Build an AI-powered document verification pipeline that classifies uploaded documents, extracts structured data, and reconciles it against application data.
Agent Builder is currently in preview and may change before general availability.
In this tutorial, you build a document processing pipeline that verifies customer onboarding documents against application data. The pipeline receives uploaded files (ID card, proof of address, salary slip), classifies each document, extracts structured data, compares it to what the applicant declared, and routes discrepancies to a human reviewer.

What you will build:
A file upload UI that accepts multiple documents
A document classification workflow that identifies each document type using AI
A fan-out extraction pipeline that routes each type to a specialized extractor
An AI reconciliation step that compares extracted data against application data
Business rules that flag mismatches (name, address, income)
A human review task for documents with discrepancies
A summary generation step that produces a verification report
Define the following data model keys in your process. These keys hold the application data submitted by the customer and the results produced by the AI pipeline.

| Key | Purpose |
| --- | --- |
| applicant | The applicant's declared data (name, address, monthly income) |
| documents.uploadedFiles | File paths of the uploaded documents |
| extraction.classifiedDocs | Per-document classification and extraction results |
| reconciliation | Field comparison results, exceptions, and match rate |
| validation | Deterministic validation flags set by the business rule |
| review.reviewerDecision | The human reviewer's decision |
| review.reviewerNotes | The human reviewer's notes |
Step 1: Build the classification and extraction workflow
Create a workflow named classifyAndExtract. This workflow receives a single document file path, classifies the document type, and then routes to the appropriate extraction branch. This implements the fan-out extraction pattern.
Add a Text Understanding node as the first node after Start Flow. This node reads the document content and classifies it.

Operation Prompt:
```
You are a document classifier for a bank's customer onboarding process.
Analyze the provided document and determine its type.

Respond with exactly one of the following values:
- ID_CARD
- PROOF_OF_ADDRESS
- SALARY_SLIP
- UNKNOWN

Classification rules:
- ID_CARD: Government-issued identification documents (passport, national
  ID, driver's license). Contains photo, name, date of birth, document number.
- PROOF_OF_ADDRESS: Utility bills, bank statements, or government letters
  showing a name and residential address. Must be dated within the last 3 months.
- SALARY_SLIP: Employment payslips or salary certificates showing employer
  name, employee name, gross/net salary, and pay period.
- UNKNOWN: Document does not match any of the above categories.

Base your classification on the document layout, headers, field labels,
and content structure. If uncertain, return UNKNOWN.
```
Response schema:
{ "type": "object", "properties": { "document_type": { "type": "string", "enum": ["ID_CARD", "PROOF_OF_ADDRESS", "SALARY_SLIP", "UNKNOWN"] }, "confidence": { "type": "number", "description": "Classification confidence between 0 and 1" }, "reasoning": { "type": "string", "description": "Brief explanation of why this type was chosen" } }, "required": ["document_type", "confidence"]}
Add an Exclusive Gateway after the classification node, with one branch per document type and an Else branch. The Else branch handles UNKNOWN documents. Use a Script node there to return a structured error so the parent process can flag the document for manual classification.
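What the Script node returns is up to you; a minimal sketch, assuming the same output context object used by the business rule later in this tutorial:

```javascript
// Else branch: the classifier returned UNKNOWN.
// Return a structured error so the parent process can flag the
// document for manual classification instead of failing silently.
// "result" is an illustrative key; use whatever your Result Key mapping expects.
output.result = {
  documentType: "UNKNOWN",
  confidence: 0,
  extractedData: null,
  error: "Document type could not be determined automatically"
};
```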
Each branch contains an Extract Data from File node with a prompt and schema tailored to that document type.
ID card

Extraction Strategy: LLM Model (handles varied ID layouts, photos, and security features)

Operation Prompt:
```
Extract all personal identification fields from this ID document.
The document may be a passport, national ID card, or driver's license.
If a field is not present or not legible, return null for that field.
```
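Pair the prompt with a response schema. The one below is a sketch: only expiry_date is relied on later (the business rule in Step 4 reads extractedData.expiry_date); the other field names are illustrative.

```json
{
  "type": "object",
  "properties": {
    "full_name": { "type": ["string", "null"] },
    "date_of_birth": { "type": ["string", "null"], "description": "YYYY-MM-DD" },
    "document_number": { "type": ["string", "null"] },
    "expiry_date": { "type": ["string", "null"], "description": "YYYY-MM-DD" },
    "nationality": { "type": ["string", "null"] }
  },
  "required": ["full_name", "document_number", "expiry_date"]
}
```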
Proof of address

Extraction Strategy: OCR Engine (utility bills and bank statements are typically clean scans)

Operation Prompt:

```
Extract the resident's name, full address, document date, and issuing
organization from this proof of address document. The document may be a
utility bill, bank statement, or government letter.
If a field is not present, return null for that field.
```
Salary slip

Extraction Strategy: LLM Model (salary slips have complex tabular layouts with deductions)

Operation Prompt:

```
Extract salary and employment details from this payslip document.
Capture the employee name, employer, pay period, and all salary
components (gross, deductions, net).
If a field is not present, return null for that field.
```
Choose the extraction strategy based on the document characteristics. LLM Model provides the highest accuracy for complex layouts but costs more per page. See Extract Data from File for strategy comparison.
Step 2: Build the reconciliation workflow

Create a workflow named reconcileData. This workflow compares the extracted document data against the applicant's declared data. This implements the AI comparison and reconciliation pattern.
Add a Text Understanding node that receives both the extracted data and the application data.

Operation Prompt:
```
You are a document verification agent for a bank's customer onboarding
process. Compare the AI-extracted document data against the applicant's
declared data and produce a structured exception report.

Instructions:
1. Compare each field individually. Use fuzzy matching for names
   (e.g., "John Smith" vs "JOHN SMITH" is a MATCH, "Jon Smith" vs
   "John Smith" is a WARNING).
2. For addresses, compare at the component level (street, city, postal
   code). Minor formatting differences are acceptable.
3. For dates, normalize to YYYY-MM-DD before comparing.
4. For income, flag if the extracted net salary differs from the
   declared monthly income by more than 10%.
5. Compute an overall match rate as a percentage (0-100).
6. Assign a confidence score (0-100) reflecting how certain you are
   in the comparison results.
7. Flag each exception with a severity:
   - CRITICAL: Identity mismatch (different person), expired document
   - WARNING: Minor name variation, address component mismatch,
     income difference 10-25%
   - INFO: Formatting differences, abbreviations

Extracted document data:
${extraction.classifiedDocs}

Applicant's declared data:
${applicant}
```
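Later steps read reconciliation.matchRate, reconciliation.fieldResults, and reconciliation.exceptions, so give this node a response schema with at least those keys. A sketch (the nested field names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "matchRate": { "type": "number", "description": "Overall match rate, 0-100" },
    "confidence": { "type": "number", "description": "Confidence in the comparison, 0-100" },
    "fieldResults": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "field": { "type": "string" },
          "declared": { "type": ["string", "null"] },
          "extracted": { "type": ["string", "null"] },
          "status": { "type": "string", "enum": ["MATCH", "MISMATCH", "WARNING"] }
        }
      }
    },
    "exceptions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "severity": { "type": "string", "enum": ["CRITICAL", "WARNING", "INFO"] },
          "description": { "type": "string" }
        }
      }
    }
  },
  "required": ["matchRate", "fieldResults", "exceptions"]
}
```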
Step 3: Build the summary workflow

Create a workflow named generateSummary with a single Text Generation node.

Operation Prompt:
```
You are a compliance documentation assistant. Generate a document
verification summary report based on the reconciliation results.

Structure the report as follows:

1. VERIFICATION OVERVIEW
   - Applicant name
   - Number of documents processed
   - Overall match rate
   - Verification status (Approved / Review Required / Rejected)

2. DOCUMENT DETAILS
   For each document:
   - Document type and classification confidence
   - Fields extracted
   - Match/mismatch status per field

3. EXCEPTIONS
   - List all exceptions with severity and description
   - Highlight any CRITICAL issues

4. RECOMMENDATION
   - Clear recommendation based on the findings
   - Specific follow-up actions if needed

Use professional, concise language. Format the report in Markdown.

Applicant data:
${applicant}

Extraction results:
${extraction.classifiedDocs}

Reconciliation results:
${reconciliation}

Reviewer notes (if any):
${review.reviewerNotes}
```
Step 4: Build the orchestration process

Create a process named documentVerify that orchestrates the full pipeline using the workflows you built.
1. Add a User Task for file upload
Add a User Task node after the Start Event. This task presents the file upload UI to the user. Configure the task with:
Task name: Upload documents
Assignment: Assigned to the initiating user
Attach a UI Flow (built in Step 5) that allows uploading multiple files.
2. Loop through uploaded documents
For each uploaded document, trigger the classifyAndExtract workflow. Add a Send Message Task node with a Start Integration Workflow action.

Input mapping: pass the current file path (documents.uploadedFiles[index]) as the workflow's document input.
Add a Receive Message Task node to capture the extraction output. Set the Result Key to extraction.classifiedDocs[index].
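For reference, each entry in extraction.classifiedDocs combines the classification and extraction outputs. An illustrative example (all values invented). Note the documentType key: the business rule in sub-step 4 reads documentType, while the classifier schema emits document_type, so rename the field in your workflow's output mapping.

```json
{
  "documentType": "ID_CARD",
  "confidence": 0.96,
  "extractedData": {
    "full_name": "John Smith",
    "date_of_birth": "1990-07-02",
    "document_number": "X1234567",
    "expiry_date": "2027-03-14"
  }
}
```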
For multiple documents, repeat the Send/Receive pattern for each file, or use a loop structure with an exclusive gateway that iterates until all files are processed.
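If you use the loop structure, a Script node after the Receive Message Task can advance the index. A minimal sketch; documents.currentIndex and documents.hasMoreFiles are illustrative keys, not part of the tutorial's data model:

```javascript
// Advance the loop counter after each Receive Message Task.
var index = output.documents.currentIndex || 0;
output.documents.currentIndex = index + 1;

// The exclusive gateway loops back to the Send Message Task
// while this flag is true; otherwise the process moves on.
output.documents.hasMoreFiles =
  output.documents.currentIndex < output.documents.uploadedFiles.length;
```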
3. Trigger the reconciliation workflow
Add another Send Message Task with a Start Integration Workflow action pointing to the reconcileData workflow.

Input mapping: pass extraction.classifiedDocs and applicant, the two values the reconciliation prompt references.
Add a Receive Message Task node. Set the Result Key to reconciliation.
4. Add a business rule for validation
Add a Business Rule action (JavaScript) to perform deterministic validation checks that supplement the AI reconciliation.
```javascript
// Check if any CRITICAL exceptions exist
var hasCritical = output.reconciliation.exceptions.some(function (e) {
  return e.severity === "CRITICAL";
});

// Check if ID document is expired
var idDoc = output.extraction.classifiedDocs.find(function (d) {
  return d.documentType === "ID_CARD";
});
var isExpired = false;
if (idDoc && idDoc.extractedData.expiry_date) {
  var expiryDate = new Date(idDoc.extractedData.expiry_date);
  isExpired = expiryDate < new Date();
}

// Check income discrepancy
var salaryDoc = output.extraction.classifiedDocs.find(function (d) {
  return d.documentType === "SALARY_SLIP";
});
var incomeDiscrepancy = false;
if (salaryDoc && salaryDoc.extractedData.net_salary) {
  var declared = output.applicant.monthlyIncome;
  var extracted = salaryDoc.extractedData.net_salary;
  var diff = Math.abs(declared - extracted) / declared;
  incomeDiscrepancy = diff > 0.1; // More than 10% difference
}

// Check all required document types are present
var docTypes = output.extraction.classifiedDocs.map(function (d) {
  return d.documentType;
});
var missingId = docTypes.indexOf("ID_CARD") === -1;
var missingAddress = docTypes.indexOf("PROOF_OF_ADDRESS") === -1;
var missingSalary = docTypes.indexOf("SALARY_SLIP") === -1;

// Set validation result
output.validation = {
  hasCriticalExceptions: hasCritical,
  isIdExpired: isExpired,
  incomeDiscrepancy: incomeDiscrepancy,
  missingDocuments: {
    idCard: missingId,
    proofOfAddress: missingAddress,
    salarySlip: missingSalary
  },
  requiresReview: hasCritical || isExpired || incomeDiscrepancy ||
    missingId || missingAddress || missingSalary
};
```
Business rules provide deterministic, auditable checks. Use them alongside AI reconciliation to catch issues the LLM might miss, such as expired documents or missing required document types.
5. Add the routing gateway
Add an Exclusive Gateway after the business rule. Configure two branches:
| Branch | Condition | Target |
| --- | --- | --- |
| Auto-approve | validation.requiresReview == false AND reconciliation.matchRate >= 90 | Proceed to summary generation |
| Human review | (default) | Human review User Task |
6. Add the human review task
Add a User Task node for manual review. The reviewer sees:
Uploaded documents (viewable in a document viewer)
Extracted data side-by-side with declared data
The exception report from reconciliation
Validation flags from the business rule
The reviewer submits a decision:
Approve — continue to summary
Reject — end process with rejection status
Request re-upload — loop back to the upload step
Store the decision in review.reviewerDecision and any notes in review.reviewerNotes.
7. Trigger the summary generation workflow
After both the auto-approve and human-review-approve paths converge, add a Send Message Task to trigger the generateSummary workflow.

Input mapping: pass applicant, extraction.classifiedDocs, reconciliation, and review.reviewerNotes, matching the variables the summary prompt references.
Step 5: Build the upload UI

1. Create the UI Flow

Go to UI Flows in the project sidebar and create a new UI Flow named documentUpload.
2. Add an upload component
Add a File Upload component to the page. Configure it to:
Accept multiple files
Restrict file types to PDF, JPG, PNG
Map uploaded file paths to documents.uploadedFiles
3. Add applicant data display
Add form fields (read-only) that display the applicant’s declared data from applicant. This gives context to the person uploading documents.
4. Add a submit button
Add a Button component labeled Submit documents. Configure it to save the data and advance the User Task.
For the human review step, create a second UI Flow page that displays the extracted data, reconciliation results, and exception report alongside the original documents. Use a side-by-side layout so the reviewer can compare easily.
5. Create the review page

Create a second page in the UI Flow for the human review task. The review page should include:
| Section | Data source | Component |
| --- | --- | --- |
| Applicant info | applicant | Read-only form fields |
| Uploaded documents | documents.uploadedFiles | Document viewer |
| Extraction results | extraction.classifiedDocs | Data table |
| Reconciliation report | reconciliation.fieldResults | Data table with status badges |
| Exceptions | reconciliation.exceptions | List with severity highlighting |
| Validation flags | validation | Alert components for each flag |
| Decision | review.reviewerDecision | Radio buttons (Approve / Reject / Request re-upload) |
| Notes | review.reviewerNotes | Text area |
Use conditional visibility to highlight rows with MISMATCH or CRITICAL status in the reconciliation table. This draws the reviewer’s attention to the issues that need their judgment.
Step 6: Test the pipeline

1. Test classification

Open the classifyAndExtract workflow and use Run Workflow with a test file. Upload sample documents one at a time and verify the classification output.
| Test document | Expected type | Expected confidence |
| --- | --- | --- |
| Scanned passport | ID_CARD | > 0.9 |
| Electricity bill PDF | PROOF_OF_ADDRESS | > 0.9 |
| Monthly payslip | SALARY_SLIP | > 0.9 |
| Random brochure | UNKNOWN | < 0.5 |
2. Test extraction accuracy
For each document type, compare the extracted fields against the actual document content. Check that the following hold (a scripted spot-check is sketched after this list):
Names are captured correctly (including accented characters)
Dates are in the expected YYYY-MM-DD format
Numeric values (salary, postal code) are accurate
Null is returned for missing fields (not hallucinated values)
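These checks can also be scripted. A minimal standalone sketch, assuming you paste one document's extractedData into the extracted variable (the field names follow the schema sketches earlier in this tutorial):

```javascript
// Spot-check one extraction result for format and null handling.
var extracted = {
  full_name: "José Álvarez",       // accented characters should survive
  date_of_birth: "1990-07-02",
  expiry_date: null                // missing on the document, so null is correct
};

var dateFormat = /^\d{4}-\d{2}-\d{2}$/;

Object.keys(extracted).forEach(function (field) {
  var value = extracted[field];
  if (value === null) {
    console.log(field + ": null (missing field, not hallucinated - OK)");
  } else if (field.indexOf("date") !== -1 && !dateFormat.test(value)) {
    console.log(field + ": '" + value + "' is not YYYY-MM-DD - fix the prompt");
  } else {
    console.log(field + ": '" + value + "' - verify against the document");
  }
});
```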
In this tutorial, you built a document processing pipeline that demonstrates several key patterns:
Fan-out extraction — classifying documents by type and routing each to a specialized extraction node with tailored prompts and schemas
AI reconciliation — comparing AI-extracted data against application data with structured exception reports
Hybrid AI + business rules — combining AI-driven comparison with deterministic validation (expired documents, missing types, income thresholds)
Human-in-the-loop — routing edge cases to a reviewer while auto-approving clean results
Workflow composition — building modular workflows for classification, reconciliation, and summary generation, then orchestrating them from a BPMN process