The Two-Agent Pattern for Production Document AI

When teams first build document AI systems, they typically follow one of two patterns:

Pattern A: Fully Deterministic. A rigid pipeline with hardcoded rules. Reliable and auditable, but can’t adapt to new document formats or learn from mistakes.

Pattern B: Fully Autonomous AI. An LLM agent that handles everything end-to-end. Flexible and adaptive, but unpredictable: different results on different runs, no audit trail, and hard to debug.

We’ve found that production document AI needs both — a deterministic runtime for reliability and an intelligent agent for improvement. We call this the Two-Agent Pattern.

Agent 1: The Document Processing Agent

The Document Processing Agent is the runtime engine. It processes documents through a fixed pipeline:

Upload → OCR → Split → Classify → Extract → Validate → Output

Key characteristics:

Deterministic — same input produces same output
Fast — optimized for throughput, processes documents in seconds
Auditable — every step produces a trace with confidence scores and source references
Stateless — no learning happens during processing
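As a rough illustration of the fixed pipeline and its per-step trace, here is a minimal Python sketch. The stage functions (`ocr`, `classify`, `extract`) are hypothetical stand-ins, not the real components, and the trace here records only stage outputs and a source line; confidence scores are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceEntry:
    stage: str
    output: Any


@dataclass
class PipelineResult:
    output: Any
    trace: list[TraceEntry] = field(default_factory=list)


def run_pipeline(document: bytes, stages: list[tuple[str, Callable[[Any], Any]]]) -> PipelineResult:
    """Run the stages in their fixed order and record each stage's output."""
    result = PipelineResult(output=document)
    for name, stage in stages:
        result.output = stage(result.output)
        result.trace.append(TraceEntry(stage=name, output=result.output))
    return result


# Hypothetical stand-ins for the real OCR, classification, and extraction steps.
def ocr(document: bytes) -> str:
    return document.decode("utf-8")


def classify(text: str) -> dict:
    doc_type = "invoice" if "invoice" in text.lower() else "unknown"
    return {"text": text, "doc_type": doc_type}


def extract(item: dict) -> dict:
    fields = {}
    for line_no, line in enumerate(item["text"].splitlines(), start=1):
        if line.startswith("Total:"):
            # Keep the source line so the value can be traced back to the document.
            fields["total"] = {"value": line.split(":", 1)[1].strip(), "source_line": line_no}
    return {**item, "fields": fields}


STAGES = [("ocr", ocr), ("classify", classify), ("extract", extract)]

result = run_pipeline(b"Invoice\nTotal: 120.50", STAGES)
for entry in result.trace:
    print(entry.stage, "->", entry.output)
```

Because the stage list is fixed and no state is carried between calls, running the same bytes through `run_pipeline` again yields the same output and the same trace.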

This agent uses AI models (GPT-4o, Claude, etc.) for classification and extraction, but wraps them in a deterministic framework:

The prompt is generated from a fixed template + document-type schema
The model’s response is parsed against a strict JSON schema
Business rules validate the extracted data
If validation fails, a retry is triggered with modified instructions
The final output includes confidence scores and source text references
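The wrapper described above might look roughly like the following sketch. Everything named here is an assumption made for illustration: the `SCHEMA` contents, the prompt wording, the single business rule, and the `call_model` callable, which stands in for whatever model client is actually used. Confidence scores and source references are omitted to keep it short.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "Extract the following fields and reply with JSON only.\n"
    "Fields:\n{fields}\n{extra}\nDocument:\n{document}"
)

# Hypothetical document-type schema: field name -> (description, expected type).
SCHEMA = {
    "total_premium": ("Final premium amount after all adjustments", float),
    "policy_number": ("Policy identifier", str),
}


def build_prompt(document: str, extra: str = "") -> str:
    fields = "\n".join(f"- {name}: {desc}" for name, (desc, _) in SCHEMA.items())
    return PROMPT_TEMPLATE.format(fields=fields, extra=extra, document=document)


def parse_response(raw: str) -> dict:
    """Parse the reply and enforce the schema: exact keys and expected types."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        raise ValueError("keys do not match schema")
    for name, (_, expected_type) in SCHEMA.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"{name} has the wrong type")
    return data


def check_business_rules(data: dict) -> list:
    errors = []
    if data["total_premium"] < 0:
        errors.append("total_premium must be non-negative")
    return errors


def extract(document: str, call_model: Callable[[str], str], max_attempts: int = 2) -> dict:
    extra, errors = "", []
    for attempt in range(1, max_attempts + 1):
        raw = call_model(build_prompt(document, extra))
        try:
            data = parse_response(raw)
            errors = check_business_rules(data)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            errors = [str(exc)]
        if not errors:
            return {"fields": data, "attempts": attempt}
        # Retry with modified instructions that name the failed checks.
        extra = "A previous attempt failed validation: " + "; ".join(errors)
    raise RuntimeError(f"extraction failed after {max_attempts} attempts: {errors}")


# Stub model: the first reply violates a business rule, the second is valid.
prompts = []
replies = iter([
    '{"total_premium": -5.0, "policy_number": "P-1"}',
    '{"total_premium": 1200.0, "policy_number": "P-1"}',
])


def fake_model(prompt: str) -> str:
    prompts.append(prompt)
    return next(replies)


result = extract("Policy P-1, Total Premium 1200.00", fake_model)
print(result)
```

The model is only ever called through `extract`, so the template, schema check, rules, and retry limit decide the control flow rather than the model itself.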

The Document Processing Agent is not a chatbot or a general-purpose AI. It’s a deterministic pipeline that uses AI as a component — not as the orchestrator.

Agent 2: The Supervisor Agent

The Supervisor Agent operates offline — it doesn’t process documents directly. Instead, it:

Analyzes processing results — reviews extracted data, confidence scores, and validation failures
Identifies patterns — finds systematic errors across documents (e.g., “the model consistently misreads the ‘Policy Effective Date’ on ACORD 125 forms”)
Suggests improvements — proposes prompt modifications, field description changes, or business rule additions
Learns from corrections — when a human corrects an extraction error, the Supervisor incorporates that feedback
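One simple way to picture the "identifies patterns" step is to aggregate human reviews by document type and field, then flag fields whose correction rate crosses a threshold. The sketch below assumes a hypothetical `Review` record and arbitrary thresholds; a real Supervisor would likely use richer signals such as confidence scores and validation failures.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Review:
    doc_type: str
    field: str
    extracted: str
    corrected: Optional[str]  # None means the reviewer accepted the value


def find_error_patterns(reviews, min_reviews=5, min_error_rate=0.2):
    """Return (doc_type, field) pairs that are corrected unusually often."""
    totals, errors = Counter(), Counter()
    for review in reviews:
        key = (review.doc_type, review.field)
        totals[key] += 1
        if review.corrected is not None and review.corrected != review.extracted:
            errors[key] += 1

    patterns = []
    for (doc_type, field_name), total in totals.items():
        rate = errors[(doc_type, field_name)] / total
        if total >= min_reviews and rate >= min_error_rate:
            patterns.append(
                {"doc_type": doc_type, "field": field_name, "reviews": total, "error_rate": rate}
            )
    # Worst offenders first.
    return sorted(patterns, key=lambda p: p["error_rate"], reverse=True)


# Hypothetical review data: 3 of 10 "Total Premium" values were corrected.
reviews = (
    [Review("quote", "Total Premium", "950.00", "1200.00")] * 3
    + [Review("quote", "Total Premium", "1200.00", None)] * 7
    + [Review("quote", "Policy Number", "P-1", None)] * 10
)
patterns = find_error_patterns(reviews)
print(patterns)
```

Only "Total Premium" is flagged here; "Policy Number" has enough reviews but no corrections, so it falls below the error-rate threshold.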

How the Supervisor Improves Accuracy

The key insight is that the Supervisor Agent doesn’t change the runtime pipeline. It changes the configuration that the Document Processing Agent uses.

Example flow:

Human reviews extracted data and corrects a field value
Supervisor Agent sees the correction
Supervisor analyzes: “The field ‘Total Premium’ was extracted from the wrong table. The model confused ‘Quoted Premium’ with ‘Total Premium’ on 3 out of 10 similar documents.”
Supervisor suggests: Update the field description to: “Total Premium — the final premium amount after all adjustments. Located in the ‘Premium Summary’ section. Do NOT use the ‘Quoted Premium’ from the proposal section.”
Human approves the suggestion
The updated description is saved to the document type configuration
Next time the Document Processing Agent processes this document type, it uses the improved description

The runtime is deterministic. The improvement is controlled.
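The example flow can be sketched as a small approval gate, assuming configuration is stored as immutable, versioned records. `DocTypeConfig`, `Suggestion`, and `apply_suggestion` are hypothetical names used only for illustration, and the reviewer address is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DocTypeConfig:
    version: int
    field_descriptions: dict  # field name -> description used in the prompt


@dataclass(frozen=True)
class Suggestion:
    field: str
    new_description: str
    rationale: str
    approved_by: Optional[str] = None  # set only when a human approves


def apply_suggestion(config: DocTypeConfig, suggestion: Suggestion) -> DocTypeConfig:
    """Return a new config version; the old version is left untouched."""
    if suggestion.approved_by is None:
        raise PermissionError("suggestion has not been approved by a human reviewer")
    descriptions = {**config.field_descriptions, suggestion.field: suggestion.new_description}
    return DocTypeConfig(version=config.version + 1, field_descriptions=descriptions)


v1 = DocTypeConfig(version=1, field_descriptions={"Total Premium": "Total premium amount"})

suggestion = Suggestion(
    field="Total Premium",
    new_description=(
        "The final premium amount after all adjustments. Located in the "
        "'Premium Summary' section. Do NOT use the 'Quoted Premium' from the proposal section."
    ),
    rationale="Confused with 'Quoted Premium' on 3 out of 10 similar documents.",
)

try:
    apply_suggestion(v1, suggestion)  # not yet approved
except PermissionError as exc:
    print("blocked:", exc)

approved = replace(suggestion, approved_by="reviewer@example.com")
v2 = apply_suggestion(v1, approved)
print(v1.version, "->", v2.version)
```

Keeping `v1` intact means documents processed before the change can still be re-run against the configuration that was active at the time.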

Why Two Agents Instead of One?

Reliability vs. Adaptability

A single agent that both processes documents AND learns from feedback creates a dangerous coupling. If the learning affects the processing logic, you lose determinism:

Yesterday’s correct extraction might become wrong today because the agent “learned” something from a different document
You can’t reproduce a previous result because the agent’s state has changed
You can’t audit the decision because the reasoning is entangled with learning

The Two-Agent Pattern solves this by separating concerns:

Concern	Document Processing Agent	Supervisor Agent
When it runs	Real-time, on every document	Offline, periodically
What it does	Processes documents	Analyzes results, suggests improvements
State	Stateless	Stateful (tracks patterns over time)
Determinism	Fully deterministic	Non-deterministic (creative analysis)
Human oversight	Not needed per document	Required for approving changes

Enterprise Requirements

Enterprises need:

Reproducibility — given the same input, produce the same output (for audit)
Explainability — explain why a particular value was extracted (for compliance)
Controlled change — changes to processing logic go through approval (for governance)

The Two-Agent Pattern delivers all three: the Processing Agent provides reproducibility and explainability, while the Supervisor Agent provides improvement with human-approved change control.
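One way to make the reproducibility requirement checkable is to store a fingerprint of everything that influenced a result alongside the result itself. The sketch below is an assumption about how such an audit record could look, not a description of any particular product; the field names and the `model_id` value are hypothetical.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 of a JSON-serializable object (key order does not matter)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def audit_record(document: bytes, config: dict, model_id: str, output: dict) -> dict:
    return {
        "document_sha256": hashlib.sha256(document).hexdigest(),
        "config_sha256": fingerprint(config),
        "model_id": model_id,
        "output_sha256": fingerprint(output),
    }


def verify_rerun(original: dict, rerun: dict) -> str:
    """Compare a re-run against the stored record."""
    input_keys = ("document_sha256", "config_sha256", "model_id")
    if any(original[key] != rerun[key] for key in input_keys):
        return "inputs_changed"  # not comparable: document, config, or model differs
    if original["output_sha256"] != rerun["output_sha256"]:
        return "output_changed"  # same inputs, different output: determinism broken
    return "reproduced"


document = b"Policy P-1, Total Premium 1200.00"
config = {"version": 1, "fields": {"Total Premium": "Total premium amount"}}
output = {"Total Premium": "1200.00"}

first = audit_record(document, config, "example-model-v1", output)
second = audit_record(document, config, "example-model-v1", output)
print(verify_rerun(first, second))
```

An auditor can then distinguish a legitimate difference, where the approved configuration changed, from an unexplained one, where identical inputs produced a different output.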

Implementation Architecture

                  ┌──────────────────────────────┐
                  │    Supervisor Agent          │
                  │    (offline, analytical)     │
                  │                              │
                  │  • Analyze results           │
                  │  • Identify error patterns   │
                  │  • Suggest config changes    │
                  │  • Learn from corrections    │
                  └──────────────┬───────────────┘
                                 │
                         Approved changes
                         (updated prompts,
                          field descriptions,
                          business rules)
                                 │
                                 ▼
┌───────────┐     ┌──────────────────────────────┐     ┌───────────┐
│ Documents │ ──► │  Document Processing Agent   │ ──► │ Extracted │
│           │     │  (real-time, deterministic)  │     │   Data    │
└───────────┘     │                              │     └───────────┘
                  │  • OCR                       │
                  │  • Classify                  │
                  │  • Extract (with schema)     │
                  │  • Validate (business rules) │
                  │  • Retry on failure          │
                  └──────────────────────────────┘

The two agents share a configuration store (document type definitions, field schemas, prompt templates, business rules) but operate independently. The Processing Agent reads configuration; the Supervisor Agent writes suggestions that humans approve.
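The split between reading configuration and writing suggestions can be expressed as two narrow interfaces over one store. This is a minimal in-memory sketch; `ConfigReader`, `SuggestionSink`, and `ConfigStore` are hypothetical names, and a production store would add persistence and access control.

```python
from types import MappingProxyType
from typing import Mapping, Protocol


class ConfigReader(Protocol):
    """What the Processing Agent is allowed to do."""
    def get(self, doc_type: str) -> Mapping[str, str]: ...


class SuggestionSink(Protocol):
    """What the Supervisor Agent is allowed to do."""
    def propose(self, doc_type: str, field: str, description: str) -> int: ...


class ConfigStore:
    def __init__(self, configs: dict):
        self._configs = {doc_type: dict(fields) for doc_type, fields in configs.items()}
        self._pending = {}
        self._approvals = []
        self._next_id = 1

    def get(self, doc_type: str) -> Mapping[str, str]:
        # Read-only snapshot: later approvals do not alter a run in progress.
        return MappingProxyType(dict(self._configs[doc_type]))

    def propose(self, doc_type: str, field: str, description: str) -> int:
        suggestion_id = self._next_id
        self._next_id += 1
        self._pending[suggestion_id] = (doc_type, field, description)
        return suggestion_id

    def approve(self, suggestion_id: int, reviewer: str) -> None:
        # Only this human-triggered call changes live configuration.
        doc_type, field, description = self._pending.pop(suggestion_id)
        self._configs[doc_type][field] = description
        self._approvals.append((suggestion_id, reviewer))


store = ConfigStore({"quote": {"Total Premium": "Total premium amount"}})
reader: ConfigReader = store    # handed to the Processing Agent
sink: SuggestionSink = store    # handed to the Supervisor Agent

before = reader.get("quote")
suggestion_id = sink.propose("quote", "Total Premium", "Final premium after all adjustments")
still_old = reader.get("quote")["Total Premium"]  # a proposal alone changes nothing

store.approve(suggestion_id, reviewer="reviewer@example.com")
after = reader.get("quote")
print(before["Total Premium"], "->", after["Total Premium"])
```

Neither interface exposes `approve`, so in this sketch a configuration change can only take effect through the human-facing path.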

Getting Started

DocAI Fabric implements the Two-Agent Pattern out of the box. When you set up a new document type:

The Document Processing Agent starts processing immediately using your natural language descriptions
As you review and correct results, the Supervisor Agent learns from your feedback
Over time, accuracy improves — but every change goes through your approval

The result: production-grade document processing from day one, with continuous improvement built in.

Try the platform and see the Two-Agent Pattern in action.
