There's a dirty secret in enterprise operations: the most expensive employee in your company might be the one copying data from PDFs into spreadsheets.
We see it everywhere. Insurance claims. Shipping manifests. Compliance filings. Purchase orders. The documents arrive in seventeen different formats, and someone has to read each one and type the numbers into SAP.
We built a system that reads faster than any human and never transposes a digit.
The Scale of the Problem
One of our clients, a logistics company processing 8,000 shipments per day, had a team of 23 people doing nothing but document entry: bills of lading, customs declarations, packing lists, invoices. Each shipment generated 4-7 documents, and each document had 15-30 fields that needed extraction.
That's roughly 50,000 pages per day, 250,000 data points, all entered manually. Error rate: 3.1%. Cost per document: about 2 EUR.
What We Built
Cognity's document intelligence module handles the full pipeline:
1. Ingestion
Documents arrive via email, SFTP, API upload, or scans from intake stations. We normalize everything to a common format regardless of source. Handwritten notes, faxes (yes, faxes), thermal printer receipts: the system handles all of it.
2. Classification
Before extraction, the system identifies what it's looking at. Is this a bill of lading or a commercial invoice? A customs declaration or a packing list? Classification accuracy: 99.7%.
This step matters because extraction templates are document-specific. A field labeled "weight" means different things on different documents — gross weight, net weight, volumetric weight. Context determines interpretation.
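To make the point concrete, here is a minimal sketch of how a classification result could select a document-specific interpretation of a printed label. The document types, labels, and canonical field names are illustrative assumptions, not Cognity's actual schema.

```python
from enum import Enum
from typing import Optional


class DocType(Enum):
    BILL_OF_LADING = "bill_of_lading"
    COMMERCIAL_INVOICE = "commercial_invoice"
    PACKING_LIST = "packing_list"


# Hypothetical mapping: what a printed label means on each document type.
FIELD_MEANING = {
    DocType.BILL_OF_LADING: {"weight": "gross_weight_kg"},
    DocType.COMMERCIAL_INVOICE: {"weight": "net_weight_kg"},
    DocType.PACKING_LIST: {"weight": "gross_weight_kg", "net wt": "net_weight_kg"},
}


def canonical_field(doc_type: DocType, label: str) -> Optional[str]:
    """Resolve a printed label to a canonical field name for this document type.

    Returns None when the label is not known for the given type.
    """
    return FIELD_MEANING.get(doc_type, {}).get(label.strip().lower())
```

With this shape, the same label "weight" resolves to a different canonical field depending on what the classifier decided the document is, which is why classification has to run before extraction.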
3. Extraction
We use a combination of layout analysis and language models:
- Structured documents (forms, tables): Layout-aware extraction using spatial relationship modeling
- Semi-structured documents (invoices, POs): Template matching with fallback to LLM-based extraction
- Unstructured documents (emails, notes): Full NLP extraction with entity recognition
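The "template matching with fallback" idea for semi-structured documents can be sketched roughly as follows. This is an illustration under stated assumptions: the regex template and the `llm_fallback` callable are hypothetical stand-ins, and the real extractor is certainly more involved.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical template: one regex per field for a known invoice layout.
INVOICE_TEMPLATE: Dict[str, "re.Pattern"] = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)", re.I),
}


def extract_with_template(text: str, template: Dict[str, "re.Pattern"]) -> Optional[Dict[str, str]]:
    """Return every field if all patterns match, otherwise None."""
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(text)
        if match is None:
            return None  # layout not recognized; let the caller fall back
        fields[name] = match.group(1)
    return fields


def extract(
    text: str,
    template: Dict[str, "re.Pattern"],
    llm_fallback: Callable[[str], Dict[str, str]],
) -> Tuple[Dict[str, str], str]:
    """Try the template first; use the model-based extractor only if it fails."""
    fields = extract_with_template(text, template)
    if fields is not None:
        return fields, "template"
    return llm_fallback(text), "llm"
```

Returning the extraction source alongside the fields is a design choice in this sketch: it lets downstream steps treat template hits and model output differently, for example when deciding what to send for review.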
4. Validation
Every extracted field runs through business rule validation. Does this weight make sense for this commodity code? Is this shipper-consignee pair known? Does the total match the line items?
Anomalies get flagged for human review — but only anomalies. The humans now handle exceptions, not routine processing.
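The three checks named above could look something like the sketch below. The record shape, the reference data, and the anomaly names are all assumptions made for illustration; real rules would come from the client's own master data.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Set, Tuple


@dataclass
class ExtractedDocument:
    shipper: str
    consignee: str
    line_totals: List[Decimal]
    total: Decimal
    commodity_code: str
    weight_kg: Decimal


# Hypothetical reference data: known trading pairs and plausible weights per code.
KNOWN_PAIRS: Set[Tuple[str, str]] = {("ACME GmbH", "Nordic Freight AS")}
WEIGHT_RANGE_KG: Dict[str, Tuple[Decimal, Decimal]] = {
    "8471": (Decimal("0.5"), Decimal("5000")),  # illustrative code and range
}


def validate(doc, known_pairs, weight_ranges) -> List[str]:
    """Return a list of anomaly names; an empty list means no review is needed."""
    anomalies = []
    # Does the total match the line items?
    if sum(doc.line_totals, Decimal("0")) != doc.total:
        anomalies.append("total_mismatch")
    # Is this shipper-consignee pair known?
    if (doc.shipper, doc.consignee) not in known_pairs:
        anomalies.append("unknown_shipper_consignee_pair")
    # Does this weight make sense for this commodity code?
    bounds = weight_ranges.get(doc.commodity_code)
    if bounds is not None and not (bounds[0] <= doc.weight_kg <= bounds[1]):
        anomalies.append("implausible_weight")
    return anomalies
```

A document is routed to an operator only when `validate` returns a non-empty list, which matches the "only anomalies" rule described above.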
The Results
After deploying across their three main processing centers:
- Throughput: 50,000 pages/day with 4 review operators (down from 23 full-time data-entry staff)
- Accuracy: 99.2% end-to-end (up from 96.9% manual)
- Processing time: Average 1.3 seconds per document (down from 4.2 minutes manual)
- Cost per document: 0.08 EUR (down from 2.00 EUR)
The other 19 staff members who had been doing data entry were redeployed to exception handling, customer service, and process improvement roles. None were laid off; they now do work that actually requires human judgment.
Technical Decisions That Mattered
On-premise deployment. These documents contain sensitive commercial data — pricing, volumes, customer relationships. Cloud processing was a non-starter. We deployed Cognity on-premise with GPU nodes for inference.
Confidence thresholds over accuracy targets. Instead of optimizing for maximum accuracy, we optimized for calibrated confidence. When the system says it's 98% confident, it's right 98% of the time. This lets operators trust the confidence scores and focus their review time where it matters.
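One common way to check whether confidence scores are calibrated is to bin predictions by confidence and compare each bin's average confidence with its observed accuracy. The sketch below shows that check plus a simple review threshold. The function names, the bin count, and the 0.98 threshold are assumptions for illustration, not Cognity's actual settings.

```python
from typing import List, Tuple

Prediction = Tuple[float, bool]  # (model confidence, was the field correct?)


def calibration_table(predictions: List[Prediction], n_bins: int = 10):
    """Group predictions into equal-width confidence bins.

    Returns one (mean_confidence, observed_accuracy, count) row per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in predictions:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        rows.append((mean_conf, accuracy, len(members)))
    return rows


def expected_calibration_error(predictions: List[Prediction], n_bins: int = 10) -> float:
    """Count-weighted average gap between confidence and accuracy across bins."""
    total = len(predictions)
    return sum(
        n / total * abs(conf - acc)
        for conf, acc, n in calibration_table(predictions, n_bins)
    )


def needs_review(confidence: float, threshold: float = 0.98) -> bool:
    """Route a field to an operator when confidence falls below the threshold."""
    return confidence < threshold
```

A calibration error near zero is what makes a fixed threshold meaningful: if the scores were systematically overconfident, fields above the threshold would still contain more errors than operators expect.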
Feedback loops. Every correction an operator makes feeds back into the model. The system launched at 94% accuracy; six months later, it's at 99.2%. The operators are training their replacement, except they're not being replaced: they're being promoted to oversight roles.
The Uncomfortable Truth
Most enterprises know they have a document processing problem. They've known for years. The reason they haven't solved it isn't technological; it's organizational. Deploying document AI means changing workflows, redefining roles, and trusting a system to do work that humans have always done.
The technology is ready. The question is whether your organization is ready to use it.
We can help with both.