Inside GRAL's Document Intelligence: How Cognity Turns Unstructured Data Into Decisions

Enterprises run on documents. Insurance claims arrive as scanned PDFs. Supplier contracts come in as Word files with tracked changes. Regulatory filings arrive as XML inside ZIP files attached to emails. Medical records mix handwritten notes with printed forms. Purchase orders differ from one vendor to the next.

The data trapped in these documents drives business decisions — but extracting it reliably, at scale, across formats and languages, is a problem that most enterprises have not solved. They have teams of people doing manual data entry. They have OCR systems from 2014 that work on exactly one document type. They have RPA bots that break every time a form layout changes.

GRAL built Cognity to solve the document intelligence problem properly. Not for a single document type in a controlled format, but for the full chaos of enterprise document reality.

Why Traditional Document Processing Fails

The document processing market is full of tools that work in demos and fail in production. The failure modes are consistent:

Layout sensitivity. Traditional OCR and template-based extraction assume documents follow a fixed layout. Move a field two centimeters to the right, and extraction breaks. Change from a two-column to a three-column layout, and the system produces garbage. Real business documents — especially those from external parties — change layout without warning.

Format fragility. A system trained on clean digital PDFs fails on scanned documents. A system that handles scans fails on photographs of documents taken with phone cameras. A system built for English fails on German invoices. Enterprise document flows include every format, every quality level, every language.

Context blindness. Traditional extraction pulls text from fields. It does not understand what the document means. An invoice that says "net 30" next to a date requires understanding payment terms to extract the due date. A contract clause that references "Section 4.2(b)" requires understanding document structure. A medical form where "same as above" appears in the address field requires understanding context.

Scale brittleness. Systems that work at ten documents per hour collapse at ten thousand. Systems built for one document type require separate configuration for each new type. Adding a new vendor's invoice format means weeks of template building and testing.
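The "net 30" case under context blindness can be made concrete. The sketch below is a minimal illustration, assuming that "net N" means payment is due N calendar days after the invoice date. The helper name due_date_from_terms and the regular expression are hypothetical and are not part of Cognity.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical pattern for illustration: matches "net 30", "Net 45", and similar.
_NET_TERMS = re.compile(r"\bnet\s+(\d{1,3})\b", re.IGNORECASE)


def due_date_from_terms(invoice_date: date, terms_text: str) -> Optional[date]:
    """Return invoice_date plus N days for terms like 'net 30', else None."""
    match = _NET_TERMS.search(terms_text)
    if match is None:
        # The text carries no recognizable "net N" term; do not guess.
        return None
    return invoice_date + timedelta(days=int(match.group(1)))


# The raw fields are a date and the string "Net 30"; the due date is derived.
print(due_date_from_terms(date(2024, 1, 15), "Payment terms: Net 30"))  # 2024-02-14
```

A field-by-field extractor would return the invoice date and the string "Net 30" as two unrelated values. The due date only exists once the terms are interpreted.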

How Cognity Works

Cognity is GRAL's document intelligence platform. It processes documents through a pipeline that combines computer vision, natural language understanding, and domain knowledge.

Document Ingestion

Cognity accepts documents in any format enterprises actually use:

Digital PDFs with embedded text — the easy case.
Scanned PDFs and image files requiring OCR — the common case.
Photographs of physical documents, with perspective distortion, variable lighting, and partial occlusion.
Email bodies and attachments — including cases where the relevant information is split between the email text and an attached document.
Office documents (Word, Excel, PowerPoint) with embedded objects, comments, and tracked changes.
Structured data formats (XML, JSON, CSV) that need to be correlated with unstructured documents.

The ingestion layer normalizes all inputs into a common representation that preserves both the textual content and the spatial layout of the document. This dual representation is critical — many extraction tasks depend on understanding where information appears on the page, not just what the text says.
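One way to picture such a dual representation is a list of tokens that each carry both text and a position on the page. The sketch below is an assumption made for illustration, not Cognity's actual data model: the class names, the coordinate convention (origin at top left), and the tokens_right_of helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in page coordinates (assumed origin: top left)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Token:
    text: str
    page: int
    box: BoundingBox


@dataclass
class NormalizedDocument:
    """Hypothetical common representation: content plus spatial layout."""
    source_format: str
    tokens: List[Token] = field(default_factory=list)

    def plain_text(self) -> str:
        # Text-only view, in token order.
        return " ".join(t.text for t in self.tokens)

    def tokens_right_of(self, anchor: Token, y_tolerance: float = 5.0) -> List[Token]:
        """Tokens on the same page and roughly the same line, right of anchor."""
        anchor_mid = (anchor.box.y0 + anchor.box.y1) / 2
        hits = [
            t for t in self.tokens
            if t.page == anchor.page
            and t.box.x0 >= anchor.box.x1
            and abs((t.box.y0 + t.box.y1) / 2 - anchor_mid) <= y_tolerance
        ]
        return sorted(hits, key=lambda t: t.box.x0)


doc = NormalizedDocument("scanned_pdf", [
    Token("Total:", 1, BoundingBox(50, 700, 90, 712)),
    Token("1,250.00", 1, BoundingBox(400, 700, 460, 712)),
    Token("Thank", 1, BoundingBox(50, 740, 85, 752)),
])
print([t.text for t in doc.tokens_right_of(doc.tokens[0])])  # ['1,250.00']
```

The text-only view cannot say which number belongs to "Total:". The spatial query can, because the amount sits on the same line to the right of the label.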

Visual Understanding

Cognity's visual understanding layer processes the document as an image, independent of text extraction. This layer identifies:

Document structure. Headers, paragraphs, tables, lists, signatures, stamps, logos, and handwritten annotations. The structure model understands document layouts it has never seen before because it learned the visual grammar of business documents — not templates of specific forms.

Tables. Table extraction is notoriously difficult because tables in real documents have merged cells, spanning headers, implicit column boundaries, and nested sub-tables. Cognity's table model handles these cases because it was trained on thousands of real enterprise tables, not synthetic examples.

Handwriting. Many enterprise documents include handwritten elements — signatures, annotations, corrections, filled-in form fields. Cognity's handwriting recognition handles multiple scripts and works with the messy reality of real handwriting, not neatly written samples.

Semantic Extraction

Raw text and layout are inputs to extraction, not the output. Cognity's semantic extraction layer understands what the document means:

Entity extraction with domain context. Cognity extracts entities — names, dates, amounts, addresses, reference numbers — using models that understand the document's domain. A "date" field on an insurance claim has different semantics than a "date" field on a purchase order. The extraction model uses the document type and surrounding context to disambiguate.

Relationship mapping. Documents contain relationships between entities. A line item on an invoice connects a product description, a quantity, a unit price, and a total. A contract clause connects parties, obligations, conditions, and dates. Cognity extracts these relationships, not just isolated fields.

Cross-reference resolution. Enterprise documents reference other documents, internal sections, and external standards. Cognity resolves these references, linking "per Agreement dated March 15" to the actual agreement in the document store and "as defined in Section 2" to the relevant section within the same document.
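The intra-document half of cross-reference resolution can be sketched with a pattern match and a lookup. This is a deliberately simple illustration under stated assumptions: references are written as "Section N", "Section N.M", or "Section N.M(x)", and a section index keyed by number already exists. The function name and the index shape are hypothetical, and a production system would need far more than a regular expression.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed reference style: "Section 2", "Section 4.2", "Section 4.2(b)".
_SECTION_REF = re.compile(r"\bSection\s+(\d+(?:\.\d+)*)(?:\(([a-z])\))?")


def resolve_section_refs(
    text: str, sections: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Return (reference as written, resolved heading or None) pairs."""
    results = []
    for match in _SECTION_REF.finditer(text):
        number, letter = match.group(1), match.group(2)
        # Try the most specific key first, then fall back to the parent section.
        keys = [f"{number}({letter})", number] if letter else [number]
        target = next((sections[k] for k in keys if k in sections), None)
        results.append((match.group(0), target))
    return results


# Hypothetical section index built from the document's own structure.
index = {"2": "Definitions", "4.2": "Payment Terms"}
clause = "as defined in Section 2 and subject to Section 4.2(b); see Section 9."
for written, heading in resolve_section_refs(clause, index):
    print(written, "->", heading)
```

Unresolved references come back as None rather than being dropped, so they can be flagged for review instead of silently ignored.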

Confidence and Validation

Every extraction in Cognity carries a confidence score. But confidence scores alone are not actionable — what matters is knowing when to trust the extraction and when to flag it for human review.

Calibrated confidence. Cognity's confidence scores are calibrated against actual accuracy. Across all extractions that the system reports at 95% confidence, about 95% are correct. This calibration is maintained through continuous monitoring and recalibration against production data.
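Calibration of this kind is usually checked by grouping extractions into confidence bins and comparing each bin's average reported confidence with its observed accuracy. The sketch below shows that standard reliability check together with expected calibration error. It is a generic illustration, not GRAL's monitoring code; the function names and the choice of ten equal-width bins are assumptions.

```python
from typing import List, Sequence, Tuple

Bin = Tuple[float, float, int]  # (mean confidence, observed accuracy, count)


def reliability_bins(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> List[Bin]:
    """Group extractions into equal-width confidence bins; skip empty bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        buckets[idx].append((conf, ok))
    bins: List[Bin] = []
    for bucket in buckets:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            bins.append((mean_conf, accuracy, len(bucket)))
    return bins


def expected_calibration_error(bins: Sequence[Bin]) -> float:
    """Count-weighted average gap between reported confidence and accuracy."""
    total = sum(n for _, _, n in bins)
    if total == 0:
        return 0.0
    return sum(n / total * abs(conf - acc) for conf, acc, n in bins)


# 20 extractions reported at 0.95, of which 19 were verified correct.
well_calibrated = reliability_bins([0.95] * 20, [True] * 19 + [False])
# Same reported confidence, but only 15 of 20 were correct.
overconfident = reliability_bins([0.95] * 20, [True] * 15 + [False] * 5)
print(round(expected_calibration_error(well_calibrated), 3))  # 0.0
print(round(expected_calibration_error(overconfident), 3))    # 0.2
```

A gap near zero means the reported score can be read as a probability. A large gap means thresholds built on that score will misroute documents.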

Business rule validation. Extracted data passes through configurable business rules. An invoice total that does not match the sum of line items is flagged regardless of extraction confidence. A contract date that falls in the past when a future date is expected is flagged. These rules catch errors that the extraction model misses.

Human-in-the-loop routing. When extraction confidence falls below a configurable threshold, or when business rules flag an inconsistency, the document is routed to a human reviewer. The reviewer sees the original document alongside the extraction results, with uncertain fields highlighted. Their corrections feed back into model improvement.
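The interaction between business rules and confidence thresholds can be sketched as a small routing function. This is an illustrative sketch only: the ExtractedInvoice shape, the rule name, the 0.90 threshold, and the one-cent tolerance are assumptions chosen for the example, not Cognity's configuration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class ExtractedInvoice:
    """Hypothetical extraction result for one invoice."""
    line_totals: List[Decimal]
    stated_total: Decimal
    min_field_confidence: float  # lowest confidence among extracted fields


def rule_violations(
    inv: ExtractedInvoice, tolerance: Decimal = Decimal("0.01")
) -> List[str]:
    """Return the names of business rules the extraction violates."""
    violations = []
    computed = sum(inv.line_totals, Decimal("0"))
    if abs(computed - inv.stated_total) > tolerance:
        violations.append("total_mismatch")
    return violations


def route(inv: ExtractedInvoice, threshold: float = 0.90) -> str:
    """Send to human review on any rule violation or low confidence."""
    if rule_violations(inv):
        # Rules override confidence: a confident but inconsistent result is flagged.
        return "human_review"
    if inv.min_field_confidence < threshold:
        return "human_review"
    return "auto_accept"


lines = [Decimal("100.00"), Decimal("50.50")]
print(route(ExtractedInvoice(lines, Decimal("150.50"), 0.98)))  # auto_accept
print(route(ExtractedInvoice(lines, Decimal("155.50"), 0.99)))  # human_review
print(route(ExtractedInvoice(lines, Decimal("150.50"), 0.80)))  # human_review
```

The second case is the important one: the extraction is highly confident, yet the totals do not reconcile, so the rule check sends it to a reviewer anyway.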

Integration with Enterprise Systems

Cognity does not exist in isolation. Extracted data flows into the enterprise systems where it drives decisions:

ERP systems receive validated invoice data, purchase order confirmations, and supplier information.
Claims systems receive extracted claim details, supporting documentation analysis, and compliance checks.
Contract management receives clause extraction, obligation tracking, and renewal date monitoring.
Regulatory systems receive compliance document analysis, filing validation, and audit documentation.

These integrations use GRAL's standard connector layer, the same infrastructure that Sentara and GRAL's other platforms use. A document processed by Cognity can trigger a workflow in ServiceNow, update a record in Salesforce, and send a notification in Teams — all within minutes of ingestion.

Production Performance

Cognity's production metrics across GRAL's managed deployments:

Processing throughput. 2,400 pages per hour per processing node. Horizontal scaling is linear — ten nodes process 24,000 pages per hour.
Extraction accuracy. 97.3% field-level accuracy across all document types. Domain-specific deployments with fine-tuned models achieve 99.1% on their target document types.
Table extraction accuracy. 94.8% cell-level accuracy on complex tables with merged cells and spanning headers.
Handwriting recognition. 91.2% character accuracy on real handwritten annotations. This number reflects the actual quality of handwriting GRAL encounters in production — not carefully written test samples.
End-to-end latency. P50: 2.3 seconds per page. P99: 8.1 seconds per page. Measured from ingestion to validated extraction output.

These metrics are aggregated from production traffic across multiple clients, document types, and quality levels.
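The throughput figure translates directly into a sizing estimate. The sketch below assumes the stated 2,400 pages per hour per node and the stated linear scaling hold for the workload in question. The function name and the optional headroom parameter are hypothetical, and real capacity planning would also account for burst patterns and document mix.

```python
import math

PAGES_PER_HOUR_PER_NODE = 2400  # per-node throughput stated above


def nodes_needed(peak_pages_per_hour: int, headroom: float = 0.0) -> int:
    """Estimate node count for a peak load, assuming linear scaling.

    headroom is a fractional safety margin, e.g. 0.25 for 25% spare capacity.
    """
    if peak_pages_per_hour < 0 or headroom < 0:
        raise ValueError("load and headroom must be non-negative")
    required = peak_pages_per_hour * (1 + headroom)
    # Always provision at least one node, and round partial nodes up.
    return max(1, math.ceil(required / PAGES_PER_HOUR_PER_NODE))


print(nodes_needed(24000))                 # 10
print(nodes_needed(10000))                 # 5
print(nodes_needed(24000, headroom=0.25))  # 13
```

This estimates throughput capacity only. The per-page latency percentiles are a separate measure and do not improve simply by adding nodes.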

What Makes GRAL's Approach Different

GRAL did not invent document AI. The difference is engineering discipline applied to production requirements:

No templates. Cognity does not require templates for new document types. The visual understanding and semantic extraction models generalize to documents they have not seen before. Adding a new vendor's invoice format does not require weeks of configuration — it requires uploading a few examples for validation.

Multilingual by default. Cognity processes documents in any language GRAL's clients encounter. European enterprises deal with documents in dozens of languages. A system that only handles English is not enterprise-ready.

On-premise processing. Like all GRAL platforms, Cognity runs on the client's infrastructure. Documents containing sensitive personal data, financial information, or trade secrets never leave the client's network.

Continuous learning. Every human correction in the review loop improves the extraction models. Cognity gets more accurate over time, not less. The learning is federated across deployments — improvements from one client's corrections benefit all clients, without sharing the actual documents.

GRAL built Cognity because enterprise document intelligence requires the same engineering rigor as any other production system. Not a demo that works on clean PDFs, but a platform that handles the full reality of enterprise documents — every format, every language, every quality level, every edge case — reliably, at scale, in production.