Practitioners are increasingly abandoning off-the-shelf document-processing platforms in favor of custom hybrid pipelines that pair specialized layout models with language models to get reliable, production-grade results.
Key Points
- Production deployments frequently fail to match demo performance, with accuracy often dropping significantly after the first page of complex documents.
- A two-stage architecture, in which dedicated OCR or layout models produce structured markdown before a language model is applied, is the current industry standard.
- Table extraction remains the most significant technical hurdle, with many off-the-shelf tools struggling to process merged cells and multi-page layouts.
- Developers are successfully replacing expensive cloud-based API services with local, open-source stacks running on consumer-grade hardware.
- Human-in-the-loop review remains essential, with successful teams routing 15% to 30% of documents for manual verification to ensure data integrity.
- Agentic workflows often face reliability issues in production, leading many practitioners to prefer deterministic scripts for consistent, repetitive document formats.
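
The two-stage pattern and the review routing described above can be sketched roughly as follows. This is a minimal illustration, not any particular team's implementation: `run_pipeline`, `LayoutResult`, `Extraction`, the two stub stages, and the `review_threshold` value of 0.85 are all hypothetical names and numbers chosen for the example. A real deployment would swap the stubs for an actual layout/OCR model and a language model call, and would tune the threshold until the share of documents sent to review lands in a range the team can staff.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LayoutResult:
    markdown: str       # structured text produced by stage one
    confidence: float   # 0.0-1.0, assumed to be reported by the layout stage


@dataclass
class Extraction:
    fields: dict[str, str]
    needs_review: bool  # True means: route to a human before accepting


def run_pipeline(
    document: bytes,
    layout_stage: Callable[[bytes], LayoutResult],
    llm_stage: Callable[[str], dict[str, str]],
    review_threshold: float = 0.85,  # hypothetical value, tune per workload
) -> Extraction:
    # Stage 1: layout/OCR model turns the raw document into markdown.
    layout = layout_stage(document)
    # Stage 2: the language model only ever sees the structured markdown.
    fields = llm_stage(layout.markdown)
    # Deterministic routing rule: low layout confidence or any empty field.
    has_missing = any(value == "" for value in fields.values())
    needs_review = layout.confidence < review_threshold or has_missing
    return Extraction(fields=fields, needs_review=needs_review)


# --- Stand-in stages so the sketch runs without any model installed ---

def fake_layout(document: bytes) -> LayoutResult:
    text = document.decode("utf-8")
    # Pretend that table-like content lowers the layout model's confidence.
    confidence = 0.6 if "|" in text else 0.95
    return LayoutResult(markdown=text, confidence=confidence)


def fake_llm(markdown: str) -> dict[str, str]:
    # Stand-in for a language model: picks "Key: value" lines for known fields.
    fields = {"invoice_number": "", "total": ""}
    for line in markdown.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace(" ", "_")
        if sep and key in fields:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    clean = run_pipeline(b"Invoice Number: A-17\nTotal: 42.00", fake_layout, fake_llm)
    tabular = run_pipeline(b"Invoice Number: A-18\n| item | price |", fake_layout, fake_llm)
    print(clean.needs_review, tabular.needs_review)
```

Keeping the routing rule as plain, deterministic code makes it easy to audit why a given document was flagged, which fits the preference for scripted handling of repetitive formats noted in the last point.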