Parseur
Intelligent document parsing platform that extracts, validates, and auto-corrects structured data from PDFs and images using an LLM-only approach.
Next.js · TypeScript · OpenAI · Claude · Prisma · PostgreSQL
Role: Creator & Developer
Company: Personal
Year: 2025
Document pipeline
Visual extraction, schema validation, and auto-correction queue
The Challenge
Extracting structured data from unstructured documents (PDFs, images) is notoriously hard. Traditional OCR pipelines are brittle and need hand-written rules for every document type. I wanted a system that could use LLMs to understand arbitrary document layouts, then validate and auto-correct what it extracted.
The Solution
- Multi-LLM pipeline combining OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet
- Intelligent extraction with auto-correction and Zod schema validation
- S3-compatible storage with MinIO for local development
- Background job processing with Inngest for async document pipelines
- Full-stack Next.js 16 app with Prisma ORM and PostgreSQL
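The bullets above name extraction, Zod validation, and auto-correction without showing how they fit together. Below is a minimal sketch of one plausible extract, validate, and re-prompt loop. It is not Parseur's actual code. `LlmCall` is a hypothetical stand-in for a GPT-4o or Claude call, `Invoice` is a hypothetical target shape, and the hand-written `validateInvoice` plays the role a Zod schema (`z.object(...)` with `safeParse`) would, so the example has no dependencies.

```typescript
// Hypothetical stand-in for a GPT-4o or Claude call: prompt in, raw text out.
type LlmCall = (prompt: string) => Promise<string>;

// Hypothetical target shape for one document type.
interface Invoice {
  invoiceNumber: string;
  total: number;
  currency: string;
}

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

// Hand-rolled stand-in for a Zod schema: collects every problem instead of stopping at the first.
function validateInvoice(input: unknown): ValidationResult<Invoice> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  const { invoiceNumber, total, currency } = input as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof invoiceNumber !== "string" || invoiceNumber.length === 0) {
    errors.push("invoiceNumber: expected a non-empty string");
  }
  if (typeof total !== "number" || !Number.isFinite(total)) {
    errors.push("total: expected a finite number");
  }
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    errors.push("currency: expected a 3-letter ISO 4217 code");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { invoiceNumber: invoiceNumber as string, total: total as number, currency: currency as string },
  };
}

// Extract, validate, and re-prompt with the validation errors, up to maxAttempts times.
async function extractWithCorrection(
  documentText: string,
  llm: LlmCall,
  maxAttempts = 3,
): Promise<Invoice> {
  const basePrompt =
    "Return only JSON with keys invoiceNumber, total, currency for this document:\n" + documentText;
  let prompt = basePrompt;
  let lastErrors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await llm(prompt);
    let result: ValidationResult<Invoice>;
    try {
      result = validateInvoice(JSON.parse(raw));
    } catch {
      result = { ok: false, errors: ["response was not valid JSON"] };
    }
    if (result.ok) return result.value;
    lastErrors = result.errors;
    // Auto-correction step: show the model its previous answer and exactly what failed.
    prompt =
      `${basePrompt}\n\nYour previous answer:\n${raw}\n` +
      `It failed validation:\n- ${lastErrors.join("\n- ")}\nReturn corrected JSON only.`;
  }
  throw new Error(`extraction failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`);
}
```

Feeding the validator's messages back into the prompt gives the model a concrete target for the retry, while `maxAttempts` bounds the cost of a document that never validates.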
Results
- 2 LLM providers
- Auto-correction
- Live at parseur.vercel.app
Learnings
Multi-LLM orchestration taught me that no single model excels at everything: GPT-4o handles visual layout understanding better, while Claude excels at structured reasoning. The key is composing them, not choosing between them.
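One way to express "composing, not choosing" is to treat each model call as a typed async stage and chain the stages, so the vision model's layout-aware transcription becomes the reasoning model's input. This is a minimal sketch, not Parseur's actual code: `Stage`, `compose`, `ModelCall`, `layoutStage`, and `structuringStage` are hypothetical names, and `ModelCall` stands in for whatever OpenAI or Anthropic SDK call backs each stage.

```typescript
// A pipeline stage: one async transformation from I to O.
type Stage<I, O> = (input: I) => Promise<O>;

// Chains two async stages so the output of `first` becomes the input of `second`.
function compose<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return async (input: A) => second(await first(input));
}

// Hypothetical provider call: text prompt plus an optional image URL, text out.
type ModelCall = (prompt: string, imageUrl?: string) => Promise<string>;

// Stage backed by a vision-capable model (GPT-4o in this project): image in, layout-aware text out.
function layoutStage(vision: ModelCall): Stage<string, string> {
  return (imageUrl) =>
    vision("Transcribe this page as Markdown, keeping tables and reading order.", imageUrl);
}

// Stage backed by a reasoning model (Claude in this project): transcription in, JSON text out.
function structuringStage(reasoning: ModelCall): Stage<string, string> {
  return (markdown) =>
    reasoning(`Map this transcription to JSON with the requested fields:\n\n${markdown}`);
}
```

Because every stage shares one signature, changing which provider backs a stage, or inserting a validation stage between the two, leaves the rest of the pipeline untouched.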