Parseur

Intelligent document parsing platform that extracts, validates, and auto-corrects structured data from PDFs and images using an LLM-only approach.

Next.js, TypeScript, OpenAI, Claude, Prisma, PostgreSQL

Role: Creator & Developer

Company: Personal

Year: 2025

[Project visual: document pipeline with visual extraction, schema validation, and auto-correction queue]

The Challenge

Extracting structured data from unstructured documents (PDFs, images) is notoriously hard. Traditional OCR pipelines are brittle and require manual rules for each document type. I wanted a system that uses LLMs to understand any document layout, and that validates and auto-corrects what it extracts.

The Solution

Multi-LLM pipeline combining OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet
Intelligent extraction with auto-correction and Zod schema validation
S3-compatible storage with MinIO for local development
Background job processing with Inngest for async document pipelines
Full-stack Next.js 16 app with Prisma ORM and PostgreSQL
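The list above pairs LLM extraction with Zod schema validation and auto-correction. A minimal sketch of such a validate-then-correct loop follows. It assumes a hypothetical `Invoice` shape, a hypothetical `CorrectFn` callback standing in for the model call, and a default limit of three validation attempts. To stay dependency-free it uses a hand-written validator in place of a Zod schema's `safeParse`, so it illustrates the technique rather than the project's actual code.

```typescript
// Hypothetical document shape; the project's real schemas are not shown in the source.
interface Invoice {
  invoiceNumber: string;
  total: number;
  currency: string;
}

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

// Hand-written stand-in for a Zod schema's safeParse(): one message per invalid field.
function validateInvoice(data: unknown): ValidationResult<Invoice> {
  const record = (typeof data === "object" && data !== null ? data : {}) as Record<string, unknown>;
  const { invoiceNumber, total, currency } = record;
  const errors: string[] = [];
  if (typeof invoiceNumber !== "string" || invoiceNumber.length === 0) {
    errors.push("invoiceNumber: expected a non-empty string");
  }
  if (typeof total !== "number" || !Number.isFinite(total)) {
    errors.push("total: expected a finite number");
  }
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    errors.push("currency: expected a 3-letter uppercase code");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, value: { invoiceNumber, total, currency } as Invoice };
}

// Hypothetical model callback: given the rejected output and its validation errors,
// it would ask an LLM for a corrected candidate (no real SDK is called here).
type CorrectFn = (previous: unknown, errors: string[]) => Promise<unknown>;

interface Outcome {
  attempts: number;
  result: ValidationResult<Invoice>;
}

// Validate; on failure, feed the errors back to the model and try again,
// stopping after maxAttempts validations (at least one always runs).
async function extractWithCorrection(
  initial: unknown,
  correct: CorrectFn,
  maxAttempts = 3,
): Promise<Outcome> {
  let candidate = initial;
  for (let attempt = 1; ; attempt++) {
    const result = validateInvoice(candidate);
    if (result.ok || attempt >= maxAttempts) {
      return { attempts: attempt, result };
    }
    candidate = await correct(candidate, result.errors);
  }
}

// Usage: a fake "model" that only fixes the casing of the currency code.
const fakeCorrect: CorrectFn = async (previous) => {
  const prev = previous as Record<string, unknown>;
  return { ...prev, currency: String(prev.currency).toUpperCase() };
};

extractWithCorrection({ invoiceNumber: "INV-42", total: 99.5, currency: "eur" }, fakeCorrect)
  .then((out) => console.log(out.attempts, out.result.ok)); // prints: 2 true
```

Feeding the validator's error messages back into the correction call gives the model a targeted instruction instead of a blind retry, and the attempt cap keeps a persistently invalid document from looping forever.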

Results

LLM providers: OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet

Auto-correction of extracted data

Live at parseur.vercel.app

Learnings

Multi-LLM orchestration taught me that no single model excels at everything. GPT-4o handles visual layout understanding better, while Claude excels at structured reasoning. The key is composing them, not choosing between them.
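One way to express "composing, not choosing" in code is a per-task routing table. The sketch below rests on several assumptions: a hypothetical `Provider` interface (real code would wrap the OpenAI and Anthropic SDKs, which are not called here), a preference order that encodes the strengths described above, and a fallback to the next provider when one fails, which the source does not mention. The model identifiers are illustrative labels, not exact API model names.

```typescript
type Task = "visual-extraction" | "structured-reasoning";

// Hypothetical provider interface; a real implementation would call a vendor SDK.
interface Provider {
  run(task: Task, input: string): Promise<string>;
}

// Preference order per task: GPT-4o first for visual layout, Claude first for reasoning.
const routing: Record<Task, string[]> = {
  "visual-extraction": ["gpt-4o", "claude-3.5-sonnet"],
  "structured-reasoning": ["claude-3.5-sonnet", "gpt-4o"],
};

// Try providers in preference order, falling back to the next one on failure.
async function runTask(
  providers: Map<string, Provider>,
  task: Task,
  input: string,
): Promise<{ provider: string; output: string }> {
  const errors: string[] = [];
  for (const name of routing[task]) {
    const provider = providers.get(name);
    if (!provider) continue;
    try {
      return { provider: name, output: await provider.run(task, input) };
    } catch (err) {
      errors.push(`${name}: ${String(err)}`);
    }
  }
  throw new Error(`No provider succeeded for ${task}: ${errors.join("; ")}`);
}

// Compose the models: one extracts from the document, the other reasons over that output.
async function parseDocument(providers: Map<string, Provider>, document: string) {
  const extracted = await runTask(providers, "visual-extraction", document);
  const reasoned = await runTask(providers, "structured-reasoning", extracted.output);
  return { extractedBy: extracted.provider, reasonedBy: reasoned.provider, result: reasoned.output };
}

// Usage with fake providers that just tag their input.
const fake = (name: string): Provider => ({
  run: async (task, input) => `${name}[${task}](${input})`,
});
const providers = new Map([
  ["gpt-4o", fake("gpt-4o")],
  ["claude-3.5-sonnet", fake("claude-3.5-sonnet")],
]);

parseDocument(providers, "scan.pdf").then((r) => console.log(r.extractedBy, r.reasonedBy));
// prints: gpt-4o claude-3.5-sonnet
```

Keeping the routing in data rather than in branching code means a different model can be promoted for one task by editing a single list.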

