
Document Intelligence RAG

Enterprise RAG system handling 10K+ document pages with Azure AI Search. Hybrid retrieval (BM25 + semantic) reduced hallucinations by 85% while maintaining sub-2s response times. Source-attributed answers via GPT-4.

Python · FastAPI · Azure AI Search · Azure OpenAI · GPT-4

Problem

Traditional document search relies on keyword matching, missing semantic meaning and context. Enterprise teams need accurate, context-grounded answers from large document repositories — not just a list of matching files.

Architecture

The system has two core pipelines:

Ingestion Pipeline

Documents (PDF, DOCX, TXT) → Document Loader → Recursive Chunking (256 tokens, 50 overlap) → Azure OpenAI Embeddings (text-embedding-ada-002) → Azure AI Search Index
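The chunking stage can be sketched as a token-level sliding window. The helper below is illustrative, not the production loader: integer token IDs stand in for real tokenizer output (e.g. from tiktoken), and the function name is an assumption.

```python
def chunk_tokens(tokens, size=256, overlap=50):
    """Split a token sequence into overlapping windows.

    Each chunk is `size` tokens; consecutive chunks share `overlap`
    tokens so context at chunk boundaries is preserved.
    """
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reached the end
            break
    return chunks
```

In practice each chunk would be decoded back to text and sent to the embedding model in batches.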

Retrieval + Generation Pipeline

User Query → Query Processing → Hybrid Search (Vector + BM25) → Reranking (Cross-encoder) → Context Assembly → GPT-4 Generation → Answer with citations
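The context-assembly step, which makes source-attributed answers possible, can be sketched as follows. The function and the `[n]` citation convention are illustrative assumptions; the idea is simply that every chunk handed to GPT-4 carries a numbered tag the model can cite.

```python
def assemble_context(chunks):
    """Number each retrieved chunk and build a citation map.

    chunks: list of (source_id, text) pairs from the reranker.
    Returns the prompt context string and a {number: source_id} map
    used to resolve citations in the generated answer.
    """
    lines = []
    citations = {}
    for i, (source_id, text) in enumerate(chunks, start=1):
        lines.append(f"[{i}] {text}")
        citations[i] = source_id
    return "\n".join(lines), citations
```

The citation map is what lets the system turn a generated "[2]" back into a concrete document reference.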

Technical Highlights

  • Chunking Strategy: Recursive overlapping sliding windows (256 tokens, 50 overlap) to preserve context at boundaries
  • Hybrid Search Scoring: final_score = (0.7 × vector_score) + (0.3 × bm25_score)
  • Hallucination Reduction: Grounding enforcement, fact verification, confidence scoring, and mandatory source attribution
  • Performance: Batch embedding, caching, index partitioning, async I/O
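The hybrid scoring formula above can be sketched as a weighted fusion of the two retrievers' scores. The min-max normalization step is an added assumption (raw BM25 scores are unbounded, so they must be brought onto the same scale as cosine similarities before weighting); it is not necessarily what the deployed index does.

```python
def minmax(scores):
    # Normalize raw scores to [0, 1] so vector and BM25 scales are comparable.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(vector_scores, bm25_scores, vector_weight=0.7):
    # final_score = (0.7 × vector_score) + (0.3 × bm25_score), per document.
    v = minmax(vector_scores)
    b = minmax(bm25_scores)
    return [vector_weight * vs + (1 - vector_weight) * bs
            for vs, bs in zip(v, b)]
```

The 0.7/0.3 split favors semantic similarity while keeping exact-keyword matches from being drowned out entirely.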