When I built my Document Intelligence RAG system, the first version used pure vector search. It worked well for general queries but fell flat on domain-specific terms — acronyms, product codes, and regulatory references would return irrelevant results.
The Problem with Pure Vector Search
Embeddings capture semantic meaning beautifully. "Annual revenue" and "yearly income" map to nearby vectors. But they struggle with:
- Exact terms: Searching for "ISO-27001" should find documents mentioning exactly that standard
- Acronyms: "MFA" in a security document shouldn't match "Ministry of Foreign Affairs"
- Product codes: "SKU-4821-B" needs exact matching, not semantic similarity
The Hybrid Approach
The solution is combining two search paradigms:
Vector Search (Semantic)
- Cosine similarity on Azure OpenAI embeddings (text-embedding-ada-002)
- Captures meaning, handles synonyms and paraphrasing
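For reference, the cosine similarity underlying the vector side needs nothing beyond the standard library. This is a minimal sketch: in the real system the vectors would come from the embedding API and a vector index would do the comparison at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Parallel vectors score 1.0, orthogonal ones 0.0, which is why semantically close embeddings like "annual revenue" and "yearly income" end up with high similarity.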
Keyword Search (BM25)
- Traditional statistical text matching
- Excels at exact terms, acronyms, and identifiers
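BM25's statistical matching can be sketched in pure Python. This is a simplified illustration, not the production scorer: it assumes documents are already tokenized lists and uses the common k1/b defaults.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against `query_terms` with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term.
    df = Counter()
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl)
            )
        scores.append(score)
    return scores
```

Because matching is lexical, a query for an exact identifier like "iso-27001" scores only the documents that literally contain it, which is precisely where embeddings fall down.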
The combined scoring formula:
final_score = (0.7 × vector_score) + (0.3 × bm25_score)

The 70/30 weighting was determined empirically: semantic understanding should dominate, but keyword precision acts as a powerful tiebreaker.
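The blend can be sketched in a few lines. One assumption worth flagging: vector and BM25 scores live on different scales, so this sketch min-max normalizes each list before weighting, a detail the formula above leaves implicit.

```python
def minmax(scores):
    """Rescale a list of scores to [0, 1]; constant lists map to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, bm25_scores, alpha=0.7):
    """Weighted blend: alpha * vector + (1 - alpha) * bm25, per document."""
    v = minmax(vector_scores)
    b = minmax(bm25_scores)
    return [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]
```

With alpha=0.7 a document that only keyword-matches can still outrank a weak semantic match, which is the tiebreaker behavior described above.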
Chunking Matters Too
Before search even happens, how you chunk documents has a massive impact. I use recursive overlapping sliding windows:
- Chunk size: 256 tokens (balances context and specificity)
- Overlap: 50 tokens (prevents context loss at boundaries)
- Hierarchy preservation: Maintains document structure (headings, sections)
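The sliding-window part of that scheme reduces to a short loop. This sketch works on a pre-tokenized list and deliberately omits the hierarchy-preservation step (tracking headings and sections), which would sit on top of it.

```python
def chunk_tokens(tokens, size=256, overlap=50):
    """Split a token list into overlapping windows of `size` tokens."""
    step = size - overlap  # advance 206 tokens per chunk with the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

Each chunk repeats the last 50 tokens of its predecessor, so a sentence straddling a chunk boundary still appears intact in at least one chunk.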
Results
After switching from pure vector to hybrid search, retrieval precision on our test corpus improved by ~23%. The biggest gains were on queries containing technical jargon and regulatory references.
Full implementation is on GitHub.