When I built my Document Intelligence RAG system, the first version used pure vector search. It worked well for general queries but fell flat on domain-specific terms — acronyms, product codes, and regulatory references would return irrelevant results.
The Problem with Pure Vector Search
Embeddings capture semantic meaning beautifully. "Annual revenue" and "yearly income" map to nearby vectors. But they struggle with:
- Exact terms: Searching for "ISO-27001" should find documents mentioning exactly that standard
- Acronyms: "MFA" in a security document shouldn't match "Ministry of Foreign Affairs"
- Product codes: "SKU-4821-B" needs exact matching, not semantic similarity
The Hybrid Approach
The solution is combining two search paradigms:
Vector Search (Semantic)
- Cosine similarity on Azure OpenAI embeddings (text-embedding-ada-002)
- Captures meaning, handles synonyms and paraphrasing
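For reference, the cosine similarity underlying the vector side needs nothing beyond the standard library. This is a minimal sketch: in the real system the vectors would come from the embedding API and a vector index would do the comparison at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Parallel vectors score 1.0, orthogonal ones 0.0, which is why semantically close embeddings like "annual revenue" and "yearly income" end up with high similarity.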
Keyword Search (BM25)
- Traditional statistical text matching
- Excels at exact terms, acronyms, and identifiers
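BM25's statistical matching can be sketched in pure Python. This is a simplified illustration, not the production scorer: it assumes documents are already tokenized lists and uses the common k1/b defaults.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against `query_terms` with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term.
    df = Counter()
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl)
            )
        scores.append(score)
    return scores
```

Because matching is lexical, a query for an exact identifier like "iso-27001" scores only the documents that literally contain it, which is precisely where embeddings fall down.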
The combined scoring formula:
final_score = (0.7 × vector_score) + (0.3 × bm25_score)

The 70/30 weighting was determined empirically: semantic understanding should dominate, but keyword precision acts as a powerful tiebreaker.
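The blend can be sketched in a few lines. One assumption worth flagging: vector and BM25 scores live on different scales, so this sketch min-max normalizes each list before weighting, a detail the formula above leaves implicit.

```python
def minmax(scores):
    """Rescale a list of scores to [0, 1]; constant lists map to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, bm25_scores, alpha=0.7):
    """Weighted blend: alpha * vector + (1 - alpha) * bm25, per document."""
    v = minmax(vector_scores)
    b = minmax(bm25_scores)
    return [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]
```

With alpha=0.7 a document that only keyword-matches can still outrank a weak semantic match, which is the tiebreaker behavior described above.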
Chunking Matters Too
Before search even happens, how you chunk documents has a massive impact. I use recursive overlapping sliding windows:
- Chunk size: 256 tokens (balances context and specificity)
- Overlap: 50 tokens (prevents context loss at boundaries)
- Hierarchy preservation: Maintains document structure (headings, sections)
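The sliding-window part of that scheme reduces to a short loop. This sketch works on a pre-tokenized list and deliberately omits the hierarchy-preservation step (tracking headings and sections), which would sit on top of it.

```python
def chunk_tokens(tokens, size=256, overlap=50):
    """Split a token list into overlapping windows of `size` tokens."""
    step = size - overlap  # advance 206 tokens per chunk with the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

Each chunk repeats the last 50 tokens of its predecessor, so a sentence straddling a chunk boundary still appears intact in at least one chunk.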
Results
After switching from pure vector to hybrid search, retrieval precision on our test corpus improved by ~23%. The biggest gains were on queries containing technical jargon and regulatory references.
Full implementation is on GitHub.