Building effective RAG systems for enterprise use cases requires moving beyond the naive "chunk-and-retrieve" pattern. In this post, I walk through the architecture of my Agentic RAG system, built specifically for manufacturing document Q&A.
Why "Agentic"?
Standard RAG follows a linear path: embed query → retrieve chunks → generate answer. The problem? Complex manufacturing questions like "Compare the LOTO procedures for Machine A and Machine B" require decomposition — breaking one query into multiple retrieval steps.
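To make decomposition concrete, here is a toy sketch of splitting one comparison question into per-entity retrieval queries. The regex heuristic is purely illustrative; in the actual system this classification is an LLM task, not pattern matching:

```python
import re

def decompose(query: str) -> list[str]:
    """Toy decomposition: split a 'Compare the X for A and B' question
    into one retrieval query per entity. A real router would use the LLM."""
    m = re.match(r"Compare the (.+) for (.+) and (.+)", query)
    if not m:
        return [query]  # not a comparison -> single retrieval step
    topic, a, b = m.groups()
    return [f"{topic} for {a}", f"{topic} for {b}"]

print(decompose("Compare the LOTO procedures for Machine A and Machine B"))
# -> ['LOTO procedures for Machine A', 'LOTO procedures for Machine B']
```

Each sub-query then gets its own retrieval pass, and the generator sees both result sets at once.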
The agentic approach introduces three specialized agents:
- Router Agent: Classifies the user's intent — is this a direct factual lookup, a retrieval task, or a multi-part comparison?
- Retriever Agent: Executes semantic search (or multiple searches for decomposed queries) against the Milvus Lite vector store
- Generator Agent: Synthesizes the retrieved context into a response with mandatory page-level citations
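To show how the three agents hand off to each other, here is a minimal sketch of the loop with hard-coded stubs standing in for the LLM and the vector store. The tiny corpus, the substring matching, and all names are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    citations: list  # mandatory page-level sources

# Hard-coded stand-in for the Milvus Lite index; entirely illustrative.
CORPUS = {
    "machine a": ("Machine A LOTO: isolate the main breaker, apply lock.", "p.12"),
    "machine b": ("Machine B LOTO: bleed the pneumatic line, apply lock.", "p.47"),
}

def route(query: str) -> str:
    """Router Agent (stub): the real system classifies intent with an LLM call."""
    q = query.lower()
    return "comparison" if q.startswith("compare") and " and " in q else "retrieval"

def retrieve(query: str) -> list:
    """Retriever Agent (stub): substring match instead of semantic search."""
    q = query.lower()
    return [chunk for key, chunk in CORPUS.items() if key in q]

def generate(chunks: list) -> Answer:
    """Generator Agent (stub): concatenation instead of LLM synthesis."""
    return Answer(" ".join(t for t, _ in chunks), [p for _, p in chunks])

def answer(query: str) -> Answer:
    sub_queries = query.split(" and ") if route(query) == "comparison" else [query]
    chunks = [c for sq in sub_queries for c in retrieve(sq)]
    return generate(chunks)
```

The key structural point survives the stubbing: a comparison question fans out into multiple retrievals, and the generator only ever sees retrieved chunks, each carrying its page reference.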
Going Offline-First
Manufacturing facilities often operate in air-gapped environments with no cloud connectivity. This system runs entirely on-premise:
- Ollama serves the LLM locally (Llama 2, Mistral, etc.)
- Milvus Lite acts as the vector database — no Docker required, runs as a local file
- FastAPI provides the REST interface
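A rough sketch of how the three pieces wire together follows. This is not the project's actual code: the collection name, embedding model, field names, and database filename are all assumptions, and the third-party imports live inside `create_app()` so the pure prompt helper runs anywhere:

```python
def build_prompt(question, chunks):
    """Pure helper: fold retrieved (text, page) chunks into a grounded prompt."""
    context = "\n".join(f"[page {page}] {text}" for text, page in chunks)
    return (
        "Answer ONLY from the context below and cite page numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def create_app():
    from fastapi import FastAPI          # REST interface
    from pymilvus import MilvusClient    # Milvus Lite: a local file, no Docker
    import ollama                        # talks to the local Ollama daemon

    app = FastAPI()
    store = MilvusClient("rag_local.db")  # assumed filename; created on first use

    @app.get("/ask")
    def ask(q: str):
        # Assumes a pre-populated "docs" collection with text/page fields.
        vector = ollama.embeddings(model="nomic-embed-text", prompt=q)["embedding"]
        hits = store.search(collection_name="docs", data=[vector],
                            limit=3, output_fields=["text", "page"])
        chunks = [(h["entity"]["text"], h["entity"]["page"]) for h in hits[0]]
        reply = ollama.chat(model="mistral",
                            messages=[{"role": "user",
                                       "content": build_prompt(q, chunks)}])
        return {"answer": reply["message"]["content"]}

    return app
```

Everything here reads and writes local resources only: the vector store is a file on disk and the model calls go to `localhost`, which is what makes the air-gapped deployment possible.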
What I Learned
The biggest challenge was getting citation accuracy right. Manufacturing documents use strict terminology (LOTO, PPE, pressure ratings), and the LLM needs to reproduce it exactly. Enforcing mandatory source attribution with page numbers solved this — hallucination rates dropped significantly once the model was constrained to cite only retrieved chunks.
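One cheap way to enforce that constraint is a post-generation check that rejects any answer citing a page outside the retrieved set. A minimal sketch, assuming (hypothetically) that citations are rendered as `[p. N]`:

```python
import re

def citations_valid(answer: str, retrieved_pages: set) -> bool:
    """Post-check: every '[p. N]' citation must point at a retrieved chunk;
    an answer with no citations at all is also rejected."""
    cited = {int(n) for n in re.findall(r"\[p\.\s*(\d+)\]", answer)}
    return bool(cited) and cited <= retrieved_pages

print(citations_valid("Apply LOTO before servicing [p. 12].", {12, 47}))  # True
print(citations_valid("Torque the flange to 90 Nm [p. 99].", {12, 47}))   # False
```

A failed check can trigger a retry with a sterner prompt, which is what turns "please cite sources" from a suggestion into a hard constraint.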
Try It
The full source code is on GitHub. You can have it running locally in under 5 minutes with the quick start guide.