Building intelligent RAG & agentic AI systems.
I'm Ahad Ahmad Khan, a Generative AI Engineer at Capgemini. I design RAG pipelines, agentic AI workflows, and LLM-powered backends using Python, FastAPI & Azure.

Currently Building
Agentic RAG for Manufacturing
A multi-agent RAG system for manufacturing document Q&A. Uses a Router → Retriever → Generator pipeline to answer questions from equipment manuals, safety SOPs, and maintenance docs — all running offline.
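Here is a toy, stdlib-only sketch of the Router → Retriever → Generator flow. It is not the project's code. The keyword router, the collection names (`sop`, `maintenance`, `manual`), the token-overlap retriever, and the `generate` stub are all hypothetical stand-ins. The stub only assembles the grounded prompt that an offline LLM would receive.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    source: str  # hypothetical collection: "sop", "maintenance", or "manual"
    text: str


# Hypothetical keyword routes; a production router would more likely be an LLM or classifier.
ROUTES = {
    "sop": {"safety", "ppe", "lockout", "hazard"},
    "maintenance": {"maintenance", "lubricate", "inspect", "schedule", "replace"},
    "manual": {"install", "operate", "specification", "setup"},
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def route(query: str) -> str:
    """Router: pick the collection whose keywords overlap the query most."""
    tokens = set(tokenize(query))
    scores = {name: len(tokens & keywords) for name, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "manual"  # fall back to equipment manuals


def retrieve(query: str, docs: list[Doc], collection: str, k: int = 2) -> list[Doc]:
    """Retriever: rank docs in the routed collection by shared query tokens."""
    query_tokens = set(tokenize(query))
    candidates = [d for d in docs if d.source == collection]
    candidates.sort(key=lambda d: len(query_tokens & set(tokenize(d.text))), reverse=True)
    return candidates[:k]


def generate(query: str, context: list[Doc]) -> str:
    """Generator stub: builds the grounded prompt an offline LLM would answer."""
    lines = "\n".join(f"[{d.source}] {d.text}" for d in context)
    return f"Answer only from the context below.\nContext:\n{lines}\nQuestion: {query}"


def answer(query: str, docs: list[Doc]) -> str:
    return generate(query, retrieve(query, docs, route(query)))


docs = [
    Doc("sop", "Apply lockout tagout before servicing the press."),
    Doc("maintenance", "Lubricate the press bearings every 500 hours."),
    Doc("manual", "Install the press on a level concrete floor."),
]
print(answer("What safety steps apply before servicing the press?", docs))
```

Routing first limits each retrieval to one document family. A safety question is then answered from SOPs and not from a maintenance note that happens to share vocabulary.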
Recent Work
AI Gym Memory System
AI-powered conversational workout tracker. Log exercises in natural language, then query your history semantically ('What did I train last Tuesday?'). Powered by Gemini Flash and ChromaDB.
Document Intelligence RAG
Enterprise RAG system for querying documents with Azure AI Search and vector embeddings. Features hybrid search (BM25 + semantic), hallucination reduction, and source attribution with GPT-4.
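Azure AI Search runs hybrid queries by merging the keyword (BM25) and vector result lists with Reciprocal Rank Fusion (RRF). The pure-Python sketch below imitates that fusion step locally. It is an illustration only, not the project's code or the Azure SDK. Several choices are assumptions: the BM25 parameters (`k1=1.5`, `b=0.75`), the character-trigram cosine that stands in for embedding similarity, and the RRF constant `k=60`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Keyword leg: rank doc indices by Okapi BM25 score."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def trigrams(text: str) -> Counter:
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_rank(query: str, docs: list[str]) -> list[int]:
    """Stand-in for the vector leg: trigram cosine instead of real embeddings."""
    q = trigrams(query)
    sims = [cosine(q, trigrams(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query: str, docs: list[str], top: int = 3) -> list[int]:
    return rrf([bm25_rank(query, docs), vector_rank(query, docs)])[:top]


docs = [
    "Reset the pump controller after a power failure.",
    "Pump maintenance schedule and lubrication intervals.",
    "Quarterly safety audit checklist.",
]
print(hybrid_search("how do I reset the pump controller", docs))
```

RRF uses only ranks, so the two legs never need comparable score scales. That property makes it a common default for combining BM25 with vector similarity.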
MiA-RAG: Mindscape-Aware RAG
Paper-accurate implementation of Mindscape-Aware RAG (arXiv:2512.17220). Uses the official MiA-Emb-0.6B model with hierarchical summarization and residual score fusion for context-enriched retrieval.
Latest Posts
How I Set Up an On-Prem Agentic AI Stack with Open-Source Embeddings and Fully Local Inference
A practical guide to building a fully on-prem agentic AI system using open-source embeddings and local LLM inference — no APIs, no cloud, complete data control.
Qwen 3.5 in Production: Running with vLLM and Deploying Local Inference on Azure VM
A deep dive into deploying Qwen 3.5 with vLLM for high-throughput inference and running cost-efficient local inference on Azure VMs with GPU acceleration.
MiA-RAG: Mindscape-Aware Retrieval-Augmented Generation for Long-Context Reasoning
MiA-RAG introduces a mindscape-aware embedder and retriever that inject global semantic context into RAG pipelines, improving long-document QA accuracy and retrieval recall.