Deep dives into RAG pipelines, agentic AI, LLM integrations, and lessons from building production AI systems.
A walkthrough of my real OpenClaw deployment: 13 Telegram topics, GPT-5.2 on Azure free tier, heartbeat-driven morning briefings, Playwright browser automation, memsearch semantic recall, and a Second Brain that auto-captures everything I text. Pulled directly from my live droplet.
Sending sensitive internal data to closed APIs wasn't an option. Here is the exact architecture I used to build a fully local, autonomous agentic pipeline using Milvus, Ollama, and open-source embeddings.
A practical guide to building a fully on-prem agentic AI system using open-source embeddings and local LLM inference — no APIs, no cloud, complete data control.
A deep dive into deploying Qwen 3.5 with vLLM for high-throughput, cost-efficient local inference on GPU-accelerated Azure VMs.
MiA-RAG introduces a mindscape-aware embedder and retriever that inject global semantic context into RAG pipelines, dramatically improving long-document QA accuracy and retrieval recall.
Using Google's Gemini Flash for intent extraction to build a conversational workout tracker with semantic memory retrieval via ChromaDB.
How I designed a multi-agent RAG system that answers questions from factory equipment manuals, safety SOPs, and maintenance logs — running fully offline with Ollama and Milvus Lite.
Why pure vector search isn't enough for enterprise documents, and how combining semantic embeddings with BM25 keyword matching dramatically improves retrieval accuracy.
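The core idea above — fusing keyword and semantic rankings — can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the BM25 scorer is a from-scratch toy, the dense ranking is a hard-coded stand-in for what an embedding model would return, and the two lists are merged with Reciprocal Rank Fusion so neither score scale needs calibration.

```python
import math
from collections import Counter

# Toy "enterprise document" corpus where exact tokens (part numbers) matter.
DOCS = [
    "Replace the hydraulic filter every 500 operating hours (part HF-203).",
    "Lockout tagout procedure must be followed before electrical maintenance.",
    "The conveyor motor overheats when ambient temperature exceeds 40 C.",
]

def tokenize(text):
    return text.lower().replace(".", "").replace("(", "").replace(")", "").split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal BM25: rewards exact keyword matches embeddings can miss."""
    toks = [tokenize(d) for d in docs]
    n, avgdl = len(docs), sum(len(t) for t in toks) / len(docs)
    df = Counter()
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf, s = Counter(t), 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists without tuning score scales."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]

query = "HF-203 filter interval"
scores = bm25_scores(query, DOCS)
bm25_rank = sorted(range(len(DOCS)), key=lambda i: -scores[i])
dense_rank = [2, 0, 1]  # stand-in for an embedding model's cosine ranking
best = rrf([bm25_rank, dense_rank])[0]
```

Here BM25 pins the exact part number `HF-203` even though the (simulated) dense ranking preferred another document, and RRF surfaces the keyword match — the failure mode pure vector search hits on serial numbers, acronyms, and SKU-heavy enterprise text.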
Stop paying the 'Internet Tax' and risking data leaks. We moved our RAG pipeline from SaaS to a local H100 cluster, cutting latency by 40% and TCO by 70% at scale.
Basic vector search fails in production. Learn how we engineered a multi-stage RAG pipeline with hybrid search, re-ranking, and agentic loops to achieve 90%+ accuracy.
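The "agentic loop" stage described above — retrieve, grade the result, and rewrite the query on failure — can be sketched as follows. All three components are keyword-overlap stand-ins: in a real pipeline the retriever would be an embedding search, and the grader and rewriter would be LLM calls; the doc IDs and text are invented for illustration.

```python
# Hypothetical two-document corpus of manual/SOP snippets.
CORPUS = {
    "sop-12": "Emergency stop buttons must be tested weekly per SOP-12.",
    "man-7": "Torque the spindle bolts to 85 Nm after every blade change.",
}

def retrieve(query):
    """Stand-in retriever: rank docs by shared lowercase tokens with the query."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS.items(),
                    key=lambda kv: -len(q & set(kv[1].lower().split())))
    return ranked[0]

def grade(query, passage):
    """Stand-in relevance grader: accept if the passage shares >= 2 query terms.
    A production pipeline would ask an LLM judge instead."""
    return len(set(query.lower().split()) & set(passage.lower().split())) >= 2

def rewrite(query):
    """Stand-in query rewriter: expand with domain terms (an LLM in practice)."""
    return query + " spindle bolts"

def agentic_retrieve(query, max_loops=2):
    """Retrieve -> grade -> rewrite loop; return best effort after retries."""
    for _ in range(max_loops):
        doc_id, passage = retrieve(query)
        if grade(query, passage):
            return doc_id
        query = rewrite(query)
    return doc_id
```

With the vague query `"tightening spec"` the first retrieval grades as irrelevant, the rewriter expands it, and the second pass lands on the torque-spec document — the self-correction behavior that pushes multi-stage pipelines past single-shot vector search.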