
I Run an AI Agent on a VPS. Here's My Actual Setup

Ahad Khan, Agentic AI Engineer
March 5, 2026
16 min read
Tags: LLM, Agentic AI, Open Source, Self-Hosted, OpenClaw, Telegram Bot, AI Automation, Tool Calling, Azure OpenAI, Playwright, RAG, AI Agent

My Telegram Bot Read My Email Before I Woke Up

At 8:07 AM, before I opened my laptop, my Telegram bot had already checked my Gmail, scanned today's calendar, pulled the weather forecast, and posted a morning briefing. The message was waiting in Topic 28 when I picked up my phone.

That bot is Iota, an OpenClaw agent running on a DigitalOcean VPS with 1.9GB RAM and 2GB of swap. It's been up for 12 days straight, handles 13 different Telegram topics — each with its own skill set and personality — and costs me exactly $0 in LLM fees because I route through Azure's free tier.

OpenClaw crossed 300K GitHub stars in under four months. But the star count isn't why I run it. I run it because it replaced six separate apps I was paying for. Total monthly cost: $6 for the VPS. Zero for everything else.

This post walks through my actual setup — the configs, the architecture decisions, the things that broke, and the things that surprised me.

What OpenClaw Is (and Isn't)

OpenClaw isn't a chatbot wrapper. It's a self-hosted AI gateway: a runtime that connects any LLM to any messaging platform, then gives that LLM the ability to execute real actions through plugins called Skills.

The project started in November 2025 as a weekend hack called Clawdbot by Peter Steinberger. Trademark complaints from Anthropic turned it into Moltbot, then OpenClaw three days later. In February 2026, Steinberger joined OpenAI and handed the project to an independent foundation. The code kept shipping. The ecosystem grew its own skill marketplace — ClawHub.ai — which now hosts 3,200+ community-contributed skills.

Here's the architecture of my setup:


My primary model is GPT-5.2 Chat on Azure (272K context window, free tier). Kimi K2.5 is configured as a reasoning fallback with chain-of-thought enabled. Gemini 3.1 Pro is the final backstop when Azure is slow or rate-limited. I can swap any of them without touching a single skill or topic config.

How It Actually Works Under the Hood

When I send "check my email" to Topic 22, here's what happens inside the Gateway in roughly 800 milliseconds:

1. Message arrives. The Gateway is a WebSocket server — not an HTTP API, not a polling service. It maintains persistent connections to each messaging channel. When Telegram pushes a message, the Gateway normalizes it into an internal format and looks up which agent runtime should handle it based on the topic_id in openclaw.json.
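To make the routing concrete, here is the rough shape of that topic-to-agent mapping. This is an illustrative config fragment, not my actual file or the exact OpenClaw schema; the field names are stand-ins:

```json
{
  "telegram": {
    "group_id": -100123456789,
    "topics": {
      "22": { "agent": "gmail", "skills": ["gog"] },
      "28": { "agent": "daily-brief", "skills": ["gog", "weather", "github"] },
      "46": { "agent": "second-brain", "skills": ["second-brain", "gog"] }
    }
  }
}
```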

2. Context assembly. Before the LLM sees anything, the Agent Runtime builds a context window. This isn't "just the last 10 messages." It packages the system prompt (SOUL.md, USER.md, TOOLS.md — all concatenated), available tool schemas for the topic's skill set (so the LLM knows what it can call), conversation history from the current session, and relevant long-term memory pulled from workspace Markdown files.

This is where most of the latency lives. For my 272K context window, the runtime has to decide what fits and what gets truncated. Recent messages always win. Memory gets retrieved on relevance, not recency.
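The budget decision can be sketched in a few lines. This is my own illustration of recency-first packing, not OpenClaw's actual code; tokens are approximated as characters divided by four:

```javascript
// Illustrative sketch of recency-first context packing. Newest messages are
// kept first; older history is dropped once the token budget is spent.
const approxTokens = (text) => Math.ceil(text.length / 4);

function packContext(systemPrompt, history, budgetTokens) {
  let used = approxTokens(systemPrompt);
  const kept = [];
  // Walk history newest-to-oldest, stop at the first message that won't fit.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = approxTokens(history[i].text);
    if (used + cost > budgetTokens) break;
    used += cost;
    kept.unshift(history[i]); // restore chronological order
  }
  return { messages: kept, tokensUsed: used };
}
```

With a 272K budget almost everything fits; the interesting case is a long-running session, where the oldest turns silently fall off while memory retrieval fills the gap.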

3. The ReAct loop fires. This is the core of OpenClaw's engineering. The LLM receives the assembled context and does one of two things: respond directly, or request a tool call. If it requests a tool call (like gog gmail search "is:unread"), the Agent Runtime intercepts the request, executes the tool in a sandboxed environment, captures the stdout/stderr, and feeds the result back into the conversation as a new message. The LLM then sees the tool output and decides: respond to the user, or call another tool.

This Reason → Act → Observe loop keeps running until the LLM emits a final response with no tool calls. A simple email check takes 1-2 iterations. A complex research query through Brave Search + Playwright can take 5-6. The first time I watched it chain six tool calls in a row to answer a single question, I realized this wasn't a chatbot — it was an execution environment.
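The loop above can be sketched compactly. This is my reimplementation of the pattern, not OpenClaw's source; `llm` and `runTool` are stand-ins for the model call and the sandboxed executor:

```javascript
// Sketch of a Reason → Act → Observe loop. The loop exits when the model
// emits a plain response with no tool call, or when the iteration cap trips.
async function reactLoop(llm, runTool, messages, maxIters = 8) {
  for (let i = 0; i < maxIters; i++) {
    const reply = await llm(messages);              // Reason
    if (!reply.toolCall) return reply.text;         // final answer, loop ends
    const result = await runTool(reply.toolCall);   // Act (sandboxed)
    messages.push({ role: 'tool', text: result });  // Observe: feed output back
  }
  throw new Error('tool-call loop exceeded iteration cap');
}
```

The iteration cap matters: without it, a confused model can ping-pong between tools forever, burning tokens on every pass.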

4. Response dispatch. The final text goes back through the same WebSocket channel. Session state — the full conversation, including every tool call and its result — gets persisted as JSONL in ~/.openclaw/agents/main/sessions/. Nothing is ephemeral. Every interaction is auditable.

The whole thing runs as a single Node.js process. No container orchestration, no Kubernetes, no queue workers. One process, one port (18789), bound to localhost. That's how a $6 VPS handles it.

13 Topics, 13 Different Agents

Most people set up one chatbot. I set up 13. OpenClaw's Telegram integration supports topic-based routing — each topic in a Telegram group gets its own system prompt, its own skill set, and effectively becomes a specialized agent.

| Topic | Skills | What It Does |
|---|---|---|
| Gmail | gog | Read, search, send email via Gmail API |
| Calendar | gog | Google Calendar events (IST default) |
| Weather | weather | Conditions + forecasts |
| Coding | coding-agent, github | Code gen, debugging, PR reviews |
| Research | summarize, xurl | Brave Search, article summarization |
| DevOps | shell access | Docker, system monitoring |
| Daily Brief | gog, weather, github | Morning briefing via heartbeat |
| Reddit Digest | reddit-readonly, daily-reddit-digest | Curated digest from 6 subreddits |
| Second Brain | second-brain, gog | Auto-captures ideas, notes, files |
| Tech News | tech-news-digest, summarize | AI/tech news scored by relevance |
| Memory Search | full workspace access | Semantic + keyword search across all memories |
| Log Monitor | shell access | Gateway logs, errors, session history |
| Web Browser | browser, summarize, xurl | Headless Chromium for JS-heavy pages |

The Gmail topic's prompt says: "Use gog for all email operations. Confirm before sending." That confirmation is deliberate — I don't want my agent sending emails without explicit approval. The Second Brain topic's prompt says the opposite: "Never ask if they want to save. Always save." Every text I send there gets timestamped and categorized automatically.

Giving the Agent a Personality

Every OpenClaw agent is shaped by workspace Markdown files. Here's Iota's personality, pulled directly from my droplet:

SOUL.md

```markdown
- Name: Iota
- Role: Technical Co-founder AI / Architecture & Agentic Systems
- Vibe: Analytical, calm, slightly blunt, respectful, strategic.
- Problem Solving: 1. Architecture 2. Implementation 3. Optimization.
- Priorities: Cost/token efficiency, scalable systems, minimal complexity,
  secure-by-default architecture, self-hosted/controlled infra.
- Behavior: Challenge bad ideas. Do not hype. Deliver.
```

USER.md

```markdown
- Background: IT Graduate, transitioning to AI systems/agentic workflows.
- Current Stack: Python, FastAPI, RAG architectures, MCP, Azure GenAI.
- Environment: Linux (Zorin OS), Mac Mini (M4). CLI-first, budget-conscious.
```

The agent reads these on every startup, along with TOOLS.md (environment details, gog keyring workaround for headless servers) and IDENTITY.md (name: Iota, emoji: ⚡). AGENTS.md defines the guardrails: no data exfiltration, use trash over rm, ask before destructive commands, ask before external actions like sending emails.

This is configuration-as-personality. The moment I changed Iota's Vibe from "helpful and enthusiastic" to "slightly blunt, respectful, strategic," the quality of its responses changed completely. It stopped padding answers with filler and started challenging bad ideas. The personality file is the most underrated part of the setup.

The Heartbeat: An Agent That Wakes Itself Up

This is the feature that changed how I think about agents. Most chatbots are reactive — they wait for you. OpenClaw's heartbeat engine is a scheduled daemon that wakes the agent every 60 minutes and runs checks without a human prompt in the loop.

On my setup it fires between 08:00 and 22:00 IST, targeting Topic 28 (Daily Brief):

HEARTBEAT.md

```markdown
## Scheduled Checks (every heartbeat)

### 1. Email Check
- Run: gog gmail search "is:unread" --max 5
- Flag urgent emails from known contacts

### 2. Calendar Check
- Run: gog calendar list --days 1
- Alert if any event is within 30 minutes

### 3. Weather
- Report current weather for Bengaluru
- Only report if extreme conditions

### 4. GitHub
- Check for new notifications/mentions

## Silence Rules
- HEARTBEAT_OK if nothing new
- 23:00-08:00 IST: always HEARTBEAT_OK (user sleeping)
- Last heartbeat <30m ago and nothing changed: HEARTBEAT_OK
```
The first heartbeat after 8 AM triggers a full morning briefing: unread email summary, today's calendar, weather forecast, and overnight GitHub activity. After that, it only pings me if something actually needs attention.

The silence rules took me two days to get right. Without them, the agent spams empty status updates every hour. The HEARTBEAT_OK convention is elegant — the agent returns a magic string when there's nothing to report, and the gateway suppresses it from Telegram. The moment I got this working, the agent went from annoying to genuinely useful.
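The gateway-side half of that convention is tiny. This is my own sketch of the suppression check, not OpenClaw's code:

```javascript
// Sketch of heartbeat suppression: drop the message during quiet hours or
// when the agent returned the HEARTBEAT_OK magic string; forward otherwise.
function shouldDeliver(agentReply, hourIST) {
  if (hourIST >= 23 || hourIST < 8) return false;         // quiet hours
  if (agentReply.trim() === 'HEARTBEAT_OK') return false; // nothing to report
  return true;                                            // real alert: forward
}
```

The design is worth copying for any scheduled agent: make "nothing happened" an explicit sentinel value the transport layer can filter, rather than trusting the model to stay quiet.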

Three Layers of Memory

The memory problem was the hardest thing to solve. A chatbot that forgets everything every session isn't an agent — it's a stateless function. My setup has three layers:

Session Memory is automatic. Every conversation across all 13 topics gets persisted as JSONL session logs. The session-memory hook handles this transparently. When I pick up a conversation the next day, the agent remembers where we left off.

Second Brain (Topic 46) is my favorite feature. It's an append-only capture log. Everything I text to that topic gets timestamped by category — Ideas, Reading, Links, Notes, Tasks, People:

memory/second-brain.md

```markdown
[2026-02-28 23:38:41 UTC] Notes: "What's there on my drive"
[2026-02-28 23:40:12 UTC] Tasks: Upload file to Google Drive.
[2026-03-01 13:56:30 UTC] Notes: Image of a Jabra GN headset (HS016).
[2026-03-01 13:56:55 UTC] Tasks: Upload Jabra headset image to Drive.
```

The agent also syncs to Google Drive via gog drive upload. Weekly, it offers a recap of everything captured. I've started using it as my default note-taking tool — faster than opening any app.
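The capture format is simple enough to sketch. In my setup the LLM picks the category; the keyword guess below is a deliberately naive stand-in for illustration:

```javascript
// Sketch of the append-only capture-line format shown above. The real agent
// lets the LLM choose the category; this uses a naive keyword heuristic.
function captureLine(text, now = new Date()) {
  const ts = now.toISOString().replace('T', ' ').slice(0, 19); // YYYY-MM-DD HH:MM:SS
  const category = /^(todo|task|upload|fix)/i.test(text) ? 'Tasks' : 'Notes';
  return `[${ts} UTC] ${category}: ${text}`;
}
```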

memsearch is the glue. It's a Python CLI backed by Milvus for vector indexing. It indexes all Markdown files and session transcripts for semantic retrieval. When I ask "what caching solution did we pick last week?", it searches across daily logs, long-term memory, session transcripts from all topics, workspace files, and the Second Brain log. Topic 48 (Memory Search) ties it all together with unrestricted grep plus memsearch search for meaning-based queries.
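The retrieval step reduces to ranking stored chunks by similarity to a query embedding. memsearch delegates this to Milvus at scale; here is a toy in-memory version of the same idea, purely for intuition:

```javascript
// Toy sketch of semantic retrieval: rank chunks by cosine similarity between
// a query embedding and each chunk's embedding. Milvus does this at scale.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(queryVec, chunks, k = 3) {
  return chunks
    .map(c => ({ ...c, score: cosine(queryVec, c.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This is why "what caching solution did we pick last week?" works: the query embedding lands near the chunk about that decision even though the wording doesn't match.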

Playwright: Teaching the Agent to Browse

Topic 166 was the most recent addition, and the one that surprised me most. I installed Playwright with headless Chromium on the droplet so the agent can navigate JS-heavy pages, take screenshots, click elements, and extract content that plain HTTP scraping misses.

The first use case was auditing my own portfolio. I told the agent to crawl my site, and it navigated every internal link, captured full-page screenshots, logged console errors and page load times, and wrote a JSON report. All from a Telegram message.

portfolio-crawl.js

```javascript
// Excerpt from the crawl script: `url` and `name` come from the crawl loop
// iterating over discovered routes (elided here).
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    headless: true,
    args: ['--no-sandbox'] // needed for headless Chromium on the droplet
  });
  const context = await browser.newContext();
  const page = await context.newPage();
  const errors = [];

  // Capture console errors
  page.on('pageerror', e => errors.push(e.message));

  // Navigate and screenshot every route
  await page.goto(url, { waitUntil: 'networkidle' });
  await page.screenshot({
    path: `screenshots/${name}.png`,
    fullPage: true
  });

  await browser.close();
})();
```

Combined with xurl (lightweight text scraping) and Brave Search (web discovery), this gives the agent a full web research stack. Brave finds relevant URLs, xurl handles simple text pages, Playwright handles anything with JavaScript rendering. The web isn't view-source anymore — half the content loads dynamically. Having an agent that can actually render pages changed what I use it for.

Security: The Part Nobody Talks About

OpenClaw's power is directly proportional to its attack surface. This is the section most OpenClaw blog posts skip, and it's the most important one.

Snyk engineers scanned the ClawHub marketplace in February 2026 and found that ~7.1% of the 4,000 skills contained vulnerabilities exposing API keys, passwords, or credentials. I only install skills I've personally reviewed, and I run exec.security: "allowlist" mode — shell commands are restricted to an explicit allowlist, not open execution.

Beyond the skill layer, my hardening looks like this:

  • DM pairing: only I can message the bot directly
  • Group allowlist: the bot only responds in one specific Telegram group
  • Gateway auth token: 64-character hex token for all connections
  • Loopback bind: the gateway only accepts connections from localhost
  • Session logging: every command and response is persisted for audit
  • Silence rules: the heartbeat can't take actions outside 08:00-22:00 IST
  • Agent guardrails (AGENTS.md): no data exfiltration, trash over rm, confirm before destructive ops

The exec allowlist was the biggest security decision. Open shell access means one prompt injection in a malicious skill can rm -rf your workspace. The allowlist constrains execution to commands I've explicitly approved. It's the difference between "the agent can do anything" and "the agent can do what I've scoped."
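A minimal version of that gate is a set membership check on the binary name. This sketch is my illustration, not OpenClaw's exec implementation, and a real gate should also constrain arguments, pipes, and shell metacharacters:

```javascript
// Sketch of allowlist-mode command gating: only explicitly approved binaries
// may run. Checks the binary name only; real gating must also vet arguments.
const ALLOWED = new Set(['gog', 'git', 'docker', 'trash', 'ls', 'df']);

function isAllowed(command) {
  const binary = command.trim().split(/\s+/)[0];
  return ALLOWED.has(binary);
}
```

Note what's absent: `rm` is not in the set at all, which is what the "trash over rm" guardrail looks like when enforced in code rather than in a prompt.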

The Blog Pipeline: Telegram to Published Post

This is the part where the setup starts compounding. I built a CrewAI pipeline that I trigger from a Telegram message — "write about OpenClaw" — and it runs the full loop without me touching a browser:

  1. Research Crew kicks off: a Topic Analyst defines the angle, a Web Researcher uses Brave Search to pull real-time sources, and optionally a Paper Researcher processes uploaded PDFs
  2. Content Crew takes over with four agents running sequentially — Writer drafts the post, Diagram Specialist validates Mermaid diagrams for my portfolio's dark theme, Code Reviewer checks all code blocks, Editor does a final pass on frontmatter and formatting
  3. Auto-fixer runs programmatic corrections: H1 to H2 headings, <br/> to <br> in Mermaid, HTML entities, frontmatter fixes
  4. validate_md.py runs a full quality gate — frontmatter schema, heading hierarchy, Mermaid syntax, word count. If errors remain, the content loops back to the Editor (max 2 retries)
  5. Publish Crew pushes validated markdown to GitHub, triggering a Vercel deployment

The feedback loop is the key — validate_md.py catches issues that LLMs consistently get wrong (raw HTML in markdown, broken frontmatter booleans, mermaid syntax errors) and feeds structured error messages back to the Editor agent. Most posts pass on the first retry.
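The retry gate itself is a small loop. A sketch under stated assumptions: `validate` and `edit` are stand-ins for invoking validate_md.py and the Editor agent, and the error format is illustrative:

```javascript
// Sketch of the validate-then-retry quality gate: run the validator, feed
// structured errors back to the editor, cap the loop at two retries.
async function qualityGate(draft, validate, edit, maxRetries = 2) {
  let content = draft;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const errors = await validate(content);
    if (errors.length === 0) return { content, attempt }; // gate passed
    if (attempt === maxRetries) {
      throw new Error('quality gate failed: ' + errors.join('; '));
    }
    content = await edit(content, errors); // editor fixes flagged issues
  }
}
```

Capping retries is the important part: an editor that can't satisfy the validator should fail loudly for human review, not loop and burn tokens.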

What's Next

I want the heartbeat engine to handle multi-step workflows beyond status checks. I want the Reddit and Tech News digests on cron schedules instead of manual triggers. And I want the blog pipeline fully autonomous — scheduled topic generation based on trending AI research, with human approval as the only gate before publish.

The hard part isn't the LLM. It's the plumbing: topic routing, heartbeat schedules, skill sandboxing, memory persistence, silence rules. The LLM is the easy part. The infrastructure around it is what makes the difference between a demo and a daily driver.
