Writing about code, products & the craft.
Thoughts on full-stack engineering, AI integration, mobile dev, and building things that scale, by Rahul Kashyap.
Projects
Active builds and companies
Writing
Latest posts
Multi-Modal Harnesses: When Your Agent Needs to See, Hear, and Click
Most agent harnesses are text-only by design. When multimodal input lands, the architecture has to change. Here's what actually shifts at the messages array level, and how to build it right.
Observability in Agent Harnesses: Logging, Tracing, and Knowing When Your Agent Is Stuck
A practical guide to adding observability to agent harnesses: structured JSONL logging, OpenTelemetry tracing, token accounting, stuck-agent detection, and turning production logs into eval datasets.
Harness Failure Modes: What Actually Breaks and How to Catch It
A production-focused breakdown of the 10 ways agent harnesses fail in the wild: infinite loops, context rot, prompt injection, hallucinated tool calls, budget blowouts, and more.
Evaluation Pipelines for Harnesses: How to Know If Your Agent Actually Works
Model benchmarks don't tell you whether your agent harness is working. Here's how to build evaluation pipelines that measure harness performance directly: task evals, tool quality metrics, regression testing, and the production feedback loop.
Cost at Scale: Why a 1,000-Token System Prompt Is Worth Engineering For
Pi's system prompt fits in under 1,000 tokens. Claude Code's runs 10,000+. That gap costs $630M/year at scale. Here's the math on system prompt token economics for agent harnesses.
Model-Harness Co-Training: The Flywheel Nobody Talks About
Claude Opus 4.5 scored 87% on SWE-bench with a matched multi-agent harness. Alone, it scored 74.8%. That 12-point gap isn't a model difference. It's a co-training flywheel, and it's reshaping how AI platform lock-in actually works.
The Harness as OS: What the Most Useful Analogy in AI Gets Right (and Where It Breaks)
The model is the CPU. The context window is RAM. Tools are system calls. The harness is the kernel. This is the most useful architectural framing for understanding AI agents, and it has real limits worth knowing.
The AGENTS.md and CLAUDE.md Pattern: Why Every Harness Arrived at the Same Solution
AGENTS.md and CLAUDE.md solve the same problem: persistent project context for AI coding agents. How they work mechanically, why the open standard converged, what to put in them (and leave out), and where the pattern breaks.
A2A Protocol: When Agents Talk to Each Other
A technical look at Google's Agent2Agent protocol: what it is, how the spec works, how it compares to MCP, the multi-agent patterns it enables, the security gaps that still exist, and whether it's becoming the industry standard for inter-agent communication.
The Harness Is Your Last Line of Defense
Your model can't touch the filesystem. Your harness can. A practical look at sandboxing, least-privilege tools, prompt injection defenses, path traversal, secret handling, MCP supply chain risks, and multi-agent trust boundaries, with real CVEs and working code.
The System Prompt and How the Harness Uses It
Pi gives the model 800 tokens. Claude Code gives it 27,000. Both ship working code. A practical look at how harnesses structure system prompts, where the token budget goes, what long prompts do to model performance, and why caching is an architectural decision at scale.
Agent Harnesses: The Hidden Layer That Actually Runs Your AI
GPT-5.5 jumped 25 percentage points by switching harnesses. Same model, same weights. A practical look at what agent harnesses are, how Anthropic, OpenAI, and Google build theirs, and how to build or customize your own.
Zero-downtime deploys on a single EC2 instance (no Kubernetes, no ECS)
How we stopped taking the site down on every deploy. Blue-green switching with Docker and nginx on one EC2 instance. No Kubernetes, no ECS, no added cost.