Writing about code,
products & the craft.

Most agent harnesses are text-only by design. When multimodal lands, the architecture has to change. Here's what actually shifts at the messages array level, and how to build it right.

AI Agents Agent Harnesses Multimodal AI Computer Use

May 7, 2026 18 min read

Observability in Agent Harnesses: Logging, Tracing, and Knowing When Your Agent Is Stuck

A practical guide to adding observability to agent harnesses: structured JSONL logging, OpenTelemetry tracing, token accounting, stuck-agent detection, and turning production logs into eval datasets.

AI Engineering Agent Harnesses Observability OpenTelemetry

May 7, 2026 18 min read

Harness Failure Modes: What Actually Breaks and How to Catch It

A production-focused breakdown of the 10 ways agent harnesses fail in the wild: infinite loops, context rot, prompt injection, hallucinated tool calls, budget blowouts, and more.

AI Engineering Agent Harnesses Agentic AI Security

May 7, 2026 22 min read

Evaluation Pipelines for Harnesses: How to Know If Your Agent Actually Works

Model benchmarks don't tell you whether your agent harness is working. Here's how to build evaluation pipelines that actually measure harness performance: task evals, tool quality metrics, regression testing, and the production feedback loop.

AI Engineering Agent Harnesses Evaluation MLOps

May 7, 2026 13 min read

Cost at Scale: Why a 1,000-Token System Prompt Is Worth Engineering For

Pi's system prompt fits in under 1,000 tokens. Claude Code's runs 10,000+. That gap costs $630M/year at scale. Here's the math on system prompt token economics for agent harnesses.

Agent Harnesses LLM Cost Token Economics AI Infrastructure

May 7, 2026 20 min read

Model-Harness Co-Training: The Flywheel Nobody Talks About

Claude Opus 4.5 scored 87% on SWE-bench with a matched multi-agent harness. Alone, it scored 74.8%. That 12-point gap isn't a model difference. It's a co-training flywheel, and it's reshaping how AI platform lock-in actually works.

AI Engineering Agent Harnesses RLHF AI Strategy

May 7, 2026 14 min read

The Harness as OS: What the Most Useful Analogy in AI Gets Right (and Where It Breaks)

The model is the CPU. The context window is RAM. Tools are system calls. The harness is the kernel. This is the most useful architectural framing for understanding AI agents, and it has real limits worth knowing.

AI Agent Harnesses System Design Architecture

May 7, 2026 12 min read

The AGENTS.md and CLAUDE.md Pattern: Why Every Harness Arrived at the Same Solution

AGENTS.md and CLAUDE.md solve the same problem: persistent project context for AI coding agents. How they work mechanically, why the open standard converged, what to put in them (and leave out), and where the pattern breaks.

AI Engineering Agent Harnesses Claude Code Developer Tools

May 7, 2026 16 min read

A2A Protocol: When Agents Talk to Each Other

A technical look at Google's Agent2Agent protocol: what it is, how the spec works, how it compares to MCP, the multi-agent patterns it enables, the security gaps that still exist, and whether it's becoming the industry standard for inter-agent communication.

AI Engineering Agent Harnesses Multi-Agent Systems A2A Protocol

May 7, 2026 22 min read

The Harness Is Your Last Line of Defense

Your model can't touch the filesystem. Your harness can. A practical look at sandboxing, least-privilege tools, prompt injection defenses, path traversal, secret handling, MCP supply chain risks, and multi-agent trust boundaries, with real CVEs and working code.

AI Engineering Security Agent Harnesses Prompt Injection

May 7, 2026 16 min read

The System Prompt and How the Harness Uses It

Pi gives the model 800 tokens. Claude Code gives it 27,000. Both ship working code. A practical look at how harnesses structure system prompts, where the token budget goes, what long prompts do to model performance, and why caching is an architectural decision at scale.

AI Engineering Agent Harnesses System Prompts Claude Code

May 7, 2026 18 min read

Agent Harnesses: The Hidden Layer That Actually Runs Your AI

GPT-5.5 jumped 25 percentage points by switching harnesses. Same model, same weights. A practical look at what agent harnesses are, how Anthropic, OpenAI, and Google build theirs, and how to build or customize your own.

AI Engineering Agent Harnesses LLM System Design

April 30, 2026 12 min read

Zero-downtime deploys on a single EC2 instance (no Kubernetes, no ECS)

How we stopped taking the site down on every deploy. Blue-green switching with Docker and nginx on one EC2 instance. No Kubernetes, no ECS, no added cost.

DevOps Docker AWS GitHub Actions