Learn/Agents, Tools & Harnesses/Lesson 01

Lesson 01

Agent vs chatbot: the loop mental model

Most people use "agent" and "chatbot" interchangeably. They are not the same architecture. The difference is not intelligence. It is whether the system can take actions in the world and keep going until a task is done.

The one idea

A chatbot is a single turn (or a short back-and-forth) where the model only produces text. An agent is a loop: the model can request tools, see the results, and call the model again until the harness decides the job is finished.

What a chatbot actually does

A classic chatbot has a simple contract. You send a message. The model reads the conversation history plus your new message, predicts the next tokens, and returns text. That might be an answer, a question back to you, or a refusal. Then it waits.

Nothing in that loop touches your filesystem, your database, or your production API unless some other application code does it separately. The model proposes language. A human or a separate program decides what to do with it.

That one-shot pattern is fine for drafting, explaining, brainstorming, and many support flows where a human still clicks the final button. It is weak for jobs that require looking things up, trying something, checking the result, and adjusting.

A chatbot ends when the model finishes generating text.

What an agent adds

An agent still uses the same underlying model. The shift is architectural. The system wraps the model in a loop and gives it callable tools: read a file, run a query, send an HTTP request, edit code, search a doc index.

On each turn the model can either:

Return a final answer to the user, or
Emit a structured tool call: "run this function with these arguments."

When it chooses a tool, the harness (not the model) executes the function, captures the result, appends it to the conversation state, and calls the model again. That cycle repeats until the model says it is done or the harness hits a budget limit.

The model never directly opens a file or runs shell commands. It outputs intent. The harness turns intent into real effects and feeds reality back in.

The harness owns the loop, tool execution, and stop conditions. The model owns the decisions inside each turn.

Why the loop matters more than the label

Marketing teams call everything an "agent" now. For engineering, the useful question is narrower: does this system run an action loop with tool execution and external state, or does it just generate text?

Consider three products you might call agents:

Plain ChatGPT with browsing off: mostly a chatbot. One model call per user message unless plugins or custom tooling are added.
Claude Code or Cursor Agent: a real agent. File tools, shell, multi-step edits, session persistence, hard iteration caps.
A support bot that only retrieves FAQ chunks and answers: often RAG plus a single generation step. Useful, but not necessarily an agent unless it can act (create tickets, refund orders, run workflows).

The loop is what changes reliability, cost, and risk. A chatbot that hallucinates produces a bad paragraph. An agent that hallucinates might call the wrong API, delete the wrong file, or loop for hours burning tokens.

Skip the full agent loop when the task is bounded, the stakes are low, and a human will review every output anyway. Drafting email copy, summarizing a doc, classifying support intent: a single model call or a short chain is usually enough.

Reach for an agent when the job needs multiple dependent steps, fresh data from tools, or side effects in your systems. "Fix this failing test," "investigate this outage," "onboard this repo": those need look, act, observe, repeat.

Workflow vs agent: a decision tree

Not every multi-step AI feature needs a model-in-the-loop at every step. Ask in order:

Is the path fixed? If steps are always A → B → C with known APIs, use a deterministic workflow (Temporal, Inngest, plain code). No model loop required.
Does the model only draft text a human acts on? That is chat or RAG, not an agent.
Does the task need branching based on tool results? If yes, you likely need an agent loop or a workflow with explicit decision nodes, not a single completion.
Are side effects irreversible? Agent or workflow both work; the harness must gate writes either way.

Hybrid shapes are common: a workflow orchestrates the macro job; an agent handles one messy sub-step (debug this test) inside a bounded sandbox. The mistake is defaulting to "agent" when a cron plus three API calls would ship faster.

Agents earn their complexity when steps depend on observations you cannot hard-code upfront.

Landmark guide

Anthropic — Building effective agents

Anthropic research · 2024

The clearest vendor-agnostic framing for when to build agents vs workflows, how to compose simple patterns, and why more autonomy is not free. Read it after this lesson and before L07.

Take from it: Workflow vs agent distinction, augmenting LLMs with retrieval and tools without over-building, and practical patterns (routing, parallelization, orchestrator-workers) with sober cost notes.

It skips: MCP permission models, harness observability schemas, compaction mechanics, and dollar math on runaway loops. Those are what lessons 02–06 in this course cover.

Model, harness, agent: three layers

Before going deeper, keep three terms separate. They get mixed up constantly.

Model: the neural network that reasons and generates tokens. It has no hands.
Harness: everything around the model that makes action possible: the loop, tool registry, permissions, memory, logging, compaction, stop rules.
Agent: the full product experience: model plus harness, doing a task end to end.

When someone says "Claude is better at coding than GPT," they are usually comparing agents (Claude Code vs Codex), not raw model weights. A meaningful chunk of the gap is harness design, not IQ.

Benchmarks make this worse. SWE-bench scores measure an agent configuration: model, tools, prompts, and orchestration together. Changing only the harness around the same model can swing results by double-digit percentage points. We unpack that later in this course.

The mental model to keep

Think of the model as a planner sitting in a room with a window. It can read what you slide under the door (context) and write instructions on a notepad (tool calls and answers). It cannot leave the room.

The harness is the staff outside: it runs the instructions, brings back results, decides which tools exist, enforces budgets, and ends the session when limits are hit. The agent is the whole operation: planner plus staff plus tools plus rules.

Once you see it that way, debugging gets easier. Bad final answers might be model limits. Repeated tool errors are often schema or description problems. Runaway cost is almost always missing harness guardrails, not "the model went rogue."

A concrete example: fix a failing test

Compare the same task as chatbot vs agent.

Chatbot: You paste the error, paste the test file, paste the implementation. The model suggests a fix in text. You copy it, run tests, paste new errors back. You are the harness. You are the loop.

Agent: You say "fix the failing test in auth.test.ts." The harness loop runs: read file, read test output, search repo, edit file, run tests, read failures, edit again, run tests, report pass. You might approve writes along the way, but you are not manually ferrying every observation.

Same model family could power both experiences. The product shape differs. That is why "we added GPT-4" is not the same as "we shipped an agent."

Cost and latency change shape

A chat turn is usually one model call. An agent task is a distribution: maybe three calls, maybe thirty. Each call adds input tokens (growing history) and output tokens (plans, tool JSON).

Product implications:

Users need progress signals during long runs.
You need per-session budgets before finance notices.
P95 latency matters more than mean latency.
Caching static system prompts and tool schemas saves real money.

None of that appears in a chatbot mental model. It is harness economics.

Autonomy is a dial, not a switch

Real products sit on a spectrum:

Suggest only: model proposes commands; human runs them.
Approve each action: harness queues tool calls for confirmation.
Auto-run reads, approve writes: common in coding agents.
Full auto within sandbox: CI bots, batch refactors with rollback.

The harness implements the dial. The model does not "know" your risk tolerance unless you enforce it in code and surface it in the system prompt.

Questions to ask before calling it an agent

When reviewing a design or vendor pitch, ask:

Can it call tools without the user copy-pasting results back?
Who enforces iteration and spend limits?
What happens after a tool error?
Can I replay a session from logs?
Are write actions reversible or approval-gated?

If the answers are vague, you likely have a chatbot with extra UI, not an agent stack ready for production load.

How this course is ordered

Next lessons go deeper on tools, harness ownership, the loop phases, long-run state, failures, and multi-agent patterns. Each lesson assumes you can separate model output from harness execution. That separation is the skill this opening lesson installs.

If you have only used chat UIs so far, run one coding agent on a real repo task and watch the tool trace. The trace is the harness talking. Reading it once teaches more than a dozen architecture diagrams.

Keep that trace open while you read the rest of this course. It grounds every abstract term in something you can point at.

Engineering reality

Agent loops multiply cost and latency. A chat turn might be one 2-second API call. An agent task might be twenty calls across three minutes. Product and infra teams feel that immediately in token bills and user patience. Design the loop deliberately; do not bolt tools onto a chat UI and call it autonomy.

Checkpoint

You're ready for the next lesson if you can answer these from memory:

What is the structural difference between a chatbot turn and an agent loop?
Who executes a tool call: the model or the harness?
Why do agent failures tend to be more expensive than chatbot failures?
What is the difference between a model, a harness, and an agent?

Quick check

It uses a larger or newer model
It runs a loop where the model can request tools and see results until the task ends
It streams tokens to the user
It remembers previous conversations

The model, directly
The harness, after validating the model's tool request
The user, always

Raw model weights only
Whole agent configurations, including harness design
Editor color themes

Yes, because it uses retrieval
Not necessarily. It may be RAG or chat unless it can take actions via tools
Yes, any bot is an agent