Lesson 06

Why LLMs hallucinate

LLMs are trained to produce likely text, not to maintain a database of verified claims. That difference explains a lot: confident wrong answers, fake citations, plausible APIs, and facts that drift under pressure.

The one idea

A hallucination is a fluent answer that is not grounded in reality or in the provided evidence. It happens because next-token prediction rewards plausible continuation; truth is rewarded only to the extent that it makes the text more likely.

Fluency is not truth

An LLM can write a sentence that sounds exactly like a correct answer while being wrong. That is not a bug in string formatting. It follows from the training setup. The model learned patterns in text and learned to continue a context in ways that fit those patterns. Truth helps when truthful text was the best pattern to learn, but the training objective is not "verify this claim against the world."

This is why hallucinations often look polished. The model is good at style, structure, and local coherence. It can produce the shape of an answer even when it lacks the facts.
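To make the point concrete, here is a deliberately tiny sketch: a bigram counter, not a transformer. The corpus, function names, and prompt are all invented for illustration. Every training sentence is true, yet the "model" still completes a question about a country it has never seen, because it only knows which token tends to follow "is".

```python
from collections import Counter, defaultdict

# Toy corpus, made up for this example. Every sentence in it is true.
CORPUS = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "paris is the capital of france",
    "the largest city of france is paris",
]

def train_bigrams(sentences):
    """Count which token follows which token in the corpus."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def most_likely_next(follows, prompt):
    """Return the most frequent continuation of the prompt's last token."""
    last = prompt.split()[-1]
    candidates = follows.get(last)
    if not candidates:
        return None  # the last token never appeared mid-sentence in the corpus
    return candidates.most_common(1)[0][0]

follows = train_bigrams(CORPUS)

# "australia" never appears in the corpus, but the continuation is still fluent.
answer = most_likely_next(follows, "the capital of australia is")
print(answer)  # prints "paris"
```

A real LLM conditions on the whole context and generalizes far better than this, so it would not make this particular mistake. The objective has the same character, though: nothing in the counting step checks the completed sentence against the world.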

Where hallucinations come from

Several failure modes get grouped under the word hallucination:

Missing knowledge. The model was never trained on the fact, or the fact changed after training.
Weak retrieval. The right evidence was not put into the context, or irrelevant evidence crowded it out.
Conflicting context. The prompt contains multiple claims, and the model blends them into a wrong answer.
Over-specific pressure. The prompt asks for names, citations, commands, or numbers, so the model fills the requested shape even without evidence.
Sampling variance. Decoding with more randomness, for example a higher temperature, can pick a plausible but false path.

The common thread is grounding. If the answer cannot be tied back to reliable evidence or a tool result, fluency is doing too much work.
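The sampling-variance item in the list above can be sketched with temperature-scaled softmax. The logits below are hypothetical numbers chosen for illustration, not taken from any real model. The sketch only shows the direction of the effect: as temperature rises, more probability mass moves to lower-scored tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn token scores into probabilities, scaled by a sampling temperature."""
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}

# Hypothetical next-token scores for "The capital of Australia is ...".
LOGITS = {"Canberra": 3.0, "Sydney": 2.0, "Melbourne": 1.0}

def probability_of_wrong_token(temperature):
    """Probability mass on every token other than the correct one."""
    probs = softmax_with_temperature(LOGITS, temperature)
    return 1.0 - probs["Canberra"]

for t in (0.2, 1.0, 2.0):
    print(f"temperature={t}: P(wrong token) = {probability_of_wrong_token(t):.3f}")
```

Lowering the temperature makes this toy case safer only because the correct token already has the highest score. If the model ranks a wrong token first, a low temperature makes the wrong answer more consistent, not less likely.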

Why fake citations feel so natural

A citation has a recognizable pattern: author names, title casing, a venue, a year, maybe a URL. The model can learn that pattern easily. But producing a valid citation requires a different ability: checking that the source exists and says what the answer claims. Unless the source is in context or a tool retrieved it, the model may produce a citation-shaped string instead of a citation.

The same thing happens with package names, CLI flags, legal clauses, API methods, and paper titles. The model knows the shape. Shape is not enough.
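For API methods, one cheap check is to ask the runtime whether a suggested name resolves at all. The helper below is a minimal sketch using only the standard library; `api_exists` is a name invented for this example. It confirms that a name exists, not that it does what the model claims. Importing a module runs its top-level code, so do this only in an environment where that is acceptable.

```python
import importlib

def api_exists(dotted_name):
    """Return True if a dotted name like 'json.loads' resolves in this environment."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # not a module; try a shorter prefix
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False

print(api_exists("json.loads"))  # real function in the standard library
print(api_exists("json.parse"))  # plausible-looking, but not part of Python's json module
```

The second name has the right shape (it is the JavaScript spelling), which is exactly the kind of suggestion that reads as correct until something checks it.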

How to reduce hallucinations

You cannot remove hallucinations with one magic prompt. You reduce them with system design:

Ground the answer. Put the relevant source text, data, or tool result in context.
Restrict citations to retrieved sources. Let the model cite only document IDs or URLs that your retriever actually returned.
Separate reasoning from lookup. Use tools for fresh facts, math, database reads, and code execution. Use the model to interpret results.
Allow abstention. Make "I don't have enough information" a valid output for the product.
Validate structured outputs. Parse JSON, check schemas, run commands in a sandbox, and verify links or IDs before showing them.
Evaluate on real tasks. Track hallucination patterns with tests, not vibes.
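Two of the controls above, citing only retrieved sources and allowing abstention, can be combined in a small post-processing step. This is a sketch under assumptions: the `[doc-N]` citation format, the `ABSTAIN` message, and the function names are hypothetical, and a real product would choose its own format and fallback behavior.

```python
import re

# Assumed fallback message and citation format, both invented for this sketch.
ABSTAIN = "I don't have enough information to answer that."
CITATION_PATTERN = re.compile(r"\[(doc-\d+)\]")

def check_citations(answer, retrieved_ids):
    """Return (cited_ids, unknown_ids) for an answer given the retriever's IDs."""
    cited = set(CITATION_PATTERN.findall(answer))
    unknown = cited - set(retrieved_ids)
    return cited, unknown

def finalize(answer, retrieved_ids):
    """Pass the answer through only if every citation points at a retrieved document."""
    cited, unknown = check_citations(answer, retrieved_ids)
    if not cited or unknown:
        # No citation at all, or a citation the retriever never returned.
        return ABSTAIN
    return answer

retrieved = {"doc-1", "doc-2"}
print(finalize("Refunds are issued within 30 days [doc-2].", retrieved))
print(finalize("Refunds are issued within 30 days [doc-9].", retrieved))
```

This only verifies that a cited ID exists in the retrieved set. Checking that the cited document actually supports the claim is a separate, harder step.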

Engineering reality

The fix is usually outside the model call. Retrieval quality, source ranking, tool permissions, output validation, logging, and evals decide whether hallucinations become user-visible failures. A better prompt helps, but it is not a substitute for grounding and verification.
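As one example of validation that lives outside the model call, the sketch below parses a model's JSON output and checks it against a small expected shape before anything reaches the user. The field names in `REQUIRED_FIELDS` are hypothetical. A production system might use a schema library instead; this version uses only the standard library.

```python
import json

# Hypothetical output contract: field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "source_ids": list}

def parse_model_output(raw):
    """Return (data, errors). data is None whenever errors is non-empty."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for {field}")
    return (None, errors) if errors else (data, [])

data, errors = parse_model_output('{"answer": "30 days", "source_ids": ["doc-2"]}')
print(data, errors)

data, errors = parse_model_output('{"answer": "30 days"}')
print(data, errors)
```

Returning the error list, rather than raising, makes it easy to log every rejected output, which feeds the evaluation work described later.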

Hallucination is task-dependent

In creative writing, inventing details may be the point. In a medical, financial, legal, deployment, or customer-support setting, invented details are failures. So do not ask "does this model hallucinate?" Ask "does this system hallucinate on this task, under these inputs, at an acceptable rate?"

That framing turns a vague complaint into an engineering problem. You can build a dataset of risky prompts, define what counts as unsupported, run regression tests, and improve the pipeline.
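A regression check along these lines might look like the sketch below. Everything in it is an assumption made for illustration: the two test cases, the stand-in `stub_system`, and the deliberately crude definition of "unsupported" (any number in the answer that does not appear in the evidence). A real evaluation would call the actual pipeline and use a definition suited to the task.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def unsupported_numbers(answer, evidence):
    """Numbers that appear in the answer but nowhere in the evidence."""
    return set(NUMBER.findall(answer)) - set(NUMBER.findall(evidence))

def hallucination_rate(system, cases):
    """Fraction of cases where the system's answer contains an unsupported number."""
    failures = 0
    for case in cases:
        answer = system(case["prompt"], case["evidence"])
        if unsupported_numbers(answer, case["evidence"]):
            failures += 1
    return failures / len(cases)

# Hypothetical risky prompts paired with the evidence the system was given.
CASES = [
    {"prompt": "How long do refunds take?",
     "evidence": "Refunds are issued within 30 days."},
    {"prompt": "What does the plan include?",
     "evidence": "The plan includes 5 seats."},
]

# Canned answers standing in for a real model call; the second invents a figure.
CANNED = {
    "How long do refunds take?": "Refunds take up to 30 days.",
    "What does the plan include?": "The plan includes 5 seats and 100 GB of storage.",
}

def stub_system(prompt, evidence):
    return CANNED[prompt]

rate = hallucination_rate(stub_system, CASES)
print(f"hallucination rate: {rate:.2f}")  # prints "hallucination rate: 0.50"
```

Run against every pipeline change, a check like this turns "it seems better" into a number that can be compared with a threshold.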

The right mental model

An LLM is a powerful language and pattern engine with learned knowledge, not a truth oracle. It can reason over context, summarize evidence, write code, transform formats, and explain ideas. But when the task depends on exact, current, or auditable facts, give it evidence and check the output.

That is the bridge into the next tracks: prompting, RAG, tools, agents, evaluation, and observability. The transformer explains the raw capability. The product layer decides whether that capability becomes reliable.

Checkpoint

You're done with this course if you can answer these from memory:

Why does next-token prediction produce fluent text without guaranteeing truth?
What is the difference between a missing-knowledge hallucination and a weak-retrieval hallucination?
Why are fake citations such a natural LLM failure mode?
What system-level controls reduce hallucination risk?

Quick check

Because plausible continuation and verified truth are not the same objective
Because tokenizers cannot represent facts
Only because temperature is always too high

Ask the model to be very careful
Restrict citations to retrieved or verified sources and validate them
Require at least five citations per paragraph

Only in the prompt wording
Across the retrieval, tool, validation, and evaluation layers
Only in the UI copy