RAG prompts, context, and citations
Once retrieval has selected evidence, the prompt has to make that evidence usable. The model needs clear boundaries, source IDs, answer rules, and permission to say when the evidence is not enough.
A RAG prompt should separate instructions, user question, and retrieved evidence. Citations should refer to source IDs your system supplied, not sources the model invents.
Put evidence in a predictable shape
Retrieved chunks should be easy for the model to distinguish from the user question and system instructions. Give each chunk a short source ID, title, date or version when useful, and text. Keep the format boring and consistent.
Question:
What is the refund window for annual plans?
Sources:
[S1] Billing policy, updated 2026-04-12
Annual plans can be refunded within 14 days of purchase...
[S2] Enterprise terms, updated 2026-05-01
Enterprise contracts use the refund terms in the signed order form...
The exact format can vary, but the principle stays the same: the model should know what is evidence and how to cite it.
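One way to produce that shape in code is sketched below. The `Chunk` fields and the `build_user_message` helper are illustrative assumptions, not part of any particular library. The sketch assigns IDs of the form `S1`, `S2`, and so on in retrieval order, and returns the ID map so later steps can look up what each ID referred to.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical fields; adapt to whatever your retriever returns.
    title: str
    updated: str  # ISO date string, e.g. "2026-04-12"
    text: str


def format_sources(chunks: list[Chunk]) -> tuple[str, dict[str, Chunk]]:
    """Render chunks as labelled sources and return the ID-to-chunk map."""
    by_id: dict[str, Chunk] = {}
    lines: list[str] = []
    for i, chunk in enumerate(chunks, start=1):
        source_id = f"S{i}"
        by_id[source_id] = chunk
        lines.append(f"[{source_id}] {chunk.title}, updated {chunk.updated}")
        lines.append(chunk.text.strip())
    return "\n".join(lines), by_id


def build_user_message(question: str, chunks: list[Chunk]) -> tuple[str, dict[str, Chunk]]:
    """Keep the question and the evidence in clearly separated sections."""
    sources, by_id = format_sources(chunks)
    message = f"Question:\n{question.strip()}\n\nSources:\n{sources}"
    return message, by_id
```

Keeping the ID map next to the rendered prompt means the system, not the model, remains the authority on which source each ID names.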
Ask for grounded answers
RAG prompts should tell the model to answer from the supplied sources, cite the source IDs it used, and avoid unsupported claims. The exact wording matters less than the constraint itself: the model answers only from the evidence you actually retrieved.
For high-stakes flows, add an abstention path: "If the sources do not contain enough information, say that the answer is not available from the provided sources." That is a product decision, not just a prompt trick. The UI and workflow must treat "no answer" as a valid outcome and know what to do next.
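A minimal sketch of how the grounding rules and the abstention path might fit together. The rule wording, the `ABSTAIN` sentinel, the status values, and the user-facing fallback text are all assumptions made for illustration. Exact string matching is the simplest possible detector and may be too strict for a real model, which can paraphrase the sentinel.

```python
# Hypothetical sentinel the model is asked to return verbatim when it abstains.
ABSTAIN = "The answer is not available from the provided sources."

# Illustrative rule wording; the constraint matters more than the phrasing.
SYSTEM_RULES = f"""\
Answer using only the sources in the user message.
Cite the source IDs you used in square brackets, for example [S1].
Do not make claims the sources do not support.
If the sources do not contain enough information, reply exactly:
{ABSTAIN}"""


def is_abstention(answer: str) -> bool:
    """True when the model took the abstention path."""
    return answer.strip() == ABSTAIN


def route_answer(answer: str) -> dict[str, str]:
    """Map a raw model answer to a result the UI can act on."""
    if is_abstention(answer):
        # Hypothetical fallback copy; the product decides what happens next.
        return {"status": "no_answer", "text": "We could not find this in our documentation."}
    return {"status": "answered", "text": answer.strip()}
```

Returning an explicit status lets the workflow branch on abstention, for example by offering a handoff to a human, instead of displaying the sentinel as if it were an answer.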
Handle conflicting sources
Real document sets contain conflicts: old pages, regional policies, draft docs, duplicated help center articles, and partial migrations. If the model sees conflicting chunks without guidance, it may blend them into a fake compromise.
Use metadata and ranking to avoid conflicts before prompting. When conflicts remain, make the model surface the conflict instead of hiding it. A useful answer might say, "The current billing policy says 14 days, but the enterprise terms say the signed order form controls for enterprise contracts."
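The metadata step can be sketched as a filter that runs before prompting. This example assumes each candidate carries a `doc_key` shared by all versions of the same page, an ISO `updated` date, and a `status` field. Those names are hypothetical, and real metadata will differ. The `CONFLICT_RULE` string is one possible instruction for conflicts that survive the filter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_key: str  # hypothetical stable ID shared by all versions of a page
    updated: str  # ISO date, so string comparison matches date order
    status: str   # hypothetical values: "published" or "draft"
    text: str


def drop_stale(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the newest published version of each document, preserving rank order."""
    newest: dict[str, Candidate] = {}
    for c in candidates:
        if c.status != "published":
            continue  # drafts never reach the prompt
        best = newest.get(c.doc_key)
        if best is None or c.updated > best.updated:
            newest[c.doc_key] = c
    # Walk the original list so the retriever's ranking is not disturbed.
    return [c for c in candidates if newest.get(c.doc_key) is c]


# Illustrative instruction for conflicts between different documents.
CONFLICT_RULE = (
    "If sources disagree, do not average or merge them. "
    "State what each source says, cite both, and note which one applies to whom."
)
```

This filter only removes conflicts between versions of the same document. Two different documents that disagree still reach the model, which is where the surfacing instruction applies.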
Citations need validation
Do not let the model create arbitrary citations. Give it source IDs and require citations from that set. After generation, parse the cited IDs and check that they exist in the retrieved context. If you need stronger guarantees, check whether the cited chunk actually supports the sentence.
A citation is not a decoration. It is a link in the chain of evidence. If users cannot inspect it, or if it points to a source that was never retrieved, it does not count.
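The existence check described above can be sketched in a few lines. It assumes citations appear as bracketed IDs such as `[S1]`, matching the source format shown earlier. The function name and return shape are illustrative.

```python
import re

# Matches bracketed source IDs such as [S1] or [S12].
CITATION = re.compile(r"\[(S\d+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> dict[str, list[str]]:
    """Split the IDs cited in an answer into known and unknown ones."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    cited = list(dict.fromkeys(CITATION.findall(answer)))
    return {
        "valid": [c for c in cited if c in retrieved_ids],
        "unknown": [c for c in cited if c not in retrieved_ids],
    }
```

A non-empty `unknown` list is a signal to reject or regenerate the answer. This check only confirms that each cited ID was in the retrieved context. It does not confirm that the cited chunk supports the sentence, which needs a separate step.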
Context budgeting
Every retrieved chunk spends tokens. So do instructions, conversation history, tool outputs, and the model's answer. A RAG system should decide what enters context, in what order, and how much room to leave for the answer.
Common approaches include top-k chunks, score thresholds, deduplication, section compression, and source diversity rules. The right choice depends on whether the task needs one exact fact, several supporting facts, or a synthesis across documents.
Long context can hide retrieval bugs. If you stuff in twenty chunks, the right passage is often present by luck, so poor ranking goes unnoticed. The pipeline may work in demos but be slow, expensive, and brittle. Treat context as a budget, not a dump truck.
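A rough sketch of budget-aware packing that combines a score threshold, deduplication, and a token limit. The `Scored` type, the parameter names, and the four-characters-per-token estimate are assumptions. A real system would count tokens with the tokenizer of the model it calls.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    source_id: str
    score: float  # retrieval score, higher is better
    text: str


def estimate_tokens(text: str) -> int:
    """Crude assumption: about four characters per token."""
    return max(1, len(text) // 4)


def pack_context(
    chunks: list[Scored],
    context_window: int,
    reserved_for_answer: int,
    fixed_overhead: int,  # instructions, history, tool outputs
    min_score: float = 0.0,
) -> list[Scored]:
    """Greedily pick the best-scoring unique chunks that fit the remaining budget."""
    budget = context_window - reserved_for_answer - fixed_overhead
    seen: set[str] = set()
    picked: list[Scored] = []
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        normalized = " ".join(chunk.text.lower().split())
        if chunk.score < min_score or normalized in seen:
            continue  # below threshold, or a duplicate of a chunk already seen
        cost = estimate_tokens(chunk.text)
        if cost > budget:
            continue  # too large for what is left; a smaller chunk may still fit
        seen.add(normalized)
        picked.append(chunk)
        budget -= cost
    return picked
```

Reserving room for the answer before any chunk is admitted keeps a large retrieval result from crowding out the response.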
Checkpoint
You're ready for the next lesson if you can answer these from memory:
- Why should retrieved chunks have source IDs?
- What does an abstention path do?
- How should a RAG system handle conflicting sources?
- Why should citations be validated after generation?
Quick check
- Let the model generate URLs from memory
- Require citations only to retrieved source IDs and validate them
- Ask for at least five citations every time
- Because sometimes the retrieved evidence cannot support an answer
- Because it makes vector search faster
- Because every RAG question should be refused