Lesson 05

Avoid catastrophic forgetting

A fine-tune can make a model excellent at your narrow task and worse at everything around it. That tradeoff is not theoretical. It is one of the easiest ways to ship a model that looks improved in demos and regresses in production.

The one idea

Catastrophic forgetting happens when training on a narrow distribution overwrites or suppresses useful behavior from the base model. You reduce it with smaller updates, mixed data, and early stopping, and you catch what remains with regression evals.

Why forgetting happens

The model does not know which abilities are sacred. During fine-tuning, the optimizer only sees your training objective. If your dataset is narrow or repetitive, or you train on it for too long, the model is pushed hard toward that narrow pattern. General instruction-following, refusal behavior, tone control, and broad knowledge can all drift.

Adapters reduce the blast radius because the base weights stay frozen, but they do not eliminate forgetting-like behavior. The active adapter can still steer outputs so strongly that the combined model behaves worse outside the target task.
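
One way to see why, assuming a LoRA-style low-rank adapter (the lesson does not name a specific adapter type, and the symbols below are introduced only for illustration):

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{k \times d} \text{ frozen},
\quad A \in \mathbb{R}^{r \times d},\; B \in \mathbb{R}^{k \times r} \text{ trained}
```

Detaching the adapter recovers W_0 x exactly, which is the smaller blast radius. While the adapter is active, though, the added term applies to every input x, not only to inputs from the target task, so off-task outputs can still shift.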

What forgetting looks like

Forgetting rarely announces itself as "I forgot." It shows up as changed behavior:

  • The model overuses the fine-tuning format even when the task changes.
  • It becomes less helpful on general questions near the domain.
  • It refuses too often or not often enough because the training set had skewed policy examples.
  • It collapses labels toward the majority class.
  • It imitates the tone or mistakes of the training data too aggressively.

If your tune has only a task eval, you may miss these regressions. Add neighboring tasks and negative cases to the eval suite so the drift shows up before release.
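
Some of these symptoms are cheap to measure. As a minimal sketch of a check for label collapse, the code below compares each label's share in the predictions against its share in the gold labels. The function names, the example labels, and the 0.15 tolerance are all hypothetical choices for illustration, not recommended values.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the list as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def collapse_report(gold, predicted, tolerance=0.15):
    """Return labels whose predicted share differs from the gold share by more than tolerance.

    Positive drift means the label is over-predicted, negative means under-predicted.
    The tolerance is an assumed threshold; tune it to your own label distribution.
    """
    gold_shares = label_shares(gold)
    pred_shares = label_shares(predicted)
    flagged = {}
    for label in set(gold_shares) | set(pred_shares):
        drift = pred_shares.get(label, 0.0) - gold_shares.get(label, 0.0)
        if abs(drift) > tolerance:
            flagged[label] = round(drift, 3)
    return flagged


# Hypothetical eval slice: the tuned model pushes most inputs toward "billing".
gold = ["billing", "billing", "shipping", "returns",
        "billing", "shipping", "returns", "other"]
predicted = ["billing", "billing", "billing", "billing",
             "billing", "shipping", "billing", "billing"]

print(collapse_report(gold, predicted))
```

A share comparison like this does not replace accuracy or per-class recall. It is only an early warning that the output distribution has narrowed.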

Use mixed data to anchor behavior

One common defense is mixing in examples that preserve behavior you care about. If the tuned model must still refuse unsafe requests, include refusal and safe-completion examples. If it must still answer normal support questions, include a slice of normal support traffic. If it must only use the new format for a specific endpoint, include counterexamples where the old behavior remains correct.

Mixed data is not padding. It is a statement of what should not change. The trick is balance: with too little, the anchor does nothing; with too much, the task-specific signal gets diluted.
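
A minimal sketch of building such a mix, assuming you keep task examples and anchor examples in two separate lists. The helper name `mix_datasets` and the 20% anchor fraction are hypothetical; the right fraction is something you find by running your evals, not a known constant.

```python
import random


def mix_datasets(task_examples, anchor_examples, anchor_fraction=0.2, seed=0):
    """Return a shuffled training list in which about anchor_fraction of items are anchors.

    All task examples are kept. Anchors are sampled without replacement, so the
    achieved fraction is lower than requested if there are too few anchors.
    """
    if not 0.0 <= anchor_fraction < 1.0:
        raise ValueError("anchor_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_anchor / (n_task + n_anchor) = anchor_fraction for n_anchor.
    n_anchor = round(len(task_examples) * anchor_fraction / (1.0 - anchor_fraction))
    n_anchor = min(n_anchor, len(anchor_examples))
    mixed = list(task_examples) + rng.sample(list(anchor_examples), n_anchor)
    rng.shuffle(mixed)
    return mixed


# Hypothetical data: 80 task examples plus a pool of 100 behavior-preserving anchors.
task = [{"source": "task", "id": i} for i in range(80)]
anchors = [{"source": "anchor", "id": i} for i in range(100)]

mixed = mix_datasets(task, anchors, anchor_fraction=0.2)
n_anchor = sum(1 for example in mixed if example["source"] == "anchor")
print(len(mixed), n_anchor)
```

Keeping the fraction as an explicit parameter makes it something you can sweep and compare across runs instead of an accident of how the files were concatenated.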

Practical pattern

Build eval sets in pairs: target evals that should improve and guardrail evals that should stay flat. A tune that wins the first and loses the second is not ready.
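
The paired-eval rule can be written as a simple release gate. This is a sketch under assumptions: scores are plain dictionaries where higher is better, and the eval names, `min_gain`, and `max_drop` thresholds are hypothetical placeholders.

```python
def ready_to_ship(base_scores, tuned_scores, target_evals, guardrail_evals,
                  min_gain=0.02, max_drop=0.01):
    """Return (ok, reasons) for a tuned model compared against its base model.

    Target evals must improve by at least min_gain.
    Guardrail evals must not drop by more than max_drop.
    """
    reasons = []
    for name in target_evals:
        gain = tuned_scores[name] - base_scores[name]
        if gain < min_gain:
            reasons.append(f"target '{name}' gained {gain:+.3f}, need at least {min_gain:+.3f}")
    for name in guardrail_evals:
        drop = base_scores[name] - tuned_scores[name]
        if drop > max_drop:
            reasons.append(f"guardrail '{name}' dropped {drop:.3f}, allowed {max_drop:.3f}")
    return (not reasons, reasons)


# Hypothetical scores: the tune wins the target eval and loses a guardrail.
base = {"extraction_f1": 0.71, "refusals": 0.96, "general_qa": 0.88}
tuned = {"extraction_f1": 0.90, "refusals": 0.96, "general_qa": 0.80}

ok, reasons = ready_to_ship(
    base, tuned,
    target_evals=["extraction_f1"],
    guardrail_evals=["refusals", "general_qa"],
)
print(ok)
for reason in reasons:
    print(reason)
```

Returning the reasons alongside the verdict keeps the gate useful in review: a failed tune tells you which guardrail moved, not just that something did.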

Train less than feels satisfying

Forgetting gets worse when the update is too large. High learning rates, too many epochs, and tiny repetitive datasets all push the model to over-specialize. Early stopping is not a nice-to-have. It is one of the main controls for keeping the model useful.

This is another reason to evaluate checkpoints. The best checkpoint may be the one before the task metric peaks, if the next checkpoint starts breaking guardrails. Production quality is a multi-objective score, not a single leaderboard number.
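
As a rough sketch of that selection rule, the code below picks the checkpoint with the best task score among those whose guardrail scores stay above a floor. The checkpoint records, metric names, and floor values are all made up for illustration, and treating guardrails as hard floors is one assumed way to combine objectives, not the only one.

```python
def pick_checkpoint(checkpoints, guardrail_floors):
    """Return the highest task-score checkpoint that clears every guardrail floor.

    Returns None if no checkpoint clears all floors.
    """
    passing = [
        ckpt for ckpt in checkpoints
        if all(ckpt["guardrails"][name] >= floor
               for name, floor in guardrail_floors.items())
    ]
    if not passing:
        return None
    return max(passing, key=lambda ckpt: ckpt["task_score"])


# Hypothetical run: the task score peaks at step 600, after guardrails start to slip.
checkpoints = [
    {"step": 200, "task_score": 0.78, "guardrails": {"refusals": 0.96, "general_qa": 0.88}},
    {"step": 400, "task_score": 0.85, "guardrails": {"refusals": 0.95, "general_qa": 0.87}},
    {"step": 600, "task_score": 0.89, "guardrails": {"refusals": 0.90, "general_qa": 0.79}},
    {"step": 800, "task_score": 0.88, "guardrails": {"refusals": 0.86, "general_qa": 0.74}},
]
floors = {"refusals": 0.94, "general_qa": 0.85}

best = pick_checkpoint(checkpoints, floors)
print(best["step"])
```

In this made-up run the selected checkpoint is step 400, even though step 600 has the higher task score, because step 600 falls below both guardrail floors.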

Prefer narrow routing when possible

If a fine-tuned model is only better for one workflow, route only that workflow to it. Keep the base model or another tuned model for everything else. This avoids asking one artifact to be both a specialist and a generalist when your product can make the routing decision more reliably than the model can.

Routing also makes rollback easier. If the tuned extraction model regresses, disable that route without changing the entire assistant. Fine-tuning should make the system more capable, not harder to operate.
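
A minimal sketch of that kind of routing, assuming the product already knows which workflow a request belongs to. The class, the workflow names, and the model identifiers are hypothetical, and a real system would likely keep this mapping in configuration rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRouter:
    """Map workflows to tuned models and fall back to a default model."""
    default_model: str
    routes: dict = field(default_factory=dict)    # workflow name -> model id
    disabled: set = field(default_factory=set)    # workflows rolled back to the default

    def model_for(self, workflow):
        """Return the tuned model for an enabled route, otherwise the default."""
        if workflow in self.routes and workflow not in self.disabled:
            return self.routes[workflow]
        return self.default_model

    def disable(self, workflow):
        """Roll one workflow back to the default model without touching other routes."""
        self.disabled.add(workflow)

    def enable(self, workflow):
        """Re-enable a previously disabled route."""
        self.disabled.discard(workflow)


router = ModelRouter(
    default_model="base-model",
    routes={"invoice_extraction": "extraction-tune-v3"},
)

print(router.model_for("invoice_extraction"))  # tuned model handles its own workflow
print(router.model_for("general_chat"))        # everything else stays on the default

router.disable("invoice_extraction")           # rollback after a regression
print(router.model_for("invoice_extraction"))  # now served by the default model
```

Keeping the disabled set separate from the route table means a rollback does not delete the route, so re-enabling after a fix is a one-line change.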

Engineering reality

Forgetting is a system problem. The fix is not only "train better." It is eval coverage, routing, versioning, rollback, and knowing which product surfaces are allowed to use the tuned model.

Checkpoint

You're ready for the next lesson if you can answer these from memory:

  • Why does a narrow dataset cause forgetting?
  • How can adapters still produce forgetting-like behavior?
  • What is the role of mixed data?
  • Why can routing be safer than sending all traffic to the tuned model?

Quick check

  • Over-specialization or forgetting-like drift
  • It needs more epochs
  • The validation set is too large
  • Only the target task improvement
  • Important behaviors that must remain stable after tuning
  • How fast the training job runs