Track 3 · Training & adapting models
Distillation & Compression
Big models are useful teachers, but they are not always the right thing to run in production. This course teaches how to make models smaller, faster, and cheaper without lying to yourself about the quality loss.
01 · What is model distillation?
How a smaller student model learns from a larger teacher, and why distillation is a behavior-transfer problem.
02 · Build a teacher-student distillation dataset
How to choose prompts, teacher outputs, rationales, hard cases, and filters that make the student worth training.
03 · Quantization: int8, int4, and GGUF
Why lower precision saves memory, where quality drops, and how formats like GGUF fit local inference.
04 · Pruning, sparsity, and low-rank compression
The structural compression ideas: remove, factor, or skip work, then check whether inference actually gets faster on real hardware.
05 · Evaluate compressed LLMs
How to measure the trade-off: task quality, calibration, latency, memory, throughput, cost, and failure modes.
06 · Deploy small, fast models
Routing, fallback, monitoring, versioning, and when a compressed model should serve traffic.
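As a taste of the arithmetic behind the quantization lesson, here is a minimal sketch of why lower precision saves memory. It counts weight storage only, and it rests on assumptions that are not from the course itself: the 7-billion-parameter model size is hypothetical, the helper name weight_memory_gb is made up for illustration, and the estimate ignores quantization scales, activations, and the KV cache, so real files and runtimes will come out somewhat larger.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every weight uses exactly bits_per_weight bits. Real
    quantized formats also store per-block scales, so this is a lower bound.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights take about 14.0 GB at fp16, 7.0 GB at int8, and 3.5 GB at int4. Halving the bits halves the weight memory; whether quality survives is a separate question that has to be measured.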