Track 3 · Training & adapting models

Preference Tuning & RL

Supervised fine-tuning teaches a model what good answers look like. Preference tuning teaches it which answer is better when several answers are plausible. This course covers RLHF, DPO, GRPO, reward models, and the eval work needed before you trust the result.

6 lessons · Intermediate · After Fine-tuning