Track 4 · Inference & serving

How Inference Works

Inference is where an LLM turns a prompt into new tokens, generating them one step at a time. This course explains the mechanics underneath the API call so that latency, cost, context length, and hardware tradeoffs stop feeling mysterious.

6 lessons · Intermediate · After The Transformer & LLMs