Track 1 · Foundations
The Transformer & LLMs
The mental model behind modern language models: how text becomes tokens, how attention moves information between them, how a transformer block is built, and why generation is both powerful and fallible.
01 · Tokenization: how text becomes tokens
Why an LLM does not read words directly, how subword tokens work, and why token boundaries affect cost, speed, and behavior.
02 · Attention: how tokens look at each other
Queries, keys, values, and attention weights without drowning in matrix notation. The core trick that made transformers work.
03 · Transformer architecture: the LLM block
The repeating block inside an LLM: embeddings, positional information, attention, MLPs, residual paths, and layer norm.
04 · How LLMs generate text
Next-token prediction, logits, probabilities, sampling, temperature, and why generation is a loop instead of a one-shot answer.
05 · Context windows: prompt, memory, and limits
What actually fits in context, why the model does not remember outside it, and how long context changes cost and latency.
06 · Why LLMs hallucinate
Why fluent text is not the same as truth, where hallucinations come from, and what engineering patterns reduce them.