Learn AI engineering,
in the right order.
Everything about AI is already online, scattered across a thousand posts, papers, and videos. What's missing is one trustworthy path: what to learn, in what order, explained clearly, with the engineering reality nobody puts in the tutorials. That's what this is.
Built bottom-up: how models work, how to adapt them, how to serve them fast and cheap, and how to build products on top. Audio and voice is one track of several. Most lessons are written here from scratch; a few landmark topics point you to the one resource worth watching. New courses are landing over time; the full map is below.
Foundations
The mental models the rest of the path assumes. Start here even if you've shipped with AI APIs already.
Building with models
Ship real things on top of models you didn't train. No GPUs required.
Prompting & Context Engineering
Soon
Steering a model for real, structured output, and treating context as a budget.
Coming soon
Retrieval-Augmented Generation (RAG)
Live
Retrieval pipelines, chunking, embeddings, vector search, reranking, citations, and where RAG quietly breaks.
7 lessons
Agents, Tools & Harnesses
Soon
Tool use, the agent loop, what a harness does, and multi-agent patterns.
Coming soon
Evaluation & Observability
Soon
How to know an AI feature works: task evals, LLM-as-judge, tracing, regression testing.
Coming soon
Safety, Guardrails & Security
Soon
Prompt injection, sandboxing, least-privilege tools, and data leakage.
Coming soon
Training & adapting models
Make a model yours: teach it your domain, your language, your taste, or shrink it.
Data
Soon
Datasets, labeling, synthetic data, and why data quality dominates everything.
Coming soon
Fine-tuning
Live
When to fine-tune, dataset design, LoRA/PEFT/QLoRA, evals, catastrophic forgetting, and deployment.
6 lessons
Preference Tuning & RL
Live
Preference datasets, reward models, RLHF, DPO, GRPO, evals, and when RL on open models is worth it.
6 lessons
Distillation & Compression
Live
Model distillation, teacher-student data, quantization, pruning, evals, and deployment tradeoffs.
6 lessons
Inference & serving
Make it fast and make it cheap. Where most production AI cost and latency actually live.
How Inference Works
Live
Prefill vs decode, the KV cache, memory bandwidth, context length, output tokens, and the numbers that explain latency.
6 lessons
Latency & Throughput
Soon
Batching, continuous batching, speculative decoding, and streaming.
Coming soon
Serving & Economics
Soon
vLLM/TGI/llama.cpp, GPU cost per token, self-host vs API break-even, edge.
Coming soon
Audio & voice
The full stack behind Whisper, ElevenLabs, and Sarvam, taught from the waveform up.
Audio Foundations
Soon
Sample rate, bit depth, channels, PCM, codecs, then FFT, spectrograms, and mel features.
Coming soon
Voice Activity Detection
Soon
Finding speech in audio: energy baselines, hysteresis, neural VAD, streaming.
Coming soon
Speech-to-Text (ASR)
Soon
CTC vs seq2seq vs transducers, Whisper-style models, streaming, WER, API vs self-host.
Coming soon
Text-to-Speech & Voice Cloning
Soon
Phonemes, prosody, vocoders, cloning and its ethics, streaming TTS, measuring naturalness.
Coming soon
Realtime Voice Agents
Soon
The full duplex loop, turn-taking, barge-in, and the end-to-end latency budget.
Coming soon
Later tracks
Mapped, not yet started.
Vision & Multimodal
Planned
Image models, OCR, vision-language models, and text + audio + vision together.
Planned
AI Product & Ops
Planned
Scoping use cases, pricing your own AI feature, deployment, monitoring, cost control.
Planned