Learn AI engineering, in the right order.

Everything about AI is already online, scattered across a thousand posts, papers, and videos. What's missing is one trustworthy path: what to learn, in what order, explained clearly, with the engineering reality nobody puts in the tutorials. That's what this is.

Built bottom-up: how models work, how to adapt them, how to serve them fast and cheap, and how to build products on top. Audio and voice is one track of several. Most lessons are written here from scratch; a few landmark topics point you to the one resource worth watching. New courses land over time; the full map is below.

1

Foundations

The mental models the rest of the path assumes. Start here even if you've already shipped features on top of AI APIs.

2

Building with models

Ship real things on top of models you didn't train. No GPUs required.

Prompting & Context Engineering

Soon

Steering a model reliably, getting structured output, and treating context as a budget.

Retrieval-Augmented Generation (RAG)

Live

Retrieval pipelines, chunking, embeddings, vector search, reranking, citations, and where RAG quietly breaks.

7 lessons

Agents, Tools & Harnesses

Soon

Tool use, the agent loop, what a harness does, and multi-agent patterns.

Evaluation & Observability

Soon

How to know whether an AI feature works: task evals, LLM-as-judge, tracing, regression testing.

Safety, Guardrails & Security

Soon

Prompt injection, sandboxing, least-privilege tools, and data leakage.

3

Training & adapting models

Make a model yours: teach it your domain, your language, your taste, or shrink it.

4

Inference & serving

Make it fast and make it cheap. This is where most production AI cost and latency actually live.

How Inference Works

Live

Prefill vs decode, the KV cache, memory bandwidth, context length, output tokens, and the numbers that explain latency.

6 lessons

Latency & Throughput

Soon

Batching, continuous batching, speculative decoding, and streaming.

Serving & Economics

Soon

vLLM, TGI, and llama.cpp; GPU cost per token; self-host vs API break-even; edge deployment.

5

Audio & voice

The full stack behind Whisper, ElevenLabs, and Sarvam, taught from the waveform up.

Audio Foundations

Soon

Sample rate, bit depth, channels, PCM, codecs, then FFT, spectrograms, and mel features.

Voice Activity Detection

Soon

Finding speech in audio: energy baselines, hysteresis, neural VAD, streaming.

Speech-to-Text (ASR)

Soon

CTC vs seq2seq vs transducers, Whisper-style models, streaming, WER, API vs self-host.

Text-to-Speech & Voice Cloning

Soon

Phonemes, prosody, vocoders, cloning and its ethics, streaming TTS, measuring naturalness.

Realtime Voice Agents

Soon

The full-duplex loop, turn-taking, barge-in, and the end-to-end latency budget.

+

Later tracks

Mapped, not yet started.

Vision & Multimodal

Planned

Image models, OCR, vision-language models, and text + audio + vision together.

AI Product & Ops

Planned

Scoping use cases, pricing your own AI feature, deployment, monitoring, cost control.