Lesson 01

What machine learning actually is

Before any model, any GPU, any clever architecture, there is one move that makes all of it work. Once you see it clearly, the rest of AI stops being magic and starts being engineering.

The one idea

Normal programming: you write the rules, the computer follows them. Machine learning flips it: you show the computer examples of the right answers, and it works out the rules for you.

Two ways to make a computer do something

Say you want to tell spam from real email. The classical approach is to sit down and write the rules yourself: if the subject shouts in all caps, add points. If it mentions a lottery, add more. If it comes from a known contact, subtract some. You keep adding rules forever, and spammers keep finding ways around them. The logic lives in your head, and you transcribe it by hand.
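To make the classical approach concrete, here is a minimal sketch of a hand-written spam scorer. The rules mirror the ones described above, but every weight, the threshold, and the address book are made-up assumptions for illustration, not values from any real filter.

```python
# Hand-written spam rules: every rule and every weight below is an
# illustrative assumption chosen by the programmer, not learned from data.
KNOWN_CONTACTS = {"alice@example.com"}  # hypothetical address book


def spam_score(sender: str, subject: str) -> int:
    """Add or subtract points according to hand-coded rules."""
    score = 0
    letters = [c for c in subject if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        score += 2  # the subject shouts in all caps
    if "lottery" in subject.lower():
        score += 3  # it mentions a lottery
    if sender in KNOWN_CONTACTS:
        score -= 4  # it comes from a known contact
    return score


def is_spam(sender: str, subject: str, threshold: int = 2) -> bool:
    """Flag the email once its score reaches the (assumed) threshold."""
    return spam_score(sender, subject) >= threshold


print(is_spam("stranger@example.net", "YOU WON THE LOTTERY"))
print(is_spam("alice@example.com", "Lunch tomorrow?"))
```

Every new spam trick means another `if` in this function, written by a person. That is the maintenance burden the paragraph above describes.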

Machine learning takes a different path. You don't write a single spam rule. Instead you collect a pile of emails already labeled "spam" or "not spam", hand them to an algorithm, and let it find the patterns that separate the two. The output is not a program you wrote. It's a program the computer wrote, by example.
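Here is a rough sketch of the same job done by example. Nothing in the code mentions lotteries or prizes; the only spam knowledge comes from the labeled emails. The six training emails are invented for illustration, and word counting is just one very simple stand-in for a learning algorithm, not what a production filter would use.

```python
from collections import Counter

# Tiny made-up training set: (email text, label). A real system would need
# far more examples than this.
LABELED_EMAILS = [
    ("win the lottery now", "spam"),
    ("claim your free prize now", "spam"),
    ("free lottery tickets inside", "spam"),
    ("lunch tomorrow at noon", "not spam"),
    ("notes from the project meeting", "not spam"),
    ("see you at lunch", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts  # these counts are the learned "rules"


def predict(model, text):
    """Label a message by which class its words were seen in more often."""
    words = text.lower().split()
    spam_votes = sum(model["spam"][w] for w in words)
    ham_votes = sum(model["not spam"][w] for w in words)
    return "spam" if spam_votes > ham_votes else "not spam"


model = train(LABELED_EMAILS)
print(predict(model, "free lottery prize"))      # spam
print(predict(model, "project lunch tomorrow"))  # not spam
```

To change what this filter catches, you change the examples, not the code.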

[Diagram. Classical programming: Rules + Data → Computer → Answers. Machine learning: Data + Answers → Training → Rules.]
Same pieces, swapped around. Classical code turns rules into answers. Machine learning turns answers into rules.

That swap is the whole thing. In classical programming, rules go in and answers come out. In machine learning, answers go in and rules come out. The "rules" it produces are called a model.

So what does "learning" mean here?

It sounds human, but it's narrower and more mechanical than the word suggests. Learning here means fitting a function to examples. You can picture the simplest version on a graph: you have a scatter of points, and you want a line that passes as close to them as possible. "Learning" is the process of nudging that line until it fits the points well. A real model is the same idea with millions of knobs instead of a line's two (its slope and its intercept), and inputs far richer than a single number, but the move is identical: adjust the knobs until the output matches the known answers.
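The line-nudging picture can be sketched in a few lines. This is a minimal illustration, assuming five made-up points and using gradient descent on the mean squared error as the nudging rule. The step count and learning rate are arbitrary choices that happen to work for this data.

```python
# Points scattered roughly along y = 2x + 1 (made-up data for illustration).
points = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]


def mean_squared_error(slope, intercept):
    """Average squared gap between the line and the points."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def fit_line(steps=5000, learning_rate=0.01):
    slope, intercept = 0.0, 0.0  # the line's two knobs, starting flat
    n = len(points)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each knob.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
        # Nudge each knob a small step in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line()
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"error before: {mean_squared_error(0.0, 0.0):.2f}")
print(f"error after:  {mean_squared_error(slope, intercept):.4f}")
```

The loop never looks at what the points mean. It only measures how far off the line is and moves the two knobs to shrink that gap.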

Nothing in there is understanding. The model does not know what spam is. It has found a shape that separates your labeled examples, and it bets that new email will fall on the same side of that shape. That bet is the entire product.

Plain version

A model is a very flexible shape squeezed to fit examples you've already seen, then used to guess about examples you haven't.

The three ingredients

Every machine learning system, from a spam filter to a voice model, is built from the same three parts. When something goes wrong, it's almost always one of these, so it's worth naming them now.

  • Data: the labeled examples. Inputs paired with the right answers. This is the raw material, and its quality sets the ceiling on everything else.
  • Model: the flexible shape with adjustable knobs (its parameters). A bigger model has more knobs and can fit more complicated patterns, though past a point the extra knobs make it easier to memorize the examples than to learn from them.
  • Objective: a number that measures how wrong the model currently is, usually called the loss. Learning is the act of changing the knobs to make that number small.

Lessons 04 and 05 open up the model and the objective. For now, just hold the shape of it: data feeds a model, an objective tells the model how badly it's doing, and training pushes the knobs in the direction that helps.
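As a rough sketch, here are the three ingredients as three separate pieces of code, with training as the step that connects them. The feature (exclamation marks in the subject), the data values, and the exhaustive search are all assumptions chosen to keep the example tiny.

```python
# DATA: (number of exclamation marks in the subject, is_spam) pairs.
# The feature and the values are made up for illustration.
data = [(0, False), (1, False), (1, False), (2, False),
        (3, True), (4, True), (5, True), (6, True)]


# MODEL: one adjustable knob, a cutoff on the exclamation-mark count.
def model(exclamations, cutoff):
    return exclamations >= cutoff


# OBJECTIVE: the loss, here the fraction of examples the model gets wrong.
def loss(cutoff):
    wrong = sum(model(x, cutoff) != label for x, label in data)
    return wrong / len(data)


# TRAINING: try each knob setting and keep the one with the smallest loss.
# (Exhaustive search is a stand-in; real models adjust many knobs gradually.)
best_cutoff = min(range(0, 8), key=loss)

print(best_cutoff, loss(best_cutoff))  # 3 0.0
```

When a system like this misbehaves, you can ask of each piece in turn: are the examples wrong, is the shape too simple or too flexible, or is the loss measuring the wrong thing?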

Where this breaks (the part tutorials skip)

The bet only pays off if new data looks like the training data. This is where most real systems fail, and it's worth seeing the failure modes before you ever touch code.

Engineering reality

Three things sink machine learning systems in production, and none of them are about the model being "smart" enough:

Garbage examples. If your labels are wrong or biased, the model faithfully learns the wrong thing. No architecture fixes bad data. This is why teams spend more time on data than on models.

Memorizing instead of generalizing. A model with enough knobs can ace the examples it trained on and still fail on anything new, like a student who memorized the answer key instead of the subject. This is overfitting, and you only catch it by testing on data the model never saw.
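A small sketch makes the answer-key problem visible. It assumes a made-up task where the true pattern is "x is at least 50", a memorizing model built as a lookup table, and a second model that learns a single cutoff. The point is the gap between the two accuracy numbers, not the specific models.

```python
# Made-up task: the true pattern is "label is True when x >= 50".
def true_label(x):
    return x >= 50


train_inputs = [3, 12, 27, 41, 48, 52, 66, 71, 85, 94]
test_inputs = [5, 20, 35, 45, 55, 60, 75, 80, 90, 99]  # never seen in training

train_set = [(x, true_label(x)) for x in train_inputs]
test_set = [(x, true_label(x)) for x in test_inputs]

# A "model" that memorizes: a lookup table with one entry per training example.
memorized = dict(train_set)


def memorizer(x):
    # Perfect on inputs it has seen; an arbitrary default on anything else.
    return memorized.get(x, False)


# A model that generalizes: one cutoff, placed midway between the largest
# False example and the smallest True example in the training set.
cutoff = (max(x for x, y in train_set if not y)
          + min(x for x, y in train_set if y)) / 2


def threshold_model(x):
    return x >= cutoff


def accuracy(predict, examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)


print("memorizer on training data:", accuracy(memorizer, train_set))        # 1.0
print("memorizer on unseen data:  ", accuracy(memorizer, test_set))         # 0.4
print("threshold on unseen data:  ", accuracy(threshold_model, test_set))   # 1.0
```

Judged on the training set alone, the memorizer looks flawless. Only the held-out examples expose it.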

The world drifts. Spam from last year doesn't look like spam today. A model is a snapshot of the data it learned from, so its accuracy quietly decays as reality moves on. Shipping a model is the start of maintaining it, not the end.
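Drift is easy to simulate. In this sketch the model is frozen (its cutoff never changes) while the data moves underneath it. The two batches, the cutoff, and the alert level are all invented for illustration; a real monitoring setup would track freshly labeled batches over time.

```python
# Made-up data: (number of spam trigger words, is_spam) for two periods.
last_year = [(0, False), (1, False), (1, False), (2, False),
             (6, True), (7, True), (8, True), (9, True)]
# This year spammers use fewer trigger words, so some spam now scores low.
this_year = [(0, False), (1, False), (1, False), (2, False),
             (3, True), (3, True), (8, True), (9, True)]

CUTOFF = 4  # assumed to have been fit on last year's data and then frozen


def predict(trigger_words):
    return trigger_words >= CUTOFF


def accuracy(examples):
    return sum(predict(x) == label for x, label in examples) / len(examples)


ALERT_BELOW = 0.9  # hypothetical monitoring threshold

for name, batch in [("last year", last_year), ("this year", this_year)]:
    acc = accuracy(batch)
    status = "retrain" if acc < ALERT_BELOW else "ok"
    print(f"{name}: accuracy={acc:.2f} ({status})")
```

Nothing in the model changed between the two runs. The drop comes entirely from the world changing, which is why a check like this has to keep running after launch.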

Keep these in mind as a permanent backdrop. Most of the engineering in AI (the evaluation, the monitoring, the data work) exists because of these three problems, not because the math is hard.

Go deeper (optional)

You don't need this to continue, and the rest of the course doesn't depend on it. But if you want the visual, geometric intuition for how a network bends a shape to fit data, one short series does it better than anything written. We'll point you to it rather than badly redraw it.

This is the rule for the whole course: when a resource is genuinely the best on a topic and would take months to match, we send you there and add the parts it leaves out, like the production realities above. Most lessons are written here from scratch. A few, like the math intuition for neural nets, are not worth reinventing.

Checkpoint

You're ready for the next lesson if you can answer these from memory:

  • In one sentence, how is machine learning different from normal programming?
  • What does "learning" actually mean in this context?
  • Name the three ingredients of any machine learning system and what each one does.
  • What is overfitting, and why can't you spot it by looking only at training data?
  • Why does a deployed model get worse over time even if nothing about it changes?

Quick check

  • In ML you supply examples and answers, and the computer derives the rules
  • Machine learning is just classical programming that runs faster
  • Machine learning always needs a GPU and classical code never does

  • It needs more parameters
  • It overfit: it memorized the training set instead of learning the pattern
  • The objective function is too small

  • The data and its labels
  • The size of the model
  • The objective function