
Chapters 241–270


What This Part Covers and Why It Matters

Determinism is a convenient fiction. Every real system — a neural network’s training data, a sensor reading, a user’s click — is entangled with noise, incompleteness, and uncertainty. Probability is the mathematical language for reasoning precisely under those conditions.

This Part builds probability from scratch: from sample spaces and events, through distributions, expectations, Bayes’ theorem, and Markov chains, to Monte Carlo simulation. By the end, you will be able to:

  • Model uncertain systems exactly

  • Derive and compute probability distributions

  • Apply Bayes’ theorem to update beliefs with evidence

  • Simulate stochastic processes from first principles

  • Understand what every ML loss function is actually computing


The Mental Shift Required

In Parts I–VII, variables had values. x = 3.7. A function returned a number. A gradient pointed somewhere specific.

In Part VIII, variables have distributions. A random variable X does not equal 3.7 — it equals 3.7 with some probability, 4.1 with another probability, and everything in between with varying density. The shift is from “what is the value?” to “what is the distribution of possible values, and how is probability mass spread across them?”

This is not vagueness. It is precision about uncertainty.

The second shift: expectation replaces the single answer. Instead of computing one result, you compute the average over all possible results — weighted by their likelihood. This is the foundation of loss functions, risk, and Bayesian inference.
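Both shifts can be made concrete with a toy discrete random variable, sketched here as a plain dictionary mapping outcomes to probabilities. The outcomes 3.7 and 4.1 echo the example above; the third outcome and all the probabilities are invented for illustration.

```python
import random

# A discrete random variable represented as outcome -> probability.
# These particular values are made up for this sketch.
X = {3.7: 0.5, 4.1: 0.3, 5.0: 0.2}

assert abs(sum(X.values()) - 1.0) < 1e-12  # probabilities must sum to 1

# Expectation: the probability-weighted average over all outcomes.
expected_value = sum(x * p for x, p in X.items())
print(f"E[X] = {expected_value:.2f}")  # 3.7*0.5 + 4.1*0.3 + 5.0*0.2 = 4.08

# Sampling: each draw yields one concrete value; only the long-run
# average of many draws approaches E[X].
random.seed(0)
draws = random.choices(list(X.keys()), weights=list(X.values()), k=100_000)
print(f"mean of 100,000 draws ≈ {sum(draws) / len(draws):.2f}")
```

Note the asymmetry: a single draw is never 4.08 — no outcome has that value — yet 4.08 is the one number that summarizes the whole distribution. That is what "expectation replaces the single answer" means in practice.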


Map of This Part

FOUNDATIONS
  ch241 Randomness
  ch242 Sample Spaces
  ch243 Events
  ch244 Probability Rules
        |
        v
CONDITIONAL REASONING
  ch245 Conditional Probability
  ch246 Bayes Theorem
        |
        v
RANDOM VARIABLES
  ch247 Random Variables
  ch248 Probability Distributions
  ch249 Expected Value
  ch250 Variance
        |
        v
KEY DISTRIBUTIONS
  ch251 Binomial Distribution
  ch252 Poisson Distribution
  ch253 Normal Distribution
        |
        v
LIMIT THEOREMS
  ch254 Central Limit Theorem
  ch255 Law of Large Numbers
        |
        v
SIMULATION & PROCESSES
  ch256 Monte Carlo Methods
  ch257 Markov Chains
  ch258 Random Walks
  ch259 Simulation Techniques
        |
        v
PROJECT
  ch260 Project: Monte Carlo π
        |
        v
PROBABILITY EXPERIMENTS (261-270)
  ch261–ch270 Advanced probability experiments and simulations

Prerequisites From Prior Parts

  • Functions and composition (Part III, ch051–ch090): probability distributions are functions; CDFs are composed from PDFs.

  • Integration (Part VII, ch221–ch224): continuous probability requires integrating density functions.

  • Summation and series (Part II, ch021–ch050): discrete probability is summation over outcomes.

  • Logarithms (Part II, ch043–ch045): entropy, log-likelihood, and information theory are all logarithmic.

  • Linear algebra (Part VI, ch151–ch200): Markov chains are matrix powers; covariance matrices generalize variance.

  • Exponential functions (Part II, ch041–ch042): the Poisson and normal distributions are exponential in form.


Motivating Problem: The Spam Filter

You have a corpus of 10,000 emails — 3,000 spam, 7,000 legitimate. A new email arrives containing the word “offer”. You know:

  • 40% of spam emails contain “offer”

  • 5% of legitimate emails contain “offer”

What is the probability the email is spam, given it contains “offer”?

You cannot answer this yet with the tools from prior parts. Work through the cell below — it raises a NotImplementedError. By the end of ch246 (Bayes Theorem), you will replace that stub with a derivation from first principles.

# Motivating Problem: Spam classification via Bayes
# This cell is intentionally incomplete. You will solve this in ch246.

def probability_spam_given_offer(
    p_spam: float,        # prior probability of spam
    p_offer_given_spam: float,   # likelihood of "offer" in spam
    p_offer_given_legit: float,  # likelihood of "offer" in legitimate
) -> float:
    """
    Compute P(spam | contains 'offer') using Bayes' theorem.
    Derivation required — do not use a formula you cannot prove.
    """
    raise NotImplementedError("Requires Bayes' theorem — see ch246")


# Known values
p_spam = 3000 / 10000          # 30% of emails are spam
p_offer_given_spam = 0.40      # 40% of spam contains 'offer'
p_offer_given_legit = 0.05     # 5% of legit contains 'offer'

try:
    result = probability_spam_given_offer(p_spam, p_offer_given_spam, p_offer_given_legit)
    print(f"P(spam | 'offer') = {result:.4f}")
except NotImplementedError as e:
    print(f"Not yet implemented: {e}")
    print("\nCome back here after ch246 and fill in the derivation.")

The answer is approximately 0.774 — even though only 30% of emails are spam, the word “offer” is so much more common in spam that observing it shifts the probability to 77.4%. This is Bayesian reasoning: updating a prior belief with evidence.
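If you want to check the 0.774 figure before reaching ch246, a brute-force simulation can confirm it without giving away the derivation. This sketch (the helper name, email count, and seed are choices made here, not part of the problem) just generates emails at the stated rates and counts:

```python
import random

def simulate_spam_given_offer(n_emails: int = 1_000_000, seed: int = 42) -> float:
    """Estimate P(spam | 'offer') by generating emails and counting.

    Uses the stated rates: 30% of emails are spam, 40% of spam and
    5% of legitimate emails contain the word 'offer'.
    """
    rng = random.Random(seed)
    spam_with_offer = 0
    total_with_offer = 0
    for _ in range(n_emails):
        is_spam = rng.random() < 0.30
        p_offer = 0.40 if is_spam else 0.05
        if rng.random() < p_offer:
            total_with_offer += 1
            spam_with_offer += is_spam
    # Conditioning = restricting attention to emails containing 'offer'.
    return spam_with_offer / total_with_offer

print(f"Simulated P(spam | 'offer') ≈ {simulate_spam_given_offer():.3f}")
```

The estimate lands near 0.774, which is the whole idea of Monte Carlo (ch256): when you cannot yet derive a probability, you can often simulate it. The derivation in ch246 will tell you *why* that number is what it is.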

Every probabilistic classifier, every Bayesian neural network, every anomaly detector operates on exactly this logic. Part VIII gives you the machinery to build and reason about all of them.


Begin with ch241 — Randomness.