Chapters 241–270¶
What This Part Covers and Why It Matters¶
Determinism is a convenient fiction. Every real system — a neural network’s training data, a sensor reading, a user’s click — is entangled with noise, incompleteness, and uncertainty. Probability is the mathematical language for reasoning precisely under those conditions.
This Part builds probability from scratch: from sample spaces and events, through distributions, expectations, Bayes’ theorem, and Markov chains, to Monte Carlo simulation. By the end, you will be able to:
Model uncertain systems exactly
Derive and compute probability distributions
Apply Bayes’ theorem to update beliefs with evidence
Simulate stochastic processes from first principles
Understand what every ML loss function is actually computing
The Mental Shift Required¶
In Parts I–VII, variables had values. x = 3.7. A function returned a number. A gradient pointed somewhere specific.
In Part VIII, variables have distributions. A random variable X does not equal 3.7 — it equals 3.7 with some probability, 4.1 with another probability, and everything in between with varying density. The shift is from what is the value? to what is the distribution of possible values, and how is probability mass spread across them?
This is not vagueness. It is precision about uncertainty.
The second shift: expectation replaces the single answer. Instead of computing one result, you compute the average over all possible results — weighted by their likelihood. This is the foundation of loss functions, risk, and Bayesian inference.
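To make that second shift concrete, here is a minimal sketch of an expectation as a probability-weighted average, using a fair die (the formal treatment comes in ch249):

```python
# A sketch of the shift from "one value" to "weighted average over values".
# A fair six-sided die: each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Expected value: sum of (outcome * probability) over all outcomes.
expected = sum(x * p for x, p in zip(outcomes, probabilities))
print(expected)  # 3.5 — a value the die can never actually show
```

Note that the expectation need not be a possible outcome at all; it is a summary of the whole distribution, not a prediction of any single roll.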
Map of This Part¶
FOUNDATIONS
ch241 Randomness
ch242 Sample Spaces
ch243 Events
ch244 Probability Rules
|
v
CONDITIONAL REASONING
ch245 Conditional Probability
ch246 Bayes Theorem
|
v
RANDOM VARIABLES
ch247 Random Variables
ch248 Probability Distributions
ch249 Expected Value
ch250 Variance
|
v
KEY DISTRIBUTIONS
ch251 Binomial Distribution
ch252 Poisson Distribution
ch253 Normal Distribution
|
v
LIMIT THEOREMS
ch254 Central Limit Theorem
ch255 Law of Large Numbers
|
v
SIMULATION & PROCESSES
ch256 Monte Carlo Methods
ch257 Markov Chains
ch258 Random Walks
ch259 Simulation Techniques
|
v
PROJECT
ch260 Project: Monte Carlo π
|
v
PROBABILITY EXPERIMENTS (261-270)
ch261–ch270 Advanced probability experiments and simulations
Prerequisites From Prior Parts¶
Functions and composition (Part III, ch051–ch090): probability distributions are functions; CDFs are composed from PDFs.
Integration (Part VII, ch221–ch224): continuous probability requires integrating density functions.
Summation and series (Part II, ch021–ch050): discrete probability is summation over outcomes.
Logarithms (Part II, ch043–ch045): entropy, log-likelihood, and information theory are all logarithmic.
Linear algebra (Part VI, ch151–ch200): Markov chains are matrix powers; covariance matrices generalize variance.
Exponential functions (Part II, ch041–ch042): the Poisson and normal distributions are exponential in form.
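As a small preview of the linear-algebra connection above, here is a hedged sketch of “Markov chains are matrix powers” (the transition matrix is made up for illustration; Markov chains proper arrive in ch257):

```python
import numpy as np

# A made-up two-state transition matrix.
# Rows: current state; columns: next state; each row sums to 1.
P = np.array([
    [0.9, 0.1],   # from state 0: stay with prob 0.9, switch with 0.1
    [0.5, 0.5],   # from state 1: switch with prob 0.5, stay with 0.5
])

# The distribution after n steps is the initial distribution times P^n.
start = np.array([1.0, 0.0])                    # begin in state 0 with certainty
after_10 = start @ np.linalg.matrix_power(P, 10)
print(after_10, after_10.sum())                 # still a valid distribution: sums to 1
```

Repeated multiplication by P is exactly a matrix power, which is why Part VI’s machinery carries over directly.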
Motivating Problem: The Spam Filter¶
You have a corpus of 10,000 emails — 3,000 spam, 7,000 legitimate. A new email arrives containing the word “offer”. You know:
40% of spam emails contain “offer”
5% of legitimate emails contain “offer”
What is the probability the email is spam, given it contains “offer”?
You cannot answer this yet with the tools from prior parts. Work through the cell below — it raises a NotImplementedError. By the end of ch246 (Bayes Theorem), you will replace that stub with a derivation from first principles.
# Motivating Problem: Spam classification via Bayes
# This cell is intentionally incomplete. You will solve this in ch246.
def probability_spam_given_offer(
    p_spam: float,               # prior probability of spam
    p_offer_given_spam: float,   # likelihood of "offer" in spam
    p_offer_given_legit: float,  # likelihood of "offer" in legitimate
) -> float:
    """
    Compute P(spam | contains 'offer') using Bayes' theorem.
    Derivation required — do not use a formula you cannot prove.
    """
    raise NotImplementedError("Requires Bayes' theorem — see ch246")

# Known values
p_spam = 3000 / 10000        # 30% of emails are spam
p_offer_given_spam = 0.40    # 40% of spam contains 'offer'
p_offer_given_legit = 0.05   # 5% of legit contains 'offer'

try:
    result = probability_spam_given_offer(p_spam, p_offer_given_spam, p_offer_given_legit)
    print(f"P(spam | 'offer') = {result:.4f}")
except NotImplementedError as e:
    print(f"Not yet implemented: {e}")
    print("\nCome back here after ch246 and fill in the derivation.")

The answer is approximately 0.774 — even though only 30% of emails are spam, the word “offer” is so much more common in spam that observing it shifts the probability to 77.4%. This is Bayesian reasoning: updating a prior belief with evidence.
Every probabilistic classifier, every Bayesian neural network, every anomaly detector operates on exactly this logic. Part VIII gives you the machinery to build and reason about all of them.
Begin with ch241 — Randomness.