Part VIII: Probability | Computational Mathematics for Programmers


1. What Randomness Actually Is

“Random” is not a synonym for “unknown” or “unpredictable by humans.” In mathematics, a process is random if its outcomes cannot be determined in advance — not because we lack information, but because the outcome genuinely varies across repeated identical setups.

There are two positions on randomness:

Frequentist: Randomness is a property of a repeatable physical process. Flip a fair coin more and more times, and the fraction of heads converges to 50%. No finite run has to hit exactly 50%; probability is the limiting frequency.

Bayesian: Randomness is a description of epistemic state — our uncertainty about what will happen. Probability quantifies degree of belief, not physical frequency.

Both views produce the same mathematics. The distinction matters when interpreting results. This book uses both where appropriate.
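The frequentist idea of a limiting frequency can be watched numerically. The sketch below is illustrative only: the seed, the number of flips, and the name `running_freq` are assumptions made for this example. It tracks the fraction of heads as simulated fair-coin flips accumulate.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed, chosen only for repeatability

n_flips = 100_000
flips = rng.integers(0, 2, size=n_flips)  # 1 = heads, 0 = tails

# Fraction of heads after each flip: cumulative heads / flips so far
running_freq = np.cumsum(flips) / np.arange(1, n_flips + 1)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"After {k:>7,} flips: fraction of heads = {running_freq[k - 1]:.4f}")
```

Early values can sit well away from 0.5; the fraction settles only as the number of flips grows, which is the sense in which probability is a limit rather than a guarantee about any finite run.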


2. Pseudorandomness in Computing

A computer executing an algorithm is deterministic, so software alone cannot produce true randomness. What it produces is pseudorandomness: a sequence that passes statistical tests for randomness but is generated by a deterministic algorithm.

The algorithm is called a Pseudorandom Number Generator (PRNG). Given an identical seed, it produces an identical sequence.

import numpy as np
import matplotlib.pyplot as plt

# Same seed => same sequence, every time
rng = np.random.default_rng(seed=42)
sequence_a = rng.integers(1, 7, size=10)  # ten rolls of a d6

rng2 = np.random.default_rng(seed=42)
sequence_b = rng2.integers(1, 7, size=10)

print("Sequence A:", sequence_a)
print("Sequence B:", sequence_b)
print("Identical:", np.array_equal(sequence_a, sequence_b))
Sequence A: [1 5 4 3 3 6 1 5 2 1]
Sequence B: [1 5 4 3 3 6 1 5 2 1]
Identical: True
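To see why an identical seed must give an identical sequence, it can help to look inside a very small generator. The following is a toy linear congruential generator, not the algorithm NumPy uses; the function names `lcg` and `to_d6` are made up for this sketch, and the constants are one commonly published parameter set.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Toy linear congruential generator: x_next = (a * x + c) mod m."""
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m  # the whole "random" step is this one formula
        out.append(state)
    return out


def to_d6(values, m=2**32):
    """Map raw states in [0, m) to die faces 1..6 (slight bias is ignored here)."""
    return [int(v / m * 6) + 1 for v in values]


run_a = to_d6(lcg(seed=42, n=10))
run_b = to_d6(lcg(seed=42, n=10))
run_c = to_d6(lcg(seed=43, n=10))

print("Seed 42:", run_a)
print("Seed 42:", run_b)
print("Seed 43:", run_c)
print("Same seed, same sequence:", run_a == run_b)
```

The state after each step depends only on the state before it, so the seed determines everything that follows. A generator this simple is fine for illustration but should not be used for real simulations.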

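For new code, prefer numpy.random.default_rng() over the legacy numpy.random.seed() interface. The new API returns a Generator object whose default bit generator is PCG64, which has better statistical properties than the Mersenne Twister behind the legacy interface, and its state lives in an object you pass around explicitly rather than in a hidden global.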
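One way to read "explicit state management" is that the generator is an ordinary object: it can be passed into functions, and its state can be captured and restored. The sketch below assumes a hypothetical helper `roll_dice`; the `bit_generator.state` property is part of NumPy's Generator API.

```python
import numpy as np


def roll_dice(rng, n):
    """Hypothetical helper: takes the generator as an argument, no global state."""
    return rng.integers(1, 7, size=n)


rng = np.random.default_rng(seed=42)
print("Bit generator:", type(rng.bit_generator).__name__)

saved_state = rng.bit_generator.state   # snapshot of the generator's state (a dict)
first = roll_dice(rng, 5)

rng.bit_generator.state = saved_state   # rewind to the snapshot
replay = roll_dice(rng, 5)

print("First :", first)
print("Replay:", replay)
print("Identical:", np.array_equal(first, replay))
```

Passing `rng` as a parameter makes it obvious which code consumes random numbers, which is harder to track when every call draws from a shared global generator.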

# Empirical demonstration: does 'random' actually spread uniformly?
rng = np.random.default_rng(seed=0)
n = 100_000
rolls = rng.integers(1, 7, size=n)  # 1 through 6 inclusive

values, counts = np.unique(rolls, return_counts=True)
frequencies = counts / n

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(values, frequencies, color='steelblue', edgecolor='white', linewidth=0.5)
ax.axhline(1/6, color='tomato', linestyle='--', label='Expected 1/6')
ax.set_xlabel('Die face')
ax.set_ylabel('Observed frequency')
ax.set_title(f'Empirical frequency of d6 rolls (n={n:,})')
ax.legend()
plt.tight_layout()
plt.show()

print("\nObserved frequencies:")
for v, f in zip(values, frequencies):
    print(f"  Face {v}: {f:.4f}  (deviation from 1/6: {abs(f - 1/6):.4f})")
<Figure size 700x400 with 1 Axes>

Observed frequencies:
  Face 1: 0.1672  (deviation from 1/6: 0.0005)
  Face 2: 0.1653  (deviation from 1/6: 0.0014)
  Face 3: 0.1680  (deviation from 1/6: 0.0013)
  Face 4: 0.1669  (deviation from 1/6: 0.0003)
  Face 5: 0.1673  (deviation from 1/6: 0.0006)
  Face 6: 0.1654  (deviation from 1/6: 0.0013)
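Are deviations of about 0.001 large or small? A rough yardstick is the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below plugs in p = 1/6 and n = 100,000 and compares the rounded frequencies printed above against it; the dictionary `observed` simply copies those printed values.

```python
import math

n = 100_000
p = 1 / 6

# Standard error of an observed proportion when the true probability is p
std_error = math.sqrt(p * (1 - p) / n)
print(f"Standard error of each frequency: {std_error:.5f}")

# Rounded frequencies copied from the output above
observed = {1: 0.1672, 2: 0.1653, 3: 0.1680, 4: 0.1669, 5: 0.1673, 6: 0.1654}

for face, freq in observed.items():
    z = (freq - p) / std_error  # deviation measured in standard errors
    print(f"  Face {face}: deviation = {freq - p:+.4f}  ({z:+.2f} standard errors)")
```

Every face lands within about one and a quarter standard errors of 1/6, which is the size of fluctuation expected from a fair die at this sample size.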

3. Sources of Randomness

| Source | Type | Use case |
|---|---|---|
| PRNG (PCG64, Mersenne Twister) | Pseudorandom | Simulations, ML training |
| OS entropy (/dev/urandom) | Cryptographically secure | Keys, tokens |
| Hardware RNG (thermal noise) | True random | High-security applications |
| Quantum processes | True random | Research |

For scientific computing and ML, PRNGs are entirely sufficient. Use secrets or os.urandom when security is the concern.

import secrets

# Cryptographically secure token: 16 random bytes rendered as 32 hex characters
secure_token = secrets.token_hex(16)
print("Secure token:", secure_token)

# For reproducible science, always use numpy with an explicit seed
rng = np.random.default_rng(seed=2024)
print("First draw:", rng.uniform(0, 1))
print("Second draw:", rng.uniform(0, 1))
Secure token: d7d7f84893e00c73e85006b6543fdce6
First draw: 0.6758313379812818
Second draw: 0.21432320123825765
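The standard library covers a few other common security-sensitive needs. The sketch below shows calls that exist in `secrets` and `os`; the variable names and the list of candidate names are purely illustrative.

```python
import os
import secrets

# Cryptographically secure integer in [0, 6), shifted to a die face 1..6
die_roll = secrets.randbelow(6) + 1

# 16 raw bytes straight from the operating system's entropy source
raw_bytes = os.urandom(16)

# URL-safe text token built from 16 random bytes (e.g. for a reset link)
url_token = secrets.token_urlsafe(16)

# Secure choice from a sequence (names here are placeholders)
winner = secrets.choice(["alice", "bob", "carol"])

print("Secure die roll:", die_roll)
print("Raw bytes (hex):", raw_bytes.hex())
print("URL-safe token:", url_token)
print("Winner:", winner)
```

None of these calls accept a seed, so their output is intentionally not reproducible. That is the opposite trade-off from the seeded generator used for simulations.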

4. Testing Randomness: Basic Statistical Tests

How do you know a sequence is “random enough”? Apply tests. A good PRNG should:

  • Produce each value with equal frequency (uniformity)

  • Show no autocorrelation (successive values show no linear dependence on earlier ones)

  • Pass the chi-squared goodness-of-fit test

from scipy import stats

rng = np.random.default_rng(seed=99)
n = 10_000
rolls = rng.integers(1, 7, size=n)

# Chi-squared test: are observed frequencies consistent with uniform distribution?
observed = np.bincount(rolls, minlength=7)[1:]   # counts for faces 1-6
expected = np.full(6, n / 6)

chi2_stat, p_value = stats.chisquare(observed, expected)
print(f"Chi-squared statistic: {chi2_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Conclusion: {'Cannot reject uniformity' if p_value > 0.05 else 'Reject uniformity'}")

# Autocorrelation: does knowing roll[i] tell you anything about roll[i+1]?
# Pearson correlation between consecutive pairs
x = rolls[:-1].astype(float)
y = rolls[1:].astype(float)
corr, p_corr = stats.pearsonr(x, y)
print(f"\nAutocorrelation (lag-1): {corr:.4f}  (p={p_corr:.4f})")
print("Near zero means no linear dependence between successive rolls was detected.")
Chi-squared statistic: 3.9788
p-value: 0.5525
Conclusion: Cannot reject uniformity

Autocorrelation (lag-1): -0.0026  (p=0.7927)
Near zero means no linear dependence between successive rolls was detected.
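These tests are most informative when they fail. The sketch below builds two deliberately defective sequences, both invented for illustration: a loaded die and a "sticky" die that repeats its previous roll half the time. It uses NumPy only, computing the chi-squared statistic by hand as the sum of (observed - expected)^2 / expected and comparing it with 11.07, the 5% critical value for 5 degrees of freedom.

```python
import numpy as np

CRITICAL_5PCT_DF5 = 11.07  # chi-squared critical value, df = 5, alpha = 0.05


def chi2_uniform(rolls):
    """Chi-squared statistic for d6 rolls against a uniform expectation."""
    observed = np.bincount(rolls, minlength=7)[1:]  # counts for faces 1-6
    expected = len(rolls) / 6
    return float(np.sum((observed - expected) ** 2 / expected))


def lag1_corr(rolls):
    """Pearson correlation between each roll and the next one."""
    return float(np.corrcoef(rolls[:-1], rolls[1:])[0, 1])


rng = np.random.default_rng(seed=123)
n = 10_000

# Defect 1: a loaded die that favors face 1
loaded = rng.choice(np.arange(1, 7), size=n, p=[0.25, 0.15, 0.15, 0.15, 0.15, 0.15])

# Defect 2: a sticky die that repeats the previous roll with probability 0.5
fresh = rng.integers(1, 7, size=n)
repeat = rng.random(n) < 0.5
sticky = fresh.copy()
for i in range(1, n):
    if repeat[i]:
        sticky[i] = sticky[i - 1]

print(f"Loaded die: chi2 = {chi2_uniform(loaded):.1f} (reject if > {CRITICAL_5PCT_DF5})")
print(f"Loaded die: lag-1 correlation = {lag1_corr(loaded):+.4f}")
print(f"Sticky die: lag-1 correlation = {lag1_corr(sticky):+.4f}")
```

The loaded die fails the uniformity check while its successive rolls stay uncorrelated. The sticky die shows the reverse pattern: each face is still equally likely overall, yet the lag-1 correlation is far from zero. This is why a single test is never enough.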

5. Randomness Is the Foundation of Probability

Probability theory is the formal study of random processes. Every construct in the next 29 chapters — events, distributions, expectations, Markov chains — is built on top of the simple fact that outcomes vary.

The key insight: even though individual outcomes are unpredictable, aggregate behavior is highly regular. Roll one die and you cannot know the outcome. Roll a million dice and the average is almost certainly within 0.01 of 3.5. This regularity is what makes probability useful.

(The precise statement of this regularity is the Law of Large Numbers — see ch255.)
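The "within 0.01 of 3.5" claim can be checked directly. The sketch below rolls one million simulated dice and compares the sample mean with its standard error, using the fact that a fair d6 has mean 3.5 and variance 35/12; the seed and variable names are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for repeatability
n = 1_000_000

rolls = rng.integers(1, 7, size=n)
sample_mean = rolls.mean()

# A fair d6 has variance 35/12, so the mean of n rolls has std. error sqrt(35/12) / sqrt(n)
sigma = np.sqrt(35 / 12)
std_error = sigma / np.sqrt(n)

print(f"Sample mean of {n:,} rolls: {sample_mean:.5f}")
print(f"Distance from 3.5: {abs(sample_mean - 3.5):.5f}")
print(f"Standard error of the mean: {std_error:.5f}")
print(f"0.01 corresponds to {0.01 / std_error:.1f} standard errors")
```

A tolerance of 0.01 is nearly six standard errors wide here, which is why missing it is possible in principle but extremely unlikely in practice.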


6. Summary

  • Randomness in computing is pseudorandom: deterministic and reproducible with a seed, yet designed to pass statistical tests for randomness.

  • Use numpy.random.default_rng(seed) for reproducible scientific simulations.

  • Statistical tests (chi-squared, autocorrelation) can expose defects in a random sequence; passing them is evidence of quality, not proof.

  • Probability is the mathematical framework for reasoning about random processes — individual outcomes are unpredictable; aggregate behavior is precisely characterizable.


7. Forward References

This chapter introduced the computational infrastructure for randomness. The mathematical structure — what outcomes are possible and how probability is assigned — begins in ch242 (Sample Spaces). The regularity of large samples is formalized in ch255 (Law of Large Numbers). Every simulation in ch256–ch259 draws on the PRNG patterns established here.