1. The core idea¶
Diffusion models (Ho et al., 2020) learn to reverse a gradual noising process.
Forward process: add a small amount of Gaussian noise at each of $T$ steps, $q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)$, until $x_T$ is approximately pure noise. Reverse process: learn a neural network $p_\theta(x_{t-1} \mid x_t)$ that removes the noise step by step.
A key property: $x_t$ can be sampled directly from $x_0$ in closed form:

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}),$$

where $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$.
The network $\epsilon_\theta(x_t, t)$ is trained to predict the noise added at each step, with a simple MSE loss:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\lVert \epsilon - \epsilon_\theta(x_t, t) \big\rVert^2\Big].$$
(Gaussian distribution: ch253. ELBO connection: ch325. Markov chains: ch258.)
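Under these definitions, one training step is just an MSE between the true and predicted noise. A minimal sketch of that objective, assuming a linear beta schedule; the `predict_noise` argument stands in for a real neural network and the zero-predictor passed at the end is a deliberately trivial stub:

```python
import numpy as np

def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns the cumulative products used for q(x_t | x_0)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def training_loss(x0, alpha_bars, predict_noise, rng):
    """One DDPM training step: sample t and eps, form x_t, regress eps."""
    T = len(alpha_bars)
    t = rng.integers(T)                      # uniform random timestep
    eps = rng.standard_normal(x0.shape)      # the noise we will try to recover
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    eps_hat = predict_noise(x_t, t)          # network prediction (stubbed below)
    return np.mean((eps - eps_hat) ** 2)     # the simple MSE objective

rng = np.random.default_rng(0)
alpha_bars = make_schedule()
x0 = rng.normal(size=32)
# A trivial stand-in "network" that always predicts zero noise:
loss = training_loss(x0, alpha_bars, lambda x_t, t: np.zeros_like(x_t), rng)
```

Note that the network never sees $x_0$ directly; it only learns to map a noisy sample and a timestep to the noise component.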
```python
import numpy as np
import matplotlib.pyplot as plt


class DDPM1D:
    """
    Denoising Diffusion Probabilistic Model for 1D data.
    Demonstrates the forward process and (analytically) the reverse.
    """

    def __init__(self, T: int = 100, beta_start: float = 1e-4, beta_end: float = 0.02):
        self.T = T
        self.betas = np.linspace(beta_start, beta_end, T)
        self.alphas = 1.0 - self.betas
        self.alpha_bars = np.cumprod(self.alphas)

    def q_sample(self, x0: np.ndarray, t: int, rng: np.random.Generator) -> tuple:
        """Sample x_t from x_0 directly (closed form)."""
        sqrt_ab = np.sqrt(self.alpha_bars[t])
        sqrt_1mab = np.sqrt(1 - self.alpha_bars[t])
        eps = rng.standard_normal(x0.shape)
        x_t = sqrt_ab * x0 + sqrt_1mab * eps
        return x_t, eps

    def noise_level_at(self, t: int) -> float:
        return float(np.sqrt(1 - self.alpha_bars[t]))


rng = np.random.default_rng(0)
ddpm = DDPM1D(T=200)

# 1D toy data: bimodal Gaussian
x0 = np.concatenate([rng.normal(-2, 0.3, 200), rng.normal(2, 0.3, 200)])

# Show forward process: data → noise
timesteps_to_show = [0, 20, 50, 100, 150, 199]
fig, axes = plt.subplots(2, 3, figsize=(13, 7))
for ax, t in zip(axes.ravel(), timesteps_to_show):
    x_t, _ = ddpm.q_sample(x0, t, rng)
    ax.hist(x_t, bins=50, density=True, color='#3498db', alpha=0.75, edgecolor='white')
    noise_level = ddpm.noise_level_at(t)
    ax.set_title(f't={t} noise_level={noise_level:.3f}')
    ax.set_xlim(-5, 5)
    ax.set_ylim(0, 1.2)
    if t == 0:
        ax.set_ylabel('Density')
plt.suptitle('Forward diffusion process: data gradually becomes pure noise', fontsize=11)
plt.tight_layout()
plt.savefig('ch327_diffusion_forward.png', dpi=120)
plt.show()

# Show noise schedule
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
t_axis = np.arange(ddpm.T)
axes[0].plot(t_axis, ddpm.betas, color='#e74c3c', lw=2)
axes[0].set_title('Beta schedule (noise added per step)')
axes[0].set_xlabel('Timestep t'); axes[0].set_ylabel('β_t')
axes[1].plot(t_axis, ddpm.alpha_bars, color='#3498db', lw=2, label='ᾱ_t (signal fraction)')
axes[1].plot(t_axis, 1 - ddpm.alpha_bars, color='#e74c3c', lw=2, label='1-ᾱ_t (noise fraction)')
axes[1].set_title('Signal vs noise fraction across timesteps')
axes[1].set_xlabel('Timestep t'); axes[1].legend()
plt.tight_layout()
plt.savefig('ch327_diffusion_schedule.png', dpi=120)
plt.show()
```

2. The denoising network¶
The network must:
Accept the noisy sample $x_t$ and the timestep $t$ as inputs.
Predict the noise $\epsilon$ that was added.
For images: U-Net architecture with time embedding injected at each layer. For text (Diffusion-LM): Transformer with time conditioning.
Time embedding: $t$ is embedded as a sinusoidal positional encoding (ch323) and projected into the network via scale-and-shift in each layer.
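A minimal sketch of such a sinusoidal time embedding, following the transformer-style encoding; the dimension and the base 10000 are conventional choices here, not fixed by this chapter:

```python
import numpy as np

def time_embedding(t: int, dim: int = 16) -> np.ndarray:
    """Sinusoidal embedding of a scalar timestep, as in transformer positional encodings."""
    half = dim // 2
    # Geometric sequence of frequencies from 1 down to ~1/10000.
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])  # shape (dim,)

emb = time_embedding(t=50, dim=16)
```

Each layer of the denoising network would then project this vector into per-channel scale and shift parameters, so the same weights can behave differently at different noise levels.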
3. Sampling¶
To generate new data from a trained diffusion model:
Start from pure noise, $x_T \sim \mathcal{N}(0, \mathbf{I})$.
Repeatedly apply the learned denoising step for $t = T, \dots, 1$:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, \mathbf{I}).$$

After $T$ steps, $x_0$ is a new sample.
DDIM (Song et al., 2020) enables sampling in 10–50 steps instead of 1000 by making the reverse process deterministic.
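The stochastic reverse update above can be sketched as follows. This is an illustrative skeleton: `eps_theta` stands in for a trained network (a zero-predictor stub is used below), and $\sigma_t = \sqrt{\beta_t}$ is one common choice for the reverse-step noise scale:

```python
import numpy as np

def ddpm_reverse_step(x_t, t, betas, alpha_bars, eps_theta, rng):
    """One stochastic DDPM reverse step: x_t -> x_{t-1}."""
    alpha_t = 1.0 - betas[t]
    # Subtract the (scaled) predicted noise, then rescale back toward the data.
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bars[t]) * eps_theta(x_t, t)) / np.sqrt(alpha_t)
    if t == 0:
        return mean                      # no noise injected at the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z  # sigma_t = sqrt(beta_t)

# Full sampling loop: start from pure noise, iterate t = T-1, ..., 0.
T = 200
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x = rng.standard_normal(100)             # x_T ~ N(0, I)
for t in reversed(range(T)):
    x = ddpm_reverse_step(x, t, betas, alpha_bars,
                          lambda x_t, t: np.zeros_like(x_t), rng)
```

With a real trained `eps_theta`, the loop would transform noise into samples from the data distribution; with the zero stub it merely illustrates the control flow and shapes.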
4. Why diffusion beats GANs¶
| Property | GANs | Diffusion |
|---|---|---|
| Training stability | Adversarial; unstable | MSE loss; stable |
| Sample diversity | Mode collapse risk | Full distribution coverage |
| Sample quality | Very high (StyleGAN) | Higher (DALL-E 2, Stable Diffusion) |
| Inference speed | Fast (one forward pass) | Slow (hundreds of steps) |
5. Summary¶
Diffusion models: learn to reverse a Markov chain that gradually adds Gaussian noise.
Closed-form forward sampling enables efficient training (directly compute $x_t$ from $x_0$).
Training objective: predict the noise added, with MSE loss.
Reverse process is iterative (slow but high quality); DDIM accelerates it.
6. Forward and backward references¶
Used here: Markov chains (ch258), Gaussian distribution (ch253), ELBO (ch325), MSE loss (ch305), positional encoding for time (ch323).
This will reappear in ch340 — Capstone II, where a simple 1D diffusion model is trained as part of the end-to-end deep learning system.