1. The adversarial idea¶
A GAN (Goodfellow et al., 2014) trains two networks simultaneously:
Generator $G$: maps noise $z \sim p_z$ to fake data $G(z)$.
Discriminator $D$: classifies samples as real ($D(x) \approx 1$) or fake ($D(G(z)) \approx 0$).
They play a minimax game:
$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
At Nash equilibrium: $G$ produces samples indistinguishable from real data; $D$ outputs $1/2$ everywhere.
(Game theory / minimax: ch232. BCE loss: ch305. Expected value: ch249.)
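Before the full implementation, a quick numeric sanity check (not part of the chapter's code, but a known closed form from the GAN paper): for fixed $G$, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$. When the generator has converged ($p_g = p_{\text{data}}$), $D^* = 1/2$ everywhere and the value of the game is $-\log 4$:

```python
import numpy as np

# Toy discrete densities over three outcomes; the generator has converged,
# so p_g == p_data and the optimal discriminator is 1/2 everywhere.
p_data = np.array([0.2, 0.5, 0.3])
p_g    = np.array([0.2, 0.5, 0.3])

d_star = p_data / (p_data + p_g)                     # optimal discriminator
value = (np.sum(p_data * np.log(d_star))             # E_x[log D*(x)]
         + np.sum(p_g * np.log(1 - d_star)))         # E_z[log(1 - D*(G(z)))]

print(d_star)                  # [0.5 0.5 0.5]
print(value, -np.log(4))       # both ≈ -1.3863
```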
import numpy as np
import matplotlib.pyplot as plt
def relu(z): return np.maximum(0, z)
def relu_grad(z): return (z > 0).astype(float)
def sigmoid(z): return 1/(1+np.exp(-np.clip(z,-500,500)))
def sigmoid_grad(z): s=sigmoid(z); return s*(1-s)
class MLP_GAN:
"""Simple MLP used as Generator or Discriminator."""
def __init__(self, layer_sizes: list, seed: int = 0):
rng = np.random.default_rng(seed)
self.params = []
for i in range(len(layer_sizes)-1):
fi, fo = layer_sizes[i], layer_sizes[i+1]
W = rng.normal(0, np.sqrt(2.0/fi), (fo, fi))
b = np.zeros(fo)
self.params.append([W, b]) # mutable lists for in-place update
def forward(self, x: np.ndarray) -> tuple:
"""Returns (output, cache). Last layer uses sigmoid; hidden use relu."""
a = x; cache = [x]
for i, (W, b) in enumerate(self.params):
z = a @ W.T + b
a = sigmoid(z) if i == len(self.params)-1 else relu(z)
cache.extend([z, a])
return a, cache
def update(self, grads: list, lr: float):
for (W, b), (dW, db) in zip(self.params, grads):
W -= lr * dW
b -= lr * db
def train_gan_step(G: MLP_GAN, D: MLP_GAN,
                   real_data: np.ndarray, noise_dim: int,
                   lr: float = 0.001, rng: np.random.Generator = None):
    """One GAN training step (D update, then G update). Returns (d_loss, g_loss)."""
    if rng is None:
        rng = np.random.default_rng()
    B = real_data.shape[0]
    z = rng.standard_normal((B, noise_dim))
# ── Train Discriminator ──
fake = G.forward(z)[0]
d_real, _ = D.forward(real_data)
d_fake, _ = D.forward(fake)
eps = 1e-8
d_loss = -np.mean(np.log(d_real+eps) + np.log(1-d_fake+eps))
    # Discriminator gradients (numerical central differences, for clarity;
    # a random ~10% of weight coordinates are perturbed each step to keep this cheap)
    D_grads = []
    for layer_idx, (W, b) in enumerate(D.params):
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        def d_loss_at():
            dr = D.forward(real_data)[0]; df = D.forward(fake)[0]
            return -np.mean(np.log(dr+eps) + np.log(1-df+eps))
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue  # subsample coordinates
            W[idx] += delta; lp = d_loss_at()
            W[idx] -= 2*delta; lm = d_loss_at()
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        for idx in np.ndindex(*b.shape):  # same central difference for the biases
            b[idx] += delta; lp = d_loss_at()
            b[idx] -= 2*delta; lm = d_loss_at()
            b[idx] += delta
            db[idx] = (lp-lm)/(2*delta)
        D_grads.append((dW, db))
    D.update(D_grads, lr)
# ── Train Generator ──
z2 = rng.standard_normal((B, noise_dim))
fake2 = G.forward(z2)[0]
d_fake2 = D.forward(fake2)[0]
g_loss = -np.mean(np.log(d_fake2+eps))
    G_grads = []
    for W, b in G.params:
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        def g_loss_at():
            f = G.forward(z2)[0]
            return -np.mean(np.log(D.forward(f)[0]+eps))
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue  # subsample coordinates
            W[idx] += delta; lp = g_loss_at()
            W[idx] -= 2*delta; lm = g_loss_at()
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        for idx in np.ndindex(*b.shape):  # same central difference for the biases
            b[idx] += delta; lp = g_loss_at()
            b[idx] -= 2*delta; lm = g_loss_at()
            b[idx] += delta
            db[idx] = (lp-lm)/(2*delta)
        G_grads.append((dW, db))
    G.update(G_grads, lr)
return float(d_loss), float(g_loss)
# Train 1D GAN: generator should learn to match a bimodal Gaussian
rng = np.random.default_rng(42)
noise_dim = 4
G = MLP_GAN([noise_dim, 16, 16, 1], seed=0)
D = MLP_GAN([1, 16, 16, 1], seed=1)
def real_data_sample(n, rng):
"""Bimodal Gaussian."""
mix = rng.choice([-2.0, 2.0], n)
return mix[:, None] + rng.normal(0, 0.3, (n, 1))
d_losses, g_losses = [], []
for step in range(400):
real = real_data_sample(64, rng)
dl, gl = train_gan_step(G, D, real, noise_dim, lr=0.005, rng=rng)
d_losses.append(dl); g_losses.append(gl)
# Visualise
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(d_losses, label='D loss', color='#e74c3c', lw=1.5, alpha=0.7)
ax1.plot(g_losses, label='G loss', color='#3498db', lw=1.5, alpha=0.7)
ax1.axhline(np.log(2), color='gray', linestyle='--', alpha=0.5,
            label='log 2 (G loss at equilibrium)')
ax1.set_title('GAN training losses'); ax1.set_xlabel('Step'); ax1.legend()
z_eval = rng.standard_normal((500, noise_dim))
gen_samples = G.forward(z_eval)[0].ravel()
real_samples = real_data_sample(500, rng).ravel()
ax2.hist(real_samples, bins=40, alpha=0.6, color='#e74c3c', label='Real', density=True)
ax2.hist(gen_samples, bins=40, alpha=0.6, color='#3498db', label='Generated', density=True)
ax2.set_title('Real vs Generated distribution'); ax2.legend()
plt.tight_layout()
plt.savefig('ch326_gan.png', dpi=120)
plt.show()

2. Training instabilities¶
GANs are notoriously difficult to train:
Mode collapse: the generator produces only a single mode regardless of $z$.
Oscillation: generator and discriminator cycle without converging.
Gradient vanishing: when $D$ is too good, $D(G(z)) \approx 0$ and the saturating generator loss $\log(1 - D(G(z)))$ yields gradients near zero.
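The vanishing-gradient problem can be seen directly by differentiating with respect to the discriminator's logit $l$, where $D(G(z)) = \sigma(l)$. The training code above already sidesteps it by using the non-saturating generator loss $-\log D(G(z))$; a small sketch comparing the two gradients:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Gradient of each generator loss w.r.t. the discriminator logit l,
# where D(G(z)) = sigmoid(l). When D confidently rejects fakes (l << 0):
#   saturating loss    log(1 - D(G(z)))  has gradient  -sigmoid(l)     -> 0
#   non-saturating    -log D(G(z))       has gradient  sigmoid(l) - 1  -> -1
for l in [-1.0, -5.0, -10.0]:
    g_sat = -sigmoid(l)        # vanishes as the discriminator gets confident
    g_ns  = sigmoid(l) - 1.0   # stays near -1: a usable training signal
    print(f"l={l:6.1f}  saturating grad={g_sat:+.5f}  non-saturating grad={g_ns:+.5f}")
```

At $l = -10$ the saturating gradient is about $-5 \times 10^{-5}$ while the non-saturating one is still roughly $-1$, which is why the non-saturating form is the practical default.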
WGAN (Arjovsky et al., 2017) replaces BCE with the Wasserstein distance, giving a smoother, more informative gradient. The discriminator becomes a “critic” constrained to be 1-Lipschitz (via gradient penalty or weight clipping).
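A minimal sketch of the WGAN objective, with the original weight-clipping constraint (the `params` layout follows the `MLP_GAN` class above; the example score arrays are made up):

```python
import numpy as np

# WGAN replaces the BCE losses with plain means of unbounded critic scores:
#   critic:    maximise  E[f(x_real)] - E[f(G(z))]   (so minimise the negation)
#   generator: minimise  -E[f(G(z))]
def wgan_losses(f_real: np.ndarray, f_fake: np.ndarray):
    critic_loss = -(np.mean(f_real) - np.mean(f_fake))
    gen_loss = -np.mean(f_fake)
    return critic_loss, gen_loss

# Weight clipping: the original (crude) way to keep the critic roughly
# 1-Lipschitz; WGAN-GP's gradient penalty is the modern replacement.
def clip_weights(params, c: float = 0.01):
    for W, b in params:
        np.clip(W, -c, c, out=W)
        np.clip(b, -c, c, out=b)

cl, gl = wgan_losses(np.array([1.2, 0.8]), np.array([-0.5, -0.3]))
print(cl, gl)  # -1.4, 0.4: critic scores real above fake, generator still far off
```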
3. Notable GAN variants¶
| Model | Innovation |
|---|---|
| DCGAN | Convolutional layers; stable image generation |
| WGAN | Wasserstein distance; stable training |
| StyleGAN | Style-based latent control; high-res face synthesis |
| CycleGAN | Unpaired image-to-image translation |
| Conditional GAN | Class-conditioned generation |
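The Conditional GAN trick is simple enough to sketch: both networks receive the class label, here as a one-hot vector concatenated to their usual inputs (shapes below are illustrative, not from the chapter's model):

```python
import numpy as np

def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    # Rows of the identity matrix indexed by label give one-hot vectors.
    return np.eye(num_classes)[labels]

rng = np.random.default_rng(0)
B, noise_dim, num_classes = 4, 8, 3

z = rng.standard_normal((B, noise_dim))
y = one_hot(np.array([0, 1, 2, 1]), num_classes)

# Generator input: noise concatenated with the label, so sampling can be
# steered by class; the discriminator gets (sample, label) the same way.
g_input = np.concatenate([z, y], axis=1)
print(g_input.shape)  # (4, 11)
```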
4. Summary¶
GANs: Generator and Discriminator play minimax game until Nash equilibrium.
Training is unstable; requires careful hyperparameter tuning and tricks.
WGAN provides a more reliable training signal via Wasserstein distance.
Modern image synthesis (StyleGAN3) achieves photorealistic quality.
Diffusion models (ch327) have largely superseded GANs for image generation.
5. Forward and backward references¶
Used here: BCE loss (ch305), expected value (ch249), minimax optimisation (ch213), backpropagation (ch306).
This will reappear in ch327 — Diffusion Models, which offer a more stable training objective while achieving superior sample quality.