1. The adversarial idea¶
A GAN (Goodfellow et al., 2014) trains two networks simultaneously:
Generator $G$: maps noise $z \sim p_z$ to fake data $G(z)$.
Discriminator $D$: classifies samples as real ($D(x) \approx 1$) or fake ($D(G(z)) \approx 0$).
They play a minimax game:
$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
At Nash equilibrium: $G$ produces samples indistinguishable from real data; $D$ outputs $1/2$ everywhere.
(Game theory / minimax: ch232. BCE loss: ch305. Expected value: ch249.)
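Before the full implementation, a quick numeric sanity check (not part of the chapter's code, but a known closed form from the GAN paper): for fixed $G$, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$. When the generator has converged ($p_g = p_{\text{data}}$), $D^* = 1/2$ everywhere and the value of the game is $-\log 4$:

```python
import numpy as np

# Toy discrete densities over three outcomes; the generator has converged,
# so p_g == p_data and the optimal discriminator is 1/2 everywhere.
p_data = np.array([0.2, 0.5, 0.3])
p_g    = np.array([0.2, 0.5, 0.3])

d_star = p_data / (p_data + p_g)                     # optimal discriminator
value = (np.sum(p_data * np.log(d_star))             # E_x[log D*(x)]
         + np.sum(p_g * np.log(1 - d_star)))         # E_z[log(1 - D*(G(z)))]

print(d_star)                  # [0.5 0.5 0.5]
print(value, -np.log(4))       # both ≈ -1.3863
```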
import numpy as np
import matplotlib.pyplot as plt
def relu(z): return np.maximum(0, z)
def relu_grad(z): return (z > 0).astype(float)
def sigmoid(z): return 1/(1+np.exp(-np.clip(z,-500,500)))
def sigmoid_grad(z): s=sigmoid(z); return s*(1-s)
class MLP_GAN:
"""Simple MLP used as Generator or Discriminator."""
def __init__(self, layer_sizes: list, seed: int = 0):
rng = np.random.default_rng(seed)
self.params = []
for i in range(len(layer_sizes)-1):
fi, fo = layer_sizes[i], layer_sizes[i+1]
W = rng.normal(0, np.sqrt(2.0/fi), (fo, fi))
b = np.zeros(fo)
self.params.append([W, b]) # mutable lists for in-place update
def forward(self, x: np.ndarray) -> tuple:
"""Returns (output, cache). Last layer uses sigmoid; hidden use relu."""
a = x; cache = [x]
for i, (W, b) in enumerate(self.params):
z = a @ W.T + b
a = sigmoid(z) if i == len(self.params)-1 else relu(z)
cache.extend([z, a])
return a, cache
def update(self, grads: list, lr: float):
for (W, b), (dW, db) in zip(self.params, grads):
W -= lr * dW
b -= lr * db
def train_gan_step(G: MLP_GAN, D: MLP_GAN,
                   real_data: np.ndarray, noise_dim: int,
                   lr: float = 0.001, rng: np.random.Generator = None):
    """One GAN training step (D update, then G update). Returns (d_loss, g_loss)."""
    if rng is None:
        rng = np.random.default_rng()
    B = real_data.shape[0]
    z = rng.standard_normal((B, noise_dim))
# ── Train Discriminator ──
fake = G.forward(z)[0]
d_real, _ = D.forward(real_data)
d_fake, _ = D.forward(fake)
eps = 1e-8
d_loss = -np.mean(np.log(d_real+eps) + np.log(1-d_fake+eps))
    # Discriminator gradients (numerical central differences, for clarity;
    # a random ~10% of weight coordinates are perturbed each step to keep this cheap)
    D_grads = []
    for layer_idx, (W, b) in enumerate(D.params):
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        def d_loss_at():
            dr = D.forward(real_data)[0]; df = D.forward(fake)[0]
            return -np.mean(np.log(dr+eps) + np.log(1-df+eps))
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue  # subsample coordinates
            W[idx] += delta; lp = d_loss_at()
            W[idx] -= 2*delta; lm = d_loss_at()
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        for idx in np.ndindex(*b.shape):  # same central difference for the biases
            b[idx] += delta; lp = d_loss_at()
            b[idx] -= 2*delta; lm = d_loss_at()
            b[idx] += delta
            db[idx] = (lp-lm)/(2*delta)
        D_grads.append((dW, db))
    D.update(D_grads, lr)
# ── Train Generator ──
z2 = rng.standard_normal((B, noise_dim))
fake2 = G.forward(z2)[0]
d_fake2 = D.forward(fake2)[0]
g_loss = -np.mean(np.log(d_fake2+eps))
    G_grads = []
    for W, b in G.params:
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        def g_loss_at():
            f = G.forward(z2)[0]
            return -np.mean(np.log(D.forward(f)[0]+eps))
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue  # subsample coordinates
            W[idx] += delta; lp = g_loss_at()
            W[idx] -= 2*delta; lm = g_loss_at()
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        for idx in np.ndindex(*b.shape):  # same central difference for the biases
            b[idx] += delta; lp = g_loss_at()
            b[idx] -= 2*delta; lm = g_loss_at()
            b[idx] += delta
            db[idx] = (lp-lm)/(2*delta)
        G_grads.append((dW, db))
    G.update(G_grads, lr)
return float(d_loss), float(g_loss)
# Train 1D GAN: generator should learn to match a bimodal Gaussian
rng = np.random.default_rng(42)
noise_dim = 4
G = MLP_GAN([noise_dim, 16, 16, 1], seed=0)
D = MLP_GAN([1, 16, 16, 1], seed=1)
def real_data_sample(n, rng):
"""Bimodal Gaussian."""
mix = rng.choice([-2.0, 2.0], n)
return mix[:, None] + rng.normal(0, 0.3, (n, 1))
d_losses, g_losses = [], []
for step in range(400):
real = real_data_sample(64, rng)
dl, gl = train_gan_step(G, D, real, noise_dim, lr=0.005, rng=rng)
d_losses.append(dl); g_losses.append(gl)
# Visualise
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(d_losses, label='D loss', color='#e74c3c', lw=1.5, alpha=0.7)
ax1.plot(g_losses, label='G loss', color='#3498db', lw=1.5, alpha=0.7)
ax1.axhline(np.log(2), color='gray', linestyle='--', alpha=0.5,
            label='log 2 (G loss at equilibrium)')
ax1.set_title('GAN training losses'); ax1.set_xlabel('Step'); ax1.legend()
z_eval = rng.standard_normal((500, noise_dim))
gen_samples = G.forward(z_eval)[0].ravel()
real_samples = real_data_sample(500, rng).ravel()
ax2.hist(real_samples, bins=40, alpha=0.6, color='#e74c3c', label='Real', density=True)
ax2.hist(gen_samples, bins=40, alpha=0.6, color='#3498db', label='Generated', density=True)
ax2.set_title('Real vs Generated distribution'); ax2.legend()
plt.tight_layout()
plt.savefig('ch326_gan.png', dpi=120)
plt.show()

2. Training instabilities¶
GANs are notoriously difficult to train:
Mode collapse: the generator produces only a single mode regardless of $z$.
Oscillation: generator and discriminator cycle without converging.
Gradient vanishing: when $D$ is too good, $D(G(z)) \approx 0$ and the saturating generator loss $\log(1 - D(G(z)))$ yields gradients near zero.
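The vanishing-gradient problem can be seen directly by differentiating with respect to the discriminator's logit $l$, where $D(G(z)) = \sigma(l)$. The training code above already sidesteps it by using the non-saturating generator loss $-\log D(G(z))$; a small sketch comparing the two gradients:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Gradient of each generator loss w.r.t. the discriminator logit l,
# where D(G(z)) = sigmoid(l). When D confidently rejects fakes (l << 0):
#   saturating loss    log(1 - D(G(z)))  has gradient  -sigmoid(l)     -> 0
#   non-saturating    -log D(G(z))       has gradient  sigmoid(l) - 1  -> -1
for l in [-1.0, -5.0, -10.0]:
    g_sat = -sigmoid(l)        # vanishes as the discriminator gets confident
    g_ns  = sigmoid(l) - 1.0   # stays near -1: a usable training signal
    print(f"l={l:6.1f}  saturating grad={g_sat:+.5f}  non-saturating grad={g_ns:+.5f}")
```

At $l = -10$ the saturating gradient is about $-5 \times 10^{-5}$ while the non-saturating one is still roughly $-1$, which is why the non-saturating form is the practical default.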
WGAN (Arjovsky et al., 2017) replaces BCE with the Wasserstein distance, giving a smoother, more informative gradient. The discriminator becomes a “critic” constrained to be 1-Lipschitz (via gradient penalty or weight clipping).
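A minimal sketch of the WGAN objective, with the original weight-clipping constraint (the `params` layout follows the `MLP_GAN` class above; the example score arrays are made up):

```python
import numpy as np

# WGAN replaces the BCE losses with plain means of unbounded critic scores:
#   critic:    maximise  E[f(x_real)] - E[f(G(z))]   (so minimise the negation)
#   generator: minimise  -E[f(G(z))]
def wgan_losses(f_real: np.ndarray, f_fake: np.ndarray):
    critic_loss = -(np.mean(f_real) - np.mean(f_fake))
    gen_loss = -np.mean(f_fake)
    return critic_loss, gen_loss

# Weight clipping: the original (crude) way to keep the critic roughly
# 1-Lipschitz; WGAN-GP's gradient penalty is the modern replacement.
def clip_weights(params, c: float = 0.01):
    for W, b in params:
        np.clip(W, -c, c, out=W)
        np.clip(b, -c, c, out=b)

cl, gl = wgan_losses(np.array([1.2, 0.8]), np.array([-0.5, -0.3]))
print(cl, gl)  # -1.4, 0.4: critic scores real above fake, generator still far off
```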
3. Notable GAN variants¶
| Model | Innovation |
|---|---|
| DCGAN | Convolutional layers; stable image generation |
| WGAN | Wasserstein distance; stable training |
| StyleGAN | Style-based latent control; high-res face synthesis |
| CycleGAN | Unpaired image-to-image translation |
| Conditional GAN | Class-conditioned generation |
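The Conditional GAN trick is simple enough to sketch: both networks receive the class label, here as a one-hot vector concatenated to their usual inputs (shapes below are illustrative, not from the chapter's model):

```python
import numpy as np

def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    # Rows of the identity matrix indexed by label give one-hot vectors.
    return np.eye(num_classes)[labels]

rng = np.random.default_rng(0)
B, noise_dim, num_classes = 4, 8, 3

z = rng.standard_normal((B, noise_dim))
y = one_hot(np.array([0, 1, 2, 1]), num_classes)

# Generator input: noise concatenated with the label, so sampling can be
# steered by class; the discriminator gets (sample, label) the same way.
g_input = np.concatenate([z, y], axis=1)
print(g_input.shape)  # (4, 11)
```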
4. Summary¶
GANs: Generator and Discriminator play minimax game until Nash equilibrium.
Training is unstable; requires careful hyperparameter tuning and tricks.
WGAN provides a more reliable training signal via Wasserstein distance.
Modern image synthesis (StyleGAN3) achieves photorealistic quality.
Diffusion models (ch327) have largely superseded GANs for image generation.
5. Forward and backward references¶
Used here: BCE loss (ch305), expected value (ch249), minimax optimisation (ch213), backpropagation (ch306).
This will reappear in ch327 — Diffusion Models, which offer a more stable training objective while achieving superior sample quality.