
ch326 — Generative Adversarial Networks

1. The adversarial idea

A GAN (Goodfellow et al., 2014) trains two networks simultaneously:

  • Generator $G_\theta$: maps noise $z \sim p(z)$ to fake data $G(z)$.

  • Discriminator $D_\phi$: classifies samples as real ($D(x) \approx 1$) or fake ($D(G(z)) \approx 0$).

They play a minimax game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$

At Nash equilibrium: $G$ produces samples indistinguishable from real data; $D$ outputs $1/2$ everywhere.
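Why $D$ outputs $1/2$: following Goodfellow et al. (2014), for a fixed $G$ with model density $p_g$, maximising the objective pointwise over $D(x)$ gives the optimal discriminator

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

At equilibrium $p_g = p_{\text{data}}$, so $D^*(x) = 1/2$ everywhere and the objective value is $-\log 4$; equivalently, $G$ minimises the Jensen–Shannon divergence to the data distribution.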

(Game theory / minimax: ch232. BCE loss: ch305. Expected value: ch249.)

import numpy as np
import matplotlib.pyplot as plt


def relu(z): return np.maximum(0, z)
def relu_grad(z): return (z > 0).astype(float)
def sigmoid(z): return 1/(1+np.exp(-np.clip(z,-500,500)))
def sigmoid_grad(z): s=sigmoid(z); return s*(1-s)


class MLP_GAN:
    """Simple MLP used as Generator or Discriminator."""

    def __init__(self, layer_sizes: list, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.params = []
        for i in range(len(layer_sizes)-1):
            fi, fo = layer_sizes[i], layer_sizes[i+1]
            W = rng.normal(0, np.sqrt(2.0/fi), (fo, fi))
            b = np.zeros(fo)
            self.params.append([W, b])  # mutable lists for in-place update

    def forward(self, x: np.ndarray) -> tuple:
        """Returns (output, cache). Last layer uses sigmoid; hidden use relu."""
        a = x; cache = [x]
        for i, (W, b) in enumerate(self.params):
            z = a @ W.T + b
            a = sigmoid(z) if i == len(self.params)-1 else relu(z)
            cache.extend([z, a])
        return a, cache

    def update(self, grads: list, lr: float):
        for (W, b), (dW, db) in zip(self.params, grads):
            W -= lr * dW
            b -= lr * db


def train_gan_step(G: MLP_GAN, D: MLP_GAN,
                   real_data: np.ndarray, noise_dim: int,
                   lr: float = 0.001, rng: np.random.Generator = None):
    """One GAN training step. Returns (d_loss, g_loss)."""
    if rng is None:
        rng = np.random.default_rng()
    B = real_data.shape[0]
    z = rng.standard_normal((B, noise_dim))

    # ── Train Discriminator ──
    fake = G.forward(z)[0]
    d_real, _ = D.forward(real_data)
    d_fake, _ = D.forward(fake)

    eps = 1e-8
    d_loss = -np.mean(np.log(d_real+eps) + np.log(1-d_fake+eps))

    # Discriminator gradients (numerical central differences for clarity;
    # a real implementation would use backprop). Only a random ~10% of
    # weights are perturbed per step, and biases are left untrained.
    D_grads = []
    for layer_idx, (W, b) in enumerate(D.params):
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue  # stochastic weight subsample
            W[idx] += delta
            dr2 = D.forward(real_data)[0]; df2 = D.forward(fake)[0]
            lp = -np.mean(np.log(dr2+eps)+np.log(1-df2+eps))
            W[idx] -= 2*delta
            dr3 = D.forward(real_data)[0]; df3 = D.forward(fake)[0]
            lm = -np.mean(np.log(dr3+eps)+np.log(1-df3+eps))
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        D_grads.append((dW, db))
    D.update(D_grads, lr)

    # ── Train Generator ──
    z2 = rng.standard_normal((B, noise_dim))
    fake2 = G.forward(z2)[0]
    d_fake2 = D.forward(fake2)[0]
    g_loss = -np.mean(np.log(d_fake2+eps))  # non-saturating loss: -log D(G(z))

    G_grads = []
    for W, b in G.params:  # same stochastic numerical-gradient scheme as for D
        dW = np.zeros_like(W); db = np.zeros_like(b)
        delta = 1e-4
        for idx in np.ndindex(*W.shape):
            if rng.random() > 0.1: continue
            W[idx] += delta
            f2 = G.forward(z2)[0]; d2 = D.forward(f2)[0]
            lp = -np.mean(np.log(d2+eps))
            W[idx] -= 2*delta
            f3 = G.forward(z2)[0]; d3 = D.forward(f3)[0]
            lm = -np.mean(np.log(d3+eps))
            W[idx] += delta
            dW[idx] = (lp-lm)/(2*delta)
        G_grads.append((dW, db))
    G.update(G_grads, lr)

    return float(d_loss), float(g_loss)
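As a sanity check on the central-difference scheme used above, it can be compared with the analytic gradient of the BCE loss through a sigmoid for a single weight (a minimal sketch; the values of `w` and `x` are arbitrary):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Loss for one "real" example with a single weight: L(w) = -log(sigmoid(w*x)).
# Analytic gradient: dL/dw = (sigmoid(w*x) - 1) * x.
w, x = 0.7, 1.3
analytic = (sigmoid(w * x) - 1.0) * x

# Central difference with the same step size as train_gan_step.
delta = 1e-4
lp = -np.log(sigmoid((w + delta) * x))
lm = -np.log(sigmoid((w - delta) * x))
numeric = (lp - lm) / (2 * delta)

print(abs(analytic - numeric))  # tiny: central differences are O(delta^2) accurate
```

The two agree to many decimal places, which is why the numerical scheme, though slow, is a trustworthy stand-in for backprop in this demo.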


# Train 1D GAN: generator should learn to match a bimodal Gaussian
rng = np.random.default_rng(42)
noise_dim = 4

G = MLP_GAN([noise_dim, 16, 16, 1], seed=0)
D = MLP_GAN([1, 16, 16, 1], seed=1)

def real_data_sample(n, rng):
    """Bimodal Gaussian."""
    mix = rng.choice([-2.0, 2.0], n)
    return mix[:, None] + rng.normal(0, 0.3, (n, 1))

d_losses, g_losses = [], []
for step in range(400):
    real = real_data_sample(64, rng)
    dl, gl = train_gan_step(G, D, real, noise_dim, lr=0.005, rng=rng)
    d_losses.append(dl); g_losses.append(gl)

# Visualise
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(d_losses, label='D loss', color='#e74c3c', lw=1.5, alpha=0.7)
ax1.plot(g_losses, label='G loss', color='#3498db', lw=1.5, alpha=0.7)
ax1.axhline(np.log(2), color='gray', linestyle='--', alpha=0.5,
            label='log 2 (G loss at equilibrium)')
ax1.set_title('GAN training losses'); ax1.set_xlabel('Step'); ax1.legend()

z_eval = rng.standard_normal((500, noise_dim))
gen_samples = G.forward(z_eval)[0].ravel()
real_samples = real_data_sample(500, rng).ravel()
ax2.hist(real_samples, bins=40, alpha=0.6, color='#e74c3c', label='Real', density=True)
ax2.hist(gen_samples,  bins=40, alpha=0.6, color='#3498db', label='Generated', density=True)
ax2.set_title('Real vs Generated distribution'); ax2.legend()

plt.tight_layout()
plt.savefig('ch326_gan.png', dpi=120)
plt.show()

2. Training instabilities

GANs are notoriously difficult to train:

  • Mode collapse: the generator learns to produce one mode regardless of $z$.

  • Oscillation: generator and discriminator cycle without converging.

  • Gradient vanishing: when $D$ is too good, $\log(1 - D(G(z))) \to 0$ and the generator receives almost no gradient.
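The vanishing-gradient point can be read off the derivatives. With $s$ the discriminator's logit on a fake sample, the original generator loss $\log(1 - \sigma(s))$ has gradient $-\sigma(s)$, which dies as $s \to -\infty$, while the non-saturating alternative $-\log \sigma(s)$ (the form used in the training code above) keeps gradient $\sigma(s) - 1 \to -1$. A quick numerical check:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# s: discriminator logits on fake samples; a confident discriminator
# drives s strongly negative.
s = np.array([0.0, -2.0, -6.0, -12.0])
grad_saturating = -sigmoid(s)    # d/ds log(1 - sigmoid(s)): -> 0 as s -> -inf
grad_nonsat = sigmoid(s) - 1.0   # d/ds (-log sigmoid(s)):   -> -1 as s -> -inf
print(grad_saturating)
print(grad_nonsat)
```

At $s = -12$ the saturating gradient is of order $10^{-5}$ while the non-saturating one is close to $-1$: the generator still gets a usable learning signal exactly when the discriminator is winning.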

WGAN (Arjovsky et al., 2017) replaces BCE with the Wasserstein distance, giving a smoother, more informative gradient. The discriminator becomes a “critic” constrained to be 1-Lipschitz (via gradient penalty or weight clipping).
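A minimal sketch of the WGAN critic side, assuming a critic network with an unbounded scalar output (no final sigmoid, unlike the discriminator above) and using the original paper's weight clipping with $c = 0.01$; `wgan_critic_loss` and `clip_weights` are illustrative names, not library functions:

```python
import numpy as np

def wgan_critic_loss(critic_real: np.ndarray, critic_fake: np.ndarray) -> float:
    """The critic maximises E[f(x_real)] - E[f(x_fake)], so as a loss
    it minimises the negative of that quantity."""
    return float(np.mean(critic_fake) - np.mean(critic_real))

def clip_weights(params: list, c: float = 0.01):
    """Crude 1-Lipschitz enforcement from the original WGAN paper:
    clamp every parameter to [-c, c] after each critic update
    (WGAN-GP's gradient penalty is the more common modern choice)."""
    for W, b in params:
        np.clip(W, -c, c, out=W)
        np.clip(b, -c, c, out=b)
```

Under this loss the generator simply minimises `-np.mean(critic_fake)`; the critic's scores are not probabilities, so there is no `log` and no `eps` clamp anywhere.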


3. Notable GAN variants

| Model | Innovation |
| --- | --- |
| DCGAN | Convolutional layers; stable image generation |
| WGAN | Wasserstein distance; stable training |
| StyleGAN | Style-based latent control; high-res face synthesis |
| CycleGAN | Unpaired image-to-image translation |
| Conditional GAN | Class-conditioned generation |

4. Summary

  • GANs: Generator and Discriminator play minimax game until Nash equilibrium.

  • Training is unstable; requires careful hyperparameter tuning and tricks.

  • WGAN provides a more reliable training signal via Wasserstein distance.

  • Modern image synthesis (StyleGAN3) achieves photorealistic quality.

  • Diffusion models (ch327) have largely superseded GANs for image generation.


5. Forward and backward references

Used here: BCE loss (ch305), expected value (ch249), minimax optimisation (ch213), backpropagation (ch306).

This will reappear in ch327 — Diffusion Models, which offer a more stable training objective while achieving superior sample quality.