ch233 — Constrained Optimisation

Part VII — Calculus

Unconstrained optimisation finds the minimum of f(x) over all x. Most real problems are constrained: minimise cost subject to a budget, maximise accuracy subject to a memory limit. This chapter develops the mathematical machinery — Lagrange multipliers and KKT conditions — and connects it to ML regularisation and support vector machines.

(Builds on ch212 — Gradient Descent, ch214 — Saddle Points)

1. The Equality-Constrained Problem

Problem: minimise f(x,y) subject to g(x,y) = 0.

Key insight: at the constrained minimum, the gradient of f must be parallel to the gradient of g — otherwise you could move along the constraint surface and decrease f.

This gives the Lagrangian: L(x, y, λ) = f(x,y) − λ·g(x,y)

Setting ∇L = 0 yields: ∇f = λ∇g and g(x,y) = 0

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# Minimise f(x,y) = x^2 + y^2  subject to  x + y = 1
# Lagrangian: L = x^2 + y^2 - lam*(x + y - 1)
# dL/dx = 2x - lam = 0  =>  x = lam/2
# dL/dy = 2y - lam = 0  =>  y = lam/2
# constraint: x + y = 1  =>  lam = 1  =>  x = y = 0.5

x_opt, y_opt = 0.5, 0.5
f_opt = x_opt**2 + y_opt**2
print(f'Analytical solution: x={x_opt}, y={y_opt}, f={f_opt}')

# Visualise
x = np.linspace(-1, 2, 300)
y = np.linspace(-1, 2, 300)
X, Y = np.meshgrid(x, y)
F = X**2 + Y**2

fig, ax = plt.subplots(figsize=(7, 6))
levels = np.linspace(0, 5, 20)
cs = ax.contourf(X, Y, F, levels=levels, cmap='Blues_r', alpha=0.8)
ax.contour(X, Y, F, levels=levels, colors='steelblue', linewidths=0.5, alpha=0.5)
plt.colorbar(cs, ax=ax, label='f(x,y)')

# Constraint line x + y = 1
x_line = np.linspace(-0.5, 1.5, 200)
y_line = 1 - x_line
ax.plot(x_line, y_line, 'r-', lw=2.5, label='constraint: x+y=1')
ax.plot(x_opt, y_opt, 'ko', ms=10, zorder=5, label=f'minimum ({x_opt},{y_opt})')

ax.set_xlabel('x'); ax.set_ylabel('y')
ax.set_title('Equality Constraint: Lagrange Multipliers')
ax.legend(); ax.set_xlim(-0.5, 1.5); ax.set_ylim(-0.5, 1.5)
plt.tight_layout()
plt.savefig('ch233_equality.png', dpi=120)
plt.show()
print('The minimum on the constraint line is where the circular level curve is tangent to the line.')
Analytical solution: x=0.5, y=0.5, f=0.5
The minimum on the constraint line is where the circular level curve is tangent to the line.
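The tangency condition can also be checked numerically. A minimal sketch, writing out the two gradients by hand and confirming they are parallel at (0.5, 0.5):

```python
import numpy as np

def grad_f(x, y):
    return np.array([2*x, 2*y])   # gradient of f(x,y) = x^2 + y^2

def grad_g(x, y):
    return np.array([1.0, 1.0])   # gradient of g(x,y) = x + y - 1

gf = grad_f(0.5, 0.5)
gg = grad_g(0.5, 0.5)
lam = gf[0] / gg[0]               # candidate multiplier from the first component
print('grad f =', gf, ' grad g =', gg)
print('lambda =', lam, ' parallel:', np.allclose(gf, lam * gg))
# lambda = 1.0, matching the analytic derivation above
```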

2. KKT Conditions for Inequality Constraints

Problem: minimise f(x) subject to g(x) ≤ 0.

The Karush-Kuhn-Tucker (KKT) conditions extend Lagrange multipliers:

  1. Stationarity: ∇f + μ∇g = 0

  2. Primal feasibility: g(x) ≤ 0

  3. Dual feasibility: μ ≥ 0

  4. Complementary slackness: μ·g(x) = 0

Condition 4 means either the constraint is active (g=0) or the multiplier is zero. This is the mathematical foundation of support vectors in SVMs — most training points have μ=0; only the points on the margin boundary matter.

# KKT example: minimise (x-3)^2 + (y-3)^2  subject to  x^2 + y^2 <= 4
# Unconstrained minimum is (3,3) — outside the feasible region (circle r=2)
# So the constrained minimum is on the boundary: x^2 + y^2 = 4

# Lagrangian: L = (x-3)^2 + (y-3)^2 + mu*(x^2 + y^2 - 4)
# Stationarity:
#   2(x-3) + 2*mu*x = 0  =>  x(1+mu) = 3
#   2(y-3) + 2*mu*y = 0  =>  y(1+mu) = 3
# So x = y = 3/(1+mu)
# Constraint active: x^2 + y^2 = 4  =>  2*(3/(1+mu))^2 = 4
# => (1+mu)^2 = 9/2  =>  1+mu = 3/sqrt(2)  =>  mu = 3/sqrt(2) - 1

mu = 3/np.sqrt(2) - 1
x_opt = 3 / (1 + mu)
y_opt = 3 / (1 + mu)
print(f'mu = {mu:.4f} (>0: dual feasibility satisfied)')
print(f'x = y = {x_opt:.4f}')
print(f'Constraint: x^2+y^2 = {x_opt**2 + y_opt**2:.4f} (active at boundary)')

fig, ax = plt.subplots(figsize=(6, 6))
X2, Y2 = np.meshgrid(np.linspace(-3, 4, 300), np.linspace(-3, 4, 300))
F2 = (X2-3)**2 + (Y2-3)**2
ax.contourf(X2, Y2, F2, levels=20, cmap='YlOrRd_r', alpha=0.7)
ax.contour(X2, Y2, F2, levels=20, colors='orange', linewidths=0.5, alpha=0.6)

theta = np.linspace(0, 2*np.pi, 300)
ax.fill(2*np.cos(theta), 2*np.sin(theta), alpha=0.2, color='green', label='feasible region')
ax.plot(2*np.cos(theta), 2*np.sin(theta), 'g-', lw=2)
ax.plot(3, 3, 'bx', ms=12, mew=3, label='unconstrained min (3,3)')
ax.plot(x_opt, y_opt, 'ko', ms=10, zorder=5, label=f'constrained min ({x_opt:.2f},{y_opt:.2f})')
ax.plot([x_opt, 3], [y_opt, 3], 'k--', alpha=0.5)
ax.set_xlabel('x'); ax.set_ylabel('y')
ax.set_title('KKT: Inequality Constraint (active boundary)')
ax.legend(loc='upper left'); ax.set_aspect('equal')
plt.tight_layout()
plt.savefig('ch233_kkt.png', dpi=120)
plt.show()
mu = 1.1213 (>0: dual feasibility satisfied)
x = y = 1.4142
Constraint: x^2+y^2 = 4.0000 (active at boundary)
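The example above exercises the active branch of complementary slackness. The inactive branch deserves a quick sketch too (the objective here, (x−1)² + (y−1)², is an illustrative choice, not from the worked example): when the unconstrained minimum already lies inside the feasible region, the solver simply returns it, and μ = 0.

```python
import numpy as np
from scipy.optimize import minimize

# minimise (x-1)^2 + (y-1)^2  subject to  x^2 + y^2 <= 4
# Unconstrained minimum (1,1) satisfies 1+1 = 2 <= 4, so the constraint is inactive.
f = lambda v: (v[0] - 1)**2 + (v[1] - 1)**2
cons = [{'type': 'ineq', 'fun': lambda v: 4 - (v[0]**2 + v[1]**2)}]  # SLSQP: fun >= 0

res = minimize(f, np.zeros(2), method='SLSQP', constraints=cons)
print('constrained min:', res.x.round(4))   # ~ (1, 1): same as unconstrained
print('g(x) =', res.x @ res.x - 4)          # strictly negative => mu = 0
```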

3. L2 Regularisation as a Constrained Problem

Ridge regression is normally written as: minimise ||y - Xw||² + λ||w||²

This is equivalent to: minimise ||y - Xw||² subject to ||w||² ≤ t(λ)

The regularisation parameter λ plays the role of the Lagrange multiplier: larger λ corresponds to a smaller t, i.e. a tighter constraint and smaller weights. L1 regularisation (Lasso) instead uses ||w||₁ ≤ t, whose constraint boundary has corners — this is why Lasso produces sparse solutions.

# Visualise L1 vs L2 constraint regions and why L1 promotes sparsity
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, norm, title in zip(axes, [2, 1], ['L2 (Ridge): ||w||_2 <= t', 'L1 (Lasso): ||w||_1 <= t']):
    w1 = np.linspace(-1.5, 1.5, 400)
    w2 = np.linspace(-1.5, 1.5, 400)
    W1, W2 = np.meshgrid(w1, w2)

    # Loss landscape (shifted from origin to create interesting intersection)
    w1_star, w2_star = 1.2, 0.8
    Loss = (W1 - w1_star)**2 + 2*(W2 - w2_star)**2
    ax.contour(W1, W2, Loss, levels=15, cmap='Reds', alpha=0.7)

    # Constraint region
    t = 0.7
    if norm == 2:
        theta = np.linspace(0, 2*np.pi, 300)
        cx, cy = t*np.cos(theta), t*np.sin(theta)
        ax.fill(cx, cy, alpha=0.3, color='blue')
        ax.plot(cx, cy, 'b-', lw=2)
        # Minimum is on the circle, not at a corner
        # Approximate: project (w1_star, w2_star) onto circle
        angle = np.arctan2(w2_star, w1_star)
        w_constrained = np.array([t*np.cos(angle), t*np.sin(angle)])
    else:
        # L1 ball is a diamond
        corners = np.array([[t,0],[0,t],[-t,0],[0,-t],[t,0]])
        ax.fill(corners[:,0], corners[:,1], alpha=0.3, color='blue')
        ax.plot(corners[:,0], corners[:,1], 'b-', lw=2)
        # Minimum often hits a corner (sparse solution)
        w_constrained = np.array([t, 0.0])

    ax.plot(w1_star, w2_star, 'rx', ms=12, mew=3, label='unconstrained min')
    ax.plot(w_constrained[0], w_constrained[1], 'ko', ms=10, label='constrained min')
    ax.axhline(0, color='k', lw=0.5); ax.axvline(0, color='k', lw=0.5)
    ax.set_xlabel('w1'); ax.set_ylabel('w2')
    ax.set_title(title); ax.legend(); ax.set_aspect('equal')
    ax.set_xlim(-1.5, 1.5); ax.set_ylim(-1.5, 1.5)

axes[1].annotate('Corner!\n(w2=0, sparse)', xy=(t, 0), xytext=(0.2, 0.7),
    arrowprops=dict(arrowstyle='->', color='black'), fontsize=10)

plt.suptitle('L1 corners promote sparsity; L2 smooth boundary does not', fontsize=12)
plt.tight_layout()
plt.savefig('ch233_regularisation.png', dpi=120)
plt.show()
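The penalised/constrained equivalence can be checked numerically. A sketch on synthetic data (the design matrix and λ below are arbitrary choices): solve ridge in closed form, set t to the squared norm of that solution, and confirm the constrained problem recovers the same weights.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y_data = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
lam = 5.0

# Penalised form: closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y
w_pen = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y_data)
t = w_pen @ w_pen   # constraint level implied by this lambda

# Constrained form: minimise ||y - Xw||^2  subject to  ||w||^2 <= t
res = minimize(lambda w: np.sum((y_data - X @ w)**2), np.zeros(3), method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': lambda w: t - w @ w}])
print('penalised  :', w_pen.round(4))
print('constrained:', res.x.round(4))   # should agree to solver tolerance
```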

4. Numerical Constrained Optimisation with scipy

from scipy.optimize import minimize

# Portfolio optimisation: minimise variance, target return >= r_min
# weights w, returns mu, covariance Sigma
np.random.seed(42)
n_assets = 5
mu_assets = np.array([0.10, 0.12, 0.08, 0.15, 0.09])  # expected returns
# Random positive-definite covariance
A = np.random.randn(n_assets, n_assets)
Sigma = A.T @ A / n_assets + 0.05 * np.eye(n_assets)

def portfolio_variance(w):
    return w @ Sigma @ w

def portfolio_return(w):
    return mu_assets @ w

r_min = 0.11  # target return

constraints = [
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},      # weights sum to 1
    {'type': 'ineq', 'fun': lambda w: portfolio_return(w) - r_min}  # return >= r_min
]
bounds = [(0, 1)] * n_assets  # no short selling

w0 = np.ones(n_assets) / n_assets
result = minimize(portfolio_variance, w0, method='SLSQP',
                  bounds=bounds, constraints=constraints)

print('Optimisation result:', result.message)
print(f'Weights: {result.x.round(4)}')
print(f'Portfolio variance: {result.fun:.6f}')
print(f'Portfolio return:   {portfolio_return(result.x):.4f} (target >= {r_min})')
print(f'Weights sum:        {result.x.sum():.6f}')
Optimisation result: Optimization terminated successfully
Weights: [0.289  0.2533 0.2564 0.2013 0.    ]
Portfolio variance: 0.240347
Portfolio return:   0.1100 (target >= 0.11)
Weights sum:        1.000000
# Efficient frontier: trace the minimum variance for each target return
returns_range = np.linspace(0.085, 0.145, 40)
variances = []

for r_target in returns_range:
    cons = [
        {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},
        {'type': 'ineq', 'fun': lambda w, r=r_target: portfolio_return(w) - r}
    ]
    res = minimize(portfolio_variance, w0, method='SLSQP',
                   bounds=bounds, constraints=cons)
    variances.append(res.fun if res.success else np.nan)

fig, ax = plt.subplots(figsize=(7, 5))
ax.plot(np.sqrt(variances), returns_range, 'b-o', ms=4, lw=2, label='Efficient Frontier')
ax.axhline(r_min, color='red', linestyle='--', alpha=0.7, label=f'Target return={r_min}')
ax.scatter(np.sqrt(np.diag(Sigma)), mu_assets, c='orange', s=80, zorder=5, label='Individual assets')
ax.set_xlabel('Portfolio Std Dev (Risk)')
ax.set_ylabel('Expected Return')
ax.set_title('Markowitz Efficient Frontier (constrained optimisation)')
ax.legend()
plt.tight_layout()
plt.savefig('ch233_frontier.png', dpi=120)
plt.show()
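Since both portfolio constraints are linear, the same problem can also be posed with scipy's LinearConstraint objects and the trust-constr solver instead of SLSQP dictionaries. A sketch, with the setup repeated so the cell runs on its own:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

np.random.seed(42)
n_assets = 5
mu_assets = np.array([0.10, 0.12, 0.08, 0.15, 0.09])
A = np.random.randn(n_assets, n_assets)
Sigma = A.T @ A / n_assets + 0.05 * np.eye(n_assets)
r_min = 0.11

lin_cons = [
    LinearConstraint(np.ones(n_assets), lb=1.0, ub=1.0),   # weights sum to 1
    LinearConstraint(mu_assets, lb=r_min, ub=np.inf),      # return >= r_min
]
res = minimize(lambda w: w @ Sigma @ w, np.ones(n_assets) / n_assets,
               method='trust-constr', bounds=Bounds(0.0, 1.0), constraints=lin_cons)
print('weights :', res.x.round(4))
print('variance:', round(res.fun, 6))   # should match the SLSQP result above
```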

5. Summary

Concept                      Key idea
Lagrange multipliers         At the constrained minimum, ∇f ∥ ∇g
KKT conditions               Extend Lagrange multipliers to inequality constraints
Complementary slackness      μ·g(x) = 0: either the constraint is active or the multiplier is zero
L2 regularisation            Equivalent to an inequality constraint on the weight norm
L1 regularisation            Diamond-shaped constraint region; corners produce sparsity
scipy SLSQP                  Sequential Least SQuares Programming solver

6. Forward References

  • The KKT complementary slackness condition reappears in ch291 — Optimisation Methods, where we derive support vector machines formally.

  • The efficient frontier construction previews ch271 — Data and Measurement and ch293 — Clustering, where risk-return tradeoffs model real portfolio decisions.

  • L1 sparsity connects to ch298 — Information Theory: sparse representations minimise redundancy.