
Chapter 54 — Function Composition

Prerequisites: ch051 (What is a Function?), ch052 (Functions as Programs), ch053 (Domain and Range)

You will learn:

  • Definition of function composition f ∘ g

  • Domain of the composition and when it is valid

  • Composition as the basis for building complex transformations from simple parts

  • Non-commutativity: f ∘ g ≠ g ∘ f in general

Environment: Python 3.x, numpy, matplotlib


1. Concept

Function composition is the operation of applying one function to the result of another.

Given f : B → C and g : A → B, their composition is:

(f ∘ g)(x) = f(g(x))

Read: “f after g” or “f composed with g”.

The composition takes an input from A, applies g to get something in B, then applies f to get something in C. The intermediate set B must be compatible: the range of g must be a subset of the domain of f.
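A quick numerical sketch of this compatibility requirement, using f(x) = √x (domain x ≥ 0) and g(x) = x − 5 (both chosen here for illustration): the composition f(g(x)) is only defined where g(x) lands inside the domain of f, i.e. x ≥ 5.

```python
import numpy as np

f = np.sqrt          # domain: x >= 0
g = lambda x: x - 5  # output can be negative

x = np.array([2.0, 5.0, 9.0])
with np.errstate(invalid='ignore'):
    y = f(g(x))      # valid only where g(x) >= 0, i.e. x >= 5
print(y)             # nan where x < 5
```

Outside the valid region numpy returns `nan` rather than raising, which is worth keeping in mind when composing array functions.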

Composition is not commutative. In general, f ∘ g ≠ g ∘ f. Example: f(x) = x + 1, g(x) = x². Then:

  • (f ∘ g)(x) = f(g(x)) = x² + 1

  • (g ∘ f)(x) = g(f(x)) = (x+1)²

These are different functions.

Composition is associative. f ∘ (g ∘ h) = (f ∘ g) ∘ h. Order of evaluation is fixed by nesting, not by grouping.

Why it matters: Every deep learning model is a composition of functions. A neural network with 100 layers is a composition of 100 simpler functions. Understanding composition is understanding deep learning’s structure.


2. Intuition & Mental Models

Physical analogy: Assembly line. Each station is a function. The output of one station is the input of the next. The final product is the composition of all stations. The order of stations matters — welding before painting is different from painting before welding.

Computational analogy: UNIX pipes: cat file | grep pattern | sort | uniq. Each command is a function; | is composition. Data flows left to right. The output of grep must be valid input for sort. This is exactly (uniq ∘ sort ∘ grep ∘ cat)(file).
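The pipe analogy can be mimicked directly in Python (a minimal sketch; `grep`, `sort_`, and `uniq` here are toy stand-ins operating on lists of strings, not the real UNIX tools):

```python
lines = ["banana", "apple", "banana", "cherry", "apple"]

grep = lambda pat: (lambda ls: [l for l in ls if pat in l])  # filter by pattern
sort_ = sorted                                               # sort lines
uniq = lambda ls: [l for i, l in enumerate(ls)               # drop adjacent dups
                   if i == 0 or l != ls[i - 1]]

# (uniq ∘ sort_ ∘ grep("an"))(lines) — innermost function runs first
result = uniq(sort_(grep("an")(lines)))
print(result)  # ['banana']
```

Each toy command takes a list and returns a list, so the output type of one stage matches the input type of the next — the compatibility condition from Section 1.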

Recall from ch052 (Functions as Programs): the compose(*fns) pipeline we built was exactly function composition. Now we give it formal mathematical treatment.


3. Visualization

# --- Visualization: f∘g vs g∘f — order matters ---
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

x = np.linspace(-3, 3, 400)

f = lambda x: x + 1       # shift up by 1
g = lambda x: x**2        # square
fog = lambda x: f(g(x))   # f after g: x² + 1
gof = lambda x: g(f(x))   # g after f: (x+1)²

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].plot(x, f(x), color='steelblue', linewidth=2, label='f(x) = x+1')
axes[0].plot(x, g(x), color='crimson', linewidth=2, label='g(x) = x²')
axes[0].set_title('Component Functions')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')
axes[0].legend()

axes[1].plot(x, fog(x), color='darkorange', linewidth=2, label='(f∘g)(x) = x²+1')
axes[1].plot(x, gof(x), color='darkgreen', linewidth=2, label='(g∘f)(x) = (x+1)²')
axes[1].set_title('Compositions: f∘g vs g∘f')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')
axes[1].legend()

# Difference plot
axes[2].plot(x, fog(x) - gof(x), color='purple', linewidth=2)
axes[2].axhline(0, color='black', linewidth=0.5)
axes[2].set_title('Difference: (f∘g) - (g∘f)\n(Non-zero → composition not commutative)')
axes[2].set_xlabel('x')
axes[2].set_ylabel('Difference')

plt.tight_layout()
plt.show()

# Algebraic check:
# fog(x) = x² + 1
# gof(x) = (x+1)² = x² + 2x + 1
# difference = fog - gof = -2x  (linear, as shown)

4. Mathematical Formulation

Definition: Given f : B → C and g : A → B:

(f ∘ g) : A → C
(f ∘ g)(x) = f(g(x))

Domain of composition: dom(f ∘ g) = {x ∈ dom(g) : g(x) ∈ dom(f)}

Associativity: f ∘ (g ∘ h) = (f ∘ g) ∘ h

Identity element: The identity function id(x) = x satisfies f ∘ id = id ∘ f = f
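A quick numerical check of the identity property, using plain lambdas (a minimal sketch with an arbitrary test function):

```python
import numpy as np

f = lambda x: x**2 + 1
identity = lambda x: x

x = np.linspace(-2, 2, 5)
f_after_id = lambda x: f(identity(x))  # f ∘ id
id_after_f = lambda x: identity(f(x))  # id ∘ f

print(np.allclose(f_after_id(x), f(x)))  # True
print(np.allclose(id_after_f(x), f(x)))  # True
```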

# --- Mathematical Formulation: Composition operator ---
import numpy as np

def compose(*fns):
    """
    Create the composition of functions, applied right-to-left.
    compose(f, g, h)(x) = f(g(h(x)))
    
    Args:
        *fns: callables, functions to compose (applied right-to-left)
    Returns:
        callable: composed function
    """
    def composed(x):
        result = x
        for f in reversed(fns):  # right-to-left: innermost first
            result = f(result)
        return result
    return composed

# Example: compose(f, g)(x) = f(g(x))
f = lambda x: x + 1
g = lambda x: x**2
h = lambda x: np.abs(x)

fog = compose(f, g)          # f(g(x)) = x² + 1
gof = compose(g, f)          # g(f(x)) = (x+1)²
fogoh = compose(f, g, h)     # f(g(h(x))) = |x|² + 1 = x² + 1 for real x

test_x = np.array([-2, -1, 0, 1, 2])
print("x:        ", test_x)
print("(f∘g)(x): ", fog(test_x))    # x² + 1
print("(g∘f)(x): ", gof(test_x))    # (x+1)²
print("(f∘g∘h)(x):", fogoh(test_x)) # |x|² + 1

# Verify associativity: f∘(g∘h) == (f∘g)∘h
fog_h   = compose(compose(f, g), h)
f_goh   = compose(f, compose(g, h))
print("\nAssociativity check (should be 0):",
      np.max(np.abs(fog_h(test_x.astype(float)) - f_goh(test_x.astype(float)))))
x:         [-2 -1  0  1  2]
(f∘g)(x):  [5 2 1 2 5]
(g∘f)(x):  [1 0 1 4 9]
(f∘g∘h)(x): [5 2 1 2 5]

Associativity check (should be 0): 0.0

5. Python Implementation

# --- Implementation: Neural network as composed functions ---
# A neural network layer is: output = activation(weights @ input + bias)
# This is a composition of 3 operations: linear transform, add bias, activate.

import numpy as np

def linear(W, b):
    """
    Return a linear function: x → W @ x + b.
    Args:
        W: np.ndarray, weight matrix (output_dim, input_dim)
        b: np.ndarray, bias vector (output_dim,)
    Returns:
        callable: x → W @ x + b
    """
    return lambda x: W @ x + b

def relu(x):
    """ReLU activation: max(0, x) element-wise."""
    return np.maximum(0, x)

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def make_layer(W, b, activation):
    """Compose linear transform and activation into one function."""
    lin = linear(W, b)
    return compose(activation, lin)  # activation(lin(x))

# Build a 3-layer network as composition
np.random.seed(0)
layer1 = make_layer(np.random.randn(4, 2) * 0.3, np.zeros(4), relu)
layer2 = make_layer(np.random.randn(4, 4) * 0.3, np.zeros(4), relu)
layer3 = make_layer(np.random.randn(1, 4) * 0.3, np.zeros(1), sigmoid)

network = compose(layer3, layer2, layer1)  # layer3(layer2(layer1(x)))

# Test
x_input = np.array([1.5, -0.5])
output = network(x_input)
print(f"Input: {x_input}")
print(f"Network output (sigmoid → [0,1]): {output}")
print(f"This is f3(f2(f1(x))) — pure function composition.")

# The 'compose' function defined in Section 4 is reused here.
Input: [ 1.5 -0.5]
Network output (sigmoid → [0,1]): [0.50210612]
This is f3(f2(f1(x))) — pure function composition.
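The comment at the top of the cell above claims each layer is itself a composition of three operations. A minimal sketch making that explicit (the shapes and seed are illustrative, matching layer 1 of the network above):

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(4, 2) * 0.3
b = np.ones(4) * 0.1

matmul   = lambda x: W @ x             # 1. linear transform
add_bias = lambda x: x + b             # 2. bias shift
relu     = lambda x: np.maximum(0, x)  # 3. activation

# layer = relu ∘ add_bias ∘ matmul
layer = lambda x: relu(add_bias(matmul(x)))

x = np.array([1.5, -0.5])
direct = np.maximum(0, W @ x + b)      # the usual one-line form
print(np.allclose(layer(x), direct))   # True
```

Splitting the layer this way is more than pedantry: the chain rule (Section 7, Hard exercise) differentiates exactly through this three-step composition.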

6. Experiments

# --- Experiment 1: How deep composition changes a function's behavior ---
# Hypothesis: Composing a function with itself repeatedly creates increasingly complex shapes.
# Try: change the base function to x*(1-x) (logistic map — reappears in ch077)

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

PARAM = 3.5  # <-- try changing this between 0 and 4

def logistic(x):
    return PARAM * x * (1 - x)

def iterate(f, n_times):
    """Return f composed with itself n_times."""
    def iterated(x):
        result = x
        for _ in range(n_times):
            result = f(result)
        return result
    return iterated

x = np.linspace(0, 1, 500)
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

for ax, n in zip(axes, [1, 2, 4, 8]):
    fn = iterate(logistic, n)
    with np.errstate(invalid='ignore'):
        y = fn(x)
    ax.plot(x, np.clip(y, 0, 1), color='steelblue', linewidth=1)
    ax.set_title(f'f applied {n} time(s)')
    ax.set_xlabel('x')
    ax.set_ylabel('f^n(x)')

plt.suptitle(f'Logistic Map Self-Composition (PARAM = {PARAM})', fontweight='bold')
plt.tight_layout()
plt.show()
# With PARAM=3.5, you see chaos emerging — this is ch077's preview.

7. Exercises

Easy 1. If f(x) = 2x and g(x) = x + 3, compute (f∘g)(5) and (g∘f)(5) by hand. Verify in Python. (Expected: f∘g=16, g∘f=13)

Easy 2. Find two functions f and g such that f∘g = g∘f. (Hint: try both as linear functions ax+b with specific a and b values.) (Expected: e.g. both f(x)=2x, g(x)=3x)

Medium 1. Extend the compose function from Section 4 to support composition of two functions symbolically: given string representations 'x+1' and 'x**2', produce the string '(x**2)+1' for (f∘g). (Hint: string substitution)

Medium 2. A data preprocessing pipeline has these steps: (1) subtract mean, (2) divide by std, (3) clip to [-3, 3], (4) apply tanh. Build this as a composition and apply it to an array of 1000 random values. Show before/after histograms. (Hint: use compose with lambdas that close over computed mean and std)

Hard. Show that for differentiable functions the chain rule gives the derivative of a composition: (f∘g)′(x) = f′(g(x)) · g′(x). Verify this numerically using finite differences for f(x)=sin(x), g(x)=x², at x=1.0. Compare your numerical (f∘g)′ against f′(g(x))·g′(x) computed analytically. (Challenge: this is the basis of backpropagation in neural networks — reappears in ch216)


8. Mini Project

# --- Mini Project: Image Processing Pipeline as Function Composition ---
# Problem: Build an image preprocessing pipeline using pure function composition.
# Dataset: Simulated grayscale image (2D array).
# Task: Build composable image transforms and visualize the pipeline stages.

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

# Generate synthetic grayscale image
np.random.seed(42)
SIZE = 64
x_grid, y_grid = np.meshgrid(np.linspace(0, 1, SIZE), np.linspace(0, 1, SIZE))
image = (np.sin(8 * np.pi * x_grid) * np.cos(6 * np.pi * y_grid)
         + 0.3 * np.random.randn(SIZE, SIZE))
image = (image - image.min()) / (image.max() - image.min())  # normalize to [0,1]

# Pure image transform functions (all take and return 2D arrays)
def add_noise(sigma=0.05):
    def _fn(img):
        np.random.seed(99)
        return np.clip(img + np.random.normal(0, sigma, img.shape), 0, 1)
    return _fn

def threshold(cutoff=0.5):
    """Binarize: pixels > cutoff → 1, else → 0."""
    return lambda img: (img > cutoff).astype(float)

def invert(img):
    return 1 - img

def blur(kernel_size=3):
    """Simple box blur via FFT (circular convolution — fine for a demo)."""
    def _fn(img):
        from numpy.fft import fft2, ifft2
        kernel = np.ones((kernel_size, kernel_size)) / kernel_size**2
        # Pad kernel to image size (uncentered, so the output is slightly shifted)
        k_pad = np.zeros_like(img)
        k_pad[:kernel_size, :kernel_size] = kernel
        blurred = np.real(ifft2(fft2(img) * fft2(k_pad)))
        return np.clip(blurred, 0, 1)
    return _fn

# Build pipeline
pipeline_stages = [
    ('Original',         lambda img: img),
    ('Blurred',          blur(5)),
    ('Noisy',            add_noise(0.1)),
    ('Thresholded',      threshold(0.5)),
    ('Inverted',         invert),
]

fig, axes = plt.subplots(1, 5, figsize=(18, 4))
current = image.copy()

for ax, (label, fn) in zip(axes, pipeline_stages):
    current = fn(current)
    ax.imshow(current, cmap='gray', vmin=0, vmax=1)
    ax.set_title(label)
    ax.axis('off')

plt.suptitle('Image Processing as Function Composition', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

9. Chapter Summary & Connections

What we covered:

  • Composition (f∘g)(x) = f(g(x)) chains functions; g is applied first

  • Composition is not commutative: f∘g ≠ g∘f in general

  • Composition is associative: grouping does not change result

  • Complex transformations (neural networks, pipelines) are compositions of simpler functions

Backward connection: Extends ch052’s pipeline pattern with formal mathematical structure.

Forward connections:

  • In ch055 (Inverse Functions), we ask: when does the composition f∘g = identity? That’s when g is the inverse of f.

  • The chain rule (ch215) is the derivative rule for composed functions — this is the mathematical foundation of backpropagation

  • In ch168 (Linear Transformations), matrix multiplication is precisely composition of linear functions
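A tiny numerical preview of the ch168 connection (a sketch with two arbitrary 2×2 matrices): composing the linear maps x ↦ Ax and x ↦ Bx yields the single linear map x ↦ (AB)x, so matrix multiplication is function composition.

```python
import numpy as np

A = np.array([[1., 2.], [0., 1.]])   # shear
B = np.array([[0., -1.], [1., 0.]])  # 90° rotation
x = np.array([3., 4.])

f = lambda v: A @ v  # linear map f
g = lambda v: B @ v  # linear map g

# (f ∘ g)(x) equals (A @ B) @ x
print(np.allclose(f(g(x)), (A @ B) @ x))  # True
```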