
Prerequisites: Matrix representation (ch152), vector addition (ch125)
You will learn:

  • Matrix addition as elementwise operation

  • Scalar multiplication of matrices

  • Broadcasting in NumPy

  • What matrix addition means geometrically (combining transformations)

Environment: Python 3.x, numpy, matplotlib


1. Concept

Matrix addition is the simplest matrix operation: add corresponding entries. Two matrices can be added only if they have the same shape.

This is structurally identical to vector addition (ch125) — an m×n matrix behaves exactly like a vector with mn entries, arranged in a 2D grid.

More importantly: adding two transformation matrices gives a transformation whose effect on any vector is the sum of the individual effects: (A+B)v = Av + Bv. This is a direct consequence of linearity.

Common misconceptions:

  • “You can add any two matrices.” — Only if they have identical shape.

  • “Matrix addition and matrix multiplication are similar operations.” — They are not. Addition is elementwise and commutative; multiplication is not.
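Both misconceptions can be checked in a few lines (the matrices here are arbitrary illustrations):

```python
# --- Check: addition commutes, multiplication does not, shapes must match ---
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Addition is commutative...
print(np.array_equal(A + B, B + A))   # True

# ...matrix multiplication, in general, is not.
print(np.array_equal(A @ B, B @ A))   # False

# Adding mismatched shapes raises immediately.
try:
    A + np.ones((3, 2))
except ValueError as e:
    print("ValueError:", e)
```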


2. Intuition & Mental Models

Geometric: If A stretches the x-axis and B rotates, then A+B superimposes the two effects: the output (A+B)v equals Av + Bv — each transformation is applied to v separately and the two results are summed. Note this is not composition ("A then B"); composition is matrix multiplication.

Computational: Think of matrix addition as a vectorized for-loop over all entries. C[i,j] = A[i,j] + B[i,j] for every (i,j).

Recall from ch125 (Vector Addition) that adding vectors is tip-to-tail. Matrix addition is the same thing, carried out simultaneously for every row (or every column, or the whole matrix flattened) treated as a vector.
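The row-by-row view is easy to verify directly: each row of A+B is the tip-to-tail sum of the corresponding rows. A quick check with two arbitrary matrices:

```python
# --- Check: rows of A+B are vector sums of rows ---
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, -1.0], [0.0, 2.0]])
C = A + B

# Row i of C is exactly the vector sum of row i of A and row i of B.
for i in range(A.shape[0]):
    assert np.array_equal(C[i], A[i] + B[i])
print("every row of A+B is the vector sum of the corresponding rows")
```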


3. Visualization

# --- Visualization: (A+B)v = Av + Bv ---
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

A = np.array([[1.0, 0.5], [0.0, 1.0]])   # shear
B = np.array([[0.5, 0.0], [0.0, -0.5]])  # scale y by -0.5
C = A + B

# Test vectors
vectors = np.array([[1,0],[0,1],[1,1],[2,1]], dtype=float).T  # 2x4

Av = A @ vectors
Bv = B @ vectors
Cv = C @ vectors
AvpBv = Av + Bv   # should equal Cv

print("Verifying (A+B)v = Av + Bv for all test vectors:")
print(f"Max difference: {np.max(np.abs(Cv - AvpBv)):.2e}")

fig, axes = plt.subplots(1, 3, figsize=(14, 5))
colors = ['crimson','steelblue','darkgreen','orange']
labels = ['$e_1$', '$e_2$', '$e_1+e_2$', '$2e_1+e_2$']  # mathtext avoids missing-glyph warnings

for ax, mat, title in zip(axes, [A, B, C], ['A (shear)', 'B (y-scale)', 'A+B (combined)']):
    result = mat @ vectors
    for k in range(vectors.shape[1]):
        ax.annotate('', xy=result[:,k], xytext=(0,0),
                    arrowprops=dict(arrowstyle='->', color=colors[k], lw=2))
        ax.text(result[0,k]+0.05, result[1,k]+0.05, labels[k], fontsize=9, color=colors[k])
    ax.set_xlim(-1, 4); ax.set_ylim(-2, 3); ax.set_aspect('equal')
    ax.axhline(0, color='k', lw=0.5); ax.axvline(0, color='k', lw=0.5)
    ax.set_title(title)

plt.suptitle('Matrix Addition: (A+B)v = Av + Bv', fontsize=12)
plt.tight_layout()
plt.show()
Verifying (A+B)v = Av + Bv for all test vectors:
Max difference: 0.00e+00
[Figure: three panels showing the test vectors under A (shear), B (y-scale), and A+B (combined)]

4. Mathematical Formulation

Matrix addition (A, B same shape m×n):
  (A + B)[i,j] = A[i,j] + B[i,j]   for all i, j

Scalar multiplication:
  (αA)[i,j] = α * A[i,j]            for all i, j

Key properties:
  Commutativity:    A + B = B + A
  Associativity:    (A+B)+C = A+(B+C)
  Zero matrix:      A + 0 = A  (0 is the m×n zero matrix)
  Distributivity:   α(A+B) = αA + αB
  Action on vectors: (A+B)v = Av + Bv  (follows from distributivity of the dot product over addition)
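Each property above can be spot-checked numerically. This is a sanity check with random matrices, not a proof:

```python
# --- Check: algebraic properties of matrix addition ---
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 2, 2))   # three random 2x2 matrices
v = rng.standard_normal(2)
alpha = 2.5

assert np.allclose(A + B, B + A)                          # commutativity
assert np.allclose((A + B) + C, A + (B + C))              # associativity
assert np.allclose(A + np.zeros((2, 2)), A)               # zero matrix
assert np.allclose(alpha * (A + B), alpha*A + alpha*B)    # distributivity
assert np.allclose((A + B) @ v, A @ v + B @ v)            # action on vectors
print("all properties hold (up to floating-point tolerance)")
```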
# --- Implementation: Matrix arithmetic from scratch and with NumPy ---
import numpy as np

def matrix_add(A, B):
    """
    Add two matrices elementwise.

    Args:
        A, B: 2D numpy arrays of the same shape

    Returns:
        C: 2D numpy array, A + B
    """
    if A.shape != B.shape:
        raise ValueError(f"Shape mismatch: {A.shape} vs {B.shape}")
    m, n = A.shape
    C = np.empty((m, n), dtype=float)
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, j] + B[i, j]
    return C

def scalar_mult(alpha, A):
    """
    Multiply matrix A by scalar alpha.

    Args:
        alpha: scalar
        A: 2D numpy array

    Returns:
        2D numpy array, alpha * A
    """
    return alpha * A   # NumPy broadcasts the scalar across every entry

# Test
A = np.array([[1, 2], [3, 4]], dtype=float)
B = np.array([[5, -1], [0, 2]], dtype=float)

C_scratch = matrix_add(A, B)
C_numpy   = A + B
print(f"From scratch: {C_scratch}")
print(f"NumPy A+B:    {C_numpy}")
print(f"Match: {np.allclose(C_scratch, C_numpy)}")
print()

# Broadcasting: add a vector to every row of a matrix
A = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype=float)
bias = np.array([10.0, 0.0, -10.0])  # shape (3,) — broadcasts over rows
print(f"A + bias (broadcast over rows):\n{A + bias}")

# Add a column vector to every column
col_bias = np.array([[100.0], [0.0], [-100.0]])  # shape (3,1)
print(f"\nA + col_bias (broadcast over columns):\n{A + col_bias}")
From scratch: [[6. 1.]
 [3. 6.]]
NumPy A+B:    [[6. 1.]
 [3. 6.]]
Match: True

A + bias (broadcast over rows):
[[11.  2. -7.]
 [14.  5. -4.]
 [17.  8. -1.]]

A + col_bias (broadcast over columns):
[[101. 102. 103.]
 [  4.   5.   6.]
 [-93. -92. -91.]]

5. Python Implementation — Broadcasting Rules

# --- Broadcasting: the general rules ---
import numpy as np

# Rule: shapes are compared right-to-left.
# Dimensions are compatible if: equal, or one of them is 1.
# If compatible, the smaller dimension is 'stretched' to match.

examples = [
    (np.ones((3,4)),   np.ones((4,)),    "(3,4) + (4,)   → (3,4)"),
    (np.ones((3,1)),   np.ones((1,4)),   "(3,1) + (1,4)  → (3,4)"),
    (np.ones((5,3,4)), np.ones((3,4)),   "(5,3,4)+(3,4)  → (5,3,4)"),
]

for A, B, desc in examples:
    C = A + B
    print(f"{desc}  result shape: {C.shape}")

# Failure case
try:
    _ = np.ones((3,4)) + np.ones((3,))
except ValueError as e:
    print(f"\n(3,4) + (3,) → Error: {e}")

print("\nML context: adding a bias vector (shape [output_dim]) to a batch")
print("of activations (shape [batch, output_dim]) uses broadcasting.")
batch_size, output_dim = 32, 10
activations = np.random.randn(batch_size, output_dim)
bias = np.random.randn(output_dim)                    # shape (10,)
result = activations + bias                            # broadcasts over batch
print(f"activations {activations.shape} + bias {bias.shape} → {result.shape}")
(3,4) + (4,)   → (3,4)  result shape: (3, 4)
(3,1) + (1,4)  → (3,4)  result shape: (3, 4)
(5,3,4)+(3,4)  → (5,3,4)  result shape: (5, 3, 4)

(3,4) + (3,) → Error: operands could not be broadcast together with shapes (3,4) (3,) 

ML context: adding a bias vector (shape [output_dim]) to a batch
of activations (shape [batch, output_dim]) uses broadcasting.
activations (32, 10) + bias (10,) → (32, 10)
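The right-to-left rule can itself be written as a small function. A sketch — the helper name broadcast_shape is our own, not a NumPy API, though NumPy 1.20+ ships an equivalent np.broadcast_shapes:

```python
# --- Sketch: the broadcasting rule as a function ---
from itertools import zip_longest
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Right-to-left rule: dims are compatible if equal, or one of them is 1."""
    result = []
    # Pad the shorter shape with 1s on the left by walking both right-to-left.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or a == 1 or b == 1:
            result.append(max(a, b))
        else:
            raise ValueError(f"incompatible dimensions: {a} vs {b}")
    return tuple(reversed(result))

print(broadcast_shape((3, 4), (4,)))       # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))     # (3, 4)
print(broadcast_shape((5, 3, 4), (3, 4)))  # (5, 3, 4)

# NumPy's built-in agrees:
print(np.broadcast_shapes((3, 4), (4,)))   # (3, 4)
```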

6. Experiments

# --- Experiment: Matrix addition as superposition of effects ---
# Hypothesis: Applying A+B to a set of points produces the vector sum of A and B effects.
# Try changing: ALPHA, BETA — the mixing coefficients.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

ALPHA = 0.7   # <-- modify: weight on A (rotation)
BETA  = 0.3   # <-- modify: weight on B (scaling)

# A: 30-degree rotation
theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# B: non-uniform scaling
B = np.array([[1.5, 0.0], [0.0, 0.5]])

# Mix
M = ALPHA * A + BETA * B

# Apply to unit circle
t = np.linspace(0, 2*np.pi, 200)
circle = np.vstack([np.cos(t), np.sin(t)])  # 2x200 (row_stack is deprecated)

fig, ax = plt.subplots(figsize=(7, 7))
for mat, color, label in [(A, 'crimson', f'A (rotation, α={ALPHA})'),
                            (B, 'steelblue', f'B (scaling, β={BETA})'),
                            (M, 'darkgreen', f'αA + βB (mixed)'),
                            (np.eye(2), 'gray', 'unit circle')]:
    result = mat @ circle
    ax.plot(result[0], result[1], color=color, label=label, linewidth=2)

ax.set_aspect('equal')
ax.axhline(0, color='k', lw=0.5); ax.axvline(0, color='k', lw=0.5)
ax.legend(); ax.set_title('Unit Circle Under Different Matrix Mixtures')
plt.tight_layout(); plt.show()
[Figure: unit circle under A (rotation), B (scaling), and αA + βB (mixed)]

7. Exercises

Easy 1. Compute A + B by hand: A = [[1,2],[3,4]], B = [[-1,0],[2,-3]]. What is 2A - B?

Easy 2. Why can you not add a (3×4) matrix to a (4×3) matrix? What would you need to do first?

Medium 1. Write a function linear_combo_matrices(matrices, coefficients) that computes a weighted sum of a list of matrices. Use it to interpolate between a rotation matrix and a scaling matrix with α from 0 to 1 in 5 steps. Plot the unit circle under each interpolated matrix.

Medium 2. In a neural network, the forward pass is Z = W @ X + b where W is (m,n), X is (n, batch), and b is (m,). What is the shape of Z? Why does b broadcast correctly here? Write it out.

Hard. Prove that the set of all m×n matrices forms a vector space under matrix addition and scalar multiplication. Identify the zero vector, additive inverse, and verify all 8 axioms with short NumPy demonstrations.


8. Mini Project

# --- Mini Project: Image blending via matrix addition ---
# Problem: Two 'images' (grayscale matrices) should be blended with
# a varying alpha parameter. This is the standard alpha-compositing operation.
# Task: implement blend(A, B, alpha) and show the result for alpha in [0, 0.25, 0.5, 0.75, 1.0]

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

SIZE = 64
x = np.linspace(0, 4*np.pi, SIZE)
X, Y = np.meshgrid(x, x)

# Image A: horizontal stripes
img_A = (np.sin(Y * 2) + 1) / 2
# Image B: concentric circles
cx, cy = SIZE//2, SIZE//2
r = np.sqrt((np.arange(SIZE)[:, None] - cy)**2 + (np.arange(SIZE)[None, :] - cx)**2)
img_B = (np.sin(r * 0.5) + 1) / 2

def blend(A, B, alpha):
    """
    Alpha-blend two same-shape matrices.
    alpha=0 → pure A, alpha=1 → pure B.

    Args:
        A, B: 2D numpy arrays (same shape)
        alpha: float in [0, 1]

    Returns:
        2D numpy array
    """
    return (1 - alpha) * A + alpha * B   # pure matrix arithmetic

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, alpha in zip(axes, alphas):
    blended = blend(img_A, img_B, alpha)
    ax.imshow(blended, cmap='gray', vmin=0, vmax=1)
    ax.set_title(f'α = {alpha}')
    ax.axis('off')

plt.suptitle('Image Alpha Blending: (1-α)A + αB', fontsize=12)
plt.tight_layout()
plt.show()
[Figure: five panels showing the blend (1-α)A + αB for α = 0, 0.25, 0.5, 0.75, 1]

9. Chapter Summary & Connections

  • Matrix addition is elementwise: (A+B)[i,j] = A[i,j] + B[i,j]. Shapes must match.

  • (A+B)v = Av + Bv — adding transformation matrices adds their effects on vectors.

  • NumPy broadcasting allows adding matrices with compatible (not necessarily identical) shapes.

  • The bias term b in every neural network layer is added via broadcasting (reappears in ch178 — Linear Layers in Deep Learning).

Backward connection: Matrix addition inherits all properties from vector addition (ch125) — matrices are just vectors stored in a 2D grid.
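The "matrices are just vectors in a 2D grid" claim can be made literal by flattening: matrix addition commutes with ravel. A quick illustration with arbitrary matrices:

```python
# --- Check: matrix addition == vector addition of flattened entries ---
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, -1.0], [0.0, 2.0]])

# Flatten to vectors, add, reshape back: identical to matrix addition.
flat_sum = (A.ravel() + B.ravel()).reshape(A.shape)
print(np.array_equal(flat_sum, A + B))   # True
```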

Forward connections:

  • In ch154 (Matrix Multiplication), we encounter the non-trivial operation — composition, not just elementwise combination.

  • In ch176 (Matrix Calculus Introduction), the gradient of a scalar-valued matrix function f(A) is itself a matrix of the same shape — added back to A in gradient descent updates.