Prerequisites: Gaussian elimination (ch161), matrix multiplication (ch154)
You will learn:
Why factorization is the key paradigm of numerical linear algebra
LU, QR, Cholesky as the three main factorizations
When to use each and what they reveal about the matrix
The unifying idea: split a matrix into structured, easier factors
Environment: Python 3.x, numpy, scipy
# --- Matrix Factorization: Overview of main factorizations ---
import numpy as np
np.random.seed(42)
A = np.array([[4.,3.,2.],[6.,3.,1.],[2.,5.,3.]])
print("=== Three main factorizations ===\n")
# 1. LU: A = PLU (general square matrices, for solving systems)
import scipy.linalg
P, L, U = scipy.linalg.lu(A)  # SciPy's convention: A = P @ L @ U (P is a permutation matrix)
print("LU Factorization: A = PLU")
print(f" L (lower triangular):\n{np.round(L,4)}")
print(f" U (upper triangular):\n{np.round(U,4)}")
print(f" A = PLU: {np.allclose(A, P @ L @ U)}\n")
# 2. QR: A = QR (orthogonal Q, upper triangular R; for least squares, eigenvalues)
Q, R = np.linalg.qr(A)
print("QR Factorization: A = QR")
print(f" Q (orthogonal, QᵀQ=I):\n{np.round(Q,4)}")
print(f" R (upper triangular):\n{np.round(R,4)}")
print(f" QᵀQ = I: {np.allclose(Q.T @ Q, np.eye(3))}")
print(f" QR = A: {np.allclose(Q @ R, A)}\n")
# 3. Cholesky: A = LLᵀ (symmetric positive definite only; fastest, most stable)
S = A.T @ A # guaranteed SPD
L_chol = np.linalg.cholesky(S)
print("Cholesky: S = LLᵀ (requires S symmetric positive definite)")
print(f" L (lower triangular):\n{np.round(L_chol,4)}")
print(f" LLᵀ = S: {np.allclose(L_chol @ L_chol.T, S)}\n")
# When to use which:
print("=== Usage guide ===")
print("LU: Solving Ax=b for general square A. O(n³) once, O(n²) per right-hand side.")
print("QR: Least squares (overdetermined). Eigenvalue algorithms. More stable than LU for rank-deficient.")
print("Cholesky: SPD matrices (covariance, Gram, Laplacian). 2x faster than LU.")
print("SVD: General rectangular. Rank, pseudoinverse, PCA. See ch173.")

=== Three main factorizations ===
LU Factorization: A = PLU
L (lower triangular):
[[1. 0. 0. ]
[0.3333 1. 0. ]
[0.6667 0.25 1. ]]
U (upper triangular):
[[6. 3. 1. ]
[0. 4. 2.6667]
[0. 0. 0.6667]]
A = PLU: True
QR Factorization: A = QR
Q (orthogonal, QᵀQ=I):
[[-0.5345 0.0376 -0.8443]
[-0.8018 -0.3385 0.4925]
[-0.2673 0.9402 0.2111]]
R (upper triangular):
[[-7.4833 -5.3452 -2.6726]
[ 0. 3.7985 2.5574]
[ 0. 0. -0.5629]]
QᵀQ = I: True
QR = A: True
Cholesky: S = LLᵀ (requires S symmetric positive definite)
L (lower triangular):
[[7.4833 0. 0. ]
[5.3452 3.7985 0. ]
[2.6726 2.5574 0.5629]]
LLᵀ = S: True
=== Usage guide ===
LU: Solving Ax=b for general square A. O(n³) once, O(n²) per right-hand side.
QR: Least squares (overdetermined). Eigenvalue algorithms. More stable than LU for rank-deficient.
Cholesky: SPD matrices (covariance, Gram, Laplacian). 2x faster than LU.
SVD: General rectangular. Rank, pseudoinverse, PCA. See ch173.
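As a quick sanity check on the usage guide, all three factorizations can solve the same square system Ax = b and must agree with `np.linalg.solve`. A minimal sketch, assuming scipy is installed (Cholesky is applied to the SPD normal-equations matrix AᵀA, since A itself is not symmetric):

```python
import numpy as np
import scipy.linalg

A = np.array([[4., 3., 2.], [6., 3., 1.], [2., 5., 3.]])
b = np.array([1., 2., 3.])

# LU: factor once, then two triangular solves
lu, piv = scipy.linalg.lu_factor(A)
x_lu = scipy.linalg.lu_solve((lu, piv), b)

# QR: Ax = b becomes Rx = Qᵀb, solved by back substitution
Q, R = np.linalg.qr(A)
x_qr = scipy.linalg.solve_triangular(R, Q.T @ b)

# Cholesky: solve the SPD system (AᵀA)x = Aᵀb
c, low = scipy.linalg.cho_factor(A.T @ A)
x_chol = scipy.linalg.cho_solve((c, low), A.T @ b)

x_ref = np.linalg.solve(A, b)
print(np.allclose(x_lu, x_ref), np.allclose(x_qr, x_ref), np.allclose(x_chol, x_ref))
```

Note that going through AᵀA squares the condition number, which is why Cholesky on the normal equations is only advisable when the matrix is already SPD or well conditioned.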
4. Mathematical Formulation
LU: PA = LU — P: permutation, L: unit lower triangular, U: upper triangular
Use: solve Ax=b as Ly=Pb (forward), Ux=y (back)
Cost: O(n³) to factor, O(n²) per solve
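The forward/back substitution pair above can be sketched explicitly with SciPy. One caveat: `scipy.linalg.lu` returns A = PLU rather than PA = LU, so its permutation enters transposed (this is a minimal sketch, assuming scipy is installed):

```python
import numpy as np
import scipy.linalg

A = np.array([[4., 3., 2.], [6., 3., 1.], [2., 5., 3.]])
b = np.array([1., 0., 2.])

P, L, U = scipy.linalg.lu(A)  # A = P @ L @ U, so Ax = b becomes LUx = Pᵀb
y = scipy.linalg.solve_triangular(L, P.T @ b, lower=True)  # forward: Ly = Pᵀb
x = scipy.linalg.solve_triangular(U, y)                    # back:    Ux = y
print(np.allclose(A @ x, b))
```

Each triangular solve is O(n²), which is why re-using the factors across many right-hand sides is so much cheaper than refactoring.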
QR: A = QR — Q: orthogonal (Qᵀ=Q⁻¹), R: upper triangular
Use: least squares min||Ax-b|| via Rx = Qᵀb
Cost: O(mn²) for m×n matrix
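The least-squares recipe Rx = Qᵀb can be sketched on an overdetermined system and checked against `np.linalg.lstsq` (the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))   # overdetermined: 10 equations, 3 unknowns
b = rng.normal(size=10)

Q, R = np.linalg.qr(A)           # reduced QR: Q is 10x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)  # Rx = Qᵀb, a small triangular system

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))
```

With the reduced QR, QᵀA = R, so Rx = Qᵀb is exactly the normal equations solved without ever forming the ill-conditioned product AᵀA.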
Cholesky: A = LLᵀ — L: lower triangular (A must be SPD)
Use: Gaussian covariance, PDEs, Kalman filter
Cost: O(n³/3) — 2x cheaper than LU, no pivoting needed
7. Exercises
Easy 1. Which factorization would you use to solve 500 systems Ax=bᵢ with the same A? Why?
Easy 2. Can you Cholesky-decompose [[1,2],[2,1]]? What about [[4,2],[2,3]]? Why the difference?
Medium 1. Use LU factorization to compute the inverse of a 5×5 matrix. Compare timing to np.linalg.inv.
Medium 2. Implement solve_with_lu(A, B) that factors A once and solves for multiple right-hand sides B (columns of B).
Hard. Implement QR factorization via Gram-Schmidt orthogonalization. Verify Q is orthogonal and QR=A. Then use it to solve a 10×5 overdetermined least-squares system.
9. Chapter Summary & Connections
Factorization splits A into structured pieces: triangular (L, U, R) or orthogonal (Q).
LU for general square systems, QR for least squares, Cholesky for symmetric positive definite.
All run in O(n³); the constant factor matters in practice.
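The constant factor is easy to observe directly. A minimal timing sketch, assuming scipy is installed (the size 2000 and the exact ratio are illustrative and machine-dependent):

```python
import time
import numpy as np
import scipy.linalg

n = 2000
rng = np.random.default_rng(1)
M = rng.normal(size=(n, n))
S = M @ M.T + n * np.eye(n)    # well-conditioned SPD matrix

t0 = time.perf_counter()
scipy.linalg.lu_factor(S)
t_lu = time.perf_counter() - t0

t0 = time.perf_counter()
scipy.linalg.cho_factor(S)
t_chol = time.perf_counter() - t0

print(f"LU: {t_lu:.3f}s  Cholesky: {t_chol:.3f}s")  # Cholesky is typically around 2x faster
```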
Forward connections:
In ch163 (LU Decomposition), we implement LU from scratch.
In ch173 (SVD), the fourth and most powerful factorization handles any matrix.