Chapter 123 — Vectors in Programming

Prerequisites: What is a Vector? (ch121), Geometric Interpretation (ch122), lists and arrays in Python

You will learn:

  • How vectors map to Python data structures

  • Why NumPy arrays are the correct representation

  • The difference between Python lists and NumPy vectors in terms of behavior

  • Memory layout and why it matters for performance

Environment: Python 3.x, numpy, matplotlib


1. Concept

A vector in mathematics is an abstract object. A vector in a program is a concrete data structure stored in memory.

In Python, three representations are commonly used:

  1. Python list: [1.0, 2.0, 3.0] — general purpose, flexible, slow for math

  2. NumPy array: np.array([1.0, 2.0, 3.0]) — fixed dtype, contiguous memory, fast

  3. Custom class: builds on one of the above — useful for learning, not for production

The decision is not aesthetic. NumPy arrays are designed around the mathematical definition of a vector: fixed size, homogeneous type, and a defined set of operations (+, *, dot). Python lists are not. Using a list as a vector is like using a hammer as a screwdriver — it might work, but you will regret it.

Common misconception: that [1, 2, 3] + [4, 5, 6] yields [5, 7, 9]. In Python it produces [1, 2, 3, 4, 5, 6], because Python lists concatenate. NumPy arrays add element-wise.


2. Intuition & Mental Models

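Memory model: A NumPy array stores its numbers in a contiguous block of memory, all of the same type. Accessing element i is a single offset from the start of the buffer, which is O(1) and cache-friendly. A Python list stores pointers to separately allocated Python objects. Indexing a list is also O(1), but using each element requires an extra pointer dereference and a runtime type check, and the objects themselves may be scattered across memory.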
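
The memory model can be inspected directly. The sketch below is illustrative only: the variable names are made up for this example, and the per-object size reported by sys.getsizeof is specific to the CPython build you run it on.

```python
import sys
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])
lst = [1.0, 2.0, 3.0, 4.0]

# One buffer, fixed-size items, constant step between neighbours
print("itemsize:  ", arr.itemsize)                 # bytes per element
print("strides:   ", arr.strides)                  # bytes to step to the next element
print("contiguous:", arr.flags['C_CONTIGUOUS'])

# A list holds references; each element is its own Python object
print("element type:", type(lst[0]).__name__)
print("bytes per float object:", sys.getsizeof(lst[0]))
```

Each float64 in the array occupies exactly itemsize bytes, while every float in the list carries its own object header on top of the 8-byte value.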

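Operation model: NumPy operations execute in compiled C code, looping over the contiguous buffer. Python list operations pay interpreter overhead for every element. For 1 million elements the gap depends on the operation: the built-in sum timed in Section 3 comes out about 8x slower than NumPy, while explicit Python loops can be around 100x slower.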

Mathematical model: Think of np.array as declaring: “this is a vector of doubles.” You are telling Python to treat this object with the rules of vector algebra, not list algebra.

Recall from ch038 (Floating Point Errors) that float64 is the standard 64-bit real number. NumPy defaults to this type for decimal inputs.
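
A quick way to see the default in action is to build arrays from different literals and inspect dtype. This is a small sketch with illustrative variable names; the integer dtype NumPy picks for whole-number input depends on the platform and NumPy version, so the code does not assume a specific one.

```python
import numpy as np

from_decimals = np.array([1.0, 2.0, 3.0])
from_integers = np.array([1, 2, 3])
from_mixed = np.array([1, 2.5, 3])

print(from_decimals.dtype)          # float64
print(from_integers.dtype)          # a platform-dependent integer type
print(from_mixed.dtype)             # float64: the integers are promoted
print(np.finfo(np.float64).bits)    # width of the type in bits
```

When a vector must be float64 regardless of how the input was written, passing dtype=np.float64 explicitly removes the dependence on the literals.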


3. Visualization

# --- Visualization: Python list vs NumPy array behavior ---
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

# Demonstrate the addition difference
list_a = [1, 2, 3]
list_b = [4, 5, 6]

arr_a = np.array([1.0, 2.0, 3.0])
arr_b = np.array([4.0, 5.0, 6.0])

print("Python list:")
print(f"  [1,2,3] + [4,5,6] = {list_a + list_b}  (concatenation!)")

print("\nNumPy array:")
print(f"  [1,2,3] + [4,5,6] = {arr_a + arr_b}  (vector addition!)")

print("\nScaling:")
print(f"  Python: 2 * [1,2,3] = {2 * list_a}  (repetition!)")
print(f"  NumPy:  2 * [1,2,3] = {2 * arr_a}  (scalar multiplication!)")
Python list:
  [1,2,3] + [4,5,6] = [1, 2, 3, 4, 5, 6]  (concatenation!)

NumPy array:
  [1,2,3] + [4,5,6] = [5. 7. 9.]  (vector addition!)

Scaling:
  Python: 2 * [1,2,3] = [1, 2, 3, 1, 2, 3]  (repetition!)
  NumPy:  2 * [1,2,3] = [2. 4. 6.]  (scalar multiplication!)
# --- Visualization: Memory layout and speed ---
import numpy as np
import time

N = 1_000_000

python_list = list(range(N))
numpy_array = np.arange(N, dtype=np.float64)

# Time summing with Python list
t0 = time.perf_counter()
result_list = sum(python_list)
t1 = time.perf_counter()

# Time summing with NumPy
t2 = time.perf_counter()
result_numpy = numpy_array.sum()
t3 = time.perf_counter()

print(f"Python list sum: {result_list:.0f}  Time: {(t1-t0)*1000:.2f} ms")
print(f"NumPy array sum: {result_numpy:.0f}  Time: {(t3-t2)*1000:.2f} ms")
print(f"Speedup: {(t1-t0)/(t3-t2):.0f}x")
print(f"\nMemory — list: ~{8*N + 56*N:,} bytes approx")
print(f"Memory — numpy: {numpy_array.nbytes:,} bytes")
Python list sum: 499999500000  Time: 11.90 ms
NumPy array sum: 499999500000  Time: 1.48 ms
Speedup: 8x

Memory — list: ~64,000,000 bytes approx
Memory — numpy: 8,000,000 bytes
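
The list figure printed above is a rough estimate. A measured version can be sketched with sys.getsizeof, assuming CPython; object sizes vary between Python versions and builds, so treat the printed numbers as indicative. N is reduced here only to keep the per-element loop quick.

```python
import sys
import numpy as np

N = 100_000  # smaller than the benchmark so the per-element loop stays quick

python_list = list(range(N))
numpy_array = np.arange(N, dtype=np.float64)

# The list object itself stores only references to its elements
list_container = sys.getsizeof(python_list)
# Each int is a separate object with its own header
list_elements = sum(sys.getsizeof(x) for x in python_list)

print(f"list container: {list_container:,} bytes")
print(f"list elements:  {list_elements:,} bytes")
print(f"numpy buffer:   {numpy_array.nbytes:,} bytes")
```

The sum over elements slightly overcounts in one respect: CPython caches small integers, so those objects are shared rather than allocated per list.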

4. Mathematical Formulation

A vector v in ℝⁿ requires:

  • A fixed dimension n

  • n real-valued components

  • Two core operations: addition and scalar multiplication (coming in ch125 and ch126)

NumPy’s ndarray with shape=(n,) and dtype=float64 satisfies all three.
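
As a minimal check of the third requirement, the sketch below confirms that adding two vectors and scaling one both return an array with the same shape and dtype. The values and names are arbitrary and chosen only for illustration.

```python
import numpy as np

n = 4
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.5, -1.0, 0.0, 2.0])
c = 3.0

w = u + v      # vector addition
s = c * v      # scalar multiplication

# Both results are still vectors in the same space: shape (n,), dtype float64
for name, x in [("u + v", w), ("c * v", s)]:
    print(f"{name} = {x}  shape={x.shape}  dtype={x.dtype}")
```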

Key NumPy attributes:

v.shape   → (n,)           dimension of the vector
v.dtype   → float64        type of each component
v.ndim    → 1              number of axes (1 for a vector)
v.nbytes  → 8*n            memory in bytes (8 bytes per float64)
# --- Mathematical Formulation: NumPy array anatomy ---
import numpy as np

v = np.array([1.0, -2.5, 0.0, 7.3])

print("Vector v:", v)
print("Shape:   ", v.shape)    # (4,)
print("Dtype:   ", v.dtype)    # float64
print("Ndim:    ", v.ndim)     # 1
print("Size:    ", v.size)     # 4  (number of elements)
print("Nbytes:  ", v.nbytes)   # 32 (4 * 8)

# Specifying dtypes explicitly
v_int   = np.array([1, 2, 3], dtype=np.int32)
v_float = np.array([1, 2, 3], dtype=np.float64)
print("\nint32 vector:",   v_int,   v_int.dtype)
print("float64 vector:", v_float, v_float.dtype)
Vector v: [ 1.  -2.5  0.   7.3]
Shape:    (4,)
Dtype:    float64
Ndim:     1
Size:     4
Nbytes:   32

int32 vector: [1 2 3] int32
float64 vector: [1. 2. 3.] float64

5. Python Implementation

# --- Implementation: Utility functions for vectors ---
import numpy as np

def make_vector(*components):
    """
    Construct a float64 vector from individual components.

    Args:
        *components: scalar values

    Returns:
        ndarray, shape (n,), dtype float64
    """
    return np.array(components, dtype=np.float64)


def assert_vector(v, expected_dim=None):
    """
    Assert that v is a valid 1D numpy vector, optionally check dimension.

    Args:
        v: input to check
        expected_dim: int or None — expected number of components

    Raises:
        TypeError if v is not a 1D ndarray
        ValueError if dimension mismatch
    """
    if not isinstance(v, np.ndarray) or v.ndim != 1:
        raise TypeError(f"Expected 1D ndarray, got {type(v)} with ndim={getattr(v, 'ndim', 'N/A')}")
    if expected_dim is not None and v.shape[0] != expected_dim:
        raise ValueError(f"Expected dimension {expected_dim}, got {v.shape[0]}")


def vectors_same_dim(u, v):
    """
    Return True if u and v have the same dimension.

    Args:
        u, v: ndarrays

    Returns:
        bool
    """
    return u.shape == v.shape


# Test
v = make_vector(1, -2, 3.5)
print("Made vector:", v, "dtype:", v.dtype)

assert_vector(v)           # passes
assert_vector(v, 3)        # passes

try:
    assert_vector([1, 2])  # should fail
except TypeError as e:
    print("TypeError:", e)

try:
    assert_vector(v, 5)    # should fail
except ValueError as e:
    print("ValueError:", e)
Made vector: [ 1.  -2.   3.5] dtype: float64
TypeError: Expected 1D ndarray, got <class 'list'> with ndim=N/A
ValueError: Expected dimension 5, got 3
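
The tests above do not exercise vectors_same_dim. One place such a check might earn its keep is before addition, because NumPy broadcasts a shape (1,) array against a shape (3,) array without raising. In the sketch below, safe_add is a hypothetical helper invented for this example, and vectors_same_dim is repeated so the block runs on its own.

```python
import numpy as np

def vectors_same_dim(u, v):
    """Same check as above, repeated so this block is self-contained."""
    return u.shape == v.shape

def safe_add(u, v):
    """Hypothetical helper: add only when the dimensions match."""
    if not vectors_same_dim(u, v):
        raise ValueError(f"Dimension mismatch: {u.shape} vs {v.shape}")
    return u + v

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0])            # shape (1,), not a 3-vector

print("Plain +  :", a + b)      # broadcasting stretches b without complaint
try:
    safe_add(a, b)
except ValueError as e:
    print("safe_add :", e)
```

Whether silent broadcasting is a bug or a convenience depends on the application; the guard simply makes the vector-only intent explicit.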

6. Experiments

# --- Experiment 1: Integer vs float dtype behavior ---
# Hypothesis: integer vectors truncate decimal results, float64 vectors do not.
# Try changing: dtype between int32, float32, float64
import numpy as np

DTYPE = np.int32   # <-- modify: try np.float32, np.float64

v = np.array([1, 2, 3], dtype=DTYPE)
result = v / 2

print(f"dtype={DTYPE.__name__}")
print(f"v = {v}")
print(f"v / 2 = {result}  (dtype: {result.dtype})")
dtype=int32
v = [1 2 3]
v / 2 = [0.5 1.  1.5]  (dtype: float64)
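
The output above shows that true division promotes the result to float64 instead of truncating. Truncation does appear in other situations with integer vectors. The following sketch, which is an addition to the experiment and not part of it, shows three of them: floor division, storing a float into an integer slot, and in-place division.

```python
import numpy as np

v = np.array([1, 2, 3], dtype=np.int32)

floored = v // 2                 # floor division keeps the integer dtype
print("v // 2 =", floored, floored.dtype)

v[0] = 2.7                       # stored into an int32 slot
print("after v[0] = 2.7:", v)    # the fractional part is dropped

try:
    v /= 2                       # in-place true division cannot store floats in int32
except TypeError as e:
    print("v /= 2 raised:", type(e).__name__)
```
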
# --- Experiment 2: Mutation and copying ---
# Hypothesis: assigning v = w and then modifying w WILL affect v (they share memory).
# Try changing: use np.copy() to prevent this.
import numpy as np

w = np.array([1.0, 2.0, 3.0])

USE_COPY = False  # <-- modify: try True

if USE_COPY:
    v = w.copy()
else:
    v = w  # just another name for the same array!

w[0] = 999.0

print("After modifying w[0]:")
print("  w =", w)
print("  v =", v)
print("  Same object?", v is w)
After modifying w[0]:
  w = [999.   2.   3.]
  v = [999.   2.   3.]
  Same object? True
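
Aliasing is not limited to plain assignment. Basic slicing also returns a view onto the same buffer, which np.shares_memory can confirm. The sketch below uses illustrative names and is a companion to the experiment, not part of it.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0])

tail_view = w[1:]          # basic slicing returns a view onto w's buffer
tail_copy = w[1:].copy()   # an independent buffer

tail_view[0] = -1.0        # writes through to w[1]

print("w         =", w)
print("tail_copy =", tail_copy)
print("view shares memory:", np.shares_memory(w, tail_view))
print("copy shares memory:", np.shares_memory(w, tail_copy))
```

Here tail_view is a different Python object from w, so an `is` check would report False even though the two share data.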

7. Exercises

Easy 1. Create a float64 vector of 6 zeros using np.zeros. Then set the third element to 5.0. Print the result. (Expected: [0, 0, 5, 0, 0, 0])

Easy 2. What is np.ones(4).dtype? What is np.ones(4, dtype=int).dtype? Check both in code.

Medium 1. Write a function list_to_vector(lst) that converts a Python list to a float64 NumPy vector, raising a ValueError if any element is not numeric.

Medium 2. Compare the memory usage of np.float64, np.float32, and np.float16 for a 10,000-element vector. For each, compute the maximum representable value using np.finfo(dtype).max.

Hard. Write a benchmark that computes the elementwise square root of a 1M-element array using: (a) a Python list with math.sqrt, (b) a list comprehension, (c) np.sqrt. Report times. What explains the performance hierarchy?


8. Mini Project

# --- Mini Project: Safe Vector Library ---
# Problem: Build a thin wrapper around NumPy that enforces vector semantics:
#          consistent dtype, dimension checking, and informative errors.
# Task: Implement the class below.

import numpy as np

class Vec:
    """
    A safe, typed vector wrapping a NumPy float64 array.
    Enforces dimension consistency and provides informative errors.
    """

    def __init__(self, data):
        """
        Args:
            data: list, tuple, or ndarray of numbers
        """
        # TODO: convert to float64 ndarray
        # TODO: raise TypeError if any element is not numeric
        pass

    @property
    def dim(self):
        """Return the dimension of the vector."""
        # TODO
        pass

    def __repr__(self):
        pass  # TODO: return a nice string

    def same_dim_as(self, other):
        """Return True if other is a Vec of the same dimension."""
        pass  # TODO


# When complete, these should work:
# v = Vec([1, 2, 3])
# print(v)         # Vec([1.0, 2.0, 3.0])
# print(v.dim)     # 3
# w = Vec([4, 5, 6])
# print(v.same_dim_as(w))  # True

9. Summary & Connections

  • NumPy ndarray with shape=(n,) is the correct Python representation of a mathematical vector.

  • Python lists concatenate on +; NumPy arrays add element-wise.

  • NumPy stores data in contiguous memory with fixed dtype — this is why it is fast.

  • Assignment v = w creates an alias; v = w.copy() creates an independent copy.

Backward connection: This applies the floating-point representation from ch038. Every element in a float64 vector is an IEEE 754 double.

Forward connections:

  • This will reappear in ch146 — Vectorization for Performance, where NumPy’s memory model enables operations across entire vectors without Python loops.

  • This will reappear in ch147 — NumPy Vector Operations, where the full broadcasting and indexing machinery is explored.

  • This will reappear in ch151 — Introduction to Matrices (Part VI), where 2D arrays extend the same model.