1. When data is a graph
Not all data lives in a grid (images) or sequence (text). Molecules, social networks, knowledge bases, and citation graphs are naturally represented as graphs.
A graph $G = (V, E)$ has nodes $v \in V$ with features $x_v$ and edges $(u, v) \in E$ with optional features $e_{uv}$. The key challenge: define neural operations that respect graph structure.
(Graph embedding: ch195. Matrix operations: ch151.)
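Concretely, a small graph can be stored as a symmetric adjacency matrix alongside a node-feature matrix. A minimal sketch (the edge list and feature dimension here are illustrative, not from any dataset):

```python
import numpy as np

# Hypothetical 4-node undirected graph with edges (0,1), (1,2), (2,3), (0,2)
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
N = 4

A = np.zeros((N, N))           # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0    # undirected graph: symmetric entries

X = np.random.default_rng(0).normal(size=(N, 3))  # 3-dim node features

print(A.sum())   # 8.0: each undirected edge appears twice
print(X.shape)   # (4, 3)
```

Row $i$ of `A` lists node $i$'s neighbours; row $i$ of `X` is its feature vector. Every GNN operation below is built from these two matrices.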
2. Message Passing Framework
All major GNN variants fit the message passing paradigm:
MSG: each neighbour $u \in \mathcal{N}(v)$ sends a message $m_{u \to v}^{(t)} = \mathrm{MSG}\big(h_u^{(t)}, h_v^{(t)}\big)$ to $v$.
AGGREGATE: combine all incoming messages (sum, mean, max): $m_v^{(t)} = \mathrm{AGG}\big(\{\, m_{u \to v}^{(t)} : u \in \mathcal{N}(v) \,\}\big)$.
UPDATE: update $v$'s representation given its old state and aggregated messages: $h_v^{(t+1)} = \mathrm{UPD}\big(h_v^{(t)}, m_v^{(t)}\big)$.
After $k$ rounds of message passing, each node's representation incorporates information from its $k$-hop neighbourhood.
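The three steps above can be sketched directly in NumPy. This is an illustrative round with linear messages and sum aggregation, not any particular published variant; the weight names are placeholders:

```python
import numpy as np

def message_passing_round(H, A, W_msg, W_upd):
    """One generic round: MSG (linear map), AGGREGATE (sum), UPDATE (MLP-style)."""
    msgs = H @ W_msg             # MSG: each node's outgoing message, (N, d)
    agg = A @ msgs               # AGGREGATE: sum messages from neighbours, (N, d)
    # UPDATE: combine old state with aggregated messages
    return np.tanh(np.concatenate([H, agg], axis=1) @ W_upd)

rng = np.random.default_rng(0)
N, d = 5, 4
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:   # path graph 0-1-2-3-4
    A[i, j] = A[j, i] = 1.0
H = rng.normal(size=(N, d))
W_msg = rng.normal(size=(d, d))
W_upd = rng.normal(size=(2 * d, d))

H_next = message_passing_round(H, A, W_msg, W_upd)
print(H_next.shape)  # (5, 4)
```

Multiplying by `A` sums exactly the neighbours' messages, which is why one round only moves information one hop.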
3. Graph Convolutional Network (GCN)
$$H^{(l+1)} = \sigma\!\big(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\big)$$
where $\hat{A} = A + I$ (adjacency + self-loops) and $\hat{D}$ is the degree matrix of $\hat{A}$.
The symmetric normalisation $\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}$ is a degree-normalised averaging of neighbour features. Each layer is a linear transformation of the averaged neighbourhood, followed by a nonlinearity.
(Eigendecomposition motivation: ch175. Adjacency matrix: ch195.)
import numpy as np
import matplotlib.pyplot as plt

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

class GCNLayer:
    """Single Graph Convolutional Network layer."""

    def __init__(self, in_features: int, out_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = np.sqrt(2.0 / in_features)  # He initialisation
        self.W = rng.normal(0, scale, (in_features, out_features))
        self.b = np.zeros(out_features)

    def _normalise_adjacency(self, A: np.ndarray) -> np.ndarray:
        """Compute D^{-1/2} (A+I) D^{-1/2}."""
        A_hat = A + np.eye(A.shape[0])  # add self-loops
        deg = A_hat.sum(axis=1)         # degrees of A_hat
        D_inv_sqrt = np.diag(1.0 / (np.sqrt(deg) + 1e-8))
        return D_inv_sqrt @ A_hat @ D_inv_sqrt

    def forward(self, H: np.ndarray, A: np.ndarray,
                act: bool = True) -> np.ndarray:
        """H: (N, in_features). A: (N, N). Returns (N, out_features)."""
        A_norm = self._normalise_adjacency(A)
        out = A_norm @ H @ self.W + self.b
        return relu(out) if act else out
# Build a small graph (Karate Club-style structure)
rng = np.random.default_rng(42)
N = 20

# Two communities (0-9, 10-19) with sparse cross-community edges
A = np.zeros((N, N))
for i in range(10):
    for j in range(i + 1, 10):
        if rng.random() < 0.4:
            A[i, j] = A[j, i] = 1.0
for i in range(10, N):
    for j in range(i + 1, N):
        if rng.random() < 0.4:
            A[i, j] = A[j, i] = 1.0

# Few cross-community edges
for _ in range(3):
    i, j = rng.integers(0, 10), rng.integers(10, N)
    A[i, j] = A[j, i] = 1.0

# Node features: random 8-dim
H0 = rng.normal(0, 1, (N, 8))
layer_label = np.array([0] * 10 + [1] * 10)  # community label

# Two GCN layers
gcn1 = GCNLayer(8, 16, seed=0)
gcn2 = GCNLayer(16, 2, seed=1)
H1 = gcn1.forward(H0, A, act=True)
H2 = gcn2.forward(H1, A, act=False)  # 2-D output for visualisation
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Draw the graph, with the two communities on separate rings
def draw_graph(ax, A, colours, title):
    pos = {}
    for i in range(N):
        angle = 2 * np.pi * i / N
        r = 1.5 if i < 10 else 1.0
        shift = np.pi * int(i >= 10)  # rotate the second community
        pos[i] = (r * np.cos(angle + shift), r * np.sin(angle + shift))
    for i in range(N):
        for j in range(i + 1, N):
            if A[i, j] > 0:
                ax.plot([pos[i][0], pos[j][0]], [pos[i][1], pos[j][1]],
                        'gray', lw=0.7, alpha=0.5)
    for i in range(N):
        ax.scatter(*pos[i], color=colours[i], s=120, zorder=5,
                   edgecolors='black', lw=1)
    ax.set_aspect('equal')
    ax.set_title(title)
    ax.axis('off')

node_colors = ['#e74c3c' if l == 0 else '#3498db' for l in layer_label]
draw_graph(axes[0], A, node_colors, 'Graph (2 communities)')

axes[1].scatter(H1[:10, 0], H1[:10, 1], color='#e74c3c', s=60, label='Community 0')
axes[1].scatter(H1[10:, 0], H1[10:, 1], color='#3498db', s=60, label='Community 1')
axes[1].set_title('After GCN layer 1 (16-dim → 2 shown)')
axes[1].legend()

axes[2].scatter(H2[:10, 0], H2[:10, 1], color='#e74c3c', s=60, label='Community 0')
axes[2].scatter(H2[10:, 0], H2[10:, 1], color='#3498db', s=60, label='Community 1')
axes[2].set_title('After GCN layer 2 (2-dim output)')
axes[2].legend()

plt.suptitle('GCN: message passing separates community structure', fontsize=11)
plt.tight_layout()
plt.savefig('ch328_gnn.png', dpi=120)
plt.show()

4. Graph Attention Networks (GAT)
Instead of fixed normalisation weights, GAT learns attention coefficients:
$$\alpha_{uv} = \frac{\exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}h_v \,\Vert\, \mathbf{W}h_u]\big)\big)}{\sum_{u' \in \mathcal{N}(v)} \exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}h_v \,\Vert\, \mathbf{W}h_{u'}]\big)\big)}, \qquad h_v' = \sigma\!\Big(\sum_{u \in \mathcal{N}(v)} \alpha_{uv}\,\mathbf{W}h_u\Big)$$
This allows the model to attend differently to different neighbours, capturing heterogeneous relationship strengths.
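A minimal single-head sketch of the standard GAT scoring function, in the same NumPy style as the GCN layer above (the shapes and graph here are illustrative; a real implementation would add multiple heads and trainable parameters):

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention. H: (N, d_in), W: (d_in, d_out), a: (2*d_out,)."""
    N = A.shape[0]
    d_out = W.shape[1]
    Wh = H @ W                           # transformed features, (N, d_out)
    # Logits e[v, u] = LeakyReLU(a^T [Wh_v || Wh_u]) split into two dot products
    dst = Wh @ a[:d_out]                 # contribution of the receiving node v
    src = Wh @ a[d_out:]                 # contribution of the sending node u
    e = dst[:, None] + src[None, :]
    e = np.where(e > 0, e, slope * e)    # LeakyReLU
    # Mask non-edges (self-loops included), then row-wise softmax over neighbours
    mask = (A + np.eye(N)) > 0
    e = np.where(mask, e, -1e9)
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)
    return att @ Wh                      # attention-weighted neighbour mix

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # star graph
H = rng.normal(size=(3, 4))
out = gat_layer(H, A, rng.normal(size=(4, 2)), rng.normal(size=(4,)))
print(out.shape)  # (3, 2)
```

Masking non-edges with a large negative logit before the softmax is the standard trick for restricting attention to the neighbourhood.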
5. Applications
| Domain | Task | GNN variant |
|---|---|---|
| Drug discovery | Molecular property prediction | MPNN, GCN |
| Social networks | Community detection, link prediction | GraphSAGE, GAT |
| Knowledge graphs | Entity classification, relation prediction | R-GCN |
| Recommendation | Item-user interaction graphs | LightGCN |
| Physics simulation | Particle/rigid body dynamics | GNS |
6. Summary
GNNs process graph-structured data via message passing: aggregate neighbour features, update node state.
GCN: symmetric normalised adjacency times feature matrix — simple, effective.
GAT: learned attention weights over neighbours — flexible, more expressive.
$k$ layers → $k$-hop neighbourhood aggregation.
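The last point can be checked numerically: with self-loops, the entry $(\hat{A}^k)_{ij}$ is nonzero exactly when $j$ is within $k$ hops of $i$, so $k$ rounds of propagation can reach no further. A quick sketch on a path graph:

```python
import numpy as np

# Path graph 0-1-2-3-4: node 4 is exactly 4 hops from node 0
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = A + np.eye(N)  # adjacency with self-loops

reach = np.eye(N)
for k in range(1, 5):
    reach = reach @ A_hat  # one propagation round
    print(k, bool(reach[0, 4] > 0))
# 1 False / 2 False / 3 False / 4 True
```

Information from node 4 reaches node 0 only after the fourth round, matching the $k$-hop claim.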
7. Forward and backward references
Used here: graph embeddings (ch195), adjacency matrix (ch195), attention (ch321), layer normalisation (ch310).
This will reappear in ch340 — Capstone II, where graph structure is used to represent relationships between data features.