
1. When data is a graph

Not all data lives in a grid (images) or sequence (text). Molecules, social networks, knowledge bases, and citation graphs are naturally represented as graphs.

A graph $G = (V, E)$ has nodes $v_i$ with features $h_i$ and edges $(v_i, v_j)$ with optional features $e_{ij}$. The key challenge: define neural operations that respect graph structure.
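As a concrete illustration, a small undirected graph can be held as an adjacency matrix plus a node-feature matrix (a minimal sketch; the 3-node path graph and 2-dim features are made up for this example):

```python
import numpy as np

# A toy 3-node graph: undirected edges (0,1) and (1,2).
# Nodes carry 2-dim feature vectors; edge features are optional and omitted here.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix (symmetric)
H = np.array([[1.0, 0.0],                # h_0
              [0.0, 1.0],                # h_1
              [1.0, 1.0]])               # h_2

# The neighbourhood N(i) reads straight off row i of the adjacency matrix
neighbours_of_1 = np.nonzero(A[1])[0]
print(neighbours_of_1)  # [0 2]
```

Everything that follows (GCN, GAT) operates on exactly these two matrices.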

(Graph embedding: ch195. Matrix operations: ch151.)


2. Message Passing Framework

All major GNN variants fit the message passing paradigm:

$$h_i^{(l+1)} = \text{UPDATE}\left(h_i^{(l)},\; \text{AGGREGATE}\left(\{\text{MSG}(h_i^{(l)}, h_j^{(l)}, e_{ij}) : j \in \mathcal{N}(i)\}\right)\right)$$
  1. MSG: each neighbour sends a message to $v_i$.

  2. AGGREGATE: combine all incoming messages (sum, mean, max).

  3. UPDATE: update $v_i$’s representation given its old state and aggregated messages.

After $L$ rounds of message passing, each node’s representation incorporates information from its $L$-hop neighbourhood.
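The three steps above can be sketched in a few lines of NumPy. This is a minimal, untrained example: MSG is the identity, AGGREGATE is a sum (computed as `A @ H`), and UPDATE is a linear map of the old state and the aggregated messages; the weight shapes and the star graph are assumptions for illustration.

```python
import numpy as np

def message_passing_round(H, A, W_self, W_neigh):
    """One generic round: MSG = identity, AGGREGATE = sum over
    neighbours, UPDATE = nonlinearity of (own state, aggregated messages)."""
    M = A @ H                                   # row i sums features of N(i)
    return np.tanh(H @ W_self + M @ W_neigh)    # UPDATE

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)          # star graph centred on node 0
H = rng.normal(size=(3, 4))                     # 4-dim node features
W_self = rng.normal(size=(4, 4))
W_neigh = rng.normal(size=(4, 4))

H1 = message_passing_round(H, A, W_self, W_neigh)   # 1-hop information
H2 = message_passing_round(H1, A, W_self, W_neigh)  # 2 rounds -> 2-hop information
```

After the second round, nodes 1 and 2 have (indirectly, via node 0) seen each other's features, even though no edge connects them.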


3. Graph Convolutional Network (GCN)

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

where $\tilde{A} = A + I$ (adjacency plus self-loops) and $\tilde{D}$ is the degree matrix of $\tilde{A}$.

The symmetric normalisation $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is a degree-normalised averaging of neighbour features. Each layer is a linear transformation of the averaged neighbourhood.

(Eigendecomposition motivation: ch175. Adjacency matrix: ch195.)

import numpy as np
import matplotlib.pyplot as plt


def relu(z): return np.maximum(0, z)
def sigmoid(z): return 1/(1+np.exp(-np.clip(z,-500,500)))


class GCNLayer:
    """Single Graph Convolutional Network layer."""

    def __init__(self, in_features: int, out_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = np.sqrt(2.0 / in_features)
        self.W = rng.normal(0, scale, (in_features, out_features))
        self.b = np.zeros(out_features)

    def _normalise_adjacency(self, A: np.ndarray) -> np.ndarray:
        """Compute D^{-1/2} (A+I) D^{-1/2}."""
        A_hat = A + np.eye(A.shape[0])
        D = np.diag(A_hat.sum(axis=1))
        D_inv_sqrt = np.diag(1.0 / (np.sqrt(D.diagonal()) + 1e-8))
        return D_inv_sqrt @ A_hat @ D_inv_sqrt

    def forward(self, H: np.ndarray, A: np.ndarray,
                act: bool = True) -> np.ndarray:
        """H: (N, in_features). A: (N,N). Returns (N, out_features)."""
        A_norm = self._normalise_adjacency(A)
        out = A_norm @ H @ self.W + self.b
        return relu(out) if act else out


# Build a small graph (Karate Club-style structure)
rng = np.random.default_rng(42)
N = 20
# Two communities (0-9, 10-19) with sparse cross-community edges
A = np.zeros((N, N))
for i in range(10):
    for j in range(i+1, 10):
        if rng.random() < 0.4:
            A[i,j] = A[j,i] = 1.0
for i in range(10, N):
    for j in range(i+1, N):
        if rng.random() < 0.4:
            A[i,j] = A[j,i] = 1.0
# Few cross-community edges
for _ in range(3):
    i, j = rng.integers(0,10), rng.integers(10,N)
    A[i,j] = A[j,i] = 1.0

# Node features: random 8-dim
H0 = rng.normal(0, 1, (N, 8))
layer_label = np.array([0]*10 + [1]*10)  # community label

# Two GCN layers
gcn1 = GCNLayer(8, 16, seed=0)
gcn2 = GCNLayer(16, 2, seed=1)

H1 = gcn1.forward(H0, A, act=True)
H2 = gcn2.forward(H1, A, act=False)  # 2D output for visualisation

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Draw graph
def draw_graph(ax, A, colours, title):
    pos = {}
    for i in range(N):
        angle = 2 * np.pi * i / N
        r = 1.5 if i < 10 else 1.0
        pos[i] = (r * np.cos(angle + np.pi*int(i>=10)), r * np.sin(angle + np.pi*int(i>=10)))
    for i in range(N):
        for j in range(i+1, N):
            if A[i,j] > 0:
                ax.plot([pos[i][0], pos[j][0]], [pos[i][1], pos[j][1]], 'gray', lw=0.7, alpha=0.5)
    for i in range(N):
        ax.scatter(*pos[i], color=colours[i], s=120, zorder=5, edgecolors='black', lw=1)
    ax.set_aspect('equal'); ax.set_title(title); ax.axis('off')

node_colors = ['#e74c3c' if l==0 else '#3498db' for l in layer_label]
draw_graph(axes[0], A, node_colors, 'Graph (2 communities)')

axes[1].scatter(H1[:10, 0], H1[:10, 1], color='#e74c3c', s=60, label='Community 0')
axes[1].scatter(H1[10:, 0], H1[10:, 1], color='#3498db', s=60, label='Community 1')
axes[1].set_title('After GCN layer 1 (16-dim → 2 shown)'); axes[1].legend()

axes[2].scatter(H2[:10, 0], H2[:10, 1], color='#e74c3c', s=60, label='Community 0')
axes[2].scatter(H2[10:, 0], H2[10:, 1], color='#3498db', s=60, label='Community 1')
axes[2].set_title('After GCN layer 2 (2-dim output)'); axes[2].legend()

plt.suptitle('GCN: message passing separates community structure', fontsize=11)
plt.tight_layout()
plt.savefig('ch328_gnn.png', dpi=120)
plt.show()

4. Graph Attention Networks (GAT)

Instead of fixed normalisation weights, GAT learns attention coefficients:

$$\alpha_{ij} = \frac{\exp\left(\text{LeakyReLU}\left(a^\top [W h_i \,\|\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\text{LeakyReLU}\left(a^\top [W h_i \,\|\, W h_k]\right)\right)}$$

This allows the model to attend differently to different neighbours, capturing heterogeneous relationship strengths.
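The coefficients above can be computed directly for a single attention head. A minimal NumPy sketch, assuming random (untrained) parameters $W$ and $a$ and self-loops in the neighbourhood, as in the original GAT formulation; the split of $a$ into source and target halves is an equivalent way of evaluating $a^\top [W h_i \| W h_j]$ for all pairs at once:

```python
import numpy as np

def gat_attention(H, A, W, a, leaky_slope=0.2):
    """Attention coefficients alpha_ij for one GAT head (toy parameters)."""
    Z = H @ W                                   # (N, F') transformed features
    N, Fp = Z.shape
    # e_ij = LeakyReLU(a^T [z_i || z_j]) for all pairs, via a = [a_src ; a_dst]
    e = (Z @ a[:Fp])[:, None] + (Z @ a[Fp:])[None, :]
    e = np.where(e > 0, e, leaky_slope * e)     # LeakyReLU
    # restrict the softmax to N(i) plus a self-loop
    mask = A + np.eye(N)
    e = np.where(mask > 0, e, -1e9)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e) * (mask > 0)
    return alpha / alpha.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 4))
alpha = gat_attention(H, A, rng.normal(size=(4, 8)), rng.normal(size=(16,)))
print(alpha.sum(axis=1))  # each row sums to 1 over N(i) plus the self-loop
```

Each row of `alpha` is a probability distribution over that node's neighbourhood, replacing GCN's fixed $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ weights with learned ones.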


5. Applications

| Domain | Task | GNN variant |
| --- | --- | --- |
| Drug discovery | Molecular property prediction | MPNN, GCN |
| Social networks | Community detection, link prediction | GraphSAGE, GAT |
| Knowledge graphs | Entity classification, relation prediction | R-GCN |
| Recommendation | Item–user interaction graphs | LightGCN |
| Physics simulation | Particle/rigid-body dynamics | GNS |

6. Summary

  • GNNs process graph-structured data via message passing: aggregate neighbour features, update node state.

  • GCN: symmetric normalised adjacency times feature matrix — simple, effective.

  • GAT: learned attention weights over neighbours — flexible, more expressive.

  • $L$ layers → $L$-hop neighbourhood aggregation.


7. Forward and backward references

Used here: graph embeddings (ch195), adjacency matrix (ch195), attention (ch321), layer normalisation (ch310).

This will reappear in ch340 — Capstone II, where graph structure is used to represent relationships between data features.