# MATH2603 Lab 5 — Spectrum & Centrality
## Laplacian spectrum (Fiedler value/vector) and network centrality measures

**Aligned with the lecture (PPT 5):** we connect **graph structure** to:
- **Spectrum** of the Laplacian \(L = D - A\): connectivity, components, Fiedler value/vector
- **Centrality**: degree, closeness, betweenness, eigenvector centrality, PageRank

### What you will do
**Part A: Spectrum**
1. Build graphs and compute \(A, D, L\)
2. Compute eigenvalues of \(L\) and relate them to connected components
3. Visualise the **Fiedler vector** and use it for a simple 2-way split

**Part B: Centrality**
1. Compare centralities on different graph shapes
2. Explain why measures disagree
3. Run PageRank and vary damping factor

> **How to run:** click a code cell and press **Shift + Enter**.


## 0) Setup check (run first)

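If you see `ModuleNotFoundError`, install the packages in a terminal. SciPy is included because recent NetworkX releases rely on it for `nx.pagerank`, which is used in Part B:

```bash
pip install numpy scipy matplotlib networkx
```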


In [None]:
import sys
print("Python:", sys.version.split()[0])

try:
    # All imports sit inside the try so a missing package gives the friendly message.
    import numpy as np
    import networkx as nx
    import matplotlib
    import matplotlib.pyplot as plt
    print("numpy:", np.__version__)
    print("networkx:", nx.__version__)
    print("matplotlib:", matplotlib.__version__)
except ImportError as e:
    print("Missing packages. Error:", e)


# Part A — Spectrum (Laplacian)

We work with:
- adjacency matrix \(A\)
- degree matrix \(D\)
- Laplacian \(L = D - A\)

Key fact:
- the number of zero eigenvalues of \(L\), counted with multiplicity, equals the number of connected components
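
One way to see why the key fact holds is sketched below. For each connected component, the vector that is 1 on that component and 0 elsewhere satisfies \(Lx = 0\), so every component contributes one independent zero eigenvector. The two-triangle graph and the names `H`, `L_H`, `indicators` are illustrative assumptions, not part of the lab.

```python
import numpy as np
import networkx as nx

# Illustrative graph (an assumption, not from the lab): two disjoint triangles.
H = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
nodelist_H = sorted(H.nodes())
A_H = nx.to_numpy_array(H, nodelist=nodelist_H, dtype=float)
L_H = np.diag(A_H.sum(axis=1)) - A_H

# One indicator vector per connected component: 1 inside, 0 outside.
indicators = []
for comp in nx.connected_components(H):
    ind = np.array([1.0 if n in comp else 0.0 for n in nodelist_H])
    indicators.append(ind)
    print(sorted(comp), "-> L @ indicator =", L_H @ ind)

# Count eigenvalues that are zero up to rounding error.
num_zero_H = int(np.sum(np.abs(np.linalg.eigvalsh(L_H)) < 1e-8))
print("Zero eigenvalues:", num_zero_H, "| components:", len(indicators))
```

Each row of \(L\) restricted to one component sums to zero, which is why the product with the indicator vector vanishes.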


## A1) Create a small graph and compute A, D, L

### Task A1
Run the next cell, then check:
1. Is \(L = D - A\)?
2. Do rows of \(L\) sum to 0?


In [None]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
edges = [(0,1),(1,2),(2,3),(3,0),
         (2,4),(4,5),(5,6),(6,4),
         (6,7)]
G.add_edges_from(edges)

print("Nodes:", G.number_of_nodes(), "Edges:", G.number_of_edges())
print("Connected components:", nx.number_connected_components(G))

A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()), dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

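# True by construction (L was just defined as D - A); the degree check below is independent.
print("Check L = D - A:", np.allclose(L, D - A))
degrees = np.array([G.degree(n) for n in sorted(G.nodes())], dtype=float)
print("Diagonal of L equals node degrees:", np.allclose(np.diag(L), degrees))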
print("Row sums of L (should be ~0):", np.round(L.sum(axis=1), 10))
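
Because `L` is defined as `D - A` in the cell above, comparing it with `D - A` again cannot fail. A more independent check, sketched here for an unweighted, undirected graph whose nodes are labelled `0..n-1`, builds the Laplacian directly from the edge list. It also tests the identity \(x^\top L x = \sum_{(i,j) \in E} (x_i - x_j)^2\) on one random vector. The names `L_from_edges`, `x`, `quad` and `edge_sum` are illustrative.

```python
import numpy as np
import networkx as nx

# Same edge list as the lab graph in A1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0),
         (2, 4), (4, 5), (5, 6), (6, 4),
         (6, 7)]
G = nx.Graph(edges)
n = G.number_of_nodes()

# Build L entry by entry: each edge (i, j) adds 1 to both degrees
# and puts -1 in the two matching off-diagonal positions.
L_from_edges = np.zeros((n, n))
for i, j in edges:
    L_from_edges[i, i] += 1
    L_from_edges[j, j] += 1
    L_from_edges[i, j] -= 1
    L_from_edges[j, i] -= 1

A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()), dtype=float)
L = np.diag(A.sum(axis=1)) - A
print("Matches D - A:", np.allclose(L, L_from_edges))

# Quadratic form identity, tested on one arbitrary vector (seed chosen arbitrarily).
rng = np.random.default_rng(0)
x = rng.normal(size=n)
quad = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)
print("x^T L x =", round(float(quad), 6), "| sum over edges =", round(float(edge_sum), 6))
```

The quadratic form is a sum of squares, which is one way to see that no eigenvalue of \(L\) can be negative.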


### (Optional) Draw the graph

In [None]:
plt.figure(figsize=(6,4))
pos = nx.spring_layout(G, seed=2)
nx.draw(G, pos, with_labels=True, node_size=700)
plt.title("Graph for Laplacian spectrum experiments")
plt.show()


## A2) Eigenvalues of the Laplacian and connected components

### Task A2
1. Compute eigenvalues of \(L\) and sort them.
2. Count how many are “zero” (tolerance 1e-8).
3. Compare with `nx.number_connected_components(G)`.


In [None]:
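# eigh is meant for symmetric matrices: it returns real eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)
eigvals_sorted = np.sort(eigvals)  # already ascending; kept to make the sorting step explicit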

tol = 1e-8
num_zero = int(np.sum(np.abs(eigvals_sorted) < tol))

print("Eigenvalues of L (sorted):")
print(np.round(eigvals_sorted, 6))
print("Number of ~zero eigenvalues:", num_zero)
print("Connected components (NetworkX):", nx.number_connected_components(G))
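
A quick sanity check for this kind of eigenvalue code is to run it on graphs whose Laplacian spectrum is known in closed form. The sketch below uses two standard results: the complete graph \(K_n\) has eigenvalues \(0\) and \(n\) (the latter repeated \(n-1\) times), and the path graph \(P_n\) has eigenvalues \(2 - 2\cos(k\pi/n)\) for \(k = 0, \dots, n-1\). The helper name `laplacian_eigenvalues` is hypothetical.

```python
import numpy as np
import networkx as nx

def laplacian_eigenvalues(H):
    """Ascending Laplacian eigenvalues of an undirected, unweighted graph H."""
    A_H = nx.to_numpy_array(H, nodelist=sorted(H.nodes()), dtype=float)
    L_H = np.diag(A_H.sum(axis=1)) - A_H
    return np.linalg.eigvalsh(L_H)  # eigenvalues only, ascending order

n = 5

# Complete graph: one zero eigenvalue, then n repeated n - 1 times.
ev_complete = laplacian_eigenvalues(nx.complete_graph(n))
print("K5 computed:", np.round(ev_complete, 6))

# Path graph: compare with the closed-form expression.
ev_path = laplacian_eigenvalues(nx.path_graph(n))
expected_path = np.sort(2 - 2 * np.cos(np.pi * np.arange(n) / n))
print("P5 computed:", np.round(ev_path, 6))
print("P5 formula: ", np.round(expected_path, 6))
```

Both graphs are connected, so each spectrum contains exactly one eigenvalue that is zero up to rounding.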


### Task A2B: Make the graph disconnected

Remove one edge to split the graph, then recompute.
Try removing edge `(2,4)` or `(6,7)` and see what changes.


In [None]:
G2 = G.copy()
G2.remove_edge(2, 4)  # TODO: try (6,7) too

A2 = nx.to_numpy_array(G2, nodelist=sorted(G2.nodes()), dtype=float)
D2 = np.diag(A2.sum(axis=1))
L2 = D2 - A2

eigvals2, eigvecs2 = np.linalg.eigh(L2)
eigvals2_sorted = np.sort(eigvals2)

num_zero2 = int(np.sum(np.abs(eigvals2_sorted) < tol))

print("Connected components (G2):", nx.number_connected_components(G2))
print("Eigenvalues (G2) sorted:")
print(np.round(eigvals2_sorted, 6))
print("Number of ~zero eigenvalues:", num_zero2)

plt.figure(figsize=(6,4))
pos2 = nx.spring_layout(G2, seed=2)
nx.draw(G2, pos2, with_labels=True, node_size=700)
plt.title("Disconnected graph G2")
plt.show()
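
The same experiment can be repeated for every edge. The sketch below removes each edge in turn and records the ones whose removal produces a second zero eigenvalue; these are the bridges of the graph, so the result is compared with `nx.bridges`. The helper `count_zero_eigenvalues` and the list `splitting_edges` are illustrative names. This brute-force loop is meant for small graphs only.

```python
import numpy as np
import networkx as nx

def count_zero_eigenvalues(H, tol=1e-8):
    """Number of Laplacian eigenvalues of H that are zero up to tol."""
    A_H = nx.to_numpy_array(H, nodelist=sorted(H.nodes()), dtype=float)
    L_H = np.diag(A_H.sum(axis=1)) - A_H
    return int(np.sum(np.abs(np.linalg.eigvalsh(L_H)) < tol))

# Same edge list as the lab graph in A1.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0),
              (2, 4), (4, 5), (5, 6), (6, 4),
              (6, 7)])

splitting_edges = []
for u, v in list(G.edges()):
    H = G.copy()
    H.remove_edge(u, v)          # nodes stay, so an isolated node counts as a component
    if count_zero_eigenvalues(H) > 1:
        splitting_edges.append(tuple(sorted((u, v))))

bridges = sorted(tuple(sorted(e)) for e in nx.bridges(G))
print("Edges whose removal disconnects G:", sorted(splitting_edges))
print("nx.bridges(G):", bridges)
```

For this graph the two lists agree on `(2, 4)` and `(6, 7)`, the two edges suggested in the task; every other edge lies on a cycle.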


## A3) Fiedler value and Fiedler vector

For a connected graph:
- smallest eigenvalue \(\lambda_1 = 0\)
- second smallest eigenvalue \(\lambda_2\) = **Fiedler value**
- eigenvector for \(\lambda_2\) = **Fiedler vector**

We will colour nodes by Fiedler vector values and do a simple 2-way split.


In [None]:
nodelist = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodelist, dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

eigvals, eigvecs = np.linalg.eigh(L)

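# eigh returns eigenvalues in ascending order, so index 0 is lambda_1 = 0
# and index 1 is the Fiedler pair (this relies on G being connected).
lambda2 = eigvals[1]
v2 = eigvecs[:, 1]  # the sign is arbitrary: -v2 is an equally valid Fiedler vector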

print("Fiedler value lambda2 =", float(lambda2))

plt.figure(figsize=(6,4))
pos = nx.spring_layout(G, seed=2)
nx.draw(G, pos, with_labels=True, node_size=750, node_color=v2, cmap=plt.cm.coolwarm)
plt.title("Nodes coloured by Fiedler vector v2")
plt.show()
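
Before interpreting the colours, it can help to confirm that `v2` behaves like a Fiedler vector. The sketch below checks the eigenvector equation, the fact that the entries sum to zero (orthogonality to the all-ones vector, which holds for a connected graph), and that flipping the sign gives another valid eigenvector. That last point is why the two colours may swap between machines or library versions.

```python
import numpy as np
import networkx as nx

# Same edge list as the lab graph in A1.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0),
              (2, 4), (4, 5), (5, 6), (6, 4),
              (6, 7)])
nodelist = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodelist, dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues, orthonormal columns
lambda2, v2 = eigvals[1], eigvecs[:, 1]

print("L v2 = lambda2 v2:", np.allclose(L @ v2, lambda2 * v2))
print("Sum of entries of v2:", round(float(v2.sum()), 10))
print("Unit length:", np.isclose(np.linalg.norm(v2), 1.0))
print("-v2 is also an eigenvector:", np.allclose(L @ (-v2), lambda2 * (-v2)))
```

Since the entries sum to zero and the vector is not zero, it must contain both positive and negative values, so a split by sign always produces two non-empty groups.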


In [None]:
group_pos = [nodelist[i] for i, val in enumerate(v2) if val >= 0]
group_neg = [nodelist[i] for i, val in enumerate(v2) if val < 0]

print("Group v2 >= 0:", group_pos)
print("Group v2 <  0:", group_neg)

plt.figure(figsize=(6,4))
pos = nx.spring_layout(G, seed=2)
colors = ["tab:blue" if n in group_pos else "tab:orange" for n in nodelist]
nx.draw(G, pos, with_labels=True, node_size=750, node_color=colors)
plt.title("Simple spectral split by sign of Fiedler vector")
plt.show()
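
One simple way to judge a 2-way split is to count the edges that cross it. The sketch below recomputes the sign split, lists the crossing edges by hand, and compares the count with `nx.cut_size`. The names `crossing` and `cut` are illustrative, and a small cut is only one possible quality measure.

```python
import numpy as np
import networkx as nx

# Same edge list as the lab graph in A1.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0),
              (2, 4), (4, 5), (5, 6), (6, 4),
              (6, 7)])
nodelist = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodelist, dtype=float)
L = np.diag(A.sum(axis=1)) - A
v2 = np.linalg.eigh(L)[1][:, 1]        # Fiedler vector (column 1 of the eigenvectors)

# Split by sign, using sets for fast membership tests.
group_pos = {nodelist[i] for i, val in enumerate(v2) if val >= 0}
group_neg = set(nodelist) - group_pos

# An edge crosses the split when its endpoints lie in different groups.
crossing = [(u, v) for u, v in G.edges() if (u in group_pos) != (v in group_pos)]
cut = nx.cut_size(G, group_pos, group_neg)

print("Group sizes:", len(group_pos), len(group_neg))
print("Edges crossing the split:", crossing)
print("Cut size:", len(crossing), "| nx.cut_size:", cut)
```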


### Short answers (Part A)

Write 2–4 sentences each.

1. What did the number of zero eigenvalues tell you about connected components?
2. What is the Fiedler value/vector, and what does it help us do?


**Your answers here:**

1.  
2.  


# Part B — Centrality

We will compute and compare:
- Degree centrality
- Closeness centrality
- Betweenness centrality
- Eigenvector centrality
- PageRank

Different centralities capture **different meanings** of “importance”.


## B1) Compare centralities on three classic graphs

Graphs:
1. **Star** (one hub)
2. **Path** (a line)
3. **Barbell** (two dense groups connected by a bridge)

Run the cell and compare which nodes are “most central” under each measure.


In [None]:
def compute_centralities(H):
    deg = nx.degree_centrality(H)
    clo = nx.closeness_centrality(H)
    bet = nx.betweenness_centrality(H, normalized=True)
    try:
        eig = nx.eigenvector_centrality(H, max_iter=1000)
    except nx.PowerIterationFailedConvergence as e:
        eig = None
        print("Eigenvector centrality did not converge:", e)
    return deg, clo, bet, eig

def draw_with_node_size(H, scores, title):
    vals = np.array([scores[n] for n in H.nodes()])
    sizes = 300 + 2000 * (vals - vals.min()) / (vals.max() - vals.min() + 1e-12)
    plt.figure(figsize=(6,4))
    pos = nx.spring_layout(H, seed=2)
    nx.draw(H, pos, with_labels=True, node_size=sizes)
    plt.title(title)
    plt.show()

def top_k(scores, k=3):
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]

graphs = {
    "Star (n=10)": nx.star_graph(9),
    "Path (n=10)": nx.path_graph(10),
    "Barbell (m1=5,m2=1)": nx.barbell_graph(5, 1)
}

for name, H in graphs.items():
    print(f"\n=== {name} ===")
    deg, clo, bet, eig = compute_centralities(H)

    print("Top-3 Degree:", top_k(deg))
    print("Top-3 Closeness:", top_k(clo))
    print("Top-3 Betweenness:", top_k(bet))
    if eig is not None:
        print("Top-3 Eigenvector:", top_k(eig))

    draw_with_node_size(H, deg, f"{name} — node size ∝ Degree centrality")
    draw_with_node_size(H, clo, f"{name} — node size ∝ Closeness centrality")
    draw_with_node_size(H, bet, f"{name} — node size ∝ Betweenness centrality")
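
To connect the numbers above with their definitions, the sketch below recomputes two of the measures by hand on a small path graph: degree centrality as \(\deg(v)/(n-1)\) and closeness centrality as \((n-1)\) divided by the sum of shortest-path distances from \(v\). It assumes a connected, unweighted graph; the names `manual_degree` and `manual_closeness` are illustrative.

```python
import networkx as nx

H = nx.path_graph(5)          # nodes 0-1-2-3-4 in a line
n = H.number_of_nodes()

# Degree centrality: fraction of the other nodes that v is attached to.
manual_degree = {v: H.degree(v) / (n - 1) for v in H.nodes()}

# Closeness centrality: inverse of the average distance to all other nodes.
manual_closeness = {}
for v in H.nodes():
    dist = nx.single_source_shortest_path_length(H, v)   # BFS distances from v
    total = sum(d for u, d in dist.items() if u != v)
    manual_closeness[v] = (n - 1) / total

nx_degree = nx.degree_centrality(H)
nx_closeness = nx.closeness_centrality(H)
for v in H.nodes():
    print(v,
          "degree:", round(manual_degree[v], 4), round(nx_degree[v], 4),
          "closeness:", round(manual_closeness[v], 4), round(nx_closeness[v], 4))
```

For the middle node 2 the distances are 2, 1, 1, 2, so its closeness is \(4/6 \approx 0.667\).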


## B2) Explain differences (short writing)

Write 2–4 sentences each.

1. In the **star graph**, why is the centre node high for degree, closeness, and betweenness?
2. In the **barbell graph**, which nodes have the highest betweenness? Why?
3. What does eigenvector centrality try to capture (in one sentence)?


**Your answers here:**

1.  
2.  
3.  


## B3) PageRank (random surfer)

Compute PageRank for two values of the damping factor `alpha` (0.85 and 0.50) and compare the rankings.


In [None]:
DG = nx.DiGraph()
DG.add_edges_from([
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "A"),
    ("C", "D"),
    ("D", "C"),
    ("D", "E"),
    ("E", "D"),
])

plt.figure(figsize=(6,4))
pos = nx.spring_layout(DG, seed=3)
nx.draw(DG, pos, with_labels=True, node_size=700, arrows=True, arrowstyle="->", arrowsize=15)
plt.title("Toy directed graph for PageRank")
plt.show()

for alpha in [0.85, 0.50]:
    pr = nx.pagerank(DG, alpha=alpha)
    ranked = sorted(pr.items(), key=lambda x: x[1], reverse=True)
    print(f"\nalpha={alpha}")
    for node, score in ranked:
        print(f"  {node}: {score:.4f}")
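
The random-surfer idea can also be written out directly as a power iteration. This is a minimal sketch, not how `nx.pagerank` is implemented: it assumes every node has at least one out-link (true for the toy graph above) and a uniform teleport distribution. The function name `pagerank_power` and the matrix `M` are hypothetical.

```python
import numpy as np

# Same toy directed graph as in the cell above.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"),
         ("C", "D"), ("D", "C"), ("D", "E"), ("E", "D")]
idx = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# M[j, i] = probability of moving from i to j by following a random out-link.
M = np.zeros((n, n))
for u, v in edges:
    M[idx[v], idx[u]] = 1.0
M = M / M.sum(axis=0)   # assumes no dangling nodes (no zero column sums)

def pagerank_power(M, alpha, tol=1e-12, max_iter=1000):
    """Iterate p <- alpha * M p + (1 - alpha) / n until p stops changing."""
    size = M.shape[0]
    p = np.full(size, 1.0 / size)
    for _ in range(max_iter):
        p_next = alpha * (M @ p) + (1 - alpha) / size
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

results = {}
for alpha in [0.85, 0.50, 0.0]:
    results[alpha] = pagerank_power(M, alpha)
    scores = {name: round(float(s), 4) for name, s in zip(nodes, results[alpha])}
    print(f"alpha={alpha}:", scores)
```

With `alpha = 0` the surfer always teleports, so every node receives the same score \(1/n\); larger values of `alpha` give the link structure more weight.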


### Task B3B (short answer)

1. Conceptually, what does changing `alpha` do?
2. Did the ranking change between alpha = 0.85 and 0.50? Why might it?


**Your answers here:**

1.  
2.  


## B4) Wrap-up reflection

Answer briefly:

1. One situation where **betweenness** is more useful than **degree**.
2. One situation where **PageRank** makes more sense than **eigenvector centrality**.
3. One-sentence main message of this lab.


**Your answers here:**

1.  
2.  
3.  


---
## Troubleshooting

- Eigenvector centrality may fail to converge: increase `max_iter` or treat it as optional.
- Tiny negative eigenvalues (e.g. -1e-12) are rounding error; treat them as 0.
- If plots do not appear in VS Code, try opening the notebook in Jupyter Notebook in a browser instead.
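
For the first item, the sketch below shows one way to handle a convergence failure explicitly. It forces the failure on purpose with a very small `max_iter`, catches `nx.PowerIterationFailedConvergence`, and retries with a larger budget. The values 2 and 1000 are illustrative choices.

```python
import networkx as nx

H = nx.path_graph(10)

try:
    # Deliberately too few iterations, to trigger the failure on purpose.
    eig = nx.eigenvector_centrality(H, max_iter=2)
except nx.PowerIterationFailedConvergence as err:
    print("Did not converge:", err)
    # Retry with a much larger iteration budget.
    eig = nx.eigenvector_centrality(H, max_iter=1000)

top = max(eig, key=eig.get)
print("Most central node:", top, "score:", round(eig[top], 4))
```

Catching this specific exception, rather than every `Exception`, keeps unrelated errors such as typos visible.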
