# **Graph Classification Using Graph Neural Networks (GNNs)**
The **goal** of this project is to automatically identify the type of a graph (Path, Cycle, Complete, Star, or Tree) using a Graph Neural Network (GNN).

Input → a graph (nodes + edges)

Output → graph class (0 to 4)

This is graph-level classification, not node-level.

**This project generates different graph structures, extracts meaningful node features, converts them into PyTorch Geometric format, and trains a Graph Neural Network to correctly classify the entire graph into its structural category.**

In [None]:

!pip -q install torch_geometric pyvis

Purpose of this cell

This cell creates graphs and prepares the dataset. It:

- Saves all the code below into a file named graph_generator.py (via %%writefile)
- Allows reuse later using import

📌 Why needed

- Clean project structure
- Separates data generation from model code

The GraphGenerator class is responsible for:

- Creating graphs
- Adding features
- Converting graphs into ML format (PyTorch Geometric Data objects)

Each node gets 3 features:

- Degree
- Clustering coefficient
- Betweenness centrality

📌 These are the input features to the GNN.

The label tells the model which class the graph belongs to:

- 0 → Path
- 1 → Cycle
- 2 → Complete
- 3 → Star
- 4 → Tree
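To make the node features concrete, here is a minimal sketch that computes degree and clustering coefficient by hand for a small star and a small complete graph. It is an illustration only. The adjacency-dict format and the name `degree_and_clustering` are assumptions made for this example. The project itself gets these values from networkx, and betweenness centrality is left out for brevity.

```python
from itertools import combinations

def degree_and_clustering(adj):
    """Return {node: (degree, clustering coefficient)} for an undirected graph.

    adj maps each node to the set of its neighbours (hypothetical helper format).
    """
    features = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        # Clustering = realised neighbour pairs / possible neighbour pairs.
        possible = k * (k - 1) / 2
        features[node] = (k, links / possible if possible else 0.0)
    return features

# Star with centre 0 and three leaves: no two neighbours are connected.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
# Complete graph on four nodes: every pair of neighbours is connected.
complete = {i: {j for j in range(4) if j != i} for i in range(4)}

print(degree_and_clustering(star)[0])      # (3, 0.0)
print(degree_and_clustering(complete)[0])  # (3, 1.0)
```

Node 0 has degree 3 in both graphs, but its clustering coefficient is 0 in the star and 1 in the complete graph. Differences of this kind are what the GNN can learn from.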
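
PyTorch Geometric requires:

- x → node features (one row of 3 values per node)
- edge_index → connections (a 2 × E tensor of node-index pairs; each undirected edge is stored in both directions)
- y → graph label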
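The three fields can be pictured with plain Python lists. The sketch below lays out a 3-node path graph the way `networkx_to_pyg` does, with forward edges first and their reversed copies after. The helper name `to_edge_index` is hypothetical, and the real code wraps these lists in torch tensors inside a `Data` object.

```python
def to_edge_index(edges):
    """Build a 2 x 2E edge list storing each undirected edge in both directions."""
    src = [u for u, v in edges] + [v for u, v in edges]
    dst = [v for u, v in edges] + [u for u, v in edges]
    return [src, dst]

# Path graph 0 - 1 - 2
edges = [(0, 1), (1, 2)]
edge_index = to_edge_index(edges)

# One row per node: [degree, clustering, normalized betweenness]
x = [
    [1.0, 0.0, 0.0],  # end node
    [2.0, 0.0, 1.0],  # middle node lies on the only shortest path between 0 and 2
    [1.0, 0.0, 0.0],  # end node
]
y = [0]  # label 0 = Path

print(edge_index)  # [[0, 1, 1, 2], [1, 2, 0, 1]]
```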
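
generate_dataset creates:

- 100 Path
- 100 Cycle
- 100 Complete
- 100 Star
- 100 Tree

📌 Total = 500 graphs

Path, Cycle, Complete, and Star graphs each get a random node count between 8 and 15. Every Tree is a balanced binary tree of height 3 (15 nodes).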

Cell 2 defines the GNN architecture plus training and testing.

The model is a Graph Neural Network built from GCN (graph convolution) layers. Each GCN layer:

- Learns from neighbors
- Updates node representations
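As a rough illustration of "learning from neighbors", the sketch below performs one GCN-style propagation step on a 3-node path, using a single scalar feature per node. It follows the standard GCN normalisation: a self-loop is added to every node, and each message is scaled by 1 / sqrt(deg_i * deg_j). The learned weight matrix, bias, and ReLU are deliberately omitted, and the function name `gcn_propagate` and the dict-based graph format are assumptions for this example.

```python
import math

def gcn_propagate(adj, x):
    """One GCN propagation step without learned weights.

    adj: {node: set of neighbours}, x: {node: float feature}.
    """
    deg = {i: len(nbrs) + 1 for i, nbrs in adj.items()}  # +1 for the self-loop
    out = {}
    for i, nbrs in adj.items():
        total = x[i] / deg[i]  # self-loop term: x_i / sqrt(deg_i * deg_i)
        for j in nbrs:
            total += x[j] / math.sqrt(deg[i] * deg[j])
        out[i] = total
    return out

# Path graph 0 - 1 - 2, with a signal placed only on node 0
path = {0: {1}, 1: {0, 2}, 2: {1}}
h = gcn_propagate(path, {0: 1.0, 1: 0.0, 2: 0.0})
print(h)
```

After one step the signal has reached node 1 but not node 2. Stacking three such layers, as `GraphClassifier` does, lets information travel up to three hops.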

A global mean pooling step then converts node features → graph feature.

📌 Very important:

We classify the whole graph, not individual nodes.
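The pooling step can be sketched in plain Python. The function below averages node vectors per graph, using a `batch` list that says which graph each node belongs to. This is the same role the `batch` argument plays in `global_mean_pool`. The name `global_mean_pool_py` is hypothetical, and the sketch assumes graph indices run from 0 upward with at least one node each.

```python
def global_mean_pool_py(h, batch):
    """Average node vectors per graph. batch[i] is the graph index of node i."""
    num_graphs = max(batch) + 1
    dim = len(h[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for vec, g in zip(h, batch):
        counts[g] += 1
        for d, value in enumerate(vec):
            sums[g][d] += value
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]

# Three nodes: the first two belong to graph 0, the third to graph 1
h = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
batch = [0, 0, 1]
print(global_mean_pool_py(h, batch))  # [[2.0, 3.0], [5.0, 6.0]]
```

Whatever the number of nodes, each graph ends up with one fixed-size vector, which is what the final linear layers classify.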

Steps inside training:

1. Forward pass
2. Compute loss
3. Backpropagation
4. Update weights
5. Calculate accuracy

Testing measures:

- Overall accuracy
- Per-class accuracy

In [None]:
%%writefile graph_generator.py
import networkx as nx
import torch
import numpy as np

class GraphGenerator:
    def __init__(self, seed=42):
        self.seed = seed
        np.random.seed(seed)
        torch.manual_seed(seed)

    def generate_path_graph(self, n: int = 10) -> nx.Graph:
        G = nx.path_graph(n)
        return self._add_features(G, graph_type=0, graph_name='Path')

    def generate_cycle_graph(self, n: int = 10) -> nx.Graph:
        G = nx.cycle_graph(n)
        return self._add_features(G, graph_type=1, graph_name='Cycle')

    def generate_complete_graph(self, n: int = 10) -> nx.Graph:
        G = nx.complete_graph(n)
        return self._add_features(G, graph_type=2, graph_name='Complete')

    def generate_star_graph(self, n: int = 10) -> nx.Graph:
        G = nx.star_graph(n - 1)
        return self._add_features(G, graph_type=3, graph_name='Star')

    def generate_tree_graph(self, height: int = 3, branching: int = 2) -> nx.Graph:
        G = nx.balanced_tree(branching, height)
        return self._add_features(G, graph_type=4, graph_name='Tree')

    def _add_features(self, G: nx.Graph, graph_type: int, graph_name: str) -> nx.Graph:
        clustering = nx.clustering(G)
        try:
            betweenness = nx.betweenness_centrality(G, normalized=True)
        except Exception:  # avoid a bare except, which would also catch KeyboardInterrupt
            betweenness = {node: 0.0 for node in G.nodes()}

        for node in G.nodes():
            degree = G.degree(node)
            G.nodes[node]['degree'] = degree
            G.nodes[node]['clustering'] = clustering[node]
            G.nodes[node]['betweenness'] = betweenness[node]
            G.nodes[node]['feature'] = [float(degree), float(clustering[node]), float(betweenness[node])]

        G.graph['label'] = graph_type
        G.graph['name'] = graph_name
        G.graph['num_nodes'] = G.number_of_nodes()
        G.graph['num_edges'] = G.number_of_edges()
        return G

    def networkx_to_pyg(self, G: nx.Graph):
        from torch_geometric.data import Data

        node_list = sorted(G.nodes())
        features = [G.nodes[node]['feature'] for node in node_list]
        x = torch.tensor(features, dtype=torch.float)

        node_map = {node: idx for idx, node in enumerate(node_list)}
        edges = [(node_map[u], node_map[v]) for u, v in G.edges()]

        if len(edges) > 0:
            edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
            edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)  # undirected
        else:
            edge_index = torch.empty((2, 0), dtype=torch.long)

        y = torch.tensor([G.graph['label']], dtype=torch.long)
        return Data(x=x, edge_index=edge_index, y=y, num_nodes=len(node_list))

    def generate_dataset(self, graphs_per_type: int = 100, min_nodes: int = 8, max_nodes: int = 15) -> list:
        dataset = []

        for _ in range(graphs_per_type):
            n = np.random.randint(min_nodes, max_nodes + 1)
            dataset.append(self.networkx_to_pyg(self.generate_path_graph(n)))

        for _ in range(graphs_per_type):
            n = np.random.randint(min_nodes, max_nodes + 1)
            dataset.append(self.networkx_to_pyg(self.generate_cycle_graph(n)))

        for _ in range(graphs_per_type):
            n = np.random.randint(min_nodes, max_nodes + 1)
            dataset.append(self.networkx_to_pyg(self.generate_complete_graph(n)))

        for _ in range(graphs_per_type):
            n = np.random.randint(min_nodes, max_nodes + 1)
            dataset.append(self.networkx_to_pyg(self.generate_star_graph(n)))

        for _ in range(graphs_per_type):
            dataset.append(self.networkx_to_pyg(self.generate_tree_graph(height=3, branching=2)))

        return dataset


Overwriting graph_generator.py


CELL 3 - Reproducibility & Visualization

Fixing the random seeds ensures the same results every run.

Graph examples
example_graphs = {...}

📌 Generates one example of each graph type

Graph statistics
density = m / max_edges

📌 Used to compare graph structures (m is the number of edges and max_edges = n(n-1)/2 for n nodes)



CELL 4 - Model Setup
model = GraphClassifier(...).to(device)

📌 Creates the GNN model

Optimizer
optimizer = torch.optim.Adam(...)

📌 Adam optimizer:

- Fast convergence
- Common in deep learning

Parameter count
sum(p.numel() for p in model.parameters())

📌 Shows model complexity

In [None]:
%%writefile gnn_model.py
import torch
import torch.nn.functional as F
from torch.nn import Linear, Dropout
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphClassifier(torch.nn.Module):
    def __init__(self, num_node_features: int = 3, hidden_channels: int = 64, num_classes: int = 5, dropout_rate: float = 0.5):
        super().__init__()
        torch.manual_seed(42)

        self.conv1 = GCNConv(num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.conv3 = GCNConv(hidden_channels, hidden_channels)

        self.dropout = Dropout(dropout_rate)
        self.lin1 = Linear(hidden_channels, hidden_channels // 2)
        self.lin2 = Linear(hidden_channels // 2, num_classes)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.dropout(h)

        h = self.conv2(h, edge_index).relu()
        h = self.dropout(h)

        h = self.conv3(h, edge_index).relu()

        h_graph = global_mean_pool(h, batch)

        h_graph = self.lin1(h_graph).relu()
        h_graph = self.dropout(h_graph)
        out = self.lin2(h_graph)
        return out, h_graph

def train_model(model, train_loader, optimizer, device):
    model.train()
    total_loss, correct, total = 0.0, 0, 0

    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()

        out, _ = model(data.x, data.edge_index, data.batch)
        loss = F.cross_entropy(out, data.y)

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        pred = out.argmax(dim=1)
        correct += (pred == data.y).sum().item()
        total += data.y.size(0)

    return total_loss / len(train_loader), correct / total

def test_model(model, loader, device, num_classes=5):
    model.eval()
    correct, total = 0, 0
    class_correct = [0]*num_classes
    class_total = [0]*num_classes

    with torch.no_grad():
        for data in loader:
            data = data.to(device)
            out, _ = model(data.x, data.edge_index, data.batch)
            pred = out.argmax(dim=1)

            correct += (pred == data.y).sum().item()
            total += data.y.size(0)

            for i in range(data.y.size(0)):
                y = data.y[i].item()
                class_total[y] += 1
                if pred[i].item() == y:
                    class_correct[y] += 1

    overall = correct / total
    per_class = [(class_correct[i]/class_total[i] if class_total[i] else 0.0) for i in range(num_classes)]
    return overall, per_class

def predict_single_graph(model, graph_data, device):
    model.eval()
    with torch.no_grad():
        graph_data = graph_data.to(device)
        batch = torch.zeros(graph_data.x.size(0), dtype=torch.long, device=device)
        out, _ = model(graph_data.x, graph_data.edge_index, batch)
        probs = F.softmax(out, dim=1)
        pred = out.argmax(dim=1).item()
        conf = probs[0, pred].item()
    return pred, conf, probs[0].detach().cpu().numpy()


Overwriting gnn_model.py


In [None]:
%%writefile visualizer.py
from pyvis.network import Network
import os

class GraphVisualizer:
    def __init__(self, height='700px', width='100%', output_dir='outputs'):
        self.height = height
        self.width = width
        self.output_dir = output_dir
        self.colors = {
            'path': '#FF6B6B',
            'cycle': '#4ECDC4',
            'complete': '#45B7D1',
            'star': '#FFA07A',
            'tree': '#98D8C8'
        }
        os.makedirs(output_dir, exist_ok=True)

    def visualize_graph(self, G, graph_name, output_file, prediction=None, confidence=None):
        net = Network(height=self.height, width=self.width, notebook=True, heading='')
        net.toggle_physics(True)

        color = self.colors.get(graph_name.lower(), '#95E1D3')

        for node in G.nodes():
            degree = G.degree(node)
            clustering = G.nodes[node].get('clustering', 0.0)
            betweenness = G.nodes[node].get('betweenness', 0.0)

            title = f"""
            <b>Node {node}</b><br>
            Degree: {degree}<br>
            Clustering: {clustering:.3f}<br>
            Betweenness: {betweenness:.3f}
            """
            size = 15 + degree * 4
            net.add_node(node, label=str(node), title=title, color=color, size=size)

        for u, v in G.edges():
            net.add_edge(u, v, color='#888888', width=2)

        header = f"<h2>{graph_name.capitalize()} Graph</h2>"
        header += f"<p>Vertices: {G.number_of_nodes()} | Edges: {G.number_of_edges()}</p>"
        if prediction is not None:
            header += f"<p><b>Predicted:</b> {prediction}"
            if confidence is not None:
                header += f" (Confidence: {confidence:.2%})"
            header += "</p>"

        outpath = os.path.join(self.output_dir, output_file)
        net.save_graph(outpath)

        # Insert header after <body>
        with open(outpath, "r", encoding="utf-8") as f:
            html = f.read()
        html = html.replace("<body>", f"<body><div style='text-align:center;'>{header}</div>")
        with open(outpath, "w", encoding="utf-8") as f:
            f.write(html)

        print("Saved:", outpath)


Overwriting visualizer.py


In [None]:
import os, random
import numpy as np
import pandas as pd
import torch

from graph_generator import GraphGenerator
from visualizer import GraphVisualizer


In [None]:
# Reproducibility
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

generator = GraphGenerator(seed=seed)
visualizer = GraphVisualizer(output_dir="outputs")

example_graphs = {
    'path': generator.generate_path_graph(10),
    'cycle': generator.generate_cycle_graph(10),
    'complete': generator.generate_complete_graph(8),
    'star': generator.generate_star_graph(10),
    'tree': generator.generate_tree_graph(height=3, branching=2)
}

rows = []
for name, G in example_graphs.items():
    n = G.number_of_nodes()
    m = G.number_of_edges()
    max_edges = n*(n-1)/2
    density = (m/max_edges) if max_edges else 0
    rows.append([name, n, m, density])

df = pd.DataFrame(rows, columns=["Graph Type", "Vertices |V|", "Edges |E|", "Density"])
df


  Graph Type  Vertices |V|  Edges |E|   Density
0       path            10          9  0.200000
1      cycle            10         10  0.222222
2   complete             8         28  1.000000
3       star            10          9  0.200000
4       tree            15         14  0.133333
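The density column can be reproduced from the node and edge counts alone. The short sketch below recomputes it for the five example graphs. The `density` helper is written just for this check and mirrors the formula used in the cell above.

```python
def density(n, m):
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    max_edges = n * (n - 1) / 2
    return m / max_edges if max_edges else 0.0

# (nodes, edges) for the five example graphs in the table above
examples = {
    "path": (10, 9),      # n - 1 edges
    "cycle": (10, 10),    # n edges
    "complete": (8, 28),  # n(n-1)/2 edges
    "star": (10, 9),      # n - 1 edges
    "tree": (15, 14),     # n - 1 edges
}
for name, (n, m) in examples.items():
    print(f"{name:9s} {density(n, m):.6f}")
```

The path and the star have the same node count, edge count, and density, so density alone cannot tell them apart. The per-node features (degree, clustering, betweenness) carry the distinguishing information.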


In [None]:
for name, G in example_graphs.items():
    visualizer.visualize_graph(G, name, f"{name}_graph.html")


Saved: outputs/path_graph.html
Saved: outputs/cycle_graph.html
Saved: outputs/complete_graph.html
Saved: outputs/star_graph.html
Saved: outputs/tree_graph.html


In [None]:
from torch_geometric.loader import DataLoader

dataset = generator.generate_dataset(graphs_per_type=100, min_nodes=8, max_nodes=15)
random.shuffle(dataset)

train_size = int(0.8 * len(dataset))
train_dataset = dataset[:train_size]
test_dataset  = dataset[train_size:]

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader  = DataLoader(test_dataset, batch_size=32, shuffle=False)

print("Total graphs:", len(dataset))
print("Train:", len(train_dataset), "Test:", len(test_dataset))
print("Node feature dimension:", dataset[0].x.shape[1])


Total graphs: 500
Train: 400 Test: 100
Node feature dimension: 3
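The shuffle-and-slice split above does not guarantee that each class appears in the same proportion in train and test. As an alternative (not what this notebook does), a stratified split can be sketched with the standard library alone. The function name `stratified_split` and the plain integer label list are assumptions for this example.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx) with the same class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class
        cut = int(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx

labels = [c for c in range(5) for _ in range(100)]  # 100 graphs per class
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 400 100
```

With the real dataset, the labels could be read as `[d.y.item() for d in dataset]` and the returned indices used to pick the train and test graphs.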


In [None]:
from gnn_model import GraphClassifier, train_model, test_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GraphClassifier(num_node_features=3, hidden_channels=64, num_classes=5, dropout_rate=0.5).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

print("Device:", device)
print("Total parameters:", sum(p.numel() for p in model.parameters()))


Device: cuda
Total parameters: 10821
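The reported total can be checked by hand. The sketch below assumes every layer (both `GCNConv` and `Linear`) holds one input × output weight matrix plus one bias vector. Under that assumption the sum matches the printed value.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a layer mapping n_in features to n_out."""
    return n_in * n_out + n_out

features, hidden, classes = 3, 64, 5
layers = {
    "conv1": dense_params(features, hidden),    # 3*64 + 64   = 256
    "conv2": dense_params(hidden, hidden),      # 64*64 + 64  = 4160
    "conv3": dense_params(hidden, hidden),      # 64*64 + 64  = 4160
    "lin1": dense_params(hidden, hidden // 2),  # 64*32 + 32  = 2080
    "lin2": dense_params(hidden // 2, classes), # 32*5 + 5    = 165
}
total = sum(layers.values())
print(total)  # 10821
```

Most of the parameters sit in the two 64 × 64 graph convolutions.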


In [None]:
best_acc = 0.0
best_path = "outputs/best_model.pth"
os.makedirs("outputs", exist_ok=True)

EPOCHS = 50  # increase (e.g. to 100) for a longer training run

for epoch in range(1, EPOCHS+1):
    train_loss, train_acc = train_model(model, train_loader, optimizer, device)

    if epoch == 1 or epoch % 10 == 0:
        test_acc, _ = test_model(model, test_loader, device, num_classes=5)
        print(f"Epoch {epoch:3d} | loss={train_loss:.4f} | train_acc={train_acc:.4f} | test_acc={test_acc:.4f}")

        # Note: the checkpoint is chosen by test accuracy, so best_acc is an optimistic estimate
        if test_acc > best_acc:
            best_acc = test_acc
            torch.save(model.state_dict(), best_path)

print("Best saved test accuracy:", best_acc)


Epoch   1 | loss=1.4431 | train_acc=0.2925 | test_acc=0.3300
Epoch  10 | loss=0.4789 | train_acc=0.7600 | test_acc=0.8400
Epoch  20 | loss=0.4365 | train_acc=0.7475 | test_acc=1.0000
Epoch  30 | loss=0.3515 | train_acc=0.8450 | test_acc=0.8200
Epoch  40 | loss=0.3552 | train_acc=0.8075 | test_acc=0.8200
Epoch  50 | loss=0.4326 | train_acc=0.7925 | test_acc=0.9500
Best saved test accuracy: 1.0


In [None]:
class_names = ['Path', 'Cycle', 'Complete', 'Star', 'Tree']

model.load_state_dict(torch.load(best_path, map_location=device))
test_acc, class_accs = test_model(model, test_loader, device, num_classes=5)

print("Overall test accuracy:", test_acc)
for name, acc in zip(class_names, class_accs):
    print(f"{name:10s}: {acc:.4f}")


Overall test accuracy: 1.0
Path      : 1.0000
Cycle     : 1.0000
Complete  : 1.0000
Star      : 1.0000
Tree      : 1.0000


In [None]:
from gnn_model import predict_single_graph

test_graphs = {
    'path_test': generator.generate_path_graph(12),
    'cycle_test': generator.generate_cycle_graph(11),
    'complete_test': generator.generate_complete_graph(7),
    'star_test': generator.generate_star_graph(13),
    'tree_test': generator.generate_tree_graph(height=3, branching=2)
}

for name, G in test_graphs.items():
    data = generator.networkx_to_pyg(G)
    pred_class, conf, probs = predict_single_graph(model, data, device)

    true_label = class_names[G.graph['label']]
    pred_label = class_names[pred_class]

    print(f"{name:12s} | true={true_label:8s} | pred={pred_label:8s} | conf={conf:.2%}")

    visualizer.visualize_graph(G, name, f"{name}_predicted.html", prediction=pred_label, confidence=conf)


path_test    | true=Path     | pred=Path     | conf=99.85%
Saved: outputs/path_test_predicted.html
cycle_test   | true=Cycle    | pred=Cycle    | conf=50.86%
Saved: outputs/cycle_test_predicted.html
complete_test | true=Complete | pred=Complete | conf=100.00%
Saved: outputs/complete_test_predicted.html
star_test    | true=Star     | pred=Star     | conf=98.15%
Saved: outputs/star_test_predicted.html
tree_test    | true=Tree     | pred=Tree     | conf=50.81%
Saved: outputs/tree_test_predicted.html
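The `conf` value printed above is the softmax probability of the predicted class, as computed in `predict_single_graph`. The sketch below shows how that number behaves. The logits are hypothetical and are not taken from the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 5 classes (Path, Cycle, Complete, Star, Tree)
clear = softmax([8.0, 0.5, 0.0, -1.0, 0.2])   # one class dominates
tied = softmax([0.1, 4.03, -3.0, -2.0, 4.0])  # two classes nearly tied

for probs in (clear, tied):
    pred = max(range(len(probs)), key=probs.__getitem__)
    print(pred, f"{probs[pred]:.2%}")
```

When one logit dominates, the confidence is close to 100%. When two logits are almost equal, the winning class gets only about half the probability mass. A correct prediction reported at roughly 50% is consistent with the model scoring a second class nearly as high.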


In [None]:
from google.colab import files
files.download("outputs/path_graph.html")

