# **Exercise 4: Graph convolution networks**

In this exercise, we will construct our own graph neural network by using PyTorch Geometric (PyG) and apply the model on one of Open Graph Benchmark (OGB) datasets. Those two datasets are used to benchmark the model performance on node property prediction, predicting properties of single nodes graph-related tasks.

  We will build our own graph neural networks by using PyTorch Geometric. And then apply and evaluate the models on node property prediction and grpah property prediction tasks.

**Note**: Make sure to **sequentially run all the cells in each section**, so that the intermediate variables / packages will carry over to the next cell.
You would need to use **GPU** for the exercise so please check the runtime.

PyTorch Geometric generally has two classes for storing or transforming the graphs into tensor format. One is the `torch_geometric.datasets`, which contains a variety of common graph datasets. Another one is `torch_geometric.data` that provides the data handling of graphs in PyTorch tensors.

### Open Graph Benchmark (OGB)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. Its datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can also be evaluated by using the OGB Evaluator in a unified manner.

In [12]:
# Do not delete this cell

## 1. Import Necessary Libraries

In this section, we import the essential Python libraries required for building, training, and evaluating our convolutional residual networks.

We will use:
- **PyTorch** for model definition, training, and evaluation.  
- **Torch_geometric** for loading and transforming the dataset.  
- **NumPy** for  numerical operations.  
- **tqdm** for tracking training progress.

Make sure all required packages are installed before proceeding.

In [13]:
!pip install torch_geometric
!pip install ogb



In [14]:
import torch
import numpy as np
import copy
import torch_geometric
from torch_geometric.datasets import TUDataset
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator
from torch_geometric.nn import GCNConv
# Optional: For reproducibility
import random

# Allowlist the required PyG classes
torch.serialization.add_safe_globals([
    torch_geometric.data.storage.GlobalStorage,
    torch_geometric.data.data.DataEdgeAttr,   # from previous error
    torch_geometric.data.data.DataTensorAttr  # from current error
])

# Set seeds
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)


<torch._C.Generator at 0x7b67f51821b0>

# GNN: Node Property Prediction

In this section we will build our first graph neural network by using PyTorch Geometric and apply it on node property prediction (node classification).

We will build the graph neural network by using GCN operator ([Kipf et al. (2017)](https://arxiv.org/pdf/1609.02907.pdf)).

You should use the PyG built-in `GCNConv` layer directly.

## 2. Download and Load the MNIST Dataset
In this section we will load ogbn-arxiv. This dataset contains the graph network for published research papers and the papers are sorted into different categories based on the provided embeddings and the citation network.

In [15]:
dataset_name = 'ogbn-arxiv'
# dataset = PygNodePropPredDataset(name=dataset_name, transform=T.ToSparseTensor())  # old version of PyG
dataset=PygNodePropPredDataset(name=dataset_name, transform=T.Compose([T.ToUndirected(), T.ToSparseTensor()]))

data = dataset[0]

# Make the adjacency matrix to symmetric
#data.adj_t = data.adj_t.to_symmetric()  # old version of PyG

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# If you use GPU, the device should be cuda
print('Device: {}'.format(device))


data.to(device)
split_idx = dataset.get_idx_split()
train_idx = split_idx['train'].to(device)


Downloading http://snap.stanford.edu/ogb/data/nodeproppred/arxiv.zip


Downloaded 0.08 GB: 100%|██████████| 81/81 [00:01<00:00, 71.29it/s]


Extracting dataset/arxiv.zip


Processing...


Loading necessary files...
This might take a while.
Processing graphs...


100%|██████████| 1/1 [00:00<00:00, 12671.61it/s]


Converting graphs into PyG objects...


100%|██████████| 1/1 [00:00<00:00, 552.39it/s]

Saving...



Done!
  adj = torch.sparse_csr_tensor(


Device: cuda


In [16]:
print('The {} dataset has {} graph'.format(dataset_name, len(dataset)))
print(data)

The ogbn-arxiv dataset has 1 graph
Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343])


| Attribute   | Shape            | Meaning                                                                |
| ----------- | ---------------- | ---------------------------------------------------------------------- |
| `num_nodes` | 169,343          | Total number of nodes (papers)                                         |
| `x`         | [169343, 128]    | Node features: 128-dimensional embeddings for each paper               |
| `node_year` | [169343, 1]      | Year of publication for each paper (used as a label)                   |
| `y`         | [169343, 1]      | Node labels        |
| `adj_t`     | [169343, 169343] | Sparse adjacency matrix of the citation network (sparse tensor format) |


# 3. Model Definition and Training for GCN Model

In the following section we will now we will design implement our GCN model!
The following model contains convolution layers, batch normalization layers and softmax layer.

Please follow the figure below to implement your `forward` function.


![model architecture](model_arch.png)

In [17]:
import torch.nn.functional as F

class GCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers,
                 dropout, return_embeds=False):
        # TODO: Implement this function that initializes self.convs,
        # self.bns, and self.softmax.

        super(GCN, self).__init__()

        """
        TODO: Implement this function that initializes self.convs,
        self.bns, and self.softmax.
        Note:
        1. You should use torch.nn.ModuleList for self.convs and self.bns
        2. self.convs has num_layers GCNConv layers. use input_dim, hidden_dim, and output_dim for in_channels and out_channels.
        First layer takes input_dim, hidden_dim and last layer outputs hidden_dim, output_dim
        3. self.bns has num_layers - 1 BatchNorm1d layers
        4. You should use torch.nn.LogSoftmax for self.softmax
        5. The parameters you can set for GCNConv include 'in_channels' and
        'out_channels'.
        6. The only parameter you need to set for BatchNorm1d is 'num_features'
        """
        # YOUR CODE HERE
        # A list of GCNConv layers
        self.convs = torch.nn.ModuleList()
        # first layer
        self.convs.append(GCNConv(input_dim, hidden_dim))
        # middle layers (if any)
        for _ in range(num_layers - 2):
            self.convs.append(GCNConv(hidden_dim, hidden_dim))
        # last layer
        self.convs.append(GCNConv(hidden_dim, output_dim))

        # A list of 1D batch normalization layers
        self.bns = torch.nn.ModuleList()
        for _ in range(num_layers - 1):
            self.bns.append(torch.nn.BatchNorm1d(hidden_dim))

        # The log softmax layer
        self.softmax = torch.nn.LogSoftmax(dim=-1)

        # Probability of an element to be zeroed
        self.dropout = dropout

        # Skip classification layer and return node embeddings
        self.return_embeds = return_embeds

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()
        for bn in self.bns:
            bn.reset_parameters()

    def forward(self, x, adj_t):


        out = x
        """
        TODO: Implement this function that takes the feature tensor x,
        edge_index tensor adj_t and returns the output tensor as
        shown in the figure.
        Note:
        1. Construct the network as showing in the figure
        2. torch.nn.functional.relu and torch.nn.functional.dropout are useful
        3. Don't forget to set F.dropout training to self.training
        4. If return_embeds is True, then skip the last softmax layer
        """
        # YOUR CODE HERE
        for i in range(len(self.convs) - 1):
            out = self.convs[i](out, adj_t)
            out = self.bns[i](out)
            out = F.relu(out)
            out = F.dropout(out, p=self.dropout, training=self.training)
        out = self.convs[-1](out, adj_t)
        if self.return_embeds:
            return out
        out = self.softmax(out)

        return out

In [18]:
def train(model, data, train_idx, optimizer, loss_fn):

    model.train()
    loss = 0

    """
    TODO: Implement this function that trains the model by
    using the given optimizer and loss_fn.
    Note:
    1. Zero grad the optimizer
    2. Feed the data into the model
    3. Slicing the model output and label by train_idx
    4. Feed the sliced output and label to loss_fn
    """
    # YOUR CODE HERE
    optimizer.zero_grad()
    out = model(data.x, data.adj_t)
    out_train = out[train_idx]
    y_train = data.y[train_idx].squeeze(1)
    loss = loss_fn(out_train, y_train)

    loss.backward()
    optimizer.step()

    return loss.item()

In [19]:
# Test function here
@torch.no_grad()
def test(model, data, split_idx, evaluator):

    model.eval()

    # The output of model on all data
    out = None


    """
    TODO: Implement this function that tests the model by
    using the given split_idx and evaluator.
    Note:
    1. No index slicing here
    """
    # YOUR CODE HERE
    out = model(data.x, data.adj_t)



    y_pred = out.argmax(dim=-1, keepdim=True)

    train_acc = evaluator.eval({
        'y_true': data.y[split_idx['train']],
        'y_pred': y_pred[split_idx['train']],
    })['acc']
    valid_acc = evaluator.eval({
        'y_true': data.y[split_idx['valid']],
        'y_pred': y_pred[split_idx['valid']],
    })['acc']
    test_acc = evaluator.eval({
        'y_true': data.y[split_idx['test']],
        'y_pred': y_pred[split_idx['test']],
    })['acc']

    return train_acc, valid_acc, test_acc

In [20]:
# Please do not change the args
device = 'cuda' if torch.cuda.is_available() else 'cpu'
args = {
    'device': device,
    'num_layers': 3,
    'hidden_dim': 256,
    'dropout': 0.5,
    'lr': 0.01,
    'epochs': 100,
}
model = GCN(data.num_features, args['hidden_dim'],
            dataset.num_classes, args['num_layers'],
            args['dropout']).to(device)
evaluator = Evaluator(name='ogbn-arxiv')

In [21]:
# reset the parameters to initial random value
model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
loss_fn = F.nll_loss

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
  loss = train(model, data, train_idx, optimizer, loss_fn)
  result = test(model, data, split_idx, evaluator)
  train_acc, valid_acc, test_acc = result
  if valid_acc > best_valid_acc:
      best_valid_acc = valid_acc
      best_model = copy.deepcopy(model)
  if epoch%10==0:
    print(f'Epoch: {epoch:02d}, 'f'Loss: {loss:.4f}, 'f'Train: {100 * train_acc:.2f}%, 'f'Valid: {100 * valid_acc:.2f}% 'f'Test: {100 * test_acc:.2f}%')

Epoch: 10, Loss: 1.3733, Train: 41.49%, Valid: 35.19% Test: 37.99%
Epoch: 20, Loss: 1.1861, Train: 53.97%, Valid: 52.95% Test: 52.70%
Epoch: 30, Loss: 1.0919, Train: 64.93%, Valid: 64.83% Test: 65.32%
Epoch: 40, Loss: 1.0443, Train: 69.47%, Valid: 68.41% Test: 67.57%
Epoch: 50, Loss: 1.0108, Train: 70.84%, Valid: 70.35% Test: 69.95%
Epoch: 60, Loss: 0.9848, Train: 71.88%, Valid: 70.90% Test: 70.47%
Epoch: 70, Loss: 0.9611, Train: 72.64%, Valid: 71.35% Test: 70.58%
Epoch: 80, Loss: 0.9423, Train: 73.11%, Valid: 71.51% Test: 70.57%
Epoch: 90, Loss: 0.9260, Train: 73.43%, Valid: 71.59% Test: 70.75%
Epoch: 100, Loss: 0.9101, Train: 73.82%, Valid: 71.45% Test: 70.32%


In [22]:
best_result = test(model, data, split_idx, evaluator)
train_acc, valid_acc, test_acc = best_result
print(f'Best model: '
      f'Train: {100 * train_acc:.2f}%, '
      f'Valid: {100 * valid_acc:.2f}% '
      f'Test: {100 * test_acc:.2f}%')

Best model: Train: 73.82%, Valid: 71.45% Test: 70.32%


In [23]:
# This cell contains hidden test cases that will be evaluated after submission

In [24]:
# This cell contains hidden test cases that will be evaluated after submission