## Third Session

----------------------------------

## Node Classification with [Deep Graph Library (DGL)](https://docs.dgl.ai/index.html) for the graduate course "[Graph Machine learning](https://github.com/zahta/graph_ml)"
##### by [Zahra Taheri](https://github.com/zahta), 06 June 2023

----------------------------------

### This Tutorial Is Prepared Based on the Following References

- [A Blitz Introduction to DGL](https://docs.dgl.ai/en/latest/tutorials/blitz/index.html)
  
- [User Guide](https://docs.dgl.ai/en/latest/guide/index.html#)


```python
%matplotlib inline
import os

# DGLBACKEND must be set before dgl is imported.
os.environ["DGLBACKEND"] = "pytorch"
import dgl
import numpy as np
import networkx as nx
import torch
import torch.nn as nn
import dgl.function as fn
import torch.nn.functional as F
```

### Overview of Node Classification with GNN

One of the most popular and widely adopted tasks on graph data is node
classification, where a model needs to predict the ground-truth category
of each node. Before graph neural networks, many proposed methods used
either connectivity alone (such as DeepWalk or node2vec) or simple
combinations of connectivity and the node's own features. GNNs, by
contrast, offer an opportunity to obtain node representations by
combining the connectivity and features of a *local neighborhood*.

[A paper by Kipf and Welling](https://arxiv.org/abs/1609.02907) is an example that formulates
node classification as a semi-supervised learning task. With only a
small portion of labeled nodes, a graph neural network (GNN) can
accurately predict the categories of the remaining nodes.

This tutorial shows how to build such a GNN for semi-supervised node
classification with only a small number of labels on the Cora dataset,
a citation network with papers as nodes and citations as edges. The task
is to predict the category of a given paper. Each paper node has a
word-count vector as its features, normalized so that each vector sums
to one, as described in Section 5.2 of [the paper](https://arxiv.org/abs/1609.02907).
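As a small illustration of that normalization, the NumPy sketch below row-normalizes a toy word-count matrix so that each paper's feature vector sums to one. The toy counts and the helper name `row_normalize` are hypothetical; according to the description above, Cora's features already come normalized, so this is for illustration only.

```python
import numpy as np


def row_normalize(counts: np.ndarray) -> np.ndarray:
    """Scale each row of a non-negative count matrix so it sums to one.

    Rows that are all zeros (no words) stay all zeros instead of
    producing NaNs from a division by zero.
    """
    counts = np.asarray(counts, dtype=np.float64)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Replace zero sums with 1 so empty rows remain all-zero after division.
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return counts / safe_sums


# Hypothetical bag-of-words matrix: 3 papers (rows) x 4 vocabulary words (columns).
word_counts = np.array([
    [2, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 3, 0, 1],
])
features = row_normalize(word_counts)
print(features.sum(axis=1))  # -> [1. 0. 1.]
```

Guarding against empty rows matters because a paper with no vocabulary words would otherwise turn into a row of NaNs that propagates through every layer of the network.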

#### Loading Cora Dataset

```python
dataset = dgl.data.CoraGraphDataset()
print("Number of categories:", dataset.num_classes)
```

```
  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
Number of categories: 7
```


A DGL Dataset object may contain one or multiple graphs. The Cora
dataset used in this tutorial consists of a single graph.

```python
g = dataset[0]
```

### How can we split the graph?

 <img src="./node_classification.png" alt="app-screen" width="900" />

A DGL graph can store node features and edge features in two
dictionary-like attributes called ``ndata`` and ``edata``.
In the DGL Cora dataset, the graph contains the following node features:

- ``train_mask``: A boolean tensor indicating whether the node is in the
  training set.

- ``val_mask``: A boolean tensor indicating whether the node is in the
  validation set.

- ``test_mask``: A boolean tensor indicating whether the node is in the
  test set.

- ``label``: The ground truth node category.

- ``feat``: The node features.
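Because the splits are stored as boolean masks rather than index lists, selecting the nodes of one split is plain boolean indexing. The NumPy sketch below uses a hypothetical 8-node toy graph; PyTorch tensors such as `g.ndata["train_mask"]` support the same pattern (for example `labels[train_mask]`). Node 7 belongs to no split, which mirrors Cora, where the three masks together cover 140 + 500 + 1000 = 1640 of the 2708 nodes. Such nodes still take part in message passing but contribute to neither the loss nor the accuracy.

```python
import numpy as np

# Hypothetical toy graph with 8 nodes; in Cora these arrays would come from
# g.ndata["label"], g.ndata["train_mask"], g.ndata["val_mask"], g.ndata["test_mask"].
labels = np.array([3, 4, 4, 0, 1, 2, 3, 3])
train_mask = np.array([True, True, True, False, False, False, False, False])
val_mask = np.array([False, False, False, True, True, False, False, False])
test_mask = np.array([False, False, False, False, False, True, True, False])

# Boolean indexing keeps only the entries where the mask is True.
train_labels = labels[train_mask]
print("train labels:", train_labels)  # -> train labels: [3 4 4]

# Summing a boolean mask counts the nodes in that split.
sizes = {name: int(m.sum()) for name, m in
         [("train", train_mask), ("val", val_mask), ("test", test_mask)]}
print("split sizes:", sizes)

# A node should never appear in two splits at once.
overlap = (train_mask & val_mask) | (train_mask & test_mask) | (val_mask & test_mask)
print("overlapping nodes:", int(overlap.sum()))  # -> overlapping nodes: 0
```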

```python
print("Node features")
print(g.ndata)
print("Edge features")
print(g.edata)
```

```
Node features
{'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'val_mask': tensor([False, False, False,  ..., False, False, False]), 'train_mask': tensor([ True,  True,  True,  ..., False, False, False])}
Edge features
{}
```


#### Defining a Graph Convolutional Network (GCN)

This tutorial builds a two-layer [Graph Convolutional Network
(GCN)](http://tkipf.github.io/graph-convolutional-networks/). Each
layer computes new node representations by aggregating the features
of each node's neighbors.

To build a multi-layer GCN, you can simply stack ``dgl.nn.GraphConv``
modules, which subclass ``torch.nn.Module``.
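To make "aggregating neighbor features" concrete, the NumPy sketch below computes one GCN layer in dense matrix form, `ReLU(D^-1/2 (A + I) D^-1/2 H W)`, following the propagation rule of Kipf and Welling, where `A + I` adds self-loops and `D` is the degree matrix of `A + I`. This is an illustration only, not DGL's implementation: `GraphConv` works on sparse graphs, and its default symmetric normalization (`norm='both'`) uses whatever edges the graph already has without adding self-loops itself (`dgl.add_self_loop` can add them). The toy graph, the identity weights, and the helper name `gcn_layer` are hypothetical.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, h: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One dense GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : (N, N) symmetric 0/1 adjacency matrix without self-loops
    h      : (N, F_in) node features
    weight : (F_in, F_out) weight matrix (learnable in a real model)
    """
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    deg = a_hat.sum(axis=1)                     # degree of each node, >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^-1/2 as a diagonal matrix
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    # Row i of the result mixes node i's own features with its neighbors'.
    return np.maximum(norm_adj @ h @ weight, 0.0)


# Toy path graph 0 - 1 - 2 with 2 input features per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=np.float64)
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
weight = np.eye(2)  # identity weights so the aggregation itself is visible
out = gcn_layer(adj, h, weight)
print(out.round(3))
```

The ReLU is applied inside `gcn_layer` for illustration; a final classification layer would normally return raw logits instead.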




```python
from dgl.nn import GraphConv


class GCN(nn.Module):
    """Two-layer GCN: GraphConv -> ReLU -> GraphConv, returning raw class logits."""

    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h


# Create the model with given dimensions
model = GCN(g.ndata["feat"].shape[1], 16, dataset.num_classes)
```

DGL provides implementations of many popular neighbor aggregation
modules, such as ``GraphConv``, ``SAGEConv``, and ``GATConv`` in
``dgl.nn``, and each can be instantiated with a single line of code.
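These modules follow a message-passing pattern: each edge sends its source node's features to its destination node, and each node then reduces the messages it receives. The NumPy sketch below implements a mean reducer over a hypothetical directed edge list `(src, dst)`. It mirrors only the neighbor-averaging step of a mean aggregator such as `SAGEConv(in_feats, out_feats, aggregator_type='mean')`, and it omits the learnable weights and the self-feature term that such a module adds. The helper name `mean_aggregate` is hypothetical.

```python
import numpy as np


def mean_aggregate(src: np.ndarray, dst: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Average the features of each node's in-neighbors.

    src, dst : integer arrays of equal length; edge k goes src[k] -> dst[k]
    h        : (N, F) node features
    Nodes with no incoming edges get an all-zero result.
    """
    num_nodes = h.shape[0]
    summed = np.zeros_like(h, dtype=np.float64)
    # Message step: every edge carries h[src] to its destination.
    # np.add.at accumulates correctly even when dst contains repeats.
    np.add.at(summed, dst, h[src])
    # Reduce step: divide each sum by the in-degree to get the mean.
    in_deg = np.bincount(dst, minlength=num_nodes).astype(np.float64)
    return summed / np.maximum(in_deg, 1.0)[:, None]


# Toy directed edges: 0->2, 1->2, 2->0 (node 1 has no incoming edges).
src = np.array([0, 1, 2])
dst = np.array([2, 2, 0])
h = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
print(mean_aggregate(src, dst, h))
```

In DGL, the same message/reduce pair is usually expressed with built-in functions from the `dgl.function` module (imported as `fn` in the first cell), for example `g.update_all(fn.copy_u("h", "m"), fn.mean("m", "h_N"))`.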




#### Training the GCN

Training this GCN is similar to training other PyTorch neural networks.




```python
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    val_mask = g.ndata["val_mask"]
    test_mask = g.ndata["test_mask"]
    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that you should only compute the losses of the nodes in the training set.
        # F.cross_entropy applies log_softmax internally, so it takes raw logits;
        # the commented-out variant below is redundant but gives the same value.
        # loss = F.cross_entropy(F.log_softmax(logits[train_mask], 1), labels[train_mask])
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on validation/test
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print(
                "In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})".format(
                    e, loss, val_acc, best_val_acc, test_acc, best_test_acc
                )
            )
```
```python
model = GCN(g.ndata["feat"].shape[1], 16, dataset.num_classes)
train(g, model)
```

```
In epoch 0, loss: 1.945, val acc: 0.114 (best 0.114), test acc: 0.123 (best 0.123)
In epoch 5, loss: 1.888, val acc: 0.416 (best 0.476), test acc: 0.433 (best 0.467)
In epoch 10, loss: 1.802, val acc: 0.554 (best 0.554), test acc: 0.594 (best 0.594)
In epoch 15, loss: 1.695, val acc: 0.636 (best 0.636), test acc: 0.664 (best 0.664)
In epoch 20, loss: 1.566, val acc: 0.658 (best 0.658), test acc: 0.704 (best 0.704)
In epoch 25, loss: 1.418, val acc: 0.700 (best 0.700), test acc: 0.739 (best 0.739)
In epoch 30, loss: 1.257, val acc: 0.714 (best 0.714), test acc: 0.753 (best 0.753)
In epoch 35, loss: 1.089, val acc: 0.728 (best 0.728), test acc: 0.763 (best 0.763)
In epoch 40, loss: 0.925, val acc: 0.738 (best 0.738), test acc: 0.767 (best 0.767)
In epoch 45, loss: 0.772, val acc: 0.752 (best 0.754), test acc: 0.772 (best 0.771)
In epoch 50, loss: 0.636, val acc: 0.770 (best 0.770), test acc: 0.775 (best 0.775)
In epoch 55, loss: 0.519, val acc: 0.772 (best 0.772), test acc: 0.778 (best 0
```

#### Training on GPU

Training on GPU requires moving both the model and the graph onto the GPU
with the ``to`` method, as you would in PyTorch.

```python
g = g.to('cuda')
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, model)
```
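The snippet above assumes a CUDA device is present. A common device-agnostic variant, sketched below with plain PyTorch, selects the GPU only when `torch.cuda.is_available()` reports one and otherwise falls back to the CPU. The small `nn.Linear` model and the random input are placeholders standing in for the GCN and the Cora features (1433 input features, 7 classes); they are not part of the tutorial's model.

```python
import torch
import torch.nn as nn

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model and input standing in for the GCN and the node features.
model = nn.Linear(1433, 7).to(device)
features = torch.rand(4, 1433, device=device)

# Parameters and inputs must live on the same device for the forward pass.
logits = model(features)
print(logits.device, logits.shape)
```

With DGL, the graph is moved the same way (`g = g.to(device)`), after which the tensors in its `ndata` live on that device as well.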