# Applying GNN Models

In this lecture, we will continue using the Cora example from the previous lesson. You will learn about:

- Unsupervised GRL
- GNNs for Supervised Downstream Tasks

We will also compare these methods with the approach discussed in the previous lesson.

In [1]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

# !pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
# !pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
# !pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
# !pip install -q torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}.html

import os.path as osp
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression
from torch_geometric.loader import LinkNeighborLoader

2.3.0


## Unsupervised Graph Representation Learning with GraphSAGE
Since we aim to learn graph representations through an unsupervised method, we do not use node labels for training.<br>

We assume that if there is a link between a pair of nodes, those nodes should have similar embeddings. Conversely, if there is no link between a pair of nodes, their embeddings should be dissimilar.

Based on this assumption, we can define the following loss function:
 
\begin{equation}
\text{Loss} = - \left( \log \left( \sigma(h_u^{\top} h_v) \right) - \sum_{i=1}^k \log \left( \sigma(h_u^{\top} h_{n_i}) \right) \right) , n_i \sim P_V
\end{equation}

- $\log \left( \sigma (h_u^{\top} h_v) \right)$:  The similarity between the positive sample pair (i.e., true neighbors). Maximizing this term means you want the similarity of positive samples to be as high as possible.
  
- $- \sum_{i=1}^k \log \left( \sigma (h_u^{\top} h_{n_i}) \right)$: The similarity between the negative sample pairs (i.e., non-neighbors). Minimizing this term means you want the similarity of negative samples to be as low as possible.

Once the embeddings are obtained, they are fed into an additional classifier for node classification.

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

dataset = 'Cora'
path = osp.join('.', 'data', dataset)
dataset = Planetoid(root=path, name='Cora', transform=NormalizeFeatures())
data = dataset[0]
print(data)

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])


In [3]:
from torch_geometric.nn import SAGEConv

class GraphSAGE(nn.Module):
    def __init__(self, in_channels, hidden_channels, num_layers):
        super().__init__()
        self.num_layers = num_layers
        self.convs = nn.ModuleList()
        for i in range(num_layers):
            in_channels = in_channels if i == 0 else hidden_channels
            self.convs.append(SAGEConv(in_channels, hidden_channels))

    def forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            if i != self.num_layers - 1:
                x = F.relu(x)
                x = F.dropout(x, p=0.5, training=self.training)
        return x

In [4]:
# define neighbor sampler
train_loader = LinkNeighborLoader(
    data,
    batch_size=256,
    shuffle=True,
    neg_sampling_ratio=1.0,
    num_neighbors=[10, 10],
)

In [5]:
for batch in train_loader:
    print(batch)
    break

Data(x=[2427, 1433], edge_index=[2, 7971], y=[2427], train_mask=[2427], val_mask=[2427], test_mask=[2427], n_id=[2427], e_id=[7971], num_sampled_nodes=[3], num_sampled_edges=[2], input_id=[256], edge_label_index=[2, 512], edge_label=[512])


In [6]:
print("Edge label index: containing both positive and negative edges")
print(batch.edge_label_index)

print("Edge label: 1 stands for positive and 0 stands for negative node pair(edge)")
print(batch.edge_label)

Edge label index: containing both positive and negative edges
tensor([[152, 375, 228,  ..., 377, 495, 683],
        [394, 739, 698,  ..., 336, 650, 794]])
Edge label: 1 stands for positive and 0 stands for negative node pair(edge)
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1

In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GraphSAGE(data.num_node_features, hidden_channels=64, num_layers=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=1e-4)
model = model.to(device)
x, edge_index = data.x.to(device), data.edge_index.to(device)

In [8]:
# define training and testing functions
def train():
    model.train()

    total_loss = 0
    for batch in train_loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        embedding = model(batch.x, batch.edge_index)
        embedding_src = embedding[batch.edge_label_index[0]]
        embedding_dst = embedding[batch.edge_label_index[1]]
        pred = (embedding_src * embedding_dst).sum(dim=-1)
        loss = F.binary_cross_entropy_with_logits(pred, batch.edge_label)
        loss.backward()
        optimizer.step()

        total_loss += float(loss) * pred.size(0)

    return total_loss / data.num_nodes


@torch.no_grad()
def test():
    model.eval()
    out = model(data.x.to(device), data.edge_index.to(device)).cpu() 

    clf = LogisticRegression()
    clf.fit(out[data.train_mask], data.y[data.train_mask])

    val_acc = clf.score(out[data.val_mask], data.y[data.val_mask])
    test_acc = clf.score(out[data.test_mask], data.y[data.test_mask])

    return val_acc, test_acc

In [9]:
for epoch in range(1, 101):
    loss = train()
    val_acc, test_acc = test()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, '
          f'Val: {val_acc:.4f}, Test: {test_acc:.4f}')

Epoch: 001, Loss: 5.1687, Val: 0.3480, Test: 0.3140
Epoch: 002, Loss: 4.6655, Val: 0.4300, Test: 0.4080
Epoch: 003, Loss: 4.5153, Val: 0.4480, Test: 0.4540
Epoch: 004, Loss: 4.4635, Val: 0.6300, Test: 0.6110
Epoch: 005, Loss: 4.1887, Val: 0.6700, Test: 0.6590
Epoch: 006, Loss: 3.9948, Val: 0.6840, Test: 0.6680
Epoch: 007, Loss: 3.9501, Val: 0.6880, Test: 0.6910
Epoch: 008, Loss: 3.8574, Val: 0.6980, Test: 0.7070
Epoch: 009, Loss: 3.8513, Val: 0.7140, Test: 0.7330
Epoch: 010, Loss: 3.8072, Val: 0.6960, Test: 0.7230
Epoch: 011, Loss: 3.8090, Val: 0.6940, Test: 0.7220
Epoch: 012, Loss: 3.7499, Val: 0.7160, Test: 0.7350
Epoch: 013, Loss: 3.6991, Val: 0.7020, Test: 0.7370
Epoch: 014, Loss: 3.7330, Val: 0.7060, Test: 0.7420
Epoch: 015, Loss: 3.6669, Val: 0.7060, Test: 0.7350
Epoch: 016, Loss: 3.6889, Val: 0.7160, Test: 0.7390
Epoch: 017, Loss: 3.6774, Val: 0.7120, Test: 0.7370
Epoch: 018, Loss: 3.6493, Val: 0.7060, Test: 0.7320
Epoch: 019, Loss: 3.6760, Val: 0.7000, Test: 0.7360
Epoch: 020, 

## Performance comparison
Recall that in the previous examples, we performed node classification in 3 different ways.
1. Bag of word + MLP with `Accuracy:0.6`
2. Node2vec + Logistic regression with `Accuracy:0.703`
3. Node2vec with Bag of word + Logistic regression with `Accuracy:0.707`
4. GraphSAGE: with Bag of word + Logistic regression with `Accuracy:0.803`

We make use of node feature and graph structure at the same time and boost the accuracy up to **0.791** with a simple two-layer `GraphSAGE`.

## End-to-end semi-supervised learning with Graph Convolution Network(GCN)
Previously, we adopt a two stage classification pipeline where we first extract network feature via unsupervised learning then utilize a classifier to predict its label. <br>
The two-stage design could be suboptimal since the network features were not extracted for specific task. <br>
Therefore, we now employ an end-to-end approach. This ensures that the features learned by the model are directly aligned with the specific task, potentially leading to better performance.

In [10]:
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels, cached=True,
                             normalize=True)
        self.conv2 = GCNConv(hidden_channels, out_channels, cached=True,
                             normalize=True)

    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        return x

In [11]:
dim = 64
model = GCN(dataset.num_features, dim, dataset.num_classes)
model, data = model.to(device), data.to(device)
optimizer = torch.optim.Adam(model.parameters(),weight_decay=1e-4)
print(model)

GCN(
  (conv1): GCNConv(1433, 64)
  (conv2): GCNConv(64, 7)
)


In [12]:
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_weight)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index, data.edge_weight).argmax(dim=-1)
    mask = data.test_mask
    accs = (int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

In [13]:
for epoch in range(200):
    loss = train()
    test_acc = test()
    print(f"Loss:{loss:.4f} Testing accuracy:{test_acc:.4f}")

Loss:1.9463 Testing accuracy:0.1710
Loss:1.9444 Testing accuracy:0.2670
Loss:1.9426 Testing accuracy:0.3370
Loss:1.9412 Testing accuracy:0.3930
Loss:1.9397 Testing accuracy:0.4380
Loss:1.9374 Testing accuracy:0.4690
Loss:1.9362 Testing accuracy:0.4950
Loss:1.9337 Testing accuracy:0.5230
Loss:1.9319 Testing accuracy:0.5370
Loss:1.9300 Testing accuracy:0.5460
Loss:1.9269 Testing accuracy:0.5540
Loss:1.9253 Testing accuracy:0.5570
Loss:1.9225 Testing accuracy:0.5650
Loss:1.9205 Testing accuracy:0.5640
Loss:1.9170 Testing accuracy:0.5700
Loss:1.9142 Testing accuracy:0.5690
Loss:1.9115 Testing accuracy:0.5740
Loss:1.9094 Testing accuracy:0.5690
Loss:1.9062 Testing accuracy:0.5630
Loss:1.9021 Testing accuracy:0.5590
Loss:1.8997 Testing accuracy:0.5580
Loss:1.8959 Testing accuracy:0.5530
Loss:1.8936 Testing accuracy:0.5490
Loss:1.8922 Testing accuracy:0.5450
Loss:1.8871 Testing accuracy:0.5470
Loss:1.8845 Testing accuracy:0.5460
Loss:1.8810 Testing accuracy:0.5460
Loss:1.8764 Testing accuracy

## Performance comparison
Recall that in the previous examples, we performed node classification in 3 different ways.
1. Bag of word + MLP with `Accuracy:0.6`
2. Node2vec + Logistic regression with `Accuracy:0.703`
3. Node2vec with Bag of word + Logistic regression with `Accuracy:0.707`
4. GraphSAGE: with Bag of word + Logistic regression with `Accuracy:0.803`
4. GCN: end-to-end learning with `Accuracy:0.818`

From this example, we clearly figure out that using GCN with end2end training leads to the best performance since the feature extracted could be learned to optimize the node classification task.

## Applying different GNN backbone layer
The full list of implemented GNN could be found in [here.](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#convolutional-layers)

In [14]:
from torch_geometric.nn import GCNConv,GATConv, GraphSAGE
import torch.nn.functional as F

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, gnn_type):
        super().__init__()
        if gnn_type == "GCN":
            self.GNN = GCNConv
        elif gnn_type == "SAGE":
            self.GNN = GraphSAGE
        elif gnn_type == "GAT":
            self.GNN = GATConv
        
        self.conv1 = self.GNN(in_channels, hidden_channels)
        self.conv2 = self.GNN(hidden_channels, out_channels)

    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        return x

In [15]:
dim = 32
gnn_type = "GAT"
model = GNN(dataset.num_features, dim, dataset.num_classes,gnn_type=gnn_type)
model, data = model.to(device), data.to(device)
optimizer = torch.optim.Adam(model.parameters())
print(model)

GNN(
  (conv1): GATConv(1433, 32, heads=1)
  (conv2): GATConv(32, 7, heads=1)
)


# HW: Link prediction with GNN
1. Try different GNN layer
2. Try to optimize the performance by stacking multiple layers
3. Report the best accuracy on testing set and the best model configuration(e.g., how many layers?)

In [16]:
# Let's practice how to use GNN for link prediction
# First we need to load the Cora dataset

from sklearn.metrics import roc_auc_score
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import negative_sampling


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
transform = T.Compose([
    T.NormalizeFeatures(),
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.05, num_test=0.1, is_undirected=True,
                      add_negative_train_samples=True),
])
dataset = Planetoid(path, name='Cora', transform=transform)
train_data, val_data, test_data = dataset[0]

In [17]:
print("--------Training data------")
print(train_data)
print("Training edges:")
print(train_data.edge_label_index)
print("Labels")
print(train_data.edge_label)

print()
print("--------Testing data------")
print(test_data)
print("Testing edges:")
print(test_data.edge_label_index)
print("Labels")
print(test_data.edge_label)

--------Training data------
Data(x=[2708, 1433], edge_index=[2, 8976], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_label=[8976], edge_label_index=[2, 8976])
Training edges:
tensor([[1114, 1574, 1309,  ..., 1106, 2504, 2288],
        [1717, 1986, 2103,  ..., 1287,  539,  837]], device='cuda:0')
Labels
tensor([1., 1., 1.,  ..., 0., 0., 0.], device='cuda:0')

--------Testing data------
Data(x=[2708, 1433], edge_index=[2, 9502], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_label=[1054], edge_label_index=[2, 1054])
Testing edges:
tensor([[ 718,  778, 1624,  ..., 1014, 2392, 1564],
        [2261, 1370, 1779,  ..., 1513, 1025,  763]], device='cuda:0')
Labels
tensor([1., 1., 1.,  ..., 0., 0., 0.], device='cuda:0')


In [18]:
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        ############################################################################
        # TODO: Your code here! 
        # create you GNN layer here. 
        # try to use different GNN backbone layer or stacking multiple layer to boost performance
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        
        ############################################################################

    def forward(self, x, edge_index):
        ############################################################################
        # TODO: Your code here! 
        # Apply the forward pass according to your GNN layers
        # you shoud return the embedding of each node (x has shape [num_nodes, dim])    
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)  
        ############################################################################
        return x
    
    def get_prediction(self, node_embedding, edges):
        # In this function, we have the node embedding and edges as input
        # Input shapes:
        #      node_embedding: (|V|, out_channels)
        #      edges: (2, number of edges)
        # To generate such output, we use the inner product of embeddings of two nodes
        # The output is to generate a scalar for each pair of edge
        embedding_first_node = node_embedding[edges[0]]
        embedding_second_node = node_embedding[edges[1]]
        ############################################################################
        # TODO: Your code here! 
        # implement the element-wise product as edge feature for link prediction
        inner_product = torch.sum(embedding_first_node * embedding_second_node, dim=-1)
        
        ############################################################################
        return inner_product

In [19]:
############################################################################
# TODO: Your code here! 
# initiate your GNN model and select the criterion for link prediction

model = MyGNN(dataset.num_features, 128, 64).to(device)
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()
############################################################################

In [20]:
# Implement the train function
def train():
    model.train()
    optimizer.zero_grad()
    embedding = model(train_data.x, train_data.edge_index)

    # We perform a new round of negative sampling for every training epoch:
    neg_edge_index = negative_sampling(
        edge_index=train_data.edge_index, num_nodes=train_data.num_nodes,
        num_neg_samples=train_data.edge_label_index.size(1), method='sparse')

    edge_label_index = torch.cat(
        [train_data.edge_label_index, neg_edge_index],
        dim=-1,
    )
    
    # Please assign the target for negative edges
    edge_label = torch.cat([
        train_data.edge_label,
        train_data.edge_label.new_zeros(neg_edge_index.size(1))
    ], dim=0)
    
    # make prediction
    prediction = model.get_prediction(embedding, edge_label_index).view(-1)
    
    # optimization
    loss = criterion(prediction, edge_label)
    loss.backward()
    optimizer.step()
    return loss

In [21]:
# Implement the test function
@torch.no_grad()
def test(data):
    model.eval()
    embedding = model(data.x, data.edge_index)
    
    # use the sigmoid function to normalize our prediction into [0,1]
    out = model.get_prediction(embedding, data.edge_label_index).view(-1).sigmoid()
    return roc_auc_score(data.edge_label.cpu().numpy(), out.cpu().numpy())

In [22]:
best_val_auc = final_test_auc = 0
for epoch in range(1, 101):
    loss = train()
    val_auc = test(val_data)
    test_auc = test(test_data)
    if val_auc > best_val_auc:
        best_val = val_auc
        final_test_auc = test_auc
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Val: {val_auc:.4f}, '
          f'Test: {test_auc:.4f}')

print(f'Final Test: {final_test_auc:.4f}')


Epoch: 001, Loss: 0.6932, Val: 0.6865, Test: 0.7229
Epoch: 002, Loss: 0.6989, Val: 0.7523, Test: 0.7769
Epoch: 003, Loss: 0.6936, Val: 0.7302, Test: 0.7392
Epoch: 004, Loss: 0.6934, Val: 0.5911, Test: 0.6068
Epoch: 005, Loss: 0.6939, Val: 0.5107, Test: 0.5543
Epoch: 006, Loss: 0.6942, Val: 0.6673, Test: 0.6352
Epoch: 007, Loss: 0.6941, Val: 0.6293, Test: 0.6397
Epoch: 008, Loss: 0.6937, Val: 0.6867, Test: 0.6698
Epoch: 009, Loss: 0.6934, Val: 0.7193, Test: 0.6710
Epoch: 010, Loss: 0.6933, Val: 0.6632, Test: 0.6265
Epoch: 011, Loss: 0.6933, Val: 0.6099, Test: 0.6025
Epoch: 012, Loss: 0.6933, Val: 0.6196, Test: 0.6129
Epoch: 013, Loss: 0.6933, Val: 0.6481, Test: 0.6465
Epoch: 014, Loss: 0.6932, Val: 0.7004, Test: 0.6939
Epoch: 015, Loss: 0.6930, Val: 0.7132, Test: 0.7019
Epoch: 016, Loss: 0.6927, Val: 0.7336, Test: 0.7233
Epoch: 017, Loss: 0.6923, Val: 0.7363, Test: 0.7390
Epoch: 018, Loss: 0.6917, Val: 0.7383, Test: 0.7420
Epoch: 019, Loss: 0.6910, Val: 0.7537, Test: 0.7467
Epoch: 020, 