**GCN**

This is the first GCN-related paper. It proposes a very simple way to learn graph embeddings.

To keep things simple, we only discuss the final form of the formula: $Z=f(X, A)=\mathrm{softmax}(\hat A\ \mathrm{ReLU}(\hat AXW^{(0)})W^{(1)})$, where $\hat A=\bar D^{-\frac{1}{2}}\bar A\bar D^{-\frac{1}{2}}$, $\bar A=A+I_N$, $\bar D_{ii}=\sum_j\bar A_{ij}$, $A$ is the adjacency matrix of the graph, and $X$ is the matrix of node features.

Now we explain the formula step by step. First, we add a **self-loop** to each node to form $\bar A$. Why? Because we learn node embeddings by passing information along edges. To also get information from the **node itself**, we need self-loops. With them, $\bar A$ means that information passes from the neighbors and from the node itself.
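The self-loop step can be checked on a tiny graph. The sketch below uses NumPy and a made-up 3-node path graph (0 - 1 - 2); the graph is only an assumption for illustration and is not taken from Cora.

```python
import numpy as np

# Hypothetical undirected path graph 0 - 1 - 2, used only for illustration.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Adding the identity puts a 1 on the diagonal, i.e. one self-loop per node.
A_bar = A + np.eye(A.shape[0])

# Row i of A_bar now marks node i itself in addition to its neighbors.
print(A_bar)
```

Without the identity term, row 0 would only point at node 1, so node 0's own features would be dropped during aggregation.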

Second, why do we need $\hat A$? $\bar D$ is the diagonal matrix holding the degree of each node (including the self-loop). Experience from real life suggests that the more relations a person has, the **less influence one** relation has on him or her. So we **divide** each entry $\bar A_{ij}$ by $\sqrt{\bar D_{ii}\bar D_{jj}}$, the square roots of the degrees of its two endpoints (the adjacency matrix actually represents the **weights** of the edges between nodes).
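A small worked example may help with the normalization in the formula for $\hat A$. It is a NumPy sketch on the same made-up 3-node path graph (an assumption for illustration), computing $\bar D^{-\frac{1}{2}}\bar A\bar D^{-\frac{1}{2}}$ with broadcasting instead of building the diagonal matrix explicitly.

```python
import numpy as np

# Hypothetical path graph 0 - 1 - 2 with self-loops already added.
A_bar = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)

deg = A_bar.sum(axis=1)        # degrees including self-loops: [2, 3, 2]
d_inv_sqrt = deg ** -0.5       # diagonal entries of D_bar^{-1/2}

# Entry (i, j) becomes A_bar[i, j] / sqrt(deg[i] * deg[j]).
A_hat = d_inv_sqrt[:, None] * A_bar * d_inv_sqrt[None, :]
print(A_hat)
```

The edge between nodes 0 and 1 gets weight $1/\sqrt{2\cdot 3}$, so an edge to a high-degree node counts for less, and the result stays symmetric.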

Third, we multiply $\hat A$ by $X$. Why? To understand this, we need the dimensions of $\hat A$ and $X$. $\hat A$ is a **[num of nodes, num of nodes]** matrix, and $X$ is a **[num of nodes, size of node's feature]** matrix. By the rules of matrix multiplication, each row of $\hat A$ is multiplied with each column of $X$ to give one row of the result. The ith row of $\hat A$ holds the **normalized weights of the edges to node i**, and the jth column of $X$ holds the **jth feature** of all nodes. Multiplying the ith row by the jth column gives the **(i, j)** cell of the result, which represents the **ith node's jth feature** (the jth features of all nodes, passed along the normalized edges to node i). Doing this for every cell gives $\hat X=\hat AX$.
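To make the row-times-column picture concrete, here is a NumPy sketch on a made-up 3-node path graph with two invented features per node (both are assumptions for illustration). It compares one row of $\hat AX$ with the weighted sum written out by hand.

```python
import numpy as np

# Hypothetical toy inputs: path graph 0 - 1 - 2 with self-loops, normalized.
A_bar = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)
d_inv_sqrt = A_bar.sum(axis=1) ** -0.5
A_hat = d_inv_sqrt[:, None] * A_bar * d_inv_sqrt[None, :]

# Made-up node features, shape [num of nodes, size of node's feature].
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 1.0]])

X_hat = A_hat @ X  # same shape as X

# Row 0 by hand: node 0 only receives from itself and from node 1.
row0_by_hand = A_hat[0, 0] * X[0] + A_hat[0, 1] * X[1]
print(X_hat[0], row0_by_hand)
```

Node 2 does not contribute to row 0 because $\hat A_{02}=0$: there is no edge between nodes 0 and 2.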

Fourth, we multiply $\hat X$ by $W^{(0)}$. Again, we check the dimensions of the two matrices. $\hat X$ is a **[num of nodes, size of node's feature]** matrix, and $W^{(0)}$ is a **[size of node's feature, size of hidden feature]** matrix. This follows the same idea as the third step: we just **project** the features into another space.
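The shapes in the projection step can be verified with random matrices. The sizes and the random values in this NumPy sketch are arbitrary assumptions; `W0` stands in for $W^{(0)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, in_features, hidden_features = 3, 5, 2  # arbitrary toy sizes

# Hypothetical path graph 0 - 1 - 2 with self-loops, normalized.
A_bar = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)
d_inv_sqrt = A_bar.sum(axis=1) ** -0.5
A_hat = d_inv_sqrt[:, None] * A_bar * d_inv_sqrt[None, :]

X = rng.normal(size=(num_nodes, in_features))
W0 = rng.normal(size=(in_features, hidden_features))

# Aggregate first, then project: [3, 5] @ [5, 2] -> [3, 2].
H = (A_hat @ X) @ W0

# Matrix multiplication is associative, so projecting first gives the same result.
H_alt = A_hat @ (X @ W0)
print(H.shape)
```

Because both orders agree, an implementation is free to project first when the hidden size is smaller than the input size, which makes the aggregation cheaper.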

Fifth, we use ReLU as a **non-linear** operation to **improve the expressive power** of the neural network model.

Sixth, we **repeat** steps three and four once more. If we added more linear layers, we would repeat them more times, but the authors of the paper only use two layers. (More layers do not necessarily improve the performance of the model, because they may cause the representations of all nodes **to become the same**.)
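The remark about too many layers can be illustrated with a rough NumPy sketch. It applies only the aggregation $\hat A$ many times to made-up features on a hypothetical 3-node path graph, and it leaves out the weights and ReLU, so it is a simplification and not the full model.

```python
import numpy as np

# Hypothetical path graph 0 - 1 - 2 with self-loops, normalized.
A_bar = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)
deg = A_bar.sum(axis=1)
d_inv_sqrt = deg ** -0.5
A_hat = d_inv_sqrt[:, None] * A_bar * d_inv_sqrt[None, :]

# Made-up node features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

def aggregate(num_layers):
    # Apply the aggregation num_layers times, then undo the sqrt(degree) scaling
    # so that rows can be compared directly.
    H = np.linalg.matrix_power(A_hat, num_layers) @ X
    return H / np.sqrt(deg)[:, None]

shallow = aggregate(1)
deep = aggregate(50)
print(shallow)
print(deep)
```

After one step the rows still differ, while after 50 steps they agree up to the degree scaling, so the nodes can no longer be told apart by their features.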

Finally, we use **softmax** to compute, for each node, the probability of belonging to each label.
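As a sketch of this last step, the NumPy snippet below applies a row-wise softmax to made-up scores for two nodes and three labels (the numbers are assumptions for illustration). The helper `softmax` is written here for the example and is not a library function.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up output scores: one row per node, one column per label.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])

probs = softmax(logits)        # each row sums to 1
pred = probs.argmax(axis=1)    # most probable label per node
print(probs.sum(axis=1), pred)
```

Taking the logarithm of the probabilities does not change the predicted label, which is why a log-softmax output paired with a negative log-likelihood loss gives the same predictions.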

The GCN model takes the entire graph as input, so it is a **transductive** algorithm.

In [None]:
import torch

def format_pytorch_version(version):
  return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

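def format_cuda_version(version):
  # torch.version.cuda is None on CPU-only builds; the wheel index uses the tag 'cpu' for those
  if version is None:
    return 'cpu'
  return 'cu' + version.replace('.', '')

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)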

!pip install torch-scatter     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-sparse      -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-cluster     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-geometric 


In [None]:
from torch_geometric.datasets import Planetoid
import os
import re
import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from collections import defaultdict
import numpy as np
from sklearn.cluster import KMeans
import networkx as nx
import matplotlib.pyplot as plt

In [None]:
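dataset = Planetoid(root='/tmp/Cora', name='Cora')
graph = dataset[0]  # Cora consists of a single graph
nodenum = graph.num_nodes
# Dense adjacency matrix: A[i][j] counts the edges from node i to node j
A = torch.zeros(nodenum, nodenum)
edges = graph.edge_index.T  # shape [num of edges, 2], one (source, target) pair per row
for edge in edges:
  A[edge[0]][edge[1]] += 1
# Add self-loops: A_bar = A + I
I = torch.eye(nodenum)
A += I
print(A)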

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 1.],
        [0., 0., 0.,  ..., 0., 1., 1.]])


In [None]:
# D holds the inverse square root of the degree matrix: D[i][i] = (sum_j A[i][j]) ** -0.5
D = torch.diag(A.sum(dim=1).pow(-0.5))

print(D)

tensor([[0.5000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.5000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.4082,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.7071, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.4472, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.4472]])


In [None]:
Ahat = torch.mm(torch.mm(D, A), D)
print(Ahat)

tensor([[0.2500, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.2500, 0.2041,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.2041, 0.1667,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.5000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.2000, 0.2000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.2000, 0.2000]])


In [None]:
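class GCN(nn.Module):
  """Two-layer GCN: log_softmax(A_hat ReLU(A_hat X W0) W1)."""
  def __init__(self, infeature, hiddenfeature, outfeature, A):
    super(GCN, self).__init__()
    self.infeature = infeature
    self.hiddenfeature = hiddenfeature
    self.outfeature = outfeature
    self.A = A  # normalized adjacency matrix A_hat, shape [num of nodes, num of nodes]
    self.relu = F.relu
    # log_softmax instead of softmax, so that the output can be fed to nn.NLLLoss
    self.softmax = F.log_softmax
    self.linear1 = nn.Linear(infeature, hiddenfeature, bias=False)  # W0
    self.linear2 = nn.Linear(hiddenfeature, outfeature, bias=False)  # W1
    self.dropout = nn.Dropout()  # defined but not used in forward

  def forward(self, X):
    # Layer 1: aggregate neighbor features, project, apply ReLU
    X = self.relu(self.linear1(torch.mm(self.A, X)))
    # Layer 2: aggregate, project to class scores, take log-probabilities per node
    X = self.softmax(self.linear2(torch.mm(self.A, X)), dim=1)
    return X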


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
lr = 0.1
epochs = 500

In [None]:
Ahat = Ahat.to(device)
model = GCN(dataset.num_node_features, 32, dataset.num_classes, Ahat).to(device)
data = dataset[0].to(device)
optimizer = optim.SGD(model.parameters(), lr=lr)
criterion = nn.NLLLoss()

In [None]:
model.train()
for epoch in range(epochs):
  optimizer.zero_grad()
  out = model(data.x)
  loss = criterion(out[data.train_mask], data.y[data.train_mask])
  loss.backward()
  optimizer.step()
  _, pred = out.max(dim=1)
  traincorrect = float(pred[data.train_mask].eq(data.y[data.train_mask]).sum().item())
  trainacc = traincorrect / data.train_mask.sum().item()
  testcorrect = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
  testacc = testcorrect / data.test_mask.sum().item()
  print("epoch: {0}, loss: {1}, train acc: {2}, test acc: {3}".format(epoch, loss.item(), trainacc, testacc))

epoch: 0, loss: 1.944702386856079, train acc: 0.21428571428571427, test acc: 0.212
epoch: 1, loss: 1.9435466527938843, train acc: 0.21428571428571427, test acc: 0.221
epoch: 2, loss: 1.9423930644989014, train acc: 0.21428571428571427, test acc: 0.225
epoch: 3, loss: 1.9412434101104736, train acc: 0.21428571428571427, test acc: 0.236
epoch: 4, loss: 1.9401018619537354, train acc: 0.22142857142857142, test acc: 0.246
epoch: 5, loss: 1.9389528036117554, train acc: 0.2714285714285714, test acc: 0.251
epoch: 6, loss: 1.9377944469451904, train acc: 0.2785714285714286, test acc: 0.256
epoch: 7, loss: 1.9366058111190796, train acc: 0.29285714285714287, test acc: 0.269
epoch: 8, loss: 1.9353927373886108, train acc: 0.3, test acc: 0.272
epoch: 9, loss: 1.9341577291488647, train acc: 0.3, test acc: 0.284
epoch: 10, loss: 1.9328957796096802, train acc: 0.3, test acc: 0.29
epoch: 11, loss: 1.9316056966781616, train acc: 0.3357142857142857, test acc: 0.299
epoch: 12, loss: 1.9302891492843628, train 