**GraphSAGE**

GraphSAGE is an inductive representation learning algorithm for large graphs.

Unlike transductive algorithms, which learn a separate embedding for each node, GraphSAGE learns aggregator functions instead.

**Unsupervised algorithms** such as DeepWalk and node2vec generate random-walk sequences from the structural information of the graph and then use skip-gram to learn an embedding for each node. These embeddings are not tied to a specific downstream task, so they can be reused across many tasks. However, their ability to express the information contained in the graph is limited. To address this, **GCN** was proposed, an early transductive algorithm built on a neural network model. In each epoch, the model updates the embeddings of every node in the graph, which is usually infeasible for large graphs.
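
As a rough illustration of the walk-then-skip-gram idea, the sketch below generates uniform random walks on a toy graph and extracts the (center, context) pairs a skip-gram model would be trained on. The graph, the function names `random_walk` and `skipgram_pairs`, and the window size are assumptions made for this example; it uses plain uniform walks and does not include node2vec's biased transition probabilities.

```python
import random

# Toy undirected graph as an adjacency list (hypothetical example data).
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}

def random_walk(graph, start, length, rng):
    """Uniform random walk containing `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def skipgram_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

rng = random.Random(0)
walks = [random_walk(graph, node, 5, rng) for node in graph]
pairs = skipgram_pairs(walks[0], window=2)
```

Each walk plays the role of a sentence and each node the role of a word, so the pairs can be fed to any skip-gram trainer.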

GraphSAGE is designed to fix these flaws. Instead of directly learning an embedding for each node, the authors learn **aggregators** that gather information from a node's neighbors and produce its final embedding.
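
The difference can be sketched in a few lines of plain Python. The stored table stands in for a transductive model and the function `embed` for an inductive one; the data, the names, and the omission of learned weights are all simplifying assumptions for illustration.

```python
# Hypothetical toy data: 2-dimensional input features per node.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1, 2], 1: [0], 2: [0]}

# Transductive view: one stored vector per node seen during training.
embedding_table = {0: [0.2, 0.7], 1: [0.9, 0.1], 2: [0.4, 0.4]}

def embed(node, features, adjacency):
    """Inductive view: build the embedding from the node's and its neighbors' features."""
    neigh = [features[n] for n in adjacency[node]]
    mean = [sum(col) / len(neigh) for col in zip(*neigh)]
    return features[node] + mean  # list concatenation: [own features | neighbor mean]

# A node that arrives after training.
features[3] = [0.0, 2.0]
adjacency[3] = [0, 2]

unseen_in_table = 3 in embedding_table   # the table has no entry for the new node
z_new = embed(3, features, adjacency)    # the function still produces an embedding
```

Because the function only needs features and neighbors, it applies to nodes (or whole graphs) that were not available at training time.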

To capture as much information from the graph as possible, the embeddings should take structural information into account. How do we get **structural information**? Usually, the **k-hop neighbors** of a node are treated as its k-level structural information, and GraphSAGE uses this idea as well. For each node, GraphSAGE aggregates the features of its neighbors to capture structural information, then combines the neighbors' information with the node's own information. The result is the embedding of the node, which contains both structural information (the neighbors' features) and the node's own features.
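
To make the k-hop idea concrete, here is a minimal sketch that collects the nodes within k hops using breadth-first search. The toy path graph and the function name `k_hop_neighbors` are assumptions for illustration only.

```python
from collections import deque

# Toy path graph 0 - 1 - 2 - 3 - 4 (hypothetical example data).
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def k_hop_neighbors(graph, source, k):
    """Nodes reachable from `source` in at most k hops, excluding `source` itself."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n, d in dist.items() if d > 0}

one_hop = k_hop_neighbors(graph, 2, 1)
two_hop = k_hop_neighbors(graph, 2, 2)
```

Stacking k aggregation layers lets information from this k-hop neighborhood reach the center node.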

First, they **sample a fixed number** of unique neighbors for each node (this speeds up computation, and the experiments show that it does not hurt performance).
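
The sampling step can be sketched as follows. The helper `sample_neighbors` is a hypothetical name, and the policy shown (without replacement when there are enough neighbors, with replacement otherwise) is one common choice assumed here, not necessarily the exact scheme used later in this notebook.

```python
import random

def sample_neighbors(neighbors, num_samples, rng):
    """Return exactly `num_samples` neighbors drawn uniformly at random."""
    if len(neighbors) >= num_samples:
        # enough neighbors: sample without replacement, so all are unique
        return rng.sample(neighbors, num_samples)
    # too few neighbors: fall back to sampling with replacement
    return rng.choices(neighbors, k=num_samples)

rng = random.Random(0)
many = list(range(10))
few = [7, 8]
s_many = sample_neighbors(many, 4, rng)
s_few = sample_neighbors(few, 4, rng)
```

A fixed sample size keeps the cost per node bounded regardless of its degree, which is what makes mini-batch training on large graphs practical.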

Second, they use an aggregator to aggregate the features of these neighbors for each node. The authors propose three types of aggregators: **mean, max-pooling, and LSTM**. The core idea in designing an aggregator is to make it **symmetric**, because it has to operate over an unordered set of vectors. (The LSTM is not symmetric by itself, so the authors apply it to a random permutation of the neighbors.)
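
The symmetry requirement can be checked directly: the output must not change when the neighbors are reordered. The sketch below uses simplified mean and element-wise max aggregators on hypothetical feature vectors; the learned layer that the pooling aggregator applies to each neighbor before taking the max is left out for brevity.

```python
def mean_aggregate(vectors):
    """Element-wise mean over a collection of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def max_aggregate(vectors):
    """Element-wise max over a collection of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]

neighbors = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
shuffled = [neighbors[2], neighbors[0], neighbors[1]]  # same set, different order

mean_out = mean_aggregate(neighbors)
max_out = max_aggregate(neighbors)
same_mean = mean_out == mean_aggregate(shuffled)
same_max = max_out == max_aggregate(shuffled)
```

Both aggregators give the same result for either ordering, which is the property a function over an unordered neighbor set needs.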

Third, we obtain the fused neighbor features for each node. We **combine** this fused feature with the feature of the center node and **project** the combined feature into a **lower**-dimensional space to reduce the dimension of each node's representation.
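
A minimal sketch of the combine-and-project step, assuming concatenation as the combine operation and ReLU as the nonlinearity. The weight matrix is hand-picked for readability (in practice it is learned), and `combine_and_project` is a hypothetical helper name.

```python
def combine_and_project(h_self, h_neigh, weight):
    """Concatenate the two vectors, multiply by `weight`, then apply ReLU."""
    combined = h_self + h_neigh  # list concatenation: length 2 + 2 = 4
    projected = [sum(w * x for w, x in zip(row, combined)) for row in weight]
    return [max(0.0, v) for v in projected]

h_self = [1.0, 2.0]     # center node feature
h_neigh = [3.0, -1.0]   # aggregated neighbor feature
weight = [              # 3 x 4 matrix: projects 4 dimensions down to 3
    [1.0, 0.0, 0.0, 1.0],
    [0.0, -1.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]

h_new = combine_and_project(h_self, h_neigh, weight)
```

The concatenated input has twice the feature dimension, so the weight matrix has `2 * d_in` columns and `d_out` rows with `d_out` smaller than `d_in`.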

Fourth, we iterate steps 1 to 3 several times and use a loss function to update the weights of the aggregators and projectors, which gives the final model. The model can be trained in either an unsupervised or a supervised way. Below is an example of the supervised model.

In [None]:
import torch

def format_pytorch_version(version):
  return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

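def format_cuda_version(version):
  # torch.version.cuda is None on CPU-only builds; the wheel index uses 'cpu' there
  if version is None:
    return 'cpu'
  return 'cu' + version.replace('.', '')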

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)

!pip install torch-scatter     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-sparse      -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-cluster     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-geometric 

Looking in links: https://pytorch-geometric.com/whl/torch-1.10.0+cu111.html
Collecting torch-scatter
  Downloading https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp37-cp37m-linux_x86_64.whl (7.9 MB)
Installing collected packages: torch-scatter
Successfully installed torch-scatter-2.0.9
Looking in links: https://pytorch-geometric.com/whl/torch-1.10.0+cu111.html
Collecting torch-sparse
  Downloading https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_sparse-0.6.13-cp37-cp37m-linux_x86_64.whl (3.5 MB)
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.13
Looking in links: https://pytorch-geometric.com/whl/torch-1.10.0+cu111.html
Collecting torch-cluster
  Downloading https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_cluster-1.6.0-cp37-cp37m-linux_x86_64.whl (2.5 MB)

In [None]:
from torch_geometric.datasets import Planetoid
import os
import re
import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from collections import defaultdict
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from copy import deepcopy

In [None]:
dataset = Planetoid(root='/tmp/Cora', name='Cora')
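cora = dataset[0]
nodenum = cora.num_nodes
edges = cora.edge_index.T
# Adjacency map: nodes[u][v] is the weight of edge (u, v).
# Cora's edge_index already stores both directions of every edge, so each
# edge is counted twice here; the weights stay uniform across neighbors.
nodes = defaultdict(lambda: defaultdict(int))
for edge in edges:
  nodes[edge[0].item()][edge[1].item()] += 1
  nodes[edge[1].item()][edge[0].item()] += 1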

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = dataset[0].to(device)
#features = data.x
features = {}
for i in range(data.x.shape[0]):
  features[i] = data.x[i].unsqueeze(0)

In [None]:
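class alias():
  """Alias-method sampler over the keys of `probs` (weights need not sum to 1)."""
  def __init__(self, probs):
    self.n = len(probs)
    self.scaledprobs = {}
    self.table = {}       # probability of keeping the uniformly drawn key
    self.aliastable = {}  # fallback key returned when the drawn key is rejected
    self.small = []
    self.big = []
    self.keys = list(probs.keys())
    total = float(sum(probs.values()))

    for item in probs:
      # normalize the weights into a distribution, then scale by n
      prob = probs[item] / total
      self.scaledprobs[item] = prob * self.n
      if self.scaledprobs[item] > 1:
        self.big.append(item)
      elif self.scaledprobs[item] < 1:
        self.small.append(item)
      else:
        self.table[item] = 1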
    
    while self.small and self.big:
      smallitem = self.small.pop()
      bigitem = self.big.pop()
      newprob = self.scaledprobs[bigitem] - (1 - self.scaledprobs[smallitem])
      self.table[smallitem] = self.scaledprobs[smallitem]
      self.aliastable[smallitem] = bigitem
      self.scaledprobs[bigitem] = newprob
      if self.scaledprobs[bigitem] > 1:
        self.big.append(bigitem)
      elif self.scaledprobs[bigitem] < 1:
        self.small.append(bigitem)
      else:
        self.table[bigitem] = 1
    
    while self.small:
      smallitem = self.small.pop()
      self.table[smallitem] = 1
    
    while self.big:
      bigitem = self.big.pop()
      self.table[bigitem] = 1

  def sampling_one(self):
    sample = random.choice(self.keys)
    if self.table[sample] >= random.uniform(0, 1):
      return sample
    else:
      return self.aliastable[sample]
  
  def sampling_n(self, n):
    samples = []
    for i in range(n):
      samples.append(self.sampling_one())
    return samples

In [None]:
neighbors = {}
for node in nodes:
  neighbors[node] = alias(nodes[node])

In [None]:
batchsz = 64

In [None]:
# column vector of node ids 0..nodenum-1, shape (nodenum, 1)
xs = torch.arange(nodenum, dtype=torch.int32).unsqueeze(1)

In [None]:
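# xs lives on the CPU, so index it with a CPU copy of the mask before moving it to the device
trainset = torch.utils.data.TensorDataset(xs[data.train_mask.cpu()].to(device), data.y[data.train_mask])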
train_loader = DataLoader(trainset, batch_size=batchsz, shuffle=True)

In [None]:
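class Graphsage(nn.Module):
  def __init__(self, neighbors, features, k, n, dims):
    super(Graphsage, self).__init__()
    self.neighbors = neighbors  # node id -> alias sampler over its neighbors
    self.features = features    # node id -> (1, dims[0]) input feature tensor
    self.k = k                  # number of aggregation layers (hops)
    self.n = n                  # neighbors sampled per node at each layer
    # linear transform applied to the mean neighbor feature at each layer
    self.aggregator = nn.ModuleList([nn.Linear(dims[i], dims[i], bias=True) for i in range(self.k)])
    # projects concat(neighbor, self) from 2 * dims[i - 1] down to dims[i]
    self.linears = nn.ModuleList([nn.Linear(2 * dims[i - 1], dims[i], bias=False) for i in range(1, self.k + 1)])
    # classifier: last hidden dimension -> number of classes
    self.mlp = nn.Linear(dims[-2], dims[-1])
    self.bns = nn.ModuleList([nn.BatchNorm1d(dims[i]) for i in range(1, self.k + 1)])
    self.relu = nn.ReLU()
    self.softmax = nn.LogSoftmax(dim=1)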

  def forward(self, batch):
    batchs = defaultdict(set)
    batchs[self.k] = set()
    batchneighbors = defaultdict(lambda: defaultdict(list))
    batchlist = []
    
    for node in batch:
      batchs[self.k].add(node.item())
      batchlist.append(node.item())

    for i in range(self.k, 0, -1):
      batchs[i - 1] |= batchs[i]
      for node in batchs[i]:
        kneighbors = set(self.neighbors[node].sampling_n(self.n))
        batchs[i - 1] |= kneighbors
        batchneighbors[i - 1][node] = list(kneighbors)

    # shallow copy is enough: entries are replaced below, never modified in place
    fs = dict(self.features)

    for i in range(1, self.k + 1):
      aggregate = {}
      for node in batchs[i]:
        for neigh in batchneighbors[i - 1][node]:
          if node not in aggregate:
            aggregate[node] = fs[neigh]
          else:
            aggregate[node] = torch.cat((aggregate[node], fs[neigh]), dim=0)
        aggregate[node] = torch.mean(aggregate[node], dim=0, keepdim=True)
        aggregate[node] = self.aggregator[i - 1](aggregate[node])
        aggregate[node] = torch.cat((aggregate[node], fs[node]), dim=1)
        aggregate[node] = self.linears[i - 1](aggregate[node])
        aggregate[node] = self.relu(aggregate[node])
      
      tmp = list(batchs[i])
      # stack the per-node outputs into one (num_nodes, dims[i]) matrix
      agg = torch.cat([aggregate[node] for node in tmp], dim=0)

      agg = self.bns[i - 1](agg)

      for j, node in enumerate(tmp):
        fs[node] = agg[j].unsqueeze(0)

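    # stack the final-layer representations in batch order
    z = torch.cat([fs[node] for node in batchlist], dim=0)

    # map the last hidden representation to class scores, then take the log-softmax
    result = self.softmax(self.mlp(z))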

    return result

In [None]:
lr = 0.1
epochs = 50
k = 3
n = 20
dims = [dataset.num_node_features]
diff = (dataset.num_node_features - dataset.num_classes) // k
for i in range(k):
  dims.append(dims[0] - diff * (i + 1))

dims.append(dataset.num_classes)

In [None]:
model = Graphsage(neighbors, features, k, n, dims).to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
criterion = nn.NLLLoss()

In [None]:
model.train()
for epoch in range(epochs):
  acc = 0
  for x, y in train_loader:
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    _, pred = out.max(dim=1)
    acc += float(pred.eq(y).sum().item())
  print("epoch: {0}, loss: {1}, train acc: {2}".format(epoch, loss.item(), acc / data.train_mask.sum().item()))



epoch: 0, loss: 2.1925950050354004, train acc: 0.1357142857142857
epoch: 1, loss: 1.367342472076416, train acc: 0.4714285714285714
epoch: 2, loss: 1.1162117719650269, train acc: 0.6714285714285714
epoch: 3, loss: 1.3178423643112183, train acc: 0.6928571428571428
epoch: 4, loss: 1.7026225328445435, train acc: 0.7214285714285714
epoch: 5, loss: 1.029147982597351, train acc: 0.8285714285714286
epoch: 6, loss: 1.3029099702835083, train acc: 0.8214285714285714
epoch: 7, loss: 0.6563180088996887, train acc: 0.9
epoch: 8, loss: 1.2874561548233032, train acc: 0.8928571428571429
epoch: 9, loss: 0.6392794251441956, train acc: 0.8928571428571429
epoch: 10, loss: 0.49636831879615784, train acc: 0.9428571428571428
epoch: 11, loss: 0.3361985683441162, train acc: 0.9571428571428572
epoch: 12, loss: 0.2691052258014679, train acc: 0.9571428571428572
epoch: 13, loss: 0.23887528479099274, train acc: 0.9571428571428572
epoch: 14, loss: 0.5103762745857239, train acc: 0.9571428571428572
epoch: 15, loss: 0.3

In [None]:
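# xs lives on the CPU, so index it with a CPU copy of the mask; no shuffling is needed for evaluation
testset = torch.utils.data.TensorDataset(xs[data.test_mask.cpu()].to(device), data.y[data.test_mask])
test_loader = DataLoader(testset, batch_size=batchsz, shuffle=False)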

In [None]:
model.eval()
acc = 0
# gradients are not needed for evaluation
with torch.no_grad():
  for x, y in test_loader:
    out = model(x)
    _, pred = out.max(dim=1)
    acc += float(pred.eq(y).sum().item())
print("test acc: {0}".format(acc / data.test_mask.sum().item()))



test acc: 0.747
