Implementing Graph Convolutional Network (GCN) in DGL
=====================================================

The graph convolutional network (GCN) is a popular model proposed by [Kipf & Welling](https://arxiv.org/abs/1609.02907) that encodes graph structure through message passing. The high-level idea is similar to our toy task: node features are updated by aggregating the messages from their neighbors. Here is its message passing equation:

$$
h_{v_i}^{(l+1)} = \sigma \left(\sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_{v_j}^{(l)}W^{(l)} \right)
$$

where $v_i$ is any node in the graph; $h_{v_i}^{(l)}$ is the feature of node $v_i$ at layer $l$; $\mathcal{N}(i)$ denotes the neighborhood of $v_i$; $c_{ij}$ is a normalization constant related to the degrees of $v_i$ and $v_j$; $W^{(l)}$ is the learnable weight matrix of layer $l$; and $\sigma$ is a non-linear activation function.
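The following minimal NumPy sketch makes the equation concrete for one layer on a small graph. It evaluates the sum node by node, exactly as written, and checks the result against the equivalent matrix form $\sigma(\hat{A} H W)$ with $\hat{A}_{ij} = A_{ij}/c_{ij}$. Several choices here are illustrative assumptions: the 4-node graph, ReLU for $\sigma$, the random weights, and $c_{ij} = \sqrt{d_i d_j}$ (the degree-based choice discussed in the final exercise).

```python
import numpy as np

def gcn_layer_loop(A, H, W, c, sigma):
    """Literal per-node evaluation of
    h_i' = sigma( sum_{j in N(i)} (1 / c_ij) * h_j W )."""
    n = A.shape[0]
    out = np.zeros((n, W.shape[1]))
    for i in range(n):
        neighbors = np.nonzero(A[i])[0]  # N(i): nodes j with A[i, j] != 0
        for j in neighbors:
            out[i] += (H[j] @ W) / c[i, j]
    return sigma(out)

def gcn_layer_matrix(A, H, W, c, sigma):
    """Same layer in matrix form: sigma(A_hat @ H @ W), A_hat_ij = A_ij / c_ij."""
    A_hat = np.where(A > 0, A / c, 0.0)
    return sigma(A_hat @ H @ W)

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative undirected path graph 0-1-2-3 with self-loops.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
d = A.sum(axis=1)             # node degrees (self-loop included)
c = np.sqrt(np.outer(d, d))   # assumed normalizer c_ij = sqrt(d_i * d_j)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # h^{(l)}: 3 input features per node
W = rng.normal(size=(3, 2))   # W^{(l)}: maps 3 features to 2

out_loop = gcn_layer_loop(A, H, W, c, relu)
out_matrix = gcn_layer_matrix(A, H, W, c, relu)
print(np.allclose(out_loop, out_matrix))  # True
```

Libraries usually implement the matrix (or sparse) form for speed. The per-node loop mirrors the message-passing view used in the rest of this tutorial.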

In [None]:
# A bit of setup, just ignore this cell
import matplotlib.pyplot as plt

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2

%matplotlib inline
plt.rcParams['figure.figsize'] = (8.0, 6.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['animation.html'] = 'html5'

The steps to implement GCN in DGL are also similar to those of the toy task (2_MessagePassing.ipynb):
* Define the message function.
* Define the reduce function.
* Trigger them with `send` and `recv`, or with `update_all`, which performs both steps over all edges and nodes.
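These three steps can be illustrated without DGL. The sketch below is a hypothetical, pure-Python stand-in for a message-passing trigger. The names `toy_update_all`, `message_fn` and `reduce_fn` are ours, not DGL's API. The function calls a message function once per edge ("send"), groups the messages by destination node, and passes each node's mailbox to a reduce function ("recv").

```python
from collections import defaultdict

def toy_update_all(edges, node_feats, message_fn, reduce_fn):
    """Hypothetical stand-in for a message-passing trigger (not DGL's API).

    edges: list of (src, dst) pairs; node_feats: dict mapping node -> feature.
    Returns a new dict of node features after one round of message passing.
    """
    # "send": compute one message per edge and put it in the destination's mailbox
    mailbox = defaultdict(list)
    for src, dst in edges:
        mailbox[dst].append(message_fn(node_feats[src], node_feats[dst]))
    # "recv": every node that received messages reduces its mailbox to a new feature;
    # in this toy version, nodes without incoming messages keep their old feature
    new_feats = dict(node_feats)
    for node, msgs in mailbox.items():
        new_feats[node] = reduce_fn(msgs)
    return new_feats

# Every node sends its value along its out-edges; receivers sum what they get.
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
feats = {0: 1.0, 1: 10.0, 2: 100.0}
result = toy_update_all(edges, feats,
                        message_fn=lambda src_h, dst_h: src_h,
                        reduce_fn=sum)
print(result)  # {0: 100.0, 1: 1.0, 2: 11.0}
```

Keeping the message and reduce functions as plain callables is what lets the same trigger run different models. GCN differs from the toy task only in what these two functions compute.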

For now, assume that we have already implemented the message function `gcn_message` and the reduce function `gcn_reduce`, and look at how to define a GCN layer and trigger the computation with `update_all`:

In [None]:
import mxnet
from mxnet import gluon
from mxnet.gluon import nn

import dgl
import dgl.function as fn

# Define the GCN module
class GCN(gluon.Block):
    def __init__(self, out_feats):
        super(GCN, self).__init__()
        self.linear = nn.Dense(out_feats)
    
    def forward(self, g, inputs):
        # g is the graph and inputs holds the input node features
        # first perform linear transformation
        h = self.linear(inputs)
        # set the node features
        g.ndata['h'] = h
        # trigger message passing, gcn_message and gcn_reduce will be defined later
        g.update_all(gcn_message, gcn_reduce)
        # get the result node features
        h = g.ndata.pop('h')
        return h

Now let's fill in the missing message and reduce functions. For simplicity, we ignore the normalization constant $c_{ij}$ for now.

From the GCN equation above:
- each node sends its feature, after the linear transformation, to its neighbors, so the message from node $u$ to node $v$ is
$$m_{uv} = h_u$$
- each node aggregates the received messages by summation, so the aggregated message on node $v$ is
$$a_v = \sum\limits_{u\in \mathcal{N}(v)}m_{uv},$$
where $\mathcal{N}(v)$ is the neighbor set of node $v$.
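With edges stored as (source, destination) pairs, these two equations amount to a matrix product. If $A_{uv} = 1$ for every edge $u \to v$, then $a_v = \sum_u A_{uv} h_u$, so the aggregated features are $A^\top H$. The NumPy sketch below checks this on a made-up edge list. It only illustrates the math and is not a DGL solution to the exercise.

```python
import numpy as np

# Made-up directed edge list (src, dst); messages flow from src to dst.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 0), (1, 1), (2, 2)]
num_nodes = 3
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 3.0]])  # h_u: features after the linear transformation

# Edge by edge: m_uv = h_u, then a_v sums the messages received by v.
a_loop = np.zeros_like(H)
for u, v in edges:
    m_uv = H[u]          # message: copy the source node's feature
    a_loop[v] += m_uv    # reduce: sum over incoming messages

# Matrix form: A[u, v] = 1 for each edge u -> v, so a = A^T H.
A = np.zeros((num_nodes, num_nodes))
for u, v in edges:
    A[u, v] = 1.0
a_matrix = A.T @ H

print(a_loop.tolist())  # [[1.0, 1.0], [3.0, 4.0], [2.0, 4.0]]
```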

### Exercise
Following the two equations above, implement the message function `gcn_message` and the reduce function `gcn_reduce` for the GCN layer.

NOTE: for now, ignore the normalization factor $c_{ij}$.

In [None]:
# >>> YOUR CODE STARTS

# <<< YOUR CODE ENDS

In this tutorial, we again use the karate club graph as the example. Let's use the utility functions from `tutorial_utils` to load the graph and make it bidirectional with self-loops:

In [None]:
import dgl
import networkx as nx
from tutorial_utils import create_karate_graph, convert_to_bidirectional
G = create_karate_graph()
GG = convert_to_bidirectional(G)
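The internals of `convert_to_bidirectional` are not shown here. As a rough illustration only, the hypothetical helper below applies the same kind of transformation to a plain edge list. It adds the reverse of every edge, so that messages flow both ways along each undirected connection. It also adds a self-loop on every node, so that the summation in the GCN equation includes the node's own feature. This is our own sketch, not the code in `tutorial_utils`.

```python
def to_bidirectional_with_self_loops(edges, num_nodes):
    """Hypothetical helper: return a sorted, de-duplicated list of directed edges
    containing every (u, v), its reverse (v, u), and a self-loop (i, i) per node."""
    result = set()
    for u, v in edges:
        result.add((u, v))
        result.add((v, u))
    for i in range(num_nodes):
        result.add((i, i))
    return sorted(result)

# A 3-node path 0 - 1 - 2 given as one-directional edges.
print(to_bidirectional_with_self_loops([(0, 1), (1, 2)], 3))
# [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```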

To test this model, let's predict which group (the instructor's or the club president's) each club member joins after the split. We adopt the semi-supervised setting of Kipf & Welling, in which only a few nodes are labeled:

In [None]:
import mxnet.ndarray as nd
import numpy as np

# Define a 2-layer GCN model
class Model(gluon.Block):
    def __init__(self, hidden_size, num_classes):
        super(Model, self).__init__()
        self.gcn1 = GCN(hidden_size)
        self.gcn2 = GCN(num_classes)
    
    def forward(self, g, inputs):
        h = self.gcn1(g, inputs)
        h = nd.relu(h)
        h = self.gcn2(g, h)
        return h

inputs = nd.eye(34)  # featureless inputs: each node's feature is its one-hot ID
labeled_nodes = nd.array([0, 33], dtype=np.int64)  # only the instructor and the president nodes are labeled
labels = nd.array([0, 1], dtype=np.int64)  # their labels are different
model = Model(5, 2)
model.initialize()
loss_fn = gluon.loss.SoftmaxCELoss()
trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': 0.01})

all_logits = []
for epoch in range(30):
    with mxnet.autograd.record():
        logits = model(GG, inputs)
        loss = loss_fn(logits[labeled_nodes], labels).sum()
    
    loss.backward()
    trainer.step(batch_size=1)
    
    all_logits.append(logits.detach())
    print('Epoch %d | Loss: %.4f' % (epoch, loss.asscalar()))
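The loop above runs all 34 nodes through the model but computes the loss only on `logits[labeled_nodes]`, i.e. on node 0 (the instructor) and node 33 (the president). The unlabeled nodes affect the result only through message passing. The NumPy sketch below roughly illustrates this masking with made-up logits. The helper name `masked_softmax_cross_entropy` is ours, and the code is not MXNet's `SoftmaxCELoss` implementation. Like the `.sum()` in the loop, it adds up the per-node losses.

```python
import numpy as np

def masked_softmax_cross_entropy(logits, labeled_nodes, labels):
    """Sum of softmax cross-entropy over the labeled nodes only.

    logits: (num_nodes, num_classes) array; labeled_nodes, labels: int arrays.
    """
    selected = logits[labeled_nodes]                           # keep labeled rows only
    shifted = selected - selected.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].sum()

# Made-up logits for 4 nodes and 2 classes; only nodes 0 and 3 are labeled.
logits = np.array([[2.0, 0.0],
                   [5.0, -5.0],   # unlabeled: does not affect the loss
                   [0.0, 0.0],    # unlabeled
                   [0.0, 2.0]])
labeled_nodes = np.array([0, 3])
labels = np.array([0, 1])
loss = masked_softmax_cross_entropy(logits, labeled_nodes, labels)
print(round(loss, 4))  # 0.2539
```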

In [None]:
# Visualize the node classification using the logits output.
import numpy as np
import matplotlib.animation as animation
from IPython.display import HTML

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()
nx_G = G.to_networkx()
def draw(i):
    cls1color = '#00FFFF'
    cls2color = '#FF00FF'
    pos = {}
    colors = []
    for v in range(34):
        pos[v] = all_logits[i][v].asnumpy()
        cls = np.argmax(pos[v])
        colors.append(cls1color if cls else cls2color)
    ax.cla()
    ax.axis('off')
    ax.set_title('Epoch: %d' % i)
    nx.draw(nx_G.to_undirected(), pos, node_color=colors, with_labels=True, node_size=500)

ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)
HTML(ani.to_html5_video())

### Exercise

There is still one missing piece. Recall the GCN update rule:
$$
h_{v_i}^{(l+1)} = \sigma \left(\sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_{v_j}^{(l)}W^{(l)} \right)
$$
We have not implemented the normalizer $c_{ij}$ yet. In the GCN paper, Kipf & Welling compute it as follows:

$$
c_{ij} = \sqrt{d_id_j}
$$

where $d_i$ and $d_j$ are the degrees of nodes $v_i$ and $v_j$, respectively. Your task is to modify the program to implement this normalization.

**Hint #1**: Use `GG.in_degrees(GG.nodes())` to get a 1-D tensor containing the degrees of all the nodes.

**Hint #2**: Since $c_{ij}$ has the subscript $ij$, it is tied to the edges, and our message function is (not coincidentally) an **edge UDF**.
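As a worked example of the formula, independent of DGL, the NumPy sketch below works on a small made-up graph. It computes one $c_{ij} = \sqrt{d_i d_j}$ per edge from the node in-degrees and divides each message by it before summing. It then checks the result against the matrix form $D^{-1/2} A^\top D^{-1/2} H$. How to express this with DGL node/edge data inside your own message function is still up to you.

```python
import numpy as np

# Made-up symmetric graph with self-loops, as a directed edge list (src, dst).
edges = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
num_nodes = 3
src = np.array([u for u, _ in edges])
dst = np.array([v for _, v in edges])

# In-degree of each node (number of edges pointing into it).
deg = np.bincount(dst, minlength=num_nodes).astype(float)

# One normalizer per edge, from the degrees of its two endpoints.
c = np.sqrt(deg[src] * deg[dst])

H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Edge-wise message h_src / c, summed at the destination node.
out = np.zeros_like(H)
np.add.at(out, dst, H[src] / c[:, None])

# Matrix check: D^{-1/2} A^T D^{-1/2} H with A[u, v] = 1 for each edge u -> v.
A = np.zeros((num_nodes, num_nodes))
A[src, dst] = 1.0
D_inv_sqrt = np.diag(deg ** -0.5)
print(np.allclose(out, D_inv_sqrt @ A.T @ D_inv_sqrt @ H))  # True
```

`np.add.at` performs an unbuffered scatter-add, so several edges that share a destination all contribute to that node's row.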

Have fun :)

In [None]:
# >>> YOUR CODE STARTS

# <<< YOUR CODE ENDS