Deep Graph Library (DGL)
========================

DGL is designed to bring machine learning closer to graph-structured data. Specifically, DGL enables trouble-free implementation of the graph neural network (GNN) model family. Unlike general-purpose frameworks such as MXNet or PyTorch, DGL provides friendly APIs for the fundamental operations in GNNs, such as message passing and reduction. Through DGL, we hope to benefit both researchers trying out new ideas and engineers in production.
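
To make "message passing and reduction" concrete, here is a plain-Python sketch of one round on a tiny directed graph. Every edge carries its source node's value to its destination (the message), and each node sums what it receives (the reduction). The function name `message_passing_sum` is hypothetical; this only illustrates the idea and is not DGL's API.

```python
def message_passing_sum(num_nodes, src, dst, h):
    """Hypothetical helper: each node sums the values of its in-neighbors.

    src, dst: parallel lists describing directed edges src[i] -> dst[i].
    h: list holding one numeric feature per node.
    """
    new_h = [0.0] * num_nodes
    for u, v in zip(src, dst):
        # Message phase: edge u -> v carries the source node's feature.
        message = h[u]
        # Reduce phase: the destination node accumulates incoming messages.
        new_h[v] += message
    return new_h


# A tiny graph with edges 1->0, 2->0 and 2->1.
print(message_passing_sum(3, [1, 2, 2], [0, 0, 1], [1.0, 2.0, 3.0]))
# → [5.0, 3.0, 0.0]
```

Summation is only one choice of reduction; mean or max would be equally valid instances of the same pattern.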

In this tutorial, we demonstrate the basics of DGL, including:
- How to create a graph
- How to manipulate node/edge features on a graph
- How to convert a graph to/from other formats

Although this tutorial uses [MXNet](https://mxnet.apache.org/) as the backend for tensor-related computations (so some familiarity with MXNet is helpful), DGL is designed to be platform-agnostic, and we are actively working on integrating it with other frameworks such as [PyTorch](https://pytorch.org) and [TensorFlow](https://www.tensorflow.org/).

In [None]:
# A bit of setup, just ignore this cell
import matplotlib.pyplot as plt

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2

%matplotlib inline
plt.rcParams['figure.figsize'] = (8.0, 6.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['animation.html'] = 'html5'

We start by creating the well-known *"Zachary's karate club"* social network. The network captures 34 members of a karate club, documenting pairwise links between members who interacted outside the club. The club later split into two communities, led by the instructor (node 0) and the club president (node 33). You can read more about the story on the [wiki page](https://en.wikipedia.org/wiki/Zachary%27s_karate_club). A visualization of the network and its communities follows:

![karate](https://www.dropbox.com/s/uqzor4lqsmbnz8k/karate1.jpg?dl=1)

Creating a graph
-----------------------------------

Let's see how to create such a graph in DGL. We start by importing `dgl`; other packages are imported as they are needed.

In [None]:
import dgl

We first create an empty `DGLGraph`. In DGL, nodes are consecutive integers starting from 0. The following code adds all 34 club members to the graph as nodes.

In [None]:
G = dgl.DGLGraph()
G.add_nodes(34)
print('Number of nodes:', G.number_of_nodes())
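
The node-ID rule can be modeled with a few lines of plain Python. `ToyGraph` is a hypothetical class, not DGL's `DGLGraph`, and its `add_nodes` returns the new IDs only for illustration. The sketch assumes that, as in DGL, adding nodes to a graph that already has k nodes assigns them the IDs k, k+1, and so on.

```python
class ToyGraph:
    """Hypothetical stand-in that only tracks node IDs (not DGL's DGLGraph)."""

    def __init__(self):
        self.num_nodes = 0

    def add_nodes(self, n):
        # New nodes take the next n consecutive IDs after the existing ones.
        new_ids = list(range(self.num_nodes, self.num_nodes + n))
        self.num_nodes += n
        return new_ids


g = ToyGraph()
members = g.add_nodes(34)
print(members[0], members[-1])  # → 0 33
print(g.add_nodes(2))           # → [34, 35]
```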

The karate club network contains 78 edges, listed below as `[source destination]` pairs:
```
[1 0]
[2 0] [2 1]
[3 0] [3 1] [3 2]
[4 0]
[5 0]
[6 0] [6 4] [6 5]
[7 0] [7 1] [7 2] [7 3]
[8 0] [8 2]
[9 2]
[10 0] [10 4] [10 5]
[11 0]
[12 0] [12 3]
[13 0] [13 1] [13 2] [13 3]
[16 5] [16 6]
[17 0] [17 1]
[19 0] [19 1]
[21 0] [21 1]
[25 23] [25 24]
[27 2] [27 23] [27 24]
[28 2]
[29 23] [29 26]
[30 1] [30 8]
[31 0] [31 24] [31 25] [31 28]
[32 2] [32 8] [32 14] [32 15] [32 18] [32 20] [32 22] [32 23] [32 29] [32 30] [32 31]
[33 8] [33 9] [33 13] [33 14] [33 15] [33 18] [33 19] [33 20] [33 22] [33 23] [33 26] [33 27] [33 28] [33 29] [33 30] [33 31] [33 32]
```
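
If the listing above is available as text, it can be turned into `(source, destination)` tuples with a regular expression. This is a sketch under the assumption that every edge is written as `[src dst]` with a single space; `parse_edge_rows` is a hypothetical helper, and only the first three rows are reproduced here.

```python
import re

# First three rows of the listing above.
listing = """
[1 0]
[2 0] [2 1]
[3 0] [3 1] [3 2]
"""


def parse_edge_rows(text):
    """Hypothetical helper: extract every "[src dst]" token as an int pair."""
    return [(int(s), int(d)) for s, d in re.findall(r"\[(\d+) (\d+)\]", text)]


pairs = parse_edge_rows(listing)
print(pairs)  # → [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
# In this listing each friendship appears once, with the larger ID first.
print(all(s > d for s, d in pairs))  # → True
```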

In DGL, a single edge is added by specifying its two endpoints, source first.

In [None]:
G.add_edge(1, 0)
print('Now we have %d edges!' % G.number_of_edges())

To add multiple edges at once, use a list/tensor of nodes to specify the endpoints.

In [None]:
import mxnet
import mxnet.ndarray as nd
import numpy as np

########
# NOTE: in DGL, edges are added by specifying a list of source nodes and a list of destination nodes,
# rather than a list of source-destination node pairs. This differs from other popular graph
# packages such as networkx and python-igraph.

########
# NOTE: edges in a DGLGraph are all directed.

# add two edges 2->0 and 2->1 using list
G.add_edges([2, 2], [0, 1])

# add three edges 3->0, 3->1 and 3->2 using mxnet ndarray
src = nd.array([3, 3, 3], dtype=np.int64)
dst = nd.array([0, 1, 2], dtype=np.int64)
G.add_edges(src, dst)

print('Now we have %d edges!' % G.number_of_edges())
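
The two edge formats mentioned in the first NOTE are easy to convert between in plain Python. In this sketch `pairs` is a hypothetical networkx-style list of `(source, destination)` tuples; `zip(*pairs)` splits it into the separate source and destination lists that `add_edges` takes.

```python
# Hypothetical networkx-style edge list of (source, destination) pairs.
pairs = [(2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]

# pairs -> one list of sources and one list of destinations
src, dst = (list(t) for t in zip(*pairs))
print(src)  # → [2, 2, 3, 3, 3]
print(dst)  # → [0, 1, 0, 1, 2]

# (src, dst) -> pairs again, e.g. to pass the edges to another library
back = list(zip(src, dst))
print(back == pairs)  # → True
```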

In [None]:
# add two edges 4->0 and 5->0: a list of sources and a single integer destination
G.add_edges([4, 5], 0)

# add three edges 6->0, 6->4 and 6->5: a single integer source and an mxnet ndarray of destinations
G.add_edges(6, nd.array([0, 4, 5], dtype=np.int64))

print('Now we have %d edges!' % G.number_of_edges())

As the cell above shows, if all the edges share the same source node or the same destination node, that endpoint can be given as a single integer instead of a list/tensor.
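
One way to picture this is as broadcasting the single integer to the length of the other endpoint list. The sketch below is a hypothetical helper that mimics `G.add_edges([4, 5], 0)` and `G.add_edges(6, [0, 4, 5])`; it is not DGL's internal implementation.

```python
def broadcast_endpoints(src, dst):
    """Hypothetical helper: expand an int endpoint to match the other list."""
    src = [src] if isinstance(src, int) else list(src)
    dst = [dst] if isinstance(dst, int) else list(dst)
    # Repeat a single endpoint so both lists describe the same edges.
    if len(src) == 1 and len(dst) > 1:
        src = src * len(dst)
    elif len(dst) == 1 and len(src) > 1:
        dst = dst * len(src)
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    return src, dst


print(broadcast_endpoints([4, 5], 0))     # → ([4, 5], [0, 0])
print(broadcast_endpoints(6, [0, 4, 5]))  # → ([6, 6, 6], [0, 4, 5])
```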

In [None]:
# Exercise: please finish the karate club graph by adding the remaining edges. All the remaining
# edges are provided below as (source, destination) tuples.

edge_list = [(7, 0), (7, 1), (7, 2), (7, 3), (8, 0), (8, 2), (9, 2), (10, 0), (10, 4), (10, 5),
             (11, 0), (12, 0), (12, 3), (13, 0), (13, 1), (13, 2), (13, 3), (16, 5), (16, 6),
             (17, 0), (17, 1), (19, 0), (19, 1), (21, 0), (21, 1), (25, 23), (25, 24), (27, 2),
             (27, 23), (27, 24), (28, 2), (29, 23), (29, 26), (30, 1), (30, 8), (31, 0), (31, 24),
             (31, 25), (31, 28), (32, 2), (32, 8), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22),
             (32, 23), (32, 29), (32, 30), (32, 31), (33, 8), (33, 9), (33, 13), (33, 14), (33, 15),
             (33, 18), (33, 19), (33, 20), (33, 22), (33, 23), (33, 26), (33, 27), (33, 28),
             (33, 29), (33, 30), (33, 31), (33, 32)]

# >>> YOUR CODE STARTS
# unzip the (src, dst) tuples into a list of sources and a list of destinations
src, dst = map(list, zip(*edge_list))
G.add_edges(src, dst)

# <<< YOUR CODE ENDS

# We should have 78 edges now!
print('Now we have %d edges!' % G.number_of_edges())

Manipulating node/edge features
---------------------------------------------------------

Nodes and edges in `DGLGraph` can have **feature** tensors. Features of multiple nodes/edges are batched on the first dimension. Let's start by assigning a random feature vector of length 5 to all nodes.

In [None]:
G.ndata['feat'] = nd.random.randn(34, 5)

Now each node has a feature vector `'feat'` with 5 elements. Note that since there are 34 nodes in this graph, the first dimension must be of size 34, so that each row is the feature vector of one node. An error is raised if the dimensions mismatch:

In [None]:
# This would raise an error (35 rows for 34 nodes), so it is left commented out.
# G.ndata['wrong_feat'] = nd.random.randn(35, 5)
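
The shape rule can be sketched in plain Python with lists of rows standing in for tensors. `set_node_feature` is a hypothetical helper, and its `ValueError` is an assumption for illustration; DGL raises its own error type.

```python
def set_node_feature(features, name, rows, num_nodes):
    """Hypothetical helper: store `rows` only if there is one row per node."""
    if len(rows) != num_nodes:
        raise ValueError(
            "expected %d rows (one per node), got %d" % (num_nodes, len(rows)))
    # Row i is the feature vector of node i.
    features[name] = rows


features = {}
set_node_feature(features, 'feat', [[0.0] * 5 for _ in range(34)], 34)
print(len(features['feat']), len(features['feat'][0]))  # → 34 5

try:
    set_node_feature(features, 'wrong_feat', [[0.0] * 5 for _ in range(35)], 34)
except ValueError as err:
    print(err)  # → expected 34 rows (one per node), got 35
```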

`G.ndata` is a dictionary-like structure, so it supports the usual dictionary operations, such as `update` and `pop`.

In [None]:
# Use `dict.update` to add a new feature (a vector of length 3 per node)
G.ndata.update({'another_feat' : nd.random.randn(34, 3)})

# Print the feature dictionary
print(G.ndata)

# Delete the new feature using `dict.pop`
G.ndata.pop('another_feat')
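
Python's `collections.abc.MutableMapping` shows why a "dictionary-like" store gets `update` and `pop` for free: defining five methods is enough, and the mixin supplies the rest. `NodeFeatureStore` is a hypothetical class for illustration, not DGL's own implementation; it also enforces the one-row-per-node rule described earlier.

```python
from collections.abc import MutableMapping


class NodeFeatureStore(MutableMapping):
    """Hypothetical dict-like feature store (not DGL's implementation)."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self._data = {}

    def __getitem__(self, name):
        return self._data[name]

    def __setitem__(self, name, rows):
        # Enforce one row per node; update() also goes through this method.
        if len(rows) != self.num_nodes:
            raise ValueError("expected %d rows, got %d" % (self.num_nodes, len(rows)))
        self._data[name] = rows

    def __delitem__(self, name):
        del self._data[name]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = NodeFeatureStore(num_nodes=3)
store['feat'] = [[1.0], [2.0], [3.0]]
store.update({'another_feat': [[0.0, 0.0] for _ in range(3)]})  # from MutableMapping
print(sorted(store))  # → ['another_feat', 'feat']
store.pop('another_feat')                                        # also from MutableMapping
print(list(store))    # → ['feat']
```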

Sometimes, you might want to update features of some but not all of the nodes. This can be done using the following syntax:

In [None]:
# Set node 0's feat to an all-zeros vector. Note the extra size-1 dimension here:
# the value holds one row per selected node, even when only one node is selected.
G.nodes[0].data['feat'] = nd.zeros((1, 5))

# Set the feat of nodes 2 and 3 to all-ones vectors at once, selecting the nodes with a list.
G.nodes[[2, 3]].data['feat'] = nd.ones((2, 5))

# Set the feat of nodes 10, 11 and 12 to all-twos vectors at once, selecting the nodes with a tensor.
to_change = nd.array([10, 11, 12], dtype=np.int64)
G.nodes[to_change].data['feat'] = nd.ones((3, 5)) * 2
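
A partial update behaves like a scatter: only the selected rows are overwritten, and the new value must provide one row per selected node, which is one way to read the note about the extra size-1 dimension. The sketch uses lists of rows in place of tensors, and `set_rows` is a hypothetical helper, not DGL code.

```python
def set_rows(feat, node_ids, new_rows):
    """Hypothetical helper: overwrite feat[i] with the matching new row."""
    if len(node_ids) != len(new_rows):
        raise ValueError("need one new row per selected node")
    for i, row in zip(node_ids, new_rows):
        feat[i] = list(row)  # copy so rows are not shared between nodes


feat = [[9.0] * 3 for _ in range(5)]
set_rows(feat, [0], [[0.0] * 3])                       # one node -> one row
set_rows(feat, [2, 3], [[1.0] * 3 for _ in range(2)])  # two nodes -> two rows
print(feat)
# → [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [9.0, 9.0, 9.0]]
```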

Similar to `G.ndata` and `G.nodes`, we have `G.edata` and `G.edges` to access and modify edge features:

In [None]:
# The broness edge feature is just a scalar per edge.
G.edata['broness'] = nd.ones((G.number_of_edges(),))

# The instructor (node 0) is a tough guy, so his friends are a little scared of him:
# halve the broness of every edge pointing into node 0.
G.edges[G.predecessors(0), 0].data['broness'] *= 0.5

print(G.edata)
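
The cell above combines two lookups: the predecessors of node 0 (nodes with an edge into it) and the IDs of the edges from those nodes to node 0. Here is a plain-Python sketch on a hypothetical five-edge graph, assuming edge IDs follow insertion order; `predecessors` and `edge_ids` are illustrative helpers, not DGL's API.

```python
# Hypothetical graph: edge i goes from src[i] to dst[i].
src = [1, 2, 2, 3, 3]
dst = [0, 0, 1, 0, 2]


def predecessors(v):
    """Nodes u with an edge u -> v."""
    return [u for u, w in zip(src, dst) if w == v]


def edge_ids(us, v):
    """IDs of the edges u -> v, for each u in us."""
    ids = []
    for u in us:
        for eid, (a, b) in enumerate(zip(src, dst)):
            if a == u and b == v:
                ids.append(eid)
    return ids


broness = [1.0] * len(src)
for eid in edge_ids(predecessors(0), 0):
    broness[eid] *= 0.5

print(predecessors(0))  # → [1, 2, 3]
print(broness)          # → [0.5, 0.5, 1.0, 0.5, 1.0]
```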

In [None]:
# Exercise: measuring bro-ness cannot be exact. Could you add some small random noise to it?
# Hint: use `nd.random.randn` to add a small perturbation.
#
# >>> YOUR CODE STARTS

G.edata['broness'] += nd.random.randn(G.number_of_edges()) * 0.1

# <<< YOUR CODE ENDS

# You should see some randomness here
print(G.edata['broness'])

Converting to/from networkx graph and sparse matrix
-----------------------------------------------------------------

[NetworkX](https://networkx.github.io/documentation/stable/) is a popular Python graph library. It provides many useful utilities to analyze and visualize graphs. A `DGLGraph` can easily be converted to and from a `networkx` graph:

In [None]:
import networkx as nx

nx_G = G.to_networkx()
pos = nx.circular_layout(nx_G)
nx.draw(nx_G, pos, with_labels=True)

Constructing a DGLGraph from networkx is straightforward. In fact, DGL borrows many NetworkX utilities to create graphs from different formats:

In [None]:
# from networkx graph
G_from_nx = dgl.DGLGraph(nx_G)  # this gives you the same karate club network

# from edge list
G_from_elist = dgl.DGLGraph([(0,1), (1,2), (2,3)])  # this gives you a chain of 4 nodes: 0->1->2->3

# from scipy sparse matrix
import scipy.sparse as sp
A = sp.eye(5, 5, 1)  # 5x5 sparse matrix with ones on the first superdiagonal
G_from_sp = dgl.DGLGraph(A)  # this also gives you a chain of 5 nodes
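
To see why `sp.eye(5, 5, 1)` yields a chain, read each nonzero entry `A[i][j]` as an edge `i -> j` (rows as sources, columns as destinations, which is the assumption made here). This plain-Python sketch builds a dense matrix with the same nonzero pattern and lists its edges.

```python
n = 5
# Same nonzero pattern as sp.eye(5, 5, 1): ones on the first superdiagonal.
A = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

# Assumption: a nonzero A[i][j] means an edge from node i to node j.
edges = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0]
print(edges)  # → [(0, 1), (1, 2), (2, 3), (3, 4)]
```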