# Basic operations on distributed data structures

This notebook demonstrates basic operations on three distributed data structures: `DistGraph`, `DistTensor` and `DistEmbedding`.

In [None]:
import dgl
import torch as th

## Initialize DGL's distributed module

`initialize` has to be called before any of DGL's distributed APIs. Users need to provide at least the IP configuration of the cluster.

In [None]:
dgl.distributed.initialize('ip_config.txt')

## Create DistGraph

When a `DistGraph` object is created, it either loads the input graph itself or connects to the servers that have loaded the input graph, depending on its execution mode. `part_config` is only required for the standalone mode.

The code below loads the OGB products graph (`ogbn-products`) that was partitioned with METIS. When running in the standalone mode, the input graph can only have one partition.

In [None]:
g = dgl.distributed.DistGraph('ogbn-products', part_config='standalone_data/ogbn-products.json')

## Access DistGraph

We can access basic information about the input graph structure from `DistGraph`, e.g., the number of nodes and the number of edges.

In [None]:
print('#nodes:', g.number_of_nodes())
print('#edges:', g.number_of_edges())

The input graph contains features and labels on nodes. These features and labels, as well as the mask arrays (`train_mask`, `val_mask` and `test_mask`), are loaded into memory and appear in `g.ndata` automatically.

In addition, `g.ndata` contains `orig_id`. When a graph is partitioned, the node IDs and edge IDs are relabeled so that all node IDs and edge IDs in a partition fall in a contiguous range. The original node IDs and edge IDs are stored in the graph as node data and edge data named `orig_id`.

In [None]:
print(list(g.ndata.keys()))
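The relabeling described above can be illustrated with a small pure-Python sketch. It is not DGL's partitioning code: the function `relabel_by_partition` and the partition assignment passed to it are hypothetical, and the sketch only shows how contiguous per-partition ID ranges and an `orig_id` array relate to each other.

```python
def relabel_by_partition(part_of_node):
    """Relabel nodes so that each partition owns one contiguous ID range.

    part_of_node[i] is the (hypothetical) partition assigned to original node i.
    Returns (orig_id, ranges): orig_id[new_id] is the original ID of the node,
    and ranges[p] is the half-open range [start, end) of new IDs in partition p.
    """
    num_parts = max(part_of_node) + 1
    # A stable sort by partition keeps the original order inside each partition.
    orig_id = sorted(range(len(part_of_node)), key=lambda n: part_of_node[n])
    ranges = []
    start = 0
    for p in range(num_parts):
        size = sum(1 for q in part_of_node if q == p)
        ranges.append((start, start + size))
        start += size
    return orig_id, ranges


# Six nodes split across two partitions.
orig_id, ranges = relabel_by_partition([1, 0, 1, 0, 0, 1])
print(orig_id)  # original ID of each relabeled node
print(ranges)   # contiguous new-ID range owned by each partition
```

With a single partition, as in the standalone mode, this mapping is the identity, so `orig_id` simply counts up from 0.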

`train_mask` indicates whether a node belongs to the training set. The data is stored in a distributed tensor.

In [None]:
print(g.ndata['train_mask'])

To access the data in the distributed tensor, we need to explicitly slice it. The slicing operation copies the requested rows to the local process and returns them as a PyTorch tensor.

In [None]:
print(g.ndata['train_mask'][0:10])
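To make the copy semantics of slicing concrete, here is a toy sketch in plain Python. `ToyShardedTensor` is a hypothetical stand-in, not DGL's `DistTensor`: each inner list plays the role of the rows held by one server, and slicing gathers the requested rows into a new local list.

```python
class ToyShardedTensor:
    """Hypothetical stand-in for a distributed tensor; not DGL's implementation."""

    def __init__(self, shards):
        # Each shard plays the role of the rows held by one server.
        self._shards = [list(s) for s in shards]
        # _offsets[p] is the global index of the first row in shard p.
        self._offsets = [0]
        for s in self._shards:
            self._offsets.append(self._offsets[-1] + len(s))

    def __len__(self):
        return self._offsets[-1]

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        out = []
        for i in range(*key.indices(len(self))):
            # Find the shard that owns global row i, then copy the value out.
            for p, shard in enumerate(self._shards):
                if self._offsets[p] <= i < self._offsets[p + 1]:
                    out.append(shard[i - self._offsets[p]])
                    break
        return out


t = ToyShardedTensor([[1, 1, 0], [0, 1, 1, 0]])
local = t[1:5]  # gathers rows 1..4 from both shards into a local list
local[0] = 99   # changing the local copy leaves the shards untouched
print(local, t[1:5])
```

The point of the sketch is that the slice result lives in the local process: writing to it does not change the sharded data.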

Print the values of `orig_id`. Because we are running in the standalone mode with a single partition, the node IDs were not relabeled.

In [None]:
print(g.ndata['orig_id'][0:10])

Similarly, the edge data has `orig_id`.

In [None]:
print(list(g.edata.keys()))

## Distributed tensors

When a graph is loaded, all node data and edge data are loaded and stored as `DistTensor`. Normally, we don't need to create new distributed tensors during training. One use case of creating a new distributed tensor is in the inference stage where we want to store all intermediate node embeddings. We will see this example later in the model training.

In [None]:
arr = dgl.distributed.DistTensor((g.number_of_nodes(),), th.float32)

By default, the created tensor is initialized to 0.

In [None]:
arr[0:10]

We can customize the initialization by providing an initialization function.

In [None]:
# Custom initializer: it receives the shape and dtype of the tensor to fill.
def init(shape, dtype):
    return th.rand(shape, dtype=dtype)

arr = dgl.distributed.DistTensor((g.number_of_nodes(),), th.float32, init_func=init)

In [None]:
arr[0:10]

We can assign a DistTensor as node data. DistGraph only allows DistTensor as node data.

In [None]:
g.ndata['new_data'] = arr

## Distributed embeddings

DGL provides `DistEmbedding` to help train models with embeddings (e.g., DeepWalk). When a mini-batch updates the embeddings, only the embeddings involved in that mini-batch are updated. As such, we can use `DistEmbedding` to train very large models.

In [None]:
emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=init)

DGL provides a sparse optimizer for `DistEmbedding`. For example, a user can use `SparseAdagrad` to update the embeddings. The tensor returned from `DistEmbedding` records gradients, so it can take part in the backward computation.

In [None]:
optimizer = dgl.distributed.SparseAdagrad([emb], lr=0.001)
feats = emb([0, 1, 2, 3])  # look up the embeddings of nodes 0, 1, 2 and 3
print(feats)
loss = th.sum(feats + 1)
loss.backward()
optimizer.step()
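The idea of a sparse update can be sketched in plain Python. This is a generic, element-wise Adagrad step applied only to the rows that received gradients; it is an illustration under that assumption, not DGL's `SparseAdagrad` implementation, and the function name `sparse_adagrad_step` and the `eps` value are hypothetical.

```python
import math


def sparse_adagrad_step(emb, state, row_grads, lr=0.001, eps=1e-10):
    """Apply a generic Adagrad update to the rows listed in row_grads only.

    emb and state are lists of rows; row_grads maps a row index to that
    row's gradient. Rows that are not in row_grads are never touched.
    """
    for row, grad in row_grads.items():
        for j, g in enumerate(grad):
            state[row][j] += g * g  # accumulate squared gradients
            emb[row][j] -= lr * g / (math.sqrt(state[row][j]) + eps)


num_nodes, dim = 6, 2
emb = [[0.5] * dim for _ in range(num_nodes)]
state = [[0.0] * dim for _ in range(num_nodes)]

# loss = sum(feats + 1) has gradient 1 for every element of the looked-up rows.
grads = {row: [1.0] * dim for row in [0, 1, 2, 3]}
sparse_adagrad_step(emb, state, grads)

print(emb[0])  # moved by roughly lr
print(emb[5])  # untouched, because row 5 was not in the mini-batch
```

Because the cost of a step depends on the number of rows in the mini-batch rather than on the size of the whole table, this style of update scales to very large embedding tables.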

When data is read from `DistEmbedding` without recording gradients, no gradients are attached to the returned tensor. We can also see that the embeddings for nodes 0, 1, 2 and 3 have been updated.

**Note**: When embeddings are read from `DistEmbedding` but are not used in a backward computation, the read has to be done inside PyTorch's `no_grad` scope.

In [None]:
with th.no_grad():
    print(emb([0, 1, 2, 3]))