# Chapter 1: Graph

DGL provides a graph-centric programming abstraction whose core data structure is `DGLGraph`.

## 1.1 Some Basic Definitions about Graphs

A graph G = (V, E) consists of two sets: the set of nodes V and the set of edges E.  
An edge (u, v) connecting a pair of nodes u and v indicates that there is a relation between them.  
The relation can be either undirected or directed; accordingly, a graph is either undirected or directed.  
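The definition can be made concrete with plain Python sets. This is a minimal sketch, not DGL code: the names `V`, `E`, and `undirected_edges` are illustrative. An undirected relation is modelled by storing the edge in both directions, which is also how DGL, whose edges are all directed, represents undirected graphs.

```python
# A small directed graph G = (V, E) as plain Python sets (illustrative only).
V = {0, 1, 2, 3}
E = {(0, 1), (0, 2), (0, 3), (1, 3)}  # (u, v) is a directed edge from u to v

# Every edge must connect two nodes of V.
assert all(u in V and v in V for u, v in E)

# In a directed graph, (0, 1) and (1, 0) are different edges.
print((0, 1) in E, (1, 0) in E)  # True False

# An undirected view: store each relation in both directions.
undirected_edges = E | {(v, u) for u, v in E}
print((1, 0) in undirected_edges)  # True
print(len(undirected_edges))       # 8
```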
  
Graphs can be weighted or unweighted. In a weighted graph, each edge is associated with a scalar weight.  
  
Graphs can also be either homogeneous or heterogeneous.
 - For the former, all nodes represent instances of the same type and all the edges represent relations of the same type. E.g., a social network.
 - For the latter, the nodes and edges can be of different types.
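A heterogeneous graph can be described by grouping its edges by type. The sketch below is plain Python: the dictionary keys follow the `(source node type, edge type, destination node type)` triples that `dgl.heterograph()` accepts, but the type names and the helpers `node_types`, `edge_types`, and `is_homogeneous` are hypothetical illustrations, not DGL functions.

```python
# Hypothetical heterogeneous graph: users follow users and click items.
# Each key is a (source node type, edge type, destination node type) triple;
# each value is a pair of lists (source IDs, destination IDs).
hetero_edges = {
    ("user", "follows", "user"): ([0, 1], [1, 2]),
    ("user", "clicks", "item"): ([0, 2, 2], [0, 0, 1]),
}

def node_types(edges_by_type):
    """Collect every node type that appears at either end of an edge type."""
    types = set()
    for src_type, _, dst_type in edges_by_type:
        types.add(src_type)
        types.add(dst_type)
    return types

def edge_types(edges_by_type):
    """Collect the relation names."""
    return {etype for _, etype, _ in edges_by_type}

def is_homogeneous(edges_by_type):
    """Homogeneous: exactly one node type and one edge type."""
    return len(node_types(edges_by_type)) == 1 and len(edge_types(edges_by_type)) == 1

print(sorted(node_types(hetero_edges)))  # ['item', 'user']
print(sorted(edge_types(hetero_edges)))  # ['clicks', 'follows']
print(is_homogeneous(hetero_edges))      # False
```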

## 1.2 Graphs, Nodes, and Edges

DGL represents each node by a **unique integer**, called its **node ID**, and each edge by **a pair of integers** corresponding to the **IDs of its end nodes**.  
DGL assigns to each edge a unique integer, called its **edge ID**, based on the order in which it was added to the graph.  
The numbering of both node and edge IDs starts from 0.  
All edges in DGL are directed: edge (u, v) goes from the source node u to the destination node v.  
  
Use a 1-D integer tensor of node IDs to specify multiple nodes.  
Use a tuple of node ID tensors (U, V) to specify multiple edges, where (U[i], V[i]) is the i-th edge.  
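The edge-list convention above can be sketched with two plain Python lists. This is an illustration of the idea, not DGL's internal storage: the edge ID is the position in the lists, and, as `dgl.graph()` does when no `num_nodes` argument is passed, the node count is taken as the largest node ID plus one. The append step only mimics the "IDs follow insertion order" rule.

```python
# Edges given as two parallel lists: edge i goes from U[i] to V[i].
U = [0, 0, 0, 1]
V = [1, 2, 3, 3]

# The edge ID is simply the position of the edge in these lists.
for eid, (u, v) in enumerate(zip(U, V)):
    print(f"edge {eid}: {u} -> {v}")

# Without an explicit node count, the number of nodes is the largest ID + 1.
num_nodes = max(U + V) + 1
print(num_nodes)  # 4

# Appending a new edge gives it the next free edge ID.
U.append(3)
V.append(0)
new_eid = len(U) - 1
print(new_eid)  # 4
```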
  
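Use the `dgl.graph()` function to create a `DGLGraph` from such a tuple.

Because all edges in DGL are directed, an undirected graph is represented by storing every edge in both directions. Use `dgl.to_bidirected()` to obtain such a graph: it returns a new graph that contains every edge together with its reverse, without parallel edges.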

```python
import dgl
import torch as th

u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
g = dgl.graph((u, v))
print(g)
```

```
Graph(num_nodes=4, num_edges=4,
      ndata_schemes={}
      edata_schemes={})
```


```python
print(g.nodes())  # node IDs
print(g.edges())  # end nodes of every edge: (source IDs, destination IDs)
```

```
tensor([0, 1, 2, 3])
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]))
```
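`g.edges()` returns the end nodes of the edges, not their IDs; an edge's ID is its position in those two tensors. The plain-Python sketch below illustrates the two lookups this implies, which DGL exposes as `g.find_edges()` (edge IDs to end nodes) and `g.edge_ids()` (end nodes to edge IDs). The helper functions here are hypothetical stand-ins, not DGL's implementation, and assume the graph has no parallel edges.

```python
src = [0, 0, 0, 1]
dst = [1, 2, 3, 3]

def find_edges(eids):
    """Return (sources, destinations) of the given edge IDs."""
    return [src[e] for e in eids], [dst[e] for e in eids]

def edge_ids(us, vs):
    """Return the ID of each edge (u, v); assumes no parallel edges."""
    index = {(u, v): eid for eid, (u, v) in enumerate(zip(src, dst))}
    return [index[(u, v)] for u, v in zip(us, vs)]

print(find_edges([0, 3]))        # ([0, 1], [1, 3])
print(edge_ids([0, 1], [2, 3]))  # [1, 3]
```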


```python
bg = dgl.to_bidirected(g)
print(bg.edges())
```

```
(tensor([0, 0, 0, 1, 1, 2, 3, 3]), tensor([1, 2, 3, 0, 3, 0, 0, 1]))
```
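The effect of `dgl.to_bidirected()` on an edge list can be sketched in plain Python: add the reverse of every edge and drop duplicates. The helper `to_bidirected_lists` is hypothetical, and sorting by (source, destination) is an assumption chosen to reproduce the output printed for this graph; this sketch does not guarantee DGL's edge order in general.

```python
src = [0, 0, 0, 1]
dst = [1, 2, 3, 3]

def to_bidirected_lists(src, dst):
    """Add the reverse of every edge, drop duplicates, sort by (src, dst)."""
    pairs = set(zip(src, dst)) | set(zip(dst, src))
    ordered = sorted(pairs)
    return [u for u, _ in ordered], [v for _, v in ordered]

bsrc, bdst = to_bidirected_lists(src, dst)
print(bsrc)  # [0, 0, 0, 1, 1, 2, 3, 3]
print(bdst)  # [1, 2, 3, 0, 3, 0, 0, 1]
```

Using a set makes an edge that already exists in both directions appear only once, so the result has no parallel edges.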


DGL can use either **32**- or **64**-bit integers to store the node and edge IDs; by default it uses 64-bit integers.  
The node IDs and edge IDs of a graph must use the same data type.  
If a graph has fewer than 2^31 - 1 nodes and edges, prefer 32-bit integers: they lead to better speed and require less memory.
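The rule of thumb and the memory saving can be checked with a little arithmetic. This is a back-of-the-envelope sketch: `choose_id_bits` and `coo_id_bytes` are hypothetical helpers, and the byte count covers only the two ID arrays of an edge list. The actual footprint of a `DGLGraph` depends on which sparse formats DGL builds.

```python
INT32_MAX = 2**31 - 1

def choose_id_bits(num_nodes, num_edges):
    """Pick 32-bit IDs when both counts stay below 2**31 - 1, else 64-bit."""
    return 32 if max(num_nodes, num_edges) < INT32_MAX else 64

def coo_id_bytes(num_edges, bits):
    """Bytes for the source and destination ID arrays of an edge list."""
    return 2 * num_edges * (bits // 8)

print(choose_id_bits(10_000, 1_000_000))    # 32
print(choose_id_bits(3_000_000_000, 10))    # 64
print(coo_id_bytes(100_000_000, 32) / 1e9)  # 0.8 (GB)
print(coo_id_bytes(100_000_000, 64) / 1e9)  # 1.6 (GB)
```

For 100 million edges, switching the IDs from 64 to 32 bits halves this part of the storage.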

```python
# By default, node and edge IDs are stored as int64
edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])
g64 = dgl.graph(edges)
print(g64.idtype)
```

```
torch.int64
```


```python
# Create an int32 graph
g32 = dgl.graph(edges, idtype=th.int32)
print(g32.idtype)
```

```
torch.int32
```


```python
# Convert int64 <-> int32
g64_2 = g32.long()  # convert to int64
g32_2 = g64.int()   # convert to int32
print(g64_2.idtype)
print(g32_2.idtype)
```

```
torch.int64
torch.int32
```


## 1.3 Node and Edge Features

A `DGLGraph` can store user-defined named features for the properties of its nodes and edges.  
The features are accessed through the `ndata` and `edata` interfaces.  
Features with different names can have different shapes.

```python
g = dgl.graph(([0, 0, 1, 5], [1, 2, 2, 0]))  # 6 nodes, 4 edges
print(g)
```

```
Graph(num_nodes=6, num_edges=4,
      ndata_schemes={}
      edata_schemes={})
```


```python
g.ndata['x'] = th.ones(g.num_nodes(), 3)  # node feature of length 3
g.edata['x'] = th.ones(g.num_edges(), dtype=th.int32)  # one int32 scalar per edge
print(g)
```

```
Graph(num_nodes=6, num_edges=4,
      ndata_schemes={'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'x': Scheme(shape=(), dtype=torch.int32)})
```


```python
g.ndata['y'] = th.randn(g.num_nodes(), 5)  # a differently named feature may have a different shape
```

```python
print(g.ndata['x'][1])                   # feature 'x' of node 1
print(g.edata['x'][th.tensor([0, 3])])  # feature 'x' of edges 0 and 3
```

```
tensor([1., 1., 1.])
tensor([1, 1], dtype=torch.int32)
```


Important facts about the `ndata`/`edata` interface:
 - Only **numerical types** (e.g., float, double, int) are allowed. Features can be **scalars, vectors, or multi-dimensional tensors**.
 - Each node feature has a unique name, and each edge feature has a unique name; a node feature and an edge feature can share the same name.
 - A feature is created by **tensor assignment**. The **leading dimension** of the tensor **must equal** the number of nodes/edges in the graph; you **cannot** assign a feature to only a **subset** of the nodes/edges.
 - Features of the **same name** must have the **same dimensionality and data type**.
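The rules above can be modelled in plain Python to see why assignments get rejected. `FeatureStore` is a hypothetical stand-in for `ndata`/`edata`, not DGL's class: it stores one row per node or edge, rejects a feature whose row count differs from the number of items, and rejects rows of mixed shapes. It treats lists as 1-D vectors and does not check data types.

```python
class FeatureStore:
    """Hypothetical stand-in for ndata/edata: name -> rows, one row per item."""

    def __init__(self, num_items):
        self.num_items = num_items
        self._data = {}

    @staticmethod
    def _row_shape(row):
        # Scalars have shape (); lists/tuples are treated as 1-D vectors.
        return (len(row),) if isinstance(row, (list, tuple)) else ()

    def __setitem__(self, name, rows):
        # Rule: the leading dimension must equal the number of nodes/edges.
        if len(rows) != self.num_items:
            raise ValueError(
                f"feature '{name}' has {len(rows)} rows, expected {self.num_items}")
        # Rule: every row of one feature has the same shape.
        shapes = {self._row_shape(r) for r in rows}
        if len(shapes) > 1:
            raise ValueError(f"feature '{name}' mixes row shapes {sorted(shapes)}")
        self._data[name] = rows

    def __getitem__(self, name):
        return self._data[name]

    def __contains__(self, name):
        return name in self._data

# Node and edge features live in separate stores, so they may share a name.
ndata = FeatureStore(num_items=6)
edata = FeatureStore(num_items=4)
ndata["x"] = [[1.0, 1.0, 1.0]] * 6
edata["x"] = [1] * 4

try:
    ndata["y"] = [[0.0]] * 3  # covers only 3 of the 6 nodes
    rejected_partial = False
except ValueError as err:
    rejected_partial = True
    print(err)

try:
    edata["z"] = [[1.0], [1.0, 2.0], [1.0], [1.0]]  # rows of different lengths
    rejected_mixed = False
except ValueError as err:
    rejected_mixed = True
    print(err)
```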

```python
# For weighted graphs
edges = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
weights = th.tensor([0.1, 0.6, 0.9, 0.7])  # weight of each edge
g = dgl.graph(edges)
g.edata['w'] = weights
print(g)
```

```
Graph(num_nodes=4, num_edges=4,
      ndata_schemes={}
      edata_schemes={'w': Scheme(shape=(), dtype=torch.float32)})
```
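Because `g.edata['w']` holds one weight per edge ID, a per-node quantity such as the weighted degree is a sum over edge IDs. The sketch below is plain Python on the same edges and weights; `weighted_degree` is a hypothetical helper, not a DGL function. Results are rounded because floating-point sums are not exact.

```python
src = [0, 0, 0, 1]
dst = [1, 2, 3, 3]
weights = [0.1, 0.6, 0.9, 0.7]  # weights[i] is the weight of edge ID i

def weighted_degree(ends, weights, num_nodes):
    """Per node, sum the weights of the edges whose given endpoint is that node."""
    totals = [0.0] * num_nodes
    for eid, node in enumerate(ends):
        totals[node] += weights[eid]
    return totals

out_w = weighted_degree(src, weights, num_nodes=4)  # outgoing edges
in_w = weighted_degree(dst, weights, num_nodes=4)   # incoming edges
print([round(d, 6) for d in out_w])  # [1.6, 0.7, 0.0, 0.0]
print([round(d, 6) for d in in_w])   # [0.0, 0.1, 0.6, 1.6]
```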
