# Constructing and Accessing Properties in DGraph

This tutorial shows you how to construct a `DGraph` object in `tgm` and access its properties.

The `DGraph` object can be found in `tgm/graph.py`.

Let's first learn how to make a `DGraph` by defining the graph ourselves.

Start by defining the temporal edges. The `edge_index` tensor has shape `[num_edge_events, 2]`, where each row is a (source, destination) pair.

The `edge_timestamps` tensor specifies the timestamp associated with each edge and must have shape `[num_edge_events]`.

The `edge_feats` tensor specifies the features associated with each edge and must have shape `[num_edge_events, D_edge]`, where `D_edge` is the edge feature dimension. Edge features are optional.



In [1]:
import torch

edge_index = torch.LongTensor([[2, 2], [2, 4], [1, 8]])
edge_timestamps = torch.LongTensor([1, 5, 20])
edge_feats = torch.rand(3, 5)

assert edge_index.shape[0] == edge_timestamps.shape[0] == edge_feats.shape[0]

## Construct `DGraph` from raw tensors

To construct a `DGraph` object, we first bundle the related tensors into a `DGData`, which then constructs the underlying storage of the data.

`DGData` can be found in `tgm/data.py` and can be constructed with the `from_raw` function. TGM supports edge events and node events at the same time. The core assumption is that a temporal graph must have edge events and may optionally have node events as well.

The edge events are specified with the following arguments:

- `edge_timestamps`: Tensor  # [num_edge_events] (required)

- `edge_index`: Tensor  # [num_edge_events, 2] (required)

- `edge_feats`: Tensor | None = None  # [num_edge_events, D_edge] (optional)

The node events are specified with the following arguments:

- `node_timestamps`: Tensor | None = None  # [num_node_events] (optional)

- `node_ids`: Tensor | None = None  # [num_node_events] (optional)

- `dynamic_node_feats`: Tensor | None = None  # [num_node_events, D_node_dynamic]


Lastly, each node can also have static node features, i.e. node features that do not change over time:

- `static_node_feats`: Tensor | None = None  # [num_nodes, D_node_static]
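
The shape constraints above can be summarized in a small, dependency-free sketch. The `RawEvents` class below is hypothetical and is not part of `tgm`; it uses plain Python lists in place of tensors and only mirrors the rule that edge events are required while node events are optional. The check that `node_timestamps` and `node_ids` must be supplied together is an assumption of this sketch, not a documented behaviour of `DGData.from_raw`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawEvents:
    """Hypothetical stand-in for the arguments of `DGData.from_raw` (not the tgm API)."""

    edge_timestamps: List[int]                      # [num_edge_events]
    edge_index: List[List[int]]                     # [num_edge_events, 2]
    edge_feats: Optional[List[List[float]]] = None  # [num_edge_events, D_edge]
    node_timestamps: Optional[List[int]] = None     # [num_node_events]
    node_ids: Optional[List[int]] = None            # [num_node_events]

    def __post_init__(self) -> None:
        n_edges = len(self.edge_timestamps)
        if len(self.edge_index) != n_edges:
            raise ValueError('edge_index and edge_timestamps disagree on num_edge_events')
        if any(len(pair) != 2 for pair in self.edge_index):
            raise ValueError('each row of edge_index must be a (src, dst) pair')
        if self.edge_feats is not None and len(self.edge_feats) != n_edges:
            raise ValueError('edge_feats must have one row per edge event')
        # Assumption of this sketch: node timestamps and ids are given together.
        if (self.node_timestamps is None) != (self.node_ids is None):
            raise ValueError('node_timestamps and node_ids must be given together')
        if self.node_ids is not None and len(self.node_ids) != len(self.node_timestamps):
            raise ValueError('node_ids and node_timestamps disagree on num_node_events')


# Edge-only graph: valid, because node events are optional.
events = RawEvents(edge_timestamps=[1, 5, 20], edge_index=[[2, 2], [2, 4], [1, 8]])
```

Constructing `RawEvents` with two timestamps but only one row in `edge_index` raises a `ValueError`, which is the kind of mismatch the `assert` statements in this tutorial guard against.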

In the next part, we specify our node events and static node features, then build the `DGData` and `DGraph`.


In [2]:
from tgm.data import DGData
from tgm.graph import DGraph


node_timestamps = torch.LongTensor([1, 5, 10])
node_ids = torch.LongTensor([2, 4, 6])
dynamic_node_feats = torch.rand([3, 5])

assert node_timestamps.shape[0] == node_ids.shape[0] == dynamic_node_feats.shape[0]

static_node_feats = torch.rand(9, 11)

# initializing our DGData

dgdata = DGData.from_raw(
    edge_timestamps,
    edge_index,
    edge_feats,
    node_timestamps,
    node_ids,
    dynamic_node_feats,
    static_node_feats,
)


# initializing our DGraph

our_dgraph = DGraph(dgdata)



## Construct `DGraph` from Pandas 

You can also initialize your `DGData` from pandas DataFrames. See the example below:

In [3]:
import pandas as pd

edge_dict = {
    'src': [2, 10],
    'dst': [3, 20],
    't': [1337, 1338],
    'edge_feat': [torch.rand(5).tolist(), torch.rand(5).tolist()],
}  # edge events

node_dict = {
    'node': [7, 8],
    't': [3, 6],
    'node_feat': [torch.rand(5).tolist(), torch.rand(5).tolist()],
}  # node events, optional

data = DGData.from_pandas(
    edge_df=pd.DataFrame(edge_dict),
    edge_src_col='src',
    edge_dst_col='dst',
    edge_time_col='t',
    edge_feats_col='edge_feat',
    node_df=pd.DataFrame(node_dict),
    node_id_col='node',
    node_time_col='t',
    dynamic_node_feats_col='node_feat',
)

# build the graph from the pandas-based DGData created above
dgraph = DGraph(data)



## Construct `DGraph` from csv files

You can also initialize your `DGData` from CSV files. See the `from_csv` function of `DGData`.


## Properties of `DGraph`

`DGraph` objects act as a view on the underlying graph data, allowing the user to access various properties and to perform slicing and other operations. Additional properties can be found in `tgm/graph.py`.

In [4]:
print('=== Graph Properties ===')
print(f'start time : {our_dgraph.start_time}')
print(f'end time : {our_dgraph.end_time}')
print(f'number of nodes : {our_dgraph.num_nodes}')
print(f'number of edge events : {our_dgraph.num_edges}')
print(f'number of timestamps : {our_dgraph.num_timestamps}')
print(f'number of edge and node events : {our_dgraph.num_events}')
print(f'edge feature dim : {our_dgraph.edge_feats_dim}')
print(f'static node feature dim : {our_dgraph.static_node_feats_dim}')
print(f'dynamic node feature dim : {our_dgraph.dynamic_node_feats_dim}')
print('======================')

=== Graph Properties ===
start time : 1
end time : 20
number of nodes : 9
number of edge events : 3
number of timestamps : 4
number of edge and node events : 6
edge feature dim : 5
static node feature dim : 11
dynamic node feature dim : 5
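
As a cross-check on the output above, the same counts can be recomputed directly from the raw values used earlier. This is a dependency-free sketch that assumes each property has its most literal meaning (for example, that `num_timestamps` counts unique timestamps across edge and node events, and that `num_nodes` is the largest node id plus one); it does not reproduce how `DGraph` computes these values internally.

```python
# Raw values copied from the cells above, as plain Python lists.
edge_index = [[2, 2], [2, 4], [1, 8]]
edge_timestamps = [1, 5, 20]
node_timestamps = [1, 5, 10]
node_ids = [2, 4, 6]

all_timestamps = edge_timestamps + node_timestamps
all_node_ids = [n for pair in edge_index for n in pair] + node_ids

start_time = min(all_timestamps)
end_time = max(all_timestamps)
num_nodes = max(all_node_ids) + 1  # assumption: node ids are 0-based
num_edges = len(edge_timestamps)
num_timestamps = len(set(all_timestamps))  # unique timestamps over all events
num_events = len(edge_timestamps) + len(node_timestamps)

print(start_time, end_time, num_nodes, num_edges, num_timestamps, num_events)
# prints: 1 20 9 3 4 6
```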


## Construct `DGraph` from TGB Datasets

We support constructing a `DGraph` from TGB link property prediction and node property prediction datasets. Currently, TKG and THG datasets from TGB are not supported. Specify the time granularity of the dataset with `TimeDeltaDG`. Note that time-granularity-related functions are in alpha and are still subject to change.

`r` denotes the ordered time granularity: the underlying timestamps only tell us the ordering of the edges, and we do not use them for time conversion. (Other time granularities allow such conversion.)

In [5]:
from tgm.timedelta import TimeDeltaDG

train_dg = DGraph('tgbl-wiki', time_delta=TimeDeltaDG('r'), split='train')

raw file found, skipping download
Dataset directory is  /Users/andang/tgm/my_venv/lib/python3.10/site-packages/tgb/datasets/tgbl_wiki
loading processed file


## `DGDataLoader` and `DGHook`

In TGM, data-loading operations such as negative sampling and neighbor sampling are integrated as hooks on the dataloader. Advanced users can also create their own custom hooks.


In [6]:
from tgm.loader import DGDataLoader
from tgm.hooks import (
    NegativeEdgeSamplerHook,
    RecencyNeighborHook,
)

neg_hook = NegativeEdgeSamplerHook(low=0, high=train_dg.num_nodes)

n_nbrs = [20]  # sample 20 1-hop neighbors for each node
nbr_hook = RecencyNeighborHook(num_nbrs=n_nbrs, num_nodes=train_dg.num_nodes)

train_loader = DGDataLoader(
    train_dg, hook=[neg_hook, nbr_hook], batch_size=200
)  # compose hooks
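
To make the idea of composing hooks concrete, here is a toy, dependency-free sketch. None of the names below (`ToyBatch`, `toy_negative_hook`, `toy_loader`) exist in `tgm`, and the real `DGHook` interface may look quite different; the sketch only illustrates the pattern of a loader that slices events into batches and passes each batch through a list of hooks in order.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class ToyBatch:
    """Hypothetical batch container; not tgm's DGBatch."""

    src: List[int]
    dst: List[int]
    time: List[int]
    neg: List[int] = field(default_factory=list)


Hook = Callable[[ToyBatch], ToyBatch]


def toy_negative_hook(low: int, high: int, seed: int = 0) -> Hook:
    """Return a hook that draws one random negative destination in [low, high) per edge."""
    rng = random.Random(seed)

    def hook(batch: ToyBatch) -> ToyBatch:
        batch.neg = [rng.randrange(low, high) for _ in batch.dst]
        return batch

    return hook


def toy_loader(
    edges: List[Tuple[int, int, int]], hooks: List[Hook], batch_size: int
) -> Iterator[ToyBatch]:
    """Yield batches of (src, dst, time) edges, applying every hook in order."""
    for start in range(0, len(edges), batch_size):
        chunk = edges[start : start + batch_size]
        batch = ToyBatch(
            src=[e[0] for e in chunk],
            dst=[e[1] for e in chunk],
            time=[e[2] for e in chunk],
        )
        for hook in hooks:
            batch = hook(batch)
        yield batch


edges = [(2, 2, 1), (2, 4, 5), (1, 8, 20)]
batches = list(toy_loader(edges, hooks=[toy_negative_hook(low=0, high=9)], batch_size=2))
```

With three edges and a batch size of two, the toy loader yields two batches, and each batch carries one negative destination per edge. Adding a second hook to the list would simply run it on the batch returned by the first.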

## `DGDataLoader` materializes each batch into `DGBatch`

For simplicity, we use an iterator on the loader here. In practice, you would use a `for batch in train_loader:` loop.

In [7]:
iter_loader = iter(train_loader)
batch = next(iter_loader)

src = batch.src
dst = batch.dst
time = batch.time
neg = batch.neg
nbr_nids = batch.nbr_nids[1]

print('=== Batch of 200 edges ===')
print(f'source nodes shape : {src.shape}')
print(f'destination nodes shape : {dst.shape}')
print(f'timestamps shape : {time.shape}')
print(f'negative destinations shape : {neg.shape}')
print(f'one hop node neighbors shape: {nbr_nids.shape}')
print('======================')

=== Batch of 200 edges ===
source nodes shape : torch.Size([200])
destination nodes shape : torch.Size([200])
timestamps shape : torch.Size([200])
negative destinations shape : torch.Size([200])
one hop node neighbors shape: torch.Size([600, 20])
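
The first dimension of the neighbor tensor is 600 rather than 200, which is consistent with neighbors being sampled for the 200 sources, 200 destinations, and 200 negatives together; this reading is an inference from the shapes, not something the output states. The idea of recency-based sampling itself can be sketched without any dependencies: keep, for each node, only its `k` most recently seen neighbors. The `ToyRecencyBuffer` class below is hypothetical and is not how `RecencyNeighborHook` is implemented; the padding value `-1` is likewise an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class ToyRecencyBuffer:
    """Hypothetical per-node buffer holding the k most recent neighbors."""

    def __init__(self, k: int) -> None:
        self.k = k
        # deque(maxlen=k) silently drops the oldest entry once it is full.
        self._nbrs: Dict[int, Deque[int]] = defaultdict(lambda: deque(maxlen=k))

    def update(self, src: int, dst: int) -> None:
        """Record an edge; call in chronological order so the last entry is the most recent."""
        self._nbrs[src].append(dst)
        self._nbrs[dst].append(src)

    def sample(self, node: int, pad: int = -1) -> List[int]:
        """Return exactly k neighbor ids, most recent last, left-padded with `pad`."""
        recent = list(self._nbrs.get(node, ()))
        return [pad] * (self.k - len(recent)) + recent


buffer = ToyRecencyBuffer(k=2)
for src, dst in [(2, 4), (2, 5), (2, 6), (1, 8)]:  # already sorted by time
    buffer.update(src, dst)

print(buffer.sample(2))  # prints: [5, 6]
print(buffer.sample(1))  # prints: [-1, 8]
print(buffer.sample(9))  # prints: [-1, -1]
```

Because every call to `sample` returns exactly `k` ids, stacking the results for a list of seed nodes gives a rectangular `[num_seed_nodes, k]` layout, matching the two-dimensional shape printed above.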
