A PyTorch-based library for embedding large graphs to low-dimensional space using force-directed layouts with GPU acceleration.
The aim of this project is to speed up the computation of low-dimensional layouts for large graphs, especially by exploiting GPU acceleration.
- Install PyTorch (follow the official instructions)
- Install `torch-graph-force`:

```bash
pip install torch-graph-force
```
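A quick sanity check after installing, assuming only that the package imports as `torch_graph_force` (as in the examples below); the CUDA check is plain PyTorch:

```python
import torch
import torch_graph_force

# If this prints True, spring_layout will run on the GPU by default.
print(torch.cuda.is_available())
```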
The dataset can be created from a Pandas DataFrame, an edge list, or a NetworkX graph using `from_pandas_dataframe`, `from_edgelist`, or `from_networkx` respectively. `from_pandas_dataframe` is the recommended method, as it is more efficient than the others.
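For reference, a sketch of the two alternatives. The exact signatures of `from_edgelist` and `from_networkx` are assumptions here, modeled on `from_pandas_dataframe` below; check the library's documentation for the authoritative ones:

```python
import networkx as nx
import torch_graph_force

# From a plain edge list (assumed to accept (source, target) pairs
# plus the number of nodes, mirroring from_pandas_dataframe).
edges = [(0, 1), (1, 2), (2, 3)]
ds = torch_graph_force.from_edgelist(edges, 4)

# From a NetworkX graph (assumed to take the graph object directly).
G = nx.path_graph(4)
ds = torch_graph_force.from_networkx(G)
```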
If the node IDs are consecutive integers starting from 0, the dataset can be constructed with a dataframe for edges and the number of nodes:
```python
import pandas as pd
import torch_graph_force

# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, the columns "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
# The "weight" column is optional; if it is absent, a weight of 1.0 is used for all edges.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# The second argument is the number of nodes, used when the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph.
ds = torch_graph_force.from_pandas_dataframe(df, n_nodes)
```
If the node IDs are not consecutive integers, a list of node IDs must be provided:
```python
import pandas as pd
import torch_graph_force

df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# The order of the nodes in "nodes" is used to map node IDs to node indices.
nodes = ["A", "B", "C", "D"]
ds = torch_graph_force.from_pandas_dataframe(df, nodes)
# The dataset's order follows the order of the provided node list: here, ds[0] returns
# the data for node "A" and ds[1] the data for node "B".
# The list of nodes can be accessed with ds.nodes.
print(ds.nodes)
```
Once the graph dataset is ready, we can feed it to `spring_layout` to compute the graph layout.
```python
pos = torch_graph_force.spring_layout(ds)
# pos is a numpy array of shape (n_nodes, n_dim);
# each row is the position of the node with the corresponding index.
print(pos)
# If the node IDs are not consecutive integers, a mapping from node ID
# to position can be built from the node list:
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}
```
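For a quick visual check, the returned positions can be plotted directly. Note that matplotlib is not a dependency of this library; the snippet below is only an illustration:

```python
import matplotlib.pyplot as plt

# Each row of pos is the (x, y) coordinate of one node.
plt.scatter(pos[:, 0], pos[:, 1], s=5)
plt.show()
```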
Optional arguments for `spring_layout` (a combined usage sketch follows the list):

- `batch_size`: number of nodes to process in a batch. A larger batch size usually speeds up processing but consumes more memory. (default: 64)
- `iterations`: maximum number of iterations. (default: 50)
- `num_workers`: number of workers fetching data from the GraphDataset. If the device is "cuda", `num_workers` must be 0. (default: 0)
- `device`: the device that stores the graph and the layout model. If None, "cuda" is used when available, otherwise "cpu". (default: None)
- `iteration_progress`: monitor the progress of each iteration; useful for large graphs. (default: False)
- `layout_config`: additional configuration for the layout model. (default: {})
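Putting several of these together (the parameter names come from the list above; the values are arbitrary):

```python
pos = torch_graph_force.spring_layout(
    ds,
    batch_size=128,           # larger batches are faster but use more memory
    iterations=100,           # allow more iterations for the layout to settle
    device="cpu",             # force CPU even when CUDA is available
    num_workers=2,            # workers are only allowed when device is not "cuda"
    iteration_progress=True,  # report per-iteration progress for large graphs
)
```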
The layout model has some parameters with default values:
```python
default_layout_config = {
    # Tensor of shape (n_nodes, ndim) for the initial positions
    "pos": None,
    # Optimal distance between nodes
    "k": None,
    # Dimension of the layout
    "ndim": 2,
    # Threshold for the relative error in node position changes
    "threshold": 1e-4,
}
```
Use the `layout_config` argument to change these parameters if needed. The example below provides initial positions for the layout model:
```python
import numpy as np

n_nodes = len(ds)
n_dim = 2
# Generate random initial positions for the nodes.
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
    ds,
    layout_config={"pos": init_pos},
)
```
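Other keys can be overridden the same way. For instance, a 3-dimensional layout using the documented `ndim` parameter:

```python
pos_3d = torch_graph_force.spring_layout(
    ds,
    layout_config={"ndim": 3},
)
# pos_3d is a numpy array of shape (n_nodes, 3).
```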
Even without GPU acceleration, `torch-graph-force` is about 1.5x faster than NetworkX's implementation. With GPU acceleration, `torch-graph-force` can compute layouts for graphs with 100k nodes within minutes. The benchmark was conducted on a Tesla P100 GPU.
Code for the benchmarks can be found here.
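To get a rough timing on your own hardware, a generic sketch (the graph size and the timing code here are illustrative, not the linked benchmark code):

```python
import time

import numpy as np
import pandas as pd
import torch_graph_force

# Build a random graph with 100k nodes and 1M edges.
n_nodes = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, n_nodes, size=(1_000_000, 2)),
    columns=["source", "target"],
)
ds = torch_graph_force.from_pandas_dataframe(df, n_nodes)

start = time.perf_counter()
pos = torch_graph_force.spring_layout(ds)
print(f"Layout computed in {time.perf_counter() - start:.1f}s")
```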