SubGACC: Subgraph Operation Accelerator

The SubGAcc package is an extension library based on C and openmp to accelerate operations in subgraph-based graph representation learning (SGRL).

Follow the principles of algorithm system co-design, queried subgraphs for target links/motifs (e.g. ego-network in canonical SGRLs) are decomposed into node-level ones (e.g. collection of walks by walk_sampler in SUREL, set of nodes by gset_sampler in SUREL+), whose join can act as proxies of subgraphs, and can be reused among different queries.

Currently, SubGAcc consists of the following methods for scalable realization of SGRLs:

gset_sampler node set sampling with structure encoder of landing probability (LP)
walk_sampler walk sampling with relative positional encoder (RPE)
batch_sampler query sampling (a group of nodes) for mini-batch training of link prediction
walk_join online joining of node-level walks to construct the proxy of subgraph for given queries (e.g. link query $Q= \lbrace u,v \rbrace$ $\to$ join sampled walks of node $u$ and $v$ as $\mathcal{G}_{Q} = \lbrace W_u \bigoplus W_v \rbrace$)

Update

Feb. 25, 2023:

Release v2.2 with more robust memory management of allocation, release and indexing (billion edges).
Add bitwise-based hash for encoding structural features.
Add test cases and script of wall time measure.

Jan. 29, 2023:

Release v2.1 with refactored code base.
More robust memory accessing with buffer for set sampler on large graphs (million nodes).

Jan. 28, 2023:

Release v2.0 with the walk-based set sampler gset_sampler.

Requirements

(Other versions may work, but are untested)

python >= 3.8
numpy >= 1.17
gcc >= 8.4

Installation

python setup.py install

Functions

gset_sampler

subg_acc.gset_sampler(indptr, indices, query, num_walks, num_steps) 
-> (numpy.array [n], numpy.array [2,?], numpy.array [?,num_steps+1])

Sample a node set for each node in query (size of n) through num_walks-many num_steps-step random walks on the input graph in CSR format (indptr, indices), and encodes landing probability at each step of all nodes in the sampled set as structural features of the seed node.

For usage examples, see test.py.

Parameters

indptr (np.array) - Index pointer array of the adjacency matrix in CSR format.
indices (np.array) - Index array of the adjacency matrix in CSR format.
query (np.array / list) - Nodes are queried to be sampled.
num_walks (int) - The number of random walks.
num_steps (int) - The number of steps in a walk.
bucket (int, optional) - The buffer size for sampled neighbors per node.
nthread (int, optional) - The number of threads.
seed (int, optional) - Random seed.

Returns

nsize (np.array) - The size of sampled set for each node in query.
remap (np.array) - Pairwised node id and the index of its associated structural encoding in enc array.
enc (np.array) - The compressed (unique) encoding of structural features.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
lib		lib
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py
subg_acc.c		subg_acc.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lib

lib

test

test

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

setup.py

setup.py

subg_acc.c

subg_acc.c

Repository files navigation

SubGACC: Subgraph Operation Accelerator

Update

Requirements

Installation

Functions

gset_sampler

Parameters

Returns

About

Releases

Packages

Languages

License

VeritasYin/subg_acc

Folders and files

Latest commit

History

Repository files navigation

SubGACC: Subgraph Operation Accelerator

Update

Requirements

Installation

Functions

gset_sampler

Parameters

Returns

About

Topics

Resources

License

Stars

Watchers

Forks

Languages