
GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware

This repository implements the locality-aware optimization in GNNSampler. It can be plugged into the pre-processing step of sampling-based models to accelerate the sampling process. Since the paper is under review, further scripts for flexibly adjusting the sampling weights will be open-sourced after acceptance.

Overview

GNNSampler is a unified programming model for mainstream sampling algorithms that covers the key procedures of the general sampling process. One can embed GNNSampler into the general sampling process to learn from large-scale graphs. The figure below describes the workflow of learning large-scale graph data with GNNs, where GNNSampler is embedded to optimize sampling. Moreover, to leverage hardware features, we take data locality as a case study and implement locality-aware optimizations in GNNSampler. The right part of the figure illustrates a case of data locality exploration. More details can be found in our paper.

(Figure: workflow of learning large-scale graph data with GNNs, with GNNSampler embedded for sampling optimization)
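
As a rough, repository-independent illustration of the locality idea, the Python sketch below biases neighbor sampling toward neighbors whose node IDs lie near the target node's ID, treating ID distance as a crude proxy for data locality. The function names, the weighting scheme, and the alpha parameter are illustrative assumptions, not code from this repository.

import numpy as np

def locality_aware_weights(neighbor_ids, center_id, alpha=0.5):
    # Use ID distance as a crude proxy for memory locality:
    # neighbors stored near the center node get larger weights.
    # The weighting scheme and alpha are illustrative assumptions.
    dist = np.abs(np.asarray(neighbor_ids) - center_id)
    locality = 1.0 / (1.0 + dist)
    uniform = np.full(len(neighbor_ids), 1.0 / len(neighbor_ids))
    weights = alpha * locality / locality.sum() + (1.0 - alpha) * uniform
    return weights / weights.sum()

def sample_neighbors(neighbor_ids, center_id, k, rng=None):
    # Draw up to k neighbors without replacement, favoring "nearby" ones.
    rng = rng or np.random.default_rng()
    probs = locality_aware_weights(neighbor_ids, center_id)
    return rng.choice(neighbor_ids, size=min(k, len(neighbor_ids)), replace=False, p=probs)

For instance, sample_neighbors(list(range(100, 200)), center_id=150, k=10) preferentially returns IDs near 150, which tend to sit close together in memory under an ID-ordered storage layout.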

Experimental Devices

Platform    Configuration
CPU         Dual 14-core Intel Xeon E5-2683 v3 CPUs
GPU         NVIDIA Tesla V100 (16 GB memory)

Dependencies

  • python
  • tensorflow
  • numpy
  • scipy
  • scikit-learn
  • pyyaml
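
For example, the dependencies can be installed with pip (no versions are pinned here since the repository does not specify them; note that the exact TensorFlow requirement should be checked against the GraphSAINT backbone's repository):

pip install tensorflow numpy scipy scikit-learn pyyaml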

Usage

One can use the following shell scripts to perform accelerated model training with locality-aware optimization:
./locality_amazon.sh
./locality_reddit.sh
./locality_flickr.sh
For comparison, one can use the following shell scripts to run the vanilla methods (without locality-aware optimization):
./vanilla_amazon.sh
./vanilla_reddit.sh
./vanilla_flickr.sh

Code Directory

GNNSampler/
│   README.md
│   locality_amazon.sh (optimized model training on the Amazon dataset, with GraphSAINT as the backbone)
│   vanilla_amazon.sh (vanilla model training on the Amazon dataset, with GraphSAINT as the backbone)
│   ...
├───graphsaint/
│   (the TensorFlow-based implementation of GraphSAINT)
├───precomputed_weight/
│   (pre-computed weights for some datasets to reproduce the performance reported in the paper; a loading sketch follows the tree)
├───train_config/
│   (training configurations, generally taken from the backbone's repository)
└───data/
    (please download datasets and place them in this folder)
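
As a purely hypothetical sketch of how a pre-computed weight file might be loaded (the actual file names and formats under precomputed_weight/ may differ; check the shipped files):

import numpy as np

# Hypothetical file name and format; adjust to the files actually
# shipped in precomputed_weight/.
weights = np.load('precomputed_weight/reddit_weight.npy')
probs = weights / weights.sum()  # normalize to a probability distribution before sampling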

Datasets

All datasets used in our paper are publicly available.

Acknowledgements

The locality-aware optimization is embedded in various sampling-based models to verify its efficiency and effectiveness. We use the implementations of GraphSAGE, FastGCN, and GraphSAINT as backbones, and owe many thanks to the authors for making their code available.
