Augmentation-Free Self-Supervised Learning on Graphs

The official source code for the paper "Augmentation-Free Self-Supervised Learning on Graphs", accepted at AAAI 2022.

Overview

Inspired by the recent success of self-supervised methods applied to images, self-supervised learning on graph-structured data has seen rapid growth, especially centered on augmentation-based contrastive methods. However, we argue that without carefully designed augmentation techniques, augmentations on graphs may behave arbitrarily in that the underlying semantics of graphs can drastically change. As a consequence, the performance of existing augmentation-based methods is highly dependent on the choice of augmentation scheme, i.e., the hyperparameters associated with augmentations. In this paper, we propose a novel augmentation-free self-supervised learning framework for graphs, named AFGRL. Specifically, we generate an alternative view of a graph by discovering nodes that share the local structural information and the global semantics with the graph. Extensive experiments on various node-level tasks, i.e., node classification, clustering, and similarity search, on various real-world datasets demonstrate the superiority of AFGRL.

Augmentations on images keep the underlying semantics, whereas augmentations on graphs may unexpectedly change the semantics.
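
As a rough illustration of the idea in the overview, the sketch below picks, for each node, the nearest neighbors in embedding space that are also adjacent in the graph (local structure) or fall in the same K-means cluster (global semantics). It is a minimal NumPy sketch, not the repository's implementation: the function names, the cosine similarity, the plain Lloyd's K-means, and the reading of num_kmeans as the number of independent K-means runs are all assumptions made for illustration. The requirements list faiss, so the actual code presumably uses it for these steps.

```python
import numpy as np


def knn_indices(emb, topk):
    """Indices of the topk most cosine-similar nodes per node (self excluded)."""
    emb = np.asarray(emb, dtype=float)
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    normed = emb / norms
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :topk]


def _nearest_centroid(emb, centroids):
    """Assign every row of emb to its closest centroid (squared Euclidean)."""
    dist = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def kmeans_labels(emb, num_centroids, num_iters=10, seed=0):
    """Plain Lloyd's K-means (illustrative stand-in); returns a cluster id per node."""
    emb = np.asarray(emb, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(emb), size=num_centroids, replace=False)
    centroids = emb[start].copy()
    for _ in range(num_iters):
        labels = _nearest_centroid(emb, centroids)
        for c in range(num_centroids):
            members = emb[labels == c]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[c] = members.mean(axis=0)
    return _nearest_centroid(emb, centroids)


def discover_positives(emb, adj, topk=4, num_centroids=100, num_kmeans=5):
    """Boolean (n, n) mask: pos[i, j] is True when j is among i's topk nearest
    neighbors AND (j is adjacent to i OR j shares a cluster with i in some run)."""
    emb = np.asarray(emb, dtype=float)
    n = len(emb)

    is_knn = np.zeros((n, n), dtype=bool)
    is_knn[np.arange(n)[:, None], knn_indices(emb, topk)] = True

    # Assumption: num_kmeans counts independent K-means runs with different seeds.
    same_cluster = np.zeros((n, n), dtype=bool)
    for run in range(num_kmeans):
        labels = kmeans_labels(emb, num_centroids, seed=run)
        same_cluster |= labels[:, None] == labels[None, :]

    return is_knn & (np.asarray(adj, dtype=bool) | same_cluster)


# Toy usage: six nodes forming two groups, with one edge inside each group.
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                    [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
toy_adj = np.zeros((6, 6), dtype=int)
toy_adj[0, 1] = toy_adj[1, 0] = 1
toy_adj[3, 4] = toy_adj[4, 3] = 1
toy_pos = discover_positives(toy_emb, toy_adj, topk=2, num_centroids=2, num_kmeans=3)
```

Filtering the nearest neighbors through adjacency and cluster membership, rather than using them directly, is what lets the view be built without any augmentation hyperparameters. The dense n-by-n masks here only suit small graphs.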

Requirements

Python version: 3.7.10
PyTorch version: 1.8.1
torch-geometric version: 1.7.0
faiss: 1.7.0

Hyperparameters

The following options can be passed to main.py:

--dataset: Name of the dataset. Supported names are: wikics, cs, computers, photo, and physics. Default is wikics.
usage example: --dataset wikics

--task: Name of the task. Supported names are: node, clustering, and similarity. Default is node.
usage example: --task node

--layers: The number of units in each layer of the GNN. Default is [256].
usage example: --layers 256

--pred_hid: The number of hidden units of the predictor. Default is [512].
usage example: --pred_hid 512

--topk: The number of neighbors for nearest-neighbor search. Default is 4.
usage example: --topk 4

--num_centroids: The number of centroids for K-means clustering. Default is 100.
usage example: --num_centroids 100

--num_kmeans: The number of iterations for K-means clustering. Default is 5.
usage example: --num_kmeans 5
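
The options above could be declared with argparse roughly as follows. This is a hedged sketch of one possible parser, not the contents of main.py: the build_parser name, the use of choices, and the argument types are assumptions. In particular, it reads --layers as one or more space-separated integers and --pred_hid as a single integer, which may differ from how the real script parses them.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options and defaults listed above."""
    parser = argparse.ArgumentParser(description="AFGRL options (illustrative sketch)")
    parser.add_argument("--dataset", default="wikics",
                        choices=["wikics", "cs", "computers", "photo", "physics"],
                        help="Name of the dataset.")
    parser.add_argument("--task", default="node",
                        choices=["node", "clustering", "similarity"],
                        help="Name of the task.")
    # Assumption: one integer per GNN layer, e.g. "--layers 256 256".
    parser.add_argument("--layers", type=int, nargs="+", default=[256],
                        help="Number of units in each GNN layer.")
    # Assumption: a single integer for the predictor's hidden size.
    parser.add_argument("--pred_hid", type=int, default=512,
                        help="Number of hidden units of the predictor.")
    parser.add_argument("--topk", type=int, default=4,
                        help="Number of neighbors for nearest-neighbor search.")
    parser.add_argument("--num_centroids", type=int, default=100,
                        help="Number of centroids for K-means clustering.")
    parser.add_argument("--num_kmeans", type=int, default=5,
                        help="Number of iterations for K-means clustering.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--dataset", "photo", "--topk", "8"])
print(args.dataset, args.task, args.layers, args.topk)
```

Restricting --dataset and --task with choices makes a misspelled name fail at parse time rather than partway through a run.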

How to Run

You can run the model with the following scripts.

To run node classification (reproduces Table 2 in the paper):

sh run_node_classification.sh

To run node clustering (reproduces Table 3 in the paper):

sh run_node_clustering.sh

To run similarity search (reproduces Table 4 in the paper):

sh run_similarity_search.sh

Alternatively, you can run main.py directly with the hyperparameters described above:

python main.py --embedder AFGRL --dataset wikics --task node --layers [1024] --pred_hid 2048 --lr 0.001 --topk 8
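
If you prefer to launch runs from Python, for example to loop over datasets, the same command can be assembled as an argument list. The build_command helper below is hypothetical and not part of the repository. It only constructs the command, and the subprocess call is left commented out because it needs main.py in the working directory.

```python
import shlex
import sys


def build_command(options, script="main.py"):
    """Turn a dict of option names and values into an argv list for the script."""
    argv = [sys.executable, script]
    for name, value in options.items():
        argv += ["--" + name, str(value)]
    return argv


cmd = build_command({
    "embedder": "AFGRL",
    "dataset": "wikics",
    "task": "node",
    "layers": "[1024]",
    "pred_hid": 2048,
    "lr": 0.001,
    "topk": 8,
})

# Show a shell-safe rendering of the command for logging.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually launch the run from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list also keeps the shell from treating the square brackets in [1024] as a glob pattern, which some shells would otherwise try to expand.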

