The official source code for the paper Deep Graph Clustering via Dual Correlation Reduction, accepted by AAAI 2022. Questions and issues are welcome; please contact yueliu19990731@163.com. If you find this repository useful for your research or work, please consider starring it. ❤️
Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years. However, we observe that, in the process of node encoding, existing methods suffer from representation collapse, which tends to map all data into the same representation. Consequently, the discriminative capability of the node representations is limited, leading to unsatisfactory clustering performance. To address this issue, we propose a novel self-supervised deep graph clustering method termed Dual Correlation Reduction Network (DCRN), which reduces information correlation in a dual manner. Specifically, we first design a siamese network to encode samples. Then, by forcing the cross-view sample correlation matrix and the cross-view feature correlation matrix to approximate two identity matrices, respectively, we reduce the information correlation at both levels, thus improving the discriminative capability of the resulting features. Moreover, to alleviate the representation collapse caused by over-smoothing in GCN, we introduce a propagation regularization term that enables the network to capture long-distance information with a shallow network structure. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of the proposed DCRN against existing state-of-the-art methods.
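The dual correlation idea described above can be illustrated with a small NumPy sketch. It is a simplified illustration rather than the repository's implementation: the function names are hypothetical, the correlation is taken as plain cosine similarity, and the loss is an unweighted mean squared gap to the identity matrix. The paper's exact weighting and any readout step applied before the feature-level term may differ.

```python
import numpy as np

def cross_view_correlation(a, b, eps=1e-12):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def identity_gap(corr):
    """Mean squared difference between a square matrix and the identity."""
    return float(np.mean((corr - np.eye(corr.shape[0])) ** 2))

def dual_correlation_loss(z1, z2):
    """z1, z2: (num_nodes, dim) embeddings of the same nodes from two views.

    Hypothetical simplified loss: sample-level gap plus feature-level gap.
    """
    sample_corr = cross_view_correlation(z1, z2)       # (N, N), node vs. node
    feature_corr = cross_view_correlation(z1.T, z2.T)  # (d, d), dim vs. dim
    return identity_gap(sample_corr) + identity_gap(feature_corr)

# Toy check: perfectly decorrelated embeddings vs. fully collapsed ones.
z_decorrelated = np.eye(4)       # each node has its own direction
z_collapsed = np.ones((4, 4))    # every node maps to the same vector

loss_decorrelated = dual_correlation_loss(z_decorrelated, z_decorrelated)
loss_collapsed = dual_correlation_loss(z_collapsed, z_collapsed)
print(loss_decorrelated, loss_collapsed)
```

In this toy case the collapsed embeddings give a much larger loss than the decorrelated ones, which is the behaviour the identity-matrix target is meant to encourage.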
The proposed DCRN is implemented with Python 3.8.5 on an NVIDIA 3090 GPU.
Python package information is summarized in requirements.txt:
- torch==1.8.0
- tqdm==4.50.2
- numpy==1.19.2
- munkres==1.1.4
- scikit_learn==1.0.1
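To confirm that an environment matches the pinned versions, a small check along these lines may help. It is a sketch, not part of the repository: `check_versions` and `PINNED` are hypothetical names, and it relies only on the standard-library `importlib.metadata` module (available from Python 3.8).

```python
from importlib import metadata

# Versions copied from the list above. scikit_learn is written here as
# scikit-learn; pip treats both spellings as the same project.
PINNED = {
    "torch": "1.8.0",
    "tqdm": "4.50.2",
    "numpy": "1.19.2",
    "munkres": "1.1.4",
    "scikit-learn": "1.0.1",
}

def check_versions(pinned, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

# Report anything that differs from the pinned versions.
for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A mismatch does not necessarily break the code; the pinned versions are simply the ones the authors report using.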
- Step 1: use dblp.zip, or download other datasets from Awesome Deep Graph Clustering/Benchmark Datasets
- Step 2: unzip the dataset into ./dataset
- Step 3: run
python main.py --name dblp --seed 3 --alpha_value 0.2 --lambda_value 10 --gamma_value 1e3 --lr 1e-4
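For repeated runs, the command above can be generated per dataset and seed. The sketch below only prints the commands; it does not launch training. The hyperparameter values are copied from the Parameter setting list in this README, while the lowercase dataset names other than `dblp`, the helper name `build_command`, and the choice of seeds 0 to 9 are assumptions. PUBMED and CORAFULL are included for completeness, but they may need the batch training version mentioned in the Tips.

```python
import shlex

# Per-dataset values from the Parameter setting list in this README.
ALPHA = {"pubmed": 0.1}  # every other dataset uses 0.2
LR = {
    "dblp": "1e-4",
    "acm": "5e-5",
    "amap": "1e-3",
    "cite": "1e-5",
    "pubmed": "1e-5",
    "corafull": "1e-5",
}

def build_command(name, seed):
    """Return the argument list for one training run (hypothetical helper)."""
    if name not in LR:
        raise ValueError(f"unknown dataset: {name}")
    return [
        "python", "main.py",
        "--name", name,
        "--seed", str(seed),
        "--alpha_value", str(ALPHA.get(name, 0.2)),
        "--lambda_value", "10",   # same for all datasets
        "--gamma_value", "1e3",   # same for all datasets
        "--lr", LR[name],
    ]

# Assumed seed choice: the README states 10 runs but does not list the seeds.
for seed in range(10):
    print(shlex.join(build_command("dblp", seed)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` if automated execution is preferred.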
Parameter setting
- name: the name of the dataset
- seed: the random seed. Experiments use 10 runs under different random seeds.
- alpha_value: the teleport probability in graph diffusion
  - PUBMED: 0.1
  - DBLP, CITE, ACM, AMAP, CORAFULL: 0.2
- lambda_value: the coefficient of the clustering guidance loss
  - all datasets: 10
- gamma_value: the coefficient of the propagation regularization
  - all datasets: 1e3
- lr: the learning rate
  - DBLP: 1e-4
  - ACM: 5e-5
  - AMAP: 1e-3
  - CITE, PUBMED, CORAFULL: 1e-5
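To give a feel for what the teleport probability `alpha_value` controls, here is a minimal sketch of personalized PageRank graph diffusion, S = α (I − (1 − α) Â)⁻¹. It assumes Â is the symmetrically normalized adjacency matrix with self-loops; whether the repository uses exactly this normalization is an assumption, and `ppr_diffusion` is a hypothetical name.

```python
import numpy as np

def ppr_diffusion(adj, alpha):
    """Personalized PageRank diffusion of a dense adjacency matrix.

    adj:   (N, N) symmetric adjacency matrix without self-loops
    alpha: teleport probability in (0, 1]
    """
    n = adj.shape[0]
    adj_loop = adj + np.eye(n)                 # add self-loops
    deg = adj_loop.sum(axis=1)                 # degrees are >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    norm_adj = adj_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * norm_adj)

# Toy path graph 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

diffused = ppr_diffusion(adj, alpha=0.2)
print(np.round(diffused, 3))
```

A smaller alpha spreads more weight to distant nodes; with alpha equal to 1 the diffusion matrix reduces to the identity, so no neighbourhood information is propagated.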
Tips: Because of limited GPU memory, PUBMED and CORAFULL might run out of memory during training. We therefore adopt batch training on the PUBMED and CORAFULL datasets, with the batch size set to 2000. Please use the batch training version of DCRN here.
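One common way to batch a graph is to shuffle the node indices, split them into chunks, and train on the subgraph induced by each chunk. The sketch below illustrates that idea with a batch size of 2000. It is an assumption about how batching could be done, not a description of the batch training version of DCRN, and both function names are hypothetical.

```python
import numpy as np

def iter_node_batches(num_nodes, batch_size=2000, seed=0):
    """Yield sorted arrays of node indices that together cover all nodes once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_nodes)
    for start in range(0, num_nodes, batch_size):
        yield np.sort(order[start:start + batch_size])

def induced_subgraph(adj, features, nodes):
    """Return the adjacency block and feature rows for the selected nodes."""
    return adj[np.ix_(nodes, nodes)], features[nodes]

# Toy example with a hypothetical graph of 5000 nodes and 8 features.
num_nodes = 5000
features = np.zeros((num_nodes, 8))
adj = np.eye(num_nodes)

batch_sizes = []
for nodes in iter_node_batches(num_nodes):
    sub_adj, sub_feat = induced_subgraph(adj, features, nodes)
    batch_sizes.append(len(nodes))

print(batch_sizes)
```

Edges between nodes in different batches are dropped by this scheme, which is the usual trade-off of subgraph batching for lower memory use.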
If you use code or datasets in this repository for your research, please cite our paper.
@inproceedings{DCRN,
title={Deep Graph Clustering via Dual Correlation Reduction},
author={Liu, Yue and Tu, Wenxuan and Zhou, Sihang and Liu, Xinwang and Song, Linxuan and Yang, Xihong and Zhu, En},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={7},
pages={7603-7611},
year={2022}
}