Dr.E (Dual-Level Residual with Embedding) is a novel framework that bridges Graph Neural Networks (GNNs) with Large Language Models (LLMs) for text-attributed graph learning. The key innovation lies in the dual-level residual quantization mechanism that enables effective alignment between continuous graph representations and discrete token embeddings.
- Multi-View Graph Encoding: Captures structural information at different hop levels (1-hop, 2-hop, 3-hop neighborhoods)
- Dual-Level Residual Quantization:
- Intra-Layer Residual: Generates K codes per view using residual quantization
- Inter-Layer Residual: Propagates quantized embeddings across GNN layers
- Token-Level Alignment: Uses frozen LLM token embeddings as the codebook, enabling seamless integration with the LLM vocabulary
- Parameter-Efficient Fine-Tuning: Leverages LoRA for efficient LLM adaptation
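The intra-layer residual step above can be sketched in a few lines. This is an illustrative NumPy example of residual quantization against a frozen codebook, not the repository's actual implementation: all shapes and the greedy cosine-similarity selection are assumptions for demonstration.

```python
import numpy as np

def residual_quantize(z, codebook, K=3):
    """Greedily pick K codes from a frozen codebook by cosine similarity,
    subtracting each selected code from the running residual.
    Illustrative sketch only -- not the authors' exact procedure."""
    residual = z.astype(np.float64).copy()
    codes = []
    recon = np.zeros_like(residual)
    # Normalize codebook rows once so a dot product gives cosine similarity.
    cb_norm = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    for _ in range(K):
        r_norm = residual / (np.linalg.norm(residual) + 1e-8)
        idx = int(np.argmax(cb_norm @ r_norm))  # nearest code by cosine similarity
        codes.append(idx)
        recon += codebook[idx]
        residual = z - recon  # quantize what the chosen codes have not explained yet
    return codes, recon

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))  # stand-in for frozen LLM token embeddings
z = rng.normal(size=8)               # stand-in for one GNN node embedding
codes, recon = residual_quantize(z, codebook, K=3)
```

Each of the K selected indices corresponds to a token in the LLM vocabulary, which is how a continuous node embedding becomes a short discrete token sequence.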
The framework consists of three main components:
- GNN Encoder: A 3-layer SAGEConv network with Inter-Layer Residual connections
- Vector Quantization Module: Maps continuous GNN embeddings to discrete tokens using cosine similarity
- LLM Decoder: Llama-2-7B with LoRA fine-tuning for node classification
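For the LoRA fine-tuning component, a configuration along the following lines is typical. This is a hedged sketch of what configs/peft.py might contain; the hyperparameter values and target modules below are illustrative choices, not the repository's actual settings.

```python
from peft import LoraConfig, TaskType

# Illustrative LoRA config for Llama-2-7B as a causal-LM decoder.
# r, lora_alpha, dropout, and target_modules are assumed values, not the repo's.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # common choice for Llama attention
)
```

Only the adapter weights (and the GNN encoder) are trained; the base LLM and its token-embedding codebook stay frozen.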
- Python >= 3.9
- CUDA >= 11.7
- PyTorch >= 2.0.0
Download Llama-2-7B-hf from Hugging Face and update the llm_path in configs/training.py.
Download the pre-computed embeddings file x_emb.pt from Google Drive:
First, generate the codebook from your datasets:
python scripts/create_codebook.py --llm_path /path/to/Llama-2-7b-hf

Then launch training on your chosen dataset:

# Train on Cora
python train.py --llm_path /path/to/Llama-2-7b-hf --dataset cora_dataset
# Train on PubMed
python train.py --llm_path /path/to/Llama-2-7b-hf --dataset pubmed_dataset
# Train on ogbn-arxiv
python train.py --llm_path /path/to/Llama-2-7b-hf --dataset ogbn_arxiv_dataset --batch_size 2

| Argument | Default | Description |
|---|---|---|
| `--llm_path` | - | Path to Llama-2-7B-hf model |
| `--dataset` | cora_dataset | Dataset name |
| `--batch_size` | 4 | Training batch size |
| `--num_epochs` | 20 | Number of training epochs |
| `--llm_lr` | 1e-4 | Learning rate for LLM (LoRA) |
| `--gnn_lr` | 8e-4 | Learning rate for GNN |
| `--device` | cuda:0 | Training device |
| `--quantization` | 8bit | LLM quantization (4bit/8bit) |
| Dataset | Test Accuracy |
|---|---|
| Cora | 91.33% |
| PubMed | 96.70% |
| ogbn-arxiv | 76.45% |
.
├── README.md
├── requirements.txt
├── .gitignore
├── train.py # Main training script
├── configs/
│ ├── training.py # Training configuration
│ ├── peft.py # LoRA configuration
│ ├── datasets.py # Dataset paths
│ └── quantization.py # Quantization config
├── models/
│ ├── model.py # Dr.E model architecture
│ └── vq.py # Vector Quantization module
├── datasets/
│ ├── cora_dataset.py # Cora dataset loader
│ └── pubmed_dataset.py # PubMed dataset loader
├── utils/
│ ├── train_utils.py # Training utilities
│ ├── dataset_utils.py # Dataset preprocessing
│ ├── config_utils.py # Configuration utilities
│ └── memory_utils.py # Memory tracking
├── scripts/
│ └── create_codebook.py # Codebook generation
└── imgs/ # Paper figures
If you find this work useful, please cite our paper:
@inproceedings{liu2025multi,
title={Multi-view empowered structural graph wordification for language models},
author={Liu, Zipeng and Wu, Likang and He, Ming and Guan, Zhong and Zhao, Hongke and Feng, Nan},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={23},
pages={24714--24722},
year={2025}
}

This project is licensed under the MIT License - see the LICENSE file for details.
