G-Retriever

This repository contains the source code for the paper "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering".

We introduce G-Retriever, a flexible question-answering framework for real-world textual graphs. It applies to tasks including scene graph understanding, common sense reasoning, and knowledge graph reasoning.

G-Retriever integrates the strengths of Graph Neural Networks (GNNs), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), and can be fine-tuned to enhance graph understanding via soft prompting.
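
Soft prompting, in general, means feeding the LLM continuous vectors alongside the embedded text tokens rather than discrete text alone. The toy sketch below illustrates only that general idea in plain Python. The function name, the vector sizes, and the choice to place the graph vectors before the text tokens are assumptions made for illustration; the repository's actual model code lives under src/model.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(graph_vectors: List[Vector],
                        token_embeddings: List[Vector]) -> List[Vector]:
    """Return the input sequence with the graph vectors placed before the text tokens."""
    if not token_embeddings:
        raise ValueError("token_embeddings must not be empty")
    dim = len(token_embeddings[0])
    for vec in graph_vectors:
        # Soft-prompt vectors share the embedding space of the text tokens.
        if len(vec) != dim:
            raise ValueError("soft-prompt vectors must match the token embedding size")
    return graph_vectors + token_embeddings


# Hypothetical toy values: one graph vector and two text-token embeddings of size 3.
graph_prompt = [[0.1, 0.2, 0.3]]
text_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
llm_input = prepend_soft_prompt(graph_prompt, text_tokens)
print(len(llm_input))  # 3
```

In a real model the vectors would be tensors produced by trainable modules, and only those modules would receive gradients while the LLM stays frozen.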

Citation

@misc{he2024gretriever,
      title={G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering}, 
      author={Xiaoxin He and Yijun Tian and Yifei Sun and Nitesh V. Chawla and Thomas Laurent and Yann LeCun and Xavier Bresson and Bryan Hooi},
      year={2024},
      eprint={2402.07630},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Environment setup

conda create --name g_retriever python=3.9 -y
conda activate g_retriever

# https://pytorch.org/get-started/locally/
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

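# verify the installed PyTorch and CUDA versions
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.version.cuda)"

# install the PyG extension wheels built for torch 2.0.1 + CUDA 11.8
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu118.html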

pip install peft
pip install pandas
pip install ogb
pip install transformers
pip install wandb
pip install sentencepiece
pip install torch_geometric
pip install datasets
pip install pcst_fast
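
After the installs above, a quick check that every package can be found may save a failed run later. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list assumes each package's import name matches the name used in the pip commands above.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed to correspond to the packages installed above.
REQUIRED = [
    "torch", "torch_geometric", "torch_scatter", "torch_sparse",
    "peft", "pandas", "ogb", "transformers", "wandb",
    "sentencepiece", "datasets", "pcst_fast",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates the packages without importing them, so it does not confirm that the CUDA builds match; the two `torch` version commands above cover that part.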

Data Preprocessing

# expla_graphs
python -m src.dataset.preprocess.expla_graphs
python -m src.dataset.expla_graphs

# scene_graphs (might take a while)
python -m src.dataset.preprocess.scene_graphs
python -m src.dataset.scene_graphs

# webqsp
python -m src.dataset.preprocess.webqsp
python -m src.dataset.webqsp
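
The six commands above follow one pattern: for each dataset, run the preprocess module, then the dataset module. A small driver along these lines could run them in sequence. This is a sketch, not part of the repository; `preprocess_commands` and `run_all` are hypothetical helper names, and it assumes the script is started from the repository root.

```python
import subprocess
import sys
from typing import List, Sequence

DATASETS = ("expla_graphs", "scene_graphs", "webqsp")


def preprocess_commands(dataset: str) -> List[List[str]]:
    """Return the two commands for one dataset, in the order listed above."""
    return [
        [sys.executable, "-m", f"src.dataset.preprocess.{dataset}"],
        [sys.executable, "-m", f"src.dataset.{dataset}"],
    ]


def run_all(datasets: Sequence[str] = DATASETS, dry_run: bool = True) -> List[str]:
    """Print every command; execute it too unless dry_run is True."""
    shown = []
    for dataset in datasets:
        for cmd in preprocess_commands(dataset):
            line = "python " + " ".join(cmd[1:])
            print(line)
            shown.append(line)
            if not dry_run:
                # check=True stops the loop at the first failing step.
                subprocess.run(cmd, check=True)
    return shown


if __name__ == "__main__":
    run_all(dry_run=True)  # pass dry_run=False to actually run the steps
```

The dry run is the default so that the commands can be inspected before the longer preprocessing jobs start.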

Training

Replace the paths to the LLM checkpoints in src/model/__init__.py, then run one of the following configurations.

1) Inference-Only LLM

python inference.py --dataset scene_graphs --model_name inference_llm --llm_model_name 7b_chat

2) Frozen LLM + Prompt Tuning

# prompt tuning
python train.py --dataset scene_graphs_baseline --model_name pt_llm

# G-Retriever
python train.py --dataset scene_graphs --model_name graph_llm

3) Tuned LLM

# fine-tune LLM with LoRA
python train.py --dataset scene_graphs_baseline --model_name llm --llm_frozen False

# G-Retriever with LoRA
python train.py --dataset scene_graphs --model_name graph_llm --llm_frozen False
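
The five configurations above differ only in the script, the dataset name, the model name, and whether the LLM is frozen. The sketch below collects them in one table and rebuilds each command line, which can help when scripting several runs. The dictionary keys and the `command_line` helper are hypothetical labels; the scripts and flags are exactly those shown above.

```python
import shlex
from typing import Dict, List

# One entry per configuration listed above, all on the scene graph data.
CONFIGS: Dict[str, List[str]] = {
    "inference_only": ["inference.py", "--dataset", "scene_graphs",
                       "--model_name", "inference_llm", "--llm_model_name", "7b_chat"],
    "prompt_tuning": ["train.py", "--dataset", "scene_graphs_baseline",
                      "--model_name", "pt_llm"],
    "g_retriever": ["train.py", "--dataset", "scene_graphs",
                    "--model_name", "graph_llm"],
    "lora": ["train.py", "--dataset", "scene_graphs_baseline",
             "--model_name", "llm", "--llm_frozen", "False"],
    "g_retriever_lora": ["train.py", "--dataset", "scene_graphs",
                         "--model_name", "graph_llm", "--llm_frozen", "False"],
}


def command_line(name: str) -> str:
    """Return the shell command for one named configuration."""
    return "python " + shlex.join(CONFIGS[name])


if __name__ == "__main__":
    for name in CONFIGS:
        print(f"{name}: {command_line(name)}")
```

Note that the baseline runs use the `scene_graphs_baseline` dataset name, while the G-Retriever runs use `scene_graphs`.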

Reproducibility

Use run.sh to run the code and reproduce the results reported in the main table of the paper.
