This is the repository for the paper *Retrieval-Augmented Generation with Hierarchical Knowledge* (HiRAG).
```bash
# remember to clone this repo first
cd HiRAG
pip install -e .
```
You can use the following code to index a document and perform a query with HiRAG:
```python
from hirag import HiRAG, QueryParam

graph_func = HiRAG(
    working_dir="./your_work_dir",
    enable_llm_cache=True,
    enable_hierachical_mode=True,
    embedding_batch_num=6,
    embedding_func_max_async=8,  # adjust according to your machine
    enable_naive_rag=True
)

# indexing
with open("path_to_your_context.txt", "r") as f:
    graph_func.insert(f.read())

# retrieval & generation
print("Perform hi search:")
print(graph_func.query("The question you want to ask?", param=QueryParam(mode="hi")))
```
Alternatively, if you want to use HiRAG with DeepSeek, ChatGLM, or another third-party LLM API, see the examples in `./hi_Search_deepseek.py`, `./hi_Search_glm.py`, and `./hi_Search_openai.py`. The API keys and LLM configurations can be set in `./config.yaml`.
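As a rough illustration of how such a script might tie the config file to an OpenAI-compatible client, here is a minimal sketch. The key names inside `config.yaml` (`deepseek.api_key`, `deepseek.base_url`, `deepseek.model`) are assumptions for illustration only and may differ from the actual file.

```python
# Minimal sketch (not the repo's actual code): load API settings from config.yaml
# and call a DeepSeek-style OpenAI-compatible endpoint.
# The config key names ("deepseek", "api_key", "base_url", "model") are hypothetical.
import yaml
from openai import OpenAI

with open("./config.yaml", "r") as f:
    config = yaml.safe_load(f)

client = OpenAI(
    api_key=config["deepseek"]["api_key"],    # hypothetical key name
    base_url=config["deepseek"]["base_url"],  # e.g. an OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model=config["deepseek"]["model"],        # hypothetical key name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```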
We take the evaluation procedure on the Mix dataset as an example.
```bash
cd ./HiRAG/eval
```
- Extract context from the original QA datasets.

```bash
python extract_context.py -i ./datasets/mix -o ./datasets/mix
```
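For intuition, the extraction step amounts to collecting the unique contexts from the QA records so they can later be inserted into the graph database. The following is only a sketch under assumed file names and schema (a JSONL file with a `context` field), not the repo's actual `extract_context.py`:

```python
# Rough sketch (not the repo's extract_context.py): deduplicate contexts from a
# QA dataset and write them to a single text file for indexing.
# The input/output file names and the "context" field are hypothetical.
import argparse
import json
import os

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input_dir", required=True)
parser.add_argument("-o", "--output_dir", required=True)
args = parser.parse_args()

contexts, seen = [], set()
with open(os.path.join(args.input_dir, "qa.jsonl"), "r") as f:  # hypothetical file name
    for line in f:
        ctx = json.loads(line).get("context", "")
        if ctx and ctx not in seen:
            seen.add(ctx)
            contexts.append(ctx)

os.makedirs(args.output_dir, exist_ok=True)
with open(os.path.join(args.output_dir, "unique_contexts.txt"), "w") as f:  # hypothetical file name
    f.write("\n\n".join(contexts))
```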
- Insert the context into the graph database.

```bash
python insert_context_deepseek.py
```

Note that the script `insert_context_deepseek.py` is for generation with the DeepSeek-V3 API; you can replace it with `insert_context_openai.py` or `insert_context_glm.py`.
- Test with different versions of HiRAG.
# there are different retrieval options
# If you want to employ HiRAG approach, just run:
python test_deepseek.py -d mix -m hi
# If you want to employ naive RAG approach, just run:
python test_deepseek.py -d mix -m naive
# If you want to employ HiRAG approach w/o bridge, just run:
python test_deepseek.py -d mix -m hi_nobridge
# If you want to employ HiRAG approach with retrieving only local knowledge, just run:
python test_deepseek.py -d mix -m hi_local
# If you want to employ HiRAG approach with retrieving only global knowledge, just run:
python test_deepseek.py -d mix -m hi_global
# If you want to employ HiRAG approach with retrieving only bridge knowledge, just run:
python test_deepseek.py -d mix -m hi_bridge
Note that the dataset `mix` can be replaced with any of the other datasets at the Hugging Face link, and that `test_deepseek.py` is for generation with the DeepSeek-V3 API; you can replace it with `test_openai.py` or `test_glm.py`.
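If you prefer to drive these retrieval variants directly from Python rather than through the test scripts, the same mode strings can be passed via `QueryParam`. The sketch below assumes the contexts have already been inserted into the working directory as in the quick start; whether `QueryParam` accepts every mode string exactly as listed is an assumption based on the `-m` options above.

```python
# Minimal sketch: query an already-indexed HiRAG graph with each retrieval variant.
# The mode strings mirror the -m options of the test scripts (an assumption that
# they map one-to-one onto QueryParam modes).
from hirag import HiRAG, QueryParam

graph_func = HiRAG(
    working_dir="./your_work_dir",   # directory where the graph was built
    enable_hierachical_mode=True,
)

question = "The question you want to ask?"
for mode in ["hi", "naive", "hi_nobridge", "hi_local", "hi_global", "hi_bridge"]:
    print(f"=== mode: {mode} ===")
    print(graph_func.query(question, param=QueryParam(mode=mode)))
```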
- Evaluate the generated answers.

First, request the evaluations:

```bash
python batch_eval.py -m request -api openai
python batch_eval.py -m request -api deepseek
```

Then, collect the results:

```bash
python batch_eval.py -m result -api openai
python batch_eval.py -m result -api deepseek
```
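For reference, the result step essentially tallies pairwise win rates per evaluation dimension from the evaluation output. The sketch below is not the repo's `batch_eval.py`; the JSONL record layout (one judgement per line, with each dimension mapping to a `winner` field) is an assumption for illustration.

```python
# Minimal sketch (not the repo's batch_eval.py): tally pairwise win rates per
# evaluation dimension from a JSONL file of LLM judgements.
# The record schema (dimension keys, each with a "winner" field) is hypothetical.
import json
from collections import Counter, defaultdict

wins = defaultdict(Counter)
with open("./datasets/mix/mix_eval_hi_naive.jsonl", "r") as f:
    for line in f:
        record = json.loads(line)
        for dim in ["Comprehensiveness", "Empowerment", "Diversity", "Overall"]:
            winner = record.get(dim, {}).get("winner")  # hypothetical structure
            if winner:
                wins[dim][winner] += 1

for dim, counts in wins.items():
    total = sum(counts.values())
    for system, count in counts.items():
        print(f"{dim}: {system} wins {100.0 * count / total:.1f}%")
```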
With the config `output_file` set to `f"./datasets/{DATASET}/{DATASET}_eval_hi_naive.jsonl"`, just run the command:

```bash
python batch_eval.py -m result -api openai
```

to obtain the comparison between NaiveRAG and HiRAG:
| Dataset | Dimension | NaiveRAG % | HiRAG % |
|---|---|---|---|
| Mix | Comprehensiveness | 16.6 | 83.4 |
| | Empowerment | 11.6 | 88.4 |
| | Diversity | 12.7 | 87.3 |
| | Overall | 12.4 | 87.6 |
| CS | Comprehensiveness | 30.0 | 70.0 |
| | Empowerment | 29.0 | 71.0 |
| | Diversity | 14.5 | 85.5 |
| | Overall | 26.5 | 73.5 |
| Legal | Comprehensiveness | 32.5 | 67.5 |
| | Empowerment | 25.0 | 75.0 |
| | Diversity | 22.0 | 78.0 |
| | Overall | 22.5 | 74.5 |
| Agriculture | Comprehensiveness | 34.0 | 66.0 |
| | Empowerment | 31.0 | 69.0 |
| | Diversity | 21.0 | 79.0 |
| | Overall | 28.5 | 71.5 |
With the config `output_file` set to `f"./datasets/{DATASET}/{DATASET}_eval_hi_graphrag.jsonl"`, just run the command:

```bash
python batch_eval.py -m result -api openai
```

to obtain the comparison between GraphRAG and HiRAG:
| Dataset | Dimension | GraphRAG % | HiRAG % |
|---|---|---|---|
| Mix | Comprehensiveness | 42.1 | 57.9 |
| | Empowerment | 35.1 | 64.9 |
| | Diversity | 40.5 | 59.5 |
| | Overall | 35.9 | 64.1 |
| CS | Comprehensiveness | 40.5 | 59.5 |
| | Empowerment | 38.5 | 61.5 |
| | Diversity | 30.5 | 69.5 |
| | Overall | 36.0 | 64.0 |
| Legal | Comprehensiveness | 48.5 | 51.5 |
| | Empowerment | 43.5 | 56.5 |
| | Diversity | 47.0 | 53.0 |
| | Overall | 45.5 | 54.5 |
| Agriculture | Comprehensiveness | 49.0 | 51.0 |
| | Empowerment | 48.5 | 51.5 |
| | Diversity | 45.5 | 54.5 |
| | Overall | 46.0 | 54.0 |
With the config `output_file` set to `f"./datasets/{DATASET}/{DATASET}_eval_hi_lightrag.jsonl"`, just run the command:

```bash
python batch_eval.py -m result -api openai
```

to obtain the comparison between LightRAG and HiRAG:
| Dataset | Dimension | LightRAG % | HiRAG % |
|---|---|---|---|
| Mix | Comprehensiveness | 36.8 | 63.2 |
| | Empowerment | 34.9 | 65.1 |
| | Diversity | 34.1 | 65.9 |
| | Overall | 34.1 | 65.9 |
| CS | Comprehensiveness | 44.5 | 55.5 |
| | Empowerment | 41.5 | 58.5 |
| | Diversity | 33.0 | 67.0 |
| | Overall | 41.0 | 59.0 |
| Legal | Comprehensiveness | 49.0 | 51.0 |
| | Empowerment | 43.5 | 56.5 |
| | Diversity | 63.0 | 37.0 |
| | Overall | 48.0 | 52.0 |
| Agriculture | Comprehensiveness | 38.5 | 61.5 |
| | Empowerment | 36.5 | 63.5 |
| | Diversity | 37.5 | 62.5 |
| | Overall | 38.5 | 61.5 |
With the config `output_file` set to `f"./datasets/{DATASET}/{DATASET}_eval_hi_fastgraphrag.jsonl"`, just run the command:

```bash
python batch_eval.py -m result -api openai
```

to obtain the comparison between FastGraphRAG and HiRAG:
| Dataset | Dimension | FastGraphRAG % | HiRAG % |
|---|---|---|---|
| Mix | Comprehensiveness | 0.8 | 99.2 |
| | Empowerment | 0.8 | 99.2 |
| | Diversity | 0.8 | 99.2 |
| | Overall | 0.8 | 99.2 |
| CS | Comprehensiveness | 0.0 | 100.0 |
| | Empowerment | 0.0 | 100.0 |
| | Diversity | 0.5 | 99.5 |
| | Overall | 0.0 | 100.0 |
| Legal | Comprehensiveness | 1.0 | 99.0 |
| | Empowerment | 0.0 | 100.0 |
| | Diversity | 1.5 | 98.5 |
| | Overall | 0.0 | 100.0 |
| Agriculture | Comprehensiveness | 0.0 | 100.0 |
| | Empowerment | 0.0 | 100.0 |
| | Diversity | 0.0 | 100.0 |
| | Overall | 0.0 | 100.0 |
With the config `output_file` set to `f"./datasets/{DATASET}/{DATASET}_eval_hi_kag.jsonl"`, just run the command:

```bash
python batch_eval.py -m result -api openai
```

to obtain the comparison between KAG and HiRAG:
| Dataset | Dimension | KAG % | HiRAG % |
|---|---|---|---|
| Mix | Comprehensiveness | 2.3 | 97.7 |
| | Empowerment | 3.5 | 96.5 |
| | Diversity | 3.8 | 96.2 |
| | Overall | 2.3 | 97.7 |
| CS | Comprehensiveness | 1.0 | 99.0 |
| | Empowerment | 4.5 | 95.5 |
| | Diversity | 5.0 | 95.0 |
| | Overall | 1.5 | 98.5 |
| Legal | Comprehensiveness | 16.5 | 83.5 |
| | Empowerment | 9.0 | 91.0 |
| | Diversity | 11.0 | 89.0 |
| | Overall | 8.5 | 91.5 |
| Agriculture | Comprehensiveness | 5.0 | 95.0 |
| | Empowerment | 5.0 | 95.0 |
| | Diversity | 3.5 | 96.5 |
| | Overall | 0.0 | 100.0 |
We gratefully acknowledge the use of the following open-source projects in our work:

- **nano-graphrag**: a simple, easy-to-hack GraphRAG implementation
- **RAPTOR**: a novel approach to retrieval-augmented language models, constructing a recursive tree structure from documents
```bibtex
@misc{huang2025retrievalaugmentedgenerationhierarchicalknowledge,
      title={Retrieval-Augmented Generation with Hierarchical Knowledge},
      author={Haoyu Huang and Yongfeng Huang and Junjie Yang and Zhenyu Pan and Yongqiang Chen and Kaili Ma and Hongzhi Chen and James Cheng},
      year={2025},
      eprint={2503.10150},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.10150},
}
```