Existing baseline approaches for scientific question answering follow a single-threaded iterative retrieval strategy, limiting their ability to comprehensively address multi-faceted questions and leading to incomplete coverage and answer organization.
SciRAG addresses these limitations through a novel framework with three key capabilities:
- Adaptive Retrieval: A Gap Critic mechanism automatically determines when additional retrieval is needed and uses tree-based query decomposition to enable parallel or sequential exploration of sub-questions, with citation graph expansion to enrich the retrieved paper set.
- Symbolic Reasoning-Based Reranking: A three-step symbolic reasoning process analyzes paper relationships and contributions to intelligently rerank retrieved documents.
- Outline-Guided Synthesis: Answers are synthesized through bottom-up aggregation along the query tree, guided by a structured outline to ensure comprehensive coverage and proper organization.
SciRAG achieves strong performance across both long-form literature review tasks (ScholarQA, QASA) and short-form answer tasks (SciFact, PubMedQA), demonstrating superior answer quality compared to existing baselines.
Create and activate the conda environment:
conda env create -f environment.yml
conda activate sciragRemember to configure your API keys:
- Set your Semantic Scholar API key in
web_search.py - Set your LLM API key in
config.yaml
For long-form answers (ScholarQA, QASA):
cd longans
python run.py --input_file test.jsonl --output_file testout.jsonl --model_name gpt4oFor short-form answers (SciFact, PubMedQA):
cd shortans/scifact # or shortans/pubmed
python run.py --input_file test.jsonl --output_file testout.jsonl --model_name gpt4oSince dense retrieval from a large-scale corpus (45 million papers) can be resource-intensive, we recommend following the retrieval method from:
This helps reduce the heavy resource cost during retrieval. Once retrieved, you can use the output file as the input to the SciRAG pipeline.
For evaluation tools and benchmarks, please refer to:
If you use our work and are inspired by our work, please consider cite us:
@misc{ding2025sciragadaptivecitationawareoutlineguided,
title={SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature},
author={Hang Ding and Yilun Zhao and Tiansheng Hu and Manasi Patwardhan and Arman Cohan},
year={2025},
eprint={2511.14362},
archivePrefix={arXiv},
primaryClass={cs.DL},
url={https://arxiv.org/abs/2511.14362},
}

