Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.

Installation

Critic-R requires Python 3.10+ and CUDA-capable GPUs. We recommend two separate conda environments — one for inference/serving and one for retriever training.

1. Inference / serving environment

conda create -n sr1 python=3.10 -y
conda activate sr1

pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install transformers==4.45.0 datasets accelerate
pip install aiohttp requests tqdm numpy

2. Retriever environment

conda create -n retriever python=3.10 -y
conda activate retriever

pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.45.0 sentence-transformers
pip install faiss-gpu==1.7.2
pip install deepspeed
pip install xformers

Building the Wikipedia Index

cd search-r1/search
bash build_index_stella400M.sh

This produces a Flat FAISS index over wiki-18.jsonl using mean-pooled Stella embeddings (max_length=256, fp16). Swap --faiss_type Flat for HNSW32/64/128 to use approximate search, or set --retrieval_method bm25 for a lexical baseline.

Quick Start

A full Critic-R inference run involves three services and one driver script:

Critic LLM server (vLLM, OpenAI-compatible API).
Dense retrieval server (FAISS over Wikipedia-18).
Driver, which hosts the reasoning agent locally and orchestrates the loop.

# (1) Launch the critic LLM
sbatch vllm_server_job_multigpu.sh                       # -> http://<host>:8001/v1/chat/completions

# (2) Launch the retrieval server
sbatch retrieval_launch__stella400M.sh                   # -> http://<host>:8000/retrieve

# (3) Run the Critic-R loop
cd search-r1/critic-r
bash run.sh

Citation

If you find this work useful, please cite:

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

@misc{alam2026criticrimprovingagenticsearch,
      title={Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback}, 
      author={Md Zarif Ul Alam and Alireza Salemi and Hamed Zamani},
      year={2026},
      eprint={2606.00590},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2606.00590}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
retriever		retriever
search-r1		search-r1
.DS_Store		.DS_Store
Critic-R.png		Critic-R.png
README.md		README.md
eval_score.py		eval_score.py
eval_score.sh		eval_score.sh
retrieval_launch__stella400M.sh		retrieval_launch__stella400M.sh
vllm_server_job_multigpu.sh		vllm_server_job_multigpu.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Installation

1. Inference / serving environment

2. Retriever environment

Building the Wikipedia Index

Quick Start

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Installation

1. Inference / serving environment

2. Retriever environment

Building the Wikipedia Index

Quick Start

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages