RxnBench: Benchmark for Chemical Reaction Figure/Document Understanding

🤗UniParser/RxnBench | 🤗UniParser/RxnBench-Doc

Benchmark Summary

RxnBench is a PhD-level benchmark suite for organic-chemistry Image/PDF VQA, split into two parts:

RxnBench (SF-QA): A benchmark for chemical reaction figure understanding, comprising 1,525 English/Chinese multiple-choice questions (MCQs) built on 305 peer-reviewed chemical reaction figures.

RxnBench (FD-QA): A benchmark for multimodal understanding of chemical reaction literature, comprising 540 English/Chinese multiple-select questions on document-level chemical reaction understanding.

The benchmark is released in both English and Chinese versions.

This repo provides sample code for evaluating models on these datasets.

How to run

Prerequisites

  1. Python Environment: Python 3.8+ with the required dependencies
  2. API Key: an OpenAI (or other OpenAI-compatible service) API key for model inference and evaluation
  3. Data Setup: ensure the data files are placed as described below

Installation

# Install
pip install -e .

Environment Setup

export MODEL_NAME="your-model-name"           # e.g., "gpt-4o", "Qwen3-VL-2B-Instruct"
export OPENAI_API_KEY="your-openai-api-key"
export OPENAI_BASE_URL="your-base-url"         # optional, defaults to OpenAI
export INFER_OUTPUT_DIR="./results"            # output directory
export BASE_PATH="/path/to/rxnbench/data"      # path to the RxnBench data directory containing "pdf_files" and "images" (see below)
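
For reference, here is a minimal sketch of how an OpenAI-compatible client would typically pick up these variables. This mirrors the standard openai Python client, not necessarily this repo's exact code:

import os
from openai import OpenAI

# MODEL_NAME selects which model the inference scripts query.
model_name = os.environ["MODEL_NAME"]

# The openai client accepts a custom endpoint via base_url; passing None
# falls back to the official OpenAI endpoint.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL"),
)

# Smoke test: one trivial chat completion against the configured endpoint.
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)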

Running Evaluations

Benchmark 1: RxnBench (SF-QA)

Single Figure VQA Evaluation: UniParser/RxnBench

# Run inference for English and Chinese
cd rxnbench_eval
python example_inference.py

# Run evaluation
python evaluate.py

Benchmark 2: RxnBench (FD-QA)

Full Document VQA Evaluation: UniParser/RxnBench-Doc

Step 1: PDF file preparation

Note: Due to legal considerations, the actual PDF files for the document evaluation are not provided in our dataset and must be collected and prepared by the user.

To run the document evaluation benchmark, you need to prepare the corresponding PDF files for each paper referenced in the dataset:

  1. Identify Required PDFs: Each question in the dataset carries a pdf_doi field holding the DOI (Digital Object Identifier) of the source paper.

  2. Download PDFs: Use the DOI to retrieve each PDF from academic databases or publishers. Common sources include:

    • Publisher websites (ACS, RSC, Wiley, etc.)
    • Academic databases (PubMed, Google Scholar, etc.)
    • Institutional access through universities/libraries
  3. File Organization: Create a directory structure as follows:

    BASE_PATH/
    ├── pdf_files/
    │   ├── 10.1021_jacsau.3c00814.pdf
    │   ├── 10.1021_ja123456.pdf
    │   └── ... (basename is data["pdf_doi"].replace("/", "_") + ".pdf")
    └── images/
        └── question images unzipped from https://huggingface.co/datasets/UniParser/RxnBench-Doc/resolve/main/images.zip


Important Notes:

  • Ensure you have proper access rights to download and use the PDFs. The sketch below shows how to check which referenced PDFs are still missing.
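
A hedged sketch for checking PDF coverage under the naming rule above: loading the questions via the Hugging Face datasets library and the split name are assumptions, so adapt this to however you actually fetch the dataset:

import os
from datasets import load_dataset

base_path = os.environ["BASE_PATH"]
pdf_dir = os.path.join(base_path, "pdf_files")

# Split name "train" is an assumption; check the dataset card.
ds = load_dataset("UniParser/RxnBench-Doc", split="train")

# Apply the naming rule from the tree above: DOI slashes become underscores.
missing = set()
for row in ds:
    basename = row["pdf_doi"].replace("/", "_") + ".pdf"
    if not os.path.exists(os.path.join(pdf_dir, basename)):
        missing.add(basename)

print(f"{len(missing)} PDFs still to collect")
for name in sorted(missing):
    print("  " + name)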

Step 2: Run evaluation

# Run inference
cd rxnbench_doc_eval
python example_inference.py

# Run evaluation
python evaluate.py

Output Files

  • {MODEL_NAME}_{lang}.json: Raw model predictions
  • {MODEL_NAME}_{lang}_extracted.jsonl: Processed predictions with accuracy
  • {MODEL_NAME}_{lang}_accuracy.json: Accuracy statistics by question type
  • {MODEL_NAME}_{lang}_error.jsonl: Failed predictions and errors
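
To inspect results after a run, a quick sketch along these lines can load the accuracy files; the JSON schema is not documented here, so this simply pretty-prints whatever statistics evaluate.py wrote, and the "en"/"zh" language codes are an assumption:

import json
import os

model = os.environ["MODEL_NAME"]
out_dir = os.environ.get("INFER_OUTPUT_DIR", "./results")

for lang in ("en", "zh"):  # language codes are an assumption
    path = os.path.join(out_dir, f"{model}_{lang}_accuracy.json")
    if not os.path.exists(path):
        continue  # skip languages that were not evaluated
    with open(path, encoding="utf-8") as f:
        stats = json.load(f)
    print(f"== {lang} ==")
    print(json.dumps(stats, indent=2, ensure_ascii=False))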

📄 License

See the LICENSE file for details.

📖 Citation

Our paper is coming soon. Please cite this repository for now:

@misc{rxnbench2025,
  title={RxnBench: A Benchmark for Chemical Reaction Figure Understanding},
  author={UniParser Team},
  year={2025},
  publisher={GitHub},
  url={https://github.com/uni-parser/RxnBench}
}
