🤗 UniParser/RxnBench | 🤗 UniParser/RxnBench-Doc
RxnBench is a PhD-level benchmark suite for organic-chemistry Image/PDF VQA, split into two parts:
RxnBench(SF-QA): A benchmark for Chemical Reaction Figure Understanding, including 1,525 English/Chinese MCQs built on 305 peer-reviewed chemical reaction figures.
RxnBench(FD-QA): A benchmark for Multimodal Understanding of Chemistry Reaction Literature, including 540 English/Chinese multiple-select questions on document-level chemical reaction understanding.
The benchmark is released in both English and Chinese versions.
This repo provides sample code to evaluate on this dataset.
- Python Environment: Python 3.8+ with required dependencies
- API Keys: OpenAI(Or other compatible services) API key for model inference and evaluation
- Data Setup: Ensure data files are properly placed
```bash
# Install
pip install -e .

export MODEL_NAME="your-model-name"       # e.g., "gpt-4o", "Qwen3-VL-2B-Instruct"
export OPENAI_API_KEY="your-openai-api-key"
export OPENAI_BASE_URL="your-base-url"    # optional, defaults to OpenAI
export INFER_OUTPUT_DIR="./results"       # output directory
export BASE_PATH="/path/to/rxnbench/data" # path to the RxnBench data directory containing "pdf_files" and "images" (see below)
```

Single Figure VQA Evaluation: UniParser/RxnBench
```bash
# Run inference for English and Chinese
cd rxnbench_eval
python example_inference.py

# Run evaluation
python evaluate.py
```

Full Document VQA Evaluation: UniParser/RxnBench-Doc
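The inference step sends each reaction figure together with its question to an OpenAI-compatible chat endpoint. As a minimal sketch of how such a request can be assembled (the function name and payload layout here are illustrative, not the repo's actual code in `example_inference.py`):

```python
import base64
import os


def build_vqa_request(image_path: str, question: str, model: str) -> dict:
    """Build an OpenAI-compatible chat payload for one figure + question.

    The image is inlined as a base64 data URL, which is the standard way to
    pass local images to OpenAI-compatible vision endpoints.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

The resulting dict can be passed to any OpenAI-compatible client (e.g. `openai.OpenAI(base_url=os.environ.get("OPENAI_BASE_URL")).chat.completions.create(**request)`), which is why `OPENAI_BASE_URL` lets you point the same code at other compatible services.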
Step 1: PDF file preparation
Note: Due to legal considerations, the actual PDF files for the document evaluation are not provided in our dataset and must be collected and prepared by the user.
To run the document evaluation benchmark, you need to prepare the corresponding PDF files for each paper referenced in the dataset:
1. Identify Required PDFs: The dataset contains a `pdf_doi` field for each question, which holds the DOI (Digital Object Identifier) of the paper.
2. Download PDFs: You can download the PDFs using the DOI from academic databases or publishers. Common sources include:
   - Publisher websites (ACS, RSC, Wiley, etc.)
   - Academic databases (PubMed, Google Scholar, etc.)
   - Institutional access through universities/libraries
3. File Organization: Create a directory structure as follows:
```
BASE_PATH/
├── pdf_files/
│   ├── 10.1021_jacsau.3c00814.pdf
│   ├── 10.1021_ja123456.pdf
│   └── ...   (named with data["pdf_doi"].replace("/", "_") as the basename)
└── images/
    └── question images unzipped from https://huggingface.co/datasets/UniParser/RxnBench-Doc/resolve/main/images.zip
```
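The naming rule above (DOI with `/` replaced by `_`) can be applied programmatically when renaming downloaded PDFs, and the same rule lets you check which papers are still missing. A small sketch (helper names are ours, not from the repo):

```python
import os


def doi_to_pdf_name(doi: str) -> str:
    # Per the layout above: basename is data["pdf_doi"].replace("/", "_") + ".pdf"
    return doi.replace("/", "_") + ".pdf"


def missing_pdfs(dois, base_path):
    """Return the DOIs whose PDF is not yet present under BASE_PATH/pdf_files."""
    pdf_dir = os.path.join(base_path, "pdf_files")
    return [
        doi
        for doi in dois
        if not os.path.isfile(os.path.join(pdf_dir, doi_to_pdf_name(doi)))
    ]
```

Running `missing_pdfs` over all `pdf_doi` values in the dataset before evaluation avoids mid-run failures from absent files.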
Important Notes:
- Ensure you have proper access rights to download and use the PDFs
Step 2: Run evaluation
```bash
# Run inference
cd rxnbench_doc_eval
python example_inference.py

# Run evaluation
python evaluate.py
```

The evaluation produces the following output files:

- `{MODEL_NAME}_{lang}.json`: Raw model predictions
- `{MODEL_NAME}_{lang}_extracted.jsonl`: Processed predictions with accuracy
- `{MODEL_NAME}_{lang}_accuracy.json`: Accuracy statistics by question type
- `{MODEL_NAME}_{lang}_error.jsonl`: Failed predictions and errors
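The `*_extracted.jsonl` file holds one processed prediction per line, so custom accuracy breakdowns are straightforward to compute. A sketch of per-question-type aggregation, assuming hypothetical field names (`question_type`, `correct`) that may differ from the repo's actual schema:

```python
import json
from collections import defaultdict


def summarize_accuracy(jsonl_path: str) -> dict:
    """Aggregate accuracy per question type from a *_extracted.jsonl file.

    NOTE: the record keys "question_type" and "correct" are assumptions for
    illustration; check the actual extracted output for the real field names.
    """
    totals = defaultdict(lambda: [0, 0])  # type -> [n_correct, n_total]
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            qtype = rec.get("question_type", "all")
            totals[qtype][1] += 1
            totals[qtype][0] += int(bool(rec.get("correct")))
    return {qtype: n_ok / n for qtype, (n_ok, n) in totals.items() if n}
```

This mirrors what `{MODEL_NAME}_{lang}_accuracy.json` reports, but lets you slice the results along any extra metadata present in the records.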
See the LICENSE file for details.
Our paper is coming soon. Please cite this repository for now:
```bibtex
@misc{rxnbench2025,
  title     = {RxnBench: A Benchmark for Chemical Reaction Figure Understanding},
  author    = {UniParser Team},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/uni-parser/RxnBench}
}
```