This repository contains the official implementation of zERExtractor, an automated and extensible platform for multimodal extraction of enzyme-catalyzed reaction data from scientific literature.
The system integrates tables, molecular diagrams, enzyme sequences, and experimental conditions into structured, machine-readable datasets for downstream AI-driven modeling.
- ✅ Unified framework combining deep learning and large language models
- ✅ Supports extraction from tables, figures, and text
- ✅ Benchmarked on 1,000+ annotated tables and 5,000 biological fields
- ✅ Achieves 89.9% accuracy on table recognition and 98%+ accuracy on molecular recognition
You can explore zERExtractor directly through our online platform:
🔗 zERExtractor Platform, built on the zCloud platform by Shanghai Zelixir Biotech Co Ltd
🛠️ The source code will be released upon the acceptance and publication of our paper.
Ryan (CAS) 📧 ryan5zh5@gmail.com 📧 contact@zelixir.com
If you find this work useful, please cite:
@article{zhou2025zerextractor,
  title={zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature},
  author={Zhou, Rui and Ma, Haohui and Xin, Tianle and Zou, Lixin and Hu, Qiuyue and Cheng, Hongxi and Lin, Mingzhi and Guo, Jingjing and Wang, Sheng and Zhang, Guoqing and others},
  journal={arXiv preprint arXiv:2508.09995},
  year={2025}
}

