zelixirSH/ERExtractor

zERExtractor: Automated Multimodal Extraction of Enzyme-Catalyzed Reaction Data

[Figure: zERExtractor overview]

📌 Introduction

This repository contains the official implementation of zERExtractor, an automated and extensible platform for multimodal extraction of enzyme-catalyzed reaction data from scientific literature.
The system integrates tables, molecular diagrams, enzyme sequences, and experimental conditions into structured, machine-readable datasets for downstream AI-driven modeling.
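To make "structured, machine-readable datasets" concrete, here is a minimal Python sketch of what one extracted reaction record could look like. Every field name, unit, and the JSON Lines layout are hypothetical assumptions for illustration. They are not zERExtractor's actual output schema, which this README does not specify.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ReactionRecord:
    """Hypothetical schema for one extracted enzyme-catalyzed reaction.

    Field names and units are illustrative assumptions, not the
    platform's real output format.
    """

    enzyme_name: str
    enzyme_sequence: Optional[str] = None  # amino-acid sequence, if reported
    substrates_smiles: list[str] = field(default_factory=list)  # e.g. from molecular diagrams
    products_smiles: list[str] = field(default_factory=list)
    kcat_per_s: Optional[float] = None     # turnover number (s^-1), e.g. from tables
    km_mM: Optional[float] = None          # Michaelis constant (mM), e.g. from tables
    temperature_c: Optional[float] = None  # experimental condition, e.g. from text
    ph: Optional[float] = None
    source_doi: Optional[str] = None       # provenance of the record

    def to_json(self) -> str:
        # Serialize one record as a single JSON object with stable key order.
        return json.dumps(asdict(self), sort_keys=True)


def to_jsonl(records: list[ReactionRecord]) -> str:
    # JSON Lines: one record per line, a common layout for ML datasets.
    return "\n".join(r.to_json() for r in records)


# Usage with placeholder values (not real measurements):
rec = ReactionRecord(enzyme_name="enzyme A", substrates_smiles=["CCO"], kcat_per_s=12.5)
print(to_jsonl([rec]))
```

Missing fields stay `None` rather than being guessed. This is one reasonable way to keep partially extracted records usable for downstream modeling.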

🚀 Features

  • ✅ Unified framework combining deep learning and large language models
  • ✅ Extracts data from tables, figures, and text
  • ✅ Benchmarked on 1,000+ annotated tables and 5,000 biological fields
  • ✅ Achieves 89.9% accuracy on table recognition and 98%+ accuracy on molecular recognition

📊 Results

Table recognition accuracy:

Method        Acc. (%)   Gain
TableMaster   77.90*     -
LGPMA         65.74*     -
SLANet        86.0       -
Ours          89.9       +3.9 pp (vs. SLANet)
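The Gain figure matches the absolute difference from the strongest baseline, SLANet (89.9 - 86.0 = 3.9 percentage points). The relative improvement is somewhat larger. The short Python check below assumes Gain is measured against the best-scoring baseline; the helper name `gain_over_best` is hypothetical.

```python
from __future__ import annotations

# Table-recognition accuracies (%) copied from the Results table.
BASELINES = {"TableMaster": 77.90, "LGPMA": 65.74, "SLANet": 86.0}
OURS = 89.9


def gain_over_best(ours: float, baselines: dict[str, float]) -> tuple[str, float, float]:
    """Return (best baseline, absolute gain in pp, relative gain in %).

    Assumes the gain is computed against the highest-accuracy baseline.
    """
    best_name = max(baselines, key=baselines.get)
    best = baselines[best_name]
    absolute_pp = ours - best                # difference in percentage points
    relative_pct = 100.0 * absolute_pp / best  # improvement relative to the baseline
    return best_name, round(absolute_pp, 2), round(relative_pct, 2)


print(gain_over_best(OURS, BASELINES))  # ('SLANet', 3.9, 4.53)
```

Reporting the absolute gain in percentage points avoids confusing it with the relative gain (about 4.5%).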


⚡ Quick Start

You can explore zERExtractor directly through our online platform:
🔗 zERExtractor Platform (built on the zCloud platform by Shanghai Zelixir Biotech Co., Ltd.)

🛠️ The source code will be released upon the acceptance and publication of our paper.

🌐 Links

📬 Contact

Ryan (CAS): 📧 ryan5zh5@gmail.com | 📧 contact@zelixir.com

📖 Citation

If you find this work useful, please cite:

@article{zhou2025zerextractor,
  title={zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature},
  author={Zhou, Rui and Ma, Haohui and Xin, Tianle and Zou, Lixin and Hu, Qiuyue and Cheng, Hongxi and Lin, Mingzhi and Guo, Jingjing and Wang, Sheng and Zhang, Guoqing and others},
  journal={arXiv preprint arXiv:2508.09995},
  year={2025}
}

About

Zelixir's Enzyme Reaction Data Extractor

Languages

  • Python 76.6%
  • HTML 23.4%