Yinda Chen¹,²*, Haoyuan Shi¹,²*, Xiaoyu Liu¹, Te Shi², Ruobing Zhang³,², Dong Liu¹, Zhiwei Xiong¹,²†, Feng Wu¹,²‡
¹MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
³Institute for Brain and Intelligence, Fudan University
*Equal Contribution †Project Leader ‡Corresponding Author
This repository contains the official implementation of the paper TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation. It includes experimental settings, source code, and theoretical proofs. For details, please refer to the original paper.
- [2025.06] 🎉 TokenUnify was accepted by ICCV 2025, looking forward to meeting you in Hawaii.
- [2025.06] 📊 MEC dataset released! Wafer (MEC) dataset available on HuggingFace.
- [2025.06] 🔧 Pre-trained weights updated! Robust initialization weights (pre-trained, not fine-tuned) available in the Pretrained_weights folder on HuggingFace.
- [2024.12] 🎉 Code and pre-training dataset released! Core implementation and pre-training weights released.
- [2024.12] 📊 Datasets released! Pre-training dataset available on HuggingFace.
- [2024.05] 📝 Paper released! TokenUnify paper published on arXiv.
TokenUnify introduces a novel autoregressive visual pre-training method for neuron segmentation from electron microscopy (EM) volumes. The method tackles the unique challenges of EM data, including high noise levels, anisotropic voxel dimensions, and ultra-long spatial dependencies, through hierarchical predictive coding that combines three complementary prediction tasks (a minimal sketch of the combined objective follows the list below):
- Random Token Prediction: Captures noise-robust spatial patterns and learns position-invariant local feature detectors.
- Next Token Prediction: Maintains sequential dependencies and captures critical transitional patterns in neuronal morphology.
- Next-All Token Prediction: Models global context and long-range correlations while mitigating cumulative errors in autoregression.
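To make the combination of the three tasks concrete, here is a minimal, self-contained PyTorch sketch of how such a joint objective could be assembled. All class, argument, and loss names are illustrative assumptions rather than the repository's actual API; the real implementation, loss weighting, and masking ratio live in `src/` and the paper.

```python
# Illustrative sketch only (not the repository's actual API): how the three
# prediction tasks could be combined into one pre-training objective. The
# encoder (e.g., a Mamba backbone) would supply `features`; loss weighting
# and masking ratio are hyperparameters described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTokenUnifyObjective(nn.Module):
    """Hypothetical module combining random, next, and next-all token losses."""

    def __init__(self, dim: int = 256, vocab: int = 8192):
        super().__init__()
        self.head = nn.Linear(dim, vocab)  # shared prediction head

    def forward(self, features, tokens, mask):
        # features: (B, K, dim) encoder outputs; tokens: (B, K) target ids;
        # mask: (B, K) bool, True at randomly selected positions.
        logits = self.head(features)                       # (B, K, vocab)
        B, K, V = logits.shape

        # 1) Random token prediction: reconstruct masked positions only.
        loss_rand = F.cross_entropy(logits[mask], tokens[mask])

        # 2) Next token prediction: position t predicts token t+1.
        loss_next = F.cross_entropy(logits[:, :-1].reshape(-1, V),
                                    tokens[:, 1:].reshape(-1))

        # 3) Next-all token prediction (one simple realization): position t
        #    predicts the empirical distribution of all future tokens.
        onehot = F.one_hot(tokens, V).float()              # (B, K, V)
        future_sum = torch.cumsum(onehot.flip(1), dim=1).flip(1)[:, 1:]
        future_cnt = torch.arange(K - 1, 0, -1, device=logits.device).view(1, -1, 1)
        soft_targets = future_sum / future_cnt             # (B, K-1, V)
        loss_all = F.cross_entropy(logits[:, :-1].reshape(-1, V),
                                   soft_targets.reshape(-1, V))

        return loss_rand + loss_next + loss_all  # equal weights for illustration


if __name__ == "__main__":
    # Tiny smoke test with random data.
    B, K, dim, vocab = 2, 16, 256, 8192
    objective = ToyTokenUnifyObjective(dim, vocab)
    feats = torch.randn(B, K, dim)
    toks = torch.randint(0, vocab, (B, K))
    msk = torch.rand(B, K) < 0.5
    print(objective(feats, toks, msk).item())
```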
Leveraging the Mamba architecture's linear-time sequence modeling, TokenUnify achieves a 44% improvement in neuron segmentation performance over training from scratch and a 25% improvement over MAE, while demonstrating superior scaling properties and reducing autoregressive error accumulation from O(K) to O(√K) for sequences of length K.
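For intuition only (not the paper's formal argument, which states its own assumptions): a √K rate typically arises when per-step errors are roughly uncorrelated and add like a random walk, whereas fully correlated errors accumulate linearly. Assuming each autoregressive step contributes an error e_t with |e_t| ≤ ε:

```latex
% Heuristic only; see the paper's theoretical analysis for the precise statement.
\[
\Big|\sum_{t=1}^{K} e_t\Big| \;\le\; K\epsilon \;=\; O(K)
\qquad \text{(fully correlated per-step errors)}
\]
\[
\mathbb{E}\!\left[\Big(\sum_{t=1}^{K} e_t\Big)^{2}\right]
 \;=\; \sum_{t=1}^{K}\mathbb{E}\!\left[e_t^{2}\right] \;\le\; K\epsilon^{2}
\;\;\Longrightarrow\;\;
\text{typical accumulated error} \;=\; O(\sqrt{K})\,\epsilon
\qquad \text{(uncorrelated errors, } \mathbb{E}[e_s e_t]=0 \text{ for } s\neq t\text{)}
\]
```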
Set up the environment using our Docker image:
```bash
sudo docker pull registry.cn-hangzhou.aliyuncs.com/mybitahub/large_model:mamba0224_ydchen
```

Datasets for pre-training and segmentation:
| Dataset Type | Dataset Name | Description | URL |
|---|---|---|---|
| Pre-training Dataset | Large EM Datasets | Various brain regions for pre-training | 🤗 EM Pretrain Dataset |
| Segmentation Dataset | Wafer (MEC) | High-resolution neuron segmentation | 🤗 Wafer_EM Dataset |
| Segmentation Dataset | CREMI Dataset | Circuit reconstruction challenge | CREMI Dataset |
| Segmentation Dataset | AC3/AC4 | Mouse brain cortex dataset | Google Drive |
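If you prefer to script the downloads, the Hugging Face Hub client can fetch a dataset snapshot. The `repo_id` below is a placeholder; replace it with the dataset IDs linked in the table above.

```python
# Minimal sketch for fetching a dataset snapshot from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<org>/<em-pretrain-dataset>",  # placeholder, see the table above
    repo_type="dataset",
    local_dir="./data/em_pretrain",
)
print(f"Dataset downloaded to: {local_path}")
```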
Pre-trained (robust initialization, not fine-tuned) TokenUnify weights are available in the Pretrained_weights folder:
| Model | Parameters | Dataset | URL |
|---|---|---|---|
| TokenUnify_pretrained-100M | 100M | EM Multi-dataset | 🤗 Pretrained_weights |
| TokenUnify_pretrained-200M | 200M | EM Multi-dataset | 🤗 Pretrained_weights |
| TokenUnify_pretrained-500M | 500M | EM Multi-dataset | 🤗 Pretrained_weights |
| TokenUnify_pretrained-1B | 1B | EM Multi-dataset | 🤗 Pretrained_weights |
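A minimal sketch of initializing a network from one of these checkpoints is shown below. The file name, checkpoint layout, and stand-in model are assumptions; replace them with the actual files from the Pretrained_weights folder and the model definitions in `src/`.

```python
# Illustrative sketch of loading a downloaded checkpoint; not the repo's exact API.
import torch
import torch.nn as nn

# Stand-in model for illustration only; substitute the actual TokenUnify /
# Mamba segmentation network defined under src/.
model = nn.Sequential(nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(), nn.Conv3d(32, 1, 1))

ckpt_path = "Pretrained_weights/tokenunify_pretrained_100M.pth"  # hypothetical file name
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # weights are sometimes nested under "model"

# strict=False tolerates naming differences between the checkpoint and the model.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```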
Fine-tuned weights are also available:
| Model | Parameters | Dataset | URL |
|---|---|---|---|
| TokenUnify-100M | 100M | EM Multi-dataset | 🤗 HuggingFace |
| TokenUnify-200M | 200M | EM Multi-dataset | 🤗 HuggingFace |
| TokenUnify-500M | 500M | EM Multi-dataset | 🤗 HuggingFace |
| TokenUnify-1B | 1B | EM Multi-dataset | 🤗 HuggingFace |
| superhuman | - | EM Multi-dataset | 🤗 HuggingFace |
```bash
bash src/run_mamba_mae_AR.sh
bash src/launch_huge.sh
bash src/run_mamba_seg.sh
```
- Hierarchical Predictive Coding Framework: We introduce a unified framework that integrates three distinct visual structure perspectives within a coherent information-theoretic formulation, providing optimal coverage of visual data structure while reducing autoregressive error accumulation from O(K) to O(√K).
- Large-Scale EM Dataset: We construct one of the largest EM neuron segmentation datasets, with 1.2 billion finely annotated voxels across six functional brain regions, providing an ideal testbed for long-sequence visual modeling.
- Billion-Parameter Mamba Network: We train the first billion-parameter Mamba network for visual autoregression, demonstrating both effectiveness and computational efficiency in processing long-sequence visual data with favorable scaling properties.
Usage Notes
- Non-commercial Use: Users may not copy, distribute, publish, or use the data for commercial purposes or to develop and produce products. Any format or copy of the data is treated the same as the original data. Users may modify the content and convert the data format as needed, but may not publish or provide services using the modified or converted data without permission.
- Research Purposes Only: Users guarantee that the authorized data will be used only for their own research and will not be shared with third parties in any form.
- Citation Requirements: Research results based on the authorized data, including books, articles, conference papers, theses, policy reports, and other publications, must cite the data source according to citation norms, including the authors and the publisher of the data.
- Prohibition of Profit-making Activities: Users may not use the authorized data for any profit-making activities.
- Termination of Data Use: Users must stop all use of the data and destroy it (e.g., completely delete it from computer hard drives and storage devices/spaces) upon leaving their team or organization, or when the authorization is revoked by the copyright holder.
- Sample Source: Mouse MEC MultiBeam-SEM, Intelligent Institute Brain Imaging Platform (wafer 4 at layer VI; wafers 25, 26, and 36 at layer II/III)
- Resolution: 8 nm × 8 nm × 35 nm
- Volume Size: 1250 × 1250 × 125 voxels
- Annotation Completion Dates: 2023.12.11 (wafer 4), 2024.04.12 (wafer 36)
- Authors: Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu
- Copyright Holder: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
- Chinese Name: 合肥人工智能研究院
- English Name: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
- 📝 Open-source core code
- 📖 Write README for code usage
- 🗂️ Open-source pre-training dataset
- ⚖️ Upload pre-trained and fine-tuned weights
- 🧠 Release Wafer (MEC) dataset
- 🏆 Release evaluation scripts and benchmarks
- 🔧 Add support for natural image datasets
If you find this code or dataset useful, please cite:
```bibtex
@inproceedings{chen2025tokenunify,
  title={TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation},
  author={Chen, Yinda and Shi, Haoyuan and Liu, Xiaoyu and Shi, Te and Zhang, Ruobing and Liu, Dong and Xiong, Zhiwei and Wu, Feng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
```

We welcome contributions to improve TokenUnify! Please submit issues and pull requests.
For questions, contact: cyd0806@mail.ustc.edu.cn
⭐ If you find this work helpful, please give us a star! ⭐