Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li+
contact: zd1423606603@gmail.com
We introduce Uni-Edit, an intelligent image editing task that serves as the first general task for Unified Multimodal Model (UMM) tuning. Conventional mixed multi-task training suffers from inherent task conflicts and requires complex multi-stage pipelines. Uni-Edit instead improves image understanding, generation, and editing simultaneously, achieving mutual reinforcement with only one task, one training stage, and one dataset.
To overcome the limitations of simplistic existing editing data, we propose the first automated and scalable data synthesis pipeline for intelligent editing. By transforming diverse VQA data into complex instructions with embedded questions and nested logic, we build Uni-Edit-148k, a dedicated dataset pairing reasoning-intensive instructions with high-quality edited images.
Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit achieves comprehensive enhancements across all three multimodal capabilities without requiring any massive data mixing, balancing tricks, or auxiliary operations.
- May 21, 2026: Released the training, inference, and evaluation code, along with the models!
1️⃣ Set up environment
git clone https://github.com/zhengdian1/Uni-Edit.git
cd Uni-Edit
conda create -n uniedit python=3.10 -y
conda activate uniedit
pip install -r requirements.txt
pip install flash_attn==2.5.8 --no-build-isolation

2️⃣ Download pretrained checkpoint
from huggingface_hub import snapshot_download

# Local directory that will hold the checkpoint files.
save_dir = "your/path/to/Uni-Edit-BAGEL"
repo_id = "Uni-Edit/Uni-Edit-BAGEL"
cache_dir = save_dir + "/cache"

snapshot_download(
    cache_dir=cache_dir,
    local_dir=save_dir,
    repo_id=repo_id,
    local_dir_use_symlinks=False,
    resume_download=True,
    # Only fetch configs, weights, code, and documentation files.
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)

[...] AutoModel.from_pretrained(). To run the provided inference code, you must merge these shards into a single ema.safetensors file on your local machine.
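For readers who want a sense of what merging shards involves, here is a minimal sketch. It is not the repository's merge.py, which remains the supported path. It assumes each shard is a safetensors file holding a disjoint subset of the tensors. The glob pattern `ema-*.safetensors` and the helper names are hypothetical and would need to match the files you actually downloaded. Because every tensor is held in memory before the merged file is written, an approach like this needs a large amount of free RAM.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")


def merge_shards(shards: Iterable[Mapping[str, T]]) -> Dict[str, T]:
    """Combine per-shard {tensor_name: tensor} mappings into one dict.

    Raises ValueError if two shards define the same tensor name, because
    silently overwriting a weight would corrupt the merged checkpoint.
    """
    merged: Dict[str, T] = {}
    for index, shard in enumerate(shards):
        duplicates = merged.keys() & shard.keys()
        if duplicates:
            raise ValueError(f"shard {index} repeats tensors: {sorted(duplicates)}")
        merged.update(shard)
    return merged


def merge_shard_files(
    model_dir: str,
    pattern: str = "ema-*.safetensors",  # hypothetical shard naming scheme
    output_name: str = "ema.safetensors",
) -> Path:
    """Load every shard matching `pattern` and write one merged file."""
    # Imported lazily so merge_shards() stays usable without these packages.
    from safetensors.torch import load_file, save_file

    root = Path(model_dir)
    shard_paths = sorted(root.glob(pattern))
    if not shard_paths:
        raise FileNotFoundError(f"no shards matching {pattern!r} in {root}")

    # All shards are loaded to CPU memory before anything is written.
    merged = merge_shards(load_file(str(path)) for path in shard_paths)
    out_path = root / output_name
    save_file(merged, str(out_path))
    return out_path
```

The duplicate-name check is a design choice of this sketch: failing loudly is safer than letting a later shard overwrite an earlier one.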
Run the following Python script in the directory where you downloaded the repository. (Note: You need at least 54 GB of free system RAM to perform this merge.)
python merge.py --model_path your/path/to/Uni-Edit-BAGEL

3️⃣ Quick inference with Uni-Edit (task types: gen, und, edit)
python infer.py --task edit

bash train.sh
You can replace the variables in the script with your own before running. See TRAIN for more details.
bash scripts/eval/run_geneval.sh
bash scripts/eval/run_wise.sh
bash scripts/eval/run_eval_vlm.sh
bash scripts/eval/run_imgedit.sh
bash scripts/eval/run_gedit.sh
bash scripts/eval/run_rise.sh
We provide the scripts for evaluating VLM, T2I and Editing benchmarks. See EVAL for more details.
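If you want to run several of these evaluations in one go, a small driver can help. The sketch below is not part of the repository. The script paths are copied from the commands above, while the grouping into `t2i`, `vlm`, and `edit` and the helper names `select_scripts` and `run_scripts` are assumptions made for illustration. It also assumes the scripts are launched from the repository root and need no extra arguments, which may not hold for your setup.

```python
import subprocess
from typing import Dict, List, Optional

# Paths come from the commands listed above; the grouping is an assumption.
EVAL_SCRIPTS: Dict[str, List[str]] = {
    "t2i": ["scripts/eval/run_geneval.sh", "scripts/eval/run_wise.sh"],
    "vlm": ["scripts/eval/run_eval_vlm.sh"],
    "edit": [
        "scripts/eval/run_imgedit.sh",
        "scripts/eval/run_gedit.sh",
        "scripts/eval/run_rise.sh",
    ],
}


def select_scripts(groups: Optional[List[str]] = None) -> List[str]:
    """Return the script paths for the requested groups (all if None)."""
    chosen = list(EVAL_SCRIPTS) if groups is None else groups
    unknown = [group for group in chosen if group not in EVAL_SCRIPTS]
    if unknown:
        raise KeyError(f"unknown evaluation groups: {unknown}")
    return [script for group in chosen for script in EVAL_SCRIPTS[group]]


def run_scripts(
    scripts: List[str], repo_root: str = ".", dry_run: bool = False
) -> List[List[str]]:
    """Run each script with bash, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [["bash", script] for script in scripts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
            continue
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    # Preview the editing benchmarks without launching them.
    run_scripts(select_scripts(["edit"]), dry_run=True)
```

Running with `dry_run=True` first is a cheap way to confirm which scripts will be launched before committing GPU time.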
We provide the scripts for our full data construction pipeline. See DATA for more details.
@article{zheng2026uniedit,
title = {Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning},
author = {Zheng, Dian and Zhang, Manyuan and Li, Hongyu and Liu, Hongbo and Zou, Kai and Feng, Kaituo and Li, Hongsheng},
journal = {arXiv preprint arXiv:2605.21487},
year = {2026}
}

Uni-Edit is licensed under the Apache License 2.0.

