[paper]
We introduce a low-biased general annotated dataset generation (lbGen) framework that directly generates low-biased images with category annotations.
We find that pre-training on our generated dataset can significantly improve the generalization ability of models in cross-category or cross-domain scenarios.
cd your_path/lbGen-main
conda create -n lbgen python=3.11 -y
conda activate lbgen
pip install -r debias.txt
Using this repo, you can obtain the lbGen generator by fine-tuning a diffusion model (sd15). By default, we train on a single machine with 8 GPUs, and the checkpoints are saved in 'your_path/lbGen-main/training/output/lbGen/'.
cd training
bash scripts/sd15.sh
After fine-tuning the diffusion model, you can generate the dataset. By default, we use 4 GPUs for generation, and the full dataset is saved in 'your_path/lbGen-main/data_gen/IN1K'.
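Before starting generation, it can help to confirm that fine-tuning actually left a checkpoint behind. The following is a minimal sketch, not part of the lbGen code: the helper `latest_checkpoint` is hypothetical, and it assumes (unconfirmed for this repository) that checkpoints are written as sub-directories named `checkpoint-<step>`, as many diffusers training scripts do. Adjust the pattern if the output layout differs.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the 'checkpoint-<step>' sub-directory with the highest step, or None.

    Hypothetical helper: the 'checkpoint-<step>' naming is an assumption,
    not something the lbGen README states.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


if __name__ == "__main__":
    # Default output location given in this README; replace 'your_path'.
    found = latest_checkpoint("your_path/lbGen-main/training/output/lbGen")
    print(found if found else "no checkpoint directory found")
```

If the helper returns `None`, either training has not finished or the checkpoints use a different naming scheme.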
cd your_path/lbGen-main/data_gen
python sd15gen.py
This code may not exactly reproduce the results reported in the paper, owing to possible human error while cleaning the code and to differences between the currently available sd1.5 and the previous runwayml version.
Feel free to inform us if you encounter any issues.
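To sanity-check a generation run before pre-training on it, a quick per-class image count can reveal missing or empty categories. The sketch below is an illustration under stated assumptions, not part of the lbGen code: the helper `count_images_per_class` is hypothetical, and it assumes the generated data is laid out as one sub-directory per category containing image files, which this README does not confirm.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_class(root):
    """Map each immediate sub-directory name of `root` to its image count.

    Hypothetical helper: assumes one sub-directory per category.
    """
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Default dataset location given in this README; replace 'your_path'.
    data_root = Path("your_path/lbGen-main/data_gen/IN1K")
    if data_root.is_dir():
        counts = count_images_per_class(data_root)
        print(f"{len(counts)} classes, {sum(counts.values())} images")
        empty = [name for name, n in counts.items() if n == 0]
        if empty:
            print("classes with no images:", empty)
    else:
        print(f"{data_root} not found")
```

An uneven or incomplete count usually points to an interrupted generation job rather than a problem with the generator itself.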
This code is mainly built upon diffusers and CoMat. We thank the authors for open-sourcing their work!
If you find lbGen useful for your research and applications, please consider starring this repository and citing:
@inproceedings{jiang2025lbgen,
title={Low-Biased General Annotated Dataset Generation},
author={Jiang, Dengyang and Wang, Haoyu and Zhang, Lei and Wei, Wei and Dai, Guang and Wang, Mengmeng and Wang, Jingdong and Zhang, Yanning},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={25113--25123},
year={2025}
}

