This is the training code for EXPO. The script is adapted from the diffusers library and the Diffusion-DPO codebase.
Models are initialized from Stable Diffusion checkpoints and trained as described in the paper (replicable with the `launchers/` scripts, assuming 8 GPUs; scale gradient accumulation accordingly).
Requires CUDA==12.1 for SDPA acceleration.

```shell
conda create -n expo python=3.10 && \
conda activate expo && \
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124 && \
pip install -r requirements.txt
wandb login
```

Repository layout:

- `launchers/`: examples of running SD1.5 training. `dist_bench.sh` and `dist_pick.sh` are convenient scripts for running benchmark experiments.
- `utils/`: the scoring models for evaluation or AI feedback (PickScore, HPS, Aesthetics, CLIP).
- `requirements.txt`: basic pip requirements.
- `train.py`: main script. This is pretty bulky at >1000 lines; the training loop starts at ~L1000 at this commit (Ctrl-F "for epoch").
- `data_gen*.py`: scripts for on-policy data generation. Four versions are provided to adapt to different algorithms.
Example SD1.5 launch
```shell
# Effective BS will be (N_GPU * train_batch_size * gradient_accumulation_steps)
# Paper used 2048. Training takes ~24 hours / 2000 steps
bash launchers/sd15.sh
```

Variables set in the launcher:

- `MODEL_NAME`: current main agent model path.
- `DATASET_DIR`: current dataset path.
- `OUT_DIR`: current output path.
- `EXPERIMENT_NAME`: current experiment name.
- `PROMPT_PATH`: current prompt path for data generation.
- `SELECTOR`: current reward model name.
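As a quick sanity check on the effective batch size formula in the comments above, here is a minimal sketch. The per-GPU batch size and accumulation values are hypothetical, chosen only so that 8 GPUs reproduce the paper's 2048; the actual values live in the launcher script.

```python
# Hypothetical split of the paper's effective batch size across 8 GPUs.
n_gpu = 8
train_batch_size = 8               # per-GPU micro-batch (assumed value)
gradient_accumulation_steps = 32   # assumed value

effective_bs = n_gpu * train_batch_size * gradient_accumulation_steps
print(effective_bs)  # -> 2048, matching the paper's setting
```

If you train on fewer GPUs, raise `gradient_accumulation_steps` to keep this product at 2048.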
Important args for `train.py`:

- `--pretrained_model_name_or_path`: what model to train/initialize from
- `--output_dir`: where to save/log to
- `--seed`: training seed (not set by default)
- `--sdxl`: run SDXL training
- `--sft`: run SFT instead of DPO
- `--beta_dpo`: KL-divergence parameter beta for DPO
- `--choice_model`: model for AI feedback (Aesthetics, CLIP, PickScore, HPS)
- `--max_train_steps`: how many train steps to take
- `--gradient_accumulation_steps`
- `--train_batch_size`: see the notes above for the actual effective BS
- `--checkpointing_steps`: how often to save the model
- `--gradient_checkpointing`: reduces GPU memory usage
- `--learning_rate`
- `--scale_lr`: found this to be very helpful, but it isn't on by default
- `--lr_scheduler`: type of LR warmup/decay; default is linear warmup to constant
- `--lr_warmup_steps`: number of scheduler warmup steps
- `--use_adafactor`: Adafactor over Adam (lower memory)
- `--cache_dir`: where the dataset is cached locally (users will want to change this to fit their file system)
- `--resolution`: defaults to 512 for SD1.5
- `--random_crop` and `--no_hflip`: change data augmentation
- `--dataloader_num_workers`: number of total dataloader workers
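For context on `--scale_lr`: diffusers-style training scripts conventionally multiply the base learning rate by the effective-batch factors. A minimal sketch of that convention follows; whether `train.py` implements it exactly this way is an assumption, so check the script before relying on specific numbers.

```python
def scale_learning_rate(base_lr, grad_accum_steps, batch_size, num_processes):
    """Sketch of the usual diffusers --scale_lr convention: scale the base LR
    by gradient accumulation, per-GPU batch size, and number of processes.
    (Assumed behavior; verify against train.py.)"""
    return base_lr * grad_accum_steps * batch_size * num_processes

# Hypothetical numbers: base LR 1e-8, accumulation 32, per-GPU batch 8, 8 GPUs.
print(scale_learning_rate(1e-8, 32, 8, 8))  # -> 2.048e-05
```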
Example pipeline launch

```shell
# Effective BS will be (N_GPU * train_batch_size * gradient_accumulation_steps)
# Paper used 2048. Training takes ~24 hours / 2000 steps
bash launchers/pipe.sh
```

- `start_iteration`: training starts from this iteration.
- `total_iterations`: total number of iterations you want to run.
- `skip_first_training`: whether to skip the first training round (if your training was interrupted for some reason, you can resume from the last iteration).
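The resume behavior of these three knobs can be sketched as a simple driver loop. Everything here is a hypothetical reconstruction, not the actual `pipe.sh` internals: the stage functions are placeholders, and the exact semantics of `skip_first_training` (skipping only the training phase of the first resumed iteration) are an assumption.

```python
# Hypothetical sketch of the iterate/resume logic behind pipe.sh.
def run_pipeline(start_iteration, total_iterations, skip_first_training,
                 train, generate_data):
    """Alternate training and on-policy data generation.

    If a previous run died after training but before data generation
    finished, restart at the same iteration with skip_first_training=True
    so the completed training round is not repeated (assumed semantics).
    """
    log = []
    for it in range(start_iteration, total_iterations):
        if not (skip_first_training and it == start_iteration):
            train(it)
            log.append(("train", it))
        generate_data(it)
        log.append(("gen", it))
    return log

# Toy usage with no-op stages: resume at iteration 2 of 4, skipping its training.
steps = run_pipeline(2, 4, True, train=lambda i: None, generate_data=lambda i: None)
print(steps)  # -> [('gen', 2), ('train', 3), ('gen', 3)]
```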