<a href="https://colab.research.google.com/github/xliu0628/ML_courses/blob/main/dreambooth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dreambooth Personalized Image Generation

## Setup and Installation


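**Important**: We clone the ShivamShrirao fork of diffusers rather than using the official package, because its DreamBooth script adds extra arguments and modifications (for example `--pretrained_vae_name_or_path`, used below) that improve the results.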

In [None]:
# Clone the forked diffusers repo and install it in editable mode
!git clone https://github.com/ShivamShrirao/diffusers

%cd diffusers/
!pip install -e .

# bitsandbytes provides the 8-bit Adam optimizer enabled by --use_8bit_adam
!pip install bitsandbytes

# Install the DreamBooth example's requirements
%cd /content/diffusers/examples/dreambooth/
!pip install -r requirements.txt

# Write a default Accelerate config (no interactive prompts)
!accelerate config default

Cloning into 'diffusers'...
remote: Enumerating objects: 20180, done.[K
remote: Total 20180 (delta 0), reused 0 (delta 0), pack-reused 20180[K
Receiving objects: 100% (20180/20180), 22.96 MiB | 6.36 MiB/s, done.
Resolving deltas: 100% (14456/14456), done.
/content/diffusers
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/diffusers
  Installing build dependencies ... [?25l[?25hdone
  Checking if build backend supports build_editable ... [?25l[?25hdone
  Getting requirements to build editable ... [?25l[?25hdone
  Preparing editable metadata (pyproject.toml) ... [?25l[?25hdone
Collecting huggingface-hub>=0.13.2
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m200.1/200.1 KB[0m [31m10.7 MB/s[0m eta [36m0:00:00[0m
Building wheels for collected packages: diffusers
  Building editable for diffusers (pyproject.toml) ... [?25l[

## Which GPU can we use?

- The free Colab tier typically provides a 16 GB GPU (a Tesla T4 in the run below, which reports 15360 MiB)
- The settings below should fit in that memory. If you run out of GPU memory, consider enabling xformers memory-efficient attention (as described [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth))

In [None]:
!nvidia-smi

Sat Apr  8 08:41:00 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
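If you only need the GPU name and total memory, `nvidia-smi` can print them as CSV with its `--query-gpu=name,memory.total --format=csv,noheader,nounits` options. The sketch below runs that query and parses each line into a `(name, MiB)` tuple; the helper names `query_gpus` and `parse_gpu_query` are hypothetical, not part of any library.

```python
import subprocess


def parse_gpu_query(text):
    """Parse CSV lines of the form '<name>, <memory.total in MiB>'."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so the name is kept whole.
        name, mem = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem)))
    return gpus


def query_gpus():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)
```

On the T4 shown above, `query_gpus()` should return a single entry along the lines of `("Tesla T4", 15360)`.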

## Time to upload the samples

In [None]:
%cd /content
print("Make sure you put the images into the newly created directory!")
!mkdir -p images

/content
Make sure you put the images into the newly created directory!
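After uploading, a quick sanity check is to list what actually landed in `/content/images`. This is a minimal sketch: the helper name and the set of accepted extensions are assumptions, and files with other extensions are simply ignored here (whether the training script tolerates them is not something this sketch checks).

```python
from pathlib import Path

# Extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_training_images(directory="/content/images"):
    """Return the sorted image files directly inside `directory` (non-recursive)."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist; run the cell above first")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `print(len(list_training_images()), "images found")` confirms the upload before starting a long training run.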


## Fine-tuning

### Explanation of the most important args:

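| Argument | Description |
| --- | --- |
| instance_data_dir | Directory containing the images of your subject |
| instance_prompt | Prompt containing a rare identifier token for the subject, such as [V], zwx or sks |
| with_prior_preservation | Adds a prior-preservation loss on class images to reduce overfitting and language drift |
| num_class_images | Number of class images used for prior preservation; if `class_data_dir` holds fewer, the missing ones are generated before training |
| class_prompt | Prompt for the subject's general class (e.g. "photo of person"), used to generate the prior-preservation images |
| use_8bit_adam | Uses the 8-bit Adam optimizer from bitsandbytes, which stores optimizer states in reduced precision to save GPU memory |
| mixed_precision | Mixed-precision mode passed to Accelerate (here `fp16`), which runs much of the computation in half precision to save memory and time |
| pretrained_vae_name_or_path | Alternative VAE (here `stabilityai/sd-vae-ft-mse`, a fine-tuned decoder) that can improve details such as eyes and faces |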
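To make `with_prior_preservation` and `prior_loss_weight` more concrete: as I understand the upstream diffusers DreamBooth script, each batch holds the instance examples followed by the same number of class examples, and the loss is the instance MSE plus `prior_loss_weight` times the class MSE. The sketch below reproduces only that combination, using plain Python floats instead of the UNet's noise predictions and noise targets (tensors) that the real script uses; the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(pred, target, prior_loss_weight=1.0):
    """Instance loss plus weighted prior (class) loss.

    Assumes the batch is laid out as [instance examples..., class examples...]
    with equal halves. In the real script `pred` would be the UNet's noise
    prediction and `target` the noise target; floats keep this dependency-free.
    """
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])  # fit the subject
    prior_loss = mse(pred[half:], target[half:])     # keep the class prior
    return instance_loss + prior_loss_weight * prior_loss
```

With `prior_loss_weight=1.0`, as in the training cell, both terms count equally; lowering the weight lets the model drift further from its generic notion of "person" in exchange for fitting the subject more closely.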


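- Diffusers also provides other DreamBooth training scripts, such as `train_dreambooth_lora.py`, which trains LoRA adapters instead of fine-tuning all UNet weights
- The values below are based on a mix of experimentation and several diffusers tutorials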



In [None]:
%cd /content
%env MODEL_NAME=runwayml/stable-diffusion-v1-5
%env INSTANCE_DIR=/content/images/
%env OUTPUT_DIR=outputs/
%env CLASS_DIR=/content/diffusers/examples/dreambooth/person/


!accelerate launch diffusers/examples/dreambooth/train_dreambooth.py \
    --pretrained_model_name_or_path=$MODEL_NAME  \
    --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
    --instance_data_dir=$INSTANCE_DIR \
    --output_dir=$OUTPUT_DIR \
    --class_data_dir=$CLASS_DIR \
    --with_prior_preservation --prior_loss_weight=1.0 \
    --instance_prompt="photo of zwx person" \
    --class_prompt="photo of person" \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16" \
    --learning_rate=1e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --num_class_images=300 \
    --max_train_steps=1000

/content
env: MODEL_NAME=runwayml/stable-diffusion-v1-5
env: INSTANCE_DIR=/content/images/
env: OUTPUT_DIR=outputs/
env: CLASS_DIR=/content/diffusers/examples/dreambooth/person/
Downloading (…)lve/main/config.json: 100% 547/547 [00:00<00:00, 93.9kB/s]
Downloading (…)ch_model.safetensors: 100% 335M/335M [00:01<00:00, 288MB/s]
  with safe_open(filename, framework="pt", device=device) as f:
  return self.fget.__get__(instance, owner)()
  storage = cls(wrap_storage=untyped_storage)
Downloading (…)ain/model_index.json: 100% 543/543 [00:00<00:00, 92.3kB/s]
Fetching 15 files:   0% 0/15 [00:00<?, ?it/s]
Downloading model.safetensors:   0% 0.00/1.22G [00:00<?, ?B/s]
Downloading model.safetensors:   0% 0.00/492M [00:00<?, ?B/s]
Downloading (…)rocessor_config.json: 100% 342/342 [00:00<00:00, 55.0kB/s]
Fetching 15 files:   7% 1/15 [00:00<00:03,  3.56it/s]
Downloading (…)_checker/config.json:   0% 0.00/4.72k [00:00<?, ?B/s]
Downloading (…)_checker/config.json: 100% 4.72k/4
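If you want to tweak hyperparameters from Python rather than editing the shell cell, the same launch command can be assembled as an argument list. This is a sketch: `build_train_command`, `TRAIN_SCRIPT`, `TRAIN_ARGS` and `TRAIN_FLAGS` are hypothetical names, and the values are copied from the training cell above (paths assume the working directory is `/content`).

```python
# Hypothetical mirror of the %env + accelerate launch cell above.
TRAIN_SCRIPT = "diffusers/examples/dreambooth/train_dreambooth.py"

# Arguments that take a value, copied from the training cell.
TRAIN_ARGS = {
    "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
    "pretrained_vae_name_or_path": "stabilityai/sd-vae-ft-mse",
    "instance_data_dir": "/content/images/",
    "output_dir": "outputs/",
    "class_data_dir": "/content/diffusers/examples/dreambooth/person/",
    "prior_loss_weight": 1.0,
    "instance_prompt": "photo of zwx person",
    "class_prompt": "photo of person",
    "resolution": 512,
    "train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "mixed_precision": "fp16",
    "learning_rate": "1e-6",  # kept as a string so it is passed exactly as written
    "lr_scheduler": "constant",
    "lr_warmup_steps": 0,
    "num_class_images": 300,
    "max_train_steps": 1000,
}

# Boolean switches that take no value.
TRAIN_FLAGS = ["with_prior_preservation", "gradient_checkpointing", "use_8bit_adam"]


def build_train_command(script, args, flags):
    """Return an argv list: accelerate launch <script> --key=value ... --flag ..."""
    cmd = ["accelerate", "launch", script]
    cmd += [f"--{key}={value}" for key, value in args.items()]
    cmd += [f"--{flag}" for flag in flags]
    return cmd
```

`shlex.join(cmd)` prints a shell-quoted version for comparison with the cell, and `subprocess.run(cmd, check=True)` should start the same run. Passing a list avoids quoting issues with the prompts, although the output may not stream into the notebook the way `!accelerate launch` does.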

## Generation

**You may need to restart the runtime after running the next cell.**

In [None]:
# Workaround: importing StableDiffusionPipeline from the local (editable) install did not work here,
# so replace it with the PyPI release. If you manage to fix this properly, let me know :)
!pip uninstall diffusers -y
!pip install diffusers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting diffusers
  Downloading diffusers-0.14.0-py3-none-any.whl (737 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m737.4/737.4 kB[0m [31m11.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting huggingface-hub>=0.10.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m200.1/200.1 kB[0m [31m12.8 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: huggingface-hub, diffusers
Successfully installed diffusers-0.14.0 huggingface-hub-0.13.4
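To see which diffusers installation Python will actually use (the local editable fork or the PyPI wheel), you can compare the import location with the installed version. A minimal sketch with a hypothetical helper, using only `importlib`: note that `importlib.util.find_spec` returns the spec of a module that is already imported, so a stale location after reinstalling is a hint that the runtime needs a restart.

```python
import importlib.metadata
import importlib.util


def describe_install(package):
    """Return where `package` would be imported from and its installed version."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None  # not importable at all
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None  # e.g. standard-library modules have no distribution metadata
    return {"origin": spec.origin, "version": version}
```

After the reinstall and a restart, `describe_install("diffusers")` would be expected to report a `site-packages` path and version `0.14.0`, rather than a path under `/content/diffusers`.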


- Ideas for better prompts: https://stablediffusionweb.com/prompts

In [None]:
%cd /content
import random

import torch
from diffusers import StableDiffusionPipeline

# Weights saved by the training run above (the fork writes them to output_dir/<step>)
model_id = "outputs/1000"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Every prompt uses the identifier from the instance prompt ("zwx person")
example_prompts = [
    "close-up photo of zwx person in a suit",
    "closeup portrait of zwx person in a suit, highly detailed, with future city skyline in the background",
    "Professional headshot of zwx person inside a modern office building, highly detailed, business, open eyes",
    "LinkedIn profile picture of zwx person, close up, professional, businessman, blue suit, 8k",
]

num_images = 10
for i in range(num_images):
    prompt = random.choice(example_prompts)  # pick a random prompt for each image
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(f"profile_{i}.png")

/content


The config attributes {'class_embeddings_concat': False} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .


  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]
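The loop above picks prompts with the global random generator and lets the pipeline draw its own noise, so a rerun produces different images. If you want to regenerate a particular result, one option is to plan the prompts and per-image seeds up front with a private `random.Random`. This is a sketch with a hypothetical helper name, not something the original notebook does.

```python
import random


def plan_generations(prompts, num_images, seed=0):
    """Return (index, prompt, image_seed) tuples; the same seed gives the same plan."""
    rng = random.Random(seed)  # private RNG, independent of the global one
    plan = []
    for i in range(num_images):
        prompt = rng.choice(prompts)
        image_seed = rng.randrange(2**32)  # per-image seed for the diffusion noise
        plan.append((i, prompt, image_seed))
    return plan
```

Each entry could then be used in the generation loop as `pipe(prompt, num_inference_steps=50, guidance_scale=7.5, generator=torch.Generator("cuda").manual_seed(image_seed)).images[0]`, which should make individual images repeatable on the same hardware and library versions.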