## 5.2 Stable Diffusion and DreamBooth
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input.
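"Latent" means that the denoising runs in the compressed space of a VAE rather than directly on pixels. The arithmetic below is a sketch that assumes the Stable Diffusion v1.5 setup: the VAE downsamples each spatial side by 8 and produces 4 latent channels. Other models may use different values. Under those assumptions, it shows how much smaller the tensor being denoised is for a 512x512 image.

```python
# Sketch: size of the latent that Stable Diffusion v1.x denoises.
# Assumptions: VAE downsampling factor 8 and 4 latent channels (as in SD v1.5).

VAE_SCALE_FACTOR = 8   # spatial downsampling of the VAE encoder (assumed)
LATENT_CHANNELS = 4    # channels in the latent space (assumed)


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, height, width) of the latent for an RGB image of this size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def compression_ratio(height: int, width: int) -> float:
    """Number of RGB pixel values per latent value."""
    c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)


shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)  # (4, 64, 64) 48.0
```

The diffusion model therefore processes about 48 times fewer values than a pixel-space model at the same output resolution. This is the main reason it fits on a single consumer GPU.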

We use `runwayml/stable-diffusion-v1-5` in this notebook.

Either use the pre-downloaded model weights in `/share/lab5/sd`, or download the Stable Diffusion weights yourself:


### Stable Diffusion Model

In [None]:
# You can directly use the downloaded weights from /share/lab5/sd, or pre-download them:

#!export HF_ENDPOINT=https://hf-mirror.com
#!huggingface-cli download --resume-download runwayml/stable-diffusion-v1-5  --local-dir your_path_of_sd
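If the same notebook needs to run both on the lab server and on another machine, a small helper can choose which path to pass to `from_pretrained`. `resolve_model_path` is a hypothetical convenience function and is not part of diffusers. It assumes that a diffusers-format pipeline folder can be recognised by the `model_index.json` file at its root. Loading from the Hub id still needs network access, for example through the mirror set via `HF_ENDPOINT` in the cell above.

```python
# Hypothetical helper: prefer the shared local copy of the weights, fall back to the Hub id.
import os

LOCAL_SD_DIR = "/share/lab5/sd"                  # pre-downloaded weights on the lab server
HUB_MODEL_ID = "runwayml/stable-diffusion-v1-5"  # repo id used by huggingface-cli above


def resolve_model_path(local_dir: str = LOCAL_SD_DIR, hub_id: str = HUB_MODEL_ID) -> str:
    """Return local_dir if it looks like a diffusers pipeline folder, else the Hub id."""
    # A pipeline saved with save_pretrained has a model_index.json at its root.
    if os.path.isfile(os.path.join(local_dir, "model_index.json")):
        return local_dir
    return hub_id


print(resolve_model_path())
```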

In [None]:
# If you hit an out-of-memory error, make sure no other process is using this GPU (e.g. restart the previous notebook's kernel).

from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("/share/lab5/sd", torch_dtype=torch.float16, variant="fp16").to("cuda")
output = pipeline("stained glass of darth vader, backlight, centered composition, masterpiece, photorealistic, 8k")

for image in output.images:
    image.show()
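The text-to-image call above does not set `guidance_scale`, so it uses the `StableDiffusionPipeline` default of 7.5. At each denoising step, the pipeline predicts the noise twice: once with the prompt and once with an empty prompt. It then combines the two predictions using classifier-free guidance. diffusers skips the unconditional pass when `guidance_scale` is 1 or lower. The NumPy sketch below shows only that combination step. Random arrays stand in for the UNet outputs, which is an assumption made purely for illustration.

```python
import numpy as np


def classifier_free_guidance(noise_uncond: np.ndarray, noise_text: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Combine unconditional and text-conditioned noise predictions.

    guidance_scale = 1.0 returns the text-conditioned prediction unchanged;
    larger values push the result further in the direction the prompt suggests.
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


rng = np.random.default_rng(0)
# Stand-ins for UNet outputs on a 512x512 image: latent shape (batch, 4, 64, 64).
noise_uncond = rng.standard_normal((1, 4, 64, 64))
noise_text = rng.standard_normal((1, 4, 64, 64))

guided = classifier_free_guidance(noise_uncond, noise_text, guidance_scale=7.5)
print(guided.shape)  # (1, 4, 64, 64)
```

Higher scales, such as the 10.5 used in the image-to-image cell below, follow the prompt more literally. They often do this at some cost to image quality and diversity.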


In [None]:
# Using Stable Diffusion in an image-to-image pipeline (only the AutoPipeline class changes)

from diffusers import AutoPipelineForImage2Image
import torch
from PIL import Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "/share/lab5/sd",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")


In [None]:
# The image-to-image pipeline takes an input image in addition to the text prompt.

init_image = Image.open("/share/lab5/data/Girl_with_a_Pearl_Earring.jpg").convert("RGB")
# Shrink in place (never enlarges), keeping the aspect ratio, so neither side exceeds 768 px
init_image.thumbnail((768, 768))
prompt = "a portrait of a dog wearing a pearl earring"

image = pipeline(prompt, image=init_image, num_inference_steps=200, strength=0.75, guidance_scale=10.5).images[0]
image
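`strength` controls how much noise is added to the input image before denoising starts, and therefore how many of the `num_inference_steps` are actually run. The function below is a sketch that mirrors the step-count arithmetic in diffusers' `StableDiffusionImg2ImgPipeline.get_timesteps`, assuming a first-order scheduler. With the settings above, 200 steps at strength 0.75 means 150 denoising steps.

```python
def img2img_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps img2img runs (sketch of diffusers' get_timesteps logic)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Start partway along the schedule: strength 1.0 starts from pure noise,
    # small strengths keep most of the input image.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(img2img_denoising_steps(200, 0.75))  # 150
```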

### [Do it after class, as it will take a while] Stable Diffusion with DreamBooth
DreamBooth is a training technique that updates the entire diffusion model by training on just a few images of a subject or style. It works by associating a special word in the prompt with the example images.

- If you’re training on a GPU with limited VRAM, try enabling the `--gradient_checkpointing` and `--mixed_precision` options in the training command.

- The script can also fine-tune the `text_encoder` along with the `unet`. It has been observed experimentally that fine-tuning the `text_encoder` gives much better results, especially on faces. Pass the `--train_text_encoder` argument to the script to enable it.
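The two tips above come down to optional flags on the training command. As a sketch, a hypothetical helper (`dreambooth_extra_flags` is not part of diffusers) could assemble them from two yes/no choices. It uses only the flag names mentioned in this notebook and assumes `--mixed_precision` accepts `no`, `fp16` or `bf16`.

```python
def dreambooth_extra_flags(low_vram: bool, train_text_encoder: bool,
                           precision: str = "fp16") -> list[str]:
    """Return extra train_dreambooth.py flags for the memory budget and text-encoder choice."""
    if precision not in ("no", "fp16", "bf16"):
        raise ValueError("precision must be 'no', 'fp16' or 'bf16'")
    flags: list[str] = []
    if low_vram:
        # Recompute activations during backprop and train in reduced precision to save memory.
        flags += ["--gradient_checkpointing", f"--mixed_precision={precision}"]
    if train_text_encoder:
        # Also fine-tune the text encoder (uses more memory, often helps on faces).
        flags.append("--train_text_encoder")
    return flags


print(" ".join(dreambooth_extra_flags(low_vram=True, train_text_encoder=True)))
# --gradient_checkpointing --mixed_precision=fp16 --train_text_encoder
```

Training the text encoder increases memory use. On a small GPU, it usually only fits together with the memory-saving flags.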

First, we need to fine-tune the diffusion model.

I strongly suggest restarting your kernel here so that the pipelines loaded above release their GPU memory; otherwise training may run out of memory.

In [None]:
# a few directory settings

MODEL_NAME="/share/lab5/sd"
INSTANCE_DIR="/share/lab5/data/dog" 
CLASS_DIR='/scratch2/original_dog' 
MODEL_OUTPUT="/scratch2/dog-model"

INSTANCE_PROMPT='a photo of sks dog'
CLASS_PROMPT="a photo of a dog"
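`CLASS_DIR` starts out empty, so the first run has to create the prior-preservation images. With `--with_prior_preservation`, `train_dreambooth.py` counts the files already in `--class_data_dir`. It then samples only the missing images, up to `--num_class_images` (200 here), from the base model using `CLASS_PROMPT`. The sketch below reproduces that bookkeeping. `class_images_to_generate` is a hypothetical name, and the real script also creates the directory if it is missing.

```python
from pathlib import Path


def class_images_to_generate(class_dir: str, num_class_images: int) -> int:
    """How many class images still need to be sampled before training starts (sketch)."""
    path = Path(class_dir)
    # Every entry already in the directory counts toward num_class_images.
    existing = len(list(path.iterdir())) if path.is_dir() else 0
    return max(num_class_images - existing, 0)


# Assumed path, matching CLASS_DIR above; prints 200 if the directory does not exist yet.
print(class_images_to_generate("/scratch2/original_dog", num_class_images=200))
```

Because existing images are reused, a second training run with the same `CLASS_DIR` skips most of this generation step.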

In [None]:
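# Run the training script (this will take a long while).
# train_dreambooth.py comes from the diffusers repository (examples/dreambooth) and must be in the current working directory.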

!python train_dreambooth.py \
  --pretrained_model_name_or_path={MODEL_NAME} \
  --instance_data_dir={INSTANCE_DIR} \
  --class_data_dir={CLASS_DIR} \
  --output_dir={MODEL_OUTPUT} \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="{INSTANCE_PROMPT}" \
  --class_prompt="{CLASS_PROMPT}" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --gradient_checkpointing \
  --mixed_precision fp16
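With `--with_prior_preservation --prior_loss_weight=1.0`, each training batch stacks instance examples ("a photo of sks dog") on top of class examples ("a photo of a dog"). The loss adds a weighted MSE term on the class half. This helps the model keep its general idea of "dog" while it learns "sks dog". The NumPy sketch below shows that loss structure. The arrays stand in for the UNet's noise predictions and targets, which is an assumption for illustration only.

```python
import numpy as np


def prior_preservation_loss(pred: np.ndarray, target: np.ndarray,
                            prior_loss_weight: float = 1.0) -> float:
    """DreamBooth loss for a batch stacked as [instance examples; class examples]."""
    # Split the batch into its instance half and its class (prior) half.
    pred_inst, pred_prior = np.split(pred, 2, axis=0)
    tgt_inst, tgt_prior = np.split(target, 2, axis=0)
    instance_loss = np.mean((pred_inst - tgt_inst) ** 2)
    prior_loss = np.mean((pred_prior - tgt_prior) ** 2)
    return float(instance_loss + prior_loss_weight * prior_loss)


# train_batch_size=1 -> one instance example plus one class example per step.
pred = np.zeros((2, 4, 64, 64))
target = np.concatenate([np.ones((1, 4, 64, 64)), np.full((1, 4, 64, 64), 2.0)])
print(prior_preservation_loss(pred, target))  # 1.0 + 1.0 * 4.0 = 5.0
```

Lowering `--prior_loss_weight` lets the model fit the instance images more aggressively. However, it also raises the risk that every generated dog starts to look like `sks`.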

Once you have trained a model with the command above, you can run inference with `DiffusionPipeline`, which loads the saved weights as a `StableDiffusionPipeline`. Make sure to include the identifier (`sks` in the example above) in your prompt.

In [None]:
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(MODEL_OUTPUT, torch_dtype=torch.float16, use_safetensors=True).to("cuda")



In [None]:
# Prompt with the learned identifier: "sks dog" riding a bicycle

image = pipeline("A photo of sks dog riding a bicycle", num_inference_steps=200, guidance_scale=7.5).images[0]
image.save("/scratch2/dog-bike.png")
image.show()

In [None]:
# For comparison: the same prompt without the identifier "sks" produces a generic dog

image = pipeline("A photo of a dog riding a bicycle", num_inference_steps=200, guidance_scale=7.5).images[0]
image.show()

In [None]:
image = pipeline("A photo of sks dog on an airplane", num_inference_steps=200, guidance_scale=7.5).images[0]
image.show()

### Your tasks

In [None]:
#### Your Task ####
# Use your favourite character to build a DreamBooth model, and generate the character in at least three different scenes.  
# Note: if the model generates obviously wrong or nonsensical images, you can leave them there, just to entertain the TAs!