# KAIST Industry-Academia Training: Stable Diffusion

![](https://github.com/EilieYoun/box/blob/main/images/240214_sd_intro.png?raw=true)

**References**

- [Hugging Face Diffusers documentation](https://huggingface.co/docs/diffusers/index)
- [Diffusers library (GitHub)](https://github.com/huggingface/diffusers)
- [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)

## 0. Setup
---

In [None]:
!pip install --upgrade diffusers[torch]

Collecting diffusers[torch]
  Downloading diffusers-0.26.3-py3-none-any.whl (1.9 MB)
Collecting accelerate>=0.11.0 (from diffusers[torch])
  Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
Installing collected packages: diffusers, accelerate
Successfully installed accelerate-0.27.2 diffusers-0.26.3


In [None]:
from diffusers.utils import make_image_grid, load_image
import torch
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from pprint import pprint as pp

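def noise2image(noise):
    """Convert the first sample of a (B, C, H, W) tensor in [-1, 1] to a PIL image."""
    # (B, C, H, W) -> (B, H, W, C), then take the first sample as a NumPy array
    image = noise.permute(0, 2, 3, 1).cpu().numpy()[0].copy()
    # Map [-1, 1] to [0, 1] and clip out-of-range values
    image = np.clip(image / 2 + 0.5, 0, 1)
    # Scale to 8-bit and wrap as a PIL image
    image = Image.fromarray((image * 255).round().astype("uint8"))
    return image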
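
The value mapping inside `noise2image` can be checked without a model. The following is a minimal NumPy-only sketch, assuming inputs nominally in [-1, 1]; `to_uint8` is a hypothetical helper name used only for illustration and is not part of diffusers.

```python
import numpy as np

def to_uint8(values):
    """Map values from [-1, 1] to 8-bit pixel intensities (hypothetical helper)."""
    # Same arithmetic as noise2image: shift and scale to [0, 1], then clip
    scaled = np.clip(np.asarray(values, dtype=np.float64) / 2 + 0.5, 0, 1)
    # Scale to [0, 255] and cast to unsigned 8-bit integers
    return (scaled * 255).round().astype("uint8")

pixels = to_uint8([-3.0, -1.0, 0.0, 1.0, 3.0])
print(pixels)
```

Values outside [-1, 1] saturate at 0 or 255 because of the clipping step, which is why pure Gaussian noise still renders as a valid image.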


## 1. DDPM
----

![](https://github.com/EilieYoun/box/blob/main/images/240214_ddpm.png?raw=true)

* `google/ddpm-bedroom-256` : https://huggingface.co/google/ddpm-bedroom-256

In [None]:
from diffusers import DDPMPipeline
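# Load the DDPM pipeline instance (assumes a CUDA GPU, as in the Stable Diffusion section)
ddpm = DDPMPipeline.from_pretrained("google/ddpm-bedroom-256").to("cuda")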

### **| Module**


#### **unet**


* Inspect the configuration

In [None]:
unet = ddpm.unet  # UNet module
print(unet)

In [None]:
sample_size = unet.config.sample_size  # input size of the UNet
channels = unet.config.in_channels  # input channels of the UNet
image_shape = (1, channels, sample_size, sample_size)  # define the image shape
print('image_shape: ', image_shape)

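* Run the module

In [None]:
x_t = torch.randn(image_shape, device=unet.device)  # define x_t as Gaussian noise
t = 900  # define the timestep t (an example value)

with torch.no_grad():  # disable gradient computation
    noisy_residual = unet(x_t, t).sample  # run the UNet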

print('t : ', t)
print('x_t : ', x_t.shape)
print('noisy residual : ', noisy_residual.shape)
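
The UNet output is called a noise residual because, in DDPM, the network is trained to predict the Gaussian noise that was mixed into `x_t`. The following is a minimal NumPy sketch of the forward noising step, assuming a linear beta schedule from 1e-4 to 0.02 over 1000 steps (the checkpoint's own scheduler configuration may differ). `forward_noise` is a hypothetical helper, and the random arrays are stand-ins for a real image and a real UNet prediction.

```python
import numpy as np

# Assumed schedule: linear betas over 1000 training steps (illustrative values)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

def forward_noise(x0, eps, t):
    """Sample x_t from q(x_t | x_0) using the given noise eps (hypothetical helper)."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 3, 8, 8))  # stand-in for a clean image
eps = rng.standard_normal(x0.shape)             # the noise the UNet is trained to predict
x_t = forward_noise(x0, eps, t=900)

# If the noise prediction were exact, x_0 could be recovered from x_t
a_bar = alphas_cumprod[900]
x0_rec = (x_t - np.sqrt(1.0 - a_bar) * eps) / np.sqrt(a_bar)
print(np.allclose(x0_rec, x0))
```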

#### **scheduler**

* Inspect the configuration

In [None]:
scheduler = ddpm.scheduler  # noise scheduler module
print(scheduler)

In [None]:
scheduler.set_timesteps(20)  # set the number of inference timesteps
print('timesteps : ', len(scheduler.timesteps), scheduler.timesteps)  # check the timesteps
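
Setting the timesteps selects a subset of the training timesteps to visit at inference. The following sketches an evenly spaced, descending selection, assuming 1000 training timesteps and a "leading" spacing; the scheduler's actual configuration may use a different spacing, and `leading_timesteps` is a hypothetical helper.

```python
import numpy as np

def leading_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly spaced timesteps in descending order (hypothetical helper)."""
    step_ratio = num_train_timesteps // num_inference_steps
    # Multiples of step_ratio starting at 0, reversed so sampling starts at the noisiest step
    return (np.arange(num_inference_steps) * step_ratio)[::-1].copy()

timesteps = leading_timesteps(1000, 20)
print(len(timesteps), timesteps)
```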

* Run the module

In [None]:
x_t1 = scheduler.step(noisy_residual, t, x_t).prev_sample  # run the noise scheduler

print('t : ', t)
print('x_t : ', x_t.shape)
print('noisy residual : ', noisy_residual.shape)
print('x_(t-1) : ', x_t1.shape)
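
Conceptually, the scheduler step turns `x_t` and the predicted noise into `x_(t-1)`. The following NumPy sketch shows the textbook DDPM update for consecutive timesteps, assuming a linear beta schedule and a noise variance equal to `beta_t`. It illustrates the sampling equation from the DDPM paper and is not the code path of `scheduler.step`, which, among other things, supports skipped timesteps. `ddpm_reverse_step` is a hypothetical helper.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule (illustrative values)
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)

def ddpm_reverse_step(x_t, eps_pred, t, rng):
    """One ancestral sampling step x_t -> x_(t-1) (hypothetical helper)."""
    # Posterior mean: remove the scaled noise prediction, then rescale
    coef = betas[t] / np.sqrt(1.0 - alphas_cumprod[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    # Add fresh Gaussian noise with variance beta_t
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x_t = rng.standard_normal((1, 3, 8, 8))
eps_pred = rng.standard_normal(x_t.shape)  # stand-in for the UNet output
x_prev = ddpm_reverse_step(x_t, eps_pred, t=900, rng=rng)
print(x_prev.shape)
```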

* Inspect the noise

In [None]:
xt_image = noise2image(x_t)  # convert x_t to an image
xt1_image = noise2image(x_t1)  # convert x_(t-1) to an image
make_image_grid([xt_image, xt1_image], rows=1, cols=2)

### **| Inference**



#### **Write the denoising loop**


In [None]:
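x_list = []  # list that collects the intermediate images

x_t = torch.randn(image_shape, device=unet.device)  # set the initial x_t
for t in scheduler.timesteps:  # repeat once per timestep
    with torch.no_grad():  # disable gradient computation
        noisy_residual = unet(x_t, t).sample  # run the UNet
        x_t1 = scheduler.step(noisy_residual, t, x_t).prev_sample  # run the noise scheduler

    x_t = x_t1  # update x_t
    x_image = noise2image(x_t)  # convert to an image
    x_list.append(x_image)  # append the image to the list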

In [None]:
make_image_grid(x_list, rows=2, cols=10)

#### **Use pipeline**


- [`class diffusers.DDPMPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm/pipeline_ddpm.py)


- [Parameters](https://huggingface.co/docs/diffusers/v0.26.1/en/api/pipelines/ddpm#diffusers.DDPMPipeline)


In [None]:
imgs = ddpm(num_inference_steps=20, batch_size=4)[0]
make_image_grid(imgs, rows=1, cols=len(imgs))

## 2. Stable Diffusion : text-to-image
---

![](https://github.com/EilieYoun/box/blob/main/images/240214_sd_text2img.png?raw=true)

- `runwayml/stable-diffusion-v1-5`
- `dreamlike-art/dreamlike-photoreal-2.0`
- `stabilityai/stable-diffusion-xl-base-1.0`
- `stabilityai/sdxl-turbo`

In [None]:
from diffusers import AutoPipelineForText2Image

# Fill in one of the model IDs listed above
sd = AutoPipelineForText2Image.from_pretrained('', torch_dtype=torch.float16).to('cuda')

### **| Module**


#### **tokenizer**

In [None]:
# Inspect the tokenizer
print(sd.tokenizer)

In [None]:
prompt = ''  # write a prompt

# Run the tokenizer
token = sd.tokenizer(prompt,
                     padding="max_length",
                     max_length=sd.tokenizer.model_max_length,
                     truncation=True,
                     return_tensors="pt",
                    )

print('prompt: ' , prompt)
print('token: ', token)
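
The `padding="max_length"` and `truncation=True` arguments force every prompt to the same token length. The following toy sketch shows that behaviour independently of the real tokenizer. The token IDs, the special-token IDs, the length of 77, and the `pad_or_truncate` helper are all illustrative assumptions.

```python
def pad_or_truncate(token_ids, max_length, bos_id, eos_id, pad_id):
    """Wrap token IDs with BOS/EOS, then truncate or pad to max_length (hypothetical helper)."""
    body = token_ids[: max_length - 2]  # leave room for the BOS and EOS tokens
    ids = [bos_id] + body + [eos_id]
    mask = [1] * len(ids)               # 1 marks real tokens
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding

# Made-up token IDs for a three-token prompt
ids, mask = pad_or_truncate([11, 22, 33], max_length=77, bos_id=1, eos_id=2, pad_id=0)
print(len(ids), sum(mask))
```

Because the output length is fixed, prompts of any length produce embeddings of the same shape, which is what the UNet's cross-attention layers receive.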

#### **text_encoder**

In [None]:
# Inspect the text encoder
print(sd.text_encoder)

In [None]:
# Run the text encoder
with torch.no_grad():  # disable gradient computation
    prompt_embed = sd.text_encoder(
        token.input_ids.to('cuda'),
        attention_mask=token.attention_mask.to('cuda'),
        output_hidden_states=True,
    )[0]
print(prompt_embed.shape, prompt_embed.dtype)

#### **unet**



* Inspect the configuration

In [None]:
unet = sd.unet  # UNet module
print(unet)

In [None]:
sample_size = sd.unet.config.sample_size  # input size of the UNet
channels = sd.unet.config.in_channels  # input channels of the UNet
latent_shape = (1, channels, sample_size, sample_size)  # define the latent shape
print('latent_shape: ', latent_shape)

* Run the module

In [None]:
x_t = torch.randn(latent_shape, device='cuda', dtype=torch.float16)  # define x_t
t = 900  # define t

with torch.no_grad():  # disable gradient computation
    noisy_residual = sd.unet(x_t, t, encoder_hidden_states=prompt_embed).sample  # run the UNet

print('t : ', t)
print('x_t : ', x_t.shape)
print('prompt_embed: ', prompt_embed.shape)
print('noisy residual : ', noisy_residual.shape)

#### **scheduler**

* Inspect the configuration

In [None]:
scheduler = sd.scheduler  # noise scheduler module
print(scheduler)

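* Run the module

In [None]:
# Run the noise scheduler (some schedulers require set_timesteps to be called first)
x_t1 = sd.scheduler.step(noisy_residual, t, x_t).prev_sample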

print('t : ', t)
print('x_t : ', x_t.shape)
print('noisy residual : ', noisy_residual.shape)
print('x_(t-1) : ', x_t1.shape)

#### **vae**

In [None]:
latent = x_t1 / sd.vae.config.scaling_factor  # undo the latent scaling
with torch.no_grad():  # disable gradient computation
    image = sd.vae.decode(latent, return_dict=False)[0]  # run the VAE decoder

print('latent: ', latent.shape)
print('image: ', image.shape)

image = noise2image(image)  # convert to an image
_ = plt.imshow(image)  # display the image
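
The VAE decoder upsamples each spatial side of the latent by a fixed factor, which is why a small latent becomes a full-resolution image. The following sketches the shape arithmetic, assuming Stable Diffusion v1.5 style values (4 latent channels, latent size 64, four VAE blocks). These numbers are assumptions for illustration; the real values should be read from `sd.unet.config` and `sd.vae.config`.

```python
# Assumed configuration values (Stable Diffusion v1.5 style); not read from a real model
latent_channels, latent_size = 4, 64
vae_block_out_channels = [128, 256, 512, 512]

# The spatial factor doubles with every VAE block after the first
vae_scale_factor = 2 ** (len(vae_block_out_channels) - 1)

latent_shape = (1, latent_channels, latent_size, latent_size)
image_shape = (1, 3, latent_size * vae_scale_factor, latent_size * vae_scale_factor)
print(latent_shape, '->', image_shape)
```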

### **| Inference**



#### **Use pipeline**


- [`class diffusers.StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py)

- [Parameters](https://huggingface.co/docs/diffusers/v0.26.1/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline)
  * `prompt (str or List[str], optional)` — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.

  * `height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor)` — The height in pixels of the generated image.

  * `width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor)` — The width in pixels of the generated image.

  * `num_inference_steps (int, optional, defaults to 50)` — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  * `timesteps (List[int], optional)` — Custom timesteps to use for the denoising process with schedulers which support a timesteps argument in their set_timesteps method. If not defined, the default behavior when
 num_inference_steps is passed will be used. Must be in descending order.

  * `guidance_scale (float, optional, defaults to 7.5)` — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.

  * `negative_prompt (str or List[str], optional)` — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).

  * `num_images_per_prompt (int, optional, defaults to 1)` — The number of images to generate per prompt.
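
The `guidance_scale` parameter controls classifier-free guidance: the UNet is evaluated with and without the text condition, and the two noise predictions are combined. The following NumPy sketch shows that combination, with random arrays standing in for the two UNet outputs; `apply_guidance` is a hypothetical helper, not a diffusers function.

```python
import numpy as np

def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions (hypothetical helper)."""
    # Move from the unconditional prediction toward (and past) the text-conditioned one
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

rng = np.random.default_rng(0)
noise_uncond = rng.standard_normal((1, 4, 64, 64))  # stand-in: prediction for the empty or negative prompt
noise_text = rng.standard_normal((1, 4, 64, 64))    # stand-in: prediction for the text prompt

guided = apply_guidance(noise_uncond, noise_text, guidance_scale=7.5)
print(guided.shape)
```

With a scale of 1 the result equals the text-conditioned prediction, so guidance has no effect; larger values push the result further from the unconditional prediction.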

In [None]:
prompt1 = "An astronaut in the jungle, cold color palette, muted colors, detailed, 8k, masterpiece, wonderful artistic"
prompt2 = "An astronaut riding a horse on Mars, cold color palette, muted colors, detailed, 8k, masterpiece, wonderful artistic"

num_inference_steps = 25
num_images_per_prompt = 4

imgs = sd([prompt1, prompt2],
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=num_images_per_prompt,
        generator = torch.Generator(device="cuda").manual_seed(4),
        )[0]

make_image_grid(imgs, rows=2, cols=num_images_per_prompt)

## 3. Stable Diffusion: Beyond "Text-to-Image"
---

![](https://github.com/EilieYoun/box/blob/main/images/240214_sd_beyond.png?raw=true)

- controlnet : https://huggingface.co/docs/diffusers/api/pipelines/controlnet
- image-to-image : https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img
- inpainting : https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint
- stable diffusion XL : https://huggingface.co/docs/diffusers/using-diffusers/sdxl
- SDXL Turbo : https://huggingface.co/docs/diffusers/using-diffusers/sdxl_turbo
- LoRA : https://huggingface.co/docs/diffusers/training/lora