# HyperTile Playground

I'll assume you have some familiarity with the `diffusers` library.

I encourage you to experiment with both the `text2img` and `img2img` variants. Because no HD-LoRA model is available for `diffusers`, `text2img` results may show poorly formed structures at high resolutions. This limitation comes from SD's original training at 512x512 resolution, and improvements are expected in the future.

If you need more information, consult the `diffusers` documentation.

## Introduction

Import the packages we need.

In [None]:
from PIL import Image
from tqdm.auto import trange
import logging

import torch

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionXLPipeline
from diffusers.schedulers import UniPCMultistepScheduler

from diffusers.utils import load_image

from hyper_tile import split_attention, flush

# To log attention-splitting
logging.basicConfig(level=logging.INFO)

Set the path to the model you want to use (a *safetensors* file).

In [None]:
dtype = torch.float16 # bfloat16 can result in out-of-memory errors with big images due to interpolation limits, as documented in the diffusers library
device = torch.device('cuda')

model_path = r"path-to-model.safetensors"

## Text-to-Image

In [None]:
pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_single_file(model_path, torch_dtype=dtype, local_files_only=True, use_safetensors=True, load_safety_checker=False) # type: ignore
pipe.to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

1. Choose your desired `height` and `width`.

2. You can adjust `tile_size` independently for the VAE and UNet components. For the VAE, a `tile_size` of 128 works well without sacrificing quality. For the UNet, a `tile_size` of 256 or greater is advisable. `swap_size` determines how many different tiles per dimension are used, which helps avoid overlap seams in some cases.

3. Set the `disable` parameter to `True` or `False` to compare the results with and without HyperTile.

**Note**: For cleaner tile division, use dimensions that are multiples of 128. The code below enforces this by rounding both dimensions down.
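The rounding rule can be sketched as a small helper. This is only an illustration: `snap_to_multiple` is a hypothetical name, not part of `hyper_tile` or `diffusers`, and it simply mirrors the integer arithmetic used in the next cell.

```python
def snap_to_multiple(value, multiple=128):
    """Round value down to the nearest multiple (hypothetical helper)."""
    return int(value) // multiple * multiple

# Requested sizes and the sizes that would actually be used.
for requested in (2688, 1792, 1000, 100):
    print(requested, "->", snap_to_multiple(requested))
```

Because the rule rounds down, any request below 128 maps to 0, so requested dimensions should be at least 128.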

In [None]:
# Try lower values if you don't have 16 GB of VRAM

height, width = 2688, 1792

height = int(height)//128*128 # enforcing multiples of 128
width = int(width)//128*128
print(height, width)

with split_attention(pipe.vae, height, width, tile_size=128):
    # ! Change the tile_size and disable to see their effects (the text above advises 256 or greater for the UNet)
    with split_attention(pipe.unet, height, width, tile_size=128, swap_size=2, disable=False):
        flush()
        img = pipe(
            # ! Change the prompt and other parameters
            prompt='forest, path, stone, red trees, detailed, buildings', 
            negative_prompt='blurry, low detail',
            num_inference_steps=26, guidance_scale=7.5, 
            height=height, width=width,
        ).images[0]
img

## Image-to-Image

In [None]:
pipe: StableDiffusionImg2ImgPipeline = StableDiffusionImg2ImgPipeline.from_single_file(model_path, torch_dtype=dtype, local_files_only=True, use_safetensors=True, load_safety_checker=False) # type: ignore
pipe.to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

Load the source image you want to upscale.

In [None]:
image = load_image("image.png")
ar = image.height / image.width

1. Define the target `height` and the number of loopbacks, i.e. how many times the image-to-image pass is applied. Loopbacks matter because the initial upscale uses simple Lanczos resampling.

2. Adjust the `strength` and `loopback` settings to get the best result. Try lower strength values with more loopbacks, or higher strength values with fewer loopbacks.

3. Set the `tile_size` separately for the VAE and UNet components. A `tile_size` of 128 is recommended for the VAE and does not compromise quality. For the UNet, a `tile_size` of 256 or greater is advisable. `swap_size` determines how many different tiles per dimension are used, which helps avoid overlap seams in some cases.

4. Toggle the `disable` parameter between `True` and `False` to compare the results with and without HyperTile.

**Note**: For cleaner tile division, use dimensions that are multiples of 128. This makes the tiling process more efficient.
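As a rough sketch of how the target size follows from the source image, the snippet below derives the width from the aspect ratio and then rounds both dimensions down to multiples of 128. `target_size` is a hypothetical helper written for this illustration; it follows the same arithmetic as the code cell further down.

```python
def target_size(src_width, src_height, target_height, multiple=128):
    """Return (width, height) for the upscaled image (hypothetical helper)."""
    ar = src_height / src_width      # aspect ratio as height / width
    width = target_height / ar       # width that preserves the aspect ratio
    snap = lambda v: int(v) // multiple * multiple  # round down to a multiple
    return snap(width), snap(target_height)

print(target_size(512, 768, 512 * 3))    # portrait source
print(target_size(1920, 1080, 512 * 3))  # landscape source
```

Since each dimension is rounded independently, the final aspect ratio can differ slightly from the source; the 1920x1080 example ends up at 2688x1536.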

**Note**: Loopbacks are needed because Stable Diffusion (SD) was originally trained on 512x512 images. Upscaling such images 3-4 times or more with Lanczos resampling introduces blurriness. The loopback passes mitigate this, restoring sharpness and detail in the upscaled image.
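To reason about the trade-off between `strength` and loopbacks, it helps to estimate how much denoising work each setting implies. To my understanding, the `diffusers` image-to-image pipelines run roughly `int(num_inference_steps * strength)` denoising steps per call; treat this as an approximation, since the exact count can vary with the scheduler and library version. `effective_steps` is a hypothetical helper for this estimate, not a `diffusers` function.

```python
def effective_steps(num_inference_steps, strength):
    """Estimate the denoising steps run by one img2img call (assumed rule)."""
    return min(int(num_inference_steps * strength), num_inference_steps)

loopback = 2
steps_per_pass = effective_steps(28, 0.46)
total_steps = steps_per_pass * loopback
print(steps_per_pass, total_steps)
```

Under this assumption, the settings used below (28 steps, strength 0.46, 2 loopbacks) amount to about 12 steps per pass.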

In [None]:
loopback = 2 # Use 1 or 2, depending on how much upscaling we are doing
# Try lower values if you don't have 16 GB of VRAM
height = 512*3

width = height/ar
height = int(height)//128*128 # enforcing multiples of 128
width = int(width)//128*128
print(height, width)

# Upscale to the correct resolution
img = image.resize((width, height), resample=Image.LANCZOS) if image.size != (width, height) else image

with split_attention(pipe.vae, height, width, tile_size=128):
    # ! Change the tile_size and disable to see their effects
    with split_attention(pipe.unet, height, width, tile_size=256, swap_size=2, disable=False):
        flush()
        for i in trange(loopback):
            img = pipe(
                prompt='forest, path, stone, red trees, detailed', 
                negative_prompt='blurry, low detail',
                num_inference_steps=28, guidance_scale=7.5, 
                image=img, strength=0.46, # ! you can also change the strength
            ).images[0]
img