Diffusion Model Based on Denoising Diffusion Probabilistic Models (DDPM)

A generative model that progressively adds Gaussian noise to training images and learns to reverse this process. New images are then generated by starting from pure noise and removing it step by step.
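For reference, this is the standard (textbook) DDPM formulation, given as background rather than taken from this notebook. The scheduler sets a per-step noise variance beta_t, with alpha_t = 1 - beta_t and alpha_bar_t the running product of alpha_s up to step t. The first line is one noising step. The second line is the closed form that lets training jump straight to any timestep. The third line is the noise-prediction loss, where epsilon_theta is the unet.

```latex
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right) \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L} &= \mathbb{E}_{x_0,\,t,\,\epsilon}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]
\end{aligned}
```

In diffusers, scheduler.add_noise computes the second line. The training loop below minimises the third line as a mean squared error.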

unet (UNet2DModel): a convolutional encoder-decoder network with skip connections. Given a noisy image and its timestep, it predicts the noise that was added.

scheduler (SchedulerMixin): defines how much noise is added at each timestep during training. During generation, it combines the unet's noise predictions with the current image to compute each denoising step. Can be either DDPMScheduler or DDIMScheduler.



In [1]:
#pip install diffusers transformers torch datasets torchvision Pillow

Install the following libraries:

pip install diffusers transformers torch datasets torchvision Pillow

In [None]:
import torch
from diffusers import UNet2DModel, DDPMScheduler
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm
from PIL import Image
import os

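# Resize input images and normalise pixel values from [0, 1] to [-1, 1]
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                                   # PIL image -> tensor in [0, 1]
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # (x - 0.5) / 0.5 -> [-1, 1]
])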

# Load Images
class CustomImageDataset(torch.utils.data.Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        # change ".webp" to the file type of your dataset (matched case-insensitively)
        self.image_paths = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.lower().endswith(".webp"))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image
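DDPM models are normally trained on pixels scaled to [-1, 1]. The generation cell later in this notebook assumes that range when it rescales its output with (image.clamp(-1, 1) + 1) / 2. ToTensor on its own produces values in [0, 1]. transforms.Normalize with mean 0.5 and std 0.5 per channel maps them to [-1, 1]. The stdlib-only sketch below shows that arithmetic on single values. The helper names normalise and denormalise are hypothetical, not torchvision API.

```python
# Stdlib-only sketch of the per-channel arithmetic behind
# transforms.Normalize(mean=0.5, std=0.5); helper names are hypothetical.

def normalise(x, mean=0.5, std=0.5):
    """Map a ToTensor value in [0, 1] to [-1, 1] via (x - mean) / std."""
    return (x - mean) / std


def denormalise(y):
    """Clamp a model-space value to [-1, 1], then map it back to [0, 1]."""
    y = max(-1.0, min(1.0, y))
    return (y + 1.0) / 2.0


for raw in (0, 128, 255):          # 8-bit pixel values
    x = raw / 255                  # what ToTensor produces
    y = normalise(x)               # what the model is trained on
    print(raw, round(x, 4), round(y, 4), round(denormalise(y), 4))
```

The clamp in denormalise mirrors the clamp in the generation cell. Generated values can slightly overshoot [-1, 1], and clamping keeps them valid pixel intensities.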
    


In [None]:
# Load your dataset
dataset = CustomImageDataset("C:/Users/BurnD/Desktop/DSA3101/Dataset/Red2", transform=transform) #change the path to where the images are stored
dataloader = DataLoader(dataset, batch_size=10, shuffle=True) 

# Initialize the UNet model
model = UNet2DModel(
    sample_size=256,         # Image resolution
    in_channels=3,          # RGB images
    out_channels=3,         # Predicting RGB noise
    layers_per_block=2,
    block_out_channels=(64, 128, 256, 512),  # Number of channels for each layer
    down_block_types=("DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D"),
)

# Define the noise scheduler
scheduler = DDPMScheduler(num_train_timesteps=500, beta_start=0.00005, beta_end=0.01, beta_schedule="scaled_linear")

# Set up the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Dataloader Parameters:

dataset: the Dataset containing the training images

shuffle: whether to reshuffle the samples at the start of every epoch. With shuffle=True, each epoch sees the images in a new random order. This helps generalisation by preventing the model from learning patterns tied to the order of the data.

batch_size: number of images in each batch (10 here). Processing several images at once makes better use of the GPU than processing them one at a time. Averaging the loss over a batch also gives a less noisy gradient estimate.
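The stdlib-only sketch below shows what DataLoader(dataset, batch_size=10, shuffle=True) does with sample indices each epoch. It leaves out worker processes and collation into tensors, both of which the real DataLoader also handles. The number of batches per epoch, which is the len(dataloader) printed by the training loop, is ceil(N / batch_size). The final batch is smaller unless drop_last=True is passed. The dataset size of 95 and the helper names are hypothetical.

```python
import math
import random


def batches_per_epoch(num_images, batch_size, drop_last=False):
    """How many batches one epoch yields, i.e. len(dataloader)."""
    if drop_last:
        return num_images // batch_size          # incomplete final batch discarded
    return math.ceil(num_images / batch_size)    # incomplete final batch kept


def epoch_batches(num_images, batch_size, shuffle, rng):
    """Yield lists of sample indices for one epoch, reshuffling first if asked."""
    order = list(range(num_images))
    if shuffle:
        rng.shuffle(order)                       # new random order every epoch
    for start in range(0, num_images, batch_size):
        yield order[start:start + batch_size]


rng = random.Random(0)
num_images = 95                                  # hypothetical dataset size
print(batches_per_epoch(num_images, 10))         # len(dataloader) for batch_size=10
for epoch in range(2):
    batches = list(epoch_batches(num_images, 10, shuffle=True, rng=rng))
    print(f"epoch {epoch}: {len(batches)} batches, last has {len(batches[-1])}, first starts {batches[0][:3]}")
```

Every image appears exactly once per epoch, but the grouping into batches changes from epoch to epoch.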

DDPM Parameters:

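num_train_timesteps: number of noising steps T in the forward diffusion process (500 here)

beta_start: variance (beta) of the noise added at the first timestep

beta_end: variance (beta) of the noise added at the last timestep

beta_schedule: how beta is interpolated between beta_start and beta_end. "scaled_linear" spaces the square root of beta linearly and then squares it, so beta grows slowly at first and faster towards the end.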
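The stdlib-only sketch below shows what these settings imply. It reproduces the "scaled_linear" schedule (square root of beta spaced linearly from sqrt(beta_start) to sqrt(beta_end), then squared) and the cumulative product alpha_bar_t that scheduler.add_noise uses to jump to timestep t. It is a re-derivation for illustration, not a call into diffusers, and the helper names are hypothetical.

```python
import math


def scaled_linear_betas(beta_start, beta_end, num_steps):
    """Interpolate sqrt(beta) linearly between the endpoints, then square it."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """alpha_bar_t = product of (1 - beta_s) for every step s up to and including t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


betas = scaled_linear_betas(0.00005, 0.01, 500)   # the notebook's scheduler settings
abar = alphas_cumprod(betas)
print(f"beta at first step = {betas[0]:.6f}, beta at last step = {betas[-1]:.6f}")
print(f"alpha_bar_T = {abar[-1]:.3f}, signal scale sqrt(alpha_bar_T) = {math.sqrt(abar[-1]):.3f}")
```

With num_train_timesteps=500, beta_start=0.00005 and beta_end=0.01, alpha_bar_T comes out at roughly 0.165. The most heavily noised training images therefore still keep about 40% of the original signal (sqrt(alpha_bar_T) is about 0.41). Generation, however, starts from pure noise, which the model never saw during training. A larger beta_end or more timesteps pushes alpha_bar_T closer to 0, as with diffusers' DDPMScheduler defaults (1000 steps, linear schedule, beta_end=0.02). Printing this value is a quick check when tuning the schedule.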

Optimiser:

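AdamW: a variant of the Adam optimiser with decoupled weight decay. The weights are shrunk directly instead of an L2 penalty being added to the gradient, which gives more effective regularisation.

lr: learning rate, here 1e-4 (0.0001)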
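The stdlib-only sketch below applies one AdamW update to a single scalar parameter. It follows the update rule described in PyTorch's torch.optim.AdamW documentation with that class's defaults (betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01) and the lr=1e-4 used here. The function name adamw_step is hypothetical, and options such as amsgrad are omitted.

```python
import math


def adamw_step(p, grad, m, v, t, lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter; t counts steps from 1."""
    b1, b2 = betas
    p = p * (1 - lr * weight_decay)          # decoupled weight decay: shrink the weight directly
    m = b1 * m + (1 - b1) * grad             # running mean of the gradient
    v = b2 * v + (1 - b2) * grad * grad      # running mean of the squared gradient
    m_hat = m / (1 - b1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


for g in (0.5, 50.0):
    p, m, v = adamw_step(1.0, grad=g, m=0.0, v=0.0, t=1)
    print(f"grad={g}: parameter 1.0 -> {p:.6f}")
```

On the first step, each parameter moves by almost exactly lr, whatever the gradient's magnitude, because m_hat / sqrt(v_hat) is close to +1 or -1. That is why Adam-style learning rates can be read as a rough per-step change in each weight.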

U-Net Model:

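sample_size: image resolution in pixels (256 x 256, matching the Resize transform)

in_channels: number of input channels (3 = RGB)

out_channels: number of output channels (3, because the model predicts noise with the same shape as the input image)

layers_per_block: number of ResNet layers in each down block (each up block uses one more)

block_out_channels: number of output channels of each block in the encoder, mirrored in the decoder. The channel count doubles (64 → 128 → 256 → 512) as the spatial resolution halves, so deeper blocks capture more, and more abstract, features.

down_block_types: blocks used in the encoder. Each DownBlock2D applies its ResNet layers and then halves the spatial resolution (except the last block), learning increasingly abstract patterns and features.

up_block_types: blocks used in the decoder. Each UpBlock2D combines its input with the matching encoder features through skip connections and upsamples, reconstructing a full-resolution output.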
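The sketch below traces the feature-map sizes through the encoder for this configuration. It is plain Python bookkeeping, not a diffusers call. It assumes, as UNet2DModel does, that every down block except the last halves the resolution, so four blocks mean three halvings. As a simple sanity check, it rejects sizes that cannot be halved evenly. Keeping the resolution a multiple of 8 is the safe choice for this model. The function name is hypothetical.

```python
def unet_feature_shapes(sample_size, block_out_channels):
    """(channels, height, width) at which each down block operates.

    Assumes every down block except the last ends with a 2x downsample.
    """
    shapes, size = [], sample_size
    for i, ch in enumerate(block_out_channels):
        shapes.append((ch, size, size))
        if i < len(block_out_channels) - 1:       # the final block does not downsample
            if size % 2:
                raise ValueError(f"resolution {size} cannot be halved evenly at block {i}")
            size //= 2
    return shapes


for shape in unet_feature_shapes(256, (64, 128, 256, 512)):
    print(shape)
```

The decoder mirrors this path back up to 256 x 256 and ends with 3 output channels.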



In [None]:
epochs = 10  # Define number of epochs

for epoch in range(epochs):
    model.train()
    for step, images in enumerate(tqdm(dataloader)):
        images = images.to(device)

        # Sample random Gaussian noise and a random timestep for each image
        noise = torch.randn_like(images)  # created on the same device as images
        timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (images.size(0),), device=device).long()

        # Add noise to images based on the timesteps (i.e., create noisy images)
        noisy_images = scheduler.add_noise(images, noise, timesteps)

        # Forward pass: Predict the noise
        noise_pred = model(noisy_images, timesteps).sample

        # Calculate the loss (mean squared error between predicted noise and actual noise)
        loss = torch.nn.functional.mse_loss(noise_pred, noise)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if step % 10 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}], Step [{step}/{len(dataloader)}], Loss: {loss.item():.4f}")

print("Training completed.")


epochs: number of complete passes through the training dataset

tqdm: displays a progress bar over the batches of each epoch

optimizer.zero_grad(): clears the gradients from the previous step (PyTorch accumulates gradients by default)

loss.backward(): computes the gradient of the loss with respect to the model parameters

optimizer.step(): updates the parameters using the computed gradients
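Inside the loop, scheduler.add_noise(images, noise, timesteps) jumps straight to timestep t using the closed-form expression sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise. The loss then compares the model's prediction with the exact noise that was added. The stdlib-only sketch below applies those two pieces to a few scalar "pixels". The pixel values and alpha_bar_t = 0.5 are hypothetical. The real scheduler looks alpha_bar_t up from its beta schedule for each image's timestep.

```python
import math
import random


def add_noise(x0, eps, alpha_bar_t):
    """Closed-form forward process, element-wise: sqrt(abar)*x0 + sqrt(1-abar)*eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def mse(pred, target):
    """Mean squared error, the same reduction as mse_loss's default 'mean'."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
x0 = [0.2, -0.5, 0.9]                      # hypothetical normalised pixel values
eps = [rng.gauss(0.0, 1.0) for _ in x0]    # plays the role of torch.randn_like(images)
x_t = add_noise(x0, eps, alpha_bar_t=0.5)  # hypothetical alpha_bar for some timestep t
print(x_t)
print(mse(eps, eps))                       # a perfect noise prediction gives zero loss
print(mse([0.0] * len(eps), eps))          # predicting "no noise" costs the noise's energy
```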


In [15]:
model.save_pretrained("diffusion_model")

In [None]:
from diffusers import UNet2DModel, DDPMScheduler
from torchvision import transforms
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the trained model and recreate the scheduler with the SAME settings used in training
model = UNet2DModel.from_pretrained("diffusion_model")
scheduler = DDPMScheduler(num_train_timesteps=500, beta_start=0.00005, beta_end=0.01, beta_schedule="scaled_linear")
scheduler.set_timesteps(scheduler.config.num_train_timesteps)  # step through every timestep, T-1 down to 0

# Generate images by reversing the noise process
model.to(device).eval()
size = model.config.sample_size  # 256, the resolution the model was trained on

num_samples = 4
with torch.no_grad():
    for i in range(num_samples):
        # Start with pure Gaussian noise at the training resolution
        image = torch.randn(1, 3, size, size, device=device)

        # Perform reverse diffusion to denoise step by step
        for t in scheduler.timesteps:
            noise_pred = model(image, t).sample                       # predict the noise at timestep t
            image = scheduler.step(noise_pred, t, image).prev_sample  # remove it to get the image at t-1

        # Rescale from [-1, 1] to [0, 1], convert to a PIL image and save
        image = (image.clamp(-1, 1) + 1) / 2
        image = transforms.ToPILImage()(image.squeeze(0).cpu())
        image.save(f"generated_image_{i}.png")


torch.randn(batch, channels, height, width)
torch.randn(1, 3, size, size): 1 image with 3 colour channels at the training resolution (size = 256, i.e. 256 x 256)
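scheduler.step hides the update that turns x_t into x_(t-1). The stdlib-only sketch below applies that update to a single scalar. It follows the standard DDPM sampling rule with the "fixed_small" variance, which is the variance_type default of diffusers' DDPMScheduler. The mean can be written either from the predicted noise or from the clean image x0 implied by it, and the two forms agree. diffusers takes the x0 route and, by default (clip_sample=True), clips the predicted x0 to [-1, 1] first. This sketch omits that clipping. The 10-step schedule, the values and the helper names are hypothetical.

```python
import math


def reverse_mean_from_eps(x_t, eps_pred, t, betas, alpha_bars):
    """Mean of x_{t-1}: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)."""
    alpha_t = 1.0 - betas[t]
    return (x_t - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_pred) / math.sqrt(alpha_t)


def reverse_mean_from_x0(x_t, x0, t, betas, alpha_bars):
    """Same mean written with x0: the posterior mean of q(x_{t-1} | x_t, x0)."""
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - alpha_bars[t])
    coef_xt = math.sqrt(1.0 - betas[t]) * (1.0 - abar_prev) / (1.0 - alpha_bars[t])
    return coef_x0 * x0 + coef_xt * x_t


def posterior_variance(t, betas, alpha_bars):
    """'fixed_small' variance beta_t * (1 - abar_{t-1}) / (1 - abar_t); zero at t = 0."""
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    return betas[t] * (1.0 - abar_prev) / (1.0 - alpha_bars[t])


def reverse_step(x_t, eps_pred, t, betas, alpha_bars, z):
    """x_{t-1} = mean + sigma_t * z, where z ~ N(0, 1) is supplied by the caller."""
    mean = reverse_mean_from_eps(x_t, eps_pred, t, betas, alpha_bars)
    return mean + math.sqrt(posterior_variance(t, betas, alpha_bars)) * z


betas = [0.0001 + (0.02 - 0.0001) * i / 9 for i in range(10)]  # hypothetical 10-step schedule
alpha_bars, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

x0, eps, t = 0.7, -1.3, 6
x_t = math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps
print(reverse_mean_from_eps(x_t, eps, t, betas, alpha_bars))
print(reverse_mean_from_x0(x_t, x0, t, betas, alpha_bars))
```

At t = 0 the variance is zero, so the last step returns the mean without adding fresh noise.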
