# Diffusion models overview

This notebook gives a code overview of diffusion probabilistic models.

### Table of contents
1. [Generating synthetic data for regression tasks](#synthetic)
2. [Creating gradient descent optimizer algorithms (INNOVATORS)](#optimizer)
3. [Performing the gradient descent training loop (BENCHMARKERS)](#train)
4. [Verifying gradient descent computation (BLOCKCHAIN)](#verify)

In [1]:
%load_ext autoreload
%autoreload 2

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Request a GPU if one is available, otherwise fall back to the CPU
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")


<a id="synthetic"></a>
## 1. Generating synthetic data for regression tasks

In the regression tasks we consider, the goal is to find the scalar function $f(\cdot)$ that describes noisy observations $y_i$ at locations $\mathbf{x}_i$:
$$
y_i = f(\mathbf{x}_i) + \epsilon_i
$$
where $\epsilon_i$ is a noise term, e.g. drawn from a standard normal distribution.
The input locations can be multidimensional, $\mathbf{x} \in \mathbb{R}^{D}$. Neural networks are empirically known to excel at modeling such functions for large $D$, for example in [classifying images](https://wandb.ai/wandb_fc/wb-tutorials/reports/Tutorial-Text-Classification-Using-CNNs--Vmlldzo0NTIxNDI5) (where $D$ is the number of pixels) or in compressing high-dimensional data (e.g. [autoencoders](https://www.tensorflow.org/tutorials/generative/autoencoder)).
Neural networks can also flexibly approximate the complicated and non-smooth functions $f(\cdot)$ that often arise in real-world applications, a property commonly formalized through the [universal approximation theorem](https://www.deep-mind.org/2023/03/26/the-universal-approximation-theorem/).

In this notebook, we consider a simple regression task with i.i.d. Gaussian noise $\epsilon$ and $D = 2$, which we can visualize.
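
As a minimal sketch of such a task, the cell below samples noisy observations $y_i = f(\mathbf{x}_i) + \epsilon_i$ for $D = 2$. The helper name `make_regression_data`, the input range, and the ground-truth function $f$ are assumptions chosen only for illustration; they do not come from the original notebook.

```python
import torch

def make_regression_data(n=200, noise_std=1.0, seed=0):
    """Sample n noisy observations y_i = f(x_i) + eps_i with x_i in R^2."""
    gen = torch.Generator().manual_seed(seed)
    # Inputs drawn uniformly from the square [-3, 3]^2 (an arbitrary choice)
    x = 6.0 * torch.rand(n, 2, generator=gen) - 3.0
    # Hypothetical ground-truth function, chosen only for illustration
    f = torch.sin(x[:, 0]) * torch.cos(x[:, 1])
    # i.i.d. Gaussian noise with standard deviation noise_std
    eps = noise_std * torch.randn(n, generator=gen)
    return x, f + eps, f

x, y, f_true = make_regression_data()
print(x.shape, y.shape)
```

Because $D = 2$, the pairs `(x, y)` can be shown as a scatter plot colored by `y`, with `f_true` as the noise-free reference.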

In [None]:
def forward_diffusion(x_0, timesteps, beta_start=1e-4, beta_end=0.02):
    """
    Forward diffusion: samples a noisy version x_t of x_0 at a random timestep.

    Args:
    - x_0: Original data (batch of images, shape [B, C, H, W])
    - timesteps: Number of diffusion steps
    - beta_start, beta_end: Endpoints of the linear noise schedule

    Returns:
    - x_t: Noisy images, one random timestep per image
    - t: Sampled timestep indices (shape [B])
    - noise: Gaussian noise that was added (the training target)
    """
    device = x_0.device
    betas = torch.linspace(beta_start, beta_end, timesteps, device=device)  # Noise schedule
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)  # Cumulative product of alphas

    noise = torch.randn_like(x_0)  # Already on the same device as x_0
    t = torch.randint(0, timesteps, (x_0.shape[0],), device=device)  # Random timestep for each image
    # Reshape to [B, 1, 1, 1] so the coefficients broadcast over C, H, W
    sqrt_alpha_cumprod = torch.sqrt(alphas_cumprod[t])[:, None, None, None]
    sqrt_one_minus_alpha_cumprod = torch.sqrt(1 - alphas_cumprod[t])[:, None, None, None]

    x_t = sqrt_alpha_cumprod * x_0 + sqrt_one_minus_alpha_cumprod * noise  # Noisy sample
    return x_t, t, noise
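
The forward process above draws the noisy sample in closed form,
$$
\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s),
$$
so the two square-root factors set how much signal and how much noise remain at step $t$. The sketch below inspects these factors for the same linear schedule; the helper name `signal_and_noise_scales` is hypothetical and is not part of the original notebook.

```python
import torch

def signal_and_noise_scales(timesteps, beta_start=1e-4, beta_end=0.02):
    """Return sqrt(alpha_bar_t) and sqrt(1 - alpha_bar_t) for a linear beta schedule."""
    betas = torch.linspace(beta_start, beta_end, timesteps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    return torch.sqrt(alphas_cumprod), torch.sqrt(1.0 - alphas_cumprod)

for T in (100, 1000):
    signal, noise = signal_and_noise_scales(T)
    # Scales at the final timestep: how much of x_0 survives vs. how much noise is mixed in
    print(f"T={T}: final signal scale {signal[-1].item():.3f}, "
          f"final noise scale {noise[-1].item():.3f}")
```

With these default endpoints, a short schedule such as 100 steps leaves a noticeable fraction of the original signal at the last step, whereas a longer schedule such as 1000 steps drives it close to zero.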


In [None]:
class SimpleDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)  # Input 1-channel (grayscale)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 32, 3, padding=1)
        self.conv4 = nn.Conv2d(32, 1, 3, padding=1)

        self.relu = nn.ReLU()
    
    def forward(self, x, t):
        """
        Forward pass through denoising network.
        x: Noisy image
        t: Timestep information (not used here but could be added via embeddings)
        """
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.conv4(x)  # Predict noise
        return x
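
The docstring above notes that the timestep could be injected via embeddings. One common way to do this is a sinusoidal embedding that is projected and added to the feature maps. The sketch below is a hypothetical variant, not the notebook's model: the names `sinusoidal_embedding` and `TimeConditionedDenoiser`, the embedding size, and the layer widths are all assumptions.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(t, dim):
    """Map integer timesteps of shape (B,) to embeddings of shape (B, dim); dim must be even."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1/10000
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

class TimeConditionedDenoiser(nn.Module):
    """Hypothetical variant of SimpleDenoiser that conditions on the timestep."""
    def __init__(self, emb_dim=32):
        super().__init__()
        self.emb_dim = emb_dim
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 1, 3, padding=1)
        self.time_proj = nn.Linear(emb_dim, 32)  # one bias per feature channel
        self.relu = nn.ReLU()

    def forward(self, x, t):
        emb = self.time_proj(sinusoidal_embedding(t, self.emb_dim))  # (B, 32)
        h = self.relu(self.conv1(x))
        h = h + emb[:, :, None, None]  # broadcast over height and width
        h = self.relu(self.conv2(h))
        return self.conv3(h)  # predicted noise, same shape as x

model_t = TimeConditionedDenoiser()
x_demo = torch.randn(4, 1, 28, 28)
print(model_t(x_demo, torch.tensor([0, 10, 50, 99])).shape)
```

Adding the projected embedding as a per-channel bias is only one option; it keeps the call signature `model(x, t)` unchanged, so such a model could replace `SimpleDenoiser` without altering the training loop.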


In [None]:
# Hyperparameters
epochs = 10
timesteps = 100
lr = 1e-3
batch_size = 128

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleDenoiser().to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
criterion = nn.MSELoss()

# Training loop
model.train()
for epoch in range(epochs):
    epoch_loss = 0.0
    for images, _ in train_loader:
        images = images.to(device)
        x_t, t, noise = forward_diffusion(images, timesteps)

        optimizer.zero_grad()
        noise_pred = model(x_t, t)
        loss = criterion(noise_pred, noise)  # MSE between predicted and true noise
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    # Report the mean loss over all batches in this epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss / len(train_loader):.4f}")

print("Training complete!")
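
Once a noise-prediction model is trained, new images can be generated by running the diffusion in reverse. The sketch below follows the standard DDPM ancestral sampling rule with variance $\sigma_t^2 = \beta_t$ and the same linear schedule as the forward process. The function name `sample` and its arguments are assumptions, not part of the original notebook. It also assumes the schedule is long enough that $\mathbf{x}_T$ is close to standard normal, and that the model is actually conditioned on `t`.

```python
import torch

@torch.no_grad()
def sample(model, n_samples, timesteps, shape=(1, 28, 28),
           beta_start=1e-4, beta_end=0.02, device="cpu"):
    """Draw samples by iterating the reverse diffusion step from pure noise."""
    betas = torch.linspace(beta_start, beta_end, timesteps, device=device)
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    x = torch.randn(n_samples, *shape, device=device)  # start from x_T ~ N(0, I)
    for step in reversed(range(timesteps)):
        t = torch.full((n_samples,), step, device=device, dtype=torch.long)
        eps_pred = model(x, t)  # predicted noise at this step
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[step] / torch.sqrt(1.0 - alphas_cumprod[step])
        mean = (x - coef * eps_pred) / torch.sqrt(alphas[step])
        if step > 0:
            x = mean + torch.sqrt(betas[step]) * torch.randn_like(x)
        else:
            x = mean  # no noise is added at the final step
    return x

# Stand-in model that always predicts zero noise, used only to show the call
dummy_model = lambda x, t: torch.zeros_like(x)
print(sample(dummy_model, n_samples=4, timesteps=100).shape)
```

In the notebook's setting, the call would look like `sample(model, 16, timesteps, device=device)` after `model.eval()`. Sample quality with a denoiser that ignores `t` is likely to be poor.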
