# Universal Adversarial Perturbation (UAP) Training

This notebook implements the training loop for Universal Adversarial Perturbations (UAPs). The goal is to find a single perturbation vector $v$ that, when added to any input audio $x$, reduces the model's accuracy.

### Strategy: Gradient Accumulation
We iterate through the training set. For each sample $x_i$, we compute the gradient of the loss $\mathcal{L}$ with respect to the perturbed input $x_i + v$. We sum these gradients over the epoch, take one normalized ascent step on the global perturbation $v$, and then project $v$ back into the allowed range. This follows the accumulated-gradient approach used in many UAP papers.
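
The strategy above can be sketched on a toy problem. This is a minimal illustration, not the notebook's Whisper pipeline: the linear scorer `w`, `toy_loss`, and the synthetic dataset are hypothetical stand-ins chosen so the loop runs in a fraction of a second on CPU.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the ASR model: a fixed linear scorer with squared-error loss.
w = torch.randn(16)

def toy_loss(x_adv, target):
    return ((x_adv @ w) - target) ** 2

# Synthetic (input, target) pairs; every sample starts with a residual of 0.1.
inputs = [torch.randn(16) for _ in range(8)]
dataset = [(x, (x @ w).item() - 0.1) for x in inputs]

def mean_loss(v):
    with torch.no_grad():
        return sum(toy_loss(x + v, t) for x, t in dataset).item() / len(dataset)

epsilon, lr = 0.05, 0.02
v = torch.zeros(16)

for epoch in range(10):
    grad_accum = torch.zeros_like(v)
    for x, t in dataset:
        # The perturbed input is a leaf tensor that tracks gradients.
        x_adv = (x + v).detach().requires_grad_(True)
        toy_loss(x_adv, t).backward()
        # Since x_adv = x + v, the gradient w.r.t. x_adv is also the gradient w.r.t. v.
        grad_accum += x_adv.grad
    with torch.no_grad():
        # Normalized ascent step, then projection onto the L-infinity ball.
        v = v + lr * grad_accum / (grad_accum.norm() + 1e-8)
        v = v.clamp(-epsilon, epsilon)

clean_loss = mean_loss(torch.zeros(16))
adv_loss = mean_loss(v)
print(f"clean loss: {clean_loss:.4f}, loss with v: {adv_loss:.4f}")
```

The single vector `v` raises the average loss across all samples while staying inside the $\epsilon$ bound, which is the property the full training loop aims for.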

### Key Constraints
- **Input Length**: Whisper expects 16 kHz audio. We fix the UAP length (e.g., 5 or 10 seconds) and pad or truncate each sample to that length.
- **Clipping**: We clip $v$ to $[-\epsilon, \epsilon]$ so that it stays small relative to the waveform range $[-1, 1]$ (or the model's input range).
- **Gradient Flow**: Ensure $x$ has `requires_grad=True`.
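
The three constraints can be checked in isolation with a short sketch. The helper `fit_length` and the placeholder loss are assumptions made for illustration; they are not part of the project's `AudioLoader` or Whisper wrapper.

```python
import torch
import torch.nn.functional as F

SAMPLE_RATE = 16000  # sample rate Whisper expects

def fit_length(audio, num_samples):
    """Zero-pad or truncate the last (time) axis to exactly num_samples."""
    n = audio.size(-1)
    if n < num_samples:
        return F.pad(audio, (0, num_samples - n))
    return audio[..., :num_samples]

# Input length: 5 s at 16 kHz is 80000 samples.
uap_length = int(5.0 * SAMPLE_RATE)
short = fit_length(torch.randn(1, 3 * SAMPLE_RATE), uap_length)  # padded
long = fit_length(torch.randn(1, 7 * SAMPLE_RATE), uap_length)   # truncated

# Clipping: keep v inside [-epsilon, epsilon] and the sum inside [-1, 1].
epsilon = 0.05
v = torch.clamp(torch.randn(1, uap_length), -epsilon, epsilon)
x_adv = torch.clamp(short + v, -1.0, 1.0)

# Gradient flow: the perturbed input must be a leaf with requires_grad=True.
x_adv = x_adv.detach().requires_grad_(True)
loss = (x_adv ** 2).mean()  # placeholder loss, not the Whisper loss
loss.backward()
print(short.shape, long.shape, x_adv.grad.shape)
```

Indexing with `size(-1)` and `[..., :num_samples]` works for both `(T,)` and `(1, T)` waveforms, which avoids depending on whether the loader returns a channel dimension.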

In [None]:
import torch
import torch.nn as nn
import torchaudio
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from transformers import WhisperModel

# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

In [None]:
from src.data.audio_loader import AudioLoader

# Initialize Loader
# We use the same loader setup as previous notebooks
loader = AudioLoader(
    sample_rate=16000,
    n_mels=80,
    max_duration=10.0, # Fixed duration for UAP training
    normalize=True
)

# Load a small subset for UAP training (e.g., 20 samples)
# In a real scenario, you'd load a proper training set
print("Loading LibriSpeech data...")
dataset = loader.load_dataset(subset="train-clean-100", num_samples=20)
print(f"Loaded {len(dataset)} samples.")

In [None]:
from src.models.whisper_wrapper import WhisperASRWithAttack

# Initialize the model using our wrapper that supports gradient flow
model = WhisperASRWithAttack(device=device)
print("Model loaded with differentiable Mel-Spectrogram layer.")

In [None]:
def train_uap(
    model, 
    dataset, 
    uap_length_sec=5.0, 
    epsilon=0.05,
    lr=0.01,
    epochs=10,
    device=device
):
    """
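    Train a universal adversarial perturbation by accumulating input gradients.

    Args:
        model: callable returning (output, loss, input_tensor) for a waveform.
        dataset: iterable of (audio, label) pairs.
        uap_length_sec: perturbation length in seconds (at 16 kHz).
        epsilon: L-infinity bound on the perturbation.
        lr: step size of the per-epoch update.
        epochs: number of passes over the dataset.

    Returns:
        Tensor of shape (1, uap_length) holding the perturbation v.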
    """
    
    # 1. Initialize Global Perturbation v
    uap_length = int(uap_length_sec * 16000)
    print(f"UAP Length: {uap_length} samples ({uap_length_sec}s)")
    
    # v is a tensor of shape (1, uap_length) initialized with zeros
    # Each sample is padded/truncated to uap_length below, so v never needs tiling
    v = torch.zeros(1, uap_length, device=device)
    
    # Optimizer handle for v (v itself is updated manually at the end of each epoch)
    optimizer = torch.optim.SGD([v], lr=lr)
    
    # Objective: maximize the model loss so that transcription quality degrades.
    # The gradients accumulated over an epoch give the update direction for v.
    
    for epoch in range(epochs):
        print(f"\n--- Epoch {epoch + 1}/{epochs} ---")
        
        grad_accum = torch.zeros_like(v)
        
        for audio, label in tqdm(dataset, desc="Processing samples"):
            optimizer.zero_grad()
            
            # 2. Construct Adversarial Input
            # Pad/truncate along the time axis (last dim) to match UAP length
            num_samples = audio.size(-1)
            if num_samples < uap_length:
                pad_len = uap_length - num_samples
                audio_padded = torch.nn.functional.pad(audio, (0, pad_len))
            else:
                audio_padded = audio[..., :uap_length]
            
            # Apply perturbation: x_adv = x + v
            # v stays within [-epsilon, epsilon] thanks to the clamp at the end of each epoch.
            # Move x to the device before adding v
            audio_input = audio_padded.to(device) + v
            
            # Forward pass
            _, loss, input_tensor = model(audio_input)
            
            # Backward pass
            loss.backward()
            
            # 3. Gradient Accumulation Strategy
            # We want to update v in the direction of the gradient of the loss w.r.t. the input.
            # Note: The gradient of the loss w.r.t. the input is the 'attack' signal.
            # The original UAP paper (Moosavi-Dezfooli et al., 2017) aggregates per-sample
            # perturbations; here we sum the input gradients over the epoch instead.
            
            # Since x_adv = x + v, d(x_adv)/dv is the identity, so the gradient
            # w.r.t. v equals the gradient w.r.t. the input (input_tensor.grad).
            grad_accum += input_tensor.grad.detach()
            
        # 4. Update & Projection
        optimizer.zero_grad()
        
        # Update v: v = v + lr * normalize(grad_accum)
        # Gradient ascent: this moves v in the direction that increases the loss
        with torch.no_grad():
            grad_accum = grad_accum / (torch.norm(grad_accum) + 1e-8)
            v = v + lr * grad_accum
            
            # Project onto the L-infinity ball: clip v to [-epsilon, epsilon]
            v = torch.clamp(v, -epsilon, epsilon)
            
        print(f"Updated v magnitude (L2 norm): {torch.norm(v).item():.6f}")
        
    return v

# Run Training
uap_vector = train_uap(
    model=model,
    dataset=dataset,
    uap_length_sec=5.0,
    epsilon=0.1,
    lr=0.1,
    epochs=5
)

In [None]:
# Save the trained perturbation (moved to CPU so it loads on machines without a GPU)
torch.save(uap_vector.cpu(), 'universal_perturbation_v.pt')
print("Universal Perturbation saved.")

### Visualize Perturbation
Plot the generated UAP to inspect its nature (Gaussian-like or structured).
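
One way to judge whether a perturbation is noise-like or structured is to look at its magnitude spectrum in addition to the waveform. The sketch below is an assumption about how that check could be done: `magnitude_spectrum` is a hypothetical helper, and `demo_v` is a synthetic 440 Hz tone standing in for the trained `uap_vector`.

```python
import math
import torch

def magnitude_spectrum(v, sample_rate=16000):
    """Return (freqs_hz, magnitude) of a waveform using a real FFT."""
    v = v.flatten().float().cpu()
    mag = torch.fft.rfft(v).abs()
    freqs = torch.fft.rfftfreq(v.numel(), d=1.0 / sample_rate)
    return freqs, mag

# Synthetic stand-in: one second of a 440 Hz tone at amplitude 0.05.
t = torch.arange(16000) / 16000
demo_v = 0.05 * torch.sin(2 * math.pi * 440.0 * t)

freqs, mag = magnitude_spectrum(demo_v)
peak_hz = freqs[mag.argmax()].item()
print(f"Dominant frequency: {peak_hz:.1f} Hz")
```

A roughly flat spectrum suggests a noise-like perturbation, while a few sharp peaks suggest structure. To inspect the trained vector, pass `uap_vector` in place of `demo_v`.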

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(uap_vector[0].cpu().numpy())
plt.title("Universal Adversarial Perturbation (v)")
plt.xlabel("Time (Samples)")
plt.ylabel("Amplitude")
plt.grid(True)
# Save the figure before plt.show(), which can leave an empty canvas afterwards
plt.savefig('uap_visualization.png')
print("Visualization saved to uap_visualization.png")
plt.show()