# Experiment: DDPM for Dose Prediction - Training and Optimization

**Date:** 2026-01-20  
**Experiment IDs:** `ddpm_dose_v1`, `phase1_sampling`, `phase1_ensemble`  
**Status:** Complete  
**Git Commit:** `3efbea0` (DDPM training)  

---

## 1. Overview

### 1.1 Objective

Evaluate whether Denoising Diffusion Probabilistic Models (DDPM) can improve dose prediction accuracy compared to a simple baseline U-Net.

**Hypothesis:** The iterative denoising process of diffusion models will capture complex dose distributions better than direct regression.

### 1.2 Key Results

| Metric | DDPM (training) | DDPM (optimized) | Baseline |
|--------|-----------------|------------------|----------|
| **Val MAE** | 12.19 Gy | **3.78 Gy** | 3.73 Gy |
| Optimal steps | - | 50 | N/A |
| Training time | 1.94 hours | - | 2.55 hours |

### 1.3 Conclusion

**DDPM is NOT recommended for dose prediction.** Key findings:
1. **"More steps = worse"** - A counter-intuitive result that points to a structural issue
2. **No benefit over baseline** - Optimized DDPM (3.78 Gy) is on par with, but not better than, the baseline (3.73 Gy)
3. **Near-zero sample variability** - The model behaves deterministically, not generatively
4. **High complexity, no payoff** - 1000 training timesteps and iterative sampling yield only equivalent results

**Recommendation:** Use baseline U-Net with gradient loss instead.

---

## 2. Reproducibility Information

```python
REPRODUCIBILITY_INFO = {
    'git_commit_training': '3efbea0',
    'git_commit_optimization': '206f84c',
    'python_version': '3.12.12',
    'pytorch_version': '2.6.0+cu124',
    'cuda_version': '12.4',
    'gpu': 'NVIDIA GeForce RTX 3090 (24 GB)',
    'random_seed': 42,
    'experiment_date': '2026-01-19 to 2026-01-20',
    'training_script': 'scripts/train_dose_ddpm_v2.py',
    'figure_script': 'scripts/generate_ddpm_figures.py',
}

print('Reproducibility Information:')
for k, v in REPRODUCIBILITY_INFO.items():
    print(f'  {k}: {v}')
```

### Command to Reproduce

```bat
REM Activate environment (Windows)
call C:\pinokio\bin\miniconda\Scripts\activate.bat vmat-win
cd C:\Users\Bill\vmat-diffusion-project

REM Check out the training commit (run inside the project directory)
git checkout 3efbea0

REM Train DDPM (^ is the line-continuation character in cmd.exe)
python scripts\train_dose_ddpm_v2.py ^
    --data_dir I:\processed_npz ^
    --epochs 200 ^
    --batch_size 1

REM Generate figures
python scripts\generate_ddpm_figures.py
```

---

## 3. Dataset

```python
DATASET_INFO = {
    'total_cases': 23,
    'train_cases': 19,
    'val_cases': 2,
    'test_cases': 2,
    'val_case_ids': ['case_0011', 'case_0016'],
    'test_case_ids': ['case_0007', 'case_0021'],
    'preprocessing_version': 'v2.2.0',
    'data_location': 'I:\\processed_npz',
    'split_seed': 42,
}

print('Dataset Information:')
for k, v in DATASET_INFO.items():
    print(f'  {k}: {v}')
```

---

## 4. Model / Method

### 4.1 DDPM Architecture

```python
MODEL_CONFIG = {
    'architecture': 'SimpleUNet3D (DDPM)',
    'parameters': '23,705,857 (23.7M)',
    'in_channels': 10,  # 9 anatomy + 1 noisy dose
    'out_channels': 1,
    'base_channels': 48,
    'timesteps': 1000,
    'noise_schedule': 'cosine',
    'sampling_method': 'DDIM',
}

print('Model Configuration:')
for k, v in MODEL_CONFIG.items():
    print(f'  {k}: {v}')
```

### 4.2 DDPM Approach

The DDPM approach:
1. **Forward process:** Gradually add noise to ground truth dose over T=1000 timesteps
2. **Training:** Learn to predict the noise added at each timestep
3. **Inference:** Start from pure noise, iteratively denoise to recover dose

**Key difference from baseline:** DDPM predicts noise, not dose directly. The dose is recovered by iterative denoising.
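
The forward process and training target described above can be sketched in plain Python. This is a minimal illustration, not the code in `scripts/train_dose_ddpm_v2.py`: the cosine schedule uses the commonly cited form with offset `s = 0.008` and betas clipped at 0.999, and the function names (`cosine_alpha_bar`, `q_sample`, `mse`) and the toy dose values are assumptions.

```python
import math
import random


def cosine_alpha_bar(timesteps, s=0.008, max_beta=0.999):
    """Cumulative signal level alpha_bar[t] for t = 0..timesteps-1 (cosine schedule)."""
    def f(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    alpha_bar, running = [], 1.0
    for t in range(timesteps):
        beta = min(1.0 - f(t + 1) / f(t), max_beta)  # clip so alpha_bar never hits 0
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, alpha_bar_t, rng):
    """Forward process: noise a clean dose x0 to signal level alpha_bar_t.

    Returns (x_t, eps); eps is the regression target for noise prediction.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(42)
alpha_bar = cosine_alpha_bar(1000)
dose = [0.0, 0.25, 0.5, 1.0]   # toy normalized dose values (hypothetical)
t = rng.randrange(1000)        # one random timestep per training sample
x_t, eps = q_sample(dose, alpha_bar[t], rng)

# A real model would predict eps from (anatomy, x_t, t);
# an all-zero prediction stands in for it here.
loss = mse([0.0] * len(eps), eps)
print(f"t={t}, alpha_bar={alpha_bar[t]:.4f}, loss={loss:.4f}")
```

Because the loss is measured on the noise rather than on the dose, a low training loss does not by itself guarantee a low dose MAE after sampling.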

---

## 5. Training Configuration

```python
TRAINING_CONFIG = {
    'max_epochs': 200,
    'actual_epochs': 37,
    'early_stopping_patience': 20,
    'batch_size': 1,
    'patch_size': 128,
    'learning_rate': 1e-4,
    'optimizer': 'AdamW',
    'weight_decay': 0.01,
    'loss_function': 'MSE (noise prediction)',
    'precision': '16-mixed',
    'gradient_clip': 1.0,
    'training_time': '1.94 hours',
}

print('Training Configuration:')
for k, v in TRAINING_CONFIG.items():
    print(f'  {k}: {v}')
```
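
The relationship between `max_epochs` and `early_stopping_patience` can be illustrated with a small sketch. The `EarlyStopping` class and `run` helper below are hypothetical and only mirror the usual patience rule (stop once the monitored metric has failed to improve for `patience` consecutive epochs). The callback and monitored metric actually used by the training script are not shown in this notebook, and the validation curve is made up.

```python
class EarlyStopping:
    """Hypothetical patience-based stopper for a metric where lower is better."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, value):
        """Record one validation result; return True when training should stop."""
        if value < self.best:
            self.best, self.best_epoch, self.bad_epochs = value, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_mae_per_epoch, max_epochs, patience):
    """Return (epochs_run, best_epoch, best_value) for a sequence of validation MAEs."""
    stopper = EarlyStopping(patience)
    epochs_run = 0
    for epoch, value in enumerate(val_mae_per_epoch[:max_epochs]):
        epochs_run = epoch + 1
        if stopper.update(epoch, value):
            break
    return epochs_run, stopper.best_epoch, stopper.best


# Toy curve (made-up numbers): improves for three epochs, then plateaus.
toy_curve = [30.0, 20.0, 15.0] + [18.0] * 50
print(run(toy_curve, max_epochs=200, patience=20))
```

Under this rule, a run configured for 200 epochs ends well before that limit whenever the validation metric stalls.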

---

## 6. Results

### 6.1 Training Curves - Volatile MAE

![DDPM Training Curves](../runs/vmat_dose_ddpm/figures/fig1_ddpm_training_curves.png)

**Figure 1:** (A) Training loss decreases steadily as expected. (B) Validation MAE is **extremely volatile** (range: 12-64 Gy), indicating the noise prediction loss doesn't correlate well with dose accuracy.

**Key observation:** The model learns denoising well (loss decreases) but produces unstable dose predictions during training validation.

### 6.2 Phase 1: Sampling Steps Ablation

![Sampling Steps Ablation](../runs/vmat_dose_ddpm/figures/fig2_sampling_steps_ablation.png)

**Figure 2:** (A) **Critical finding: More steps = worse MAE.** 50 DDIM steps achieve 3.80 Gy; 1000 steps give 6.73 Gy (77% worse than at 50 steps). (B) Inference time scales linearly with the number of steps.

| Steps | MAE (Gy) | Time (min) | vs Baseline |
|-------|----------|------------|-------------|
| 50 | **3.80** | 6.6 | +1.9% |
| 100 | 4.89 | 13.5 | +31% |
| 250 | 5.24 | 32.8 | +41% |
| 500 | 5.93 | 65.3 | +59% |
| 1000 | 6.73 | 130.5 | +80% |
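
The percentages in this section use two different references: the 77% figure compares 1000 steps against 50 steps, while the "vs Baseline" column compares each row against the 3.73 Gy baseline. A short calculation makes this explicit. It uses the rounded MAE values from the table, so results may differ in the last digit from percentages computed on unrounded results; the helper name `pct_worse` is an assumption.

```python
BASELINE_MAE_GY = 3.73
MAE_BY_STEPS_GY = {50: 3.80, 100: 4.89, 250: 5.24, 500: 5.93, 1000: 6.73}


def pct_worse(value, reference):
    """Relative increase of `value` over `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Each step count relative to the baseline U-Net.
for steps, mae_gy in MAE_BY_STEPS_GY.items():
    print(f"{steps:>5} steps: {pct_worse(mae_gy, BASELINE_MAE_GY):+.1f}% vs baseline")

# 1000 steps relative to the best DDPM setting (50 steps).
delta = pct_worse(MAE_BY_STEPS_GY[1000], MAE_BY_STEPS_GY[50])
print(f"1000 vs 50 steps: {delta:+.1f}%")
```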

**Interpretation:** MAE increases monotonically with the number of sampling steps, as if the model denoises away the dose signal. This points to a structural issue: the model appears to learn to remove noise without learning to generate dose distributions.
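
For reference, reduced-step DDIM sampling can be sketched as follows. This is a generic illustration under stated assumptions, not the sampler used in these experiments: timesteps are taken evenly spaced from the 1000 training steps (assuming the sample count divides 1000), the update is the deterministic (eta = 0) DDIM rule, the cosine schedule uses an offset of 0.008 with betas clipped at 0.999, and an oracle that knows the true dose stands in for the trained network. All names are hypothetical.

```python
import math
import random


def cosine_alpha_bar(timesteps, s=0.008, max_beta=0.999):
    """Cosine noise schedule: cumulative signal level for t = 0..timesteps-1."""
    def f(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    out, running = [], 1.0
    for t in range(timesteps):
        running *= 1.0 - min(1.0 - f(t + 1) / f(t), max_beta)
        out.append(running)
    return out


def ddim_timesteps(train_steps, sample_steps):
    """Evenly spaced timesteps, highest noise first (e.g. 50 of 1000)."""
    stride = train_steps // sample_steps
    return list(range(train_steps - 1, -1, -stride))


def ddim_sample(eps_model, x_T, alpha_bar, steps):
    """Deterministic DDIM (eta = 0): walk x from x_T down the chosen timesteps."""
    x = list(x_T)
    for i, t in enumerate(steps):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        eps = eps_model(x, t)
        # Estimate the clean dose implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t)
                  for xi, e in zip(x, eps)]
        # Re-noise that estimate to the next (lower) noise level.
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1 - ab_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


true_dose = [0.0, 0.25, 0.5, 1.0]  # toy normalized dose (hypothetical)
alpha_bar = cosine_alpha_bar(1000)


def oracle_eps(x, t):
    """Stand-in for the trained network: the exact noise, using the known dose."""
    a, b = math.sqrt(alpha_bar[t]), math.sqrt(1 - alpha_bar[t])
    return [(xi - a * d) / b for xi, d in zip(x, true_dose)]


rng = random.Random(42)
x_T = [rng.gauss(0.0, 1.0) for _ in true_dose]
for n in (50, 1000):
    out = ddim_sample(oracle_eps, x_T, alpha_bar, ddim_timesteps(1000, n))
    err = max(abs(o - d) for o, d in zip(out, true_dose))
    print(f"{n} steps: max abs error {err:.2e}")
```

With an exact noise predictor, this update recovers the dose up to floating-point error at either step count. In this idealized setting, the step-count dependence seen in the table would therefore have to come from the learned predictor or the sampler configuration, not from the update rule itself.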

### 6.3 Phase 1: Ensemble Averaging

![Ensemble Averaging](../runs/vmat_dose_ddpm/figures/fig3_ensemble_averaging.png)

**Figure 3:** (A) Ensemble averaging provides no benefit (n=1 is optimal). (B) Sample variability is near-zero (~0.02), indicating the model is essentially deterministic despite being a "generative" model.

| Ensemble Size | MAE (Gy) | Sample Std |
|---------------|----------|------------|
| 1 | **3.775** | 0.00 |
| 3 | 3.835 | 0.02 |
| 5 | 3.830 | 0.02 |
| 10 | 3.835 | 0.03 |

**Interpretation:** Diffusion models excel at multi-modal generation, so a model that used this capability would produce diverse samples. The near-zero variability is consistent with dose prediction being effectively deterministic (one correct answer per patient), which makes diffusion's generative capability irrelevant here.
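
Ensemble averaging and the sample-variability measure can be sketched as below. The notebook does not state how "Sample Std" was computed, so this assumes the per-voxel standard deviation across samples, averaged over voxels. The function names and toy numbers are hypothetical.

```python
import statistics


def ensemble_mean(samples):
    """Voxel-wise mean over a list of equally sized dose samples."""
    return [statistics.fmean(voxel) for voxel in zip(*samples)]


def sample_std(samples):
    """Mean over voxels of the across-sample (population) standard deviation."""
    return statistics.fmean(statistics.pstdev(voxel) for voxel in zip(*samples))


def mae(pred, target):
    """Mean absolute error between a prediction and the ground truth."""
    return statistics.fmean(abs(p - t) for p, t in zip(pred, target))


target = [10.0, 20.0, 30.0]  # toy ground-truth dose in Gy (made up)
samples = [                  # three nearly identical toy samples (made up)
    [11.0, 19.0, 33.0],
    [11.1, 19.0, 32.9],
    [10.9, 19.0, 33.1],
]

for n in (1, 3):
    subset = samples[:n]
    ens_mae = mae(ensemble_mean(subset), target)
    print(f"n={n}: MAE={ens_mae:.3f} Gy, sample std={sample_std(subset):.3f}")
```

When the samples are nearly identical, their mean is close to any single sample, so averaging cannot change the MAE much. This matches the pattern in the table above.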

### 6.4 Key Finding Summary

![Key Finding](../runs/vmat_dose_ddpm/figures/fig5_key_finding.png)

**Figure 5:** Summary visualization showing DDPM consistently underperforms or matches baseline. Even the optimal DDPM configuration (50 steps, n=1) only matches the baseline, while adding significant complexity.

### 6.5 Model Comparison

![Model Comparison](../runs/vmat_dose_ddpm/figures/fig4_model_comparison.png)

**Figure 4:** Final comparison of all models tested. Gradient loss achieves the best results (3.67 Gy) while DDPM provides no benefit over baseline.

### 6.6 Quantitative Results Summary

```python
RESULTS = {
    'ddpm_training_best_mae_gy': 12.19,
    'ddpm_training_best_epoch': 15,
    'ddpm_optimized_mae_gy': 3.78,
    'ddpm_optimal_steps': 50,
    'ddpm_optimal_ensemble': 1,
    'baseline_mae_gy': 3.73,
    'gradient_loss_mae_gy': 3.67,
    'ddpm_training_time_hours': 1.94,
    'ddpm_inference_time_50_steps_min': 6.6,
}

print('Quantitative Results:')
for k, v in RESULTS.items():
    print(f'  {k}: {v}')
```

---

## 7. Analysis

### 7.1 Why DDPM Failed for Dose Prediction

| Red Flag | Implication |
|----------|-------------|
| More steps = worse results | Model denoises away dose signal |
| Near-zero sample variability | Model is deterministic, not generative |
| DDPM = Baseline accuracy | Added complexity provides no benefit |
| Best result uses only 50 of 1000 steps | Behaves more like direct prediction than iterative generation |

### 7.2 Fundamental Mismatch

**Diffusion models excel at multi-modal generation** (many valid outputs for one input):
- Image generation: Many valid images for "a cat"
- Audio synthesis: Many valid waveforms for "hello"

**Dose prediction is deterministic** (one correct answer per patient):
- Given patient anatomy and prescription, there's ONE optimal dose distribution
- No benefit from sampling multiple outputs
- The "generative" capability is wasted

### 7.3 Limitations

1. **Small dataset:** 23 cases may not be enough for diffusion models to learn
2. **Architecture:** SimpleUNet3D may not be optimal for diffusion
3. **Hyperparameters:** Only tested cosine schedule, DDIM sampling

However, the "more steps = worse" finding suggests a structural issue that more data or hyperparameter tuning alone is unlikely to fix.

---

## 8. Conclusions

### Key Takeaways

1. **DDPM is NOT recommended** for dose prediction - no benefit over baseline
2. **"More steps = worse"** indicates structural issue, not just tuning problem
3. **Dose prediction is deterministic** - diffusion's generative strength is irrelevant
4. **Use gradient loss instead** - achieves 3.67 Gy MAE and 27.9% Gamma (best so far)

### Strategic Recommendation

Abandon the DDPM approach. Focus on:
1. **Gradient loss baseline** (current best)
2. **VGG perceptual loss** (Phase B)
3. **Flow Matching** if a generative approach is desired (simpler than diffusion)

---

## 9. Next Steps

- [x] Complete DDPM training and optimization
- [x] Document findings and create figures
- [x] Complete gradient loss experiment (see separate notebook)
- [ ] **Phase B:** Gradient + VGG combined loss
- [ ] Consider Flow Matching as alternative generative approach
- [ ] Re-evaluate with 100+ cases when available

---

## 10. Artifacts

| Artifact | Path |
|----------|------|
| DDPM Checkpoint | `runs/vmat_dose_ddpm/checkpoints/best-epoch=015-val/mae_gy=12.19.ckpt` |
| Training Metrics | `runs/vmat_dose_ddpm/epoch_metrics.csv` |
| Training Config | `runs/vmat_dose_ddpm/training_config.json` |
| Sampling Results | `experiments/phase1_sampling/exp1_1_sampling_results.json` |
| Ensemble Results | `experiments/phase1_ensemble/exp1_2_ensemble_results.json` |
| Figures (PNG) | `runs/vmat_dose_ddpm/figures/*.png` |
| Figures (PDF) | `runs/vmat_dose_ddpm/figures/*.pdf` |
| Figure Script | `scripts/generate_ddpm_figures.py` |

---

*Notebook created: 2026-01-20*  
*Last updated: 2026-01-20*  
*Git commit (DDPM training): `3efbea0`*  
*Git commit (optimization): `206f84c`*