# Experiment: Combined Gradient + VGG Perceptual Loss

**Date:** 2026-01-21  
**Experiment ID:** `grad_vgg_combined`  
**Status:** Complete  

---

## 1. Overview

### 1.1 Objective
Test whether combining gradient loss (3D Sobel) with VGG perceptual loss improves the Gamma pass rate beyond gradient loss alone. This is **Phase B** of the loss function improvement experiments. It follows Phase A (gradient loss only), which nearly doubled the Gamma pass rate from 14.2% to 27.9%.

### 1.2 Hypothesis
VGG perceptual loss encourages feature-level similarity between predicted and ground truth doses, which may improve dose distribution quality and Gamma pass rate beyond what gradient loss alone achieves.

### 1.3 Key Results

| Metric | Baseline | Grad Loss | **Grad+VGG** | Change vs Grad |
|--------|----------|-----------|--------------|----------------|
| **Val MAE** | 3.73 Gy | 3.67 Gy | **2.27 Gy** | **-38%** ✅ |
| **Test MAE** | 1.43 Gy | 1.44 Gy | **1.44 Gy** | 0% |
| **Gamma (3%/3mm)** | 14.2% | 27.9% | **~28%** | ~0% ❌ |

### 1.4 Conclusion

**VGG loss substantially improves validation MAE (-38% vs. gradient loss alone) but does NOT improve Gamma pass rate (~28%, unchanged from gradient loss alone); test MAE is also unchanged (1.44 Gy).** This indicates that VGG perceptual loss reduces mean dose error on the validation set but does not improve the spatial dose agreement that Gamma measures. The recommendation is to skip VGG in future experiments and try adversarial loss or structure-weighted loss instead for Gamma improvement.

---

## 2. Reproducibility Information

In [None]:
# Reproducibility Information (captured at experiment time)
REPRODUCIBILITY_INFO = {
    'git_commit': 'dca8446',  # Pre-experiment commit
    'git_message': 'Add perceptual loss (gradient + VGG) to baseline U-Net',
    'python_version': '3.12.8',
    'pytorch_version': '2.6.0+cu124',
    'cuda_version': '12.4',
    'gpu': 'NVIDIA GeForce RTX 4090',
    'random_seed': 42,
    'experiment_date': '2026-01-21',
}

print('Reproducibility Information:')
for k, v in REPRODUCIBILITY_INFO.items():
    print(f'  {k}: {v}')
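The values above are hard-coded. As a hedged alternative, the sketch below shows one way such a record could be captured automatically at run time. The helper name `capture_repro_info` is hypothetical and not part of the experiment code; fields that cannot be read are set to `None`.

```python
import platform
import subprocess


def capture_repro_info(seed=42):
    """Collect environment details; fields that cannot be read are set to None."""
    info = {"python_version": platform.python_version(), "random_seed": seed}

    # Short hash of the current checkout (None outside a git repository).
    try:
        info["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            text=True,
            stderr=subprocess.DEVNULL,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        info["git_commit"] = None

    # PyTorch is imported lazily so the helper also runs where it is missing.
    try:
        import torch

        info["pytorch_version"] = torch.__version__
        info["cuda_version"] = torch.version.cuda
        info["gpu"] = (
            torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
        )
    except ImportError:
        info["pytorch_version"] = info["cuda_version"] = info["gpu"] = None

    return info


repro_info = capture_repro_info()
for key, value in repro_info.items():
    print(f"  {key}: {value}")
```

Recording the commit before training starts, as the notebook does, keeps the hash tied to the code that actually produced the run.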

### Command to Reproduce

```bat
REM Check out the pre-experiment commit
git checkout dca8446

REM Activate environment (Windows)
call C:\pinokio\bin\miniconda\Scripts\activate.bat vmat-win

REM Run experiment (^ is the cmd.exe line-continuation character)
python scripts\train_baseline_unet.py ^
    --exp_name grad_vgg_combined ^
    --data_dir I:\processed_npz ^
    --use_gradient_loss ^
    --gradient_loss_weight 0.1 ^
    --use_vgg_loss ^
    --vgg_loss_weight 0.001 ^
    --epochs 100
```

---

## 3. Dataset

In [None]:
DATASET_INFO = {
    'total_cases': 23,
    'train_cases': 19,
    'val_cases': 2,
    'test_cases': 2,
    'preprocessing_version': 'v2.2.0',
    'data_directory': 'I:\\processed_npz',
    'test_cases_ids': ['case_0007', 'case_0021'],
}

print('Dataset Information:')
for k, v in DATASET_INFO.items():
    print(f'  {k}: {v}')

---

## 4. Model / Method

### 4.1 Architecture
BaselineUNet3D with FiLM conditioning on dose constraints.
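
FiLM (feature-wise linear modulation) scales and shifts each feature channel using values predicted from a conditioning vector. The NumPy sketch below illustrates the idea for a 13-dimensional constraint vector and 48 feature channels. The function name `film`, the linear projections, and the random weights are illustrative assumptions; the actual layer inside `BaselineUNet3D` may be parameterized differently.

```python
import numpy as np


def film(features, constraints, w_gamma, b_gamma, w_beta, b_beta):
    """Apply per-channel scale and shift predicted from a constraint vector.

    features:    (C, D, H, W) feature map
    constraints: (K,) conditioning vector
    w_*, b_*:    linear projections from K constraints to C channels
    """
    gamma = constraints @ w_gamma + b_gamma  # (C,) per-channel scale
    beta = constraints @ w_beta + b_beta     # (C,) per-channel shift
    return gamma[:, None, None, None] * features + beta[:, None, None, None]


# Hypothetical shapes: 48 channels, 13 constraints, a small 4x4x4 feature map.
rng = np.random.default_rng(0)
feats = rng.normal(size=(48, 4, 4, 4))
cons = rng.normal(size=13)
out = film(
    feats, cons,
    w_gamma=rng.normal(size=(13, 48)), b_gamma=np.ones(48),
    w_beta=rng.normal(size=(13, 48)), b_beta=np.zeros(48),
)
print(out.shape)
```

With zero projection weights, a scale bias of 1, and a shift bias of 0, the layer reduces to the identity, which is a common way to initialize such conditioning.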

### 4.2 Loss Function
Combined loss with three components:

$$L_{total} = L_{MSE} + \lambda_{grad} \cdot L_{grad} + \lambda_{VGG} \cdot L_{VGG}$$

Where:
- $L_{MSE}$: Mean Squared Error (standard pixel-wise loss)
- $L_{grad}$: 3D Sobel gradient loss (edge sharpness)
- $L_{VGG}$: VGG perceptual loss (feature-level similarity, slice-wise)
- $\lambda_{grad} = 0.1$
- $\lambda_{VGG} = 0.001$
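
The weighted sum above can be sketched in NumPy as follows. This is an illustration of the arithmetic, not the training code: the L1 distance between Sobel responses, the replicate padding, and the function names are assumptions, and the VGG term is passed in as a precomputed scalar because it requires a pretrained network.

```python
import numpy as np


def _filter1d(vol, kernel, axis):
    """Correlate a 3-tap kernel along one axis with replicate padding."""
    pad = [(0, 0)] * vol.ndim
    pad[axis] = (1, 1)
    padded = np.pad(vol, pad, mode="edge")
    n = vol.shape[axis]
    out = np.zeros_like(vol, dtype=float)
    for i, k in enumerate(kernel):
        out += k * np.take(padded, np.arange(i, i + n), axis=axis)
    return out


def sobel3d(vol):
    """Return the three separable 3D Sobel responses, stacked as (3, D, H, W)."""
    smooth, deriv = (1.0, 2.0, 1.0), (-1.0, 0.0, 1.0)
    grads = []
    for d in range(3):
        g = np.asarray(vol, dtype=float)
        for axis in range(3):
            # Differentiate along axis d, smooth along the other two axes.
            g = _filter1d(g, deriv if axis == d else smooth, axis)
        grads.append(g)
    return np.stack(grads)


def combined_loss(pred, target, vgg_term=0.0, lambda_grad=0.1, lambda_vgg=0.001):
    """L_total = L_MSE + lambda_grad * L_grad + lambda_VGG * L_VGG."""
    mse = float(np.mean((pred - target) ** 2))
    grad = float(np.mean(np.abs(sobel3d(pred) - sobel3d(target))))
    return mse + lambda_grad * grad + lambda_vgg * vgg_term
```

A uniform dose offset changes only the MSE term, since the Sobel responses of the two volumes are then identical; the gradient term reacts only to differences in spatial dose fall-off.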

In [None]:
MODEL_CONFIG = {
    'architecture': 'BaselineUNet3D (Direct Regression)',
    'in_channels': 9,  # CT + 8 structure SDFs
    'out_channels': 1,  # Dose
    'base_channels': 48,
    'constraint_dim': 13,  # FiLM conditioning
    'model_params': 25468289,  # ~25M parameters
}

LOSS_CONFIG = {
    'use_gradient_loss': True,
    'gradient_loss_weight': 0.1,
    'use_vgg_loss': True,
    'vgg_loss_weight': 0.001,
    'vgg_slice_stride': 8,  # Process every 8th slice for memory efficiency
}

print('Model Configuration:')
for k, v in MODEL_CONFIG.items():
    print(f'  {k}: {v}')
print('\nLoss Configuration:')
for k, v in LOSS_CONFIG.items():
    print(f'  {k}: {v}')
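
To make the effect of `vgg_slice_stride` concrete, the sketch below picks every 8th slice of a volume and replicates each slice to three channels, since ImageNet-trained VGG networks expect RGB input. The slicing axis, the channel replication, and the volume size of 128 slices are assumptions for illustration; the experiment code may handle these differently.

```python
import numpy as np


def select_vgg_slices(volume, stride=8, axis=0):
    """Pick every `stride`-th slice along `axis` and replicate it to 3 channels."""
    idx = np.arange(0, volume.shape[axis], stride)
    slices = np.take(volume, idx, axis=axis)
    slices = np.moveaxis(slices, axis, 0)                     # (N, H, W)
    return idx, np.repeat(slices[:, None, :, :], 3, axis=1)   # (N, 3, H, W)


# Hypothetical dose volume with 128 slices of 64 x 64 voxels.
dose = np.zeros((128, 64, 64))
indices, batch = select_vgg_slices(dose, stride=8)
print(len(indices), batch.shape)
```

With a stride of 8, only one slice in eight contributes to the VGG term, which trades perceptual-loss coverage for memory.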

---

## 5. Training Configuration

In [None]:
TRAINING_CONFIG = {
    'max_epochs': 100,
    'actual_epochs': 82,  # Early stopped
    'batch_size': 1,
    'learning_rate': 1e-4,
    'weight_decay': 0.01,
    'optimizer': 'AdamW',
    'scheduler': 'ReduceLROnPlateau',
    'early_stopping_patience': 50,
    'training_time_hours': 9.74,
}

print('Training Configuration:')
for k, v in TRAINING_CONFIG.items():
    print(f'  {k}: {v}')

---

## 6. Results

### 6.1 Training Curves

![Training Curves](../runs/grad_vgg_combined/figures/fig1_training_curves.png)

**Key observations:**
- Best validation MAE: **2.27 Gy** at epoch 32 (39% below the baseline's 3.73 Gy and 38% below grad-only's 3.67 Gy)
- Training early stopped at epoch 82 due to 50-epoch patience
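- Convergence was smooth up to the best epoch, but validation MAE had risen to 4.43 Gy by the final epoch, so some late-stage overfitting or validation noise (n=2) cannot be ruled out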
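
The stopping epoch follows from the best epoch plus the patience (32 + 50 = 82). The sketch below reproduces that arithmetic with a simple patience counter on a synthetic validation curve. The function `stopping_epoch` and the curve are hypothetical, and the training framework's own early-stopping logic may count epochs slightly differently.

```python
def stopping_epoch(val_maes, patience):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, mae in enumerate(val_maes):
        if mae < best:
            best, best_epoch, wait = mae, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_maes) - 1


# Synthetic curve: improves until epoch 32, then stays worse.
curve = [10.0 - 0.1 * i for i in range(33)] + [8.0] * 100
best_epoch, stop_epoch = stopping_epoch(curve, patience=50)
print(best_epoch, stop_epoch)
```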

In [None]:
import pandas as pd

# Load training metrics
metrics = pd.read_csv('../runs/grad_vgg_combined/version_1/metrics.csv')
val_metrics = metrics[metrics['val/mae_gy'].notna()][['epoch', 'val/loss', 'val/mae_gy']]

print('Training Progress:')
print(f'  Total epochs: {int(val_metrics["epoch"].max())}')
print(f'  Best val MAE: {val_metrics["val/mae_gy"].min():.2f} Gy (epoch {int(val_metrics.loc[val_metrics["val/mae_gy"].idxmin(), "epoch"])})')
print(f'  Final val MAE: {val_metrics["val/mae_gy"].iloc[-1]:.2f} Gy')

### 6.2 Model Comparison

![Model Comparison](../runs/grad_vgg_combined/figures/fig2_model_comparison.png)

**Key observations:**
- Validation MAE: Significant improvement (2.27 vs 3.73 Gy baseline)
- Test MAE: Unchanged (1.43 to 1.44 Gy across all three models)
- Gamma pass rate: **Unchanged** (~28% for both Grad and Grad+VGG)

In [None]:
RESULTS = {
    'best_val_mae_gy': 2.27,
    'best_epoch': 32,
    'final_val_mae_gy': 4.43,  # After patience exhausted
    'test_mae_gy': 1.44,
    'gamma_pass_rate': 27.85,
    'training_time_hours': 9.74,
}

print('Final Results:')
for k, v in RESULTS.items():
    print(f'  {k}: {v}')

### 6.3 Dose Slice Visualization

![Dose Slices](../runs/grad_vgg_combined/figures/fig3_dose_slices.png)

### 6.4 Loss Components

![Loss Components](../runs/grad_vgg_combined/figures/fig4_loss_components.png)

**Key observations:**
- MSE loss dominates the total loss
- Gradient loss (weighted by 0.1) contributes meaningfully
- VGG loss (weighted by 0.001) has minimal contribution to total loss

### 6.5 Key Finding

![Key Finding](../runs/grad_vgg_combined/figures/fig5_key_finding.png)

**Key insight:** VGG perceptual loss improves overall dose accuracy (MAE) but does NOT improve Gamma pass rate. This suggests:
1. VGG encourages feature-level similarity but not edge sharpness
2. Gamma (3%/3mm) is sensitive to local dose gradients, not global features
3. Improving Gamma requires losses that explicitly target dose gradients or spatial accuracy

### 6.6 Test Case Details

In [None]:
TEST_CASE_RESULTS = {
    'case_0007': {
        'mae_gy': 1.77,
        'mae_body_gy': 5.97,
        'gamma_pass_rate': 41.2,
    },
    'case_0021': {
        'mae_gy': 1.11,
        'mae_body_gy': 6.75,
        'gamma_pass_rate': 14.5,
    }
}

print('Test Case Results:')
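# Use a distinct loop variable so the `metrics` DataFrame loaded earlier is not overwritten
for case, case_metrics in TEST_CASE_RESULTS.items():
    print(f'\n  {case}:')
    for k, v in case_metrics.items():
        print(f'    {k}: {v}')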

# Compute means
import numpy as np
mae_values = [v['mae_gy'] for v in TEST_CASE_RESULTS.values()]
gamma_values = [v['gamma_pass_rate'] for v in TEST_CASE_RESULTS.values()]
print(f'\n  Mean MAE: {np.mean(mae_values):.2f} +/- {np.std(mae_values):.2f} Gy')
print(f'  Mean Gamma: {np.mean(gamma_values):.1f}%')

---

## 7. Analysis

### 7.1 Observations

1. **VGG substantially improves validation MAE** (-39% vs baseline, -38% vs grad-only), suggesting it helps the model learn better global dose distributions
2. **Test MAE is unchanged** (1.43 to 1.44 Gy across all models), indicating the validation improvement doesn't transfer to held-out cases
3. **Gamma pass rate is unchanged** (~28%), meaning VGG does not improve the spatial accuracy that Gamma measures
4. **Training time increased about 5.3x** (9.74h vs 1.85h for grad-only), due to VGG feature extraction overhead

### 7.2 Why VGG Doesn't Help Gamma

VGG perceptual loss compares high-level features between predicted and ground truth images. However:
- VGG features are designed for natural images (ImageNet), not dose distributions
- VGG emphasizes texture and semantic content, not precise spatial accuracy
- Gamma (3%/3mm) measures point-by-point dose-distance agreement, not feature similarity
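
To show what point-by-point dose-distance agreement means, here is a minimal 1D sketch of a global Gamma pass rate. It is an illustration only, not the evaluation code used in this experiment. The global normalization to the reference maximum and the 10% low-dose cutoff are assumptions.

```python
import numpy as np


def gamma_pass_rate_1d(ref, ev, spacing_mm, dose_tol=0.03, dta_mm=3.0,
                       low_dose_cutoff=0.1):
    """Percentage of reference points with gamma <= 1 (global normalization)."""
    ref = np.asarray(ref, dtype=float)
    ev = np.asarray(ev, dtype=float)
    x = np.arange(ref.size) * spacing_mm
    dose_norm = dose_tol * ref.max()

    # Squared, tolerance-scaled distance and dose difference for every
    # (reference point, evaluated point) pair.
    dist2 = ((x[:, None] - x[None, :]) / dta_mm) ** 2
    dose2 = ((ev[None, :] - ref[:, None]) / dose_norm) ** 2
    gamma = np.sqrt((dist2 + dose2).min(axis=1))

    # Ignore reference points below the low-dose cutoff.
    mask = ref >= low_dose_cutoff * ref.max()
    return 100.0 * float(np.mean(gamma[mask] <= 1.0))


# A sharp 60 Gy field edge sampled at 1 mm, then shifted by 2 mm and by 5 mm.
pos = np.arange(60)
ref = np.where(pos < 30, 60.0, 0.0)
shift_2mm = np.where(pos < 28, 60.0, 0.0)
shift_5mm = np.where(pos < 25, 60.0, 0.0)
print(gamma_pass_rate_1d(ref, shift_2mm, 1.0))
print(gamma_pass_rate_1d(ref, shift_5mm, 1.0))
```

A 2 mm edge shift stays within the 3 mm distance tolerance, while a 5 mm shift fails at the points nearest the edge. The metric is therefore driven by where steep dose gradients sit, which feature-level similarity does not constrain directly.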

### 7.3 Comparison to Previous Work

| Experiment | Val MAE | Test MAE | Gamma | Training Time |
|------------|---------|----------|-------|---------------|
| Baseline | 3.73 Gy | 1.43 Gy | 14.2% | 2.55h |
| Grad Loss | 3.67 Gy | 1.44 Gy | **27.9%** | 1.85h |
| **Grad+VGG** | **2.27 Gy** | 1.44 Gy | ~28% | 9.74h |
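
The relative changes quoted in this report can be recomputed from the table values with a few lines of Python. The dictionary layout is illustrative; the numbers are those reported above, with 27.85% used for the Grad+VGG Gamma value.

```python
results = {
    "Baseline":  {"val_mae": 3.73, "test_mae": 1.43, "gamma": 14.2,  "hours": 2.55},
    "Grad Loss": {"val_mae": 3.67, "test_mae": 1.44, "gamma": 27.9,  "hours": 1.85},
    "Grad+VGG":  {"val_mae": 2.27, "test_mae": 1.44, "gamma": 27.85, "hours": 9.74},
}


def pct_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


vgg, grad, base = results["Grad+VGG"], results["Grad Loss"], results["Baseline"]
val_vs_grad = pct_change(vgg["val_mae"], grad["val_mae"])
val_vs_base = pct_change(vgg["val_mae"], base["val_mae"])
time_ratio = vgg["hours"] / grad["hours"]

print(f"Val MAE vs grad-only: {val_vs_grad:.1f}%")
print(f"Val MAE vs baseline:  {val_vs_base:.1f}%")
print(f"Training time ratio:  {time_ratio:.1f}x")
```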

### 7.4 Limitations

1. **Small test set** (n=2) - results may not generalize
2. **Gamma computed on central slice only** - not full 3D Gamma
3. **VGG features from ImageNet** - may not be optimal for dose images
4. **Fixed hyperparameters** - VGG weight (0.001) not tuned

---

## 8. Conclusions

1. **VGG perceptual loss improves validation MAE but not Gamma pass rate**
2. **Adding VGG is not recommended** for the 95% Gamma goal - it adds training time without Gamma benefit
3. **Next steps should focus on**:
   - Adversarial loss (PatchGAN) for sharper edges
   - Structure-weighted loss for PTV/OAR accuracy
   - DVH-aware loss for clinical metrics
   - Data augmentation to offset the small dataset (n=23 cases)

---

## 9. Next Steps (Decision Tree Outcome)

Based on the decision tree in `.claude/instructions.md`:

**Result:** Gamma ≈ 28% (unchanged from Phase A)

**Decision:** VGG not helping Gamma. Skip VGG in future experiments.

**Next experiments to try:**
- [ ] **Adversarial loss (PatchGAN)** - For edge sharpness
- [ ] **Structure-weighted loss** - Weight PTV regions 2x
- [ ] **DVH-aware loss** - Penalize D95 underdosing
- [ ] **Data augmentation** - Address overfitting with n=23
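
As a starting point for the structure-weighted option, the sketch below shows a weighted MSE in which PTV voxels count twice as much as other voxels. The function name, the boolean-mask input, and the normalization by the sum of the weights are assumptions; the eventual implementation may differ.

```python
import numpy as np


def structure_weighted_mse(pred, target, ptv_mask, ptv_weight=2.0):
    """MSE with PTV voxels weighted by `ptv_weight`, all others by 1."""
    weights = np.where(ptv_mask, ptv_weight, 1.0)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


# Four voxels, the first one inside the PTV.
target = np.zeros(4)
mask = np.array([True, False, False, False])
err_in_ptv = structure_weighted_mse(np.array([1.0, 0.0, 0.0, 0.0]), target, mask)
err_outside = structure_weighted_mse(np.array([0.0, 1.0, 0.0, 0.0]), target, mask)
print(err_in_ptv, err_outside)
```

The same 1 Gy error costs twice as much inside the PTV as outside it, which is the intended bias toward target-region accuracy.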

---

## 10. Artifacts

| Artifact | Path |
|----------|------|
| Best Checkpoint | `runs/grad_vgg_combined/checkpoints/best-epoch=032-val/mae_gy=2.267.ckpt` |
| Metrics | `runs/grad_vgg_combined/version_1/metrics.csv` |
| Config | `runs/grad_vgg_combined/training_config.json` |
| Summary | `runs/grad_vgg_combined/training_summary.json` |
| Predictions | `predictions/grad_vgg_combined_test/` |
| Figures | `runs/grad_vgg_combined/figures/` |

---

*Notebook created: 2026-01-21*  
*Last updated: 2026-01-21*