# Structure-Weighted Loss Experiment

**Date:** 2026-01-22  
**Experiment ID:** structure_weighted_loss  
**Status:** Complete

---

## 1. Overview

### Objective
Implement and evaluate a structure-weighted MSE loss that weights errors by clinical importance:
- 2x weight for PTV regions (critical target volumes)
- 1.5x weight for OAR boundaries (sharp gradients matter)
- 0.5x weight for "no-man's land" (flexible dose regions)

### Hypothesis
By focusing model learning capacity on clinically important regions (PTVs) while allowing flexibility elsewhere, the model should achieve better Gamma pass rates without sacrificing overall dose accuracy.

### Key Results
| Metric | Baseline | Grad Loss | DVH-Aware | **Structure-Weighted** |
|--------|----------|-----------|-----------|------------------------|
| Val MAE | 3.73 Gy | 3.67 Gy | 3.61 Gy | **2.91 Gy** |
| Test MAE | 1.43 Gy | 1.44 Gy | **0.95 Gy** | 1.40 Gy |
| Gamma (3%/3mm) | 14.2% | 27.9% | 27.7% | **31.2%** |

### Conclusion
**Structure-weighted loss achieves the best Gamma pass rate (31.2%)**, 3.3 percentage points above gradient loss alone (27.9%). It also achieves the best validation MAE (2.91 Gy), although DVH-aware loss retains the best test MAE (0.95 Gy vs 1.40 Gy). With only two test cases, this suggests rather than proves that weighting errors by clinical importance helps the model focus on the most critical regions.

---

## 2. Reproducibility Information

### Git Information
- **Implementation Commit:** `8b08506` (feat: Add structure-weighted loss)
- **Repository:** wrockey/vmat-diffusion
- **Branch:** main

### Environment
- **Platform:** Native Windows (Pinokio)
- **Conda Environment:** vmat-win
- **Python:** 3.10
- **PyTorch:** 2.6.0+cu124
- **GPU:** NVIDIA GeForce RTX 3090 (24 GB)

### Command to Reproduce
```cmd
call C:\pinokio\bin\miniconda\Scripts\activate.bat vmat-win
cd C:\Users\Bill\vmat-diffusion-project

python scripts\train_baseline_unet.py ^
    --exp_name structure_weighted_loss ^
    --data_dir I:\processed_npz ^
    --use_gradient_loss --gradient_loss_weight 0.1 ^
    --use_structure_weighted --structure_weighted_weight 1.0 ^
    --epochs 100 --batch_size 1
```

---

## 3. Dataset Information

- **Total Cases:** 23 (prostate VMAT with SIB)
- **Train:** 19 cases (83%)
- **Validation:** 2 cases (9%)
- **Test:** 2 cases (9%) - case_0007, case_0021
- **Random Seed:** 42
- **Data Location:** `I:\processed_npz`
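
A seeded 19/2/2 split of this size can be sketched as below. This is an illustrative stand-in rather than the project's split code: `split_cases` is a hypothetical helper, the `case_0001` to `case_0023` IDs are placeholders modelled on the test case names above, and the shuffling method is an assumption, so it will not necessarily reproduce the actual test cases.

```python
import random


def split_cases(case_ids, n_val=2, n_test=2, seed=42):
    """Shuffle case IDs with a fixed seed and split into train/val/test."""
    ids = sorted(case_ids)              # sort first so input order does not matter
    random.Random(seed).shuffle(ids)    # local RNG, global random state untouched
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Placeholder IDs; the real dataset's numbering may differ.
cases = [f"case_{i:04d}" for i in range(1, 24)]
train, val, test = split_cases(cases)
print(len(train), len(val), len(test))  # 19 2 2
```

With only two cases each in validation and test, the split itself is a large source of variance, which is worth keeping in mind when reading the results below.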

### Data Format
Each NPZ file contains:
- `ct`: CT volume (normalized)
- `dose`: Dose distribution (normalized 0-1, Rx=70 Gy)
- `masks_sdf`: 8 signed distance fields for structures
- `constraints`: 13 planning constraints
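
A minimal sketch of reading one case follows. The four keys come from the list above; `load_case`, `RX_DOSE_GY`, and the `dose_gy` entry are hypothetical names, and the conversion assumes that a normalized dose of 1.0 corresponds to the 70 Gy prescription.

```python
import numpy as np

RX_DOSE_GY = 70.0  # prescription dose stated above


def load_case(path):
    """Load one case and add the dose converted back to Gy."""
    with np.load(path) as npz:
        case = {key: npz[key] for key in ("ct", "dose", "masks_sdf", "constraints")}
    # Assumption: normalized dose 1.0 == 70 Gy.
    case["dose_gy"] = case["dose"] * RX_DOSE_GY
    return case
```

Reading the arrays inside the `with` block closes the file handle promptly, which matters when iterating over many cases on Windows.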

---

## 4. Method: Structure-Weighted Loss

### Concept
Standard MSE treats all voxels equally, but clinical importance varies:
- **PTV regions** must receive accurate dose (underdosing = treatment failure)
- **OAR boundaries** require sharp gradients (protect organs while treating tumor)
- **"No-man's land"** has flexible requirements (physics-bounded but not clinically critical)

### Implementation
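```python
import torch
import torch.nn as nn


class StructureWeightedLoss(nn.Module):
    # Runnable sketch. Assumptions not confirmed by the source: `sdf_channels`
    # (hypothetical argument) maps structure names to channel indices of
    # `condition` (B, C, D, H, W), SDFs are negative inside a structure,
    # and SDF values are expressed in mm.
    def __init__(self, sdf_channels, ptv_weight=2.0, oar_boundary_weight=1.5,
                 background_weight=0.5, boundary_width_mm=5.0):
        super().__init__()
        self.sdf_channels = sdf_channels
        self.ptv_weight = ptv_weight
        self.oar_boundary_weight = oar_boundary_weight
        self.background_weight = background_weight
        self.boundary_threshold = boundary_width_mm  # assumes SDF in mm

    def compute_weight_map(self, condition):
        # Start with background weight everywhere, shape (B, 1, D, H, W)
        weight_map = torch.full_like(condition[:, :1], self.background_weight)

        # OAR boundaries: |SDF| < threshold
        # (the source elides the remaining OARs after 'Bowel')
        for oar in ['Rectum', 'Bladder', 'Bowel']:
            idx = self.sdf_channels[oar]
            near_boundary = condition[:, idx:idx + 1].abs() < self.boundary_threshold
            weight_map[near_boundary] = self.oar_boundary_weight

        # PTVs have highest priority, so they overwrite OAR boundary weights
        for ptv in ['PTV70', 'PTV56']:
            idx = self.sdf_channels[ptv]
            inside_ptv = condition[:, idx:idx + 1] < 0
            weight_map[inside_ptv] = self.ptv_weight

        return weight_map

    def forward(self, pred, target, condition):
        weight_map = self.compute_weight_map(condition)
        weighted_mse = (weight_map * (pred - target) ** 2).mean()
        return weighted_mse
```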

### Hyperparameters
| Parameter | Value | Rationale |
|-----------|-------|------------|
| PTV weight | 2.0 | Highest priority (target volumes) |
| OAR boundary weight | 1.5 | Important for sharp gradients |
| Background weight | 0.5 | Flexible regions |
| Boundary width | 5.0 mm | Typical gradient region width |
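
The way these weights interact where regions overlap can be illustrated on a 1D toy profile. This is a sketch, not the training code: `weight_map_1d` and the example geometry are hypothetical, and it assumes signed distances in mm that are negative inside a structure.

```python
import numpy as np


def weight_map_1d(ptv_sdf, oar_sdf, ptv_weight=2.0, oar_boundary_weight=1.5,
                  background_weight=0.5, boundary_width_mm=5.0):
    """Per-voxel weights from signed distances in mm (negative = inside)."""
    weights = np.full(ptv_sdf.shape, background_weight)
    weights[np.abs(oar_sdf) < boundary_width_mm] = oar_boundary_weight
    weights[ptv_sdf < 0] = ptv_weight   # applied last, so PTV wins on overlap
    return weights


# Voxels every 2 mm along a line; PTV spans |x| < 10 mm, OAR surface at x = 12 mm.
x = np.arange(-20, 21, 2, dtype=float)
ptv_sdf = np.abs(x) - 10.0      # negative inside the PTV
oar_sdf = 12.0 - x              # negative inside the OAR (x > 12)
print(weight_map_1d(ptv_sdf, oar_sdf))
```

The voxel at x = 8 mm lies inside the PTV and within 5 mm of the OAR surface; because the PTV assignment runs last, it receives 2.0 rather than 1.5.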

---

## 5. Training Configuration

| Parameter | Value |
|-----------|-------|
| Model | BaselineUNet3D (23.7M params) |
| Base Channels | 48 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW (weight_decay=0.01) |
| Scheduler | CosineAnnealing |
| Batch Size | 1 |
| Patch Size | 128 |
| Max Epochs | 100 |
| Early Stopping | Patience=50 on val/mae_gy |
| Precision | 16-mixed |
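
For reference, the learning rate under cosine annealing can be written in closed form. The sketch below assumes the schedule spans the full 100 epochs, decays to a minimum of 0, and steps once per epoch; the table only states "CosineAnnealing", so `t_max` and `eta_min` are assumptions and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(epoch, base_lr=1e-4, t_max=100, eta_min=0.0):
    """Cosine annealing: eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T)) / 2."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.2e}")
```

Under these assumptions the learning rate is halved by epoch 50, so a run that stops early never reaches the low-learning-rate tail of the schedule.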

### Loss Configuration
| Loss Component | Weight |
|----------------|--------|
| MSE | 1.0 |
| Negative Penalty | 0.1 |
| Gradient Loss (3D Sobel) | 0.1 |
| Structure-Weighted Loss | 1.0 |
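
Because plain MSE and structure-weighted MSE both enter with weight 1.0, the effective emphasis per region differs from the raw weight map. The sketch below assumes the components are summed linearly and that both terms average squared error over the same voxels; the dictionary and function names are hypothetical.

```python
LOSS_WEIGHTS = {"mse": 1.0, "structure_weighted": 1.0}
REGION_WEIGHTS = {"ptv": 2.0, "oar_boundary": 1.5, "background": 0.5}


def effective_region_weights(loss_weights, region_weights):
    """Multiplier applied to a voxel's squared error in the summed loss."""
    return {
        region: loss_weights["mse"] + loss_weights["structure_weighted"] * w
        for region, w in region_weights.items()
    }


effective = effective_region_weights(LOSS_WEIGHTS, REGION_WEIGHTS)
print(effective)  # {'ptv': 3.0, 'oar_boundary': 2.5, 'background': 1.5}
print(effective["ptv"] / effective["background"])  # 2.0, versus 4.0 for the weight map alone
```

Under these assumptions the PTV-to-background contrast seen by the optimizer is 2:1 rather than 4:1, which is relevant when tuning the region weights.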

---

## 6. Results

### 6.1 Training Summary
- **Best Epoch:** 31
- **Best Val MAE:** 2.91 Gy
- **Early Stopped:** Epoch 81 (patience 50)
- **Training Time:** 2.62 hours

### 6.2 Test Set Results
| Case | MAE (Gy) | Gamma (3%/3mm) |
|------|----------|----------------|
| case_0007 | 1.63 | 33.4% |
| case_0021 | 1.16 | 29.0% |
| **Mean** | **1.40 ± 0.23** | **31.2 ± 2.2%** |
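
The Mean row can be recomputed from the two per-case values. The reported ± values are consistent with the population standard deviation; with n=2 the sample standard deviation is larger by a factor of √2.

```python
from statistics import mean, pstdev, stdev

mae_gy = [1.63, 1.16]        # case_0007, case_0021
gamma_pct = [33.4, 29.0]

for name, values in (("MAE (Gy)", mae_gy), ("Gamma (%)", gamma_pct)):
    print(f"{name}: mean={mean(values):.3f}, "
          f"population std={pstdev(values):.3f}, sample std={stdev(values):.3f}")
```

Either way, a spread estimated from two cases gives only a rough indication of case-to-case variability.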

### 6.3 Comparison with Previous Models

| Model | Val MAE | Test MAE | Gamma | Training Time |
|-------|---------|----------|-------|---------------|
| Baseline | 3.73 Gy | 1.43 Gy | 14.2% | 2.55h |
| Grad Loss | 3.67 Gy | 1.44 Gy | 27.9% | 1.85h |
| DVH-Aware | 3.61 Gy | **0.95 Gy** | 27.7% | 11.2h |
| **Struct-Weight** | **2.91 Gy** | 1.40 Gy | **31.2%** | 2.62h |

### 6.4 Figures

#### Figure 1: Model Comparison
![Model Comparison](../runs/structure_weighted_loss/figures/fig1_model_comparison.png)

#### Figure 2: Training Curves
![Training Curves](../runs/structure_weighted_loss/figures/fig2_training_curves.png)

#### Figure 3: Per-Case Test Results
![Case Metrics](../runs/structure_weighted_loss/figures/fig3_case_metrics.png)

#### Figure 4: Key Finding
![Key Finding](../runs/structure_weighted_loss/figures/fig4_key_finding.png)

---

## 7. Analysis

### 7.1 Key Observations

1. **Best Gamma Pass Rate (31.2%):** Structure-weighted loss improves Gamma by 3.3 percentage points over gradient loss alone (27.9%).

2. **Best Validation MAE (2.91 Gy):** 22% improvement over baseline (3.73 Gy), suggesting better fit to clinical priorities.

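3. **Competitive Test MAE (1.40 Gy):** Marginally lower than baseline (1.43 Gy), a 0.03 Gy difference that is well within the per-case spread (±0.23 Gy), and higher than DVH-aware (0.95 Gy). Validation and test MAE can diverge with test sets this small (n=2).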

4. **Efficient Training (2.62h):** About 40% longer than gradient loss alone (1.85h), but roughly 4x faster than DVH-aware (11.2h).
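
The headline differences can be recomputed from the comparison table in Section 6.3. The snippet distinguishes absolute gains in percentage points from relative gains; the helper names are hypothetical.

```python
results = {
    "Baseline":      {"val_mae": 3.73, "gamma": 14.2},
    "Grad Loss":     {"val_mae": 3.67, "gamma": 27.9},
    "DVH-Aware":     {"val_mae": 3.61, "gamma": 27.7},
    "Struct-Weight": {"val_mae": 2.91, "gamma": 31.2},
}


def gamma_gain(new, ref):
    """Return (absolute gain in percentage points, relative gain in %)."""
    diff = results[new]["gamma"] - results[ref]["gamma"]
    return diff, 100.0 * diff / results[ref]["gamma"]


def val_mae_reduction(new, ref):
    """Relative reduction in validation MAE, in %."""
    ref_mae = results[ref]["val_mae"]
    return 100.0 * (ref_mae - results[new]["val_mae"]) / ref_mae


points, relative = gamma_gain("Struct-Weight", "Grad Loss")
print(f"Gamma: +{points:.1f} points ({relative:.1f}% relative)")  # Gamma: +3.3 points (11.8% relative)
print(f"Val MAE: -{val_mae_reduction('Struct-Weight', 'Baseline'):.1f}%")  # Val MAE: -22.0%
```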

### 7.2 Why Structure Weighting Helps Gamma

Gamma pass rate is most sensitive to errors in high-dose regions and at dose gradients. The weight map targets both, while de-emphasizing regions where the dose is flexible:
- Weighting PTV errors 2x → model prioritizes accurate high-dose delivery
- Weighting OAR boundary errors 1.5x → model learns sharper gradients
- Reducing background weight to 0.5x → model doesn't waste capacity on flexible regions
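
To make the link to gradients concrete, the sketch below computes a simplified 1D global Gamma pass rate by brute force. It is not the evaluation code behind the reported numbers: `gamma_pass_rate_1d`, the 10% low-dose cutoff, and the sigmoid test profile are assumptions chosen for illustration.

```python
import numpy as np


def gamma_pass_rate_1d(ref, pred, spacing_mm, dose_pct=3.0, dta_mm=3.0, cutoff_pct=10.0):
    """Global 1D gamma pass rate (%) of `pred` against `ref` by brute-force search."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    x = np.arange(ref.size) * spacing_mm
    dose_tol = dose_pct / 100.0 * ref.max()             # global dose criterion
    # Squared gamma terms for every (reference point i, evaluated point j) pair
    dist2 = ((x[:, None] - x[None, :]) / dta_mm) ** 2
    dose2 = ((pred[None, :] - ref[:, None]) / dose_tol) ** 2
    gamma = np.sqrt((dist2 + dose2).min(axis=1))        # best match per reference point
    evaluated = ref >= cutoff_pct / 100.0 * ref.max()   # skip low-dose voxels
    return 100.0 * np.mean(gamma[evaluated] <= 1.0)


# Plateau at 70 Gy with a steep falloff around x = 30 mm, sampled every 1 mm.
x = np.arange(0, 60, 1.0)
ref = 70.0 / (1.0 + np.exp((x - 30.0) / 2.0))
shift_2mm = 70.0 / (1.0 + np.exp((x - 32.0) / 2.0))
shift_5mm = 70.0 / (1.0 + np.exp((x - 35.0) / 2.0))
print(gamma_pass_rate_1d(ref, shift_2mm, spacing_mm=1.0))
print(gamma_pass_rate_1d(ref, shift_5mm, spacing_mm=1.0))
```

A 2 mm misplacement of the falloff stays within the 3 mm distance tolerance, while a 5 mm misplacement fails in the gradient region even though the plateau still passes. This is why blurred or displaced gradients are costly for Gamma.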

### 7.3 Comparison with DVH-Aware Loss

| Aspect | DVH-Aware | Structure-Weighted |
|--------|-----------|--------------------|
| Test MAE | **0.95 Gy** (best) | 1.40 Gy |
| Gamma | 27.7% | **31.2%** (best) |
| Training Time | 11.2h | **2.62h** |
| Complexity | High (soft D95, Vx) | Low (weight map) |

DVH-aware optimizes clinical metrics directly but is slower and doesn't improve Gamma over gradient loss alone (27.7% vs 27.9%). Structure-weighted is simpler, faster, and achieves better Gamma, at the cost of a higher test MAE.

### 7.4 Limitations

1. **Small Test Set (n=2):** Results may not generalize. Need validation on larger dataset.

2. **Still Below Clinical Target:** 31.2% Gamma is far from the 95% target. The remaining gap is likely a fundamental limitation of the dataset size (n=23).

3. **Fixed Weight Hyperparameters:** Optimal weights may vary by patient anatomy or treatment protocol.

---

## 8. Conclusions

### Main Finding
**Structure-weighted loss achieves the best Gamma pass rate (31.2%) among all tested approaches, improving 3.3 percentage points over gradient loss alone while maintaining competitive training time.**

### Implications
1. Weighting errors by clinical importance is a simple, effective way to improve model performance
2. The approach is complementary to gradient loss and can potentially be combined with DVH-aware loss
3. Training efficiency is maintained (2.62h vs 11.2h for DVH)

### Recommendations
1. **Use structure-weighted loss as new default** for future experiments
2. **Combine with DVH-aware loss** to get best of both worlds (good MAE + good Gamma)
3. **Retrain on larger dataset** (100+ cases) to validate improvements at scale

---

## 9. Next Steps

1. **Combine structure-weighted + DVH-aware** - May get best MAE AND Gamma
2. **Wait for 100+ cases** - Retrain both approaches at scale
3. **Tune weight hyperparameters** - Try 3x PTV, 2x OAR boundary, etc.
4. **Region-specific Gamma analysis** - Understand where remaining errors concentrate

---

## 10. Artifacts

| Artifact | Path |
|----------|------|
| Best Checkpoint | `runs/structure_weighted_loss/checkpoints/best-epoch=031-val/mae_gy=2.911.ckpt` |
| Training Config | `runs/structure_weighted_loss/training_config.json` |
| Training Summary | `runs/structure_weighted_loss/training_summary.json` |
| Metrics CSV | `runs/structure_weighted_loss/version_1/metrics.csv` |
| Test Predictions | `predictions/structure_weighted_test/` |
| Test Results | `predictions/structure_weighted_test/evaluation_results.json` |
| Figures | `runs/structure_weighted_loss/figures/` |
| Figure Script | `scripts/generate_structure_weighted_figures.py` |