# Semi-Multi-Modal Hypothesis for VMAT Dose Prediction

**Date:** 2026-01-21  
**Type:** Hypothesis & Analysis  
**Status:** Active Investigation  

---

## 1. Executive Summary

This notebook documents a key insight that may change our approach to dose prediction:

**Hypothesis:** Dose prediction is not purely deterministic. While PTV coverage and OAR constraints are hard requirements, the low/intermediate dose distribution in "no-man's land" (transition regions between PTVs and OARs) is flexible - multiple valid solutions exist as long as DVH constraints are met.

**Implications:**
1. Our current pixel-wise metrics (MAE, Gamma) may be too strict
2. DDPM's "blurring" may be averaging valid solutions, not failing
3. DVH-aware losses may unlock clinically-focused optimization
4. DDPM may be viable with proper region-aware metrics

**Recommendation:** Test DVH-aware loss on baseline U-Net before revisiting DDPM.

---

## 2. Background: The Original Assessment

### 2.1 Why We Dismissed DDPM

After Phase 1 DDPM optimization experiments (2026-01-20), we concluded:

| Observation | Original Interpretation |
|-------------|------------------------|
| "More steps = worse" | Structural issue - model denoises away dose signal |
| Near-zero sample variability | Model is deterministic, not generative |
| DDPM = Baseline accuracy | Added complexity provides no benefit |
| 50/1000 steps optimal | Essentially one-shot prediction |

**Conclusion:** "Dose prediction is deterministic (one correct answer per patient). Diffusion models excel at multi-modal generation. We're forcing a generative framework onto a regression problem."

### 2.2 What We May Have Missed

The assessment treated dose prediction as **fully deterministic**, overlooking that:

1. Clinical acceptability is defined by **DVH constraints**, not pixel-perfect reproduction
2. Multiple dose distributions can satisfy the same constraints
3. Low-dose "spray" in non-critical regions varies significantly between equally valid plans

---

## 3. The Semi-Multi-Modal Hypothesis

### 3.1 Core Insight

Dose prediction is **semi-multi-modal**:

| Region | Constraint Type | Flexibility | Example |
|--------|-----------------|-------------|----------|
| **PTV** | Hard | None - deterministic | D95 ≥ 95% of 70 Gy |
| **OARs** | Hard | Minimal | Rectum V70 < 15% |
| **No-man's land** | Physics-bounded | **High** | 10-50 Gy spray |

### 3.2 What is "No-Man's Land"?

The transition regions between PTVs and OARs where:
- Dose must fall off from prescription levels
- No specific dose requirement exists
- Multiple valid distributions satisfy constraints
- Clinical plans show significant variation

```
     [PTV 70Gy]  →  [No-Man's Land 10-50Gy]  →  [OAR <constraints]
     Deterministic      Flexible                 Deterministic
```
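One way to make this three-way partition concrete is a per-voxel label map. The sketch below assumes doses normalized so that 1.0 = 70 Gy, boolean PTV/OAR masks, and that PTV wins where PTV and OAR masks overlap. The function and label names are hypothetical, not an existing project API.

```python
import numpy as np

RX_GY = 70.0  # assumed: doses normalized so that 1.0 = 70 Gy
BACKGROUND, PTV, OAR, NML = 0, 1, 2, 3  # hypothetical region labels


def label_regions(dose, ptv_mask, oar_mask, low_gy=10.0, high_gy=50.0):
    """Assign each voxel to PTV, OAR, no-man's land (NML), or background."""
    dose = np.asarray(dose, dtype=float)
    ptv = np.asarray(ptv_mask, dtype=bool)
    oar = np.asarray(oar_mask, dtype=bool) & ~ptv  # assumption: PTV takes precedence on overlap
    # Flexible band: 10-50 Gy expressed in normalized units
    band = (dose > low_gy / RX_GY) & (dose < high_gy / RX_GY)
    labels = np.full(dose.shape, BACKGROUND, dtype=np.int8)
    labels[oar] = OAR
    labels[ptv] = PTV
    labels[band & ~ptv & ~oar] = NML
    return labels
```

Voxels above 50 Gy outside the PTV stay in the background class here; whether they belong to the flexible region is a separate modeling choice.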

### 3.3 Physics Constraints on Flexibility

Flexibility is **bounded**, not unlimited:

- **Beam physics:** Inverse square law, exponential attenuation, penumbra
- **Linac capabilities:** MLC leaf speeds, dose rates, energy limits
- **Build-up effects:** Near skin and tissue interfaces
- **Homogeneity:** No random hot spots (>105% Rx outside PTV)

**Risk of over-relaxation:** Unphysical artifacts, hot spots, poor homogeneity.
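The homogeneity bound can be monitored with a simple check on any predicted dose. This is a minimal sketch that assumes doses normalized to the 70 Gy prescription (so 105% Rx = 1.05) and uses the PTV mask only to exclude target voxels; `hot_spot_fraction` is a hypothetical helper name.

```python
import numpy as np


def hot_spot_fraction(dose, ptv_mask, limit=1.05):
    """Fraction of non-PTV voxels above `limit` x Rx (assumes Rx normalized to 1.0)."""
    outside = ~np.asarray(ptv_mask, dtype=bool)
    if not outside.any():
        return 0.0
    return float(np.mean(np.asarray(dose, dtype=float)[outside] > limit))
```

A value well above zero on a prediction would flag exactly the kind of unphysical artifact that over-relaxed metrics could let through.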

---

## 4. Reinterpreting DDPM Behavior

### 4.1 "More Steps = Worse"

**Original interpretation:** Model denoises away dose signal (structural failure).

**Alternative interpretation:** With more steps, the model converges to the **mean** of multiple valid solutions. This mean is blurred/averaged, producing higher MAE against a single ground truth that is just one of many valid options.

### 4.2 "Near-Zero Sample Variability"

**Original interpretation:** Model is deterministic, not generative.

**Alternative interpretation:** The model learned to output the average of valid solutions. Variability collapses because we trained with a pixel-wise loss that penalizes any deviation, forcing convergence to the mean.
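The "convergence to the mean" argument can be seen on a toy example. The two 1D profiles below are invented for illustration (not patient data): they agree in the PTV and differ only in where the low-dose spray lands. Under pixel-wise MSE, the best single prediction is their voxel-wise mean, which matches neither plan.

```python
import numpy as np

# Two hypothetical, equally valid normalized profiles: identical in the PTV
# (first 4 voxels), with the 10-50 Gy "spray" placed on different sides.
plan_a = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0])
plan_b = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.5, 0.5])

# If both plans are equally likely targets, the MSE-optimal prediction is their mean.
mse_optimal = (plan_a + plan_b) / 2


def expected_mse(pred):
    """Average MSE against both valid plans."""
    return 0.5 * np.mean((pred - plan_a) ** 2) + 0.5 * np.mean((pred - plan_b) ** 2)


print(expected_mse(mse_optimal), expected_mse(plan_a))  # 0.03125 0.0625
print(mse_optimal[4:])  # [0.25 0.25 0.25 0.25]: a blurred plateau in the NML
```

The mean scores better than either valid plan under MSE, yet it is a flattened distribution that no planner produced, which is the blurring described above.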

### 4.3 "DDPM = Baseline Accuracy"

**Original interpretation:** DDPM provides no benefit.

**Alternative interpretation:** Both converge to the same average because both use pixel-wise losses. DDPM's generative capacity is suppressed, not absent.

### 4.4 Key Question

**If we relax metrics in flexible regions and focus on DVH compliance, would DDPM show meaningful diversity in valid dose distributions?**

---

## 5. Evidence Assessment

### 5.1 Supporting Evidence

- **Clinical practice:** Planners produce different-looking plans that meet the same constraints
- **Inverse planning nature:** The optimization has many local minima that satisfy the constraints
- **Literature:** Some papers report 85-95% Gamma pass rates with relaxed criteria (5%/5mm)
- **Phase B results:** VGG improved MAE by 38% while Gamma was unchanged, which suggests that different valid distributions are possible

### 5.2 Counter-Evidence

- **DDPM volatility:** Extreme MAE range (12-64 Gy) during training suggests instability, not multi-modality
- **Clinician expectations:** May prefer predictable, reviewable plans over "creative" distributions
- **Small dataset:** n=23 may not capture true clinical variability

### 5.3 Validation Needed

Before committing to hypothesis:
1. Analyze ground-truth doses for actual variation in no-man's land
2. Compute region-specific Gamma (separate PTV vs flexible regions)
3. Test DVH-aware loss on baseline first (simpler than DDPM overhaul)

---

## 6. Proposed Validation Experiments

### 6.1 Low-Dose Variability Analysis (Diagnostic)

**Goal:** Quantify actual variation in clinical plans' no-man's land regions.

```python
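import numpy as np

RX_GY = 70.0  # doses are assumed normalized to the 70 Gy prescription


def analyze_low_dose_variability(cases, low_gy=10.0, high_gy=50.0):
    """
    Analyze ground-truth doses for variation in flexible regions.
    If clinical plans show significant diversity, the hypothesis is supported.

    Each case is a dict with a normalized 'dose' array and boolean arrays
    in case['masks'] keyed by structure name.
    """
    stats = {'mean': [], 'std': [], 'volume_fraction': []}
    for case in cases:
        dose = case['dose']
        # Define no-man's land: outside PTV, outside OARs, dose 10-50 Gy
        ptv_mask = case['masks']['PTV70'] | case['masks']['PTV56']
        oar_mask = case['masks']['Rectum'] | case['masks']['Bladder']
        in_band = (dose > low_gy / RX_GY) & (dose < high_gy / RX_GY)  # ~0.14-0.71 normalized
        no_mans_land = ~ptv_mask & ~oar_mask & in_band

        # Track statistics (NaN when a case has no NML voxels)
        dose_in_nml = dose[no_mans_land]
        stats['mean'].append(dose_in_nml.mean() if dose_in_nml.size else np.nan)
        stats['std'].append(dose_in_nml.std() if dose_in_nml.size else np.nan)
        stats['volume_fraction'].append(no_mans_land.sum() / dose.size)

    # Compare across cases with similar anatomies:
    # high inter-case variance -> hypothesis supported
    return {key: np.asarray(values) for key, values in stats.items()}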
```

**Expected outcome:** If the inter-case standard deviation of the mean NML dose exceeds 5 Gy (about 0.071 in normalized units, since 5/70 ≈ 0.071) across similar anatomies, significant flexibility exists.
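The 5 Gy criterion can be applied directly to the per-case NML means, converting back from normalized units. A small sketch, assuming the 70 Gy normalization, that cases have already been grouped by similar anatomy (not done here), and using the sample standard deviation (`ddof=1`), a choice the notebook does not specify. `nml_flexibility_check` is a hypothetical name.

```python
import numpy as np

RX_GY = 70.0  # assumed prescription used to normalize doses


def nml_flexibility_check(nml_means_normalized, threshold_gy=5.0):
    """Return (inter-case std of mean NML dose in Gy, whether it exceeds threshold_gy)."""
    means_gy = np.asarray(nml_means_normalized, dtype=float) * RX_GY
    means_gy = means_gy[~np.isnan(means_gy)]  # drop cases with no NML voxels
    if means_gy.size < 2:
        return float('nan'), False  # not enough cases to estimate spread
    spread_gy = float(np.std(means_gy, ddof=1))
    return spread_gy, spread_gy > threshold_gy
```

For example, normalized means of 0.30, 0.40, and 0.50 correspond to 21, 28, and 35 Gy, a spread of 7 Gy, which would pass the 5 Gy threshold.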

### 6.2 Region-Specific Gamma (Diagnostic)

**Goal:** Understand where prediction errors concentrate.

```python
def region_specific_gamma(pred_dose, gt_dose, masks, nml_threshold=10.0 / 70.0):
    """
    Compute Gamma pass rates separately for critical vs flexible regions.

    Assumes `compute_gamma(pred, gt, mask=...)` is a project helper (not defined
    in this notebook) returning the pass rate over the voxels in `mask`, and that
    masks are boolean arrays. `nml_threshold` is 10 Gy in normalized units.
    """
    results = {}

    # PTV region (critical - must be accurate)
    ptv_mask = masks['PTV70'] | masks['PTV56']
    results['PTV_gamma'] = compute_gamma(pred_dose, gt_dose, mask=ptv_mask)

    # OAR region (critical - must meet constraints)
    oar_mask = masks['Rectum'] | masks['Bladder']
    results['OAR_gamma'] = compute_gamma(pred_dose, gt_dose, mask=oar_mask)

    # No-man's land (flexible - variation acceptable): outside PTV/OAR, above 10 Gy
    nml_mask = ~ptv_mask & ~oar_mask & (gt_dose > nml_threshold)
    results['NML_gamma'] = compute_gamma(pred_dose, gt_dose, mask=nml_mask)

    return results
```

**Expected outcome:** If the PTV pass rate is much higher than the NML pass rate (PTV_gamma >> NML_gamma), failures concentrate in the flexible regions and may be clinically acceptable.
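`region_specific_gamma` relies on a `compute_gamma` helper that the notebook does not define. As a stand-in, here is a minimal brute-force global gamma sketch. Assumptions: ground truth is the reference, the dose criterion is global (relative to the maximum ground-truth dose), the search runs on the voxel grid without interpolation (so it is conservative), and the 3%/3 mm and 1 mm isotropic spacing defaults are hypothetical, since the notebook does not state which criteria its Gamma numbers use. For reported results, an established implementation such as PyMedPhys's gamma function would be the safer choice.

```python
import itertools

import numpy as np


def compute_gamma(pred_dose, gt_dose, mask, spacing_mm=(1.0, 1.0, 1.0),
                  dose_crit=0.03, dta_mm=3.0):
    """Brute-force global gamma pass rate over `mask` (reference = ground truth).

    Discrete grid search only: an approximation, not a clinical-grade implementation.
    """
    gt = np.asarray(gt_dose, dtype=float)
    pred = np.asarray(pred_dose, dtype=float)
    dd = dose_crit * gt.max()  # global dose-difference criterion
    radius = [int(np.ceil(dta_mm / s)) for s in spacing_mm]
    # NaN padding marks out-of-volume neighbours; np.fmin ignores NaN values
    padded = np.pad(pred, [(r, r) for r in radius], constant_values=np.nan)
    gamma_sq = np.full(gt.shape, np.inf)
    for offset in itertools.product(*[range(-r, r + 1) for r in radius]):
        dist_sq = sum((o * s) ** 2 for o, s in zip(offset, spacing_mm))
        if dist_sq > dta_mm ** 2:
            continue  # beyond the DTA, the distance term alone makes gamma > 1
        window = tuple(slice(r + o, r + o + n)
                       for r, o, n in zip(radius, offset, gt.shape))
        candidate = (padded[window] - gt) ** 2 / dd ** 2 + dist_sq / dta_mm ** 2
        gamma_sq = np.fmin(gamma_sq, candidate)
    passed = gamma_sq[np.asarray(mask, dtype=bool)] <= 1.0
    return float(passed.mean()) if passed.size else float('nan')
```

Skipping offsets beyond the DTA does not change pass/fail decisions on the grid, and it keeps the cost at roughly (number of voxels) × (neighbours inside the DTA sphere).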

### 6.3 DVH-Aware Loss on Baseline (Implementation)

**Goal:** Test if DVH-focused optimization improves clinical relevance.

```python
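import torch
import torch.nn as nn
import torch.nn.functional as F

RX_GY = 70.0  # assumed: doses are normalized so that 1.0 = 70 Gy


class DVHAwareLoss(nn.Module):
    def __init__(self, structures=('PTV70', 'PTV56'), constraints=None, tau=0.01):
        super().__init__()
        self.structures = structures          # PTVs whose D95 must match
        self.constraints = constraints or {}  # e.g. {'Rectum': {'type': 'V70', 'limit': 0.15}}
        self.tau = tau                        # sigmoid temperature for soft Vx

    def forward(self, pred_dose, gt_dose, masks):
        loss = pred_dose.new_zeros(())

        # PTV D95 must match (high weight)
        for ptv in self.structures:
            if not masks[ptv].any():
                continue
            pred_d95 = self.compute_d95(pred_dose, masks[ptv])
            gt_d95 = self.compute_d95(gt_dose, masks[ptv])
            loss = loss + 10.0 * F.mse_loss(pred_d95, gt_d95)

        # OAR constraints: one-sided penalty, zero when the constraint is met
        for oar, constraint in self.constraints.items():
            if not masks[oar].any():
                continue
            pred_metric = self.compute_dvh_metric(pred_dose, masks[oar], constraint['type'])
            loss = loss + F.relu(pred_metric - constraint['limit']) ** 2

        # Relaxed pixel-wise loss in flexible regions (lower weight)
        nml_mask = self.get_no_mans_land_mask(gt_dose, masks)
        if nml_mask.any():
            loss = loss + 0.5 * F.mse_loss(pred_dose[nml_mask], gt_dose[nml_mask])

        return loss

    def compute_d95(self, dose, mask):
        """D95 via sorting; gradients reach only the selected voxel (not a smooth histogram)."""
        sorted_dose, _ = torch.sort(dose[mask], descending=True)
        idx_95 = min(int(0.95 * sorted_dose.numel()), sorted_dose.numel() - 1)
        return sorted_dose[idx_95]

    def compute_dvh_metric(self, dose, mask, metric_type):
        """'Dmean', 'D95', or 'V<x>' (soft fraction of volume receiving >= x Gy)."""
        dose_in_struct = dose[mask]
        if metric_type == 'Dmean':
            return dose_in_struct.mean()  # limit given in normalized dose
        if metric_type == 'D95':
            return self.compute_d95(dose, mask)
        if metric_type.startswith('V'):
            threshold = float(metric_type[1:]) / RX_GY  # limit given as volume fraction
            return torch.sigmoid((dose_in_struct - threshold) / self.tau).mean()
        raise ValueError(f"Unsupported DVH metric: {metric_type}")

    def get_no_mans_land_mask(self, gt_dose, masks):
        """Outside all PTVs and constrained OARs, with ground-truth dose in 10-50 Gy."""
        excluded = torch.zeros_like(gt_dose, dtype=torch.bool)
        for name in list(self.structures) + list(self.constraints):
            excluded = excluded | masks[name]
        return ~excluded & (gt_dose > 10.0 / RX_GY) & (gt_dose < 50.0 / RX_GY)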
```

---

## 7. Decision Framework

### 7.1 Path Forward

```
Phase 1: Validate Hypothesis (Low effort)
├── Run analyze_low_dose_variability on ground-truth data
├── Run region_specific_gamma on existing predictions
└── IF hypothesis supported → Proceed to Phase 2
    ELSE → Continue with current pixel-wise approach

Phase 2: DVH-Aware Loss on Baseline (Medium effort)
├── Implement DVHAwareLoss class
├── Train baseline U-Net with DVH loss
├── Evaluate: DVH compliance, region-specific Gamma
└── IF Gamma ≥ 50% → Success! Continue tuning
    ELSE (e.g. plateau near the current ~30%) → Proceed to Phase 3

Phase 3: Structure-Weighted Loss (Medium effort)
├── Implement region-weighted MSE
├── 2x weight PTV, 1.5x weight OAR, 0.5x weight NML
└── IF combined with DVH achieves Gamma ≥ 50% → Success!
    ELSE → Consider Phase 4

Phase 4: Physics-Bounded DDPM (High effort - only if needed)
├── Region-aware noise schedules
├── Physics-informed regularizers
├── Bounded multi-modality sampling
└── Only pursue if baseline + DVH + structure-weighted < 50% Gamma
```
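Phase 3's region-weighted MSE amounts to a per-voxel weight map. The NumPy sketch below uses the 2x/1.5x/0.5x weights from the roadmap. The 1.0 weight for background voxels, the PTV > OAR > NML precedence on overlaps, and normalizing by the total weight (rather than the voxel count) are assumptions. A PyTorch `StructureWeightedMSE` would follow the same arithmetic; the function names here are hypothetical.

```python
import numpy as np

# Phase 3 weights; 1.0 for all other voxels is an assumption
REGION_WEIGHTS = {'PTV': 2.0, 'OAR': 1.5, 'NML': 0.5}


def structure_weight_map(ptv_mask, oar_mask, nml_mask, weights=REGION_WEIGHTS):
    """Per-voxel weights; later assignments win, so PTV > OAR > NML on overlap."""
    w = np.ones(np.shape(ptv_mask), dtype=float)
    w[np.asarray(nml_mask, dtype=bool)] = weights['NML']
    w[np.asarray(oar_mask, dtype=bool)] = weights['OAR']
    w[np.asarray(ptv_mask, dtype=bool)] = weights['PTV']
    return w


def structure_weighted_mse(pred_dose, gt_dose, weight_map):
    """Weighted mean squared error, normalized by the total weight."""
    sq_err = (np.asarray(pred_dose, dtype=float) - np.asarray(gt_dose, dtype=float)) ** 2
    return float(np.sum(weight_map * sq_err) / np.sum(weight_map))
```

Normalizing by the total weight keeps the loss on the same scale as plain MSE when every voxel has the same error, which makes it easier to combine with the DVH terms.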

### 7.2 Success Criteria

| Metric | Current | Target | Clinical Requirement |
|--------|---------|--------|----------------------|
| Overall Gamma | ~28% | 50% → 95% | 95% for deployment |
| PTV Gamma | Unknown | >95% | Critical |
| PTV D95 error | Unknown | <2 Gy | Critical |
| OAR constraint compliance | Unknown | >95% | Critical |
| NML Gamma | Unknown | >80% | Acceptable if others pass |

### 7.3 DDPM Revisit Criteria

**Revisit DDPM if:**
- Baseline + DVH + structure-weighted plateaus at ~30% Gamma
- Region-specific analysis shows NML failures dominate
- Ground-truth analysis confirms significant clinical variability

**Don't revisit DDPM if:**
- DVH-aware baseline achieves >50% Gamma
- Region-specific analysis shows PTV failures dominate
- Ground-truth analysis shows minimal clinical variability

---

## 8. Implementation Recommendations

### 8.1 Immediate Actions

1. **Implement DVH-aware loss** in `train_baseline_unet.py`
   - Add `--use_dvh_loss` flag
   - Implement differentiable D95, Dmean, Vx
   - Priority: HIGH

2. **Add region-specific Gamma** to evaluation
   - Modify test evaluation to compute PTV/OAR/NML Gamma separately
   - Priority: MEDIUM

3. **Run low-dose variability analysis** on existing data
   - Create analysis script
   - Priority: MEDIUM
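Of the three metrics, Vx is the one that needs a relaxation to be differentiable, because a hard dose threshold has zero gradient almost everywhere. One common choice, assumed here, replaces the step with a sigmoid of temperature `tau`. The sketch compares the two in NumPy on normalized doses (1.0 = 70 Gy); `hard_vx` and `soft_vx` are hypothetical names.

```python
import numpy as np

RX_GY = 70.0  # assumed normalization (1.0 = 70 Gy)


def hard_vx(dose_in_struct, x_gy):
    """Fraction of the structure receiving at least x_gy (non-differentiable)."""
    return float(np.mean(np.asarray(dose_in_struct) >= x_gy / RX_GY))


def soft_vx(dose_in_struct, x_gy, tau=0.01):
    """Sigmoid relaxation of Vx; approaches hard_vx as tau -> 0."""
    z = (np.asarray(dose_in_struct, dtype=float) - x_gy / RX_GY) / tau
    # sigmoid(z) = 0.5 * (1 + tanh(z / 2)) avoids overflow in exp for large |z|
    return float(np.mean(0.5 * (1.0 + np.tanh(z / 2.0))))
```

Smaller `tau` tracks the clinical metric more closely but gives sharper, sparser gradients; larger `tau` smooths training at the cost of bias, so the value is a tuning choice.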

### 8.2 Code Changes Required

```
src/models/losses.py:
  + class DVHAwareLoss
  + class StructureWeightedMSE

scripts/train_baseline_unet.py:
  + --use_dvh_loss flag
  + --use_structure_weighted flag

scripts/evaluate_dose.py:
  + region_specific_gamma()
  + dvh_compliance_check()

scripts/analyze_dose_variability.py (new):
  + analyze_low_dose_variability()
  + compute_no_mans_land_stats()
```
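`dvh_compliance_check()` is only named above. A possible NumPy sketch for the evaluation side (hard, non-differentiable metrics) follows. The constraint schema (`type`, `limit`, optional `direction`) is a hypothetical format chosen so that both table examples, PTV D95 ≥ 95% of 70 Gy and Rectum V70 < 15%, can be expressed; doses are assumed normalized to 70 Gy.

```python
import numpy as np

RX_GY = 70.0  # assumed normalization (1.0 = 70 Gy)


def dvh_compliance_check(dose, masks, constraints):
    """Return {structure: (value, limit, passed)}.

    Hypothetical schema: {'type': 'Dmean' | 'D<p>' | 'V<x>', 'limit': float,
    'direction': 'max' (default) or 'min'}. Dose limits are normalized doses;
    Vx limits are volume fractions.
    """
    report = {}
    for name, c in constraints.items():
        d = np.asarray(dose, dtype=float)[np.asarray(masks[name], dtype=bool)]
        kind = c['type']
        if kind == 'Dmean':
            value = float(d.mean())
        elif kind.startswith('D'):
            # D<p>: dose received by at least p% of the volume = (100 - p)th percentile
            value = float(np.percentile(d, 100.0 - float(kind[1:])))
        elif kind.startswith('V'):
            value = float(np.mean(d >= float(kind[1:]) / RX_GY))
        else:
            raise ValueError(f"Unsupported constraint type: {kind}")
        if c.get('direction', 'max') == 'min':
            passed = value >= c['limit']
        else:
            passed = value <= c['limit']
        report[name] = (value, c['limit'], bool(passed))
    return report


# Example: the table's PTV and rectum constraints
example_constraints = {
    'PTV70': {'type': 'D95', 'limit': 0.95, 'direction': 'min'},
    'Rectum': {'type': 'V70', 'limit': 0.15},
}
```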

### 8.3 Risk Mitigation

| Risk | Mitigation |
|------|------------|
| Over-relaxing metrics → unphysical artifacts | Add hot-spot penalty, physics constraints |
| DVH loss unstable during training | Start with low weight, gradually increase |
| Region masks inaccurate | Validate masks visually, use SDF gradients |
| DDPM pivot wastes time | Only pursue if baseline approaches proven insufficient |
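The "start with low weight, gradually increase" mitigation can be as simple as a linear warm-up on the DVH term. The sketch below is one option; the function name and the default start weight, target weight, and warm-up length are hypothetical placeholders to be tuned, not values from our experiments.

```python
def dvh_loss_weight(epoch, target_weight=1.0, start_weight=0.01, warmup_epochs=20):
    """Linearly ramp the DVH-loss weight from start_weight to target_weight."""
    if epoch >= warmup_epochs:
        return target_weight
    frac = epoch / warmup_epochs
    return start_weight + frac * (target_weight - start_weight)
```

In a training loop this would enter as `total = pixel_loss + dvh_loss_weight(epoch) * dvh_loss`, so the pixel-wise term dominates early training and the DVH term takes over once predictions are roughly in place.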

---

## 9. Conclusion

The semi-multi-modal hypothesis offers a promising reframing of our dose prediction challenge:

1. **Current metrics may be too strict** - penalizing valid variations in flexible regions
2. **DVH-focused losses** may unlock clinically-relevant optimization
3. **DDPM is not necessarily unsuitable** - may need proper region-aware metrics
4. **Validation before commitment** - test hypothesis with low-effort experiments first

**Next steps:**
1. Implement DVH-aware loss (highest priority)
2. Run region-specific Gamma analysis (diagnostic)
3. Analyze ground-truth variability (validation)
4. Revisit DDPM only if baseline approaches insufficient

---

*Notebook created: 2026-01-21*  
*Based on discussion with Grok AI regarding dose prediction multi-modality*  
*Status: Hypothesis under investigation*