# Improved Two-Stream CNN: RGB + Scalar NDVI

## Solution to Performance Issue

**Problem with v28:** The NDVI "image" had no spatial variation → the CNN couldn't learn useful patterns from it  
**Solution:** Use NDVI as a scalar feature with an MLP encoder instead of a fake image

### Architecture

```
RGB Image (3×224×224)           NDVI Scalar (1)
        ↓                              ↓
    ResNet18                      MLP Encoder
        ↓                              ↓
   [512 features]               [128 features]
        ↓                              ↓
        └──────── Concatenate ─────────┘
                      ↓
                Fusion MLP
                      ↓
            [5 Biomass Targets]
```
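
As a rough check on the diagram, the fused width and the size of the two MLP blocks can be worked out by hand. The sketch below assumes the layer sizes used in Cell 5 of this notebook (1→64→128 for the NDVI encoder, 640→512→256→5 for the fusion head). `linear_params` is a helper introduced here for illustration only.

```python
def linear_params(n_in, n_out):
    """Parameters in a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


rgb_dim, ndvi_dim, hidden_dim, num_outputs = 512, 128, 512, 5

# Width of the concatenated feature vector fed to the fusion MLP
combined_dim = rgb_dim + ndvi_dim

# NDVI encoder: Linear(1, 64) -> Linear(64, 128)
ndvi_encoder_params = linear_params(1, 64) + linear_params(64, ndvi_dim)

# Fusion head: Linear(640, 512) -> Linear(512, 256) -> Linear(256, 5)
fusion_params = (
    linear_params(combined_dim, hidden_dim)
    + linear_params(hidden_dim, hidden_dim // 2)
    + linear_params(hidden_dim // 2, num_outputs)
)

print(f"Combined features: {combined_dim}")
print(f"NDVI encoder params: {ndvi_encoder_params:,}")
print(f"Fusion head params: {fusion_params:,}")
```

The NDVI encoder is tiny next to the ResNet18 backbone, so almost all of the model's capacity stays in the RGB stream.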

---

In [None]:
# Cell 1: Setup & Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image

from sklearn.model_selection import train_test_split
from tqdm.auto import tqdm
from datetime import datetime

sns.set_style('whitegrid')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

# Reproducibility
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

print("✓ Setup complete")

In [None]:
# Cell 2: Configuration

# Model configuration
CONFIG = {
    # Architecture
    'rgb_backbone': 'resnet18',
    'ndvi_hidden': 128,  # NDVI encoder output dimension
    'hidden_dim': 512,
    'dropout': 0.3,
    
    # Training
    'epochs': 40,  # Increased since architecture is better
    'batch_size': 16,
    'learning_rate': 3e-4,
    'weight_decay': 1e-4,
    'val_split': 0.2,
    
    # Data
    'image_size': 224,
    'augmentation': True,
}

# Paths
DATA_PATH = '../../competition/train_enriched.csv'
IMAGE_BASE = '../../competition/'

# Target columns
TARGET_COLS = ['Dry_Green_g', 'Dry_Dead_g', 'Dry_Clover_g', 'GDM_g', 'Dry_Total_g']

print("Configuration:")
print(f"  Model: RGB Stream (ResNet18) + NDVI Encoder (MLP)")
print(f"  RGB features: 512")
print(f"  NDVI features: {CONFIG['ndvi_hidden']}")
print(f"  Epochs: {CONFIG['epochs']}")
print(f"  Batch size: {CONFIG['batch_size']}")
print(f"  Learning rate: {CONFIG['learning_rate']}")
print(f"\n✓ Configuration loaded")

In [None]:
# Cell 3: Load & Prepare Data

print("Loading data...")
df = pd.read_csv(DATA_PATH)
df['Sampling_Date'] = pd.to_datetime(df['Sampling_Date'])
df['full_image_path'] = df['image_path'].apply(lambda x: f"{IMAGE_BASE}{x}")

print(f"Total samples: {len(df)}")
print(f"\nTarget statistics:")
for col in TARGET_COLS:
    print(f"  {col:15s}: mean={df[col].mean():7.2f}g, std={df[col].std():7.2f}g")

print(f"\nNDVI statistics:")
print(f"  Range: [{df['Pre_GSHH_NDVI'].min():.3f}, {df['Pre_GSHH_NDVI'].max():.3f}]")
print(f"  Mean: {df['Pre_GSHH_NDVI'].mean():.3f}")
print(f"  Std: {df['Pre_GSHH_NDVI'].std():.3f}")

# Train/val split
train_df, val_df = train_test_split(
    df, 
    test_size=CONFIG['val_split'], 
    random_state=42,
    stratify=df['State']
)

print(f"\nSplit:")
print(f"  Training: {len(train_df)} images")
print(f"  Validation: {len(val_df)} images")

# Calculate normalization statistics on TRAINING set
target_means = torch.tensor([train_df[col].mean() for col in TARGET_COLS], dtype=torch.float32)
target_stds = torch.tensor([train_df[col].std() for col in TARGET_COLS], dtype=torch.float32)

ndvi_mean = train_df['Pre_GSHH_NDVI'].mean()
ndvi_std = train_df['Pre_GSHH_NDVI'].std()

print(f"\n✓ Data loaded and split")

In [None]:
# Cell 4: Improved Dataset (NDVI as scalar)

class ImprovedTwoStreamDataset(Dataset):
    """Dataset that returns RGB image and NDVI as SCALAR feature."""
    
    def __init__(self, dataframe, target_means, target_stds, 
                 ndvi_mean, ndvi_std, augment=False):
        self.df = dataframe.reset_index(drop=True)
        self.target_means = target_means
        self.target_stds = target_stds
        self.ndvi_mean = ndvi_mean
        self.ndvi_std = ndvi_std
        
        # RGB transforms
        rgb_transform_list = [transforms.Resize((CONFIG['image_size'], CONFIG['image_size']))]  
        if augment:
            rgb_transform_list.extend([
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(10),
            ])
        rgb_transform_list.extend([
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
        self.rgb_transform = transforms.Compose(rgb_transform_list)
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        
        # 1. RGB image
        rgb_img = Image.open(row['full_image_path']).convert('RGB')
        rgb_img = self.rgb_transform(rgb_img)  # [3, 224, 224]
        
        # 2. NDVI as SCALAR (not fake image!)
        ndvi_value = row['Pre_GSHH_NDVI']
        ndvi_normalized = (ndvi_value - self.ndvi_mean) / self.ndvi_std
        ndvi_scalar = torch.tensor([ndvi_normalized], dtype=torch.float32)  # [1]
        
        # 3. Targets
        targets = torch.tensor(row[TARGET_COLS].values.astype('float32'), dtype=torch.float32)
        targets_normalized = (targets - self.target_means) / self.target_stds
        
        return {
            'rgb_image': rgb_img,
            'ndvi_scalar': ndvi_scalar,  # KEY CHANGE: scalar not image
            'targets': targets_normalized,
            'targets_original': targets
        }

print("✓ ImprovedTwoStreamDataset defined")

In [None]:
# Cell 5: Improved Model Architecture

class ImprovedTwoStreamModel(nn.Module):
    """RGB Stream (CNN) + NDVI Stream (MLP) → Fusion → Biomass Prediction."""
    
    def __init__(self, num_outputs=5, ndvi_hidden=128, hidden_dim=512, dropout=0.3):
        super().__init__()
        
        # Stream 1: RGB (ResNet18 backbone)
        resnet_rgb = models.resnet18(weights=None)
        self.rgb_stream = nn.Sequential(*list(resnet_rgb.children())[:-1])
        rgb_feature_dim = 512
        
        # Stream 2: NDVI (MLP encoder for scalar input)
        self.ndvi_stream = nn.Sequential(
            nn.Linear(1, 64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, ndvi_hidden),
            nn.ReLU(),
            nn.Dropout(dropout)
        )
        
        # Fusion layers
        combined_dim = rgb_feature_dim + ndvi_hidden
        self.fusion = nn.Sequential(
            nn.Linear(combined_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim // 2, num_outputs)
        )
    
    def forward(self, rgb_img, ndvi_scalar):
        # Extract features from RGB stream
        rgb_features = self.rgb_stream(rgb_img).flatten(1)  # [batch, 512]
        
        # Process NDVI through MLP
        ndvi_features = self.ndvi_stream(ndvi_scalar)  # [batch, 128]
        
        # Concatenate features
        combined = torch.cat([rgb_features, ndvi_features], dim=1)  # [batch, 640]
        
        # Fusion and prediction
        output = self.fusion(combined)  # [batch, 5]
        
        return output

# Create model
model = ImprovedTwoStreamModel(
    num_outputs=5,
    ndvi_hidden=CONFIG['ndvi_hidden'],
    hidden_dim=CONFIG['hidden_dim'],
    dropout=CONFIG['dropout']
)
model = model.to(device)

print("✓ ImprovedTwoStreamModel defined")
print(f"  Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"  RGB stream: {sum(p.numel() for p in model.rgb_stream.parameters()):,} params")
print(f"  NDVI stream: {sum(p.numel() for p in model.ndvi_stream.parameters()):,} params")
print(f"  Fusion layers: {sum(p.numel() for p in model.fusion.parameters()):,} params")
print(f"\nArchitecture:")
print(f"  RGB → 512 features")
print(f"  NDVI → {CONFIG['ndvi_hidden']} features")
print(f"  Combined → {512 + CONFIG['ndvi_hidden']} features → Fusion → 5 outputs")

In [None]:
# Cell 6: Create Dataloaders

train_dataset = ImprovedTwoStreamDataset(
    train_df, target_means, target_stds, 
    ndvi_mean, ndvi_std, augment=CONFIG['augmentation']
)

val_dataset = ImprovedTwoStreamDataset(
    val_df, target_means, target_stds,
    ndvi_mean, ndvi_std, augment=False
)

train_loader = DataLoader(
    train_dataset, 
    batch_size=CONFIG['batch_size'], 
    shuffle=True,
    num_workers=0
)

val_loader = DataLoader(
    val_dataset,
    batch_size=CONFIG['batch_size'],
    shuffle=False,
    num_workers=0
)

print("✓ Dataloaders created")
print(f"  Training batches: {len(train_loader)}")
print(f"  Validation batches: {len(val_loader)}")

# Test batch
sample_batch = next(iter(train_loader))
print(f"\nSample batch shapes:")
print(f"  RGB image: {sample_batch['rgb_image'].shape}")
print(f"  NDVI scalar: {sample_batch['ndvi_scalar'].shape}  ← KEY CHANGE: scalar not image!")
print(f"  Targets: {sample_batch['targets'].shape}")

# Verify NDVI values have variation
print(f"\nNDVI batch statistics (normalized):")
print(f"  Min: {sample_batch['ndvi_scalar'].min():.3f}")
print(f"  Max: {sample_batch['ndvi_scalar'].max():.3f}")
print(f"  Std: {sample_batch['ndvi_scalar'].std():.3f}")
print(f"  ✓ NDVI has variation across samples (unlike fake constant images)")

In [None]:
# Cell 7: Training Setup

# Loss function
criterion = nn.MSELoss()

# Optimizer
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=CONFIG['learning_rate'],
    weight_decay=CONFIG['weight_decay']
)

# Learning rate scheduler
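# Note: the `verbose` argument is deprecated in recent PyTorch releases, so it is omitted here;
# read the current LR from optimizer.param_groups[0]['lr'] if it needs to be logged.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)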

print("✓ Training setup complete")
print(f"  Optimizer: AdamW")
print(f"  Learning rate: {CONFIG['learning_rate']}")
print(f"  Weight decay: {CONFIG['weight_decay']}")
print(f"  Scheduler: ReduceLROnPlateau")

In [None]:
# Cell 8: Training Functions

def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    
    for batch in tqdm(loader, desc='Training', leave=False):
        rgb_imgs = batch['rgb_image'].to(device)
        ndvi_scalars = batch['ndvi_scalar'].to(device)  # Changed: scalar not image
        targets = batch['targets'].to(device)
        
        optimizer.zero_grad()
        outputs = model(rgb_imgs, ndvi_scalars)  # Changed signature
        loss = criterion(outputs, targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
        optimizer.step()
        
        total_loss += loss.item() * rgb_imgs.size(0)
    
    return total_loss / len(loader.dataset)

def validate_epoch(model, loader, criterion, device, target_means, target_stds):
    model.eval()
    total_loss = 0
    all_preds = []
    all_targets = []
    
    with torch.no_grad():
        for batch in tqdm(loader, desc='Validating', leave=False):
            rgb_imgs = batch['rgb_image'].to(device)
            ndvi_scalars = batch['ndvi_scalar'].to(device)  # Changed: scalar not image
            targets = batch['targets'].to(device)
            targets_original = batch['targets_original']
            
            outputs = model(rgb_imgs, ndvi_scalars)  # Changed signature
            loss = criterion(outputs, targets)
            
            # Denormalize predictions
            outputs_denorm = outputs.cpu() * target_stds + target_means
            
            total_loss += loss.item() * rgb_imgs.size(0)
            all_preds.append(outputs_denorm.numpy())
            all_targets.append(targets_original.numpy())
    
    all_preds = np.vstack(all_preds)
    all_targets = np.vstack(all_targets)
    
    # Calculate R² for each target
    r2_scores = []
    for i in range(5):
        ss_res = np.sum((all_targets[:, i] - all_preds[:, i]) ** 2)
        ss_tot = np.sum((all_targets[:, i] - np.mean(all_targets[:, i])) ** 2)
        r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0.0
        r2_scores.append(r2)
    
    # Weighted R² (competition metric)
    weights = np.array([0.1, 0.1, 0.1, 0.2, 0.5])
    weighted_r2 = np.sum(np.array(r2_scores) * weights) / np.sum(weights)
    
    return total_loss / len(loader.dataset), weighted_r2, r2_scores

print("✓ Training functions defined")

In [None]:
# Cell 9: Main Training Loop

print("="*80)
print("STARTING TRAINING - IMPROVED TWO-STREAM MODEL")
print("="*80)
print(f"Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Epochs: {CONFIG['epochs']}")
print(f"Batch size: {CONFIG['batch_size']}")
print(f"Architecture: RGB (ResNet18) + NDVI (MLP) → Fusion")
print("\n")

best_val_r2 = -float('inf')
history = {'train_loss': [], 'val_loss': [], 'val_r2': []}

for epoch in range(CONFIG['epochs']):
    print(f"Epoch {epoch+1}/{CONFIG['epochs']}")
    
    # Train
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Validate
    val_loss, val_r2, r2_scores = validate_epoch(
        model, val_loader, criterion, device, target_means, target_stds
    )
    
    # Learning rate scheduling
    scheduler.step(val_loss)
    
    # Log
    history['train_loss'].append(train_loss)
    history['val_loss'].append(val_loss)
    history['val_r2'].append(val_r2)
    
    print(f"  Train Loss: {train_loss:.4f}")
    print(f"  Val Loss:   {val_loss:.4f}")
    print(f"  Val R²:     {val_r2:+.4f}")
    print(f"  Per-target R²: [{', '.join([f'{r:+.3f}' for r in r2_scores])}]")
    
    # Save best model
    if val_r2 > best_val_r2:
        best_val_r2 = val_r2
        torch.save(model.state_dict(), 'two_stream_improved_best.pth')
        print(f"  💾 New best model saved! R²={best_val_r2:+.4f}")
    
    print()

print("="*80)
print("TRAINING COMPLETE")
print("="*80)
print(f"Best Val R²: {best_val_r2:+.4f}")
print(f"Finished: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# Cell 10: Plot Training History

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss plot
axes[0].plot(history['train_loss'], label='Train Loss', marker='o')
axes[0].plot(history['val_loss'], label='Val Loss', marker='s')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('MSE Loss')
axes[0].set_title('Training and Validation Loss')
axes[0].legend()
axes[0].grid(True)

# R² plot
axes[1].plot(history['val_r2'], label='Val R² (Improved)', marker='o', color='green')
axes[1].axhline(y=0.51, color='red', linestyle='--', label='Baseline Kaggle (0.51)')
axes[1].axhline(y=0.69, color='orange', linestyle='--', label='Baseline Val (0.69)')
axes[1].axhline(y=0.327, color='purple', linestyle='--', label='Old Two-Stream (0.327)')
axes[1].axhline(y=best_val_r2, color='blue', linestyle='--', label=f'Best ({best_val_r2:+.4f})')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('R² Score')
axes[1].set_title('Validation R² Comparison')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.savefig('training_history_improved.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Training history plotted")

In [None]:
# Cell 11: Load Best Model & Final Evaluation

print("Loading best model...")
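# map_location keeps the load working when the checkpoint was saved on a different device
model.load_state_dict(torch.load('two_stream_improved_best.pth', map_location=device))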
model.eval()

# Final validation
val_loss, val_r2, r2_scores = validate_epoch(
    model, val_loader, criterion, device, target_means, target_stds
)

print("="*80)
print("FINAL EVALUATION - IMPROVED TWO-STREAM MODEL")
print("="*80)
print(f"\nValidation Results:")
print(f"  Loss: {val_loss:.4f}")
print(f"  Weighted R²: {val_r2:+.4f}")
print(f"\nPer-target R²:")
for i, col in enumerate(TARGET_COLS):
    print(f"  {col:15s}: {r2_scores[i]:+.4f}")

print(f"\n{'='*80}")
print("COMPARISON WITH ALL PREVIOUS ATTEMPTS")
print(f"{'='*80}")
print(f"  Baseline (single ResNet18):        Val R² = ~0.69, Kaggle = 0.51")
print(f"  K-Fold (all features):             Val R² = ~0.90, Kaggle = 0.50")
print(f"  K-Fold (universal features):       Val R² = ~0.68, Kaggle = 0.40")
print(f"  Two-Stream (fake NDVI image):      Val R² = +0.327 ❌")
print(f"  Two-Stream Improved (NDVI scalar): Val R² = {val_r2:+.4f} ← THIS MODEL")

# Performance assessment
improvement_over_old = val_r2 - 0.327
print(f"\n{'='*80}")
print("PERFORMANCE ASSESSMENT")
print(f"{'='*80}")
print(f"Improvement over old two-stream: {improvement_over_old:+.4f}")

if val_r2 >= 0.69:
    print(f"\n✅ EXCELLENT! Val R² ({val_r2:+.4f}) matches baseline validation")
    print(f"   Expected Kaggle: 0.51-0.54")
    print(f"   Recommendation: Create Kaggle submission!")
elif val_r2 >= 0.60:
    print(f"\n✅ GOOD! Val R² ({val_r2:+.4f}) is respectable")
    print(f"   Expected Kaggle: 0.48-0.52")
    print(f"   Recommendation: Worth submitting to Kaggle")
elif val_r2 >= 0.50:
    print(f"\n⚠️  MODERATE. Val R² ({val_r2:+.4f}) is decent")
    print(f"   Expected Kaggle: 0.45-0.50")
    print(f"   Recommendation: Try submission, but keep exploring")
else:
    print(f"\n❌ POOR. Val R² ({val_r2:+.4f}) is still low")
    print(f"   Expected Kaggle: <0.45")
    print(f"   Recommendation: Try different approach (add more features, different architecture)")

print(f"\n{'='*80}")

---

## Analysis: Why This Should Work Better

### Problem with Old Architecture (v28)
- NDVI "image" = constant value replicated 224×224 times
- No spatial variation → CNN filters learn nothing useful
- Result: Val R² = 0.327 (terrible)
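
To see why a constant NDVI image carries no spatial signal, consider what a convolution does to it. The sketch below is a pure-Python illustration, not code from v28: `conv2d_valid`, the 8×8 size, and the NDVI value 0.62 are all assumptions chosen for brevity.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with no padding (illustration only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


ndvi_value = 0.62  # hypothetical NDVI reading
fake_image = [[ndvi_value] * 8 for _ in range(8)]  # constant "image"

edge_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # Sobel filter, weights sum to 0
blur_kernel = [[1 / 9] * 3 for _ in range(3)]       # box blur, weights sum to 1

edge_out = conv2d_valid(fake_image, edge_kernel)
blur_out = conv2d_valid(fake_image, blur_kernel)

# Collect the distinct values in each feature map (rounded to absorb float noise)
edge_values = {round(v, 9) for row in edge_out for v in row}
blur_values = {round(v, 9) for row in blur_out for v in row}

print("Distinct edge-filter outputs:", edge_values)
print("Distinct blur-filter outputs:", blur_values)
```

Every output position holds the same number: the NDVI value times the sum of the kernel weights. A zero-sum edge filter therefore returns zero everywhere. With zero padding, as in ResNet18, the borders would differ, but only because of the padding and not because of anything in the data.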

### Solution in This Architecture (v29)
- NDVI is treated as **scalar feature** (what it actually is!)
- MLP encoder learns appropriate non-linear transformations
- RGB stream still extracts spatial features from images
- Fusion combines spatial (RGB) + scalar (NDVI) information

### Expected Outcome
- Should beat old two-stream (0.327) significantly
- Should be competitive with baseline (~0.69 val)
- Whether it beats baseline depends on whether NDVI encoder adds value beyond what RGB already captures
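
One way to test whether the NDVI encoder adds value is an ablation: score the validation set once with the real NDVI inputs and once with them zeroed (zero is the training mean after normalization), then compare the weighted R². The sketch below reimplements the metric in plain Python using the weights from Cell 8. The toy targets and predictions are made up for illustration and are not results from this notebook.

```python
WEIGHTS = [0.1, 0.1, 0.1, 0.2, 0.5]  # per-target weights, as in Cell 8


def r2_score(y_true, y_pred):
    """Coefficient of determination for one target column."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def weighted_r2(targets, preds, weights=WEIGHTS):
    """Weighted average of per-column R²; rows are samples, columns are targets."""
    scores = [
        r2_score([row[k] for row in targets], [row[k] for row in preds])
        for k in range(len(weights))
    ]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Made-up data: 3 samples x 5 targets
targets = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 6, 9, 12, 15]]
preds_with_ndvi = [[t + 0.1 for t in row] for row in targets]  # small error
preds_ablated = [[t + 0.5 for t in row] for row in targets]    # larger error

ndvi_gain = weighted_r2(targets, preds_with_ndvi) - weighted_r2(targets, preds_ablated)
print(f"Weighted R² gain attributed to NDVI (toy data): {ndvi_gain:+.4f}")
```

In the notebook itself, the two prediction sets would come from running `validate_epoch` on the same checkpoint, once unchanged and once with `ndvi_scalars` replaced by `torch.zeros_like(ndvi_scalars)`. A gain near zero would suggest the RGB stream already captures what NDVI offers.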

---

## Next Steps

### If Val R² ≥ 0.65
1. Create Kaggle submission notebook
2. Upload checkpoint as dataset
3. Submit to competition

### If Val R² ≈ 0.50-0.64
1. Add more scalar features (Height, Season, Daylength)
2. Try attention-based fusion instead of concatenation
3. Try K-Fold ensemble of this architecture
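
Adding more scalar features mostly means widening the vector that feeds the NDVI encoder. The sketch below shows one possible encoding: z-scored continuous values plus a one-hot season. The field names (`height_cm`, `daylength_h`, `season`), the season categories, and the statistics are all hypothetical; the real column names in `train_enriched.csv` may differ.

```python
SEASONS = ["summer", "autumn", "winter", "spring"]  # assumed category set
CONTINUOUS = ("ndvi", "height_cm", "daylength_h")   # hypothetical field names


def zscore(value, mean, std):
    """Standardize with training-set statistics; guard against zero std."""
    return (value - mean) / std if std > 0 else 0.0


def encode_scalars(sample, stats):
    """Return a flat feature list: standardized continuous values + one-hot season."""
    continuous = [zscore(sample[name], *stats[name]) for name in CONTINUOUS]
    season_one_hot = [1.0 if sample["season"] == s else 0.0 for s in SEASONS]
    return continuous + season_one_hot


# Made-up (mean, std) pairs standing in for statistics computed on train_df
stats = {"ndvi": (0.65, 0.15), "height_cm": (10.0, 5.0), "daylength_h": (12.0, 2.0)}
sample = {"ndvi": 0.80, "height_cm": 15.0, "daylength_h": 10.0, "season": "winter"}

features = encode_scalars(sample, stats)
print(f"Scalar feature vector ({len(features)} values): {features}")
```

With this encoding the first encoder layer would change from `nn.Linear(1, 64)` to `nn.Linear(len(features), 64)`, and the dataset would return the full vector in place of `ndvi_scalar`.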

### If Val R² < 0.50
1. Try ResNet50 instead of ResNet18 for RGB stream
2. Add auxiliary pretraining (as in previous experiments)
3. Consider that single-stream baseline is hard to beat

---