# Option 1: Fix EfficientNet with Proper Hyperparameters

## Problem
EfficientNet-B0 underperformed (Val R²=0.5903) because it used ResNet18's hyperparameters.

**Key Issue**: EfficientNet-B0 produces 1280 features vs. ResNet18's 512 (2.5x as many).

## Solution
Test three EfficientNet-B0 variants with higher learning rates (multipliers are relative to the ResNet18 settings):

1. **HigherLR**: head_lr=1e-3 (3.3x), backbone_lr=5e-5 (5x), less regularization
2. **VeryHighLR**: head_lr=2e-3 (6.7x), backbone_lr=1e-4 (10x) - aggressive!
3. **WithBatchNorm**: Same as HigherLR but adds BatchNorm to head
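
The multipliers in the list above can be checked with a few lines of arithmetic. The sketch below assumes the ResNet18 baseline used `head_lr=3e-4` and `backbone_lr=1e-5`. Those baseline values are not stated in this notebook; they are inferred from the quoted multipliers, so treat them as assumptions.

```python
# Assumed ResNet18 baseline learning rates (inferred from the quoted
# multipliers, not stated in this notebook).
BASELINE = {"head_lr": 3e-4, "backbone_lr": 1e-5}

# Phase 2 learning rates of the three variants, copied from the list above.
VARIANTS = {
    "HigherLR": {"head_lr": 1e-3, "backbone_lr": 5e-5},
    "VeryHighLR": {"head_lr": 2e-3, "backbone_lr": 1e-4},
    "WithBatchNorm": {"head_lr": 1e-3, "backbone_lr": 5e-5},
}

def lr_multipliers(variant, baseline=BASELINE):
    """Return each learning rate as a multiple of the baseline value."""
    return {key: variant[key] / baseline[key] for key in baseline}

for name, lrs in VARIANTS.items():
    mult = lr_multipliers(lrs)
    print(f"{name:14s} head x{mult['head_lr']:.1f}, backbone x{mult['backbone_lr']:.1f}")
```

Under that assumed baseline the ratios come out at roughly 3.3x/5x and 6.7x/10x, matching the figures quoted in the list.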

## Expected
Best case: Val R²=0.64-0.68, approaching the baseline of 0.6852.

## Time
- DEBUG: ~15 min (2 epochs per phase × 3 variants)
- FULL: ~1.5 hours (15 Phase 1 + 40 Phase 2 epochs × 3 variants)
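
As a rough cross-check of these estimates, the epoch budget can be computed from the configuration used later in this notebook (15 Phase 1 and 40 Phase 2 epochs per variant, or 2 per phase in debug mode). The helper name `total_epochs` is made up for this sketch, and the per-epoch times are only averages implied by the estimates, not measurements.

```python
N_VARIANTS = 3

def total_epochs(phase1, phase2, n_variants=N_VARIANTS):
    """Total number of training epochs across all variants and both phases."""
    return (phase1 + phase2) * n_variants

full_epochs = total_epochs(phase1=15, phase2=40)   # full run
debug_epochs = total_epochs(phase1=2, phase2=2)    # DEBUG_MODE run

# Average minutes per epoch implied by the ~1.5 h and ~15 min estimates.
full_min_per_epoch = 90 / full_epochs
debug_min_per_epoch = 15 / debug_epochs

print(f"full:  {full_epochs} epochs, ~{full_min_per_epoch:.2f} min/epoch")
print(f"debug: {debug_epochs} epochs, ~{debug_min_per_epoch:.2f} min/epoch")
```

The two implied per-epoch figures differ by more than a factor of two, so at least one of the time estimates should be read as rough.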

---

## Part 1: Setup & Data Preparation

In [23]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tqdm.auto import tqdm
import copy
from datetime import datetime

# Import timm, installing it on first use if it is missing
try:
    import timm
    TIMM_AVAILABLE = True
    print("✅ timm library available")
except ImportError:
    print("⏳ Installing timm...")
    import subprocess, sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "timm", "-q"])
    import timm
    TIMM_AVAILABLE = True
    print("✅ timm installed")

sns.set_style('whitegrid')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

print("✓ Setup complete")

✅ timm library available
Device: cpu
✓ Setup complete


In [24]:
# ================================================================
# CONFIGURATION: 3 EFFICIENTNET TUNING EXPERIMENTS
# ================================================================

# 🐛 DEBUG MODE
DEBUG_MODE = False  # True: quick 2-epoch smoke test; False: full training

BATCH_SIZE = 16
TARGET_COLS = ['Dry_Green_g', 'Dry_Dead_g', 'Dry_Clover_g', 'GDM_g', 'Dry_Total_g']
COMPETITION_WEIGHTS = torch.tensor([0.1, 0.1, 0.1, 0.2, 0.5])

if DEBUG_MODE:
    print("\n" + "🐛"*40)
    print("DEBUG MODE: 2 epochs/phase for testing")
    print("🐛"*40 + "\n")

def get_epochs(full):
    return 2 if DEBUG_MODE else full

# Three EfficientNet variants
VARIATIONS = {
    'EfficientNet_HigherLR': {
        'phase1_epochs': get_epochs(15),
        'phase2_epochs': get_epochs(40),
        'phase1_lr': 3e-4,
        'phase2_lr': 1e-3,  # 3.3x higher
        'phase2_backbone_lr': 5e-5,  # 5x higher
        'weight_decay': 5e-5,  # Less regularization
        'hidden_dim': 256,
        'dropout': 0.15,  # Less dropout
        'backbone': 'efficientnet_b0',
        'use_scheduler': False,
        'use_full_norm': False,
        'use_batch_norm': False,
        'requires_timm': True,
        'description': 'EfficientNet with 3.3x higher LR',
        'expected_kaggle': '0.52-0.54'
    },
    'EfficientNet_VeryHighLR': {
        'phase1_epochs': get_epochs(15),
        'phase2_epochs': get_epochs(40),
        'phase1_lr': 3e-4,
        'phase2_lr': 2e-3,  # 6.7x higher!
        'phase2_backbone_lr': 1e-4,  # 10x higher!
        'weight_decay': 5e-5,
        'hidden_dim': 256,
        'dropout': 0.15,
        'backbone': 'efficientnet_b0',
        'use_scheduler': False,
        'use_full_norm': False,
        'use_batch_norm': False,
        'requires_timm': True,
        'description': 'EfficientNet with aggressive 6.7x LR',
        'expected_kaggle': '0.51-0.53'
    },
    'EfficientNet_WithBatchNorm': {
        'phase1_epochs': get_epochs(15),
        'phase2_epochs': get_epochs(40),
        'phase1_lr': 3e-4,
        'phase2_lr': 1e-3,
        'phase2_backbone_lr': 5e-5,
        'weight_decay': 5e-5,
        'hidden_dim': 256,
        'dropout': 0.15,
        'backbone': 'efficientnet_b0',
        'use_scheduler': False,
        'use_full_norm': False,
        'use_batch_norm': True,  # Added BatchNorm
        'requires_timm': True,
        'description': 'EfficientNet with BatchNorm in head',
        'expected_kaggle': '0.52-0.54'
    }
}

print("="*80)
print("EFFICIENTNET TUNING EXPERIMENTS")
print("="*80)
print(f"\nExperiments ({len(VARIATIONS)}):")
for name, cfg in VARIATIONS.items():
    ep = f"P1:{cfg['phase1_epochs']}ep P2:{cfg['phase2_epochs']}ep"
    lr = f"head_lr={cfg['phase2_lr']:.0e} backbone_lr={cfg['phase2_backbone_lr']:.0e}"
    print(f"  {name}: {ep}")
    print(f"     {lr}, Expected Kaggle: {cfg['expected_kaggle']}")
print(f"\n⏱️ {'DEBUG: ~15 min' if DEBUG_MODE else 'FULL: ~1.5 hours'}")
print("="*80)

EFFICIENTNET TUNING EXPERIMENTS

Experiments (3):
  EfficientNet_HigherLR: P1:15ep P2:40ep
     head_lr=1e-03 backbone_lr=5e-05, Expected Kaggle: 0.52-0.54
  EfficientNet_VeryHighLR: P1:15ep P2:40ep
     head_lr=2e-03 backbone_lr=1e-04, Expected Kaggle: 0.51-0.53
  EfficientNet_WithBatchNorm: P1:15ep P2:40ep
     head_lr=1e-03 backbone_lr=5e-05, Expected Kaggle: 0.52-0.54

⏱️ FULL: ~1.5 hours
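
One thing the configuration above leaves implicit is which keys the training code reads. `train_phase2` looks up `scheduler_factor` and `scheduler_patience` only when `use_scheduler` is true, and none of the three variants defines them, so flipping that flag would raise a `KeyError` once Phase 2 starts. The check below is a minimal sketch of validating a config up front. `validate_config`, `REQUIRED_KEYS` and `SCHEDULER_KEYS` are hypothetical names introduced for illustration and are not part of the notebook.

```python
REQUIRED_KEYS = (
    "phase1_epochs", "phase2_epochs", "phase1_lr", "phase2_lr",
    "phase2_backbone_lr", "weight_decay", "hidden_dim", "dropout", "backbone",
)
SCHEDULER_KEYS = ("scheduler_factor", "scheduler_patience")

def validate_config(name, cfg):
    """Return a list of problems found in one variant's config (empty if OK)."""
    problems = [f"{name}: missing key '{k}'" for k in REQUIRED_KEYS if k not in cfg]
    if cfg.get("use_scheduler", False):
        problems += [f"{name}: use_scheduler=True but '{k}' is not set"
                     for k in SCHEDULER_KEYS if k not in cfg]
    return problems

# A trimmed copy of the EfficientNet_HigherLR settings from the cell above.
example = {
    "phase1_epochs": 15, "phase2_epochs": 40,
    "phase1_lr": 3e-4, "phase2_lr": 1e-3, "phase2_backbone_lr": 5e-5,
    "weight_decay": 5e-5, "hidden_dim": 256, "dropout": 0.15,
    "backbone": "efficientnet_b0", "use_scheduler": False,
}

print(validate_config("EfficientNet_HigherLR", example))
print(validate_config("WithScheduler", {**example, "use_scheduler": True}))
```

Running a check like this over `VARIATIONS.items()` before the training loop would surface a missing key in seconds rather than after Phase 1 has finished.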


In [25]:
# Load data
train_enriched = pd.read_csv('../../competition/train_enriched.csv')
train_enriched['Sampling_Date'] = pd.to_datetime(train_enriched['Sampling_Date'])
train_enriched['full_image_path'] = train_enriched['image_path'].apply(lambda x: f'../../competition/{x}')

print(f"Full dataset: {len(train_enriched)} images")

# Split
train_data, val_data = train_test_split(train_enriched, test_size=0.2, random_state=42)

# Calculate normalization stats (split-based, since use_full_norm=False for all)
target_means = torch.tensor([train_data[col].mean() for col in TARGET_COLS], dtype=torch.float32)
target_stds = torch.tensor([train_data[col].std() for col in TARGET_COLS], dtype=torch.float32)

print(f"\nNormalization (train split, {len(train_data)} images):")
for i, col in enumerate(TARGET_COLS):
    print(f"  {col:15s}: mean={target_means[i]:.2f}g, std={target_stds[i]:.2f}g")

print(f"\n✓ Split: {len(train_data)} train, {len(val_data)} val")

Full dataset: 357 images

Normalization (train split, 285 images):
  Dry_Green_g    : mean=27.49g, std=26.19g
  Dry_Dead_g     : mean=12.01g, std=12.50g
  Dry_Clover_g   : mean=6.26g, std=11.75g
  GDM_g          : mean=33.74g, std=25.62g
  Dry_Total_g    : mean=45.75g, std=28.86g

✓ Split: 285 train, 72 val
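
The statistics printed above drive a plain z-score transform: the datasets store `(target - mean) / std`, and Phase 2 validation inverts it with `output * std + mean` before computing R². The pure-Python sketch below round-trips one target using the `Dry_Total_g` statistics from the output above. The sample values are made up for illustration.

```python
# Dry_Total_g statistics copied from the train-split output above.
MEAN_G, STD_G = 45.75, 28.86

def normalize(grams, mean=MEAN_G, std=STD_G):
    """Map a target in grams to a z-score."""
    return (grams - mean) / std

def denormalize(z, mean=MEAN_G, std=STD_G):
    """Map a z-score (e.g. a model output) back to grams."""
    return z * std + mean

samples_g = [10.0, 45.75, 120.0]          # illustrative values, not real data
z_scores = [normalize(g) for g in samples_g]
restored = [denormalize(z) for z in z_scores]

for g, z, r in zip(samples_g, z_scores, restored):
    print(f"{g:7.2f} g -> z={z:+.3f} -> {r:7.2f} g")
```

Training on z-scores puts all five targets on a comparable scale for the MSE loss, while R² is still reported in grams.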


In [26]:
# Prepare tabular features
weather_features = [
    'rainfall_7d', 'rainfall_30d',
    'temp_max_7d', 'temp_min_7d', 'temp_mean_7d', 'temp_mean_30d', 'temp_range_7d',
    'et0_7d', 'et0_30d',
    'water_balance_7d', 'water_balance_30d',
    'days_since_rain', 'daylength', 'season'
]

continuous_features = weather_features + ['Pre_GSHH_NDVI', 'Height_Ave_cm']
scaler = StandardScaler()
train_data[continuous_features] = scaler.fit_transform(train_data[continuous_features])
val_data[continuous_features] = scaler.transform(val_data[continuous_features])

le_state = LabelEncoder()
le_species = LabelEncoder()
train_data['State_encoded'] = le_state.fit_transform(train_data['State'])
train_data['Species_encoded'] = le_species.fit_transform(train_data['Species'])
val_data['State_encoded'] = le_state.transform(val_data['State'])
val_data['Species_encoded'] = le_species.transform(val_data['Species'])

num_states = len(le_state.classes_)
num_species = len(le_species.classes_)

print(f"✓ Tabular features prepared")
print(f"  States: {num_states}, Species: {num_species}")

✓ Tabular features prepared
  States: 4, Species: 15
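
Both encoders above are fitted on the training split only, so `transform` on the validation split raises a `ValueError` if a state or species appears there but not in training. That did not happen with this split, but with 15 species and 357 images it is a plausible risk for other random seeds. The sketch below imitates the idea in plain Python (sorted classes mapped to integer codes) and reports unseen labels before encoding. The function names and the species strings are hypothetical.

```python
def fit_label_map(labels):
    """Map each distinct label to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

def unseen_labels(label_map, labels):
    """Return labels that were not present when the map was fitted."""
    return sorted(set(labels) - set(label_map))

def encode(label_map, labels):
    """Encode labels, failing loudly if any label is unknown."""
    missing = unseen_labels(label_map, labels)
    if missing:
        raise ValueError(f"unseen labels: {missing}")
    return [label_map[label] for label in labels]

# Hypothetical species names, for illustration only.
train_species = ["Ryegrass", "Clover", "Ryegrass", "Lucerne"]
val_species = ["Clover", "Phalaris"]

species_map = fit_label_map(train_species)
print(species_map)
print(unseen_labels(species_map, val_species))
```

Calling a check like `unseen_labels` right after the split makes the failure mode explicit instead of surfacing as an exception inside `transform`.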


### Dataset Classes

In [27]:
class NormalizedDataset(Dataset):
    """Image-only dataset with normalized targets."""
    def __init__(self, dataframe, target_means, target_stds, augment=False):
        self.df = dataframe.reset_index(drop=True)
        self.target_means = target_means
        self.target_stds = target_stds
        
        transform_list = [transforms.Resize((224, 224))]
        
        if augment:
            transform_list.extend([
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(10),
            ])
        
        transform_list.extend([
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
        
        self.transform = transforms.Compose(transform_list)
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(row['full_image_path']).convert('RGB')
        img = self.transform(img)
        
        targets = torch.tensor(row[TARGET_COLS].values.astype('float32'), dtype=torch.float32)
        targets_normalized = (targets - self.target_means) / self.target_stds
        
        return {
            'image': img,
            'targets': targets_normalized,
            'targets_original': targets
        }

class MultimodalDataset(Dataset):
    """Multimodal dataset for auxiliary pretraining."""
    def __init__(self, dataframe, target_means, target_stds, augment=False):
        self.df = dataframe.reset_index(drop=True)
        self.target_means = target_means
        self.target_stds = target_stds
        
        transform_list = [transforms.Resize((224, 224))]
        
        if augment:
            transform_list.extend([
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(10),
            ])
        
        transform_list.extend([
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
        
        self.transform = transforms.Compose(transform_list)
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        
        # Image
        img = Image.open(row['full_image_path']).convert('RGB')
        img = self.transform(img)
        
        # Tabular features
        weather = torch.tensor(row[weather_features].values.astype('float32'), dtype=torch.float32)
        ndvi = torch.tensor(row['Pre_GSHH_NDVI'], dtype=torch.float32)
        height = torch.tensor(row['Height_Ave_cm'], dtype=torch.float32)
        state = torch.tensor(row['State_encoded'], dtype=torch.long)
        species = torch.tensor(row['Species_encoded'], dtype=torch.long)
        
        # Targets
        targets = torch.tensor(row[TARGET_COLS].values.astype('float32'), dtype=torch.float32)
        targets_normalized = (targets - self.target_means) / self.target_stds
        
        return {
            'image': img,
            'weather': weather,
            'ndvi': ndvi,
            'height': height,
            'state': state,
            'species': species,
            'targets': targets_normalized,
            'targets_original': targets
        }

print("✓ Dataset classes defined")
print("✓ Datasets will be created for each experiment with appropriate normalization")

✓ Dataset classes defined
✓ Datasets will be created for each experiment with appropriate normalization


### Model Architecture

In [28]:
def get_backbone_and_features(backbone_name):
    """Return (backbone, feature_dim) for a torchvision or timm backbone."""
    if backbone_name == 'resnet18':
        model = models.resnet18(pretrained=True)
        backbone = nn.Sequential(*list(model.children())[:-1])
        feature_dim = 512
    elif backbone_name == 'densenet121':
        model = models.densenet121(pretrained=True)
        backbone = nn.Sequential(model.features, nn.ReLU(inplace=True), nn.AdaptiveAvgPool2d((1, 1)))
        feature_dim = 1024
    elif backbone_name in ['efficientnet_b0', 'mobilenetv3_large_100']:
        if not TIMM_AVAILABLE:
            raise ImportError(f"timm required for {backbone_name}")
        import timm

        class TimmBackbone(nn.Module):
            def __init__(self, model):
                super().__init__()
                self.model = model
            def forward(self, x):
                features = self.model.forward_features(x)
                if len(features.shape) == 4:
                    features = features.mean([2, 3])
                return features.unsqueeze(-1).unsqueeze(-1)

        model = timm.create_model(backbone_name, pretrained=True)
        backbone = TimmBackbone(model)
        feature_dim = 1280 if backbone_name == 'efficientnet_b0' else 960
    else:
        raise ValueError(f"Unknown: {backbone_name}")
    return backbone, feature_dim

class AuxiliaryPretrainedModel(nn.Module):
    def __init__(self, backbone_name='resnet18', num_outputs=5, hidden_dim=256,
                 dropout=0.2, num_states=4, num_species=15, use_batch_norm=False):
        super().__init__()
        self.backbone, feature_dim = get_backbone_and_features(backbone_name)
        self.backbone_name = backbone_name
        self.feature_dim = feature_dim
        
        # Auxiliary heads
        self.ndvi_head = nn.Linear(feature_dim, 1)
        self.height_head = nn.Linear(feature_dim, 1)
        self.weather_head = nn.Linear(feature_dim, 14)
        self.state_head = nn.Linear(feature_dim, num_states)
        self.species_head = nn.Linear(feature_dim, num_species)
        
        # Biomass head with optional BatchNorm
        if use_batch_norm:
            self.biomass_head = nn.Sequential(
                nn.Linear(feature_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.Linear(hidden_dim, num_outputs)
            )
        else:
            self.biomass_head = nn.Sequential(
                nn.Linear(feature_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.Linear(hidden_dim, num_outputs)
            )
    
    def forward(self, x, mode='biomass'):
        features = self.backbone(x).flatten(1)
        if mode == 'auxiliary':
            return {
                'ndvi': self.ndvi_head(features),
                'height': self.height_head(features),
                'weather': self.weather_head(features),
                'state': self.state_head(features),
                'species': self.species_head(features)
            }
        else:
            return self.biomass_head(features)

print("✓ Model with BatchNorm support")

✓ Model with BatchNorm support
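
To make the 1280-vs-512 difference concrete, the parameter counts of the heads defined above can be worked out by hand. This sketch only does the arithmetic for `nn.Linear` (weights plus biases) and `nn.BatchNorm1d` (one learnable scale and one shift per feature). It does not build any model, and the function names are made up for illustration.

```python
def linear_params(n_in, n_out):
    """Learnable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

def biomass_head_params(feature_dim, hidden_dim=256, num_outputs=5, batch_norm=False):
    """Parameters of the Linear -> [BatchNorm] -> ReLU -> Dropout -> Linear head."""
    total = linear_params(feature_dim, hidden_dim) + linear_params(hidden_dim, num_outputs)
    if batch_norm:
        total += 2 * hidden_dim  # BatchNorm1d scale and shift
    return total

def auxiliary_head_params(feature_dim, num_states=4, num_species=15, num_weather=14):
    """Parameters of the five auxiliary linear heads combined."""
    outputs = 1 + 1 + num_weather + num_states + num_species  # ndvi, height, weather, state, species
    return linear_params(feature_dim, outputs)

for name, dim in [("resnet18", 512), ("efficientnet_b0", 1280)]:
    print(f"{name:16s} biomass head: {biomass_head_params(dim):,} "
          f"(+BN: {biomass_head_params(dim, batch_norm=True):,}), "
          f"auxiliary heads: {auxiliary_head_params(dim):,}")
```

With `hidden_dim=256`, almost all of the biomass head's parameters sit in its first linear layer, which is the part that scales with the backbone's feature dimension.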


### Utility Functions

In [29]:
def calculate_competition_r2(predictions, targets):
    """Calculate competition R² (weighted)."""
    per_target_r2 = []
    competition_r2 = 0
    
    weights = COMPETITION_WEIGHTS.numpy()
    
    for i in range(5):
        r2 = r2_score(targets[:, i], predictions[:, i])
        per_target_r2.append(r2)
        competition_r2 += weights[i] * r2
    
    return competition_r2, per_target_r2

def auxiliary_loss(predictions, targets):
    """Multi-task loss for predicting tabular features."""
    # squeeze(-1) keeps the batch dimension even if a batch holds one sample
    loss_ndvi = F.mse_loss(predictions['ndvi'].squeeze(-1), targets['ndvi'])
    loss_height = F.mse_loss(predictions['height'].squeeze(-1), targets['height'])
    loss_weather = F.mse_loss(predictions['weather'], targets['weather'])
    loss_state = F.cross_entropy(predictions['state'], targets['state'])
    loss_species = F.cross_entropy(predictions['species'], targets['species'])
    
    # Weighted combination
    total_loss = (
        0.2 * loss_ndvi + 
        0.2 * loss_height + 
        0.3 * loss_weather + 
        0.15 * loss_state + 
        0.15 * loss_species
    )
    
    return total_loss, {
        'ndvi': loss_ndvi.item(),
        'height': loss_height.item(),
        'weather': loss_weather.item(),
        'state': loss_state.item(),
        'species': loss_species.item()
    }

print("✓ Utility functions defined")

✓ Utility functions defined
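
The competition metric computed by `calculate_competition_r2` is a weighted sum of per-target R² values, with weights 0.1, 0.1, 0.1, 0.2 and 0.5 that sum to 1. The dependency-free sketch below reproduces the same calculation on toy numbers. `r2` implements the usual 1 - SS_res/SS_tot definition and assumes no target column is constant. The data are invented for illustration.

```python
WEIGHTS = [0.1, 0.1, 0.1, 0.2, 0.5]  # same order as TARGET_COLS

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def competition_r2(targets, predictions, weights=WEIGHTS):
    """Weighted sum of per-column R². Rows are samples, columns are targets."""
    per_target = [
        r2([row[i] for row in targets], [row[i] for row in predictions])
        for i in range(len(weights))
    ]
    return sum(w * s for w, s in zip(weights, per_target)), per_target

# Toy data: 3 samples x 5 targets (invented values).
targets = [[1.0, 2.0, 0.0, 3.0, 5.0],
           [2.0, 1.0, 1.0, 4.0, 7.0],
           [3.0, 3.0, 2.0, 8.0, 12.0]]
perfect = [row[:] for row in targets]

# Predict the column mean for the last target only, so its R² drops to 0.
mean_last = sum(row[4] for row in targets) / len(targets)
flat_total = [row[:4] + [mean_last] for row in targets]

print(competition_r2(targets, perfect)[0])
print(competition_r2(targets, flat_total)[0])
```

Because `Dry_Total_g` carries half the weight, predicting only its mean halves the score even when the other four targets are predicted perfectly.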


---
## Part 2: Training Functions

In [30]:
def train_phase1(model, train_loader, val_loader, num_epochs, lr, variation_name):
    """Phase 1: Train to predict tabular features from images."""
    print("\n" + "="*80)
    print(f"PHASE 1: AUXILIARY PRETRAINING - {variation_name}")
    print("="*80 + "\n")
    
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    
    history = {
        'train_loss': [], 
        'val_loss': [], 
        'val_ndvi_mae': [], 
        'val_state_acc': [],
        'val_species_acc': []
    }
    
    best_val_loss = float('inf')
    
    for epoch in range(num_epochs):
        # Train
        model.train()
        train_loss = 0
        
        for batch in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}', leave=False):
            images = batch['image'].to(device)
            targets = {
                'ndvi': batch['ndvi'].to(device),
                'height': batch['height'].to(device),
                'weather': batch['weather'].to(device),
                'state': batch['state'].to(device),
                'species': batch['species'].to(device)
            }
            
            optimizer.zero_grad()
            predictions = model(images, mode='auxiliary')
            loss, _ = auxiliary_loss(predictions, targets)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
        
        train_loss /= len(train_loader.dataset)
        
        # Validate
        model.eval()
        val_loss = 0
        ndvi_errors = []
        state_correct = 0
        state_total = 0
        species_correct = 0
        species_total = 0
        
        with torch.no_grad():
            for batch in val_loader:
                images = batch['image'].to(device)
                targets = {
                    'ndvi': batch['ndvi'].to(device),
                    'height': batch['height'].to(device),
                    'weather': batch['weather'].to(device),
                    'state': batch['state'].to(device),
                    'species': batch['species'].to(device)
                }
                
                predictions = model(images, mode='auxiliary')
                loss, _ = auxiliary_loss(predictions, targets)
                val_loss += loss.item() * images.size(0)
                
                # NDVI MAE
                ndvi_errors.extend((predictions['ndvi'].squeeze(-1).cpu() - targets['ndvi'].cpu()).abs().tolist())
                
                # State accuracy
                state_pred = predictions['state'].argmax(dim=1)
                state_correct += (state_pred == targets['state']).sum().item()
                state_total += targets['state'].size(0)
                
                # Species accuracy
                species_pred = predictions['species'].argmax(dim=1)
                species_correct += (species_pred == targets['species']).sum().item()
                species_total += targets['species'].size(0)
        
        val_loss /= len(val_loader.dataset)
        ndvi_mae = np.mean(ndvi_errors)
        state_acc = state_correct / state_total
        species_acc = species_correct / species_total
        
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['val_ndvi_mae'].append(ndvi_mae)
        history['val_state_acc'].append(state_acc)
        history['val_species_acc'].append(species_acc)
        
        # Print progress
        print(f"Epoch {epoch+1:2d}/{num_epochs}: "
              f"Train Loss={train_loss:.4f}, Val Loss={val_loss:.4f}, "
              f"NDVI MAE={ndvi_mae:.4f}, State Acc={state_acc:.2%}, Species Acc={species_acc:.2%}")
        
        # Save best checkpoint
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), f'model4b_{variation_name}_phase1_best.pth')
            print(f"  💾 Saved Phase 1 checkpoint (val_loss={val_loss:.4f})")
    
    print(f"\n✓ Phase 1 complete! Model learned to predict tabular features from images.")
    print(f"  Final State Accuracy: {state_acc:.2%} (model can 'see' location!)")
    print(f"  Final Species Accuracy: {species_acc:.2%}")
    print(f"  Final NDVI MAE: {ndvi_mae:.4f}\n")
    
    return history

print("✓ train_phase1() defined")

✓ train_phase1() defined
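
The checkpoint rule in `train_phase1` is a running minimum over validation loss. Replaying the `EfficientNet_HigherLR` Phase 1 validation losses logged later in this notebook shows which epochs trigger a save. It also shows that the best epoch is not the last one. The training loop in this notebook passes the same in-memory model on to Phase 2 and does not reload the saved file, so Phase 2 starts from the final-epoch weights. The helper name `checkpoint_epochs` is made up for this sketch.

```python
def checkpoint_epochs(val_losses):
    """Return the 1-based epochs at which a strict new minimum is reached."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # same strict comparison as train_phase1
            best = loss
            saved.append(epoch)
    return saved

# Validation losses copied from the EfficientNet_HigherLR Phase 1 log.
phase1_val_loss = [1.0625, 0.7503, 0.5872, 0.5900, 0.4960, 0.4151, 0.3705,
                   0.3714, 0.3554, 0.3321, 0.4001, 0.3448, 0.3037, 0.3388, 0.3063]

saved = checkpoint_epochs(phase1_val_loss)
print("checkpoints saved at epochs:", saved)
print("best epoch:", saved[-1], "of", len(phase1_val_loss))
```

If starting Phase 2 from the best Phase 1 weights is preferred, the saved `state_dict` would need to be loaded back into the model between the two phases.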


In [31]:
def train_phase2(model, train_loader, val_loader, config, target_means, target_stds, variation_name):
    """Phase 2: Fine-tune for biomass prediction with flexible config."""
    print("\n" + "="*80)
    print(f"PHASE 2: BIOMASS FINE-TUNING - {variation_name}")
    print("="*80 + "\n")
    
    num_epochs = config['phase2_epochs']
    lr_head = config['phase2_lr']
    lr_backbone = config['phase2_backbone_lr']
    weight_decay = config['weight_decay']
    use_scheduler = config.get('use_scheduler', False)
    
    print(f"Config: {num_epochs} epochs, head_lr={lr_head:.2e}, backbone_lr={lr_backbone:.2e}, ")
    print(f"        weight_decay={weight_decay:.2e}, scheduler={use_scheduler}")
    
    model = model.to(device)
    
    # Two learning rates: low for pretrained backbone, higher for new head
    optimizer = torch.optim.AdamW([
        {'params': model.backbone.parameters(), 'lr': lr_backbone},
        {'params': model.biomass_head.parameters(), 'lr': lr_head}
    ], weight_decay=weight_decay)
    
    # Optional LR scheduler
    scheduler = None
    if use_scheduler:
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='max', factor=config['scheduler_factor'], 
            patience=config['scheduler_patience'], verbose=True
        )
        print(f"✓ LR scheduler enabled (factor={config['scheduler_factor']}, patience={config['scheduler_patience']})")
    
    criterion = nn.MSELoss()
    history = {'train_loss': [], 'val_loss': [], 'val_r2': [], 'per_target_r2': [], 'epoch': []}
    best_r2 = -float('inf')
    
    for epoch in range(num_epochs):
        # Train
        model.train()
        train_loss = 0
        
        for batch in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}', leave=False):
            images = batch['image'].to(device)
            targets = batch['targets'].to(device)
            
            optimizer.zero_grad()
            outputs = model(images, mode='biomass')
            loss = criterion(outputs, targets)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
        
        train_loss /= len(train_loader.dataset)
        
        # Validate
        model.eval()
        val_loss = 0
        all_preds = []
        all_targets = []
        
        with torch.no_grad():
            for batch in val_loader:
                images = batch['image'].to(device)
                targets = batch['targets'].to(device)
                targets_original = batch['targets_original']
                
                outputs = model(images, mode='biomass')
                loss = criterion(outputs, targets)
                val_loss += loss.item() * images.size(0)
                
                # Denormalize for R² calculation
                outputs_denorm = outputs.cpu() * target_stds + target_means
                all_preds.append(outputs_denorm.numpy())
                all_targets.append(targets_original.numpy())
        
        val_loss /= len(val_loader.dataset)
        
        # Calculate R²
        all_preds = np.vstack(all_preds)
        all_targets = np.vstack(all_targets)
        val_r2, per_target_r2 = calculate_competition_r2(all_preds, all_targets)
        
        # Step scheduler if enabled
        if scheduler is not None:
            scheduler.step(val_r2)
        
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['val_r2'].append(val_r2)
        history['per_target_r2'].append(per_target_r2)
        history['epoch'].append(epoch + 1)
        
        # Print progress
        print(f"Epoch {epoch+1:2d}/{num_epochs}: "
              f"Train Loss={train_loss:.4f}, Val Loss={val_loss:.4f}, Val R²={val_r2:+.4f}")
        
        # Save best checkpoint
        if val_r2 > best_r2:
            best_r2 = val_r2
            torch.save(model.state_dict(), f'model4b_{variation_name}_phase2_best.pth')
            print(f"  💾 New best R²={best_r2:+.4f} - checkpoint saved")
    
    print(f"\n✓ Phase 2 complete! Best R²={best_r2:+.4f}")
    print(f"\nPer-target R² scores (at best epoch):")
    best_epoch_idx = np.argmax(history['val_r2'])
    best_per_target = history['per_target_r2'][best_epoch_idx]
    for i, col in enumerate(TARGET_COLS):
        print(f"  {col:15s}: R²={best_per_target[i]:+.4f}")
    print()
    
    return history, best_r2

print("✓ train_phase2() defined (with LR scheduling support)")

✓ train_phase2() defined (with LR scheduling support)
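
`train_phase2` can optionally step a `ReduceLROnPlateau` scheduler on validation R² (`mode='max'`), although all three variants here leave it off. The sketch below is a simplified pure-Python imitation of that idea, not PyTorch's implementation: it ignores the threshold, cooldown and minimum-LR options, and `factor=0.5` and `patience=3` are assumed values because the configs do not define them. It replays the first 14 Phase 2 R² values logged for `EfficientNet_HigherLR`.

```python
def plateau_schedule(scores, lr, factor=0.5, patience=3):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` once the score has failed to improve
    on its best value for more than `patience` consecutive epochs.
    """
    best = float("-inf")
    bad_epochs = 0
    lrs = []
    for score in scores:
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs

# Validation R² for epochs 1-14, copied from the EfficientNet_HigherLR log.
val_r2 = [0.4395, 0.4794, 0.5563, 0.5678, 0.4605, 0.5490, 0.5641,
          0.5003, 0.5196, 0.5038, 0.5784, 0.5605, 0.5915, 0.5443]

head_lrs = plateau_schedule(val_r2, lr=1e-3)
for epoch, (score, lr) in enumerate(zip(val_r2, head_lrs), start=1):
    print(f"epoch {epoch:2d}: R2={score:.4f} lr={lr:.1e}")
```

Under these assumed settings the rate would be halved once, at epoch 8, after four epochs without beating the epoch-4 score. Whether that would help this run is not something the sketch can tell.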


---
## Part 3: Train All EfficientNet Variants

Training 3 EfficientNet variants with different hyperparameters.

In [32]:
# Storage for results
new_results = {}

print("="*80)
print("TRAINING EFFICIENTNET VARIANTS")
print("="*80)
print(f"\nStarting: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

if DEBUG_MODE:
    print("\n🐛 DEBUG: 2 epochs/phase (~5 min/variant = ~15 min total)")
else:
    print("\n🚀 FULL: 40 epochs phase2 (~30 min/variant = ~1.5 hours)")

print(f"\nWill train {len(VARIATIONS)} variants")
print()

TRAINING EFFICIENTNET VARIANTS

Starting: 2025-11-06 11:44:39

🚀 FULL: 40 epochs phase2 (~30 min/variant = ~1.5 hours)

Will train 3 variants



In [33]:
# Train each variant
for var_name, config in VARIATIONS.items():
    print("\n\n")
    print("#" * 80)
    print(f"# TRAINING: {var_name}")
    print(f"# {config['description']}")
    print(f"# Expected Kaggle: {config['expected_kaggle']}")
    print("#" * 80)

    # Create datasets
    train_dataset_simple = NormalizedDataset(train_data, target_means, target_stds, augment=True)
    train_dataset_multi = MultimodalDataset(train_data, target_means, target_stds, augment=True)
    val_dataset_simple = NormalizedDataset(val_data, target_means, target_stds, augment=False)
    val_dataset_multi = MultimodalDataset(val_data, target_means, target_stds, augment=False)

    # Create model with BatchNorm support
    model = AuxiliaryPretrainedModel(
        backbone_name=config['backbone'],
        num_outputs=5,
        hidden_dim=config['hidden_dim'],
        dropout=config['dropout'],
        num_states=num_states,
        num_species=num_species,
        use_batch_norm=config.get('use_batch_norm', False)
    )

    print(f"✓ Model: {config['backbone']}, {model.feature_dim} features")
    if config.get('use_batch_norm'):
        print(f"  + BatchNorm enabled in head")

    # Create dataloaders
    train_loader_multi = DataLoader(train_dataset_multi, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    val_loader_multi = DataLoader(val_dataset_multi, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)
    train_loader_simple = DataLoader(train_dataset_simple, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    val_loader_simple = DataLoader(val_dataset_simple, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

    # Phase 1
    history_p1 = train_phase1(
        model=model,
        train_loader=train_loader_multi,
        val_loader=val_loader_multi,
        num_epochs=config['phase1_epochs'],
        lr=config['phase1_lr'],
        variation_name=var_name
    )

    # Phase 2
    history_p2, best_r2 = train_phase2(
        model=model,
        train_loader=train_loader_simple,
        val_loader=val_loader_simple,
        config=config,
        target_means=target_means,
        target_stds=target_stds,
        variation_name=var_name
    )

    # Store results
    new_results[var_name] = {
        'config': config,
        'history_phase1': history_p1,
        'history_phase2': history_p2,
        'best_r2': best_r2,
        'model_checkpoint': f'model4b_{var_name}_phase2_best.pth'
    }

    print(f"\n✅ {var_name} complete! R²={best_r2:+.4f}")
    print(f"   Checkpoint: {new_results[var_name]['model_checkpoint']}")
    print(f"   Expected Kaggle: {config['expected_kaggle']}")




################################################################################
# TRAINING: EfficientNet_HigherLR
# EfficientNet with 3.3x higher LR
# Expected Kaggle: 0.52-0.54
################################################################################
✓ Model: efficientnet_b0, 1280 features

PHASE 1: AUXILIARY PRETRAINING - EfficientNet_HigherLR

Epoch  1/15: Train Loss=1.1298, Val Loss=1.0625, NDVI MAE=0.4511, State Acc=47.22%, Species Acc=36.11%
  💾 Saved Phase 1 checkpoint (val_loss=1.0625)
Epoch  2/15: Train Loss=0.7741, Val Loss=0.7503, NDVI MAE=0.4046, State Acc=72.22%, Species Acc=45.83%
  💾 Saved Phase 1 checkpoint (val_loss=0.7503)
Epoch  3/15: Train Loss=0.5587, Val Loss=0.5872, NDVI MAE=0.3542, State Acc=80.56%, Species Acc=58.33%
  💾 Saved Phase 1 checkpoint (val_loss=0.5872)
Epoch  4/15: Train Loss=0.4957, Val Loss=0.5900, NDVI MAE=0.3904, State Acc=84.72%, Species Acc=62.50%
Epoch  5/15: Train Loss=0.4404, Val Loss=0.4960, NDVI MAE=0.3707, State Acc=90.28%, Species Acc=70.83%
  💾 Saved Phase 1 checkpoint (val_loss=0.4960)
Epoch  6/15: Train Loss=0.3439, Val Loss=0.4151, NDVI MAE=0.3418, State Acc=88.89%, Species Acc=69.44%
  💾 Saved Phase 1 checkpoint (val_loss=0.4151)
Epoch  7/15: Train Loss=0.2965, Val Loss=0.3705, NDVI MAE=0.3431, State Acc=90.28%, Species Acc=76.39%
  💾 Saved Phase 1 checkpoint (val_loss=0.3705)
Epoch  8/15: Train Loss=0.2469, Val Loss=0.3714, NDVI MAE=0.3189, State Acc=91.67%, Species Acc=72.22%
Epoch  9/15: Train Loss=0.2194, Val Loss=0.3554, NDVI MAE=0.3172, State Acc=91.67%, Species Acc=70.83%
  💾 Saved Phase 1 checkpoint (val_loss=0.3554)
Epoch 10/15: Train Loss=0.2677, Val Loss=0.3321, NDVI MAE=0.3444, State Acc=91.67%, Species Acc=73.61%
  💾 Saved Phase 1 checkpoint (val_loss=0.3321)
Epoch 11/15: Train Loss=0.2169, Val Loss=0.4001, NDVI MAE=0.3424, State Acc=90.28%, Species Acc=69.44%
Epoch 12/15: Train Loss=0.1926, Val Loss=0.3448, NDVI MAE=0.3607, State Acc=93.06%, Species Acc=81.94%
Epoch 13/15: Train Loss=0.1874, Val Loss=0.3037, NDVI MAE=0.3316, State Acc=93.06%, Species Acc=79.17%
  💾 Saved Phase 1 checkpoint (val_loss=0.3037)
Epoch 14/15: Train Loss=0.1582, Val Loss=0.3388, NDVI MAE=0.3440, State Acc=91.67%, Species Acc=75.00%
Epoch 15/15: Train Loss=0.1416, Val Loss=0.3063, NDVI MAE=0.3192, State Acc=93.06%, Species Acc=75.00%

✓ Phase 1 complete! Model learned to predict tabular features from images.
  Final State Accuracy: 93.06% (model can 'see' location!)
  Final Species Accuracy: 75.00%
  Final NDVI MAE: 0.3192


PHASE 2: BIOMASS FINE-TUNING - EfficientNet_HigherLR

Config: 40 epochs, head_lr=1.00e-03, backbone_lr=5.00e-05,
        weight_decay=5.00e-05, scheduler=False

Epoch  1/40: Train Loss=0.6194, Val Loss=0.5017, Val R²=+0.4395
  💾 New best R²=+0.4395 - checkpoint saved
Epoch  2/40: Train Loss=0.3568, Val Loss=0.4461, Val R²=+0.4794
  💾 New best R²=+0.4794 - checkpoint saved
Epoch  3/40: Train Loss=0.2863, Val Loss=0.4015, Val R²=+0.5563
  💾 New best R²=+0.5563 - checkpoint saved
Epoch  4/40: Train Loss=0.2915, Val Loss=0.3902, Val R²=+0.5678
  💾 New best R²=+0.5678 - checkpoint saved
Epoch  5/40: Train Loss=0.2649, Val Loss=0.4480, Val R²=+0.4605
Epoch  6/40: Train Loss=0.2543, Val Loss=0.4097, Val R²=+0.5490
Epoch  7/40: Train Loss=0.2143, Val Loss=0.3880, Val R²=+0.5641
Epoch  8/40: Train Loss=0.2072, Val Loss=0.4582, Val R²=+0.5003
Epoch  9/40: Train Loss=0.1737, Val Loss=0.4333, Val R²=+0.5196
Epoch 10/40: Train Loss=0.1860, Val Loss=0.4528, Val R²=+0.5038
Epoch 11/40: Train Loss=0.1556, Val Loss=0.3868, Val R²=+0.5784
  💾 New best R²=+0.5784 - checkpoint saved
Epoch 12/40: Train Loss=0.1547, Val Loss=0.4029, Val R²=+0.5605
Epoch 13/40: Train Loss=0.1394, Val Loss=0.4000, Val R²=+0.5915
  💾 New best R²=+0.5915 - checkpoint saved
Epoch 14/40: Train Loss=0.1341, Val Loss=0.4343, Val R²=+0.5443
Epoch 15/40: Train Loss=0.1283, Val Loss=0.4259, Val R²=+0.5432
Epoch 16/40: Train Loss=0.1169, Val Loss=0.3867, Val R²=+0.5988
  💾 New best R²=+0.5988 - checkpoint saved
Epoch 17/40: Train Loss=0.1045, Val Loss=0.4454, Val R²=+0.5230
Epoch 18/40: Train Loss=0.1102, Val Loss=0.3939, Val R²=+0.5596
Epoch 19/40: Train Loss=0.1019, Val Loss=0.3796, Val R²=+0.5961
Epoch 20/40: Train Loss=0.0986, Val Loss=0.4139, Val R²=+0.5348
Epoch 21/40: Train Loss=0.0967, Val Loss=0.4218, Val R²=+0.5287
Epoch 22/40: Train Loss=0.0895, Val Loss=0.3991, Val R²=+0.5741
Epoch 23/40: Train Loss=0.0855, Val Loss=0.4185, Val R²=+0.5339
Epoch 24/40: Train Loss=0.0911, Val Loss=0.4114, Val R²=+0.5493
Epoch 25/40: Train Loss=0.0809, Val Loss=0.3879, Val R²=+0.5723
Epoch 26/40: Train Loss=0.1180, Val Loss=0.3761, Val R²=+0.5962
Epoch 27/40: Train Loss=0.0841, Val Loss=0.4216, Val R²=+0.5314
Epoch 28/40: Train Loss=0.0997, Val Loss=0.3922, Val R²=+0.5692
Epoch 29/40: Train Loss=0.0867, Val Loss=0.3831, Val R²=+0.5791
Epoch 30/40: Train Loss=0.1041, Val Loss=0.4554, Val R²=+0.4828
Epoch 31/40: Train Loss=0.0766, Val Loss=0.4041, Val R²=+0.5750
Epoch 32/40: Train Loss=0.0848, Val Loss=0.4081, Val R²=+0.5762
Epoch 33/40: Train Loss=0.0688, Val Loss=0.4040, Val R²=+0.5713
Epoch 34/40: Train Loss=0.0749, Val Loss=0.3937, Val R²=+0.5884
Epoch 35/40: Train Loss=0.0679, Val Loss=0.4259, Val R²=+0.5575
Epoch 36/40: Train Loss=0.0719, Val Loss=0.3767, Val R²=+0.5965
Epoch 37/40: Train Loss=0.0801, Val Loss=0.3978, Val R²=+0.5873
Epoch 38/40: Train Loss=0.0729, Val Loss=0.4024, Val R²=+0.5705
Epoch 39/40: Train Loss=0.0825, Val Loss=0.4298, Val R²=+0.5203
Epoch 40/40: Train Loss=0.0691, Val Loss=0.3934, Val R²=+0.5800

✓ Phase 2 complete! Best R²=+0.5988

Per-target R² scores (at best epoch):
  Dry_Green_g    : R²=+0.7716
  Dry_Dead_g     : R²=+0.2665
  Dry_Clover_g   : R²=+0.5074
  GDM_g          : R²=+0.7702
  Dry_Total_g    : R²=+0.5805


✅ EfficientNet_HigherLR complete! R²=+0.5988
   Checkpoint: model4b_EfficientNet_HigherLR_phase2_best.pth
   Expected Kaggle: 0.52-0.54



################################################################################
# TRAINING: EfficientNet_VeryHighLR
# EfficientNet with aggressive 6.7x LR
# Expected Kaggle: 0.51-0.53
################################################################################
✓ Model: efficientnet_b0, 1280 features

PHASE 1: AUXILIARY PRETRAINING - EfficientNet_VeryHighLR

Epoch  1/15: Train Loss=1.1430, Val Loss=1.0787, NDVI MAE=0.4383, State Acc=55.56%, Species Acc=38.89%
  💾 Saved Phase 1 checkpoint (val_loss=1.0787)
Epoch  2/15: Train Loss=0.7941, Val Loss=0.7237, NDVI MAE=0.3570, State Acc=68.06%, Species Acc=59.72%
  💾 Saved Phase 1 checkpoint (val_loss=0.7237)
Epoch  3/15: Train Loss=0.5763, Val Loss=0.5721, NDVI MAE=0.3460, State Acc=80.56%, Species Acc=61.11%
  💾 Saved Phase 1 checkpoint (val_loss=0.5721)
Epoch  4/15: Train Loss=0.4516, Val Loss=0.5471, NDVI MAE=0.3236, State Acc=77.78%, Species Acc=69.44%
  💾 Saved Phase 1 checkpoint (val_loss=0.5471)
Epoch  5/15: Train Loss=0.3965, Val Loss=0.4476, NDVI MAE=0.3122, State Acc=90.28%, Species Acc=76.39%
  💾 Saved Phase 1 checkpoint (val_loss=0.4476)
Epoch  6/15: Train Loss=0.3372, Val Loss=0.4399, NDVI MAE=0.3323, State Acc=87.50%, Species Acc=73.61%
  💾 Saved Phase 1 checkpoint (val_loss=0.4399)
Epoch  7/15: Train Loss=0.3168, Val Loss=0.3793, NDVI MAE=0.3916, State Acc=91.67%, Species Acc=80.56%
  💾 Saved Phase 1 checkpoint (val_loss=0.3793)
Epoch  8/15: Train Loss=0.2723, Val Loss=0.3901, NDVI MAE=0.3483, State Acc=88.89%, Species Acc=76.39%
Epoch  9/15: Train Loss=0.2272, Val Loss=0.3468, NDVI MAE=0.3808, State Acc=90.28%, Species Acc=80.56%
  💾 Saved Phase 1 checkpoint (val_loss=0.3468)
Epoch 10/15: Train Loss=0.2146, Val Loss=0.3508, NDVI MAE=0.3527, State Acc=87.50%, Species Acc=77.78%
Epoch 11/15: Train Loss=0.1918, Val Loss=0.3514, NDVI MAE=0.3073, State Acc=91.67%, Species Acc=83.33%
Epoch 12/15: Train Loss=0.1840, Val Loss=0.3166, NDVI MAE=0.3375, State Acc=93.06%, Species Acc=83.33%
  💾 Saved Phase 1 checkpoint (val_loss=0.3166)
Epoch 13/15: Train Loss=0.1830, Val Loss=0.3299, NDVI MAE=0.3521, State Acc=90.28%, Species Acc=79.17%
Epoch 14/15: Train Loss=0.1973, Val Loss=0.3171, NDVI MAE=0.3381, State Acc=93.06%, Species Acc=84.72%
Epoch 15/15: Train Loss=0.1449, Val Loss=0.3103, NDVI MAE=0.3351, State Acc=93.06%, Species Acc=81.94%
  💾 Saved Phase 1 checkpoint (val_loss=0.3103)

✓ Phase 1 complete! Model learned to predict tabular features from images.
  Final State Accuracy: 93.06% (model can 'see' location!)
  Final Species Accuracy: 81.94%
  Final NDVI MAE: 0.3351


PHASE 2: BIOMASS FINE-TUNING - EfficientNet_VeryHighLR

Config: 40 epochs, head_lr=2.00e-03, backbone_lr=1.00e-04,
        weight_decay=5.00e-05, scheduler=False

Epoch  1/40: Train Loss=0.6627, Val Loss=0.5917, Val R²=+0.3211
  💾 New best R²=+0.3211 - checkpoint saved
Epoch  2/40: Train Loss=0.4182, Val Loss=0.4688, Val R²=+0.4676
  💾 New best R²=+0.4676 - checkpoint saved
Epoch  3/40: Train Loss=0.3161, Val Loss=0.4158, Val R²=+0.5319
  💾 New best R²=+0.5319 - checkpoint saved
Epoch  4/40: Train Loss=0.2838, Val Loss=0.4976, Val R²=+0.4217
Epoch  5/40: Train Loss=0.2588, Val Loss=0.4477, Val R²=+0.4754
Epoch  6/40: Train Loss=0.2577, Val Loss=0.4380, Val R²=+0.4890
Epoch  7/40: Train Loss=0.1944, Val Loss=0.4558, Val R²=+0.4655
Epoch  8/40: Train Loss=0.1746, Val Loss=0.4384, Val R²=+0.4609
Epoch  9/40: Train Loss=0.1834, Val Loss=0.3973, Val R²=+0.5455
  💾 New best R²=+0.5455 - checkpoint saved
Epoch 10/40: Train Loss=0.1661, Val Loss=0.3990, Val R²=+0.5108
Epoch 11/40: Train Loss=0.1638, Val Loss=0.4404, Val R²=+0.4853
Epoch 12/40: Train Loss=0.1377, Val Loss=0.3635, Val R²=+0.6022
  💾 New best R²=+0.6022 - checkpoint saved
Epoch 13/40: Train Loss=0.1317, Val Loss=0.3745, Val R²=+0.5757
Epoch 14/40: Train Loss=0.1320, Val Loss=0.3842, Val R²=+0.5887
Epoch 15/40: Train Loss=0.1456, Val Loss=0.3209, Val R²=+0.6532
  💾 New best R²=+0.6532 - checkpoint saved
Epoch 16/40: Train Loss=0.1106, Val Loss=0.3468, Val R²=+0.6123
Epoch 17/40: Train Loss=0.1285, Val Loss=0.3602, Val R²=+0.6002
Epoch 18/40: Train Loss=0.1282, Val Loss=0.3757, Val R²=+0.6006
Epoch 19/40: Train Loss=0.1152, Val Loss=0.3644, Val R²=+0.5780
Epoch 20/40: Train Loss=0.1146, Val Loss=0.4127, Val R²=+0.4862
Epoch 21/40: Train Loss=0.1154, Val Loss=0.3845, Val R²=+0.5620
Epoch 22/40: Train Loss=0.1040, Val Loss=0.3805, Val R²=+0.5732
Epoch 23/40: Train Loss=0.1258, Val Loss=0.4098, Val R²=+0.5162
Epoch 24/40: Train Loss=0.1405, Val Loss=0.3888, Val R²=+0.5540
Epoch 25/40: Train Loss=0.1284, Val Loss=0.3549, Val R²=+0.5909
Epoch 26/40: Train Loss=0.1141, Val Loss=0.3232, Val R²=+0.6269
Epoch 27/40: Train Loss=0.0941, Val Loss=0.3317, Val R²=+0.6176
Epoch 28/40: Train Loss=0.0899, Val Loss=0.3329, Val R²=+0.6357
Epoch 29/40: Train Loss=0.0765, Val Loss=0.3571, Val R²=+0.6076
Epoch 30/40: Train Loss=0.0791, Val Loss=0.3494, Val R²=+0.6163
Epoch 31/40: Train Loss=0.0868, Val Loss=0.3488, Val R²=+0.6247
Epoch 32/40: Train Loss=0.0746, Val Loss=0.3366, Val R²=+0.6223
Epoch 33/40: Train Loss=0.0805, Val Loss=0.3525, Val R²=+0.6025
Epoch 34/40: Train Loss=0.0914, Val Loss=0.3698, Val R²=+0.6046
Epoch 35/40: Train Loss=0.0761, Val Loss=0.3311, Val R²=+0.6463
Epoch 36/40: Train Loss=0.0783, Val Loss=0.3686, Val R²=+0.5724
Epoch 37/40: Train Loss=0.0920, Val Loss=0.3727, Val R²=+0.5708
Epoch 38/40: Train Loss=0.0686, Val Loss=0.3546, Val R²=+0.6177
Epoch 39/40: Train Loss=0.0760, Val Loss=0.3754, Val R²=+0.5646
Epoch 40/40: Train Loss=0.0752, Val Loss=0.3361, Val R²=+0.6140

✓ Phase 2 complete! Best R²=+0.6532

Per-target R² scores (at best epoch):
  Dry_Green_g    : R²=+0.6291
  Dry_Dead_g     : R²=+0.5141
  Dry_Clover_g   : R²=+0.6524
  GDM_g          : R²=+0.7014
  Dry_Total_g    : R²=+0.6667


✅ EfficientNet_VeryHighLR complete! R²=+0.6532
   Checkpoint: model4b_EfficientNet_VeryHighLR_phase2_best.pth
   Expected Kaggle: 0.51-0.53



################################################################################
# TRAINING: EfficientNet_WithBatchNorm
# EfficientNet with BatchNorm in head
# Expected Kaggle: 0.52-0.54
################################################################################
✓ Model: efficientnet_b0, 1280 features
  + BatchNorm enabled in head

PHASE 1: AUXILIARY PRETRAINING - EfficientNet_WithBatchNorm

Epoch  1/15: Train Loss=1.1170, Val Loss=1.0582, NDVI MAE=0.4176, State Acc=48.61%, Species Acc=31.94%
  💾 Saved Phase 1 checkpoint (val_loss=1.0582)
Epoch  2/15: Train Loss=0.7599, Val Loss=0.7941, NDVI MAE=0.3784, State Acc=65.28%, Species Acc=61.11%
  💾 Saved Phase 1 checkpoint (val_loss=0.7941)
Epoch  3/15: Train Loss=0.5772, Val Loss=0.6255, NDVI MAE=0.3853, State Acc=75.00%, Species Acc=65.28%
  💾 Saved Phase 1 checkpoint (val_loss=0.6255)
Epoch  4/15: Train Loss=0.4914, Val Loss=0.5848, NDVI MAE=0.4079, State Acc=81.94%, Species Acc=72.22%
  💾 Saved Phase 1 checkpoint (val_loss=0.5848)
Epoch  5/15: Train Loss=0.4067, Val Loss=0.4820, NDVI MAE=0.3850, State Acc=88.89%, Species Acc=72.22%
  💾 Saved Phase 1 checkpoint (val_loss=0.4820)
Epoch  6/15: Train Loss=0.3420, Val Loss=0.4765, NDVI MAE=0.3960, State Acc=87.50%, Species Acc=79.17%
  💾 Saved Phase 1 checkpoint (val_loss=0.4765)
Epoch  7/15: Train Loss=0.2827, Val Loss=0.3840, NDVI MAE=0.3466, State Acc=87.50%, Species Acc=73.61%
  💾 Saved Phase 1 checkpoint (val_loss=0.3840)
Epoch  8/15: Train Loss=0.2677, Val Loss=0.3303, NDVI MAE=0.3400, State Acc=90.28%, Species Acc=86.11%
  💾 Saved Phase 1 checkpoint (val_loss=0.3303)
Epoch  9/15: Train Loss=0.2287, Val Loss=0.3197, NDVI MAE=0.3520, State Acc=90.28%, Species Acc=79.17%
  💾 Saved Phase 1 checkpoint (val_loss=0.3197)
Epoch 10/15: Train Loss=0.2152, Val Loss=0.3051, NDVI MAE=0.3424, State Acc=90.28%, Species Acc=86.11%
  💾 Saved Phase 1 checkpoint (val_loss=0.3051)
Epoch 11/15: Train Loss=0.1839, Val Loss=0.3033, NDVI MAE=0.3298, State Acc=93.06%, Species Acc=86.11%
  💾 Saved Phase 1 checkpoint (val_loss=0.3033)
Epoch 12/15: Train Loss=0.1915, Val Loss=0.3100, NDVI MAE=0.3798, State Acc=90.28%, Species Acc=83.33%
Epoch 13/15: Train Loss=0.1762, Val Loss=0.3109, NDVI MAE=0.3730, State Acc=94.44%, Species Acc=84.72%
Epoch 14/15: Train Loss=0.1380, Val Loss=0.3315, NDVI MAE=0.3485, State Acc=90.28%, Species Acc=83.33%
Epoch 15/15: Train Loss=0.1291, Val Loss=0.2944, NDVI MAE=0.3568, State Acc=93.06%, Species Acc=84.72%
  💾 Saved Phase 1 checkpoint (val_loss=0.2944)

✓ Phase 1 complete! Model learned to predict tabular features from images.
  Final State Accuracy: 93.06% (model can 'see' location!)
  Final Species Accuracy: 84.72%
  Final NDVI MAE: 0.3568


PHASE 2: BIOMASS FINE-TUNING - EfficientNet_WithBatchNorm

Config: 40 epochs, head_lr=1.00e-03, backbone_lr=5.00e-05,
        weight_decay=5.00e-05, scheduler=False

Epoch  1/40: Train Loss=0.7679, Val Loss=0.4542, Val R²=+0.5081
  💾 New best R²=+0.5081 - checkpoint saved
Epoch  2/40: Train Loss=0.4346, Val Loss=0.4142, Val R²=+0.5307
  💾 New best R²=+0.5307 - checkpoint saved
Epoch  3/40: Train Loss=0.4341, Val Loss=0.4118, Val R²=+0.5752
  💾 New best R²=+0.5752 - checkpoint saved
Epoch  4/40: Train Loss=0.2973, Val Loss=0.3540, Val R²=+0.6212
  💾 New best R²=+0.6212 - checkpoint saved
Epoch  5/40: Train Loss=0.3039, Val Loss=0.3849, Val R²=+0.5580
Epoch  6/40: Train Loss=0.3195, Val Loss=0.3745, Val R²=+0.5724
Epoch  7/40: Train Loss=0.2406, Val Loss=0.3732, Val R²=+0.5738
Epoch  8/40: Train Loss=0.2231, Val Loss=0.3628, Val R²=+0.6011
Epoch  9/40: Train Loss=0.2140, Val Loss=0.3447, Val R²=+0.6364
  💾 New best R²=+0.6364 - checkpoint saved
Epoch 10/40: Train Loss=0.2390, Val Loss=0.3534, Val R²=+0.6216
Epoch 11/40: Train Loss=0.2208, Val Loss=0.3511, Val R²=+0.6384
  💾 New best R²=+0.6384 - checkpoint saved
Epoch 12/40: Train Loss=0.2203, Val Loss=0.3757, Val R²=+0.5870
Epoch 13/40: Train Loss=0.1932, Val Loss=0.3420, Val R²=+0.6249
Epoch 14/40: Train Loss=0.2361, Val Loss=0.3532, Val R²=+0.6151
Epoch 15/40: Train Loss=0.1867, Val Loss=0.3535, Val R²=+0.6186
Epoch 16/40: Train Loss=0.2101, Val Loss=0.4016, Val R²=+0.5718
Epoch 17/40: Train Loss=0.2067, Val Loss=0.3752, Val R²=+0.6043
Epoch 18/40: Train Loss=0.1593, Val Loss=0.3598, Val R²=+0.6055
Epoch 19/40: Train Loss=0.1999, Val Loss=0.3400, Val R²=+0.6261
Epoch 20/40: Train Loss=0.2035, Val Loss=0.3530, Val R²=+0.6084
Epoch 21/40: Train Loss=0.1550, Val Loss=0.3477, Val R²=+0.6195
Epoch 22/40: Train Loss=0.1932, Val Loss=0.3684, Val R²=+0.6018
Epoch 23/40: Train Loss=0.1798, Val Loss=0.3574, Val R²=+0.6102
Epoch 24/40: Train Loss=0.1629, Val Loss=0.3704, Val R²=+0.5866
Epoch 25/40: Train Loss=0.1625, Val Loss=0.3586, Val R²=+0.6199
Epoch 26/40: Train Loss=0.1485, Val Loss=0.3754, Val R²=+0.5680
Epoch 27/40: Train Loss=0.1698, Val Loss=0.3315, Val R²=+0.6421
  💾 New best R²=+0.6421 - checkpoint saved
Epoch 28/40: Train Loss=0.1550, Val Loss=0.3527, Val R²=+0.6258
Epoch 29/40: Train Loss=0.1585, Val Loss=0.3388, Val R²=+0.6450
  💾 New best R²=+0.6450 - checkpoint saved
Epoch 30/40: Train Loss=0.1124, Val Loss=0.3240, Val R²=+0.6583
  💾 New best R²=+0.6583 - checkpoint saved
Epoch 31/40: Train Loss=0.1467, Val Loss=0.3413, Val R²=+0.6332
Epoch 32/40: Train Loss=0.1380, Val Loss=0.3561, Val R²=+0.6083
Epoch 33/40: Train Loss=0.1216, Val Loss=0.3398, Val R²=+0.6252
Epoch 34/40: Train Loss=0.1414, Val Loss=0.3633, Val R²=+0.5954
Epoch 35/40: Train Loss=0.1303, Val Loss=0.3328, Val R²=+0.6426
Epoch 36/40: Train Loss=0.1542, Val Loss=0.3617, Val R²=+0.6219
Epoch 37/40: Train Loss=0.1442, Val Loss=0.3379, Val R²=+0.6417
Epoch 38/40: Train Loss=0.1376, Val Loss=0.3404, Val R²=+0.6361
Epoch 39/40: Train Loss=0.1512, Val Loss=0.3690, Val R²=+0.5935
Epoch 40/40: Train Loss=0.1278, Val Loss=0.3290, Val R²=+0.6496

✓ Phase 2 complete! Best R²=+0.6583

Per-target R² scores (at best epoch):
  Dry_Green_g    : R²=+0.7631
  Dry_Dead_g     : R²=+0.3676
  Dry_Clover_g   : R²=+0.6512
  GDM_g          : R²=+0.7482
  Dry_Total_g    : R²=+0.6610


✅ EfficientNet_WithBatchNorm complete! R²=+0.6583
   Checkpoint: model4b_EfficientNet_WithBatchNorm_phase2_best.pth
   Expected Kaggle: 0.52-0.54
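
The "New best ... checkpoint saved" lines in the Phase 2 logs are consistent with a keep-the-best rule on validation R². The following is a minimal sketch of such a rule, not the notebook's actual training loop; the function name `best_epochs` is hypothetical. It is replayed on the first 16 validation scores logged for EfficientNet_HigherLR:

```python
def best_epochs(val_scores):
    """Return the 1-based epochs at which a new best score is reached."""
    best = float("-inf")
    improved_at = []
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:               # strictly better than every earlier epoch
            best = score
            improved_at.append(epoch)  # a real loop would save a checkpoint here
    return improved_at


# Val R² for epochs 1-16 of EfficientNet_HigherLR, copied from the log.
higher_lr_val_r2 = [
    0.4395, 0.4794, 0.5563, 0.5678, 0.4605, 0.5490, 0.5641, 0.5003,
    0.5196, 0.5038, 0.5784, 0.5605, 0.5915, 0.5443, 0.5432, 0.5988,
]

print(best_epochs(higher_lr_val_r2))  # [1, 2, 3, 4, 11, 13, 16]
```

These are the same epochs at which the log reports a saved checkpoint. Because the best epoch is selected on the validation set itself, the reported best R² is likely an optimistic estimate of performance on unseen data.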


---
## Part 4: Compare Results

Compare the 3 EfficientNet tuning experiments.

In [34]:
print("\n\n")
print("="*80)
print("RESULTS: EFFICIENTNET TUNING")
print("="*80 + "\n")

# Create comparison
comparison_data = []
for var_name, results in new_results.items():
    comparison_data.append({
        'Variant': var_name,
        'Head LR': results['config']['phase2_lr'],
        'Backbone LR': results['config']['phase2_backbone_lr'],
        'BatchNorm': results['config'].get('use_batch_norm', False),
        'Best R²': results['best_r2'],
        'Expected Kaggle': results['config']['expected_kaggle']
    })

if comparison_data:
    comparison_df = pd.DataFrame(comparison_data)
    comparison_df = comparison_df.sort_values('Best R²', ascending=False)

    print("Results:")
    print(comparison_df.to_string(index=False))

    # Best
    best = comparison_df.iloc[0]
    print(f"\n🏆 BEST: {best['Variant']}")
    print(f"   Val R²: {best['Best R²']:+.4f}")
    print(f"   Expected Kaggle: {best['Expected Kaggle']}")

    # vs Baseline
    baseline_r2 = 0.6852
    baseline_kaggle = 0.51
    improvement = best['Best R²'] - baseline_r2

    print(f"\n📊 vs BASELINE:")
    print(f"   Baseline (ResNet18): Val={baseline_r2:+.4f}, Kaggle={baseline_kaggle:+.2f}")
    print(f"   Best EfficientNet:   Val={best['Best R²']:+.4f}")
    print(f"   Val difference: {improvement:+.4f}")

    if improvement > 0:
        print(f"\n✨ BEATS BASELINE! Should improve Kaggle score!")
    elif improvement > -0.01:
        print(f"\n⚖️  Similar to baseline")
    else:
        print(f"\n📉 Below baseline")
        # Check if it beat original EfficientNet
        orig_eff_r2 = 0.5903
        if best['Best R²'] > orig_eff_r2:
            orig_improvement = best['Best R²'] - orig_eff_r2
            print(f"\n✅ But beats original EfficientNet by {orig_improvement:+.4f}!")
            print(f"   Original EfficientNet: {orig_eff_r2:+.4f}")
            print(f"   This confirms hyperparameter tuning helped!")
else:
    print("No results yet")

print("\n" + "="*80)




RESULTS: EFFICIENTNET TUNING

Results:
                   Variant  Head LR  Backbone LR  BatchNorm  Best R² Expected Kaggle
EfficientNet_WithBatchNorm    0.001      0.00005       True 0.658316       0.52-0.54
   EfficientNet_VeryHighLR    0.002      0.00010      False 0.653211       0.51-0.53
     EfficientNet_HigherLR    0.001      0.00005      False 0.598841       0.52-0.54

🏆 BEST: EfficientNet_WithBatchNorm
   Val R²: +0.6583
   Expected Kaggle: 0.52-0.54

📊 vs BASELINE:
   Baseline (ResNet18): Val=+0.6852, Kaggle=+0.51
   Best EfficientNet:   Val=+0.6583
   Val difference: -0.0269

📉 Below baseline

✅ But beats original EfficientNet by +0.0680!
   Original EfficientNet: +0.5903
   This confirms hyperparameter tuning helped!
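
The comparison above can also be broken down into pairwise differences. The sketch below uses only the best validation R² values reported in this notebook; the variable names are illustrative. Each variant was trained once, so the differences are single-run observations rather than measured effects.

```python
# Best validation R² per run, copied from the results table.
baseline_resnet18 = 0.6852
original_effnet = 0.5903
best_r2 = {
    "EfficientNet_HigherLR": 0.5988,
    "EfficientNet_VeryHighLR": 0.6532,
    "EfficientNet_WithBatchNorm": 0.6583,
}

for name, r2 in best_r2.items():
    gain = r2 - original_effnet    # improvement over the untuned EfficientNet
    gap = r2 - baseline_resnet18   # negative means still below ResNet18
    print(f"{name:<28} gain={gain:+.4f}  gap_to_baseline={gap:+.4f}")

# Same learning rates, with vs. without BatchNorm in the head.
batchnorm_delta = best_r2["EfficientNet_WithBatchNorm"] - best_r2["EfficientNet_HigherLR"]
# No BatchNorm, head and backbone learning rates doubled.
lr_delta = best_r2["EfficientNet_VeryHighLR"] - best_r2["EfficientNet_HigherLR"]
print(f"BatchNorm at fixed LR:     {batchnorm_delta:+.4f}")
print(f"Doubling LR, no BatchNorm: {lr_delta:+.4f}")
```

On these numbers, moving to the HigherLR setting alone accounts for only +0.0085 over the original EfficientNet, while adding BatchNorm at the same rates, or doubling the rates again, adds a further +0.0595 or +0.0544 respectively.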



---
## Summary

### Original Problem
EfficientNet with ResNet18's hyperparameters: R²=0.5903

### Hypothesis
EfficientNet's 1280 features need higher learning rates than ResNet18's 512

### Results
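- EfficientNet_WithBatchNorm: best Val R²=0.6583 (epoch 30)
- EfficientNet_VeryHighLR: best Val R²=0.6532 (epoch 15)
- EfficientNet_HigherLR: best Val R²=0.5988 (epoch 16)

All three variants beat the original EfficientNet (0.5903), but none reaches the ResNet18 baseline (0.6852); the best variant is 0.0269 below it.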
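
The overall R² reported for each run appears to be a weighted combination of the five per-target scores. The weights used below (0.1, 0.1, 0.1, 0.2, 0.5) are an assumption: they are not printed anywhere in this notebook, but they reproduce the logged best R² of all three runs to within rounding. A quick check, using only numbers from the logs (the helper name `weighted_r2` is hypothetical):

```python
# Assumed target weights (inferred from the arithmetic, not stated in the notebook).
weights = {"Dry_Green_g": 0.1, "Dry_Dead_g": 0.1, "Dry_Clover_g": 0.1,
           "GDM_g": 0.2, "Dry_Total_g": 0.5}

# Per-target R² at the best epoch and the reported overall best R², from the logs.
runs = {
    "EfficientNet_HigherLR": (
        {"Dry_Green_g": 0.7716, "Dry_Dead_g": 0.2665, "Dry_Clover_g": 0.5074,
         "GDM_g": 0.7702, "Dry_Total_g": 0.5805}, 0.5988),
    "EfficientNet_VeryHighLR": (
        {"Dry_Green_g": 0.6291, "Dry_Dead_g": 0.5141, "Dry_Clover_g": 0.6524,
         "GDM_g": 0.7014, "Dry_Total_g": 0.6667}, 0.6532),
    "EfficientNet_WithBatchNorm": (
        {"Dry_Green_g": 0.7631, "Dry_Dead_g": 0.3676, "Dry_Clover_g": 0.6512,
         "GDM_g": 0.7482, "Dry_Total_g": 0.6610}, 0.6583),
}


def weighted_r2(per_target, target_weights):
    """Weighted mean of per-target R² scores."""
    return sum(target_weights[t] * per_target[t] for t in target_weights)


for name, (per_target, reported) in runs.items():
    combined = weighted_r2(per_target, weights)
    print(f"{name:<28} weighted={combined:.4f}  reported={reported:.4f}")
```

If this weighting is what the notebook uses, Dry_Total_g carries half of the score, so the comparatively low Dry_Dead_g values cost less than they would under a plain average.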

### Next Steps
If best variant beats baseline (R²>0.6852):
- Create Kaggle submission
- Compare with other experiments

If below baseline but better than original EfficientNet (R²>0.5903), which is the outcome here:
- Confirms hyperparameter tuning helped
- Consider other approaches (Option 4: K-Fold)
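
For the K-Fold option mentioned above, here is a minimal sketch of how the splits could be generated with the standard library alone. The function name `kfold_indices`, the fold count, and the dataset size of 360 are placeholders, not values taken from this notebook; `sklearn.model_selection.KFold` offers the same kind of splitting if scikit-learn is preferred.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose validation parts partition the data."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # fixed seed for reproducible folds
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # fold sizes differ by at most one
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 360 is a placeholder dataset size, not a figure from this notebook.
folds = list(kfold_indices(n_samples=360, n_splits=5))
for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {fold}: train={len(train_idx)}, val={len(val_idx)}")
```

Training one model per fold and averaging the validation R² should give a less noisy comparison between variants than the single split used so far.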