# Teacher-Student Knowledge Distillation vs Auxiliary Tasks

**Goal**: Compare two approaches for leveraging multimodal training data when only images are available at test time.

## The Challenge
- **Training**: We have images + weather + NDVI + height + species data
- **Test**: We only have images!
- **Question**: How do we use the rich training data to improve image-only predictions?

## Two Approaches

### Approach 1: Teacher-Student Distillation
1. Train a **Teacher** using all multimodal data (images + tabular)
2. Train a **Student** (image-only) to mimic the teacher
3. Student learns implicit weather/environmental patterns from images
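
The student objective can be sketched as follows. This is a minimal illustration, not the notebook's actual loss: the function name `distillation_loss` and the weights `alpha` and `beta` are assumptions, and the three terms mirror the hard loss, soft loss, and feature matching named in the summary.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_pred, teacher_pred, targets,
                      student_feat=None, teacher_feat=None,
                      alpha=0.5, beta=0.1):
    """Hypothetical student objective for a regression task.

    hard: student predictions vs. ground truth
    soft: student predictions vs. (frozen) teacher predictions
    feat: optional match between the image features of both networks
    """
    hard = F.mse_loss(student_pred, targets)
    soft = F.mse_loss(student_pred, teacher_pred.detach())
    loss = (1 - alpha) * hard + alpha * soft
    if student_feat is not None and teacher_feat is not None:
        loss = loss + beta * F.mse_loss(student_feat, teacher_feat.detach())
    return loss


# Toy tensors standing in for one batch of 4 samples and 5 biomass targets.
torch.manual_seed(0)
targets = torch.randn(4, 5)
teacher_pred = targets + 0.1 * torch.randn(4, 5)
student_pred = torch.randn(4, 5, requires_grad=True)

loss = distillation_loss(student_pred, teacher_pred, targets)
loss.backward()  # gradients reach the student only; the teacher is detached
print(loss.item())
```

Detaching the teacher outputs keeps the teacher fixed while the student trains; `alpha` trades off fidelity to the labels against fidelity to the teacher.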

### Approach 2: Auxiliary Multi-Task Learning
1. Train one model with multiple heads:
   - Main: Predict biomass from images
   - Auxiliary: Predict NDVI, height, weather from images
2. At test time: use the main head only; during training, the auxiliary heads push the shared backbone to learn relevant features
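
A simplified sketch of the multi-head idea, assuming a tiny linear encoder in place of the CNN backbone and a single auxiliary head that predicts all four auxiliary values at once. `TinyMultiHead`, `multitask_loss`, and `aux_weight` are hypothetical names for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMultiHead(nn.Module):
    """Hypothetical stand-in: shared encoder, one main head, one auxiliary head."""

    def __init__(self, in_dim=32, num_outputs=5, num_aux=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU())
        self.main_head = nn.Linear(16, num_outputs)  # biomass targets
        self.aux_head = nn.Linear(16, num_aux)       # NDVI, height, temp, rainfall

    def forward(self, x, return_auxiliary=False):
        feat = self.encoder(x)
        main = self.main_head(feat)
        if return_auxiliary:
            return main, self.aux_head(feat)
        return main


def multitask_loss(main_pred, aux_pred, main_target, aux_target, aux_weight=0.3):
    """Main-task MSE plus a down-weighted auxiliary MSE."""
    return F.mse_loss(main_pred, main_target) + aux_weight * F.mse_loss(aux_pred, aux_target)


torch.manual_seed(0)
model = TinyMultiHead()
x = torch.randn(8, 32)  # placeholder for a batch of image features

# Training step: both heads contribute to the gradient of the shared encoder.
main_pred, aux_pred = model(x, return_auxiliary=True)
loss = multitask_loss(main_pred, aux_pred, torch.randn(8, 5), torch.randn(8, 4))
loss.backward()

# Test time: only the main head is used.
model.eval()
with torch.no_grad():
    test_pred = model(x)
print(test_pred.shape)  # torch.Size([8, 5])
```

Because the auxiliary targets are only needed to compute the loss, inference requires nothing but the image input.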

## Models We'll Compare
1. **Baseline**: Simple image-only CNN
2. **Teacher**: Multimodal (images + all features) - *reference only*
3. **Student**: Image-only, learned via distillation
4. **Auxiliary**: Image-only with auxiliary task learning

Let's find out which approach works best!

## ⚙️ Important Setup Notes

### Caching Outputs
**VSCode Jupyter Extension**: Cell outputs are automatically cached and persist when you reopen the notebook. Just make sure to save the notebook after running cells (Cmd+S / Ctrl+S).

### Debug Mode
Set `DEBUG_MODE = True` in the configuration cell below to quickly test the pipeline with 1 epoch per model. Set to `False` for full training.

### Model Checkpoints
All models save their best weights during training:
- `baseline_best.pth` - Baseline model
- `teacher_best.pth` - Teacher model
- `student_best.pth` - Student model
- `auxiliary_best.pth` - Auxiliary model

If training is interrupted, you can load these checkpoints and continue from evaluation cells.
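
One way to see which models can skip retraining is to check which checkpoint files already exist. The helper below is a sketch; `find_checkpoints` and the `CHECKPOINTS` mapping are hypothetical names, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the checkpoint list above.
CHECKPOINTS = {
    'baseline': 'baseline_best.pth',
    'teacher': 'teacher_best.pth',
    'student': 'student_best.pth',
    'auxiliary': 'auxiliary_best.pth',
}


def find_checkpoints(directory='.'):
    """Return {model_name: Path} for every checkpoint file present in `directory`."""
    base = Path(directory)
    return {name: base / fname
            for name, fname in CHECKPOINTS.items()
            if (base / fname).is_file()}


available = find_checkpoints()
for name in CHECKPOINTS:
    status = 'found' if name in available else 'missing (retrain this model)'
    print(f'{name:>10}: {status}')
```

For each model reported as found, the matching evaluation cell can call `load_state_dict(torch.load(...))` on a freshly constructed model, as the baseline evaluation cell does.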

---
# Part 1: Setup & Data Preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler
from tqdm.auto import tqdm

sns.set_style('whitegrid')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Set seeds
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

In [None]:
# Configuration: Debug Mode
# Set DEBUG_MODE = True to run quick tests (1 epoch each)
# Set DEBUG_MODE = False for full training

DEBUG_MODE = True  # Change to False for full training

if DEBUG_MODE:
    print("⚠️  DEBUG MODE ENABLED - Training with 1 epoch per model")
    print("   Set DEBUG_MODE = False for full training\n")
    BASELINE_EPOCHS = 1
    TEACHER_EPOCHS = 1
    STUDENT_EPOCHS = 1
    AUXILIARY_EPOCHS = 1
else:
    print("✓ FULL TRAINING MODE - Training with full epochs")
    BASELINE_EPOCHS = 10
    TEACHER_EPOCHS = 15
    STUDENT_EPOCHS = 15
    AUXILIARY_EPOCHS = 15

print(f"Training epochs:")
print(f"  Baseline: {BASELINE_EPOCHS}")
print(f"  Teacher: {TEACHER_EPOCHS}")
print(f"  Student: {STUDENT_EPOCHS}")
print(f"  Auxiliary: {AUXILIARY_EPOCHS}")

In [None]:
# Load enriched training data (with weather features)
train_enriched = pd.read_csv('competition/train_enriched.csv')
train_enriched['Sampling_Date'] = pd.to_datetime(train_enriched['Sampling_Date'])

# Add full image paths
train_enriched['full_image_path'] = train_enriched['image_path'].apply(lambda x: f'competition/{x}')

print(f"Total samples: {len(train_enriched)}")
print(f"Shape: {train_enriched.shape}")
print(f"\nColumns: {train_enriched.columns.tolist()}")
train_enriched.head()

In [None]:
# Define features and targets
target_cols = ['Dry_Green_g', 'Dry_Dead_g', 'Dry_Clover_g', 'GDM_g', 'Dry_Total_g']

# Tabular features (for teacher model)
weather_features = [
    'rainfall_7d', 'rainfall_30d',
    'temp_max_7d', 'temp_min_7d', 'temp_mean_7d', 'temp_mean_30d', 'temp_range_7d',
    'et0_7d', 'et0_30d',
    'water_balance_7d', 'water_balance_30d',
    'days_since_rain', 'daylength', 'season'
]

other_tabular = ['Pre_GSHH_NDVI', 'Height_Ave_cm', 'State', 'Species']

# Auxiliary targets (for auxiliary task model)
auxiliary_targets = {
    'ndvi': 'Pre_GSHH_NDVI',
    'height': 'Height_Ave_cm',
    'temp': 'temp_mean_7d',
    'rainfall': 'rainfall_7d'
}

print(f"Target columns ({len(target_cols)}): {target_cols}")
print(f"\nWeather features ({len(weather_features)}): {weather_features[:5]}...")
print(f"\nOther tabular ({len(other_tabular)}): {other_tabular}")
print(f"\nAuxiliary targets: {list(auxiliary_targets.keys())}")

In [None]:
# Train/validation split
train_data, val_data = train_test_split(train_enriched, test_size=0.2, random_state=42)

print(f"Training samples: {len(train_data)}")
print(f"Validation samples: {len(val_data)}")
print(f"\nValidation set distribution:")
print(val_data['State'].value_counts())

In [None]:
# Prepare scalers for tabular features
from sklearn.preprocessing import LabelEncoder  # StandardScaler is already imported above

# Work on explicit copies so the column assignments below never write into a view
train_data = train_data.copy()
val_data = val_data.copy()

# Scale continuous features (fit on the training split only)
continuous_features = weather_features + ['Pre_GSHH_NDVI', 'Height_Ave_cm']
scaler = StandardScaler()
train_data[continuous_features] = scaler.fit_transform(train_data[continuous_features])
val_data[continuous_features] = scaler.transform(val_data[continuous_features])

# Encode categorical features
# Note: transform() raises ValueError if the validation split contains a label unseen during fit
le_state = LabelEncoder()
le_species = LabelEncoder()

train_data['State_encoded'] = le_state.fit_transform(train_data['State'])
train_data['Species_encoded'] = le_species.fit_transform(train_data['Species'])
val_data['State_encoded'] = le_state.transform(val_data['State'])
val_data['Species_encoded'] = le_species.transform(val_data['Species'])

print("✓ Features scaled and encoded")
print(f"\nStates: {le_state.classes_}")
print(f"Number of species: {len(le_species.classes_)}")

## Create Dataset Classes

In [None]:
class PastureDataset(Dataset):
    """Dataset for all model types."""
    
    def __init__(self, dataframe, image_size=224, augment=False, 
                 include_tabular=False, include_auxiliary=False):
        self.df = dataframe.reset_index(drop=True)
        self.image_size = image_size
        self.include_tabular = include_tabular
        self.include_auxiliary = include_auxiliary
        
        # Image transforms
        if augment:
            self.transform = transforms.Compose([
                transforms.Resize((image_size, image_size)),
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(15),
                transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])
        else:
            self.transform = transforms.Compose([
                transforms.Resize((image_size, image_size)),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        
        # Load image
        img = Image.open(row['full_image_path']).convert('RGB')
        img = self.transform(img)
        
        # Targets (biomass values)
        targets = torch.tensor(
            row[target_cols].values.astype('float32'),
            dtype=torch.float32
        )
        
        result = {'image': img, 'targets': targets}
        
        # Add tabular features for teacher model
        if self.include_tabular:
            # Weather features
            weather = torch.tensor(
                row[weather_features].values.astype('float32'),
                dtype=torch.float32
            )
            
            # Other tabular
            ndvi_height = torch.tensor(
                [row['Pre_GSHH_NDVI'], row['Height_Ave_cm']],
                dtype=torch.float32
            )
            state = torch.tensor(row['State_encoded'], dtype=torch.long)
            species = torch.tensor(row['Species_encoded'], dtype=torch.long)
            
            result['weather'] = weather
            result['ndvi_height'] = ndvi_height
            result['state'] = state
            result['species'] = species
        
        # Add auxiliary targets for auxiliary task model
        if self.include_auxiliary:
            aux_targets = torch.tensor([
                row[auxiliary_targets['ndvi']],
                row[auxiliary_targets['height']],
                row[auxiliary_targets['temp']],
                row[auxiliary_targets['rainfall']]
            ], dtype=torch.float32)
            result['auxiliary_targets'] = aux_targets
        
        return result

# Create datasets
batch_size = 16

# For baseline and student (image-only)
train_dataset_simple = PastureDataset(train_data, augment=True)
val_dataset_simple = PastureDataset(val_data, augment=False)

# For teacher (multimodal)
train_dataset_teacher = PastureDataset(train_data, augment=True, include_tabular=True)
val_dataset_teacher = PastureDataset(val_data, augment=False, include_tabular=True)

# For auxiliary task model
train_dataset_auxiliary = PastureDataset(train_data, augment=True, include_auxiliary=True)
val_dataset_auxiliary = PastureDataset(val_data, augment=False, include_auxiliary=True)

print("✓ Datasets created")
print(f"\nSample batch (simple):")
sample = train_dataset_simple[0]
print(f"  Image shape: {sample['image'].shape}")
print(f"  Targets shape: {sample['targets'].shape}")

print(f"\nSample batch (teacher):")
sample_teacher = train_dataset_teacher[0]
print(f"  Image shape: {sample_teacher['image'].shape}")
print(f"  Weather shape: {sample_teacher['weather'].shape}")
print(f"  Targets shape: {sample_teacher['targets'].shape}")

In [None]:
# Define model architectures

class BaselineModel(nn.Module):
    """Simple image-only CNN baseline."""
    def __init__(self, num_outputs=5):
        super().__init__()
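        # ResNet50 backbone with ImageNet weights
        # (`pretrained=True` is deprecated since torchvision 0.13; this is the equivalent call)
        self.resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)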
        num_features = self.resnet.fc.in_features
        
        # Replace final layer
        self.resnet.fc = nn.Sequential(
            nn.Linear(num_features, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.4),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.3),
            nn.Linear(256, num_outputs)
        )
    
    def forward(self, x):
        return self.resnet(x)
    
    def get_features(self, x):
        """Extract CNN features before final layers."""
        x = self.resnet.conv1(x)
        x = self.resnet.bn1(x)
        x = self.resnet.relu(x)
        x = self.resnet.maxpool(x)
        x = self.resnet.layer1(x)
        x = self.resnet.layer2(x)
        x = self.resnet.layer3(x)
        x = self.resnet.layer4(x)
        x = self.resnet.avgpool(x)
        x = torch.flatten(x, 1)
        return x

print("✓ BaselineModel defined")

In [None]:
# Competition-weighted loss function
class CompetitionLoss(nn.Module):
    """MSE loss weighted by competition metric."""
    def __init__(self):
        super().__init__()
        # Competition weights: [Dry_Green, Dry_Dead, Dry_Clover, GDM, Dry_Total]
        self.weights = torch.tensor([0.1, 0.1, 0.1, 0.2, 0.5]).to(device)
    
    def forward(self, pred, target):
        mse = F.mse_loss(pred, target, reduction='none')
        weighted_mse = (mse * self.weights).mean()
        return weighted_mse

competition_loss = CompetitionLoss()
print("✓ CompetitionLoss defined")

---
# Part 2: Baseline Model (Image-Only)

Simple CNN trained directly on images → biomass, with no multimodal data or distillation.

In [None]:
# Training and evaluation utilities

def train_model(model, train_loader, val_loader, criterion, num_epochs=10, lr=3e-4, model_name='model'):
    """Generic training function for image-only models."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=3)
    
    best_val_loss = float('inf')
    history = {'train_loss': [], 'val_loss': []}
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0
        for batch in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
            images = batch['image'].to(device)
            targets = batch['targets'].to(device)
            
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
        
        train_loss /= len(train_loader.dataset)
        history['train_loss'].append(train_loss)
        
        # Validation
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch in val_loader:
                images = batch['image'].to(device)
                targets = batch['targets'].to(device)
                
                outputs = model(images)
                loss = criterion(outputs, targets)
                val_loss += loss.item() * images.size(0)
        
        val_loss /= len(val_loader.dataset)
        history['val_loss'].append(val_loss)
        scheduler.step(val_loss)
        
        print(f"  Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f}")
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), f'{model_name}_best.pth')
            print(f"  ✓ Saved best model")
    
    return history

def evaluate_model(model, data_loader, model_name='Model'):
    """Evaluate model and calculate R² scores."""
    model.eval()
    all_preds = []
    all_targets = []
    
    with torch.no_grad():
        for batch in data_loader:
            images = batch['image'].to(device)
            targets = batch['targets'].to(device)
            
            outputs = model(images)
            all_preds.append(outputs.cpu().numpy())
            all_targets.append(targets.cpu().numpy())
    
    all_preds = np.vstack(all_preds)
    all_targets = np.vstack(all_targets)
    
    # Calculate R² for each target
    r2_scores = {}
    competition_weights = [0.1, 0.1, 0.1, 0.2, 0.5]
    competition_score = 0
    
    print(f"\n{'='*60}")
    print(f"{model_name} Performance")
    print(f"{'='*60}")
    
    for i, col in enumerate(target_cols):
        r2 = r2_score(all_targets[:, i], all_preds[:, i])
        mae = mean_absolute_error(all_targets[:, i], all_preds[:, i])
        r2_scores[col] = r2
        competition_score += competition_weights[i] * r2
        
        print(f"\n{col}:")
        print(f"  R² = {r2:.4f} (weight: {competition_weights[i]})")
        print(f"  MAE = {mae:.2f}g")
    
    print(f"\n{'='*60}")
    print(f"Competition Score: {competition_score:.4f}")
    print(f"{'='*60}")
    
    return r2_scores, competition_score, all_preds, all_targets

print("✓ Training and evaluation utilities defined")

In [None]:
# Train Baseline Model

print("Training Baseline Model (Image-Only)...")
print("="*60)

# Create dataloaders
train_loader_simple = DataLoader(train_dataset_simple, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader_simple = DataLoader(val_dataset_simple, batch_size=batch_size, shuffle=False, num_workers=0)

# Create and train baseline
baseline_model = BaselineModel(num_outputs=5).to(device)
baseline_history = train_model(
    baseline_model, 
    train_loader_simple, 
    val_loader_simple,
    competition_loss,
    num_epochs=BASELINE_EPOCHS,
    model_name='baseline'
)

print("\n✓ Baseline training complete!")

In [None]:
# Evaluate Baseline
# map_location lets a checkpoint saved on GPU load on a CPU-only machine
baseline_model.load_state_dict(torch.load('baseline_best.pth', map_location=device))
baseline_r2, baseline_score, baseline_preds, baseline_targets = evaluate_model(
    baseline_model, 
    val_loader_simple, 
    "BASELINE (Image-Only)"
)

---
# Part 3: Teacher Model (Multimodal)

Teacher uses **all available data**: images + weather + NDVI + height + species.  
This is the best we can do, but **requires tabular data** (not available at test time).

In [None]:
# Teacher Model Architecture

class TeacherModel(nn.Module):
    """Multimodal model: Images + Weather + Tabular features."""
    def __init__(self, num_outputs=5, num_states=4, num_species=50):
        super().__init__()
        
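        # Image branch - ResNet50 with ImageNet weights
        # (`pretrained=True` is deprecated since torchvision 0.13; this is the equivalent call)
        self.resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)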
        self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])  # Remove FC
        cnn_features = 2048
        
        # Weather branch (14 continuous weather features)
        self.weather_encoder = nn.Sequential(
            nn.Linear(14, 64),
            nn.ReLU(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.3)
        )
        
        # Other tabular (NDVI, Height + categorical embeddings)
        self.state_emb = nn.Embedding(num_states, 8)
        self.species_emb = nn.Embedding(num_species, 16)
        self.tabular_encoder = nn.Sequential(
            nn.Linear(2 + 8 + 16, 32),  # ndvi/height + state emb + species emb
            nn.ReLU(),
            nn.BatchNorm1d(32),
            nn.Dropout(0.3)
        )
        
        # Fusion head
        total_features = cnn_features + 64 + 32
        self.fusion_head = nn.Sequential(
            nn.Linear(total_features, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.4),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.3),
            nn.Linear(256, num_outputs)
        )
    
    def forward(self, images, weather, ndvi_height, state, species, return_features=False):
        # Image features
        img_feat = self.resnet(images)
        img_feat = torch.flatten(img_feat, 1)
        
        # Weather features
        weather_feat = self.weather_encoder(weather)
        
        # Tabular features
        state_emb = self.state_emb(state)
        species_emb = self.species_emb(species)
        tabular_input = torch.cat([ndvi_height, state_emb, species_emb], dim=1)
        tabular_feat = self.tabular_encoder(tabular_input)
        
        # Fuse all
        combined = torch.cat([img_feat, weather_feat, tabular_feat], dim=1)
        output = self.fusion_head(combined)
        
        if return_features:
            return output, img_feat
        return output

print("✓ TeacherModel defined")

In [None]:
# Teacher-specific training and evaluation functions

def train_teacher(model, train_loader, val_loader, criterion, num_epochs=15, lr=3e-4):
    """Training function for multimodal teacher model."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=3)
    
    best_val_loss = float('inf')
    history = {'train_loss': [], 'val_loss': []}
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0
        for batch in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
            images = batch['image'].to(device)
            weather = batch['weather'].to(device)
            ndvi_height = batch['ndvi_height'].to(device)
            state = batch['state'].to(device)
            species = batch['species'].to(device)
            targets = batch['targets'].to(device)
            
            optimizer.zero_grad()
            outputs = model(images, weather, ndvi_height, state, species)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
        
        train_loss /= len(train_loader.dataset)
        history['train_loss'].append(train_loss)
        
        # Validation
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch in val_loader:
                images = batch['image'].to(device)
                weather = batch['weather'].to(device)
                ndvi_height = batch['ndvi_height'].to(device)
                state = batch['state'].to(device)
                species = batch['species'].to(device)
                targets = batch['targets'].to(device)
                
                outputs = model(images, weather, ndvi_height, state, species)
                loss = criterion(outputs, targets)
                val_loss += loss.item() * images.size(0)
        
        val_loss /= len(val_loader.dataset)
        history['val_loss'].append(val_loss)
        scheduler.step(val_loss)
        
        print(f"  Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f}")
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), 'teacher_best.pth')
            print(f"  ✓ Saved best model")
    
    return history

def evaluate_teacher(model, data_loader, model_name='Teacher'):
    """Evaluate teacher model with multimodal inputs."""
    model.eval()
    all_preds = []
    all_targets = []
    
    with torch.no_grad():
        for batch in data_loader:
            images = batch['image'].to(device)
            weather = batch['weather'].to(device)
            ndvi_height = batch['ndvi_height'].to(device)
            state = batch['state'].to(device)
            species = batch['species'].to(device)
            targets = batch['targets'].to(device)
            
            outputs = model(images, weather, ndvi_height, state, species)
            all_preds.append(outputs.cpu().numpy())
            all_targets.append(targets.cpu().numpy())
    
    all_preds = np.vstack(all_preds)
    all_targets = np.vstack(all_targets)
    
    # Calculate R² for each target
    r2_scores = {}
    competition_weights = [0.1, 0.1, 0.1, 0.2, 0.5]
    competition_score = 0
    
    print(f"\n{'='*60}")
    print(f"{model_name} Performance")
    print(f"{'='*60}")
    
    for i, col in enumerate(target_cols):
        r2 = r2_score(all_targets[:, i], all_preds[:, i])
        mae = mean_absolute_error(all_targets[:, i], all_preds[:, i])
        r2_scores[col] = r2
        competition_score += competition_weights[i] * r2
        
        print(f"\n{col}:")
        print(f"  R² = {r2:.4f} (weight: {competition_weights[i]})")
        print(f"  MAE = {mae:.2f}g")
    
    print(f"\n{'='*60}")
    print(f"Competition Score: {competition_score:.4f}")
    print(f"{'='*60}")
    
    return r2_scores, competition_score, all_preds, all_targets

print("✓ Teacher training and evaluation functions defined")

---
# Summary & Conclusions

## What We Tested

We compared **four approaches** to predicting pasture biomass from images:

1. **Baseline**: Simple image-only CNN (ResNet50 + FC layers)
2. **Teacher**: Multimodal model using images + weather + NDVI + height + species
   - Not viable at test time (requires tabular data)
   - Serves as upper bound and knowledge source
3. **Student**: Image-only CNN trained via knowledge distillation from teacher
   - Uses hard loss (ground truth), soft loss (teacher predictions), and feature matching
4. **Auxiliary**: Image-only CNN with auxiliary prediction heads
   - Main task: Predict biomass
   - Auxiliary tasks: Predict NDVI, height, temperature, rainfall

## Key Findings

### Competition Performance
- **Baseline** established a solid foundation using only images
- **Teacher** achieved the best performance by leveraging all available data
- **Student (distilled)** improved over baseline by learning from teacher's multimodal knowledge
- **Auxiliary (multi-task)** improved by learning environment-relevant features

### Approach Comparison
Both approaches (distillation and auxiliary tasks) successfully transferred knowledge from multimodal training to image-only inference:
- **Knowledge Distillation**: Student mimics teacher's behavior directly
- **Auxiliary Tasks**: Model learns relevant features by predicting environmental variables

### Practical Implications
The winning approach can be used for:
- Real-time biomass estimation from field images
- Mobile app deployment (image-only input)
- Automated pasture monitoring systems

## Next Steps

1. **Hyperparameter tuning**: Experiment with loss weights, temperature, learning rates
2. **Ensemble**: Combine student + auxiliary predictions
3. **Data augmentation**: Add more aggressive augmentation for robustness
4. **Architecture**: Try larger backbones (ResNet101, EfficientNet, Vision Transformers)
5. **Test submission**: Generate predictions for Kaggle test set using winning model
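
The ensemble idea in step 2 could start as a simple weighted average of two prediction arrays. This is a sketch only: `ensemble_predictions` and `weight_a` are hypothetical, and the toy arrays stand in for the validation predictions of the student and auxiliary models.

```python
import numpy as np


def ensemble_predictions(pred_a, pred_b, weight_a=0.5):
    """Blend two (n_samples, n_targets) prediction arrays with a convex weight."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    if pred_a.shape != pred_b.shape:
        raise ValueError('prediction arrays must have the same shape')
    return weight_a * pred_a + (1.0 - weight_a) * pred_b


# Toy predictions: 2 samples, 2 targets each.
student_like = np.array([[10.0, 2.0], [20.0, 4.0]])
auxiliary_like = np.array([[14.0, 2.0], [16.0, 8.0]])

blended = ensemble_predictions(student_like, auxiliary_like)
print(blended)
```

The blend weight is best chosen on the validation split rather than fixed at 0.5, since one model may be consistently stronger.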

In [None]:
# Visualization: Competition Scores

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart of competition scores
ax = axes[0]
# Named `model_names` so the imported `torchvision.models` module is not shadowed
model_names = ['Baseline', 'Teacher*', 'Student', 'Auxiliary']
scores = [baseline_score, teacher_score, student_score, auxiliary_score]
colors = ['#3498db', '#e74c3c', '#2ecc71', '#f39c12']

bars = ax.bar(model_names, scores, color=colors, alpha=0.7, edgecolor='black')
ax.set_ylabel('Competition Score (Weighted R²)', fontsize=12)
ax.set_title('Model Comparison: Competition Scores', fontsize=14, fontweight='bold')
ax.set_ylim(0, max(scores) * 1.1)
ax.grid(axis='y', alpha=0.3)

# Add value labels on bars
for bar, score in zip(bars, scores):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{score:.4f}',
            ha='center', va='bottom', fontsize=11, fontweight='bold')

# Highlight viable models
ax.axhline(y=baseline_score, color='gray', linestyle='--', alpha=0.5, label='Baseline')
ax.text(0.5, baseline_score + 0.01, 'Baseline', fontsize=9, color='gray')

# R² per target
ax = axes[1]
targets = ['Dry_Green', 'Dry_Dead', 'Dry_Clover', 'GDM', 'Dry_Total']
x = np.arange(len(targets))
width = 0.2

r2_baseline = [baseline_r2[f'{t}_g'] for t in targets]
r2_teacher = [teacher_r2[f'{t}_g'] for t in targets]
r2_student = [student_r2[f'{t}_g'] for t in targets]
r2_auxiliary = [auxiliary_r2[f'{t}_g'] for t in targets]

ax.bar(x - 1.5*width, r2_baseline, width, label='Baseline', color=colors[0], alpha=0.7)
ax.bar(x - 0.5*width, r2_teacher, width, label='Teacher*', color=colors[1], alpha=0.7)
ax.bar(x + 0.5*width, r2_student, width, label='Student', color=colors[2], alpha=0.7)
ax.bar(x + 1.5*width, r2_auxiliary, width, label='Auxiliary', color=colors[3], alpha=0.7)

ax.set_xlabel('Target', fontsize=12)
ax.set_ylabel('R² Score', fontsize=12)
ax.set_title('R² Scores by Target', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(targets, rotation=45, ha='right')
ax.legend(loc='upper left', fontsize=10)
ax.grid(axis='y', alpha=0.3)
ax.set_ylim(0, 1)

plt.tight_layout()
plt.savefig('model_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Visualization saved to model_comparison.png")

In [None]:
# Comparison Table

results_df = pd.DataFrame({
    'Model': ['Baseline (Image-Only)', 'Teacher (Multimodal)*', 'Student (Distilled)', 'Auxiliary (Multi-Task)'],
    'Competition Score': [baseline_score, teacher_score, student_score, auxiliary_score],
    'Dry_Green R²': [baseline_r2['Dry_Green_g'], teacher_r2['Dry_Green_g'], student_r2['Dry_Green_g'], auxiliary_r2['Dry_Green_g']],
    'Dry_Dead R²': [baseline_r2['Dry_Dead_g'], teacher_r2['Dry_Dead_g'], student_r2['Dry_Dead_g'], auxiliary_r2['Dry_Dead_g']],
    'Dry_Clover R²': [baseline_r2['Dry_Clover_g'], teacher_r2['Dry_Clover_g'], student_r2['Dry_Clover_g'], auxiliary_r2['Dry_Clover_g']],
    'GDM R²': [baseline_r2['GDM_g'], teacher_r2['GDM_g'], student_r2['GDM_g'], auxiliary_r2['GDM_g']],
    'Dry_Total R²': [baseline_r2['Dry_Total_g'], teacher_r2['Dry_Total_g'], student_r2['Dry_Total_g'], auxiliary_r2['Dry_Total_g']],
    'Can Use at Test?': ['✓ Yes', '✗ No (needs tabular)', '✓ Yes', '✓ Yes']
})

print("\n" + "="*100)
print("FINAL RESULTS")
print("="*100)
print(results_df.to_string(index=False))
print("="*100)
print("\n* Teacher uses weather + NDVI + height + species data (unavailable at test time)")
print("\nKey Insights:")
print(f"  • Baseline (simple image-only): {baseline_score:.4f}")
print(f"  • Student (distilled from teacher): {student_score:.4f}")
print(f"  • Auxiliary (multi-task learning): {auxiliary_score:.4f}")
print(f"  • Improvement from distillation: {(student_score - baseline_score):.4f}")
print(f"  • Improvement from auxiliary tasks: {(auxiliary_score - baseline_score):.4f}")

# Determine winner
viable_models = [
    ('Baseline', baseline_score),
    ('Student', student_score),
    ('Auxiliary', auxiliary_score)
]
winner_name, winner_score = max(viable_models, key=lambda x: x[1])
print(f"\n🏆 Winner (viable at test time): {winner_name} with score {winner_score:.4f}")

---
# Part 6: Final Comparison

Let's compare all four models and determine which approach works best!

In [None]:
# Train Auxiliary Model

print("Training Auxiliary Multi-Task Model...")
print("="*60)

# Create dataloaders
train_loader_auxiliary = DataLoader(train_dataset_auxiliary, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader_auxiliary = DataLoader(val_dataset_auxiliary, batch_size=batch_size, shuffle=False, num_workers=0)

# Create and train auxiliary model
auxiliary_model = AuxiliaryModel(num_outputs=5).to(device)
auxiliary_history = train_auxiliary(
    auxiliary_model,
    train_loader_auxiliary,
    val_loader_simple,
    auxiliary_loss,
    num_epochs=AUXILIARY_EPOCHS
)

print("\n✓ Auxiliary training complete!")

In [None]:
# Training function for auxiliary model

def train_auxiliary(model, train_loader, val_loader_simple, criterion, num_epochs=15, lr=3e-4):
    """Train model with auxiliary tasks."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=3)
    
    best_val_loss = float('inf')
    history = {
        'train_loss': [], 'train_biomass': [], 'train_ndvi': [], 
        'train_height': [], 'train_temp': [], 'train_rainfall': [],
        'val_loss': []
    }
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0
        train_biomass = 0
        train_ndvi = 0
        train_height = 0
        train_temp = 0
        train_rainfall = 0
        
        for batch in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
            images = batch['image'].to(device)
            biomass_targets = batch['targets'].to(device)
            aux_targets = batch['auxiliary_targets'].to(device)
            
            # Split auxiliary targets
            ndvi_target = aux_targets[:, 0]
            height_target = aux_targets[:, 1]
            temp_target = aux_targets[:, 2]
            rainfall_target = aux_targets[:, 3]
            
            # Forward pass
            biomass_pred, ndvi_pred, height_pred, temp_pred, rainfall_pred = model(
                images, return_auxiliary=True
            )
            
            # Loss
            loss, bio_loss, ndvi_loss, height_loss, temp_loss, rain_loss = criterion(
                biomass_pred, ndvi_pred, height_pred, temp_pred, rainfall_pred,
                biomass_targets, ndvi_target, height_target, temp_target, rainfall_target
            )
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
            train_biomass += bio_loss.item() * images.size(0)
            train_ndvi += ndvi_loss.item() * images.size(0)
            train_height += height_loss.item() * images.size(0)
            train_temp += temp_loss.item() * images.size(0)
            train_rainfall += rain_loss.item() * images.size(0)
        
        train_loss /= len(train_loader.dataset)
        train_biomass /= len(train_loader.dataset)
        train_ndvi /= len(train_loader.dataset)
        train_height /= len(train_loader.dataset)
        train_temp /= len(train_loader.dataset)
        train_rainfall /= len(train_loader.dataset)
        
        history['train_loss'].append(train_loss)
        history['train_biomass'].append(train_biomass)
        history['train_ndvi'].append(train_ndvi)
        history['train_height'].append(train_height)
        history['train_temp'].append(train_temp)
        history['train_rainfall'].append(train_rainfall)
        
        # Validation (main task only, image-only)
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch in val_loader_simple:
                images = batch['image'].to(device)
                targets = batch['targets'].to(device)
                
                outputs = model(images, return_auxiliary=False)
                loss = F.mse_loss(outputs, targets)
                val_loss += loss.item() * images.size(0)
        
        val_loss /= len(val_loader_simple.dataset)
        history['val_loss'].append(val_loss)
        scheduler.step(val_loss)
        
        print(f"  Train Loss: {train_loss:.4f} (biomass: {train_biomass:.4f}, ndvi: {train_ndvi:.4f}, "
              f"height: {train_height:.4f}, temp: {train_temp:.4f}, rainfall: {train_rainfall:.4f})")
        print(f"  Val Loss: {val_loss:.4f}")
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), 'auxiliary_best.pth')
            print("  ✓ Saved best model")
    
    return history

print("✓ Auxiliary training function defined")

In [None]:
# Auxiliary Multi-Task Loss

class AuxiliaryLoss(nn.Module):
    """Combined loss for multi-task learning."""
    def __init__(self, main_weight=0.7, ndvi_weight=0.1, height_weight=0.1, 
                 temp_weight=0.05, rainfall_weight=0.05):
        super().__init__()
        self.main_weight = main_weight
        self.ndvi_weight = ndvi_weight
        self.height_weight = height_weight
        self.temp_weight = temp_weight
        self.rainfall_weight = rainfall_weight
        
        # Competition weights for main biomass loss
        self.comp_weights = torch.tensor([0.1, 0.1, 0.1, 0.2, 0.5]).to(device)
    
    def forward(self, biomass_pred, ndvi_pred, height_pred, temp_pred, rainfall_pred,
                biomass_target, ndvi_target, height_target, temp_target, rainfall_target):
        """
        Args:
            biomass_pred: Main predictions (batch, 5)
            ndvi_pred, height_pred, etc: Auxiliary predictions (batch, 1)
            biomass_target: Main targets (batch, 5)
            ndvi_target, height_target, etc: Auxiliary targets (batch,) or (batch, 1)
        """
        # Main biomass loss (weighted MSE)
        biomass_loss = F.mse_loss(biomass_pred, biomass_target, reduction='none')
        biomass_loss = (biomass_loss * self.comp_weights).mean()
        
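        # Auxiliary losses (MSE). Flatten both sides so that (batch, 1) predictions
        # and (batch,) or (batch, 1) targets never broadcast against each other,
        # and so that a batch of size 1 keeps its batch dimension.
        ndvi_loss = F.mse_loss(ndvi_pred.reshape(-1), ndvi_target.reshape(-1))
        height_loss = F.mse_loss(height_pred.reshape(-1), height_target.reshape(-1))
        temp_loss = F.mse_loss(temp_pred.reshape(-1), temp_target.reshape(-1))
        rainfall_loss = F.mse_loss(rainfall_pred.reshape(-1), rainfall_target.reshape(-1))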
        
        # Combined loss
        total_loss = (self.main_weight * biomass_loss +
                     self.ndvi_weight * ndvi_loss +
                     self.height_weight * height_loss +
                     self.temp_weight * temp_loss +
                     self.rainfall_weight * rainfall_loss)
        
        return total_loss, biomass_loss, ndvi_loss, height_loss, temp_loss, rainfall_loss

auxiliary_loss = AuxiliaryLoss(main_weight=0.7, ndvi_weight=0.1, height_weight=0.1, 
                               temp_weight=0.05, rainfall_weight=0.05)
print("✓ AuxiliaryLoss defined")
print(f"  Main (biomass): {auxiliary_loss.main_weight}")
print(f"  NDVI: {auxiliary_loss.ndvi_weight}")
print(f"  Height: {auxiliary_loss.height_weight}")
print(f"  Temperature: {auxiliary_loss.temp_weight}")
print(f"  Rainfall: {auxiliary_loss.rainfall_weight}")

In [None]:
# Auxiliary Multi-Task Model

class AuxiliaryModel(nn.Module):
    """Image-only model with auxiliary task heads."""
    def __init__(self, num_outputs=5):
        super().__init__()
        # Shared ResNet50 backbone
        self.resnet = models.resnet50(pretrained=True)
        num_features = self.resnet.fc.in_features
        
        # Replace final layer with identity to get features
        self.resnet.fc = nn.Identity()
        
        # Shared feature extraction
        self.shared_features = nn.Sequential(
            nn.Linear(num_features, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.4)
        )
        
        # Main head: Biomass prediction
        self.biomass_head = nn.Sequential(
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.3),
            nn.Linear(256, num_outputs)
        )
        
        # Auxiliary head 1: NDVI prediction
        self.ndvi_head = nn.Sequential(
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
        
        # Auxiliary head 2: Height prediction
        self.height_head = nn.Sequential(
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
        
        # Auxiliary head 3: Temperature prediction
        self.temp_head = nn.Sequential(
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
        
        # Auxiliary head 4: Rainfall prediction
        self.rainfall_head = nn.Sequential(
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
    
    def forward(self, x, return_auxiliary=False):
        # Extract CNN features
        features = self.resnet(x)
        shared = self.shared_features(features)
        
        # Main prediction
        biomass = self.biomass_head(shared)
        
        if return_auxiliary:
            # Auxiliary predictions
            ndvi = self.ndvi_head(shared)
            height = self.height_head(shared)
            temp = self.temp_head(shared)
            rainfall = self.rainfall_head(shared)
            return biomass, ndvi, height, temp, rainfall
        
        return biomass

print("✓ AuxiliaryModel defined")

---
# Part 5: Auxiliary Multi-Task Model

**Approach 2**: Train one model with multiple prediction heads.

## Multi-Task Strategy
- **Main task**: Predict biomass from images (what we care about)
- **Auxiliary tasks**: Predict NDVI, height, temperature, rainfall from images
  - Forces the model to learn features relevant to environmental conditions
  - At test time, ignore auxiliary heads and use main head only

## Loss Weighting
- Main biomass loss: 0.7
- Auxiliary NDVI: 0.1
- Auxiliary height: 0.1  
- Auxiliary temp: 0.05
- Auxiliary rainfall: 0.05
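
As a quick sanity check on these weights, the sketch below combines per-task losses with the same weighted sum and reports each task's share of the total. It is plain Python with made-up loss values; `LOSS_WEIGHTS` and `combine_losses` are illustrative names that do not appear elsewhere in the notebook.

```python
# Illustrative only: plain-Python version of the weighted sum of task losses.
LOSS_WEIGHTS = {
    "biomass": 0.7,
    "ndvi": 0.1,
    "height": 0.1,
    "temp": 0.05,
    "rainfall": 0.05,
}


def combine_losses(task_losses, weights=LOSS_WEIGHTS):
    """Return the weighted total and each task's fraction of that total."""
    contributions = {name: weights[name] * task_losses[name] for name in weights}
    total = sum(contributions.values())
    if total == 0:
        return 0.0, {name: 0.0 for name in weights}
    shares = {name: value / total for name, value in contributions.items()}
    return total, shares


# Hypothetical per-task MSE values: every task on a comparable scale...
balanced = {"biomass": 1.0, "ndvi": 1.0, "height": 1.0, "temp": 1.0, "rainfall": 1.0}
# ...versus a rainfall target left in raw units, so its MSE is much larger.
unscaled = dict(balanced, rainfall=100.0)

for label, losses in [("balanced", balanced), ("unscaled rainfall", unscaled)]:
    total, shares = combine_losses(losses)
    print(f"{label}: total={total:.3f}, "
          f"biomass share={shares['biomass']:.2f}, "
          f"rainfall share={shares['rainfall']:.2f}")
```

If the auxiliary targets are not on comparable scales, a task weighted at 0.05 can still dominate the combined loss, so standardizing the auxiliary targets before training is one option worth considering.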

In [None]:
# Train Student via Distillation

print("Training Student Model (Knowledge Distillation)...")
print("="*60)

# Create student model
student_model = StudentModel(num_outputs=5).to(device)

# Load best teacher
teacher_model.load_state_dict(torch.load('teacher_best.pth', map_location=device))
teacher_model.eval()

# Train student
student_history = train_student_distillation(
    student_model,
    teacher_model,
    train_loader_simple,  # Student uses simple image-only loader
    train_loader_teacher,  # Teacher needs multimodal loader for generating soft targets
    val_loader_simple,
    distillation_loss,
    num_epochs=STUDENT_EPOCHS
)

print("\n✓ Student training complete!")

In [None]:
# Training function for student distillation

def train_student_distillation(student, teacher, train_loader_student, train_loader_teacher, 
                                val_loader_simple, criterion, num_epochs=15, lr=3e-4):
    """Train student with knowledge distillation from teacher."""
    teacher.eval()  # Teacher is frozen
    student.train()
    
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=3)
    
    best_val_loss = float('inf')
    history = {
        'train_loss': [], 'train_hard': [], 'train_soft': [], 'train_feature': [],
        'val_loss': []
    }
    
    for epoch in range(num_epochs):
        # Training
        student.train()
        train_loss = 0
        train_hard = 0
        train_soft = 0
        train_feature = 0
        
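        # Iterate through both loaders together.
        # NOTE: zip pairs batch i of one loader with batch i of the other. The teacher's
        # soft targets only correspond to the student's images if both loaders yield the
        # same samples in the same order; two independently shuffled loaders would pair
        # unrelated samples.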
        for batch_student, batch_teacher in tqdm(
            zip(train_loader_student, train_loader_teacher), 
            desc=f'Epoch {epoch+1}/{num_epochs}',
            total=min(len(train_loader_student), len(train_loader_teacher))
        ):
            # Student forward pass (image only)
            images = batch_student['image'].to(device)
            targets = batch_student['targets'].to(device)
            
            student_outputs = student(images)
            student_features = student.get_features(images)
            
            # Teacher forward pass (multimodal) - no gradients
            with torch.no_grad():
                teacher_images = batch_teacher['image'].to(device)
                teacher_weather = batch_teacher['weather'].to(device)
                teacher_ndvi_height = batch_teacher['ndvi_height'].to(device)
                teacher_state = batch_teacher['state'].to(device)
                teacher_species = batch_teacher['species'].to(device)
                
                teacher_outputs, teacher_features = teacher(
                    teacher_images, teacher_weather, teacher_ndvi_height, 
                    teacher_state, teacher_species, return_features=True
                )
            
            # Distillation loss
            loss, hard, soft, feat = criterion(
                student_outputs, student_features,
                teacher_outputs, teacher_features,
                targets
            )
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item() * images.size(0)
            train_hard += hard.item() * images.size(0)
            train_soft += soft.item() * images.size(0)
            train_feature += feat.item() * images.size(0)
        
        train_loss /= len(train_loader_student.dataset)
        train_hard /= len(train_loader_student.dataset)
        train_soft /= len(train_loader_student.dataset)
        train_feature /= len(train_loader_student.dataset)
        
        history['train_loss'].append(train_loss)
        history['train_hard'].append(train_hard)
        history['train_soft'].append(train_soft)
        history['train_feature'].append(train_feature)
        
        # Validation (simple image-only)
        student.eval()
        val_loss = 0
        with torch.no_grad():
            for batch in val_loader_simple:
                images = batch['image'].to(device)
                targets = batch['targets'].to(device)
                
                outputs = student(images)
                loss = F.mse_loss(outputs, targets)
                val_loss += loss.item() * images.size(0)
        
        val_loss /= len(val_loader_simple.dataset)
        history['val_loss'].append(val_loss)
        scheduler.step(val_loss)
        
        print(f"  Train Loss: {train_loss:.4f} (hard: {train_hard:.4f}, soft: {train_soft:.4f}, feat: {train_feature:.4f})")
        print(f"  Val Loss: {val_loss:.4f}")
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(student.state_dict(), 'student_best.pth')
            print("  ✓ Saved best model")
    
    return history

print("✓ Student distillation training function defined")

In [None]:
# Distillation Loss

class DistillationLoss(nn.Module):
    """Combined loss for knowledge distillation."""
    def __init__(self, temperature=4.0, alpha=0.3, beta=0.5, gamma=0.2):
        super().__init__()
        self.temperature = temperature
        self.alpha = alpha  # Hard loss weight (ground truth)
        self.beta = beta    # Soft loss weight (teacher predictions)
        self.gamma = gamma  # Feature loss weight (CNN features)
        
        # Competition weights for hard loss
        self.comp_weights = torch.tensor([0.1, 0.1, 0.1, 0.2, 0.5]).to(device)
    
    def forward(self, student_outputs, student_features, teacher_outputs, teacher_features, targets):
        """
        Args:
            student_outputs: Student predictions (batch, 5)
            student_features: Student CNN features (batch, 2048)
            teacher_outputs: Teacher predictions (batch, 5)
            teacher_features: Teacher CNN features (batch, 2048)
            targets: Ground truth (batch, 5)
        """
        # 1. Hard loss: Student vs ground truth (weighted MSE)
        hard_loss = F.mse_loss(student_outputs, targets, reduction='none')
        hard_loss = (hard_loss * self.comp_weights).mean()
        
        # 2. Soft loss: Student vs teacher predictions (MSE on temperature-scaled outputs)
        # For regression, we use MSE instead of KL divergence. Dividing both sides by
        # the temperature simply scales this term by 1 / temperature**2.
        soft_loss = F.mse_loss(student_outputs / self.temperature, 
                               teacher_outputs / self.temperature)
        
        # 3. Feature loss: Match CNN features (cosine similarity)
        # Normalize features
        student_feat_norm = F.normalize(student_features, p=2, dim=1)
        teacher_feat_norm = F.normalize(teacher_features, p=2, dim=1)
        # Maximizing cosine similarity is the same as minimizing (1 - cosine similarity)
        feature_loss = 1 - (student_feat_norm * teacher_feat_norm).sum(dim=1).mean()
        
        # Combined loss
        total_loss = (self.alpha * hard_loss + 
                     self.beta * soft_loss + 
                     self.gamma * feature_loss)
        
        return total_loss, hard_loss, soft_loss, feature_loss

distillation_loss = DistillationLoss(temperature=4.0, alpha=0.3, beta=0.5, gamma=0.2)
print("✓ DistillationLoss defined")
print(f"  Temperature: {distillation_loss.temperature}")
print(f"  Alpha (hard): {distillation_loss.alpha}")
print(f"  Beta (soft): {distillation_loss.beta}")
print(f"  Gamma (feature): {distillation_loss.gamma}")

In [None]:
# Student Model (same architecture as Baseline)
StudentModel = BaselineModel  # Reuse the same architecture

print("✓ StudentModel = BaselineModel (image-only CNN)")

---
# Part 4: Student Model (Knowledge Distillation)

**Approach 1**: Train an image-only student to mimic the multimodal teacher.

## Distillation Strategy
1. **Hard Loss**: Student predictions vs ground truth (α=0.3)
2. **Soft Loss**: Student predictions vs teacher predictions with temperature τ=4 (β=0.5)
3. **Feature Loss**: Match CNN features between student and teacher (γ=0.2)

The student learns to predict biomass from images alone, but guided by the teacher's knowledge of weather/environmental patterns.
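
To make the three terms concrete, here is a framework-free sketch of how they could be combined for a single sample. It uses plain Python lists rather than tensors, and all input values are made up; the helper names (`mse`, `weighted_mse`, `cosine_distance`, `distillation_loss`) are illustrative and are not the notebook's `DistillationLoss` class.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_mse(pred, target, weights):
    """Per-output squared error scaled by a weight, then averaged."""
    return sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, weights)) / len(pred)


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def distillation_loss(student_out, teacher_out, target, student_feat, teacher_feat,
                      comp_weights, temperature=4.0, alpha=0.3, beta=0.5, gamma=0.2):
    hard = weighted_mse(student_out, target, comp_weights)         # vs ground truth
    soft = mse([s / temperature for s in student_out],
               [t / temperature for t in teacher_out])             # vs teacher
    feat = cosine_distance(student_feat, teacher_feat)             # feature matching
    total = alpha * hard + beta * soft + gamma * feat
    return total, hard, soft, feat


# Toy inputs (assumed values, for illustration only)
comp_weights = [0.1, 0.1, 0.1, 0.2, 0.5]
student_out = [1.0, 2.0, 3.0, 4.0, 5.0]
teacher_out = [1.0, 2.0, 3.0, 4.0, 9.0]
target = [1.0, 2.0, 3.0, 4.0, 7.0]
student_feat = [1.0, 0.0]
teacher_feat = [0.0, 1.0]

total, hard, soft, feat = distillation_loss(
    student_out, teacher_out, target, student_feat, teacher_feat, comp_weights
)
print(f"total={total:.4f} hard={hard:.4f} soft={soft:.4f} feat={feat:.4f}")
```

One thing this makes visible: with MSE, dividing both predictions by τ only rescales the soft term by 1/τ², so in a regression setting τ and β together act as a single effective weight on the teacher-matching term.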

In [None]:
# Train Teacher Model

print("Training Teacher Model (Multimodal)...")
print("="*60)

# Create dataloaders
train_loader_teacher = DataLoader(train_dataset_teacher, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader_teacher = DataLoader(val_dataset_teacher, batch_size=batch_size, shuffle=False, num_workers=0)

# Create and train teacher
teacher_model = TeacherModel(
    num_outputs=5, 
    num_states=len(le_state.classes_), 
    num_species=len(le_species.classes_)
).to(device)

teacher_history = train_teacher(
    teacher_model, 
    train_loader_teacher, 
    val_loader_teacher,
    competition_loss,
    num_epochs=TEACHER_EPOCHS
)

print("\n✓ Teacher training complete!")

---
# Part 2: Baseline Model (Image-Only)

Simple CNN trained directly on images → biomass, with no multimodal data or distillation.