# CSIRO Image2Biomass — Project Overview (Brief)

## 1. Objective
Predict pasture biomass components from top-view images and associated metadata such as NDVI, height, and species information.  
The model estimates five targets — `Dry_Clover_g`, `Dry_Dead_g`, `Dry_Green_g`, `Dry_Total_g`, and `GDM_g` — using a **single multi-task deep learning model**.

---

## 2. Dataset
The dataset contains:
- RGB pasture images (`train/` and `test/` folders)
- Metadata and biomass targets in `train.csv`
- `test.csv` for inference and `sample_submission.csv` for submission format

Each image has multiple biomass component measurements, resulting in a **long-format dataset**.

---

## 3. Preprocessing
- **Handling missing values:** All missing `target` values are filled with the mean of their respective target columns to prevent training instability.  
- **Pivoting:** The dataset is transformed from long to wide format so each image has one row with five target columns.  
- **Encoding:** Categorical columns like `State` and `Species` are label-encoded.  
- **Scaling:** Numerical features such as NDVI and height are standardized.  
- **Augmentation:** Training images are augmented using random crops, flips, brightness, and rotation to improve generalization.

---

## 4. Model Architecture
- A **single multi-task model** built using a `timm` CNN backbone (e.g., EfficientNet).
- Image features and tabular features (NDVI, height, state, species) are extracted separately and then fused.
- The final layer outputs five regression values corresponding to the biomass components.
- **Loss:** Mean Squared Error (MSE)  
- **Optimizer:** AdamW  
- **Metric:** Root Mean Squared Error (RMSE)

---

## 5. Training Strategy
- **Cross-validation:** 5-Fold GroupKFold is used to ensure all data from the same image stays in one fold.
- Each fold trains independently, and the best-performing checkpoint (lowest validation RMSE) is saved.
- Results are averaged across folds for robustness.

---

## 6. Inference & Submission
- Trained fold models are loaded for inference.
- Predictions from all folds are averaged to create final results.
- Negative predictions are clipped to zero.
- The final output follows the Kaggle submission format (`sample_id`, `target`) and is saved as `submission.csv`.

---

## 7. Key Takeaways
- Filling NaNs early is essential for stable training.
- GroupKFold prevents data leakage across the same image.
- Multi-task learning improves efficiency and correlation between targets.
- Ensemble averaging across folds yields smoother, more reliable predictions.
- Model checkpoints and preprocessing artifacts (encoders/scaler) are stored for reproducibility.

---

## 8. Limitations

- **Limited dataset size:** The number of unique images is relatively small, which may restrict the model’s generalization on unseen vegetation patterns or lighting conditions.  
- **Imbalanced targets:** Certain biomass components (e.g., Dry Clover vs. Dry Dead) may have uneven distributions, potentially biasing the model toward more common classes.  
- **Simplified feature fusion:** The model combines image and tabular features through simple concatenation, which might not fully capture complex interactions between environmental and visual cues.  
- **Environmental variability:** Differences in lighting, soil color, and camera orientation can introduce noise that basic augmentations may not fully mitigate.  
- **2D representation only:** The approach relies solely on RGB imagery — lacking depth or spectral information (e.g., multispectral or hyperspectral data) that could improve biomass estimation accuracy.  
- **Limited explainability:** While Grad-CAM or other XAI techniques can be applied, this implementation does not deeply interpret model decisions.  






In [1]:
#libraries
import os
import math
import random
from pathlib import Path
import joblib
import pandas as pd
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import mean_squared_error
from tqdm import tqdm

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import timm
import albumentations as A
from albumentations.pytorch import ToTensorV2
import cv2



In [2]:
#configs
class CFG:
    seed = 42
    data_dir = Path('/kaggle/input/csiro-biomass')   # <--- your path
    train_csv = data_dir / 'train.csv'
    test_csv = data_dir / 'test.csv'
    train_image_dir = data_dir / 'train'
    test_image_dir = data_dir / 'test'
    img_size = 384
    backbone = 'tf_efficientnet_b3.ns_jft_in1k'
    pretrained = True
    batch_size = 16
    n_epochs = 5
    lr = 1e-4
    weight_decay = 1e-6
    n_splits = 5
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    num_workers = 4
    targets = ['Dry_Clover_g','Dry_Dead_g','Dry_Green_g','Dry_Total_g','GDM_g']
    model_dir = Path('./models')
CFG.model_dir.mkdir(parents=True, exist_ok=True)

In [3]:
#helpers
def seed_everything(seed=CFG.seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
seed_everything()

def rmse(y_true, y_pred):
    return math.sqrt(mean_squared_error(y_true, y_pred))

In [12]:
#read CSVs
train_long = pd.read_csv(CFG.train_csv) 
test_long = pd.read_csv(CFG.test_csv)    


In [5]:
train_long.head()

Unnamed: 0,sample_id,image_path,Sampling_Date,State,Species,Pre_GSHH_NDVI,Height_Ave_cm,target_name,target
0,ID1011485656__Dry_Clover_g,train/ID1011485656.jpg,2015/9/4,Tas,Ryegrass_Clover,0.62,4.6667,Dry_Clover_g,0.0
1,ID1011485656__Dry_Dead_g,train/ID1011485656.jpg,2015/9/4,Tas,Ryegrass_Clover,0.62,4.6667,Dry_Dead_g,31.9984
2,ID1011485656__Dry_Green_g,train/ID1011485656.jpg,2015/9/4,Tas,Ryegrass_Clover,0.62,4.6667,Dry_Green_g,16.2751
3,ID1011485656__Dry_Total_g,train/ID1011485656.jpg,2015/9/4,Tas,Ryegrass_Clover,0.62,4.6667,Dry_Total_g,48.2735
4,ID1011485656__GDM_g,train/ID1011485656.jpg,2015/9/4,Tas,Ryegrass_Clover,0.62,4.6667,GDM_g,16.275


In [21]:
#read train.csv first
train_long = pd.read_csv(CFG.train_csv)

#fill NaN targets with mean of their target_name column
train_long['target'] = train_long.groupby('target_name')['target'].transform(
    lambda x: x.fillna(x.mean())
)

#if there are still NaNs (all values missing for a target_name), fill with 0
train_long['target'] = train_long['target'].fillna(0.0)

In [22]:
#pivot train to image-level (wide) so each row = image with 5 target columns
def pivot_train_long_to_wide(df_long):
    meta_cols = ['sample_id','image_path','Sampling_Date','State','Species','Pre_GSHH_NDVI','Height_Ave_cm']
    pivot = df_long.pivot(index='sample_id', columns='target_name', values='target').reset_index()
    meta = df_long[meta_cols].drop_duplicates('sample_id').set_index('sample_id')
    pivot = pivot.set_index('sample_id').join(meta).reset_index()
    for t in CFG.targets:
        if t not in pivot.columns:
            pivot[t] = 0.0
    return pivot

train_wide = pivot_train_long_to_wide(train_long)

In [23]:
#fill NaNs in the long-format train table
train_long = pd.read_csv(CFG.train_csv)   # if not already loaded
train_long['target'] = train_long.groupby('target_name')['target'].transform(lambda x: x.fillna(x.mean()))
#final fallback if a target_name had all NaNs (mean would be NaN)
train_long['target'] = train_long['target'].fillna(0.0)

#optional quick check in long df
print("Per-target NaNs in train_long (should be 0):")
print(train_long.groupby('target_name')['target'].apply(lambda x: x.isna().sum()))

train_wide = pivot_train_long_to_wide(train_long)

#if any NaNs persist in the wide target columns, fill per-column mean
for t in CFG.targets:
    if train_wide[t].isna().any():
        mean_val = train_wide[t].mean()
        if np.isnan(mean_val):
            mean_val = 0.0
        train_wide[t] = train_wide[t].fillna(mean_val)

# no NaNs confirmation
print("Per-target NaNs in train_wide after filling:")
print(train_wide[CFG.targets].isna().sum())


Per-target NaNs in train_long (should be 0):
target_name
Dry_Clover_g    0
Dry_Dead_g      0
Dry_Green_g     0
Dry_Total_g     0
GDM_g           0
Name: target, dtype: int64
Per-target NaNs in train_wide after filling:
Dry_Clover_g    0
Dry_Dead_g      0
Dry_Green_g     0
Dry_Total_g     0
GDM_g           0
dtype: int64


In [24]:
#resolve local image paths (ensure these paths exist on your machine / notebook server)
def resolve_image_local_path(img_path, images_root):
    p = Path(img_path)
    if p.exists():
        return str(p)
    candidate = images_root / p.name
    if candidate.exists():
        return str(candidate)
    candidate2 = images_root / img_path
    if candidate2.exists():
        return str(candidate2)
    #fallback: return the full joined path attempt
    return str(images_root / p.name)

train_wide['image_local_path'] = train_wide['image_path'].apply(lambda p: resolve_image_local_path(p, CFG.train_image_dir))
test_long['image_local_path'] = test_long['image_path'].apply(lambda p: resolve_image_local_path(p, CFG.test_image_dir))



In [26]:
#Encoders & Scaler
#fit LabelEncoders on combined values

def safe_col(df, col):
    return df[col] if col in df.columns else pd.Series(['NA'] * len(df))

le_state = LabelEncoder()
le_species = LabelEncoder()
state_combined = pd.concat([train_wide['State'].fillna('NA'), safe_col(test_long,'State').fillna('NA')])
species_combined = pd.concat([train_wide['Species'].fillna('NA'), safe_col(test_long,'Species').fillna('NA')])
le_state.fit(state_combined)
le_species.fit(species_combined)
train_wide['State_enc'] = le_state.transform(train_wide['State'].fillna('NA'))
train_wide['Species_enc'] = le_species.transform(train_wide['Species'].fillna('NA'))

#save encoders
joblib.dump(le_state, CFG.model_dir / 'le_state.joblib')
joblib.dump(le_species, CFG.model_dir / 'le_species.joblib')

#fit scaler for NDVI & Height
num_cols = ['Pre_GSHH_NDVI','Height_Ave_cm']
scaler = StandardScaler()
scaler.fit(train_wide[num_cols].fillna(0.0))
train_wide[['ndvi_scaled','height_scaled']] = scaler.transform(train_wide[num_cols].fillna(0.0))
joblib.dump(scaler, CFG.model_dir / 'scaler.joblib')


['models/scaler.joblib']

In [27]:
#load dataset
class BiomassDataset(Dataset):
    def __init__(self, df_wide_or_long, mode='train', transforms=None):
        self.mode = mode
        self.transforms = transforms
        rows = []
        if mode == 'train':
            #df_wide: expand each image into multiple rows (one per target)
            for _, r in df_wide_or_long.iterrows():
                for t_idx, tname in enumerate(CFG.targets):
                    rows.append({
                        'sample_id': f"{r['sample_id']}__{tname}",
                        'image_local_path': r['image_local_path'],
                        'state_enc': int(r['State_enc']) if 'State_enc' in r else 0,
                        'species_enc': int(r['Species_enc']) if 'Species_enc' in r else 0,
                        'ndvi': float(r['ndvi_scaled']) if 'ndvi_scaled' in r else 0.0,
                        'height': float(r['height_scaled']) if 'height_scaled' in r else 0.0,
                        'target_name': tname,
                        'target_idx': t_idx,
                        'target': float(r[tname])
                    })
        else:
            #inference: df_long rows (test_long), map enc & scaled numeric (if present)
            le_st = joblib.load(CFG.model_dir / 'le_state.joblib')
            le_sp = joblib.load(CFG.model_dir / 'le_species.joblib')
            scal = joblib.load(CFG.model_dir / 'scaler.joblib')
            for _, r in df_wide_or_long.iterrows():
                state_val = r.get('State', 'NA')
                species_val = r.get('Species', 'NA')
                #encode if seen else assign 0
                try:
                    state_enc = int(le_st.transform([state_val.fillna('NA')])[0]) if hasattr(state_val, 'fillna') else int(le_st.transform([state_val])[0])
                except Exception:
                    state_enc = 0
                try:
                    species_enc = int(le_sp.transform([species_val.fillna('NA')])[0]) if hasattr(species_val, 'fillna') else int(le_sp.transform([species_val])[0])
                except Exception:
                    species_enc = 0
                # scale numeric if present
                ndvi_val = r.get('Pre_GSHH_NDVI', 0.0)
                height_val = r.get('Height_Ave_cm', 0.0)
                try:
                    ndvi_scaled, height_scaled = scal.transform([[ndvi_val, height_val]])[0]
                except Exception:
                    ndvi_scaled, height_scaled = 0.0, 0.0
                rows.append({
                    'sample_id': r['sample_id'],
                    'image_local_path': r['image_local_path'],
                    'state_enc': state_enc,
                    'species_enc': species_enc,
                    'ndvi': float(ndvi_scaled),
                    'height': float(height_scaled),
                    'target_name': r['target_name'],
                    'target_idx': CFG.targets.index(r['target_name']),
                    'target': 0.0
                })
        self.df = pd.DataFrame(rows).reset_index(drop=True)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        img_path = r['image_local_path']
        if Path(img_path).exists():
            img = cv2.imread(img_path)[:,:,::-1]
        else:
            img = np.zeros((CFG.img_size, CFG.img_size, 3), dtype=np.uint8)
        if self.transforms:
            img = self.transforms(image=img)['image']
        tab = np.array([r['ndvi'], r['height'], r['state_enc'], r['species_enc']], dtype=np.float32)
        return {
            'image': img,
            'tab': torch.tensor(tab, dtype=torch.float32),
            'target': torch.tensor(r['target'], dtype=torch.float32),
            'target_idx': torch.tensor(r['target_idx'], dtype=torch.long),
            'sample_id': r['sample_id'],
            'target_name': r['target_name']
        }


In [31]:
train_transforms = A.Compose([
    A.SmallestMaxSize(max_size=CFG.img_size),
    A.RandomCrop(height=CFG.img_size, width=CFG.img_size),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.4),
    A.ShiftScaleRotate(p=0.5),
    A.Normalize(),
    ToTensorV2()
])

valid_transforms = A.Compose([
    A.Resize(height=CFG.img_size, width=CFG.img_size),
    A.Normalize(),
    ToTensorV2()
])

In [32]:
# ---------------- Model ----------------
class MultiTaskModel(nn.Module):
    def __init__(self, backbone_name=CFG.backbone, pretrained=CFG.pretrained, n_tab=4, n_out=5, dropout=0.3):
        super().__init__()
        try:
            self.backbone = timm.create_model(backbone_name, pretrained=pretrained, num_classes=0, global_pool='avg')
        except Exception:
            self.backbone = timm.create_model('resnet50', pretrained=pretrained, num_classes=0, global_pool='avg')
        #try to get channel dimension (num_features). If not available, we'll infer in forward (but timm usually provides num_features).
        feat_dim = getattr(self.backbone, 'num_features', None)
        if feat_dim is None:
            #fallback: assume a safe default (will be corrected dynamically in forward if necessary)
            feat_dim = 1536

        self.img_head = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Dropout(dropout))
        self.tab_head = nn.Sequential(nn.Linear(n_tab, 128), nn.ReLU(), nn.Dropout(dropout))
        self.fusion = nn.Sequential(nn.Linear(512 + 128, 256), nn.ReLU(), nn.Dropout(dropout), nn.Linear(256, n_out))

    def forward(self, x_img, x_tab):
        feat = self.backbone.forward_features(x_img)
        #some backbones return (B, C, H, W), some return (B, C) or tuple/list
        if isinstance(feat, (tuple, list)):
            feat = feat[0]
        if feat.dim() == 4:
            #global pool spatial dims to get (B, C, 1, 1) -> (B, C)
            feat = nn.functional.adaptive_avg_pool2d(feat, 1).squeeze(-1).squeeze(-1)
        #now feat is (B, C)
        #if img_head input size mismatch (rare), resize first linear accordingly:
        if feat.size(1) != self.img_head[0].in_features:
            #recreate img_head with correct in_features (keeps dropout and activation)
            in_feat = feat.size(1)
            self.img_head = nn.Sequential(nn.Linear(in_feat, 512), nn.ReLU(), nn.Dropout(self.img_head[2].p if isinstance(self.img_head[2], nn.Dropout) else 0.3))
        img_v = self.img_head(feat)
        tab_v = self.tab_head(x_tab)
        x = torch.cat([img_v, tab_v], dim=1)
        out = self.fusion(x)
        return out

In [33]:
#loss selection helper
def compute_batch_loss(preds, target_vals, target_idxs, criterion):
    batch_idx = torch.arange(preds.size(0), device=preds.device)
    pred_selected = preds[batch_idx, target_idxs]   # shape (B,)
    loss = criterion(pred_selected, target_vals)
    return loss, pred_selected.detach().cpu().numpy()


In [35]:
#GroupKFold CV and training
gkf = GroupKFold(n_splits=CFG.n_splits)
groups = train_wide['sample_id'].values

fold_models = []
oof_predictions = []

for fold, (tr_idx, val_idx) in enumerate(gkf.split(train_wide, groups=groups)):
    print(f"\n=== Fold {fold} ===")
    train_rows = train_wide.iloc[tr_idx].reset_index(drop=True)
    val_rows = train_wide.iloc[val_idx].reset_index(drop=True)

    train_ds = BiomassDataset(train_rows, mode='train', transforms=train_transforms)
    val_ds = BiomassDataset(val_rows, mode='train', transforms=valid_transforms)

    train_loader = DataLoader(train_ds, batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers, pin_memory=True)
    val_loader = DataLoader(val_ds, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers, pin_memory=True)

    model = MultiTaskModel().to(CFG.device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
    criterion = nn.MSELoss()

    best_rmse = float('inf')
    best_state = None

    for epoch in range(CFG.n_epochs):
        #train
        model.train()
        train_losses = []
        for batch in tqdm(train_loader, desc=f"Fold{fold}-Train", leave=False):
            imgs = batch['image'].to(CFG.device)
            tabs = batch['tab'].to(CFG.device)
            targets = batch['target'].to(CFG.device)
            tidx = batch['target_idx'].to(CFG.device)
            preds = model(imgs, tabs)
            loss, _ = compute_batch_loss(preds, targets, tidx, criterion)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())
        train_loss = np.mean(train_losses)

        #valid
        model.eval()
        val_preds = []
        val_targs = []
        val_losses = []
        with torch.no_grad():
            for batch in tqdm(val_loader, desc=f"Fold{fold}-Valid", leave=False):
                imgs = batch['image'].to(CFG.device)
                tabs = batch['tab'].to(CFG.device)
                targets = batch['target'].to(CFG.device)
                tidx = batch['target_idx'].to(CFG.device)
                preds = model(imgs, tabs)
                loss, pred_sel = compute_batch_loss(preds, targets, tidx, criterion)
                val_losses.append(loss.item())
                val_preds.append(pred_sel)
                val_targs.append(targets.detach().cpu().numpy())
        if len(val_preds) == 0:
            val_rmse = float('inf')
        else:
            #val_preds = np.concatenate(val_preds)
            #val_targs = np.concatenate(val_targs)
            if len(val_preds) == 0 or len(val_targs) == 0:
                    # no predictions collected at all
                val_rmse = float('nan')
                print("Warning: no validation predictions/targets were collected this epoch/fold.")
            else:
                val_preds = np.concatenate(val_preds)
                val_targs = np.concatenate(val_targs)

    #filter finite rows
                mask = np.isfinite(val_preds) & np.isfinite(val_targs)
                if mask.sum() == 0:
                   val_rmse = float('nan')
                   print("Warning: all validation predictions or targets are NaN/inf for this epoch/fold.")
                else:
                   if mask.sum() < len(val_preds):
                       print(f"Warning: dropping {len(val_preds) - mask.sum()} NaN/inf rows from validation before RMSE.")
                   val_rmse = rmse(val_targs[mask], val_preds[mask])
        print(f"Fold {fold} Epoch {epoch} - train_loss: {train_loss:.4f}  val_rmse: {val_rmse:.4f}")

        if val_rmse < best_rmse:
            best_rmse = val_rmse
            best_state = model.state_dict()

    #save best model for fold
    model_path = CFG.model_dir / f"model_fold{fold}.pth"
    torch.save(best_state, model_path)
    fold_models.append(model_path)
    print(f"Fold {fold} done - best val RMSE: {best_rmse:.4f} - saved {model_path}")



=== Fold 0 ===


model.safetensors:   0%|          | 0.00/49.3M [00:00<?, ?B/s]

                                                              

Fold 0 Epoch 0 - train_loss: 233.4422  val_rmse: 9.8698


                                                              

Fold 0 Epoch 1 - train_loss: 99.8467  val_rmse: 9.4217


                                                              

Fold 0 Epoch 2 - train_loss: 97.1740  val_rmse: 9.2706


                                                              

Fold 0 Epoch 3 - train_loss: 96.8171  val_rmse: 9.2437


                                                              

Fold 0 Epoch 4 - train_loss: 94.4893  val_rmse: 9.0588
Fold 0 done - best val RMSE: 9.0588 - saved models/model_fold0.pth

=== Fold 1 ===


                                                              

Fold 1 Epoch 0 - train_loss: 218.1188  val_rmse: 9.8670


                                                              

Fold 1 Epoch 1 - train_loss: 99.0569  val_rmse: 9.7137


                                                              

Fold 1 Epoch 2 - train_loss: 96.4264  val_rmse: 9.5892


                                                              

Fold 1 Epoch 3 - train_loss: 95.1854  val_rmse: 9.9165


                                                              

Fold 1 Epoch 4 - train_loss: 92.4092  val_rmse: 9.6908
Fold 1 done - best val RMSE: 9.5892 - saved models/model_fold1.pth

=== Fold 2 ===


                                                              

Fold 2 Epoch 0 - train_loss: 236.9987  val_rmse: 8.6577


                                                              

Fold 2 Epoch 1 - train_loss: 102.1559  val_rmse: 8.6350


                                                              

Fold 2 Epoch 2 - train_loss: 101.2696  val_rmse: 9.6582


                                                              

Fold 2 Epoch 3 - train_loss: 101.4635  val_rmse: 9.0793


                                                              

Fold 2 Epoch 4 - train_loss: 100.4191  val_rmse: 8.6196
Fold 2 done - best val RMSE: 8.6196 - saved models/model_fold2.pth

=== Fold 3 ===


                                                              

Fold 3 Epoch 0 - train_loss: 225.8767  val_rmse: 10.7296


                                                              

Fold 3 Epoch 1 - train_loss: 95.9180  val_rmse: 10.2803


                                                              

Fold 3 Epoch 2 - train_loss: 94.9097  val_rmse: 10.5610


                                                              

Fold 3 Epoch 3 - train_loss: 92.4140  val_rmse: 10.1019


                                                              

Fold 3 Epoch 4 - train_loss: 90.9564  val_rmse: 10.0626
Fold 3 done - best val RMSE: 10.0626 - saved models/model_fold3.pth

=== Fold 4 ===


                                                              

Fold 4 Epoch 0 - train_loss: 217.7361  val_rmse: 10.3441


                                                              

Fold 4 Epoch 1 - train_loss: 97.4368  val_rmse: 9.8352


                                                              

Fold 4 Epoch 2 - train_loss: 95.5750  val_rmse: 9.9244


                                                              

Fold 4 Epoch 3 - train_loss: 95.0858  val_rmse: 9.6352


                                                              

Fold 4 Epoch 4 - train_loss: 91.7431  val_rmse: 9.7351
Fold 4 done - best val RMSE: 9.6352 - saved models/model_fold4.pth


In [36]:
#base-level test frame (one row per base image)
def base_id_from_sample_id(sid):
    return sid.split('__')[0] if '__' in sid else sid

test_long['base_id'] = test_long['sample_id'].apply(base_id_from_sample_id)
test_base = test_long.drop_duplicates('base_id').reset_index(drop=True)

#add encodings and scaled numerics for test_base
le_st = joblib.load(CFG.model_dir / 'le_state.joblib')
le_sp = joblib.load(CFG.model_dir / 'le_species.joblib')
scal = joblib.load(CFG.model_dir / 'scaler.joblib')

def get_state_enc(val):
    try:
        return int(le_st.transform([val if pd.notna(val) else 'NA'])[0])
    except Exception:
        return 0
def get_species_enc(val):
    try:
        return int(le_sp.transform([val if pd.notna(val) else 'NA'])[0])
    except Exception:
        return 0
state_encs = [get_state_enc(v) for v in test_base.get('State', ['NA']*len(test_base))]
spec_encs = [get_species_enc(v) for v in test_base.get('Species', ['NA']*len(test_base))]

#scale numeric features (NDVI, Height) if present
ndvi_list = test_base.get('Pre_GSHH_NDVI', [0.0]*len(test_base)).fillna(0.0) if 'Pre_GSHH_NDVI' in test_base else [0.0]*len(test_base)
height_list = test_base.get('Height_Ave_cm', [0.0]*len(test_base)).fillna(0.0) if 'Height_Ave_cm' in test_base else [0.0]*len(test_base)
scaled = scal.transform(np.column_stack([ndvi_list, height_list]))
test_base['ndvi_scaled'] = scaled[:,0]
test_base['height_scaled'] = scaled[:,1]
test_base['State_enc'] = state_encs
test_base['Species_enc'] = spec_encs
test_base['image_local_path'] = test_base['image_path'].apply(lambda p: resolve_image_local_path(p, CFG.test_image_dir))

#for each fold model, predict per base image
fold_preds_list = []
for model_path in fold_models:
    model = MultiTaskModel().to(CFG.device)
    model.load_state_dict(torch.load(model_path, map_location=CFG.device))
    model.eval()
    preds_for_fold = {}
    with torch.no_grad():
        for _, r in tqdm(test_base.iterrows(), total=len(test_base), desc=f"Infer {model_path.name}", leave=False):
            img_path = r['image_local_path']
            if Path(img_path).exists():
                img = cv2.imread(img_path)[:,:,::-1]
            else:
                img = np.zeros((CFG.img_size, CFG.img_size, 3), dtype=np.uint8)
            img_t = valid_transforms(image=img)['image'].unsqueeze(0).to(CFG.device)
            tab = torch.tensor([[r['ndvi_scaled'], r['height_scaled'], int(r['State_enc']), int(r['Species_enc'])]], dtype=torch.float32).to(CFG.device)
            out = model(img_t, tab).detach().cpu().numpy()[0]   # shape (5,)
            out = np.clip(out, 0.0, None)
            preds_for_fold[r['base_id']] = out
    fold_preds_list.append(preds_for_fold)

#average predictions across folds
avg_preds = {}
for base_id in fold_preds_list[0].keys():
    stacked = np.stack([fold_preds[base_id] for fold_preds in fold_preds_list], axis=0)
    avg = np.mean(stacked, axis=0)
    avg_preds[base_id] = avg

#submission format
preds_df = pd.DataFrame([{'sample_id': bid, **{CFG.targets[i]: avg_preds[bid][i] for i in range(len(CFG.targets))}} for bid in avg_preds.keys()])
preds_long = preds_df.set_index('sample_id')[CFG.targets].stack().reset_index()
preds_long.columns = ['sample_id_base','target_name','target']
preds_long['sample_id'] = preds_long['sample_id_base'].astype(str) + '__' + preds_long['target_name'].astype(str)
submission = preds_long[['sample_id','target']].reset_index(drop=True)
submission.to_csv('submission.csv', index=False)
print("Saved submission.csv with shape:", submission.shape)
print(submission.head(8))

                                                                    

Saved submission.csv with shape: (5, 2)
                    sample_id     target
0  ID1001187975__Dry_Clover_g   6.173340
1    ID1001187975__Dry_Dead_g  11.612491
2   ID1001187975__Dry_Green_g  25.195471
3   ID1001187975__Dry_Total_g  43.003220
4         ID1001187975__GDM_g  31.838558


