# HMS - PyTorch Baseline Inference

**Comments welcome!**

One of my goals in this competition is to learn more PyTorch.

This is an **inference** notebook; the respective training notebook is [HMS - PyTorch Baseline Training](https://www.kaggle.com/code/morodertobias/hms-pytorch-baseline-training/notebook), and its trained models have been registered as a versioned dataset, [HMS - PyTorch Baseline Training Dataset](https://www.kaggle.com/datasets/morodertobias/hms-pytorch-baseline-training-dataset).

The model takes squashed spectrograms as input, as in the reference notebooks. I tried to write the code in my own style, but it naturally ends up similar to theirs.

This version combines the current version of this notebook with the models from version 1 of the dataset and from the last successful run of the training notebook (version 8), for 10 models in total. Each one is an EfficientNetB0 that has been fine-tuned from noisy student weights.

## Core References
- [HMS: Inference (LB: 0.42)](https://www.kaggle.com/code/andreasbis/hms-inference-lb-0-42)
- [HMS-HBAC: ResNet34d Baseline [Inference]](https://www.kaggle.com/code/ttahara/hms-hbac-resnet34d-baseline-inference)
- [HMS: Train EfficientNetB0](https://www.kaggle.com/code/andreasbis/hms-train-efficientnetb0)
- [HMS baseline_resnet34d(512*512 Training 5 folds)](https://www.kaggle.com/code/yunsuxiaozi/hms-baseline-resnet34d-512-512-training-5-folds)
- [https://www.kaggle.com/code/ttahara/hms-hbac-resnet34d-baseline-training/](https://www.kaggle.com/code/ttahara/hms-hbac-resnet34d-baseline-training/)

## Table of Contents
- [Imports](#Imports)
- [Config](#Config)
- [Prepare data](#Prepare-data)
- [Prepare model](#Prepare-model)
- [Predict](#Predict)
- [Finalize submission](#Finalize-submission)

# Imports

In [None]:
import os
import pathlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
import timm
from torch.utils.data import Dataset, DataLoader

# Config

In [None]:
class CFG:
    base_dir = pathlib.Path("/kaggle/input/hms-harmful-brain-activity-classification")
    path_test = base_dir / "test.csv"
    path_submission = base_dir / "sample_submission.csv"
    spec_dir = base_dir / "test_spectrograms"
    model_name = "tf_efficientnet_b0_ns"
    model_weights = sorted(
        list(pathlib.Path("/kaggle/input/hms-pytorch-baseline-training-dataset").glob("*.pt"))
        + list(pathlib.Path("/kaggle/input/hms-pytorch-baseline-training").glob("*.pt"))
    )
    transform = transforms.Resize((512, 512), antialias=False)
    batch_size = 16
    label_columns = [
        "seizure_vote",
        "lpd_vote",
        "gpd_vote",
        "lrda_vote",
        "grda_vote",
        "other_vote",
    ]


CFG.model_weights

# Prepare data
- Load test dataframe.
- Prepare Dataset and DataLoader.
- Check one example to see that everything is correct.
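
Before the spectrogram reaches the model, the dataset class below clips it, takes the logarithm, and standardizes it. The following is a minimal sketch of that step on a toy array, using only NumPy. The function name `log_standardize` and the toy values are made up for illustration; the clip bounds and epsilon mirror the `preprocess` function in this notebook.

```python
import numpy as np


def log_standardize(spec, low=np.exp(-6), high=np.exp(10), eps=1e-6):
    """Clip to [low, high], take the log, then scale to zero mean and unit std."""
    spec = np.clip(spec, low, high)  # negatives and tiny values are raised to `low`
    spec = np.log(spec)              # values now lie in [-6, 10]
    return (spec - spec.mean()) / (spec.std() + eps)


# Toy "spectrogram" (hypothetical values): 2 frequency bins x 5 time steps
toy = np.array([
    [0.5, 1.0, 2.0, 4.0, 8.0],
    [1e-9, 3.0, np.nan, 50.0, 1e6],
])
toy = np.nan_to_num(toy, nan=-1.0)  # same idea as fillna(-1) in the dataset class
out = log_standardize(toy)
print(out.round(3))
```

Because the clip happens before the logarithm, the `-1` used to fill missing values ends up at the lower bound instead of producing `NaN`.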

In [None]:
test = pd.read_csv(CFG.path_test)
submission = pd.read_csv(CFG.path_submission)
submission = pd.merge(submission, test, how="inner", on="eeg_id")
submission["path"] = submission["spectrogram_id"].map(lambda x: CFG.spec_dir / f"{x}.parquet")
submission

In [None]:
def preprocess(x):
    x = np.clip(x, np.exp(-6), np.exp(10))
    x = np.log(x)
    m, s = x.mean(), x.std()
    x = (x - m) / (s + 1e-6)
    return x


class SpecDataset(Dataset):
    
    def __init__(self, df, transform=CFG.transform):
        self.df = df
        self.transform = transform
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        row = self.df.iloc[index]
        # input
        x = pd.read_parquet(row.path)
        x = x.fillna(-1).values[:, 1:].T
        x = preprocess(x)
        x = torch.Tensor(x[None, :])
        if self.transform:
            x = self.transform(x)
        # output
        y = np.array(row.loc[CFG.label_columns].values, 'float32')
        y = torch.Tensor(y)
        return x, y

In [None]:
data_ds = SpecDataset(df=submission)
data_loader = DataLoader(dataset=data_ds, batch_size=CFG.batch_size, num_workers=os.cpu_count())
data_loader

In [None]:
x, y = next(iter(data_loader))
x.shape, x

In [None]:
plt.imshow(x[0, 0])
plt.show()

# Prepare model

In [None]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"DEVICE: {DEVICE}")

In [None]:
model = timm.create_model(model_name=CFG.model_name, pretrained=False, num_classes=6, in_chans=1)
model.to(DEVICE)
num_parameter = sum(x.numel() for x in model.parameters())
print(f"Model has {num_parameter} parameters.")

# Predict
- Load weights and compute individual predictions.
- Note that the model outputs logits, so softmax is applied to turn them into probabilities.
- The final prediction is the ensemble (mean) of all individual predictions.
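
The steps above can be sketched without any model at all. This NumPy-only example uses random logits as stand-ins for the outputs of two hypothetical models; the names `softmax`, `logits_per_model`, and `ensemble` are illustrative and not part of the notebook.

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Hypothetical logits from 2 models for 3 samples and 6 classes
rng = np.random.default_rng(0)
logits_per_model = [rng.normal(size=(3, 6)) for _ in range(2)]

# Convert each model's logits to probabilities, then average across models
probs_per_model = [softmax(logits) for logits in logits_per_model]
ensemble = np.mean(probs_per_model, axis=0)
print(ensemble.sum(axis=1))
```

Averaging probabilities rather than logits keeps each row a valid distribution: a mean of rows that each sum to one also sums to one.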

In [None]:
prediction = pd.DataFrame(0.0, columns=CFG.label_columns, index=submission.index)
for i, path_weight in enumerate(CFG.model_weights):
    print(f"Model {i}: {path_weight}")
    model.load_state_dict(torch.load(path_weight, map_location=DEVICE))
    model.eval()
    with torch.no_grad():
        res = []
        for x, y in data_loader:
            x = x.to(DEVICE)
            pred = model(x)
            pred = F.softmax(pred, dim=1)
            pred = pred.detach().cpu().numpy()
            res.append(pred)
        res = np.concatenate(res)
        res = pd.DataFrame(res, columns=CFG.label_columns, index=submission.index)
        display(res)
        prediction = prediction + res
        print("\n")
prediction = prediction / len(CFG.model_weights)

In [None]:
prediction

# Finalize submission

In [None]:
submission[CFG.label_columns] = prediction
submission = submission[["eeg_id"] + CFG.label_columns]
submission

In [None]:
submission.to_csv("submission.csv", index=False)

In [None]:
!head submission.csv
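
Since every model's softmax output sums to one per row, the averaged predictions should too. A quick sanity check before uploading might look like the sketch below. The helper `check_submission` and the toy frame are hypothetical and not part of the notebook; the check assumes the file should contain exactly `eeg_id` followed by the six vote columns.

```python
import numpy as np
import pandas as pd

label_columns = [
    "seizure_vote", "lpd_vote", "gpd_vote",
    "lrda_vote", "grda_vote", "other_vote",
]


def check_submission(df, label_columns, atol=1e-5):
    """Return True if columns match and each row is a valid probability vector."""
    expected = ["eeg_id"] + list(label_columns)
    if list(df.columns) != expected:
        return False
    values = df[label_columns].to_numpy(dtype="float64")
    if not np.isfinite(values).all() or (values < 0).any():
        return False
    return bool(np.allclose(values.sum(axis=1), 1.0, atol=atol))


# Toy frame standing in for the real `submission` (values are made up)
toy = pd.DataFrame(
    [[123, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1]],
    columns=["eeg_id"] + label_columns,
)
ok = check_submission(toy, label_columns)
print(ok)
```

In the notebook itself, the same call could be applied to `submission` and `CFG.label_columns` just before `to_csv`.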