# About this notebook

This notebook runs inference for the model trained [here](https://www.kaggle.com/code/medali1992/hms-resnet1d-gru-train?scriptVersionId=163239333).

## Version 1

* `CV=0.5162483866506282` `LB=0.48`

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts'
    # CosineAnnealingWarmRestarts params
    cosanneal_res_params={
        'T_0':20,
        'eta_min':1e-6,
        'T_mult':1,
        'last_epoch':-1}
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_lstm'
    optimizer='Adam'
    epochs = 20
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    in_channels = 8
    fc_dim = 512
    batch_size = 64
    weight_decay = 1e-3
    seed = 2024
```
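
As a rough illustration of how the `cosanneal_res_params` above map onto PyTorch's scheduler, the sketch below builds `CosineAnnealingWarmRestarts` on a dummy parameter and records the learning rate once per epoch. Stepping once per epoch is an assumption (the training notebook may step per batch), and `dummy_param` is a hypothetical stand-in for the real model.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Hypothetical stand-in for the model parameters (illustration only).
dummy_param = torch.nn.Parameter(torch.zeros(1))
dummy_param.grad = torch.zeros_like(dummy_param)

optimizer = Adam([dummy_param], lr=8e-3, weight_decay=1e-3)
scheduler = CosineAnnealingWarmRestarts(
    optimizer, T_0=20, T_mult=1, eta_min=1e-6, last_epoch=-1
)

lrs = []
for epoch in range(21):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    optimizer.step()   # zero gradient, so the parameter does not move
    scheduler.step()   # assumption: one scheduler step per epoch

print(f"epoch 0: {lrs[0]:.6f}  epoch 10: {lrs[10]:.6f}  "
      f"epoch 19: {lrs[19]:.6f}  epoch 20: {lrs[20]:.6f}")
```

Under that per-epoch assumption, `T_0 = 20` with `epochs = 20` gives exactly one cosine cycle, so the first warm restart would only happen at a 21st epoch.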

## Version 2

* Changed the model architecture
* `CV=0.5162483866506282` `LB=0.55`

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts'
    # CosineAnnealingWarmRestarts params
    cosanneal_res_params={
        'T_0':20,
        'eta_min':1e-6,
        'T_mult':1,
        'last_epoch':-1}
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_lstm'
    optimizer='Adam'
    epochs = 20
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    in_channels = 1
    batch_size = 32
    weight_decay = 1e-3
    seed = 2024
```

## Version 3

* Added sequence pooling for the RNN output.
  In Compact Convolutional Transformers (CCTs), the sequence pooling layer is used instead of a [CLASS] token. It introduces a learnable weight that lets the model take a weighted average over all sequence positions, rather than reading the output of one special [CLASS] token or taking a simple average over the sequence.
  Taken from this [notebook](https://www.kaggle.com/code/utsavnandi/compact-convolutional-transformer-using-pytorch).
* `CV=0.5238907711166703` `LB=0.49`

```
class SeqPool(nn.Module):
    def __init__(self, emb_dim=192):
        super().__init__()
        self.dense = nn.Linear(emb_dim, 1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        bs, seq_len, emb_dim = x.shape
        identity = x
        x = self.dense(x)
        x = x.permute(0, 2, 1)
        x = self.softmax(x)
        x = x @ identity
        x = x.reshape(x.shape[0], -1)
        return x
```
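
To make the weighted average concrete, the sketch below repeats the steps of `SeqPool.forward` on a random tensor and checks the shapes. The batch size and sequence length are arbitrary illustrative values, and `x` is only a stand-in for the RNN output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bs, seq_len, emb_dim = 2, 5, 192           # arbitrary sizes for illustration
x = torch.randn(bs, seq_len, emb_dim)      # stand-in for the RNN output

dense = nn.Linear(emb_dim, 1)              # one learnable score per time step
scores = dense(x).permute(0, 2, 1)         # (bs, 1, seq_len)
weights = torch.softmax(scores, dim=-1)    # weights over time steps sum to 1
pooled = (weights @ x).reshape(bs, -1)     # (bs, emb_dim) weighted average

print(pooled.shape)
print(weights.sum(dim=-1).flatten())
```

Each row of `pooled` is a convex combination of that sample's time steps, which is why the output keeps the embedding size no matter how long the sequence is.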

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts'
    # CosineAnnealingWarmRestarts params
    cosanneal_res_params={
        'T_0':20,
        'eta_min':1e-6,
        'T_mult':1,
        'last_epoch':-1}
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_lstm'
    optimizer='Adan'
    epochs = 20
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 64
    weight_decay = 1e-2
    max_grad_norm = 1e7
    seed = 2024
```

## Version 4

* `LB=0.43`
* I divided my dataset into two populations and trained a two-stage model based on Version 1. The idea comes from this [notebook](https://www.kaggle.com/code/seanbearden/effnetb0-2-pop-model-train-twice-lb-0-39/notebook).

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts'
    # CosineAnnealingWarmRestarts params
    cosanneal_res_params={
        'T_0':20,
        'eta_min':1e-6,
        'T_mult':1,
        'last_epoch':-1}
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_lstm'
    optimizer='Adan'
    epochs = 20
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 64
    weight_decay = 1e-2
    max_grad_norm = 1e7
    seed = 2024
```

## Version 5

* Changed the CV Scheme to prevent data leakage.
* `LB=0.44`
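
The exact CV scheme is not shown in this notebook. One common way to avoid leakage is to keep all recordings of a patient in the same fold, and the sketch below does that with `GroupKFold`. Grouping by `patient_id` is an assumption, and the toy frame with its IDs is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy frame: 12 recordings from 4 hypothetical patients.
df = pd.DataFrame({
    "eeg_id": np.arange(12),
    "patient_id": np.repeat([101, 102, 103, 104], 3),
})

gkf = GroupKFold(n_splits=4)
df["fold"] = -1
for fold, (_, valid_idx) in enumerate(gkf.split(df, groups=df["patient_id"])):
    df.loc[valid_idx, "fold"] = fold   # positional index equals label here (RangeIndex)

# Every patient should land in exactly one validation fold.
print(df.groupby("patient_id")["fold"].nunique())
```

With this split, a patient never contributes rows to both the training and validation side of the same fold.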

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts'
    # CosineAnnealingWarmRestarts params
    cosanneal_res_params={
        'T_0':20,
        'eta_min':1e-6,
        'T_mult':1,
        'last_epoch':-1}
    print_freq=50
    num_workers=1
    model_name='resnet501d_lstm'
    optimizer='Adan'
    stage1_epochs=10
    stage2_epochs=20
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    in_channels=8
    batch_size=64
    weight_decay=1e-2
    max_grad_norm=1e7
    seed=2024
```

## Version 6

* I changed the CV scheme: the first stage trains on all data, and the second stage trains only on data with `total_evaluators >= 10`.
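
A minimal sketch of how the two training populations could be built is shown below. It assumes `total_evaluators` is the row-wise sum of the six vote columns, which is not confirmed by this notebook, and the toy vote counts are invented.

```python
import pandas as pd

VOTE_COLS = ['seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote']

# Toy training rows (made-up vote counts).
train = pd.DataFrame(
    [[3, 0, 0, 0, 0, 0],
     [0, 5, 0, 0, 0, 7],
     [1, 0, 0, 0, 0, 1],
     [0, 0, 10, 0, 2, 3]],
    columns=VOTE_COLS,
)

# Assumption: total_evaluators is the number of votes cast on a row.
train["total_evaluators"] = train[VOTE_COLS].sum(axis=1)

stage1_df = train                                                    # stage 1: all data
stage2_df = train[train["total_evaluators"] >= 10].reset_index(drop=True)  # stage 2: well-annotated rows

print(len(stage1_df), len(stage2_df))
```

The second stage then fine-tunes the stage 1 weights on `stage2_df` only, where labels come from more annotators.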

### Hyperparams

```

    scheduler='OneCycleLR'
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_gru_transformer'
    optimizer='Adan'
    stage1_epochs = 10
    stage2_epochs = 20
    eps = 1e-6
    lr = 1e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 100
    weight_decay = 1e-2
    max_grad_norm = 1e7
    seed = 2024
```

## Version 7

* I changed the CV scheme: the first stage trains on all data, and the second stage trains on `train_pop2`.

### Hyperparams

```

    scheduler='OneCycleLR' 
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_gru'
    optimizer='Adan'
    stage1_epochs = 10
    stage2_epochs = 10
    eps = 1e-6
    lr = 1e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 64
    weight_decay = 1e-2
    max_grad_norm = 1e7
    seed = 2024
    
```

## Version 8

* Applied downsampling by a factor of five.
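
As a small sketch of what this means for the input, the `EEGDataset` in this notebook downsamples by plain strided slicing (`X[::self.downsample, :]`). The random array below is only a placeholder for one 50-second, 8-channel EEG window.

```python
import numpy as np

sampling_rate = 200        # Hz; 50 s * 200 Hz = 10_000 samples per window
downsample_factor = 5

# Placeholder for one preprocessed EEG window of shape (time_steps, channels).
X = np.random.default_rng(0).standard_normal((10_000, 8)).astype("float32")

X_ds = X[::downsample_factor, :]               # keep every 5th sample
effective_rate = sampling_rate / downsample_factor

print(X_ds.shape)        # (2000, 8)
print(effective_rate)    # 40.0
```

At 40 Hz the Nyquist frequency is 20 Hz, which matches the 20 Hz cutoff of the low-pass filter used in this notebook. In `EEGDataset.__getitem__` the slicing runs after that filter.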

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts' 
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_gru'
    optimizer='Adan'
    stage1_epochs = 10
    stage2_epochs = 20
    eps = 1e-6
    lr = 1e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 64
    weight_decay = 1e-2
    downsample_factor = 5
    max_grad_norm = 1e7
    seed = 2024
    
```

## Version 9

* Stage 2 now trains on data with `votes >= 5`.

### Hyperparams

```

    scheduler='CosineAnnealingWarmRestarts' 
    print_freq=50
    num_workers = 1
    model_name = 'resnet501d_gru'
    optimizer='Adan'
    stage1_epochs = 10
    stage2_epochs = 10
    eps = 1e-6
    lr = 1e-3
    min_lr = 1e-6
    in_channels = 8
    batch_size = 64
    weight_decay = 1e-2
    downsample_factor = 5
    max_grad_norm = 1e7
    seed = 2024
    
```

# Import libraries

In [None]:
import os
import gc
from glob import glob
import sys
import math
import time
import random
import shutil
from pathlib import Path
from typing import Dict, List
from scipy.stats import entropy
from scipy.signal import butter, lfilter, freqz
from contextlib import contextmanager
from collections import defaultdict, Counter
sys.path.append('/kaggle/input/kaggle-kl-div')
from kaggle_kl_div import score
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, log_loss
from tqdm.auto import tqdm
from functools import partial
import cv2
from PIL import Image
import torch
import torch.nn as nn
import pytorch_lightning as pl
import torch.nn.functional as F
from torch.optim import Adam, SGD, AdamW
import torchvision.models as models
from torch.nn.parameter import Parameter
from torch.utils.data import DataLoader, Dataset
from torch.optim.lr_scheduler import ReduceLROnPlateau, OneCycleLR, CosineAnnealingLR, CosineAnnealingWarmRestarts
from sklearn.preprocessing import LabelEncoder
from torchvision.transforms import v2
from sklearn.model_selection import GroupKFold
from sklearn.model_selection import train_test_split
import albumentations as A
from albumentations import (Compose, Normalize, Resize, RandomResizedCrop, HorizontalFlip, VerticalFlip, ShiftScaleRotate, Transpose)
from albumentations.pytorch import ToTensorV2
from albumentations import ImageOnlyTransform
import timm
import warnings 
warnings.filterwarnings('ignore')
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"  # set before CUDA is first queried so it takes effect
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
from matplotlib import pyplot as plt
import joblib
VERSION=9

# Config

In [None]:
class CFG:
    PATH = '/kaggle/input/hms-harmful-brain-activity-classification/'
    test_eeg = "/kaggle/input/hms-harmful-brain-activity-classification/test_eegs/"
    test_csv = "/kaggle/input/hms-harmful-brain-activity-classification/test.csv"
    model_name = 'resnet1d_gru'
    seed = 2024
    in_channels = 8
    target_size = 6
    batch_size = 32
    num_workers = 1

    
model_weights = [x for x in glob("/kaggle/input/resnet1d-gru-weights/pop_2_weight_oof/*.pth")]
model_weights

# Utils

In [None]:
def eeg_from_parquet(parquet_path: str) -> np.ndarray:
    """
    Read a parquet file and extract the middle 50 seconds of readings, then fill NaN values
    in each channel with that channel's mean (computed ignoring NaNs).
    Relies on the global `eeg_features` list defined in the "Load data" section.
    :param parquet_path: path to parquet file.
    :return data: np.array of shape (time_steps, eeg_features) -> (10_000, 8)
    """
    # === Extract middle 50 seconds ===
    eeg = pd.read_parquet(parquet_path, columns=eeg_features)
    rows = len(eeg)
    offset = (rows - 10_000) // 2 # 50 * 200 = 10_000
    eeg = eeg.iloc[offset:offset+10_000] # middle 50 seconds, has the same amount of readings to left and right
    # === Convert to numpy ===
    data = np.zeros((10_000, len(eeg_features))) # create placeholder of same shape with zeros
    for index, feature in enumerate(eeg_features):
        x = eeg[feature].values.astype('float32') # convert to float32
        mean = np.nanmean(x) # arithmetic mean along the specified axis, ignoring NaNs
        nan_percentage = np.isnan(x).mean() # percentage of NaN values in feature
        # === Fill nan values ===
        if nan_percentage < 1: # if some values are nan, but not all
            x = np.nan_to_num(x, nan=mean)
        else: # if all values are nan
            x[:] = 0
        data[:, index] = x
   
    return data


def seed_everything(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed) 
    
    
def sep():
    print("-"*100)

    
target_preds = [x + "_pred" for x in ['seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote']]
label_to_num = {'Seizure': 0, 'LPD': 1, 'GPD': 2, 'LRDA': 3, 'GRDA': 4, 'Other':5}
num_to_label = {v: k for k, v in label_to_num.items()}
seed_everything(CFG.seed)
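
The windowing and NaN handling in `eeg_from_parquet` can be checked on small arrays. The sketch below isolates those two steps. The helper names `middle_window` and `fill_nan_with_mean` are hypothetical and not part of this notebook, and the 90-second recording is synthetic.

```python
import numpy as np

def middle_window(x: np.ndarray, window: int = 10_000) -> np.ndarray:
    """Return the centered `window` samples of a 1-D signal."""
    offset = (len(x) - window) // 2
    return x[offset:offset + window]

def fill_nan_with_mean(x: np.ndarray) -> np.ndarray:
    """Replace NaNs with the channel mean, or zeros if the channel is all NaN."""
    if np.isnan(x).all():
        return np.zeros_like(x)
    return np.nan_to_num(x, nan=np.nanmean(x))

# Synthetic 90 s recording at 200 Hz; the sample index doubles as the value.
signal = np.arange(18_000, dtype="float32")
window = middle_window(signal)
print(len(window), window[0], window[-1])

channel = np.array([1.0, np.nan, 3.0], dtype="float32")
print(fill_nan_with_mean(channel))
print(fill_nan_with_mean(np.full(3, np.nan, dtype="float32")))
```

For an 18,000-sample recording the offset is 4,000, so the window leaves the same number of samples on each side.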

# Load data

In [None]:
test_df = pd.read_csv(CFG.test_csv)
print(f"Test dataframe shape is: {test_df.shape}")
test_df.head()

In [None]:
eeg_parquet_paths = glob(CFG.test_eeg+ "*.parquet")
eeg_df = pd.read_parquet(eeg_parquet_paths[0])
eeg_features = eeg_df.columns
print(f'There are {len(eeg_features)} raw eeg features')
print(list(eeg_features))
eeg_features = ['Fp1','T3','C3','O1','Fp2','C4','T4','O2']
feature_to_index = {x:y for x,y in zip(eeg_features, range(len(eeg_features)))}

In [None]:
%%time

all_eegs = {}
eeg_ids = test_df.eeg_id.unique()

for eeg_id in tqdm(eeg_ids):
    # Save EEG to Python dictionary of numpy arrays
    eeg_path = CFG.test_eeg + str(eeg_id) + ".parquet"
    data = eeg_from_parquet(eeg_path)
    all_eegs[eeg_id] = data

In [None]:
from scipy.signal import butter, lfilter

def butter_lowpass_filter(data, cutoff_freq: int = 20, sampling_rate: int = 200, order: int = 4):
    nyquist = 0.5 * sampling_rate
    normal_cutoff = cutoff_freq / nyquist
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    filtered_data = lfilter(b, a, data, axis=0)
    return filtered_data
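
To see what the filter does with its default settings (20 Hz cutoff, 200 Hz sampling rate, order 4), the sketch below applies the same `butter` and `lfilter` calls to a synthetic signal. The 5 Hz and 60 Hz test tones and the `amplitude_at` helper are illustrative choices, not part of the pipeline.

```python
import numpy as np
from scipy.signal import butter, lfilter

sampling_rate, cutoff_freq, order = 200, 20, 4       # same defaults as above
t = np.arange(2000) / sampling_rate                  # 10 s of synthetic signal
slow = np.sin(2 * np.pi * 5 * t)                     # 5 Hz, inside the pass band
fast = np.sin(2 * np.pi * 60 * t)                    # 60 Hz, well above the cutoff
signal = (slow + fast)[:, None]                      # shape (time_steps, 1)

b, a = butter(order, cutoff_freq / (0.5 * sampling_rate), btype="low", analog=False)
filtered = lfilter(b, a, signal, axis=0)

def amplitude_at(x: np.ndarray, freq_hz: float) -> float:
    """Magnitude of the FFT bin closest to freq_hz for a single-channel signal."""
    spectrum = np.abs(np.fft.rfft(x[:, 0]))
    freqs = np.fft.rfftfreq(len(x), d=1 / sampling_rate)
    return float(spectrum[np.argmin(np.abs(freqs - freq_hz))])

ratio_5hz = amplitude_at(filtered, 5) / amplitude_at(signal, 5)
ratio_60hz = amplitude_at(filtered, 60) / amplitude_at(signal, 60)
print(f"5 Hz kept: {ratio_5hz:.3f}, 60 Hz kept: {ratio_60hz:.4f}")
```

Note that `lfilter` is causal, so the filtered signal is slightly delayed relative to the input. That is harmless here as long as training and inference use the same filter.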

# Dataset

In [None]:
class EEGDataset(Dataset):
    def __init__(
        self, df: pd.DataFrame, config, mode: str = 'train',
        eegs: Dict[int, np.ndarray] = all_eegs, downsample: int = None
    ): 
        self.df = df
        self.config = config
        self.mode = mode
        self.eegs = eegs
        self.downsample = downsample
        
    def __len__(self):
        """
        Length of dataset.
        """
        return len(self.df)
        
    def __getitem__(self, index):
        """
        Get one item.
        """
        X, y = self.__data_generation(index)
        if self.downsample is not None:
            X = X[::self.downsample,:]
        output = {
            "eeg": torch.tensor(X, dtype=torch.float32),
            "labels": torch.tensor(y, dtype=torch.float32)
        }
        return output
                        
    def __data_generation(self, index):
        row = self.df.iloc[index]
        X = np.zeros((10_000, 8), dtype='float32')
        y = np.zeros(6, dtype='float32')
        data = self.eegs[row.eeg_id]

        # === Feature engineering ===
        X[:,0] = data[:,feature_to_index['Fp1']] - data[:,feature_to_index['T3']]
        X[:,1] = data[:,feature_to_index['T3']] - data[:,feature_to_index['O1']]

        X[:,2] = data[:,feature_to_index['Fp1']] - data[:,feature_to_index['C3']]
        X[:,3] = data[:,feature_to_index['C3']] - data[:,feature_to_index['O1']]

        X[:,4] = data[:,feature_to_index['Fp2']] - data[:,feature_to_index['C4']]
        X[:,5] = data[:,feature_to_index['C4']] - data[:,feature_to_index['O2']]

        X[:,6] = data[:,feature_to_index['Fp2']] - data[:,feature_to_index['T4']]
        X[:,7] = data[:,feature_to_index['T4']] - data[:,feature_to_index['O2']]

        # === Standardize ===
        X = np.clip(X,-1024, 1024)
        X = np.nan_to_num(X, nan=0) / 32.0

        # === Butter Low-pass Filter ===
        X = butter_lowpass_filter(X)
        if self.mode != 'test':
            y = row[self.config.target_cols].values.astype(np.float32)
            
        return X, y

# DataLoader

In [None]:
test_dataset = EEGDataset(test_df, CFG, mode='test')
test_loader = DataLoader(
    test_dataset,
    batch_size=CFG.batch_size,
    shuffle=False,
    num_workers=CFG.num_workers,
    pin_memory=True,
    drop_last=False
)
output = test_dataset[0]
X = output["eeg"]
print(f"X shape: {X.shape}")

# Model

In [None]:
class ResNet_1D_Block(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, downsampling):
        super(ResNet_1D_Block, self).__init__()
        self.bn1 = nn.BatchNorm1d(num_features=in_channels)
        self.relu = nn.ReLU(inplace=False)
        self.dropout = nn.Dropout(p=0.0, inplace=False)
        self.conv1 = nn.Conv1d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                               stride=stride, padding=padding, bias=False)
        self.bn2 = nn.BatchNorm1d(num_features=out_channels)
        self.conv2 = nn.Conv1d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size,
                               stride=stride, padding=padding, bias=False)
        self.maxpool = nn.MaxPool1d(kernel_size=2, stride=2, padding=0)
        self.downsampling = downsampling

    def forward(self, x):
        identity = x

        out = self.bn1(x)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv1(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv2(out)

        out = self.maxpool(out)
        identity = self.downsampling(x)

        out += identity
        return out


class EEGNet(nn.Module):

    def __init__(self, kernels, in_channels=20, fixed_kernel_size=17, num_classes=6):
        super(EEGNet, self).__init__()
        self.kernels = kernels
        self.planes = 24
        self.parallel_conv = nn.ModuleList()
        self.in_channels = in_channels
        
        for i, kernel_size in enumerate(list(self.kernels)):
            sep_conv = nn.Conv1d(in_channels=in_channels, out_channels=self.planes, kernel_size=(kernel_size),
                               stride=1, padding=0, bias=False,)
            self.parallel_conv.append(sep_conv)

        self.bn1 = nn.BatchNorm1d(num_features=self.planes)
        self.relu = nn.ReLU(inplace=False)
        self.conv1 = nn.Conv1d(in_channels=self.planes, out_channels=self.planes, kernel_size=fixed_kernel_size,
                               stride=2, padding=2, bias=False)
        self.block = self._make_resnet_layer(kernel_size=fixed_kernel_size, stride=1, padding=fixed_kernel_size//2)
        self.bn2 = nn.BatchNorm1d(num_features=self.planes)
        self.avgpool = nn.AvgPool1d(kernel_size=6, stride=6, padding=2)
        self.rnn = nn.GRU(input_size=self.in_channels, hidden_size=128, num_layers=1, bidirectional=True)
        self.fc = nn.Linear(in_features=424, out_features=num_classes)

    def _make_resnet_layer(self, kernel_size, stride, blocks=9, padding=0):
        layers = []
        downsample = None
        base_width = self.planes

        for i in range(blocks):
            downsampling = nn.Sequential(
                    nn.MaxPool1d(kernel_size=2, stride=2, padding=0)
                )
            layers.append(ResNet_1D_Block(in_channels=self.planes, out_channels=self.planes, kernel_size=kernel_size,
                                       stride=stride, padding=padding, downsampling=downsampling))

        return nn.Sequential(*layers)
    def extract_features(self, x):
        x = x.permute(0, 2, 1)
        out_sep = []

        for i in range(len(self.kernels)):
            sep = self.parallel_conv[i](x)
            out_sep.append(sep)

        out = torch.cat(out_sep, dim=2)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv1(out)  

        out = self.block(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.avgpool(out)  
        
        out = out.reshape(out.shape[0], -1)  
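        # NOTE: self.rnn is built without batch_first=True, so the GRU reads dim 0 (the batch) as the time axis.
        # Left unchanged here on the assumption that the checkpoints were trained with the same behavior.
        rnn_out, _ = self.rnn(x.permute(0, 2, 1))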
        new_rnn_h = rnn_out[:, -1, :]  

        new_out = torch.cat([out, new_rnn_h], dim=1) 
        return new_out
    
    def forward(self, x):
        new_out = self.extract_features(x)
        result = self.fc(new_out)  

        return result
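
The `in_features=424` of the final linear layer follows from the layer arithmetic. The sketch below recomputes it with the standard output-length formula for 1-D convolution and pooling, using the settings passed in this notebook (`kernels=[3,5,7,9]`, `fixed_kernel_size=5`, 10,000 time steps). The helper names are hypothetical.

```python
def conv_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of Conv1d / MaxPool1d / AvgPool1d (dilation 1, floor mode)."""
    return (length + 2 * padding - kernel) // stride + 1

def eegnet_fc_in_features(seq_len=10_000, kernels=(3, 5, 7, 9), fixed_kernel=5,
                          planes=24, blocks=9, gru_hidden=128) -> int:
    # Parallel convolutions (no padding) are concatenated along the time axis.
    length = sum(conv_out_len(seq_len, k) for k in kernels)
    # conv1: stride 2, padding 2.
    length = conv_out_len(length, fixed_kernel, stride=2, padding=2)
    # Each residual block keeps the length in its two convolutions
    # (stride 1, padding = fixed_kernel // 2) and halves it with MaxPool1d(2, 2).
    for _ in range(blocks):
        length = conv_out_len(length, 2, stride=2)
    # AvgPool1d(kernel_size=6, stride=6, padding=2).
    length = conv_out_len(length, 6, stride=6, padding=2)
    # Flattened CNN features plus the last bidirectional GRU output (2 * hidden).
    return planes * length + 2 * gru_hidden

print(eegnet_fc_in_features())   # 424
```

The CNN branch contributes 24 channels × 7 time steps = 168 features and the bidirectional GRU contributes 256. With a 2,000-step input the same arithmetic gives 280, so the 424-unit head corresponds to full-length windows.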

# Inference Function

In [None]:
def inference_function(test_loader, model, device):
    model.eval() # set model in evaluation mode
    softmax = nn.Softmax(dim=1)
    prediction_dict = {}
    preds = []
    with tqdm(test_loader, unit="test_batch", desc='Inference') as tqdm_test_loader:
        for step, batch in enumerate(tqdm_test_loader):
            X = batch.pop("eeg").to(device) # send inputs to `device`
            batch_size = X.size(0)
            with torch.no_grad():
                y_preds = model(X) # forward propagation pass
            y_preds = softmax(y_preds)
            preds.append(y_preds.to('cpu').numpy()) # save predictions
                
    prediction_dict["predictions"] = np.concatenate(preds) # np.array() of shape (fold_size, target_cols)
    return prediction_dict

# Inference 

In [None]:
predictions = []

for model_weight in model_weights:
    test_dataset = EEGDataset(test_df, CFG, mode='test')
    test_loader = DataLoader(
        test_dataset,
        batch_size=CFG.batch_size,
        shuffle=False,
        num_workers=CFG.num_workers,
        pin_memory=True,
        drop_last=False
    )
    model = EEGNet(kernels=[3,5,7,9], in_channels=CFG.in_channels, fixed_kernel_size=5, num_classes=CFG.target_size)
    checkpoint = torch.load(model_weight, map_location=device)
    model.load_state_dict(checkpoint["model"])
    model.to(device)
    prediction_dict = inference_function(test_loader, model, device)
    predictions.append(prediction_dict["predictions"])
    torch.cuda.empty_cache()
    gc.collect()
    
predictions = np.array(predictions)
predictions = np.mean(predictions, axis=0)
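
The averaging step above can be sanity-checked in isolation. The sketch below uses random stand-in probabilities to show that the mean over models keeps the `(n_samples, n_classes)` shape and that each row still sums to 1. The model count, sample count, and local `softmax` helper are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_classes = 3, 4, 6    # arbitrary sizes for illustration

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# One (n_samples, n_classes) probability matrix per model checkpoint.
fold_probs = [softmax(rng.standard_normal((n_samples, n_classes))) for _ in range(n_models)]

ensemble = np.mean(np.array(fold_probs), axis=0)

print(ensemble.shape)
print(ensemble.sum(axis=1))
```

Because every model outputs a valid probability distribution per row, their arithmetic mean is one too, so no renormalization is needed before writing the submission.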

# Submission

In [None]:
TARGETS = ['seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote']
sub = pd.DataFrame({'eeg_id': test_df.eeg_id.values})
sub[TARGETS] = predictions
sub.to_csv('submission.csv', index=False)
print(f'Submission shape: {sub.shape}')
sub.head()