# Description

This notebook is based on a notebook that was itself derived from the [HMS Resnet1D-GRU Train notebook by Med Ali Bouchhioua](https://www.kaggle.com/code/medali1992/hms-resnet1d-gru-train?scriptVersionId=163575181).

## Changes 1 [LB:0.40]:
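- Convolution kernels: [3, 5, 7, 9, 11]
- Activation functions: Hardswish, SiLU
- Optimizer: Adan -> AdamW
- Band-pass filter with a 0.5 Hz low cutoff
- Total evaluators: totals of 0 to 5 are used for the first dataset, 6 to max for the second
- Augmentation (Albumentations): a band-pass filter cuts a random frequency in the 10 to 25 Hz range
- 2 stages of 20 epochs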

## Changes 2 [LB:0.38]:
- Filter order changed from 6 to 2
    - An order of 6 distorts the signal very strongly wherever it jumps abruptly
- High cutoff frequency changed from 25 Hz to 20 Hz
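
To make the filter-order change concrete, the sketch below builds both Butterworth band-pass filters with `scipy.signal.butter` and applies them to a unit step, the kind of abrupt jump the note above refers to. The step input and the helper name `step_response` are illustrative assumptions and do not appear in the notebook.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 200  # Hz, the sampling rate used in this notebook


def step_response(order, low=0.5, high=20.0, n=2000):
    """Filter a unit step with a Butterworth band-pass of the given order."""
    b, a = butter(order, [low, high], fs=FS, btype="band")
    step = np.zeros(n)
    step[n // 2:] = 1.0  # abrupt jump in the middle of the window
    return b, a, lfilter(b, a, step)


b2, a2, y2 = step_response(order=2)
b6, a6, y6 = step_response(order=6)

# A band-pass design doubles the prototype order: 2 * order + 1 coefficients.
print(len(b2), len(b6))
# Peak excursion after the jump, for a side-by-side look at the ringing.
print(np.abs(y2).max(), np.abs(y6).max())
```

Plotting `y2` against `y6` is probably the quickest way to judge how much each order reacts to the jump.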

## Changes 3 [LB:0.38]:
- Annotator totals are split into three ranges: [0..2], [3..5], [6..1000]

## Changes 4 [LB:0.40]:
- Annotator totals are split into two ranges: [0..5], [6..1000]
- Removed the regularization value 0.166666667, following the advice of [Med Ali Bouchhioua](https://www.kaggle.com/code/konstantinboyko/hms-resnet1d-gru-v22-human-6-train/comments#2681934)
- Added code that lets models be trained in the reverse fold/stage order as well as the stage/fold order
- Added filter parameters
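
The split by annotator totals can be sketched as follows. This is a minimal illustration and not the notebook's own code: the helper `split_by_total_votes`, the toy rows, and the choice of inclusive bounds are assumptions. Only the vote column names come from the config.

```python
import pandas as pd

TARGET_COLS = [
    "seizure_vote", "lpd_vote", "gpd_vote",
    "lrda_vote", "grda_vote", "other_vote",
]


def split_by_total_votes(df, bands):
    """Return one DataFrame per (low, high) band of total votes (bounds inclusive)."""
    total = df[TARGET_COLS].sum(axis=1)
    return [df[(total >= low) & (total <= high)] for low, high in bands]


# Toy rows with vote totals 3, 5, 6 and 20 (made-up values).
toy = pd.DataFrame(
    [
        [3, 0, 0, 0, 0, 0],
        [0, 2, 0, 1, 0, 2],
        [0, 0, 5, 0, 0, 1],
        [10, 0, 0, 2, 0, 8],
    ],
    columns=TARGET_COLS,
)

stage_1, stage_2 = split_by_total_votes(toy, bands=[(0, 5), (6, 1000)])
print(len(stage_1), len(stage_2))
```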

## Changes 5 [LB:0.39]:
- Augmentation (Albumentations): an entire signal is skipped at random

## Changes 6 [LB:0.38]
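- Signal size reduced to one half, with the window chosen at random from the full signal

## Changes 7 [LB:0.36]
- Signal size reduced to one fifth, with the window chosen at random from the full signal

## Changes 8 [LB:0.39]
- Single-stage model with annotator totals in the range [5..Max]

## Changes 9 [LB:0.38]
- Signal size reduced to one fifth, with the window chosen at random from the full signal
- Annotator totals: [2..2 + 6..28]

## Changes 10 [LB:]
- Random common reversal of the signals
- Random vertical flip (negation) of the signals

## Changes 11 [LB:0.37]
- Evaluator totals are split into two parts: [1..2 + 4..5] and [6..28]

## Changes 12 [LB:]
- Evaluator totals are split into two parts: [1..5 -4(GPD)] and [6..28]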

## [Final Dataset](https://www.kaggle.com/datasets/konstantinboyko/hms-resnet1d-gru-weights-v82)

## [Previous Train](https://www.kaggle.com/code/konstantinboyko/hms-resnet1d-gru-v33-human-5-stage-1-train)

## [Inference Notebook for 82nd Dataset](https://www.kaggle.com/code/konstantinboyko/hms-resnet1d-gru-inference-1-5-dataset)


# Library

In [1]:
import os
import gc
import sys
import math
import time
import random
import datetime as dt
import numpy as np
import pandas as pd
import wandb

from glob import glob
from pathlib import Path
from typing import Dict, List, Union
import scipy.signal as scisig
from scipy.signal import butter, lfilter, freqz
from matplotlib import pyplot as plt
from tqdm.auto import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam, SGD, AdamW
from torch.utils.data import DataLoader, Dataset
from torch.optim.lr_scheduler import (
    ReduceLROnPlateau,
    OneCycleLR,
    CosineAnnealingLR,
    CosineAnnealingWarmRestarts,
)
from torch.optim.optimizer import Optimizer
from sklearn.model_selection import GroupKFold

#import cupy as cp
#import cupyx.scipy.signal as cpsig

sys.path.append("/kaggle/input/kaggle-kl-div")
import kaggle_kl_div

import warnings

warnings.filterwarnings("ignore")

device = torch.device("cuda")
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

!cat /etc/os-release | grep -oP "PRETTY_NAME=\"\K([^\"]*)"
print(f"BUILD_DATE={os.environ['BUILD_DATE']}, CONTAINER_NAME={os.environ['CONTAINER_NAME']}")

try:
    print(
        f"PyTorch Version:{torch.__version__}, CUDA is available:{torch.cuda.is_available()}, Version CUDA:{torch.version.cuda}"
    )
    print(
        f"Device Capability:{torch.cuda.get_device_capability()}, {torch.cuda.get_arch_list()}"
    )
    print(
        f"CuDNN Enabled:{torch.backends.cudnn.enabled}, Version:{torch.backends.cudnn.version()}"
    )
except Exception:
    pass

Ubuntu 20.04.6 LTS
BUILD_DATE=20240318-171459, CONTAINER_NAME=tf2-gpu/2-15+cu121
PyTorch Version:2.1.2, CUDA is available:True, Version CUDA:12.1
Device Capability:(7, 5), ['sm_60', 'sm_70', 'sm_75', 'compute_70', 'compute_75']
CuDNN Enabled:True, Version:8900


# Directory settings

In [2]:
class APP:
    jupyter = "ipykernel" in globals()
    if not jupyter:
        try:
            if "IPython" in globals().get("__doc__", ""):
                jupyter = True
        except Exception as inst:
            print(inst)

    kaggle = os.environ.get("KAGGLE_KERNEL_RUN_TYPE", "") != ""
    local = os.environ.get("DOCKER_USING", "") == "LOCAL"
    date_time_start = dt.datetime.now()
    dt_start_ymd_hms = date_time_start.strftime("%Y.%m.%d_%H-%M-%S")

    file_run_path = ""
    if jupyter:
        try:
            file_run_path = Path(globals().get("__vsc_ipynb_file__", ""))
        except Exception as inst:
            print(inst)

    else:
        try:
            file_run_path = Path(__file__)
        except Exception as inst:
            print(inst)

    file_run_name = file_run_path.stem
    path_app = file_run_path.parent
    path_run = Path(os.getcwd())
    path_out = (
        Path("/kaggle/working")
        if kaggle
        else file_run_path / f"{file_run_name}_{dt_start_ymd_hms}"
    )


OUTPUT_DIR = "./"
if not os.path.exists(OUTPUT_DIR):
    os.makedirs(OUTPUT_DIR)

print(f"jupyter:{APP.jupyter}, kaggle:{APP.kaggle}, local:{APP.local}")
print(APP.file_run_path)
print(APP.path_out)

jupyter:True, kaggle:True, local:False
.
/kaggle/working


# Config

In [3]:
class CFG:
    VERSION = '2'

    wandb = False
    debug = False
    create_eegs = False
    apex = True
    visualize = False
    save_all_models = True

    # Resource settings
    if debug:
        num_workers = 0
        parallel = False
    else:
        num_workers = os.cpu_count()
        parallel = True

    # Model training settings
    model_name = "resnet1d_gru"
    # optimizer = "Adan"
    optimizer = "AdamW"
    # Training hyperparameters
    factor = 0.9
    eps = 1e-6
    lr = 8e-3
    min_lr = 1e-6
    batch_size = 64
    batch_koef_valid = 2
    batch_scheduler = True
    weight_decay = 1e-2
    gradient_accumulation_steps = 1
    max_grad_norm = 1e7

    fixed_kernel_size = 5

    # linear_layer_features = 424
    # kernels = [3, 5, 7, 9]
    #linear_layer_features = 448  # Full Signal = 10_000
    #linear_layer_features = 352  # Half Signal = 5_000
    linear_layer_features = 304   # 1/4, 1/5, 1/6  Signal = 2_000
    #linear_layer_features = 280  # 1/10  Signal = 1_000
    # linear_layer_features = 1000

    kernels = [3, 5, 7, 9, 11]
    # kernels = [5, 7, 9, 11, 13]

    seq_length = 50  # seconds
    sampling_rate = 200  # Hz
    nsamples = seq_length * sampling_rate  # number of samples: 10_000
    n_split_samples = 5
    out_samples = nsamples // n_split_samples  # 2_000
    sample_delta = nsamples - out_samples  # 8_000
    sample_offset = sample_delta // 2
    multi_validation = False

    # Two-stage training settings
    train_by_stages = True
    train_by_folds = False

    # 'GPD', 'GRDA', 'LPD', 'LRDA', 'Other', 'Seizure'
    n_stages = 2
    match n_stages:
        # case 1:
        #     train_stages = [0]
        #     epochs = [100]
        #     test_total_eval = 2
        #     total_evals_old = [[(2, 3), (6, 29)]]  # Deprecated
        #     total_evaluators = [ 
        #         [   
        #             {'band':(2, 2), 'excl_evals':[]}, 
        #             {'band':(6, 28), 'excl_evals':[]},
        #         ], 
        #     ]            
        case 2:
            train_stages = [0, 1]
            epochs = [50, 100]
            test_total_eval = 0
            total_evaluators = [ 
                [   
                    {'band':(0, 9), 'excl_evals':[]},
                ], 
                [   
                    {'band':(10, 10000), 'excl_evals':[]}, 
                ], 
            ]            
        # case 3:
        #     train_stages = [0, 1, 2]
        #     epochs = [20, 50, 100]
        #     test_total_eval = 0
        #     total_evals_old = [(0, 3), (3, 6), (6, 29)]  # Deprecated
        #     total_evaluators = [ 
        #         [   
        #             {'band':(0, 2), 'excl_evals':[]}, 
        #         ], 
        #         [   
        #             {'band':(3, 5), 'excl_evals':[]}, 
        #         ], 
        #         [   
        #             {'band':(6, 28), 'excl_evals':[]},
        #         ], 
        #     ]            
    
    n_fold = 5
    train_folds = [0, 1, 2, 3, 4]
    # train_folds = [0]

    patience = 11
    seed = 2024

    bandpass_filter = {"low": 0.5, "high": 20, "order": 2}
    rand_filter = {"probab": 0.1, "low": 10, "high": 20, "band": 1.0, "order": 2}
    freq_channels = []  # [(8.0, 12.0)]; [(0.5, 4.5)]
    filter_order = 2

    random_divide_signal = 0.05
    random_close_zone = 0.05
    random_common_negative_signal = 0.0
    random_common_reverse_signal = 0.0
    random_negative_signal = 0.05
    random_reverse_signal = 0.05

    log_step = 100  # logging interval, in training steps
    log_show = False

    scheduler = "CosineAnnealingWarmRestarts"  # ['ReduceLROnPlateau', 'CosineAnnealingLR', 'CosineAnnealingWarmRestarts','OneCycleLR']

    # CosineAnnealingLR params
    cosanneal_params = {
        "T_max": 6,
        "eta_min": 1e-5,
        "last_epoch": -1,
    }

    # ReduceLROnPlateau params
    reduce_params = {
        "mode": "min",
        "factor": 0.2,
        "patience": 4,
        "eps": 1e-6,
        "verbose": True,
    }

    # CosineAnnealingWarmRestarts params
    cosanneal_res_params = {
        "T_0": 20,
        "eta_min": 1e-6,
        "T_mult": 1,
        "last_epoch": -1,
    }

    target_cols = [
        "seizure_vote",
        "lpd_vote",
        "gpd_vote",
        "lrda_vote",
        "grda_vote",
        "other_vote",
    ]

    pred_cols = [x + "_pred" for x in target_cols]

    map_features = [
        ("Fp1", "T3"),
        ("T3", "O1"),
        ("Fp1", "C3"),
        ("C3", "O1"),
        ("Fp2", "C4"),
        ("C4", "O2"),
        ("Fp2", "T4"),
        ("T4", "O2"),
        #('Fz', 'Cz'), ('Cz', 'Pz'),
    ]

    eeg_features = ["Fp1", "T3", "C3", "O1", "Fp2", "C4", "T4", "O2"]  # 'Fz', 'Cz', 'Pz'
        # 'F3', 'P3', 'F7', 'T5', 'Fz', 'Cz', 'Pz', 'F4', 'P4', 'F8', 'T6', 'EKG']
    feature_to_index = {x: y for x, y in zip(eeg_features, range(len(eeg_features)))}
    simple_features = []  # 'Fz', 'Cz', 'Pz', 'EKG'

    # eeg_features = [row for row in feature_to_index]
    # eeg_feat_size = len(eeg_features)
    
    n_map_features = len(map_features)
    in_channels = n_map_features + n_map_features * len(freq_channels) + len(simple_features)
    target_size = len(target_cols)

    # Path settings
    # path_inp = Path("/kaggle/input")
    path_inp = Path("../input")
    path_src = path_inp / "hms-harmful-brain-activity-classification/"
    file_train = path_src / "train.csv"
    path_train = path_src / "train_eegs"
    file_features_test = path_train / "100261680.parquet"
    file_eeg_specs = path_inp / "eeg-spectrogram-by-lead-id-unique/eeg_specs.npy"   # spectrograms built from the EEGs
    file_raw_eeg = path_inp / "brain-eegs/eegs.npy"                                 # raw EEGs
    #file_raw_eeg = path_inp / "brain-eegs-plus/eegs.npy"
    #file_raw_eeg = path_inp / "brain-eegs-full/eegs.npy"

    if APP.kaggle:
        num_workers = 2
        parallel = True
        # GPU_DEVICES = "auto"


# print(CFG.eeg_feat_size, CFG.in_channels)
print(CFG.feature_to_index)
print(CFG.eeg_features)

{'Fp1': 0, 'T3': 1, 'C3': 2, 'O1': 3, 'Fp2': 4, 'C4': 5, 'T4': 6, 'O2': 7}
['Fp1', 'T3', 'C3', 'O1', 'Fp2', 'C4', 'T4', 'O2']
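
As a rough illustration of what the `CosineAnnealingWarmRestarts` settings in `CFG` imply, the sketch below evaluates the standard cosine-with-restarts formula for `lr = 8e-3`, `eta_min = 1e-6`, `T_0 = 20` and `T_mult = 1`. The helper `warm_restart_lr` is hypothetical, and stepping once per epoch is an assumption (the config sets `batch_scheduler = True`, which suggests per-batch stepping in the actual run).

```python
import math


def warm_restart_lr(epoch, base_lr=8e-3, eta_min=1e-6, t_0=20):
    """Cosine annealing with warm restarts for T_mult = 1 (fixed cycle length)."""
    t_cur = epoch % t_0  # position inside the current cycle
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0)) / 2


for epoch in (0, 10, 19, 20):
    print(epoch, f"{warm_restart_lr(epoch):.6f}")
```

With these values the rate starts each 20-epoch cycle at `lr`, passes the midpoint between `lr` and `eta_min` halfway through, and jumps back to `lr` at the restart.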


# Utils

In [4]:
def init_logger(log_file=OUTPUT_DIR + "train.log"):
    from logging import getLogger, INFO, FileHandler, Formatter, StreamHandler

    logger = getLogger(__name__)
    logger.setLevel(INFO)
    handler1 = StreamHandler()
    handler1.setFormatter(Formatter("%(message)s"))
    handler2 = FileHandler(filename=log_file)
    handler2.setFormatter(Formatter("%(message)s"))
    logger.addHandler(handler1)
    logger.addHandler(handler2)
    return logger


LOGGER = init_logger()


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return "%dm %ds" % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return "%s (remain %s)" % (asMinutes(s), asMinutes(rs))


def quantize_data(data, classes):
    mu_x = mu_law_encoding(data, classes)
    return mu_x  # quantized


def mu_law_encoding(data, mu):
    mu_x = np.sign(data) * np.log(1 + mu * np.abs(data)) / np.log(mu + 1)
    return mu_x


def mu_law_expansion(data, mu):
    s = np.sign(data) * (np.exp(np.abs(data) * np.log(mu + 1)) - 1) / mu
    return s


def butter_bandpass(lowcut, highcut, fs, order=5):
    return butter(order, [lowcut, highcut], fs=fs, btype="band")


def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = lfilter(b, a, data)
    return y


def butter_lowpass_filter(
    data, cutoff_freq=20, sampling_rate=CFG.sampling_rate, order=4
):
    nyquist = 0.5 * sampling_rate
    normal_cutoff = cutoff_freq / nyquist
    b, a = butter(order, normal_cutoff, btype="low", analog=False)
    filtered_data = lfilter(b, a, data, axis=0)
    return filtered_data


def denoise_filter(x):
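    # Band-pass the noisy signal using the configured cutoff frequencies (Hz)
    # and the sampling rate
    y = butter_bandpass_filter(x, CFG.bandpass_filter["low"], CFG.bandpass_filter["high"], CFG.sampling_rate, order=6)
    # 4-sample moving average, then keep every 4th sample
    y = (y + np.roll(y, -1) + np.roll(y, -2) + np.roll(y, -3)) / 4
    y = y[0:-1:4]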
    return y
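

The mu-law helpers above are inverses of each other on [-1, 1]. A small round-trip check, with the two functions copied so the snippet stands alone (the test vector and `mu = 255` are arbitrary choices):

```python
import numpy as np


def mu_law_encoding(data, mu):
    # Compress amplitudes logarithmically; inputs are expected in [-1, 1].
    return np.sign(data) * np.log(1 + mu * np.abs(data)) / np.log(mu + 1)


def mu_law_expansion(data, mu):
    # Inverse of mu_law_encoding.
    return np.sign(data) * (np.exp(np.abs(data) * np.log(mu + 1)) - 1) / mu


x = np.linspace(-1.0, 1.0, 9)
encoded = mu_law_encoding(x, mu=255)
decoded = mu_law_expansion(encoded, mu=255)
print(np.allclose(decoded, x))
```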

# Parquet to EEG Signals Numpy Processing

In [5]:
def eeg_from_parquet(
    parquet_path: str, display: bool = False, seq_length=CFG.seq_length
) -> np.ndarray:
    """
    Read a parquet file and extract the middle 50 seconds of readings,
    then fill NaN values with the NaN-ignoring mean of each channel.
    @Args:
     :param parquet_path: path to the parquet file.
     :param display: whether to plot the EEG.
    @Returns:
     :return  np.array (time_steps, eeg_features) -> (10_000, 8)
    """

    # === Extract the middle 50 seconds ===
    # Load
    eeg = pd.read_parquet(parquet_path, columns=CFG.eeg_features)
    rows = len(eeg)

    # Start offset of the centered window
    offset = (rows - CFG.nsamples) // 2

    # Middle 50 seconds, with the same number of readings dropped on each side
    eeg = eeg.iloc[offset : offset + CFG.nsamples]

    if display:
        plt.figure(figsize=(10, 5))
        offset = 0

    # === Convert to NumPy ===

    # Zero-initialized array of the same size (rows x columns)
    data = np.zeros((CFG.nsamples, len(CFG.eeg_features)))

    for index, feature in enumerate(CFG.eeg_features):
        x = eeg[feature].values.astype("float32")   # convert to float32

        # Mean of the channel, ignoring NaN
        mean = np.nanmean(x)
        nan_percentage = np.isnan(x).mean()  # fraction of NaN values in this channel

        # Fill NaN values
        if nan_percentage < 1:  # some, but not all, values are NaN
            x = np.nan_to_num(x, nan=mean)
        else:  # every value is NaN
            x[:] = 0
        data[:, index] = x

        if display:
            if index != 0:
                offset += x.max()
            plt.plot(range(CFG.nsamples), x - offset, label=feature)
            offset -= x.min()

    if display:
        plt.legend()
        name = parquet_path.split("/")[-1].split(".")[0]
        plt.yticks([])
        plt.title(f"EEG {name}", size=16)
        plt.show()

    return data
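

The crop-and-fill logic can be tried on a tiny synthetic frame without any parquet file. This is a sketch under assumptions: `center_crop_and_fill` is a hypothetical stand-in for the core of `eeg_from_parquet`, and the column names and values are made up.

```python
import numpy as np
import pandas as pd


def center_crop_and_fill(eeg: pd.DataFrame, n_samples: int) -> np.ndarray:
    """Take the middle n_samples rows and fill NaN per column."""
    offset = (len(eeg) - n_samples) // 2
    window = eeg.iloc[offset: offset + n_samples]
    out = np.zeros((n_samples, window.shape[1]), dtype="float32")
    for j, col in enumerate(window.columns):
        x = window[col].to_numpy(dtype="float32")
        if np.isnan(x).all():
            x = np.zeros_like(x)  # a fully missing channel becomes zeros
        else:
            x = np.nan_to_num(x, nan=np.nanmean(x))  # fill with the NaN-ignoring mean
        out[:, j] = x
    return out


values = np.arange(10, dtype="float64")
values[4] = np.nan
toy = pd.DataFrame({"a": values, "b": np.full(10, np.nan)})

cropped = center_crop_and_fill(toy, n_samples=4)  # keeps rows 3..6
print(cropped)
```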

# Dataset

In [6]:
class EEGDataset(Dataset):
    def __init__(
        self,
        df: pd.DataFrame,
        batch_size: int,
        eegs: Dict[int, np.ndarray],
        mode: str = "train",
        downsample: int = None,
        bandpass_filter: Dict[str, Union[int, float]] = None,
        rand_filter: Dict[str, Union[int, float]] = None,
    ):
        self.df = df
        self.batch_size = batch_size
        self.mode = mode
        self.eegs = eegs
        self.downsample = downsample
        self.offset = None
        self.bandpass_filter = bandpass_filter
        self.rand_filter = rand_filter
        
    def __len__(self):
        """
        Length of dataset.
        """
        # Number of samples per epoch
        return len(self.df)

    def __getitem__(self, index):
        """
        Get one item.
        """
        # Generate one sample
        X, y_prob = self.__data_generation(index)
        if self.downsample is not None:
            X = X[:: self.downsample, :]
        output = {
            "eeg": torch.tensor(X, dtype=torch.float32),
            "labels": torch.tensor(y_prob, dtype=torch.float32),
        }
        return output

    def set_offset(self, offset: int):
        self.offset = offset

    def __data_generation(self, index):
        """
        EEG preprocessing. Heavily modified from the original code.
        """
        # Allocate the output array for a single sample
        X = np.zeros(
            (CFG.out_samples, CFG.in_channels), dtype="float32"
        )  # Size=(out_samples, in_channels) -> (2000, 8) with this config

        random_divide_signal = False    # whether this sample uses the random-resampling augmentation
        row = self.df.iloc[index]  # row of the DataFrame at the given index
        data = self.eegs[row.eeg_id]  # raw EEG for this row, Size=(10000, 8)

        # Cut a window out of the full signal
        if CFG.nsamples != CFG.out_samples:
            if self.mode == "train":
                offset = (CFG.sample_delta * random.randint(0, 1000)) // 1000  # random window start during training
            elif self.offset is not None:   # explicit offset set via set_offset()
                offset = self.offset
            else:   # default: centered window
                offset = CFG.sample_offset

            # Training only: occasionally take a longer window that is resampled down to out_samples later
            if self.mode == "train" and CFG.random_divide_signal > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_divide_signal:
                random_divide_signal = True
                multipliers = [(1, 2), (2, 3), (3, 4), (3, 5)]  # (up, down) resampling ratios
                koef_1, koef_2 = multipliers[random.randint(0, 3)]  # pick one ratio at random
                offset = (koef_1 * offset) // koef_2  # scale the offset so the longer window still fits
                data = data[offset:offset+(CFG.out_samples * koef_2) // koef_1,:]  # window of length out_samples * koef_2 / koef_1
            else:
                data = data[offset:offset+CFG.out_samples,:]  # window of length out_samples

        reverse_signal = False  # flip every channel in time
        negative_signal = False  # negate every channel
        if self.mode == "train":
            # Randomly reverse all channels together
            if CFG.random_common_reverse_signal > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_common_reverse_signal:
                reverse_signal = True
            # Randomly negate all channels together
            if CFG.random_common_negative_signal > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_common_negative_signal:
                negative_signal = True

        # Loop over the configured electrode pairs
        for i, (feat_a, feat_b) in enumerate(CFG.map_features):
            if self.mode == "train" and CFG.random_close_zone > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_close_zone:  # training only: randomly drop this channel (left as zeros)
                continue

            diff_feat = (
                data[:, CFG.feature_to_index[feat_a]]
                - data[:, CFG.feature_to_index[feat_b]]
            )  # difference between the two electrodes, Size=(window length,)

            if self.mode == "train":
                if reverse_signal or (CFG.random_reverse_signal > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_reverse_signal):
                    diff_feat = np.flip(diff_feat)  # reverse in time
                if negative_signal or (CFG.random_negative_signal > 0.0 and random.uniform(0.0, 1.0) <= CFG.random_negative_signal):
                    diff_feat = -diff_feat  # negate

            if self.bandpass_filter is not None:  # band-pass filter configured
                diff_feat = butter_bandpass_filter(
                    diff_feat,
                    self.bandpass_filter["low"],
                    self.bandpass_filter["high"],
                    CFG.sampling_rate,
                    order=self.bandpass_filter["order"],
                )

            if random_divide_signal:  # a longer window was taken above
                #diff_feat = cp.asnumpy(cpsig.upfirdn([1.0, 1, 1.0], diff_feat, 2, 3))  # linear interp, rate 2/3
                diff_feat = scisig.upfirdn([1.0, 1, 1.0], diff_feat, koef_1, koef_2)  # resample at rate koef_1 / koef_2
                diff_feat = diff_feat[0:CFG.out_samples]  # truncate to the required length

            if (
                self.mode == "train"
                and self.rand_filter is not None
                and random.uniform(0.0, 1.0) <= self.rand_filter["probab"]
            ):  # training only: random band-pass augmentation
                lowcut = random.randint(
                    self.rand_filter["low"], self.rand_filter["high"]
                )  # random low cutoff frequency
                highcut = lowcut + self.rand_filter["band"]  # high cutoff = low cutoff + bandwidth
                diff_feat = butter_bandpass_filter(
                    diff_feat,
                    lowcut,
                    highcut,
                    CFG.sampling_rate,
                    order=self.rand_filter["order"],
                )

            X[:, i] = diff_feat  # store the preprocessed channel

        n = CFG.n_map_features  # index of the next free channel
        if len(CFG.freq_channels) > 0:  # extra frequency-band channels configured
            for i in range(CFG.n_map_features):  # loop over the existing channels
                diff_feat = X[:, i]
                for j, (lowcut, highcut) in enumerate(CFG.freq_channels):  # loop over the configured bands
                    band_feat = butter_bandpass_filter(
                        diff_feat, lowcut, highcut, CFG.sampling_rate, order=CFG.filter_order,  # 6
                    )
                    X[:, n] = band_feat  # store the band-limited channel
                    n += 1

        for spml_feat in CFG.simple_features:  # single-electrode channels, if configured
            feat_val = data[:, CFG.feature_to_index[spml_feat]]

            if self.bandpass_filter is not None:  # band-pass filter configured
                feat_val = butter_bandpass_filter(
                    feat_val,
                    self.bandpass_filter["low"],
                    self.bandpass_filter["high"],
                    CFG.sampling_rate,
                    order=self.bandpass_filter["order"],
                )

            if (
                self.mode == "train"
                and self.rand_filter is not None
                and random.uniform(0.0, 1.0) <= self.rand_filter["probab"]
            ):  # training only: random band-pass augmentation
                lowcut = random.randint(
                    self.rand_filter["low"], self.rand_filter["high"]
                )  # random low cutoff frequency
                highcut = lowcut + self.rand_filter["band"]
                feat_val = butter_bandpass_filter(
                    feat_val,
                    lowcut,
                    highcut,
                    CFG.sampling_rate,
                    order=self.rand_filter["order"],
                )

            X[:, n] = feat_val  # store the preprocessed channel
            n += 1

        # Clip values outside [-1024, 1024]
        X = np.clip(X, -1024, 1024)

        # Replace NaN with zero and divide everything by 32
        X = np.nan_to_num(X, nan=0) / 32.0

        # Low-pass filter with a 20 Hz cutoff
        X = butter_lowpass_filter(X, order=CFG.filter_order)  # 4

        y_prob = np.zeros(CFG.target_size, dtype="float32")  # target vector, Size=(6,)
        if self.mode != "test":  # labels are available outside test mode
            y_prob = row[CFG.target_cols].values.astype(np.float32)  # vote columns from the DataFrame

        return X, y_prob  # preprocessed input and target
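

The random-resampling augmentation in `__data_generation` cuts a window longer than `out_samples` and shrinks it with `scipy.signal.upfirdn`. The sketch below only checks the lengths involved for each `(up, down)` ratio, using white noise as a stand-in signal (an assumption; real EEG is not needed to see the sizes).

```python
import numpy as np
from scipy.signal import upfirdn

OUT_SAMPLES = 2000
RATIOS = [(1, 2), (2, 3), (3, 4), (3, 5)]  # (up, down), as in the dataset code

rng = np.random.default_rng(0)
lengths = {}
for up, down in RATIOS:
    # Longer window cut from the signal: out_samples * down / up samples.
    window = rng.standard_normal((OUT_SAMPLES * down) // up)
    # Same 3-tap FIR as in the dataset code, resampling at rate up / down.
    resampled = upfirdn([1.0, 1.0, 1.0], window, up, down)
    lengths[(up, down)] = (len(window), len(resampled))

for ratio, (n_in, n_out) in lengths.items():
    print(ratio, n_in, "->", n_out, "truncated to", OUT_SAMPLES)
```

Every resampled window comes out at least `OUT_SAMPLES` long, which is why the final slice `diff_feat[0:CFG.out_samples]` always fills the output column.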

# Helper functions

In [7]:
class KLDivLossWithLogits(nn.KLDivLoss):
    def __init__(self):
        super().__init__(reduction="batchmean")

    def forward(self, y, t):
        y = nn.functional.log_softmax(y, dim=1)
        loss = super().forward(y, t)
        return loss


class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def seed_torch(seed):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    # torch.backends.cudnn.benchmark = True  # this option needs a large amount of GPU memory
    # pl.seed_everything(seed)
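

For readers who want to see what `KLDivLossWithLogits` computes without running PyTorch, here is a NumPy sketch of the same quantity: the `batchmean` KL divergence between target vote distributions and `softmax(logits)`. The function name `kl_div_with_logits` and the toy targets are assumptions made for illustration.

```python
import numpy as np


def kl_div_with_logits(logits, targets):
    """batchmean KL(targets || softmax(logits)), with 0 * log(0) treated as 0."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    safe_targets = np.where(targets > 0, targets, 1.0)  # avoid log(0) warnings
    pointwise = np.where(
        targets > 0, targets * (np.log(safe_targets) - log_probs), 0.0
    )
    return pointwise.sum() / logits.shape[0]  # sum over classes, mean over batch


targets = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
])
uniform_logits = np.zeros((2, 6))  # predicts 1/6 for every class
loss = kl_div_with_logits(uniform_logits, targets)
print(loss)
```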

# Model

In [8]:
class ResNet_1D_Block(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride,
        padding,
        downsampling,
        dilation=1,
        groups=1,
        dropout=0.0,
    ):
        super(ResNet_1D_Block, self).__init__()

        self.bn1 = nn.BatchNorm1d(num_features=in_channels) # batch normalization
        # self.relu = nn.ReLU(inplace=False)                # ReLU (two layers)
        # self.relu_1 = nn.PReLU()
        # self.relu_2 = nn.PReLU()
        self.relu_1 = nn.Hardswish()
        self.relu_2 = nn.Hardswish()

        self.dropout = nn.Dropout(p=dropout, inplace=False) # dropout
        self.conv1 = nn.Conv1d(                             # convolution layer
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=False,
        )

        self.bn2 = nn.BatchNorm1d(num_features=out_channels) # batch normalization
        self.conv2 = nn.Conv1d(                              # convolution layer
            in_channels=out_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=False,
        )

        self.maxpool = nn.MaxPool1d(                        # max pooling
            kernel_size=2,
            stride=2,
            padding=0,
            dilation=dilation,
        )
        self.downsampling = downsampling                    # downsampling path for the skip connection

    def forward(self, x):                                   # forward pass
        identity = x

        out = self.bn1(x)
        out = self.relu_1(out)
        out = self.dropout(out)
        out = self.conv1(out)
        out = self.bn2(out)
        out = self.relu_2(out)
        out = self.dropout(out)
        out = self.conv2(out)

        out = self.maxpool(out)
        identity = self.downsampling(x)

        out += identity
        return out


class EEGNet(nn.Module):
    def __init__(
        self,
        kernels,
        in_channels,
        fixed_kernel_size,
        num_classes,
        linear_layer_features,
        dilation=1,
        groups=1,
    ):
        super(EEGNet, self).__init__()
        self.kernels = kernels
        self.planes = 24
        self.parallel_conv = nn.ModuleList()
        self.in_channels = in_channels

        for i, kernel_size in enumerate(list(self.kernels)):
            sep_conv = nn.Conv1d(
                in_channels=in_channels,
                out_channels=self.planes,
                kernel_size=(kernel_size),
                stride=1,
                padding=0,
                dilation=dilation,
                groups=groups,
                bias=False,
            )
            self.parallel_conv.append(sep_conv)

        self.bn1 = nn.BatchNorm1d(num_features=self.planes)
        # self.relu = nn.ReLU(inplace=False)
        # self.relu_1 = nn.ReLU()
        # self.relu_2 = nn.ReLU()
        self.relu_1 = nn.SiLU()
        self.relu_2 = nn.SiLU()

        self.conv1 = nn.Conv1d(
            in_channels=self.planes,
            out_channels=self.planes,
            kernel_size=fixed_kernel_size,
            stride=2,
            padding=2,
            dilation=dilation,
            groups=groups,
            bias=False,
        )

        self.block = self._make_resnet_layer(
            kernel_size=fixed_kernel_size,
            stride=1,
            dilation=dilation,
            groups=groups,
            padding=fixed_kernel_size // 2,
        )
        self.bn2 = nn.BatchNorm1d(num_features=self.planes)
        self.avgpool = nn.AvgPool1d(kernel_size=6, stride=6, padding=2)

        self.rnn = nn.GRU(
            input_size=self.in_channels,
            hidden_size=128,
            num_layers=1,
            bidirectional=True,
            # dropout=0.2,
        )

        self.fc = nn.Linear(in_features=linear_layer_features, out_features=num_classes)

    def _make_resnet_layer(
        self,
        kernel_size,
        stride,
        dilation=1,
        groups=1,
        blocks=9,
        padding=0,
        dropout=0.0,
    ):
        layers = []
        downsample = None
        base_width = self.planes

        for i in range(blocks):
            downsampling = nn.Sequential(
                nn.MaxPool1d(kernel_size=2, stride=2, padding=0)
            )
            layers.append(
                ResNet_1D_Block(
                    in_channels=self.planes,
                    out_channels=self.planes,
                    kernel_size=kernel_size,
                    stride=stride,
                    padding=padding,
                    downsampling=downsampling,
                    dilation=dilation,
                    groups=groups,
                    dropout=dropout,
                )
            )
        return nn.Sequential(*layers)

    def extract_features(self, x):
        x = x.permute(0, 2, 1)

        out_sep = []
        for i in range(len(self.kernels)):
            sep = self.parallel_conv[i](x)
            out_sep.append(sep)

        out = torch.cat(out_sep, dim=2)
        out = self.bn1(out)
        out = self.relu_1(out)
        out = self.conv1(out)

        out = self.block(out)
        out = self.bn2(out)
        out = self.relu_2(out)
        out = self.avgpool(out)

        out = out.reshape(out.shape[0], -1)
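        # NOTE: nn.GRU defaults to batch_first=False, so this (batch, time, channels)
        # tensor is read as (seq_len, batch, features) and the GRU runs along the batch axis.
        rnn_out, _ = self.rnn(x.permute(0, 2, 1))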
        new_rnn_h = rnn_out[:, -1, :]

        new_out = torch.cat([out, new_rnn_h], dim=1)
        return new_out

    def forward(self, x):
        new_out = self.extract_features(x)
        result = self.fc(new_out)
        return result
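

The `linear_layer_features` values listed in `CFG` (448, 352, 304, 280) follow from the layer shapes above. The sketch below redoes that arithmetic in plain Python, assuming the default architecture shown here (24 planes, 9 residual blocks, a bidirectional GRU with hidden size 128). The helper names `conv_out` and `linear_layer_features` are illustrative and are not part of the notebook.

```python
def conv_out(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling layer (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def linear_layer_features(signal_len, kernels=(3, 5, 7, 9, 11), planes=24,
                          fixed_kernel=5, blocks=9, gru_hidden=128):
    # Parallel convolutions (no padding) are concatenated along the time axis.
    length = sum(conv_out(signal_len, k) for k in kernels)
    # conv1: kernel 5, stride 2, padding 2.
    length = conv_out(length, fixed_kernel, stride=2, padding=2)
    # Each residual block keeps the length in its convolutions, then max-pools by 2.
    for _ in range(blocks):
        length = conv_out(length, 2, stride=2)
    # AvgPool1d(kernel_size=6, stride=6, padding=2).
    length = conv_out(length, 6, stride=6, padding=2)
    # Flattened CNN features plus the bidirectional GRU output (2 * hidden).
    return planes * length + 2 * gru_hidden


for n in (10_000, 5_000, 2_000, 1_000):
    print(n, linear_layer_features(n))
```

Changing `kernels`, the number of blocks, or `out_samples` changes the flattened size, so `linear_layer_features` has to be recomputed whenever one of them is edited.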

# Adan Optimizer

In [9]:
class Adan(Optimizer):
    """
    Implements a pytorch variant of Adan
    Adan was proposed in
    Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models[J]. arXiv preprint arXiv:2208.06677, 2022.
    https://arxiv.org/abs/2208.06677
    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining parameter groups.
        lr (float, optional): learning rate. (default: 1e-3)
        betas (Tuple[float, float, float], optional): coefficients used for computing
            running averages of gradient and its norm. (default: (0.98, 0.92, 0.99))
        eps (float, optional): term added to the denominator to improve
            numerical stability. (default: 1e-8)
        weight_decay (float, optional): decoupled weight decay (L2 penalty) (default: 0.2)
        max_grad_norm (float, optional): value used to clip
            global grad norm (default: 0.0 no clip)
        no_prox (bool): how to perform the decoupled weight decay (default: False)
    """

    def __init__(
        self,
        params,
        lr=1e-3,
        betas=(0.98, 0.92, 0.99),
        eps=1e-8,
        weight_decay=0.2,
        max_grad_norm=0.0,
        no_prox=False,
    ):
        if not 0.0 <= max_grad_norm:
            raise ValueError("Invalid Max grad norm: {}".format(max_grad_norm))
        if not 0.0 <= lr:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
        if not 0.0 <= betas[2] < 1.0:
            raise ValueError("Invalid beta parameter at index 2: {}".format(betas[2]))
        defaults = dict(
            lr=lr,
            betas=betas,
            eps=eps,
            weight_decay=weight_decay,
            max_grad_norm=max_grad_norm,
            no_prox=no_prox,
        )
        super(Adan, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(Adan, self).__setstate__(state)
        for group in self.param_groups:
            group.setdefault("no_prox", False)

    @torch.no_grad()
    def restart_opt(self):
        for group in self.param_groups:
            group["step"] = 0
            for p in group["params"]:
                if p.requires_grad:
                    state = self.state[p]
                    # State initialization

                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p)
                    # Exponential moving average of gradient difference
                    state["exp_avg_diff"] = torch.zeros_like(p)

    @torch.no_grad()
    def step(self):
        """
        Performs a single optimization step.
        """
        if self.defaults["max_grad_norm"] > 0:
            device = self.param_groups[0]["params"][0].device
            global_grad_norm = torch.zeros(1, device=device)

            max_grad_norm = torch.tensor(self.defaults["max_grad_norm"], device=device)
            for group in self.param_groups:

                for p in group["params"]:
                    if p.grad is not None:
                        grad = p.grad
                        global_grad_norm.add_(grad.pow(2).sum())

            global_grad_norm = torch.sqrt(global_grad_norm)

            clip_global_grad_norm = torch.clamp(
                max_grad_norm / (global_grad_norm + group["eps"]), max=1.0
            )
        else:
            clip_global_grad_norm = 1.0

        for group in self.param_groups:
            beta1, beta2, beta3 = group["betas"]
            # Assume the same step count across the group to keep things simple;
            # a per-parameter step could be supported by making it a tensor or passing a list into the kernel.
            if "step" in group:
                group["step"] += 1
            else:
                group["step"] = 1

            bias_correction1 = 1.0 - beta1 ** group["step"]
            bias_correction2 = 1.0 - beta2 ** group["step"]
            bias_correction3 = 1.0 - beta3 ** group["step"]

            for p in group["params"]:
                if p.grad is None:
                    continue

                state = self.state[p]
                if len(state) == 0:
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                    state["exp_avg_diff"] = torch.zeros_like(p)

                grad = p.grad.mul_(clip_global_grad_norm)
                if "pre_grad" not in state or group["step"] == 1:
                    state["pre_grad"] = grad

                copy_grad = grad.clone()

                exp_avg, exp_avg_sq, exp_avg_diff = (
                    state["exp_avg"],
                    state["exp_avg_sq"],
                    state["exp_avg_diff"],
                )
                diff = grad - state["pre_grad"]

                update = grad + beta2 * diff
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)  # m_t
                exp_avg_diff.mul_(beta2).add_(diff, alpha=1 - beta2)  # diff_t
                exp_avg_sq.mul_(beta3).addcmul_(update, update, value=1 - beta3)  # n_t

                denom = ((exp_avg_sq).sqrt() / math.sqrt(bias_correction3)).add_(
                    group["eps"]
                )
                update = (
                    (
                        exp_avg / bias_correction1
                        + beta2 * exp_avg_diff / bias_correction2
                    )
                ).div_(denom)

                if group["no_prox"]:
                    p.data.mul_(1 - group["lr"] * group["weight_decay"])
                    p.add_(update, alpha=-group["lr"])
                else:
                    p.add_(update, alpha=-group["lr"])
                    p.data.div_(1 + group["lr"] * group["weight_decay"])

                state["pre_grad"] = copy_grad
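
The update in `Adan.step` can be easier to follow on a single scalar parameter. The following is a minimal sketch that mirrors the arithmetic of the class above (three moving averages, bias correction, then either form of decoupled weight decay). The function name `adan_scalar_step` and the plain-dict `state` are illustrative assumptions, not part of the notebook, and gradient clipping is left out.

```python
import math


def adan_scalar_step(p, grad, state, lr=1e-3, betas=(0.98, 0.92, 0.99),
                     eps=1e-8, weight_decay=0.0, no_prox=False):
    """One Adan-style update for a single scalar parameter (illustrative only)."""
    beta1, beta2, beta3 = betas
    step = state.get("step", 0) + 1
    state["step"] = step

    # On the first step there is no previous gradient, so the difference is zero.
    prev_grad = grad if step == 1 else state["pre_grad"]
    diff = grad - prev_grad
    corrected = grad + beta2 * diff

    # Moving averages of the gradient, the gradient difference, and the squared term.
    state["exp_avg"] = beta1 * state.get("exp_avg", 0.0) + (1 - beta1) * grad
    state["exp_avg_diff"] = beta2 * state.get("exp_avg_diff", 0.0) + (1 - beta2) * diff
    state["exp_avg_sq"] = beta3 * state.get("exp_avg_sq", 0.0) + (1 - beta3) * corrected ** 2

    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    bias3 = 1.0 - beta3 ** step

    denom = math.sqrt(state["exp_avg_sq"]) / math.sqrt(bias3) + eps
    update = (state["exp_avg"] / bias1 + beta2 * state["exp_avg_diff"] / bias2) / denom

    if no_prox:
        # Decay the weight first, then apply the step.
        p = p * (1 - lr * weight_decay) - lr * update
    else:
        # Apply the step first, then shrink the weight.
        p = (p - lr * update) / (1 + lr * weight_decay)

    state["pre_grad"] = grad
    return p


state = {}
p_after = adan_scalar_step(1.0, 0.5, state, lr=0.1)
print(p_after)
```

On the very first step the normalised update is close to the sign of the gradient, so with `lr=0.1` and no weight decay the parameter moves by roughly 0.1.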

# Train func

In [10]:
def train_fn(
    stage, fold, train_loader, model, criterion, optimizer, epoch, scheduler, device
):
    model.train()

    scaler = torch.cuda.amp.GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    start = end = time.time()
    global_step = 0

    for step, batch in enumerate(train_loader):
        eegs = batch["eeg"].to(device)
        labels = batch["labels"].to(device)
        batch_size = labels.size(0)

        with torch.cuda.amp.autocast(enabled=CFG.apex):
            y_preds = model(eegs)
            loss = criterion(F.log_softmax(y_preds, dim=1), labels)

        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps

        losses.update(loss.item(), batch_size)

        scaler.scale(loss).backward()

        grad_norm = torch.nn.utils.clip_grad_norm_(
            model.parameters(), CFG.max_grad_norm
        )

        if (step + 1) % CFG.gradient_accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            global_step += 1
            # NOTE: parameter gradients are not zeroed here (there is no optimizer.zero_grad() call)
            if CFG.batch_scheduler:
                scheduler.step()
        end = time.time()

        if CFG.log_show and (
            step % CFG.log_step == 0 or step == (len(train_loader) - 1)
        ):
            # remain=timeSince(start, float(step + 1) / len(train_loader))
            LOGGER.info(
                f"Epoch {epoch+1} [{step}/{len(train_loader)}] Loss: {losses.val:.4f} Loss Avg:{losses.avg:.4f}"
            )
            # "Elapsed {remain:s} Grad: {grad_norm:.4f}  LR: {cheduler.get_lr()[0]:.8f}"

        if CFG.wandb:
            wandb.log(
                {
                    f"[fold{fold}] loss": losses.val,
                    f"[fold{fold}] lr": scheduler.get_lr()[0],
                }
            )
    return losses.avg
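
`train_fn` steps the optimizer only when `(step + 1) % CFG.gradient_accumulation_steps == 0`. A small sketch of which batch indices trigger a step; the helper name `optimizer_step_indices` is a made-up illustration, not notebook code.

```python
def optimizer_step_indices(num_batches, accumulation_steps):
    """Return the batch indices after which the optimizer would be stepped."""
    return [
        step
        for step in range(num_batches)
        if (step + 1) % accumulation_steps == 0
    ]


# 7 batches with accumulation over 3 batches: steps happen after batches 2 and 5.
print(optimizer_step_indices(7, 3))  # [2, 5]
```

When the number of batches is not a multiple of the accumulation factor, the trailing batches (index 6 here) do not trigger a step within that epoch.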

# Valid Func

In [11]:
def valid_fn(stage, epoch, valid_loader, model, criterion, device):
    losses = AverageMeter()
    model.eval()
    preds = []
    targets = []
    start = end = time.time()

    for step, batch in enumerate(valid_loader):
        eegs = batch["eeg"].to(device)
        labels = batch["labels"].to(device)
        batch_size = labels.size(0)

        with torch.no_grad():
            y_preds = model(eegs)
            loss = criterion(F.log_softmax(y_preds, dim=1), labels)

        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps

        losses.update(loss.item(), batch_size)
        preds.append(nn.Softmax(dim=1)(y_preds).to("cpu").numpy())
        targets.append(labels.to("cpu").numpy())
        end = time.time()

        if CFG.log_show and (
            step % CFG.log_step == 0 or step == (len(valid_loader) - 1)
        ):
            # remain=timeSince(start, float(step + 1) / len(valid_loader))
            LOGGER.info(
                f"Epoch {epoch+1} VALIDATION: [{step}/{len(valid_loader)}] Val Loss: {losses.val:.4f} Val Loss Avg: {losses.avg:.4f}"
            )
            # Elapsed {remain:s}

    predictions = np.concatenate(preds)
    targets = np.concatenate(targets)

    return losses.avg, predictions
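
Both `train_fn` and `valid_fn` pass `F.log_softmax(y_preds, dim=1)` to the criterion, which `train_loop` sets to `nn.KLDivLoss(reduction="batchmean")`. As a rough pure-Python sketch of what that quantity is: sum `target * (log(target) - log_prob)` over all classes and samples, then divide by the batch size. The helper names below are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def kl_div_batchmean(batch_logits, batch_targets):
    """KL(target || softmax(logits)) summed over the batch, divided by batch size."""
    total = 0.0
    for logits, target in zip(batch_logits, batch_targets):
        log_q = log_softmax(logits)
        for t, lq in zip(target, log_q):
            if t > 0:  # terms with a zero target contribute nothing
                total += t * (math.log(t) - lq)
    return total / len(batch_logits)


# A uniform prediction against a uniform target has zero divergence;
# against a one-hot target over two classes it costs log(2).
print(kl_div_batchmean([[0.0, 0.0]], [[0.5, 0.5]]))
print(kl_div_batchmean([[0.0, 0.0]], [[1.0, 0.0]]))
```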

# Build Optimizer

In [12]:
def build_optimizer(cfg, model, device, epochs, num_batches_per_epoch):
    lr = cfg.lr
    # lr = default_configs["lr"]
    if cfg.optimizer == "SAM":
        base_optimizer = (
            torch.optim.SGD
        )  # define an optimizer for the "sharpness-aware" update
        optimizer_model = SAM(
            model.parameters(),
            base_optimizer,
            lr=lr,
            momentum=0.9,
            weight_decay=cfg.weight_decay,
            adaptive=True,
        )
    elif cfg.optimizer == "Ranger21":
        optimizer_model = Ranger21(
            model.parameters(),
            lr=lr,
            weight_decay=cfg.weight_decay,
            num_epochs=epochs,
            num_batches_per_epoch=num_batches_per_epoch,
        )
    elif cfg.optimizer == "SGD":
        optimizer_model = torch.optim.SGD(
            model.parameters(), lr=lr, weight_decay=cfg.weight_decay, momentum=0.9
        )
    elif cfg.optimizer == "Adam":
        optimizer_model = Adam(model.parameters(), lr=lr, weight_decay=cfg.weight_decay)
    elif cfg.optimizer == "AdamW":
        optimizer_model = AdamW(
            model.parameters(), lr=lr, weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == "Lion":
        optimizer_model = Lion(model.parameters(), lr=lr, weight_decay=cfg.weight_decay)
    elif cfg.optimizer == "Adan":
        optimizer_model = Adan(model.parameters(), lr=lr, weight_decay=cfg.weight_decay)
    else:
        # Fail early instead of hitting an UnboundLocalError at the return below.
        raise ValueError(f"Unknown optimizer: {cfg.optimizer}")

    return optimizer_model

# Scheduler

In [13]:
def get_scheduler(optimizer, epochs, steps_per_epoch):
    if CFG.scheduler == "ReduceLROnPlateau":
        scheduler = ReduceLROnPlateau(optimizer, **CFG.reduce_params)
    elif CFG.scheduler == "CosineAnnealingLR":
        scheduler = CosineAnnealingLR(optimizer, **CFG.cosanneal_params)
    elif CFG.scheduler == "CosineAnnealingWarmRestarts":
        scheduler = CosineAnnealingWarmRestarts(optimizer, **CFG.cosanneal_res_params)
    elif CFG.scheduler == "OneCycleLR":
        scheduler = OneCycleLR(
            optimizer=optimizer,
            epochs=epochs,
            pct_start=0.0,
            steps_per_epoch=steps_per_epoch,
            max_lr=CFG.lr,
            div_factor=25,
            final_div_factor=4.0e-01,
        )
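    else:
        # Fail early instead of hitting an UnboundLocalError at the return below.
        raise ValueError(f"Unknown scheduler: {CFG.scheduler}")
    return scheduler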
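
The `OneCycleLR` branch uses `div_factor=25`, `final_div_factor=4.0e-01` and `pct_start=0.0`. As documented for PyTorch, the initial rate is `max_lr / div_factor` and the final rate is `initial_lr / final_div_factor`. With `pct_start=0.0` there is effectively no warm-up, so the schedule should mostly be a cosine decay from about `max_lr` down to that final rate. The sketch below only reproduces this arithmetic; the function names are assumptions and it is not a substitute for the real scheduler.

```python
import math


def one_cycle_bounds(max_lr, div_factor=25.0, final_div_factor=0.4):
    """Initial and final learning rates implied by the OneCycleLR parameters."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    return initial_lr, min_lr


def cosine_anneal(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


max_lr = 1e-3  # hypothetical value standing in for CFG.lr
initial_lr, min_lr = one_cycle_bounds(max_lr)
print(initial_lr, min_lr)
print([cosine_anneal(max_lr, min_lr, k / 4) for k in range(5)])
```

Because `final_div_factor` is below 1, the final rate ends up larger than the nominal initial rate: one tenth of `max_lr` rather than one twenty-fifth.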

# Train Loop

In [14]:
def train_loop(stage, epochs, folds, fold, directory, prev_dir, eggs):
    train_folds = folds[folds["fold"] != fold].reset_index(drop=True)
    valid_folds = folds[folds["fold"] == fold].reset_index(drop=True)
    valid_labels = valid_folds[CFG.target_cols].values

    train_dataset = EEGDataset(
        train_folds,
        batch_size=CFG.batch_size,
        mode="train",
        eegs=eggs,
        bandpass_filter=CFG.bandpass_filter,
        rand_filter=CFG.rand_filter,
    )
        
    valid_dataset = EEGDataset(
        valid_folds,
        batch_size=CFG.batch_size,
        mode="valid",
        eegs=eggs,
        bandpass_filter=CFG.bandpass_filter,
        #rand_filter=CFG.rand_filter,
    )

    train_loader = DataLoader(
        train_dataset,
        batch_size=CFG.batch_size,
        shuffle=True,
        num_workers=CFG.num_workers,
        pin_memory=True,
        drop_last=True,
    )

    valid_loader = DataLoader(
        valid_dataset,
        batch_size=CFG.batch_size * CFG.batch_koef_valid,
        shuffle=False,
        num_workers=CFG.num_workers,
        pin_memory=True,
        drop_last=False,
    )

    LOGGER.info(
        f"========== stage: {stage} fold: {fold} training {len(train_loader)} / {len(valid_loader)} =========="
    )

    model = EEGNet(
        kernels=CFG.kernels,
        in_channels=CFG.in_channels,
        fixed_kernel_size=CFG.fixed_kernel_size,
        num_classes=CFG.target_size,
        linear_layer_features=CFG.linear_layer_features,
    )

    # Two-stage training: later stages start from the previous stage's best weights
    if stage > 1:
        model_weight = f"{prev_dir}{CFG.model_name}_ver-{CFG.VERSION}_stage-{stage-1}_fold-{fold}_best.pth"
        checkpoint = torch.load(model_weight, map_location=device)
        model.load_state_dict(checkpoint["model"])

    model.to(device)

    # CPMP: wrap the model to use all GPUs
    if CFG.parallel:
        model = nn.DataParallel(model)

    optimizer = build_optimizer(
        CFG, model, device, epochs=epochs, num_batches_per_epoch=len(train_loader)
    )
    scheduler = get_scheduler(
        optimizer, epochs=epochs, steps_per_epoch=len(train_loader)
    )
    criterion = nn.KLDivLoss(reduction="batchmean")

    best_score = np.inf
    for epoch in range(epochs):
        start_time = time.time()

        # train
        avg_loss = train_fn(
            stage,
            fold,
            train_loader,
            model,
            criterion,
            optimizer,
            epoch,
            scheduler,
            device,
        )

        # eval
        valid_dataset.set_offset(CFG.sample_offset)
        avg_val_loss, predictions = valid_fn(
            stage,
            epoch,
            valid_loader,
            model,
            criterion,
            device,
        )
        
        avg_loss_line = ''
        if CFG.multi_validation:
            multi_avg_val_loss = np.zeros(CFG.n_split_samples)
            start = (2 * CFG.sample_delta) // CFG.n_split_samples
            finish = (3 * CFG.sample_delta) // CFG.n_split_samples
            delta = (finish - start) // 5
            for i in range(CFG.n_split_samples):
                valid_dataset.set_offset(start)
                multi_avg_val_loss[i], _ = valid_fn(
                    stage,
                    epoch,
                    valid_loader,
                    model,
                    criterion,
                    device,
                )
                avg_loss_line += f" {multi_avg_val_loss[i]:.4f}"
                start += delta
            avg_loss_line += f" mean={np.mean(multi_avg_val_loss):.4f}"

        elapsed = time.time() - start_time

        LOGGER.info(
            f"Epoch {epoch+1} Avg Train Loss: {avg_loss:.4f} Avg Valid Loss: {avg_val_loss:.4f} / {avg_loss_line}"
        )
        #   time: {elapsed:.0f}s
        if CFG.wandb:
            wandb.log(
                {
                    f"[fold{fold}] stage": stage,
                    f"[fold{fold}] epoch": epoch + 1,
                    f"[fold{fold}] avg_train_loss": avg_loss,
                    f"[fold{fold}] avg_val_loss": avg_val_loss,
                    #f"[fold{fold}] score": score,
                }
            )

        # nn.DataParallel keeps the original network in `.module`; without
        # CFG.parallel the model is not wrapped, so save it directly.
        model_to_save = model.module if CFG.parallel else model

        if CFG.save_all_models:
            torch.save(
                {"model": model_to_save.state_dict(), "predictions": predictions},
                f"{directory}{CFG.model_name}_ver-{CFG.VERSION}_stage-{stage}_fold-{fold}_epoch-{epoch}_val-{avg_val_loss:.4f}_train-{avg_loss:.4f}.pth",
            )

        if best_score > avg_val_loss:
            best_score = avg_val_loss
            LOGGER.info(f"Epoch {epoch+1} Save Best Valid Loss: {avg_val_loss:.4f}")
            # CPMP: save the original model. It is stored as the module attribute of the DP model.
            torch.save(
                {"model": model_to_save.state_dict(), "predictions": predictions},
                f"{directory}{CFG.model_name}_ver-{CFG.VERSION}_stage-{stage}_fold-{fold}_best.pth",
            )

    predictions = torch.load(
        f"{directory}{CFG.model_name}_ver-{CFG.VERSION}_stage-{stage}_fold-{fold}_best.pth",
        map_location=torch.device("cpu"),
    )["predictions"]

    # valid_folds[[f"pred_{c}" for c in CFG.target_cols]] = predictions
    valid_folds[CFG.pred_cols] = predictions
    valid_folds[CFG.target_cols] = valid_labels

    torch.cuda.empty_cache()
    gc.collect()

    return valid_folds, best_score
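
Checkpoints in `train_loop` store the weights of the underlying network rather than of the `nn.DataParallel` wrapper, which keeps the wrapped model in its `module` attribute. One wrapper-agnostic way to reach that network is sketched below. `TinyNet` and `FakeDataParallel` are hypothetical stand-ins so the snippet runs without PyTorch; they are not notebook classes.

```python
class TinyNet:
    """Hypothetical stand-in for a model exposing state_dict()."""

    def state_dict(self):
        return {"weight": [1.0, 2.0]}


class FakeDataParallel:
    """Hypothetical stand-in mimicking how nn.DataParallel exposes `.module`."""

    def __init__(self, module):
        self.module = module


def unwrap_model(model):
    """Return the wrapped network if there is one, else the model itself."""
    return getattr(model, "module", model)


net = TinyNet()
wrapped = FakeDataParallel(net)

# The same call works whether or not the model was wrapped.
print(unwrap_model(wrapped).state_dict())
print(unwrap_model(net).state_dict())
```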

# Load train data

In [15]:
train = pd.read_csv(CFG.file_train)
TARGETS = train.columns[-6:]
print("Train shape:", train.shape)
print("Targets", list(TARGETS))

train["total_evaluators"] = train[CFG.target_cols].sum(axis=1)

train_uniq = train.drop_duplicates(subset=["eeg_id"] + list(TARGETS))

print(f"There are {train.patient_id.nunique()} patients in the training data.")
print(f"There are {train.eeg_id.nunique()} EEG IDs in the training data.")
print(f"There are {train_uniq.shape[0]} unique eeg_id + votes in the training data.")

if CFG.visualize:
    train_uniq.eeg_id.value_counts().value_counts().plot(
        kind="bar",
        title=f"Distribution of Count of EEG w Unique Vote: "
        f"{train_uniq.shape[0]} examples",
    )

del train_uniq
_ = gc.collect()

Train shape: (106800, 15)
Targets ['seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote']
There are 1950 patients in the training data.
There are 17089 EEG IDs in the training data.
There are 20183 unique eeg_id + votes in the training data.


In [16]:
if CFG.visualize:
    plt.figure(figsize=(10, 6))
    plt.hist(train["total_evaluators"], bins=10, color="blue", edgecolor="black")
    plt.title("Histogram of Total Evaluators")
    plt.xlabel("Total Evaluators")
    plt.ylabel("Frequency")
    plt.grid(True)
    plt.show()

tst_eeg_df = pd.read_parquet(CFG.file_features_test)
tst_eeg_features = tst_eeg_df.columns
print(f"There are {len(tst_eeg_features)} raw eeg features")
print(list(tst_eeg_features))
del tst_eeg_df
_ = gc.collect()

There are 20 raw eeg features
['Fp1', 'F3', 'C3', 'P3', 'F7', 'T3', 'T5', 'O1', 'Fz', 'Cz', 'Pz', 'Fp2', 'F4', 'C4', 'P4', 'F8', 'T4', 'T6', 'O2', 'EKG']


# Split Data

In [17]:
# %%time
all_eeg_specs = np.load(CFG.file_eeg_specs, allow_pickle=True).item()

train = train[train["label_id"].isin(all_eeg_specs.keys())].copy()
print(train.shape[0])

# y_data = train[TARGETS].values + 0.166666667  # Regularization value
y_data = train[TARGETS].values
y_data = y_data / y_data.sum(axis=1, keepdims=True)
train[TARGETS] = y_data

train["target"] = train["expert_consensus"]

train[train['total_evaluators'] == CFG.test_total_eval].groupby(['expert_consensus','total_evaluators']).count()

20183


Empty DataFrame (index: expert_consensus, total_evaluators; columns: eeg_id, eeg_sub_id, eeg_label_offset_seconds, spectrogram_id, spectrogram_sub_id, spectrogram_label_offset_seconds, label_id, patient_id, seizure_vote, lpd_vote, gpd_vote, lrda_vote, grda_vote, other_vote, target)
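
The split cell turns raw vote counts into a probability distribution by dividing each row by its sum; the commented-out line shows the earlier variant that first added 0.166666667 to every count. A minimal sketch of both, with the hypothetical helper `normalize_votes`:

```python
def normalize_votes(votes, smoothing=0.0):
    """Turn raw vote counts into probabilities, optionally adding a constant first."""
    shifted = [v + smoothing for v in votes]
    total = sum(shifted)
    return [v / total for v in shifted]


unanimous = [3, 0, 0, 0, 0, 0]

# Current behaviour: plain normalisation keeps hard zeros.
print(normalize_votes(unanimous))

# Earlier behaviour: the added constant moves some mass onto every class.
print(normalize_votes(unanimous, smoothing=1 / 6))
```

With the constant, a unanimous three-vote row no longer maps to a one-hot target, which acts as a form of label smoothing that is stronger for rows with few votes.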


In [18]:
if CFG.test_total_eval > 0:
    train['key_id'] = range(train.shape[0])

    train_pop_olds = []
    for total_eval in CFG.total_evals_old:
        if type(total_eval) is list:
            pop_idx = (train["total_evaluators"] >= total_eval[0][0]) & (
                train["total_evaluators"] < total_eval[0][1]
            ) | (train["total_evaluators"] >= total_eval[1][0]) & (
                train["total_evaluators"] < total_eval[1][1]
            )
        else:
            pop_idx = (train["total_evaluators"] >= total_eval[0]) & (
                train["total_evaluators"] < total_eval[1]
            )

        train_pop = train[pop_idx].copy().reset_index()

        sgkf = GroupKFold(n_splits=CFG.n_fold)
        train_pop["fold"] = -1
        for fold_id, (_, val_idx) in enumerate(
            sgkf.split(train_pop, y=train_pop["target"], groups=train_pop["patient_id"])
        ):
            train_pop.loc[val_idx, "fold"] = fold_id

        train_pop_olds.append(train_pop)
        print(train_pop.shape[0])
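
The fold assignment uses `GroupKFold` with `patient_id` as the group, so every recording from one patient lands in the same fold; `GroupKFold` does not stratify on `y`. The sketch below is a simplified greedy balancing in that spirit (largest groups first, each into the currently smallest fold). It is an assumption-laden approximation for illustration, not scikit-learn's implementation.

```python
from collections import Counter


def assign_group_folds(groups, n_folds):
    """Assign a fold to each sample so that no group is split across folds."""
    counts = Counter(groups)
    fold_sizes = [0] * n_folds
    group_to_fold = {}
    # Largest groups first, each going to the fold with the fewest samples so far.
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        group_to_fold[group] = target
        fold_sizes[target] += size
    return [group_to_fold[g] for g in groups]


# Hypothetical patient ids, one entry per training row.
patient_ids = [101, 101, 101, 202, 202, 303, 404, 404, 505]
folds = assign_group_folds(patient_ids, n_folds=2)
print(list(zip(patient_ids, folds)))
```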

In [19]:
train_pops = []  # One training DataFrame per stage: [stage-1 df, stage-2 df, ...]
for eval_list in CFG.total_evaluators:
    result=[]
    train_pop = train  
    for eval_dict in eval_list:
        band = eval_dict['band']  # (min, max) range of the total vote count
        pop_idx = (train_pop["total_evaluators"] >= band[0])  # band[0] = min
        pop_idx &= (train_pop["total_evaluators"] <= band[1])  # band[1] = max
        for exclude in eval_dict['excl_evals']:
            pop_idx &= ~(train_pop['expert_consensus'] == exclude)
        result.append(train_pop[pop_idx])
    train_pop = pd.concat(result).copy().reset_index()

    sgkf = GroupKFold(n_splits=CFG.n_fold)
    train_pop["fold"] = -1
    for fold_id, (_, val_idx) in enumerate(
        sgkf.split(train_pop, y=train_pop["target"], groups=train_pop["patient_id"])
    ):
        train_pop.loc[val_idx, "fold"] = fold_id

    train_pops.append(train_pop)  # Collect the DataFrame for each stage
    print(train_pop.shape[0])

train_0 = train_pops[0]
train_0[train_0['total_evaluators'] == CFG.test_total_eval].groupby(['expert_consensus','total_evaluators']).count()

13833
6350


Empty DataFrame (index: expert_consensus, total_evaluators; columns: index, eeg_id, eeg_sub_id, eeg_label_offset_seconds, spectrogram_id, spectrogram_sub_id, spectrogram_label_offset_seconds, label_id, patient_id, seizure_vote, lpd_vote, gpd_vote, lrda_vote, grda_vote, other_vote, target, fold)
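
Each stage's population is built from a list of dicts with a `band` (inclusive min and max of `total_evaluators`) and `excl_evals` (consensus labels to drop inside that band). A pure-Python sketch of the same selection follows; the rows and the band values are made up for illustration and are not the actual `CFG.total_evaluators` setting.

```python
def select_rows(rows, eval_list):
    """Keep rows whose vote count falls in a band and whose consensus is not excluded."""
    selected = []
    for spec in eval_list:
        low, high = spec["band"]
        for row in rows:
            in_band = low <= row["total_evaluators"] <= high
            excluded = row["expert_consensus"] in spec["excl_evals"]
            if in_band and not excluded:
                selected.append(row)
    return selected


rows = [
    {"eeg_id": 1, "total_evaluators": 3, "expert_consensus": "Seizure"},
    {"eeg_id": 2, "total_evaluators": 4, "expert_consensus": "GPD"},
    {"eeg_id": 3, "total_evaluators": 12, "expert_consensus": "GPD"},
    {"eeg_id": 4, "total_evaluators": 28, "expert_consensus": "Other"},
]

# Hypothetical two-stage configuration: few-vote rows without GPD, then many-vote rows.
stage_configs = [
    [{"band": (1, 5), "excl_evals": ["GPD"]}],
    [{"band": (6, 28), "excl_evals": []}],
]

stage_rows = [select_rows(rows, cfg) for cfg in stage_configs]
for stage, subset in enumerate(stage_rows, start=1):
    print(stage, [r["eeg_id"] for r in subset])
```

As in the notebook's `pd.concat`, a row matching two bands of the same stage would appear twice, so the bands within a stage are best kept disjoint.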


In [20]:
# if CFG.test_total_eval > 0:
#     df_old = train_pop_olds[0].copy(deep=True).set_index(['key_id'], drop=True).drop(columns=['fold'])
#     df_new = train_pops[0].copy(deep=True).set_index(['key_id'], drop=True).drop(columns=['fold'])

#     #outer merge the two DataFrames, adding an indicator column called 'Exist'
#     diff_df = pd.merge(df_old, df_new, how='outer', indicator='Exist')

#     #find which rows don't exist in both DataFrames
#     diff_df = diff_df.loc[diff_df['Exist'] != 'both']
#     display(diff_df)

#     del df_old, df_new, diff_df, train_pop_olds
#     _ = gc.collect()

In [21]:
if CFG.visualize:
    print("Pop 1: train unique eeg_id + votes shape:", train_pops[0].shape)
    plt.figure(figsize=(10, 6))
    plt.hist(train["total_evaluators"], bins=10, color="blue", edgecolor="black")
    plt.title("Histogram of Total Evaluators")
    plt.xlabel("Total Evaluators")
    plt.ylabel("Frequency")
    plt.grid(True)
    plt.show()

del all_eeg_specs
_ = gc.collect()

# Deduplicate Train EEG Id

In [22]:
%%time
# Chris's code
# Convert the EEG Parquet files into a dict of NumPy arrays
# eeg_from_parquet applies the preprocessing to the raw EEG data

if CFG.create_eegs:
    all_eegs = {}
    visualize = 1 if CFG.visualize else 0
    eeg_ids = train.eeg_id.unique()

    for i, eeg_id in tqdm(enumerate(eeg_ids)):

        # Store each EEG in a Python dict of NumPy arrays
        eeg_path = CFG.path_train / f"{eeg_id}.parquet"

        # Crop the middle 50 seconds and fill the NaNs in that window
        data = eeg_from_parquet(eeg_path, display=i < visualize)
        all_eegs[eeg_id] = data

        if i == visualize:
            if CFG.create_eegs:
                print(
                    f"Processing {train['eeg_id'].nunique()} eeg parquets... ", end=""
                )
            else:
                print(f"Reading {len(eeg_ids)} eeg NumPys from disk.")
                break
    np.save("./eegs", all_eegs)

else:
    all_eegs = np.load(CFG.file_raw_eeg, allow_pickle=True).item()

if CFG.visualize:
    frequencies = [1, 2, 4, 8, 16][::-1]  # frequencies in Hz
    x = [all_eegs[eeg_ids[0]][:, 0]]  # select one EEG feature

    for frequency in frequencies:
        x.append(butter_lowpass_filter(x[0], cutoff_freq=frequency))

    plt.figure(figsize=(12, 8))
    plt.plot(range(CFG.nsamples), x[0], label="without filter")
    for k in range(1, len(x)):
        plt.plot(
            range(CFG.nsamples),
            x[k] - k * (x[0].max() - x[0].min()),
            label=f"with filter {frequencies[k-1]}Hz",
        )

    plt.legend()
    plt.yticks([])
    plt.title("Butter Low-Pass Filter Examples", size=18)
    plt.show()

CPU times: user 4.51 s, sys: 4.21 s, total: 8.72 s
Wall time: 1min 33s


In [23]:
if CFG.visualize:
    train_dataset = EEGDataset(
        train_pops[0], batch_size=CFG.batch_size, eegs=all_eegs, mode="train"
    )
    train_loader = DataLoader(
        train_dataset,
        batch_size=CFG.batch_size,
        shuffle=False,
        num_workers=CFG.num_workers,
        pin_memory=True,
        drop_last=True,
    )
    output = train_dataset[0]
    X, y = output["eeg"], output["labels"]
    print(f"X shape: {X.shape}, y shape: {y.shape}")

    iot = torch.randn(2, CFG.nsamples, CFG.in_channels)  # .cuda()
    model = EEGNet(
        kernels=CFG.kernels,
        in_channels=CFG.in_channels,
        fixed_kernel_size=CFG.fixed_kernel_size,
        num_classes=CFG.target_size,
        linear_layer_features=CFG.linear_layer_features,
    )
    output = model(iot)
    print(output.shape)

    for batch in train_loader:
        X = batch.pop("eeg")
        y = batch.pop("labels")
        for item in range(4):
            plt.figure(figsize=(20, 4))
            offset = 0
            for col in range(X.shape[-1]):
                if col != 0:
                    offset -= X[item, :, col].min()
                plt.plot(
                    range(CFG.nsamples),
                    X[item, :, col] + offset,
                    label=f"feature {col+1}",
                )
                offset += X[item, :, col].max()
            # Use the target of the plotted sample (`item`), not the last channel index
            tt = f"{y[item][0]:0.1f}"
            for t in y[item][1:]:
                tt += f", {t:0.1f}"
            plt.title(f"EEG_Id = {eeg_ids[item]}\nTarget = {tt}", size=14)
            plt.legend()
            plt.show()
        break

    del iot, model
    gc.collect()

### Label Refine

In [24]:
def inference_function(test_loader, model, device):
    model.eval()  # set model in evaluation mode
    softmax = nn.Softmax(dim=1)
    prediction_dict = {}
    preds = []
    with tqdm(test_loader, unit="test_batch", desc="Inference") as tqdm_test_loader:
        for step, batch in enumerate(tqdm_test_loader):
            X = batch.pop("eeg").to(device)  # send inputs to `device`
            batch_size = X.size(0)
            with torch.no_grad():
                y_preds = model(X)  # forward propagation pass
            y_preds = softmax(y_preds)
            preds.append(y_preds.to("cpu").numpy())  # save predictions

    prediction_dict["predictions"] = np.concatenate(
        preds
    )  # np.array() of shape (fold_size, target_cols)
    return prediction_dict

In [25]:
df_low_quality = train_pops[0]
df_high_quality = train_pops[1]

In [26]:
%%time

# Train on the high-quality data
# Treated as stage 1
if __name__ == "__main__" and CFG.train_by_stages:
    seed_torch(seed=CFG.seed)
    stage_temp = 0
    prev_dir = ""
    # oof_df_all = pd.DataFrame()
    # oof_stage1 = pd.DataFrame()
    # oof_stage2 = pd.DataFrame()

    pop_dir = f"{OUTPUT_DIR}pop_label_refine_weight_oof/"
    if not os.path.exists(pop_dir):
        os.makedirs(pop_dir)

#     if stage_temp not in CFG.train_stages:
#         prev_dir = pop_dir
#         continue

    oof_df0 = pd.DataFrame()
    scores = []
    for fold in CFG.train_folds:
        train_oof_df, score = train_loop(
            # stage=stage + 1,
            stage = stage_temp,
            epochs=CFG.epochs[stage_temp],
            fold=fold,
            # folds=train_pops[stage],
            folds = df_high_quality,
            directory=pop_dir,
            prev_dir=prev_dir,
            eggs=all_eegs,
        )

        oof_df0 = pd.concat([oof_df0, train_oof_df])
        scores.append(score)

        LOGGER.info(f"========== stage: {stage_temp+1} fold: {fold} result ==========")
        LOGGER.info(f"Score with best loss weights stage{stage_temp+1}: {score:.4f}")

    LOGGER.info(f"==================== CV ====================")
    LOGGER.info(f"Score with best loss weights: {np.mean(scores):.4f}")

    oof_df0.reset_index(drop=True, inplace=True)
    oof_df0.to_csv(
        f"{pop_dir}{CFG.model_name}_oof_df_ver-{CFG.VERSION}_label_refine.csv",
        index=False,
    )

    # prev_dir = pop_dir

    if CFG.wandb:
        wandb.finish()

Epoch 1 Avg Train Loss: 0.8219 Avg Valid Loss: 0.7163 / 
Epoch 1 Save Best Valid Loss: 0.7163
Epoch 2 Avg Train Loss: 0.6559 Avg Valid Loss: 0.6401 / 
Epoch 2 Save Best Valid Loss: 0.6401
Epoch 3 Avg Train Loss: 0.5884 Avg Valid Loss: 0.6193 / 
Epoch 3 Save Best Valid Loss: 0.6193
Epoch 4 Avg Train Loss: 0.5711 Avg Valid Loss: 0.6471 / 
Epoch 5 Avg Train Loss: 0.5479 Avg Valid Loss: 0.5973 / 
Epoch 5 Save Best Valid Loss: 0.5973
Epoch 6 Avg Train Loss: 0.5299 Avg Valid Loss: 0.5863 / 
Epoch 6 Save Best Valid Loss: 0.5863
Epoch 7 Avg Train Loss: 0.5244 Avg Valid Loss: 0.5678 / 
Epoch 7 Save Best Valid Loss: 0.5678
Epoch 8 Avg Train Loss: 0.5173 Avg Valid Loss: 0.5394 / 
Epoch 8 Save Best Valid Loss: 0.5394
Epoch 9 Avg Train Loss: 0.5120 Avg Valid Loss: 0.5350 / 
Epoch 9 Save Best Valid Loss: 0.5350
Epoch 10 Avg Train Loss: 0.4995 Avg Valid Loss: 0.5098 / 
Epoch 10 Save Best Valid Loss: 0.5098
Epoch 11 Avg Train Loss: 0.5043 Avg Valid Loss: 0.5322 / 
Epoch 12 Avg Train Loss: 0.4908 Avg V

CPU times: user 1h 47min 48s, sys: 6min 59s, total: 1h 54min 47s
Wall time: 1h 25min 46s


In [27]:
# Refine the labels of df_low_quality

koef_1 = 1.0
model_weights = [
    {
        'bandpass_filter':{'low':0.5, 'high':20, 'order':2}, 
        'file_data': 
        [
            #{'koef':koef_1, 'file_mask':"/kaggle/input/hms-resnet1d-gru-weights-v82/pop_1_weight_oof/*_best.pth"},
            # {'koef':koef_1, 'file_mask':"/kaggle/input/hms-resnet1d-gru-weights-v82/pop_2_weight_oof/*_best.pth"},
            {'koef':koef_1, 'file_mask':"./pop_label_refine_weight_oof/*_best.pth"},
#             {'koef':koef_1, 'file_mask':"../input/resnet1d-gru-adjusted-signal-size-ver1-train/pop_2_weight_oof/*_best.pth"},
        ]
    },
]


koef_sum = 0
koef_count = 1
predictions = []
files = []
    
for model_block in model_weights:
    test_dataset = EEGDataset(
        df=df_low_quality,
        batch_size=CFG.batch_size,
        mode="test",
        eegs=all_eegs,
        bandpass_filter=model_block['bandpass_filter']
    )

    if len(predictions) == 0:
        output = test_dataset[0]
        X = output["eeg"]
        print(f"X shape: {X.shape}")
                
    test_loader = DataLoader(
        test_dataset,
        batch_size=CFG.batch_size,
        shuffle=False,
        num_workers=CFG.num_workers,
        pin_memory=True,
        drop_last=False,
    )

    model = EEGNet(
        kernels=CFG.kernels,
        in_channels=CFG.in_channels,
        fixed_kernel_size=CFG.fixed_kernel_size,
        num_classes=CFG.target_size,
        linear_layer_features=CFG.linear_layer_features,
    )

    for file_line in model_block['file_data']:
        koef = file_line['koef']
        for weight_model_file in glob(file_line['file_mask']):
            files.append(weight_model_file)
            checkpoint = torch.load(weight_model_file, map_location=device)
            model.load_state_dict(checkpoint["model"])
            model.to(device)
            prediction_dict = inference_function(test_loader, model, device)
            predict = prediction_dict["predictions"]
            predict *= koef
            koef_sum += koef
            koef_count += 1
            predictions.append(predict)
            torch.cuda.empty_cache()
            gc.collect()

# predictions = np.array(predictions)
# koef_sum /= koef_count
# predictions /= koef_sum
# predictions = np.mean(predictions, axis=0)

X shape: torch.Size([2000, 8])


Inference:   0%|          | 0/217 [00:00<?, ?test_batch/s]


In [28]:
predictions = np.array(predictions)
predictions_mean = np.mean(predictions, axis = 0)
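
The inference cell scales each fold model's output by `koef` and collects it, and the next cell takes a plain mean over the models; with every `koef` equal to 1.0 that is the same as a weighted mean. A small sketch of the general weighted form follows. The function name and the toy numbers are assumptions for illustration.

```python
def weighted_mean_predictions(per_model_preds, weights=None):
    """Average per-model probability tables of shape (models, rows, classes)."""
    if weights is None:
        weights = [1.0] * len(per_model_preds)
    weight_sum = sum(weights)
    n_rows = len(per_model_preds[0])
    n_classes = len(per_model_preds[0][0])
    return [
        [
            sum(w * preds[r][c] for w, preds in zip(weights, per_model_preds)) / weight_sum
            for c in range(n_classes)
        ]
        for r in range(n_rows)
    ]


# Two hypothetical fold models, two samples, two classes.
preds_a = [[0.8, 0.2], [0.1, 0.9]]
preds_b = [[0.6, 0.4], [0.3, 0.7]]

mean_preds = weighted_mean_predictions([preds_a, preds_b])
weighted_preds = weighted_mean_predictions([preds_a, preds_b], weights=[3.0, 1.0])
print(mean_preds)
print(weighted_preds)
```

Dividing by the sum of the weights keeps every averaged row a valid probability distribution.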

In [29]:
# Sanity check: each row of the averaged predictions should sum to 1
df_low_temp = pd.DataFrame({"eeg_id": df_low_quality.eeg_id.values})
df_low_temp[CFG.target_cols] = predictions_mean
df_low_temp["sum"] = df_low_temp[CFG.target_cols].sum(axis=1)
df_low_temp.head()

Unnamed: 0,eeg_id,seizure_vote,lpd_vote,gpd_vote,lrda_vote,grda_vote,other_vote,sum
0,1628180742,0.121905,0.168896,0.023786,0.137949,0.0389,0.508564,1.0
1,387987538,0.026605,0.062485,0.004791,0.295362,0.233107,0.37765,1.0
2,2175806584,0.184089,0.032693,0.723641,0.000608,0.013627,0.045341,1.0
3,1626798710,0.230813,0.089289,0.440274,0.014314,0.106776,0.118534,1.0
4,2529955608,0.131982,0.04001,0.690643,0.003777,0.053652,0.079937,1.0


In [30]:
df_low_quality.loc[:, CFG.target_cols] /= 2
df_low_quality.loc[:, CFG.target_cols] += predictions_mean / 2
df_low_quality.head()

Unnamed: 0,index,eeg_id,eeg_sub_id,eeg_label_offset_seconds,spectrogram_id,spectrogram_sub_id,spectrogram_label_offset_seconds,label_id,patient_id,expert_consensus,seizure_vote,lpd_vote,gpd_vote,lrda_vote,grda_vote,other_vote,total_evaluators,target,fold
0,0,1628180742,0,0.0,353733,0,0.0,127492639,42516,Seizure,0.560952,0.084448,0.011893,0.068974,0.01945,0.254282,3,Seizure,0
1,22,387987538,0,0.0,1084844,0,0.0,4099147263,4264,LRDA,0.013303,0.031243,0.002395,0.647681,0.116554,0.188825,3,LRDA,3
2,28,2175806584,0,0.0,1219001,0,0.0,1963161945,23435,Seizure,0.592044,0.016347,0.36182,0.000304,0.006814,0.022671,3,Seizure,1
3,30,1626798710,0,0.0,1219001,2,74.0,3631726128,23435,Seizure,0.415406,0.044644,0.420137,0.007157,0.053388,0.059267,5,Seizure,1
4,32,2529955608,0,0.0,1219001,4,190.0,4265493714,23435,Seizure,0.365991,0.020005,0.545321,0.001888,0.026826,0.039969,5,Seizure,1
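Cell 30 replaces the low-quality labels with a 50/50 blend of the human votes and the averaged model predictions. The sketch below shows the same blend with an explicit mixing weight; `alpha`, the shortened column list, and the toy data are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical column list; the notebook uses CFG.target_cols.
target_cols = ["seizure_vote", "lpd_vote", "gpd_vote"]

labels = pd.DataFrame([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]], columns=target_cols)
model_probs = np.array([[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]])

alpha = 0.5  # weight kept on the human votes; 0.5 matches the halving in cell 30
blended = labels.copy()
blended[target_cols] = (
    alpha * labels[target_cols].to_numpy() + (1 - alpha) * model_probs
)

# A convex combination of two distributions is still a distribution.
row_sums = blended[target_cols].sum(axis=1)
print(blended)
print(row_sums)
```

Because both inputs sum to 1 per row, the blended targets also sum to 1 and no renormalization is needed.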


In [31]:
train_pops = []
train_pops.append(df_low_quality)
train_pops.append(df_high_quality)

# Train Stages

In [32]:
def get_score(preds, targets):
    oof = pd.DataFrame(preds.copy())
    oof["id"] = np.arange(len(oof))
    true = pd.DataFrame(targets.copy())
    true["id"] = np.arange(len(true))
    cv = kaggle_kl_div.score(solution=true, submission=oof, row_id_column_name="id")
    return cv


def get_result(result_df):
    # Sort ground truth and predictions by eeg_id so the rows line up.
    gt = result_df[["eeg_id"] + CFG.target_cols]
    gt = gt.sort_values(by="eeg_id").reset_index(drop=True)
    preds = result_df[["eeg_id"] + CFG.pred_cols].copy()
    preds.columns = ["eeg_id"] + CFG.target_cols
    preds = preds.sort_values(by="eeg_id").reset_index(drop=True)
    # get_score expects (preds, targets); KL divergence is not symmetric,
    # so the argument order matters.
    score_loss = get_score(preds[CFG.target_cols], gt[CFG.target_cols])
    LOGGER.info(f"Score with best loss weights: {score_loss}")
    return score_loss
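The score is a KL divergence between the vote distribution and the predicted distribution, so swapping the two arguments changes the result. The following is a rough NumPy sketch of a row-averaged KL divergence; it is not the implementation inside `kaggle_kl_div`, and the clipping constant `eps` is an assumption.

```python
import numpy as np

def mean_kl_divergence(y_true, y_pred, eps=1e-15):
    """Mean over rows of sum_k true_k * log(true_k / pred_k)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0 so the logarithm stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    # Classes with zero true probability contribute 0 by convention.
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    return float(np.mean(np.sum(y_true * np.log(ratio), axis=1)))

p = np.array([[0.5, 0.5]])
q = np.array([[0.9, 0.1]])
forward = mean_kl_divergence(p, q)  # p treated as ground truth
reverse = mean_kl_divergence(q, p)  # q treated as ground truth
print(forward, reverse)
```

The two directions give different values on this toy pair, which is why ground truth and predictions must be passed in a fixed order.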

## 2 stage

In [33]:
%%time

# Stage-major order: train every fold of a stage before moving to the next stage
if __name__ == "__main__" and CFG.train_by_stages:
    seed_torch(seed=CFG.seed)

    prev_dir = ""
    oof_df_all = pd.DataFrame()
    oof_stage1 = pd.DataFrame()
    oof_stage2 = pd.DataFrame()
    for stage in range(len(CFG.total_evaluators)):
        pop_dir = f"{OUTPUT_DIR}pop_{stage+1}_weight_oof/"
        if not os.path.exists(pop_dir):
            os.makedirs(pop_dir)

        if stage not in CFG.train_stages:
            prev_dir = pop_dir
            continue

        oof_df = pd.DataFrame()
        scores = []
        for fold in CFG.train_folds:
            train_oof_df, score = train_loop(
                stage=stage + 1,
                epochs=CFG.epochs[stage],
                fold=fold,
                folds=train_pops[stage],
                directory=pop_dir,
                prev_dir=prev_dir,
                eggs=all_eegs,
            )

            oof_df = pd.concat([oof_df, train_oof_df])
            scores.append(score)

            LOGGER.info(f"========== stage: {stage+1} fold: {fold} result ==========")
            LOGGER.info(f"Score with best loss weights stage{stage+1}: {score:.4f}")

        LOGGER.info(f"==================== CV ====================")
        LOGGER.info(f"Score with best loss weights: {np.mean(scores):.4f}")

        oof_df.reset_index(drop=True, inplace=True)
        oof_df.to_csv(
            f"{pop_dir}{CFG.model_name}_oof_df_ver-{CFG.VERSION}_stage-{stage+1}.csv",
            index=False,
        )
        oof_df_all = pd.concat([oof_df_all, oof_df], axis = 0)
        if stage == 0:
            oof_stage1 = oof_df
        else:
            oof_stage2 = oof_df

        prev_dir = pop_dir

    if CFG.wandb:
        wandb.finish()

Epoch 1 Avg Train Loss: 0.5643 Avg Valid Loss: 0.5976 / 
Epoch 1 Save Best Valid Loss: 0.5976
Epoch 2 Avg Train Loss: 0.4203 Avg Valid Loss: 0.4764 / 
Epoch 2 Save Best Valid Loss: 0.4764
Epoch 3 Avg Train Loss: 0.3888 Avg Valid Loss: 0.4480 / 
Epoch 3 Save Best Valid Loss: 0.4480
Epoch 4 Avg Train Loss: 0.3672 Avg Valid Loss: 0.4333 / 
Epoch 4 Save Best Valid Loss: 0.4333
Epoch 5 Avg Train Loss: 0.3541 Avg Valid Loss: 0.4424 / 
Epoch 6 Avg Train Loss: 0.3466 Avg Valid Loss: 0.5481 / 
Epoch 7 Avg Train Loss: 0.3394 Avg Valid Loss: 0.4676 / 
Epoch 8 Avg Train Loss: 0.3313 Avg Valid Loss: 0.4405 / 
Epoch 9 Avg Train Loss: 0.3262 Avg Valid Loss: 0.4359 / 
Epoch 10 Avg Train Loss: 0.3238 Avg Valid Loss: 0.4396 / 
Epoch 11 Avg Train Loss: 0.3161 Avg Valid Loss: 0.4231 / 
Epoch 11 Save Best Valid Loss: 0.4231
Epoch 12 Avg Train Loss: 0.3189 Avg Valid Loss: 0.4197 / 
Epoch 12 Save Best Valid Loss: 0.4197
Epoch 13 Avg Train Loss: 0.3082 Avg Valid Loss: 0.3829 / 
Epoch 13 Save Best Valid Loss: 

CPU times: user 7h 36min 14s, sys: 23min 53s, total: 8h 7s
Wall time: 5h 56min 11s


In [34]:
%%time

# Fold-major order: run every stage for a fold before moving to the next fold

if __name__ == "__main__" and CFG.train_by_folds:
    seed_torch(seed=CFG.seed)

    stages_scores = {i: [] for i in CFG.train_stages}
    stages_oof_df = {i: pd.DataFrame() for i in CFG.train_stages}
    oof_df_all = pd.DataFrame()
    for fold in CFG.train_folds:

        prev_dir = ""
        for stage in range(len(CFG.total_evaluators)):

            pop_dir = f"{OUTPUT_DIR}pop_{stage+1}_weight_oof/"
            if not os.path.exists(pop_dir):
                os.makedirs(pop_dir)

            if stage not in CFG.train_stages:
                prev_dir = pop_dir
                continue

            train_oof_df, score = train_loop(
                stage=stage + 1,
                epochs=CFG.epochs[stage],
                fold=fold,
                folds=train_pops[stage],
                directory=pop_dir,
                prev_dir=prev_dir,
                eggs=all_eegs,
            )

            stages_oof_df[stage] = pd.concat([stages_oof_df[stage], train_oof_df])
            stages_scores[stage].append(score)

            prev_dir = pop_dir

            LOGGER.info(f"========== fold: {fold} stage: {stage+1} result ==========")
            LOGGER.info(f"Score with best loss weights stage{stage+1}: {score:.4f}")

    for stage, scores in stages_scores.items():
        LOGGER.info(f"============ CV score with best loss weights ============")
        LOGGER.info(f"Stage {stage+1}: {np.mean(scores):.4f}")

    for stage, oof_df in stages_oof_df.items():
        pop_dir = f"{OUTPUT_DIR}pop_{stage+1}_weight_oof/"
        oof_df.reset_index(drop=True, inplace=True)
        oof_df.to_csv(
            f"{pop_dir}{CFG.model_name}_oof_df_ver-{CFG.VERSION}_stage-{stage+1}.csv",
            index=False,
        )
        oof_df_all = pd.concat([oof_df_all, oof_df], axis = 0)

    if CFG.wandb:
        wandb.finish()

CPU times: user 6 µs, sys: 0 ns, total: 6 µs
Wall time: 10.7 µs
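The two training cells above differ only in loop order: `CFG.train_by_stages` finishes all folds of stage 1 before stage 2, while `CFG.train_by_folds` runs stage 1 then stage 2 for each fold in turn. A small sketch of the resulting (stage, fold) sequences, using hypothetical stage and fold lists:

```python
from itertools import product

stages = [1, 2]    # hypothetical stage numbers
folds = [0, 1, 2]  # hypothetical fold indices

# Stage-major: outer loop over stages, inner loop over folds.
stage_major = [(s, f) for s, f in product(stages, folds)]

# Fold-major: outer loop over folds, inner loop over stages.
fold_major = [(s, f) for f, s in product(folds, stages)]

print(stage_major)
print(fold_major)
```

Both orders visit the same (stage, fold) pairs; only the sequence in which the models are trained changes.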


# Submission

In [35]:
# === Pre-process OOF ===
gt = oof_stage1[["eeg_id"] + CFG.target_cols]
gt = gt.sort_values(by="eeg_id").reset_index(drop=True)

preds = oof_stage1[["eeg_id"] + CFG.pred_cols].copy()
preds.columns = ["eeg_id"] + CFG.target_cols
preds = preds.sort_values(by="eeg_id").reset_index(drop=True)

y_trues = gt[CFG.target_cols]
y_preds = preds[CFG.target_cols]

oof = pd.DataFrame(y_preds.copy())
oof["id"] = np.arange(len(oof))

true = pd.DataFrame(y_trues.copy())
true["id"] = np.arange(len(true))

cv = kaggle_kl_div.score(solution=true, submission=oof, row_id_column_name="id")
print(f"stage1: CV Score with resnet1D_gru Raw EEG = {cv:.4f}")

stage1: CV Score with resnet1D_gru Raw EEG = 0.2650


In [36]:
# === Pre-process OOF ===
gt = oof_stage2[["eeg_id"] + CFG.target_cols]
gt = gt.sort_values(by="eeg_id").reset_index(drop=True)

preds = oof_stage2[["eeg_id"] + CFG.pred_cols].copy()
preds.columns = ["eeg_id"] + CFG.target_cols
preds = preds.sort_values(by="eeg_id").reset_index(drop=True)

y_trues = gt[CFG.target_cols]
y_preds = preds[CFG.target_cols]

oof = pd.DataFrame(y_preds.copy())
oof["id"] = np.arange(len(oof))

true = pd.DataFrame(y_trues.copy())
true["id"] = np.arange(len(true))

cv = kaggle_kl_div.score(solution=true, submission=oof, row_id_column_name="id")
print(f"stage2: CV Score with resnet1D_gru Raw EEG = {cv:.4f}")

stage2: CV Score with resnet1D_gru Raw EEG = 0.3869


In [37]:
# === Pre-process OOF ===
gt = oof_df_all[["eeg_id"] + CFG.target_cols]
gt = gt.sort_values(by="eeg_id").reset_index(drop=True)

preds = oof_df_all[["eeg_id"] + CFG.pred_cols].copy()
preds.columns = ["eeg_id"] + CFG.target_cols
preds = preds.sort_values(by="eeg_id").reset_index(drop=True)

y_trues = gt[CFG.target_cols]
y_preds = preds[CFG.target_cols]

oof = pd.DataFrame(y_preds.copy())
oof["id"] = np.arange(len(oof))

true = pd.DataFrame(y_trues.copy())
true["id"] = np.arange(len(true))

cv = kaggle_kl_div.score(solution=true, submission=oof, row_id_column_name="id")
print(f"CV Score with resnet1D_gru Raw EEG = {cv:.4f}")

CV Score with resnet1D_gru Raw EEG = 0.3033
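Cells 35 to 37 repeat the same OOF preparation for three data frames. One possible way to factor that out is sketched below; the helper name `split_oof` and the toy frame are hypothetical, and the returned frames are shaped the way `kaggle_kl_div.score` is called above (target columns plus an `id` column).

```python
import numpy as np
import pandas as pd

def split_oof(oof_df, target_cols, pred_cols, id_col="eeg_id"):
    """Return (true, pred) frames aligned by id_col, each with an 'id' column."""
    ordered = oof_df.sort_values(by=id_col, kind="stable").reset_index(drop=True)
    true = ordered[target_cols].copy()
    pred = ordered[pred_cols].copy()
    pred.columns = target_cols  # the metric expects matching column names
    true["id"] = np.arange(len(true))
    pred["id"] = np.arange(len(pred))
    return true, pred

# Toy frame with a single target column and its prediction column.
toy = pd.DataFrame(
    {"eeg_id": [3, 1, 2], "a": [0.3, 0.1, 0.2], "pred_a": [0.33, 0.11, 0.22]}
)
true, pred = split_oof(toy, target_cols=["a"], pred_cols=["pred_a"])
print(true)
print(pred)
```

The result could then be passed as `kaggle_kl_div.score(solution=true, submission=pred, row_id_column_name="id")` for each of `oof_stage1`, `oof_stage2`, and `oof_df_all`.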


In [38]:
print(oof_stage1.shape)
print(oof_stage2.shape)
print(oof_df_all.shape)

(13833, 25)
(6350, 25)
(20183, 25)


In [39]:
import pandas as pd

# Write the OOF data frames to CSV files
oof_stage1.to_csv('oof_stage1.csv', index=False)
oof_stage2.to_csv('oof_stage2.csv', index=False)
oof_df_all.to_csv('oof_df_all.csv', index=False)

In [40]:
import zipfile

# Create the zip archive
with zipfile.ZipFile('oof_stage1.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipf.write('oof_stage1.csv', 'oof_stage1.csv')

# Show a download link
from IPython.display import FileLink
FileLink('oof_stage1.zip')

In [41]:
import zipfile

# Create the zip archive
with zipfile.ZipFile('oof_stage2.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipf.write('oof_stage2.csv', 'oof_stage2.csv')

# Show a download link
from IPython.display import FileLink
FileLink('oof_stage2.zip')

In [42]:
import zipfile

# Create the zip archive
with zipfile.ZipFile('oof_df_all.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipf.write('oof_df_all.csv', 'oof_df_all.csv')

# Show a download link
from IPython.display import FileLink
FileLink('oof_df_all.zip')

In [43]:
import os
import zipfile

# Path of the folder to compress
folder_to_zip = 'pop_1_weight_oof'

# Name of the zip file
zip_filename = 'pop_1_weight_oof.zip'

# Create the zip archive
with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for root, _, files in os.walk(folder_to_zip):
        for file in files:
            zipf.write(os.path.join(root, file))



from IPython.display import FileLink

FileLink(zip_filename)

In [44]:
import os
import zipfile

# Path of the folder to compress
folder_to_zip = 'pop_2_weight_oof'

# Name of the zip file
zip_filename = 'pop_2_weight_oof.zip'

# Create the zip archive
with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for root, _, files in os.walk(folder_to_zip):
        for file in files:
            zipf.write(os.path.join(root, file))



from IPython.display import FileLink

FileLink(zip_filename)
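The last two cells are identical apart from the folder name. A reusable helper could look like the sketch below; `zip_folder` is a hypothetical name, and the demo runs on a temporary directory rather than the notebook's real `pop_*_weight_oof` folders.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_folder(folder, zip_path):
    """Zip every file under `folder`, keeping the folder name as the archive root."""
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the archive unpacks
                # into a single top-level directory.
                zipf.write(path, arcname=path.relative_to(folder.parent).as_posix())
    return Path(zip_path)

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "pop_demo"
    (demo / "sub").mkdir(parents=True)
    (demo / "a.txt").write_text("a")
    (demo / "sub" / "b.txt").write_text("b")
    archive = zip_folder(demo, Path(tmp) / "pop_demo.zip")
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

In the notebook this would be called once per folder, for example `zip_folder('pop_1_weight_oof', 'pop_1_weight_oof.zip')`, followed by `FileLink` on the returned path.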