# EfficientNetB2 Starter for Brain Comp
This notebook is forked from @CHRIS DEOTTE. 

The training dataset consists of 11,138 spectrogram files in Parquet format. These files originate from two sources: the Kaggle spectrograms dataset and the [brain-eeg-spectrograms dataset](https://www.kaggle.com/datasets/cdeotte/brain-eeg-spectrograms). Kaggle spectrograms are 10-minute visual representations of EEG signals, while the raw EEG spectrograms cover 50-second recordings of the same events.


# Change Logs
[Version 1] (CV 0.59, LB 0.43) Trained an EfficientNet model using both Kaggle spectrograms and additional EEG spectrograms.


**References**
- [EfficientNetB0 Starter - [LB 0.43]](https://www.kaggle.com/code/cdeotte/efficientnetb0-starter-lb-0-43)
- [CatBoost Starter - [LB 0.60]](https://www.kaggle.com/code/cdeotte/catboost-starter-lb-0-60?scriptVersionId=159895287)

# Install `EfficientNet` library
Traditionally, scaling up convolutional neural networks (CNNs) requires tedious manual tuning because the results are hard to predict: some models scale with depth, some with width, and some with image resolution.

EfficientNet uses a technique called `compound scaling` to improve CNNs. Instead of scaling width, depth, or resolution individually and arbitrarily, it uniformly scales all three dimensions together using fixed coefficients. This removes most of the manual tuning needed for efficient and effective scaling.
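
As a rough illustration of compound scaling, the sketch below computes depth, width, and resolution multipliers from a single compound coefficient `phi`. The base coefficients (1.2, 1.1, 1.15) are the ones reported in the EfficientNet paper. The helper name `compound_scale` and the base resolution of 224 are assumptions made for this example. The released B1 to B7 models use rounded, hand-adjusted values, so the printed numbers are illustrative only.

```python
# Compound scaling sketch. ALPHA, BETA, GAMMA are the base coefficients
# reported in the EfficientNet paper; phi is the user-chosen compound coefficient.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution

def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth, width, resolution

# FLOPs grow roughly by (ALPHA * BETA**2 * GAMMA**2) ** phi, which is close to 2 ** phi
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(3):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}")
```

Because all three multipliers are driven by one number, increasing `phi` by 1 roughly doubles the compute budget instead of requiring three separate tuning decisions.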


 

In [None]:
!pip install --no-index --find-links=/kaggle/input/tf-efficientnet-whl-files /kaggle/input/tf-efficientnet-whl-files/efficientnet-1.1.1-py3-none-any.whl

# Imports

In [None]:
import os, gc, sys, time
import tensorflow as tf
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
print('TensorFlow version =',tf.__version__)

# Config
Specify the computing devices and enable mixed precision if possible.

In [None]:
class CFG:
    batch_size = 32
    shuffle = False

In [None]:
# Enable Distribution using multiple GPUs 
gpus = tf.config.list_physical_devices('GPU')
if len(gpus) ==0:
    strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
    print("Using CPUs")
elif len(gpus)==1:
    strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
    print(f'Using {len(gpus)} GPU')
else:
    os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
    strategy = tf.distribute.MirroredStrategy()
    print(f'Using {len(gpus)} GPUs')

# USE MIXED PRECISION
ENABLED = False
if ENABLED:
    tf.config.optimizer.set_experimental_options({"auto_mixed_precision": True})
    print('Mixed precision enabled')
else:
    print('Using full precision')

In [None]:
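import random, torch
# Apply the same seed to every random number generator in use
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    tf.random.set_seed(seed)  # TensorFlow is the training framework here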

SEED = 42
seed_everything(SEED)

In [None]:
import ctypes, gc
libc = ctypes.CDLL("libc.so.6")
def clear_memory():
    libc.malloc_trim(0)
    torch.cuda.empty_cache()
    gc.collect()

# Create Train Data
Load the training features and the training spectrogram files.

## Load Train Features
Create per-sample features such as `spectrogram_id`, `eeg_id`, and `spectrogram_label_offset_seconds`.

Following the competition's data description, the test data contains no overlapping samples with the same `eeg_id` and `spectrogram_id`. Therefore, we generate training data by grouping samples on `eeg_id`, summing the votes for each unique `eeg_id`, and normalizing them so that the six targets add up to 1.

[Relevant Discussion](https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification/discussion/467021)
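
To make the grouping and vote normalization concrete, here is a minimal sketch on made-up data. The toy frame and its values are invented for illustration. Only the column names follow `train.csv`.

```python
import pandas as pd

# Toy stand-in for train.csv: two EEGs with several labelled crops (values are made up)
VOTE_COLS = ['seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote']
toy = pd.DataFrame({
    'eeg_id':       [1, 1, 2],
    'seizure_vote': [3, 1, 0],
    'lpd_vote':     [0, 0, 2],
    'gpd_vote':     [0, 0, 0],
    'lrda_vote':    [0, 0, 0],
    'grda_vote':    [0, 0, 2],
    'other_vote':   [1, 3, 0],
})

# Sum the votes of all crops that share an eeg_id ...
votes = toy.groupby('eeg_id')[VOTE_COLS].sum()
# ... then divide each row by its total so the six targets form a probability distribution
targets = votes.div(votes.sum(axis=1), axis=0)
print(targets)
```

Each row of `targets` sums to 1, which is the form the KL-divergence loss used later expects.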

In [None]:
def create_train_features(df):
    # spectrogram_label_offset_seconds - the time between the beginning of the consolidated spectrogram and this sample
    train_df = df.groupby('eeg_id').agg({'spectrogram_id':'first',
                       'spectrogram_label_offset_seconds':'min'})
    # spec_id: `spectrogram_id`
    # min: smallest offset within the group, e.g. 0 for spectrogram_id `353733`
    train_df.columns = ['spec_id', 'min']
    # max: largest offset within the group, e.g. 40 for spectrogram_id `353733`
    train_df['max'] = df.groupby('eeg_id').agg({'spectrogram_label_offset_seconds':'max'})
    
    # patient_id: patient ID, e.g. 42516 for spectrogram_id `353733`
    train_df['patient_id'] = df.groupby('eeg_id').agg({'patient_id':'first'})
    
    # Generate vote targets ('seizure_vote', 'lpd_vote', 'gpd_vote', 'lrda_vote', 'grda_vote', 'other_vote')
    # Each value is the total number of votes for that target column
    target_df = df.groupby('eeg_id')[TARGETS].sum()
    # Generate Y values
    target_data = np.asarray(target_df[TARGETS].values)
    print(f"target_data.shape = {target_data.shape}")
    y_data = target_data / np.sum(target_data, axis=1, keepdims=True)
    print(f"y_data.shape = {y_data.shape}") # Get 6 columns
    for i, t  in enumerate(TARGETS):
        train_df[t] = y_data[:, i]

    # target: expert consensus 
    train_df['target'] = df.groupby('eeg_id').agg({'expert_consensus':'first'})
    train_df = train_df.reset_index()
    print('Train data shape:', train_df.shape)
    return train_df

In [None]:
df = pd.read_csv('/kaggle/input/hms-harmful-brain-activity-classification/train.csv')
TARGETS = df.columns[-6:]
print('Train shape:', df.shape )
print(f'Target columns: {list(TARGETS)}')
df.head(10)

In [None]:
train_df = create_train_features(df)
train_df.head()
train_df.to_csv("train.csv", encoding="utf-8")
clear_memory()

## Load Kaggle Training Spectrograms 

The Kaggle training dataset comprises about 11,000 individual spectrogram files. To speed up loading, you can read a single pre-built file (`specs.npy`) that contains all of these spectrograms; see the commented-out line in the cell below. As written, the cell reads the Parquet files one by one and saves the result to `specs.npy`.

In [None]:
start = time.time()
# Option 1: load all the spectrograms from a single file (about 1 minute)
# specs = np.load('/kaggle/input/brain-eeg-spectrograms-data/specs.npy', allow_pickle=True).item()
# Option 2: load the spectrogram files one by one (about 10 minutes)
PATH = '/kaggle/input/hms-harmful-brain-activity-classification/train_spectrograms/'
files = os.listdir(PATH)
print(f'There are {len(files)} spectrogram parquets')
specs = {}
for file in tqdm(files):
    tmp = pd.read_parquet(f'{PATH}{file}')
    name = int(file.split('.')[0]) # Get file name (spectrogram_id)
    specs[name] = tmp.iloc[:,1:].values
# Save to a file
with open('specs.npy', 'wb') as f:
    np.save(f, specs) 
clear_memory()
print(f"Loading specs files in {time.time() - start: .2f} seconds")

## Load Additional EEG Spectrograms
The additional EEG spectrograms come from @CHRIS DEOTTE's [dataset](https://www.kaggle.com/datasets/cdeotte/brain-eeg-spectrograms) 

In [None]:
# DEBUG = True # True: don't include additional EEG data for debugging purpose
# if not DEBUG:
start = time.time()
# eeg_specs = np.load('/kaggle/input/brain-eeg-spectrograms/eeg_specs.npy',allow_pickle=True).item()    

# Get all eeg_ids from training data
eeg_ids = train_df['eeg_id'].tolist()
eeg_specs= {}
for eeg_id in tqdm(eeg_ids):
    eeg_specs[eeg_id] = np.load(f'/kaggle/input/brain-eeg-spectrograms/EEG_Spectrograms/{eeg_id}.npy')

# Save to a file
with open('eeg_specs.npy', 'wb') as f:
    np.save(f, eeg_specs) 
clear_memory()

In [None]:
sys.exit(0)

# Create Train DataLoader
This dataloader outputs 8 spectrogram images per training sample (4 Kaggle spectrograms and 4 EEG spectrograms) as an 8-channel image of size 128x256x8.
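
The per-channel preprocessing for the Kaggle spectrograms can be sketched in isolation as follows. This mirrors the steps in the loader below (clip, log, standardize, crop and pad). The function name `preprocess_channel` and the synthetic input are assumptions made for this example.

```python
import numpy as np

def preprocess_channel(spec_crop):
    """Turn one 100 x 300 (frequency x time) spectrogram crop into a 128 x 256 image."""
    img = np.clip(spec_crop, np.exp(-4), np.exp(8))   # bound values before the log
    img = np.log(img)                                 # values now lie in [-4, 8]
    img = (img - np.nanmean(img)) / (np.nanstd(img) + 1e-6)  # zero mean, unit std
    img = np.nan_to_num(img, nan=0.0)
    out = np.zeros((128, 256), dtype='float32')
    out[14:-14, :] = img[:, 22:-22] / 2.0             # centre 100 rows, keep middle 256 columns
    return out

rng = np.random.default_rng(0)
crop = rng.lognormal(size=(100, 300))   # synthetic data, stands in for a real spectrogram
image = preprocess_channel(crop)
print(image.shape)
```

The 14 zero rows at the top and bottom pad the 100 frequency bins to 128, and dropping 22 columns on each side trims the 300 time steps to 256.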

<!-- This notebook version does not use data augmentation, but the code is available below to experiment with albumentations data augmentation. Just pass `augmentation=True` when creating the train data loader, and consider adding new transformations to the augment function below. -->



In [None]:
# Albumentations is a computer vision tool (https://albumentations.ai/) that performs image augmentations
import albumentations as albu

class DataLoader(tf.keras.utils.Sequence):
    'Load the data'
    def __init__(self, data, specs, eeg_specs,
                 batch_size=32, shuffle=False,
                 augmentation=False, mode='train'): 
        self.data = data
        self.specs = specs
        self.eeg_specs = eeg_specs
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.augmentation = augmentation
        self.mode = mode
        self.__update_indexes()
        
    def __len__(self):
        'Denotes the number of batches per epoch'
        ct = int( np.ceil( len(self.data) / self.batch_size ) )
        return ct

    def __getitem__(self, index):
        'Get one batch of data'
        indexes = self.indexes[index*self.batch_size: (index+1)*self.batch_size]
        X, y = self.__data_generation(indexes)
        if self.augmentation: # Data augmentation
            X = self.__augment_batch(X) 
        return X, y

    def __update_indexes(self):
        'Updates indexes and shuffle indexes if enabled'
        self.indexes = np.arange( len(self.data) )
        if self.shuffle: 
            np.random.shuffle(self.indexes)
                        
    def __data_generation(self, indexes):
        'Generates data containing batch_size samples' 
        # X is a collection of 128x256x8 images
        X = np.zeros((len(indexes), 128, 256, 8), dtype='float32')
        # Y is 6 targets 
        y = np.zeros((len(indexes), 6), dtype='float32')
        
        img = np.ones((128,256),dtype='float32')
        for i, index in tqdm(enumerate(indexes)):
            row = self.data.iloc[index]
            if self.mode=='test': 
                r = 0
            else: 
                r = int( (row['min'] + row['max'])//4 )

            for k in range(4): # 4 channels
                # EXTRACT 300 ROWS OF SPECTROGRAM
                img = self.specs[row.spec_id][r:r+300, k*100:(k+1)*100].T
                
                # LOG TRANSFORM SPECTROGRAM
                img = np.clip(img, np.exp(-4), np.exp(8)) # clip to the range e^-4 to e^8
                img = np.log(img) # natural log; values now lie in [-4, 8]
                
                # STANDARDIZE PER IMAGE
                ep = 1e-6
                m = np.nanmean(img.flatten())
                s = np.nanstd(img.flatten())
                img = (img-m)/(s+ep)
                img = np.nan_to_num(img, nan=0.0)
                
                # CROP TO 256 TIME STEPS
                X[i, 14:-14,:, k] = img[:,22:-22] / 2.0
        
            # EEG SPECTROGRAMS
            img = self.eeg_specs[row.eeg_id]
            X[i, :, :, 4:] = img
            # Add targets to y
            if self.mode != 'test':
                y[i,] = row[TARGETS]
        print(f"X.shape = {X.shape}")    
        return X, y
    
    # Run data augmentation function on training images       
    def __augment_batch(self, img_batch):        
        composition = albu.Compose([
                    albu.HorizontalFlip(p=0.5), 
                    #albu.CoarseDropout(max_holes=8,max_height=32,max_width=32,fill_value=0,p=0.5),
                ])
        for i in range(img_batch.shape[0]):
            img = img_batch[i, ]
            img_batch[i, ] = composition(image=img)['image']
        return img_batch

In [None]:
data_loader = DataLoader(train_df, specs, eeg_specs=eeg_specs, 
                         batch_size=CFG.batch_size, shuffle=CFG.shuffle, 
                         augmentation=False)

## Display DataLoader
Display some example images from data loader

In [None]:
TARS = {'Seizure':0, 'LPD':1, 'GPD':2, 'LRDA':3, 'GRDA':4, 'Other':5}
TARS2 = {x:y for y,x in TARS.items()}

def display_dataLoader(data_loader, train_df):
    ROWS=2
    COLS=3
    BATCHES=2
    for i, (x,y) in enumerate(data_loader):
        plt.figure(figsize=(20,8))
        for j in range(ROWS):
            for k in range(COLS):
                plt.subplot(ROWS,COLS,j*COLS+k+1)
                t = y[j*COLS+k]
                img = x[j*COLS+k,:,:,0][::-1,]
                mn = img.flatten().min()
                mx = img.flatten().max()
                img = (img-mn)/(mx-mn)
                plt.imshow(img)
                tars = f'[{t[0]:0.2f}'
                for s in t[1:]: tars += f', {s:0.2f}'
                tars += ']'
                eeg = train_df.eeg_id.values[i*data_loader.batch_size+j*COLS+k]
                plt.title(f'EEG = {eeg}\nTarget = {tars}',size=12)
                plt.yticks([])
                plt.ylabel('Frequencies (Hz)',size=14)
                plt.xlabel('Time (sec)',size=16)
        plt.show()
        if i==BATCHES-1: 
            break

In [None]:
display_dataLoader(data_loader, train_df)

In [None]:
sys.exit(0)

# Train Scheduler
We train the model with a step schedule for 4 epochs. The first 2 epochs use LR=1e-3; epochs 3 and 4 use LR=1e-4 and 1e-5 respectively. (A cosine schedule is also provided below if you want to experiment with it. Note that it is not used in this notebook.)

In [None]:
import math
LR_START = 1e-6
LR_MAX = 1e-3
LR_MIN = 1e-6
LR_RAMPUP_EPOCHS = 0
LR_SUSTAIN_EPOCHS = 0
EPOCHS2 = 10

def lrfn(epoch):
    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        decay_total_epochs = EPOCHS2 - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS - 1
        decay_epoch_index = epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS
        phase = math.pi * decay_epoch_index / decay_total_epochs
        cosine_decay = 0.5 * (1 + math.cos(phase))
        lr = (LR_MAX - LR_MIN) * cosine_decay + LR_MIN
    return lr

rng = [i for i in range(EPOCHS2)]
lr_y = [lrfn(x) for x in rng]
plt.figure(figsize=(10, 4))
plt.plot(rng, lr_y, '-o')
plt.xlabel('epoch',size=14); plt.ylabel('learning rate',size=14)
plt.title('Cosine Training Schedule',size=16); plt.show()

LR2 = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose = True)

In [None]:
LR_START = 1e-4
LR_MAX = 1e-3
LR_RAMPUP_EPOCHS = 0
LR_SUSTAIN_EPOCHS = 1
LR_STEP_DECAY = 0.1
EVERY = 1
EPOCHS = 4

def lrfn(epoch):
    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = LR_MAX * LR_STEP_DECAY**((epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS)//EVERY)
    return lr

rng = [i for i in range(EPOCHS)]
y = [lrfn(x) for x in rng]
plt.figure(figsize=(10, 4))
plt.plot(rng, y, 'o-'); 
plt.xlabel('epoch',size=14); plt.ylabel('learning rate',size=14)
plt.title('Step Training Schedule',size=16); plt.show()

LR = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose = True)

## Build EfficientNet Model
Versions 1-3 use EfficientNet B2. Version 4 uses EfficientNet B0. Our model receives both Kaggle spectrograms and EEG spectrograms from the data loader. We then reshape these 8 spectrograms into 1 large flat image and feed it into EfficientNet.
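
The reshape from 8 channels to one flat image can be checked with plain NumPy. This sketch uses a random dummy batch and mirrors the concatenation axes used in `build_model` below. It is an illustration of the layout, not part of the model itself.

```python
import numpy as np

batch = np.random.rand(2, 128, 256, 8).astype('float32')  # dummy batch from the data loader

# Stack the 4 Kaggle channels (0-3) vertically: 4 x 128 rows -> 512 x 256
kaggle = np.concatenate([batch[:, :, :, i:i+1] for i in range(4)], axis=1)
# Stack the 4 EEG channels (4-7) the same way
eeg = np.concatenate([batch[:, :, :, i+4:i+5] for i in range(4)], axis=1)
# Place the two stacks side by side: 512 x 512 x 1
flat = np.concatenate([kaggle, eeg], axis=2)
# Repeat the single channel three times so it matches a 3-channel image input
rgb = np.concatenate([flat, flat, flat], axis=3)
print(rgb.shape)  # (2, 512, 512, 3)
```

The left half of the flat image holds the Kaggle spectrograms and the right half holds the EEG spectrograms, each as a column of four 128x256 tiles.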

In [None]:
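VER = 5

# Which inputs build_model() feeds to EfficientNet (this notebook uses both)
USE_KAGGLE_SPECTROGRAMS = True
USE_EEG_SPECTROGRAMS = True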

# IF THIS EQUALS NONE, THEN WE TRAIN NEW MODELS
# IF THIS EQUALS DISK PATH, THEN WE LOAD PREVIOUSLY TRAINED MODELS
LOAD_MODELS_FROM = '/kaggle/input/brain-efficientnet-models-v3-v4-v5/'

In [None]:
import efficientnet.tfkeras as efn

def build_model():
    
    inp = tf.keras.Input(shape=(128,256,8))
    base_model = efn.EfficientNetB0(include_top=False, weights=None, input_shape=None)
    base_model.load_weights('/kaggle/input/tf-efficientnet-imagenet-weights/efficientnet-b0_weights_tf_dim_ordering_tf_kernels_autoaugment_notop.h5')
    
    # RESHAPE INPUT 128x256x8 => 512x512x3 MONOTONE IMAGE
    # KAGGLE SPECTROGRAMS
    x1 = [inp[:,:,:,i:i+1] for i in range(4)]
    x1 = tf.keras.layers.Concatenate(axis=1)(x1)
    # EEG SPECTROGRAMS
    x2 = [inp[:,:,:,i+4:i+5] for i in range(4)]
    x2 = tf.keras.layers.Concatenate(axis=1)(x2)
    # MAKE 512X512X3
    if USE_KAGGLE_SPECTROGRAMS and USE_EEG_SPECTROGRAMS:
        x = tf.keras.layers.Concatenate(axis=2)([x1,x2])
    elif USE_EEG_SPECTROGRAMS: x = x2
    else: x = x1
    x = tf.keras.layers.Concatenate(axis=3)([x,x,x])
    
    # OUTPUT
    x = base_model(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(6,activation='softmax', dtype='float32')(x)
        
    # COMPILE MODEL
    model = tf.keras.Model(inputs=inp, outputs=x)
    opt = tf.keras.optimizers.Adam(learning_rate = 1e-3)
    loss = tf.keras.losses.KLDivergence()

    model.compile(loss=loss, optimizer = opt) 
        
    return model

# Train Model
We train using Group KFold on patient id. If `LOAD_MODELS_FROM = None`, then we will train new models in this notebook version. Otherwise we will load saved models from the path `LOAD_MODELS_FROM`.
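
Grouping by patient keeps all recordings from one patient on the same side of each split, so validation scores are not inflated by patient-specific patterns. A minimal sketch with made-up patient IDs:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 10 samples from 5 hypothetical patients (2 samples each)
patient_id = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
X = np.zeros((len(patient_id), 1))   # features are irrelevant for the split itself

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(gkf.split(X, groups=patient_id)):
    train_patients = set(patient_id[train_idx])
    valid_patients = set(patient_id[valid_idx])
    # No patient appears on both sides of the split
    assert train_patients.isdisjoint(valid_patients)
    print(f'fold {fold}: validation patients = {sorted(int(p) for p in valid_patients)}')
```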

In [None]:
from sklearn.model_selection import KFold, GroupKFold
import tensorflow.keras.backend as K, gc

all_oof = []
all_true = []

gkf = GroupKFold(n_splits=5)
for i, (train_index, valid_index) in enumerate(gkf.split(train_df, train_df.target, train_df.patient_id)):  
    
    print('#'*25)
    print(f'### Fold {i+1}')
    
    # DataLoader is the Sequence class defined above; it needs both spectrogram dictionaries
    train_gen = DataLoader(train_df.iloc[train_index], specs, eeg_specs,
                           shuffle=True, batch_size=32, augmentation=False)
    valid_gen = DataLoader(train_df.iloc[valid_index], specs, eeg_specs,
                           shuffle=False, batch_size=64, mode='valid')
    
    print(f'### train size {len(train_index)}, valid size {len(valid_index)}')
    print('#'*25)
    
    K.clear_session()
    with strategy.scope():
        model = build_model()
    if LOAD_MODELS_FROM is None:
        model.fit(train_gen, verbose=1,
              validation_data = valid_gen,
              epochs=EPOCHS, callbacks = [LR])
        model.save_weights(f'EffNet_v{VER}_f{i}.h5')
    else:
        model.load_weights(f'{LOAD_MODELS_FROM}EffNet_v{VER}_f{i}.h5')
        
    oof = model.predict(valid_gen, verbose=1)
    all_oof.append(oof)
    all_true.append(train_df.iloc[valid_index][TARGETS].values)
    
    del model, oof
    gc.collect()
    
all_oof = np.concatenate(all_oof)
all_true = np.concatenate(all_true)

# CV Score for EfficientNet
This is the CV score for our EfficientNet model.
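
For intuition about the metric, here is a simplified NumPy sketch of the mean KL divergence between target and predicted distributions. It is not the `kaggle_kl_div` implementation used in the cell below. The function name `mean_kl_divergence` and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np

def mean_kl_divergence(y_true, y_pred, eps=1e-15):
    """Mean over rows of sum_k y_true[k] * log(y_true[k] / y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Terms with y_true == 0 contribute 0 by convention
    safe_true = np.where(y_true > 0, y_true, 1.0)
    terms = np.where(y_true > 0, y_true * np.log(safe_true / y_pred), 0.0)
    return terms.sum(axis=1).mean()

y_true = np.array([[0.5, 0.5, 0.0, 0.0, 0.0, 0.0]])
uniform = np.full((1, 6), 1 / 6)
print(mean_kl_divergence(y_true, y_true))    # 0 for a perfect prediction
print(mean_kl_divergence(y_true, uniform))   # log(3), about 1.0986
```

Lower is better: a perfect prediction scores 0, while an uninformative uniform guess on this example scores log(3).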

In [None]:
import sys
sys.path.append('/kaggle/input/kaggle-kl-div')
from kaggle_kl_div import score

oof = pd.DataFrame(all_oof.copy())
oof['id'] = np.arange(len(oof))

true = pd.DataFrame(all_true.copy())
true['id'] = np.arange(len(true))

cv = score(solution=true, submission=oof, row_id_column_name='id')
print('CV Score KL-Div for EfficientNet =',cv)

# Infer Test and Create Submission CSV
Below we use our 5 EfficientNet fold models to infer the test data and create a `submission.csv` file.

In [None]:
del eeg_specs, specs; gc.collect()
test = pd.read_csv('/kaggle/input/hms-harmful-brain-activity-classification/test.csv')
print('Test shape',test.shape)
test.head()

In [None]:
# READ ALL SPECTROGRAMS
PATH2 = '/kaggle/input/hms-harmful-brain-activity-classification/test_spectrograms/'
files2 = os.listdir(PATH2)
print(f'There are {len(files2)} test spectrogram parquets')
    
spectrograms2 = {}
for i,f in enumerate(files2):
    if i%100==0: print(i,', ',end='')
    tmp = pd.read_parquet(f'{PATH2}{f}')
    name = int(f.split('.')[0])
    spectrograms2[name] = tmp.iloc[:,1:].values
    
# RENAME FOR DATALOADER
test = test.rename({'spectrogram_id':'spec_id'},axis=1)

In [None]:
import pywt, librosa

USE_WAVELET = None 

NAMES = ['LL','LP','RP','RR']

FEATS = [['Fp1','F7','T3','T5','O1'],
         ['Fp1','F3','C3','P3','O1'],
         ['Fp2','F8','T4','T6','O2'],
         ['Fp2','F4','C4','P4','O2']]

# DENOISE FUNCTION
def maddest(d, axis=None):
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def denoise(x, wavelet='haar', level=1):    
    coeff = pywt.wavedec(x, wavelet, mode="per")
    sigma = (1/0.6745) * maddest(coeff[-level])

    uthresh = sigma * np.sqrt(2*np.log(len(x)))
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])

    ret=pywt.waverec(coeff, wavelet, mode='per')
    
    return ret

def spectrogram_from_eeg(parquet_path, display=False):
    
    # LOAD MIDDLE 50 SECONDS OF EEG SERIES
    eeg = pd.read_parquet(parquet_path)
    middle = (len(eeg)-10_000)//2
    eeg = eeg.iloc[middle:middle+10_000]
    
    # VARIABLE TO HOLD SPECTROGRAM
    img = np.zeros((128,256,4),dtype='float32')
    
    if display: plt.figure(figsize=(10,7))
    signals = []
    for k in range(4):
        COLS = FEATS[k]
        
        for kk in range(4):
        
            # COMPUTE PAIR DIFFERENCES
            x = eeg[COLS[kk]].values - eeg[COLS[kk+1]].values

            # FILL NANS
            m = np.nanmean(x)
            if np.isnan(x).mean()<1: x = np.nan_to_num(x,nan=m)
            else: x[:] = 0

            # DENOISE
            if USE_WAVELET:
                x = denoise(x, wavelet=USE_WAVELET)
            signals.append(x)

            # RAW SPECTROGRAM
            mel_spec = librosa.feature.melspectrogram(y=x, sr=200, hop_length=len(x)//256, 
                  n_fft=1024, n_mels=128, fmin=0, fmax=20, win_length=128)

            # LOG TRANSFORM
            width = (mel_spec.shape[1]//32)*32
            mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max).astype(np.float32)[:,:width]

            # STANDARDIZE TO -1 TO 1
            mel_spec_db = (mel_spec_db+40)/40 
            img[:,:,k] += mel_spec_db
                
        # AVERAGE THE 4 MONTAGE DIFFERENCES
        img[:,:,k] /= 4.0
        
        if display:
            plt.subplot(2,2,k+1)
            plt.imshow(img[:,:,k],aspect='auto',origin='lower')
            plt.title(f'EEG {eeg_id} - Spectrogram {NAMES[k]}')
            
    if display: 
        plt.show()
        plt.figure(figsize=(10,5))
        offset = 0
        for k in range(4):
            if k>0: offset -= signals[3-k].min()
            plt.plot(range(10_000),signals[k]+offset,label=NAMES[3-k])
            offset += signals[3-k].max()
        plt.legend()
        plt.title(f'EEG {eeg_id} Signals')
        plt.show()
        print(); print('#'*25); print()
        
    return img

In [None]:
# READ ALL EEG SPECTROGRAMS
PATH2 = '/kaggle/input/hms-harmful-brain-activity-classification/test_eegs/'
DISPLAY = 1
EEG_IDS2 = test.eeg_id.unique()
all_eegs2 = {}

print('Converting Test EEG to Spectrograms...'); print()
for i,eeg_id in enumerate(EEG_IDS2):
        
    # CREATE SPECTROGRAM FROM EEG PARQUET
    img = spectrogram_from_eeg(f'{PATH2}{eeg_id}.parquet', i<DISPLAY)
    all_eegs2[eeg_id] = img

In [None]:
# INFER EFFICIENTNET ON TEST
preds = []
model = build_model()
test_gen = DataLoader(test, shuffle=False, batch_size=64, mode='test',
                      specs = spectrograms2, eeg_specs = all_eegs2)

for i in range(5):
    print(f'Fold {i+1}')
    if LOAD_MODELS_FROM:
        model.load_weights(f'{LOAD_MODELS_FROM}EffNet_v{VER}_f{i}.h5')
    else:
        model.load_weights(f'EffNet_v{VER}_f{i}.h5')
    pred = model.predict(test_gen, verbose=1)
    preds.append(pred)
pred = np.mean(preds,axis=0)
print()
print('Test preds shape',pred.shape)

In [None]:
sub = pd.DataFrame({'eeg_id':test.eeg_id.values})
sub[TARGETS] = pred
sub.to_csv('submission.csv',index=False)
print('Submission shape',sub.shape)
sub.head()

In [None]:
# SANITY CHECK TO CONFIRM PREDICTIONS SUM TO ONE
sub.iloc[:,-6:].sum(axis=1)