<center><h1>EfficientNet B5</h1> Training</center> 

### Commit 0
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- Dropout 0.5

**CV MAE is: 3.5531554222106934**

### Commit 1
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.001
- Dropout 0.385

**CV MAE is: 3.9672446727752684**

### Commit 2 (Native)
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 456x456
- Dropout 0.4
- No BatchNorm Layer

**CV MAE is: 3.088250017166138**

### Commit 3 (Native)
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 456x456
- Dropout 0.4
- With BatchNorm Layer

**CV MAE is: 3.682063627243042**

**Since the batch size is small, BatchNorm is not viable; trying Layer Normalization instead**
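The reasoning here is that BatchNorm (in training mode) normalises each feature with statistics computed across the batch. With only 4 images per batch those statistics are noisy, and a sample's output depends on which other samples it happened to be batched with. LayerNorm computes statistics per sample, so it is independent of batch size. The toy NumPy sketch below illustrates this; it is not the Keras layers themselves and omits the learned scale/shift and BatchNorm's moving averages.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # BatchNorm (training mode): per-feature statistics over the batch axis
    return (x - x.mean(axis=0, keepdims=True)) / np.sqrt(x.var(axis=0, keepdims=True) + eps)

def layer_norm(x, eps=1e-5):
    # LayerNorm: per-sample statistics over the feature axis
    return (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 8))                              # one fixed sample, 8 features
batch_a = np.vstack([sample, rng.normal(size=(3, 8))])        # batch of 4
batch_b = np.vstack([sample, rng.normal(size=(3, 8)) + 5.0])  # same sample, other batch-mates

# How much does the normalised output of the *same* sample change between batches?
bn_diff = np.abs(batch_norm(batch_a)[0] - batch_norm(batch_b)[0]).max()
ln_diff = np.abs(layer_norm(batch_a)[0] - layer_norm(batch_b)[0]).max()
print(f"BatchNorm change: {bn_diff:.3f}, LayerNorm change: {ln_diff:.3f}")
```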

### Commit 4 (Native)
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 456x456
- Dropout 0.4
- With LayerNorm Layer

**CV MAE is: 3.556714630126953**

### Commit 6
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 512x512
- Dropout 0.385
- 3 Channels

**CV MAE is: 3.102118635177612**

### Commit 7
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 512x512
- Dropout 0.385
- 3 Channels
- Gaussian Noise 0.25

**CV MAE is: 3.195901393890381**

### Commit 8
- EPOCHS = 100
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 528x528
- Dropout 0.385
- Gaussian Noise 0.2

**CV MAE is: 3.20487517118454**

### Commit 9
- EPOCHS = 110
- BATCH_SIZE = 4
- NFOLD = 5
- LR = 0.003
- 528x528
- Dropout 0.4
- Gaussian Noise 0.2

**CV MAE is: 3.222148895263672**

### Commit 10 (10 Folds at 50 Epochs)
- EPOCHS = 50
- BATCH_SIZE = 4
- NFOLD = 10
- LR = 0.003
- 512x512
- Dropout 0.385
- Gaussian Noise 0.2
- Male

**CV MAE is: 3.222148895263672**

In [1]:
!pip install ../input/kerasapplications/keras-team-keras-applications-3b180cb -f ./ --no-index
!pip install ../input/efficientnet/efficientnet-1.1.0/ -f ./ --no-index

Looking in links: ./
Processing /kaggle/input/kerasapplications/keras-team-keras-applications-3b180cb
Building wheels for collected packages: Keras-Applications
  Building wheel for Keras-Applications (setup.py) ... done
  Created wheel for Keras-Applications: filename=Keras_Applications-1.0.8-py3-none-any.whl size=50704 sha256=ed8989406b35a76a4657249e32a57fd2b98ff3d67bc61550c0d58c777aca15a0
  Stored in directory: /root/.cache/pip/wheels/f4/96/13/eccdd9391bd8df958d78851b98ec4dc207ba05b67b011eb70a
Successfully built Keras-Applications
Installing collected packages: Keras-Applications
Successfully installed Keras-Applications-1.0.8
Looking in links: ./
Processing /kaggle/input/efficientnet/efficientnet-1.1.0
Building wheels for collected packages: efficientnet
  Building wheel for efficientnet (setup.py) ... done
  Created wheel for efficientnet: filename=efficientnet-1.1.0-py3-none-any.whl size=14141 sha256=62846ac72bbad8a0269a8c9fe90c377

In [2]:
import os
import cv2
import pydicom
import pandas as pd
import numpy as np 
import tensorflow as tf 
import matplotlib.pyplot as plt 
from tqdm.notebook import tqdm 
from tensorflow.keras.layers import (
    Dense, Dropout, Activation, Flatten, Input, BatchNormalization, LayerNormalization, GlobalAveragePooling2D, Add, Conv2D, AveragePooling2D, 
    LeakyReLU, Concatenate 
)
from tensorflow.keras import Model
from tensorflow.keras.utils import Sequence
import tensorflow.keras.backend as K
import tensorflow.keras.applications as tfa
import efficientnet.tfkeras as efn
from sklearn.model_selection import train_test_split, KFold
import seaborn as sns

In [3]:
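# Let TensorFlow grow GPU memory on demand instead of reserving it all up front.
# (A tf.compat.v1 Session is not used by tf.keras in TF2 eager mode.)
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)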

# Training Parameters

- `EPOCHS`: number of epochs to train for in each fold
- `BATCH_SIZE`: batch size of images during training
- `NFOLD`: number of folds in K-fold cross-validation (CV)
- `LR`: learning rate
- `SAVE_BEST`: if True (the default), keep only the weights with the lowest validation loss
- `MODEL_CLASS`: the EfficientNet variant to use, e.g. `"b1"` for EfficientNet-B1

In [4]:
EPOCHS = 50
BATCH_SIZE = 4
NFOLD = 10
LR = 0.003
SAVE_BEST = True
MODEL_CLASS = 'b5'

In [5]:
train = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/train.csv') 

In [6]:
train.head()

|   | Patient | Weeks | FVC | Percent | Age | Sex | SmokingStatus |
|---|---------|-------|-----|---------|-----|-----|---------------|
| 0 | ID00007637202177411956430 | -4 | 2315 | 58.253649 | 79 | Male | Ex-smoker |
| 1 | ID00007637202177411956430 | 5 | 2214 | 55.712129 | 79 | Male | Ex-smoker |
| 2 | ID00007637202177411956430 | 7 | 2061 | 51.862104 | 79 | Male | Ex-smoker |
| 3 | ID00007637202177411956430 | 9 | 2144 | 53.950679 | 79 | Male | Ex-smoker |
| 4 | ID00007637202177411956430 | 11 | 2069 | 52.063412 | 79 | Male | Ex-smoker |


In [7]:
train.SmokingStatus.unique()

array(['Ex-smoker', 'Never smoked', 'Currently smokes'], dtype=object)

In [8]:
def get_tab(df):
    # Tabular features for one patient: [scaled age, sex, 2-value smoking code]
    vector = [(df.Age.values[0] - 30) / 30]

    # Sex: Male -> 0, Female -> 1 (compare in lower case so 'Male' actually matches)
    if df.Sex.values[0].lower() == 'male':
        vector.append(0)
    else:
        vector.append(1)

    # Smoking status as a 2-value code
    if df.SmokingStatus.values[0] == 'Never smoked':
        vector.extend([0, 0])
    elif df.SmokingStatus.values[0] == 'Ex-smoker':
        vector.extend([1, 1])
    elif df.SmokingStatus.values[0] == 'Currently smokes':
        vector.extend([0, 1])
    else:
        vector.extend([1, 0])
    return np.array(vector)
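The same encoding can be read as a lookup table. This is a minimal sketch with a hypothetical helper `encode_patient` (not part of the notebook) that takes raw values instead of a DataFrame; it applies the age scaling, the Male -> 0 / Female -> 1 mapping that the `if/else` branch aims for, and the 2-value smoking codes from `get_tab`.

```python
import numpy as np

# Smoking-status codes mirroring the branches of get_tab
SMOKING_CODES = {
    'Never smoked': [0, 0],
    'Ex-smoker': [1, 1],
    'Currently smokes': [0, 1],
}
UNKNOWN_SMOKING = [1, 0]  # fallback, like get_tab's final else branch

def encode_patient(age, sex, smoking_status):
    """Hypothetical helper: 4-element tabular vector for one patient."""
    age_feature = (age - 30) / 30                     # same age scaling as get_tab
    sex_feature = 0 if sex.lower() == 'male' else 1   # Male -> 0, Female -> 1
    smoking = SMOKING_CODES.get(smoking_status, UNKNOWN_SMOKING)
    return np.array([age_feature, sex_feature, *smoking], dtype=float)

# First patient in train.head(): 79-year-old male ex-smoker
print(encode_patient(79, 'Male', 'Ex-smoker'))
```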

In [9]:
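A = {}    # patient -> FVC slope over weeks (regression target)
TAB = {}  # patient -> tabular feature vector
P = []    # patient IDs
for p in tqdm(train.Patient.unique()):
    sub = train.loc[train.Patient == p, :]
    fvc = sub.FVC.values
    weeks = sub.Weeks.values
    # Least-squares fit of FVC = a * weeks + b; keep the slope a
    c = np.vstack([weeks, np.ones(len(weeks))]).T
    a, b = np.linalg.lstsq(c, fvc, rcond=None)[0]

    A[p] = a
    TAB[p] = get_tab(sub)
    P.append(p)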
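To see what the per-patient target looks like, this sketch runs the same least-squares fit on only the five visits shown in `train.head()` (the notebook itself fits all of a patient's visits, so its slope for this patient may differ). `np.polyfit` with degree 1 solves the same problem and serves as a cross-check.

```python
import numpy as np

# First five visits of patient ID00007637202177411956430, as shown in train.head()
weeks = np.array([-4, 5, 7, 9, 11], dtype=float)
fvc = np.array([2315, 2214, 2061, 2144, 2069], dtype=float)

# Design matrix [weeks, 1] so that fvc ~ a * weeks + b
c = np.vstack([weeks, np.ones(len(weeks))]).T
a, b = np.linalg.lstsq(c, fvc, rcond=None)[0]

# Degree-1 polyfit minimises the same squared error
a_poly, b_poly = np.polyfit(weeks, fvc, 1)
print(f"slope a = {a:.2f} FVC per week, intercept b = {b:.1f}")
```

A negative slope means FVC is declining over the observed weeks, which is the quantity the network is trained to predict from a single CT slice plus the tabular features.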


In [10]:
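def get_img(path):
    # Read one CT slice and resize it to 512x512.
    # Intensities are shifted by RescaleIntercept and divided by (RescaleSlope * 1000);
    # this is a fixed rescaling, not the standard Hounsfield conversion
    # (HU = pixel_array * RescaleSlope + RescaleIntercept).
    d = pydicom.dcmread(path)
    return cv2.resize((d.pixel_array - d.RescaleIntercept) / (d.RescaleSlope * 1000), (512, 512))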
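To make the intensity scaling in `get_img` concrete, the sketch below applies the same arithmetic to a synthetic array and compares it with the standard DICOM rescale to Hounsfield units. It assumes header values `RescaleSlope = 1` and `RescaleIntercept = -1024`, which are common for CT but are read per file in the real data; the helper names are hypothetical.

```python
import numpy as np

def scale_like_get_img(pixel_array, slope, intercept):
    # Same arithmetic as get_img, before resizing
    return (pixel_array - intercept) / (slope * 1000)

def to_hounsfield(pixel_array, slope, intercept):
    # Standard DICOM rescale: HU = stored value * RescaleSlope + RescaleIntercept
    return pixel_array * slope + intercept

SLOPE, INTERCEPT = 1.0, -1024.0          # assumed header values
stored = np.array([0.0, 1024.0, 2048.0])  # synthetic stored pixel values

print(to_hounsfield(stored, SLOPE, INTERCEPT))       # values: -1024, 0, 1024
print(scale_like_get_img(stored, SLOPE, INTERCEPT))  # values: 1.024, 2.048, 3.072
```

Under these assumptions the notebook's input equals (HU + 2048) / 1000, a monotonic rescaling of HU, so water (0 HU) maps to 2.048 rather than 0.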

In [11]:
MASK_DIR = '../input/osic-pulmonary-fibrosis-progression-lungs-mask/mask_noise/mask_noise'
x, y = [], []  # x: mean mask intensity per slice, y: slice number / max slice number
for p in tqdm(train.Patient.unique()):
    try:
        ldir = os.listdir(f'{MASK_DIR}/{p}/')
    except FileNotFoundError:
        continue  # no masks for this patient
    numb = [float(i[:-4]) for i in ldir]  # strip the 4-character file extension
    for i in ldir:
        x.append(cv2.imread(f'{MASK_DIR}/{p}/{i}', 0).mean())
        y.append(float(i[:-4]) / max(numb))
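The `y` values are each slice's relative position within a patient's scan. This sketch shows that computation on hypothetical mask file names of the form `<slice number>.jpg`; the real extension in the mask dataset is an assumption here, and only its 4-character length matters to `i[:-4]`.

```python
# Hypothetical mask file names for one patient
ldir = ['1.jpg', '10.jpg', '5.jpg', '20.jpg']

numb = [float(name[:-4]) for name in ldir]             # slice numbers from file names
y = [float(name[:-4]) / max(numb) for name in ldir]    # relative slice position in (0, 1]
print(y)  # [0.05, 0.5, 0.25, 1.0]
```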


In [12]:
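class IGenerator(Sequence):
    # Patient IDs excluded from sampling
    BAD_ID = ['ID00011637202177653955184', 'ID00052637202186188008618']

    def __init__(self, keys, a, tab, batch_size=BATCH_SIZE):
        self.keys = [k for k in keys if k not in self.BAD_ID]
        self.a = a      # patient -> FVC slope (target)
        self.tab = tab  # patient -> tabular feature vector
        self.batch_size = batch_size

        # patient -> list of DICOM file names (one listdir per unique patient)
        self.train_data = {}
        for p in train.Patient.unique():
            self.train_data[p] = os.listdir(f'../input/osic-pulmonary-fibrosis-progression/train/{p}/')

    def __len__(self):
        # Nominal length; fit() only draws steps_per_epoch / validation_steps batches
        return 1000

    def __getitem__(self, idx):
        # idx is ignored: each batch samples random patients (with replacement)
        # and one random slice per patient
        x = []
        a, tab = [], []
        keys = np.random.choice(self.keys, size=self.batch_size)
        for k in keys:
            i = None
            try:
                i = np.random.choice(self.train_data[k], size=1)[0]
                img = get_img(f'../input/osic-pulmonary-fibrosis-progression/train/{k}/{i}')
                x.append(img)
                a.append(self.a[k])
                tab.append(self.tab[k])
            except Exception as e:
                # Skip unreadable slices; the batch is then smaller than batch_size
                print(k, i, e)

        x, a, tab = np.array(x), np.array(a), np.array(tab)
        x = np.expand_dims(x, axis=-1)  # add a single grey-scale channel
        return [x, tab], a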
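The sampling scheme of `IGenerator.__getitem__` can be checked without any DICOM files. This sketch uses a hypothetical `sample_batch` helper, toy patients, and a dummy image loader; unlike the notebook it takes an explicit `np.random.Generator` so the draw is reproducible. It shows the shapes the model receives: images `(batch, 512, 512, 1)`, tabular vectors `(batch, 4)` and one slope per image.

```python
import numpy as np

def sample_batch(keys, slices, slopes, tabs, load_image, batch_size=4, rng=None):
    """Hypothetical helper restating IGenerator.__getitem__'s sampling scheme."""
    if rng is None:
        rng = np.random.default_rng()
    x, a, t = [], [], []
    # Patients are drawn with replacement, so one batch can repeat a patient
    for k in rng.choice(keys, size=batch_size):
        s = rng.choice(slices[k])      # one random slice file for this patient
        x.append(load_image(k, s))     # 2-D image array
        a.append(slopes[k])            # target: the patient's FVC slope
        t.append(tabs[k])              # tabular features
    x = np.expand_dims(np.array(x), axis=-1)  # (batch, H, W) -> (batch, H, W, 1)
    return [x, np.array(t)], np.array(a)

# Toy stand-ins for the real patients and DICOM files (all hypothetical)
keys = ['p1', 'p2']
slices = {'p1': ['1.dcm', '2.dcm'], 'p2': ['1.dcm']}
slopes = {'p1': -3.0, 'p2': 1.5}
tabs = {'p1': np.zeros(4), 'p2': np.ones(4)}

(x, t), a = sample_batch(keys, slices, slopes, tabs,
                         load_image=lambda k, s: np.zeros((512, 512)),
                         rng=np.random.default_rng(0))
print(x.shape, t.shape, a.shape)  # (4, 512, 512, 1) (4, 4) (4,)
```

Because every slice of a patient shares the same target, the model has to infer the patient-level FVC trend from whichever slice was drawn.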

In [13]:
def get_efficientnet(model, shape):
    # Map names to constructors and build only the requested variant;
    # instantiating all eight networks on every call wastes (GPU) memory.
    constructors = {
        'b0': efn.EfficientNetB0,
        'b1': efn.EfficientNetB1,
        'b2': efn.EfficientNetB2,
        'b3': efn.EfficientNetB3,
        'b4': efn.EfficientNetB4,
        'b5': efn.EfficientNetB5,
        'b6': efn.EfficientNetB6,
        'b7': efn.EfficientNetB7,
    }
    return constructors[model](input_shape=shape, weights=None, include_top=False)

def build_model(shape=(512, 512, 1), model_class=None):
    # Image branch: EfficientNet backbone (no pretrained weights) + global average pooling
    inp = Input(shape=shape)
    base = get_efficientnet(model_class, shape)
    x = base(inp)
    x = GlobalAveragePooling2D()(x)
    # Tabular branch: the 4-element get_tab vector, with Gaussian noise during training
    inp2 = Input(shape=(4,))
    x2 = tf.keras.layers.GaussianNoise(0.2)(inp2)
    # Merge both branches and regress a single value (the FVC slope)
    x = Concatenate()([x, x2])
    x = Dropout(0.385)(x)
    x = Dense(1)(x)
    model = Model([inp, inp2], x)
    return model
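The head of `build_model` can be traced with plain NumPy shapes. This sketch assumes the B5 feature map for a 512x512 input is 16x16 with 2048 channels (the OOM message below lists that tensor as `[4,2048,16,16]`) and uses random arrays in place of real activations. `GaussianNoise` and `Dropout` only act during training, so they are omitted.

```python
import numpy as np

batch, h, w, channels = 4, 16, 16, 2048  # assumed B5 feature map for 512x512 input
n_tab = 4                                # length of the get_tab vector

rng = np.random.default_rng(0)
features = rng.normal(size=(batch, h, w, channels))  # stand-in for base(inp), channels-last
tab = rng.normal(size=(batch, n_tab))

pooled = features.mean(axis=(1, 2))                  # GlobalAveragePooling2D -> (4, 2048)
merged = np.concatenate([pooled, tab], axis=1)       # Concatenate -> (4, 2052)

# Dense(1): one weight per input feature plus a bias
W = rng.normal(size=(merged.shape[1], 1))
bias = np.zeros(1)
out = merged @ W + bias                              # -> (4, 1)
n_params = W.size + bias.size
print(merged.shape, out.shape, n_params)             # (4, 2052) (4, 1) 2053
```

Only 4 of the 2052 inputs to the final layer are tabular, so the regression head is dominated by the image features.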

# Training

In [14]:
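# random_state has no effect without shuffling (newer scikit-learn rejects it), so it is omitted
kf = KFold(n_splits=NFOLD, shuffle=False)
P = np.array(P)
subs = []
folds_history = []
for fold, (tr_idx, val_idx) in enumerate(kf.split(P)):
    print('#####################')
    print('####### Fold %i ######' % fold)
    print('#####################')
    print('Training...')

    # Reset Keras global state so models from earlier folds can be released
    K.clear_session()

    # Early stopping (defined but not passed to fit below; add `er` to callbacks to enable it)
    er = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        min_delta=1e-3,
        patience=10,
        verbose=1,
        mode="auto",
        baseline=None,
        restore_best_weights=True,
    )

    # Save this fold's weights (only the best on validation loss when SAVE_BEST is True)
    cpt = tf.keras.callbacks.ModelCheckpoint(
        filepath='fold-%i.h5' % fold,
        monitor='val_loss',
        verbose=1,
        save_best_only=SAVE_BEST,
        mode='auto'
    )

    # Halve the learning rate when validation loss has not improved for 5 epochs
    rlp = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        verbose=1,
        min_lr=1e-8
    )
    model = build_model(model_class=MODEL_CLASS)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss="mae")
    # fit() accepts a keras Sequence directly (fit_generator is deprecated)
    history = model.fit(
        IGenerator(keys=P[tr_idx], a=A, tab=TAB),
        steps_per_epoch=32,
        validation_data=IGenerator(keys=P[val_idx], a=A, tab=TAB),
        validation_steps=16,
        callbacks=[cpt, rlp],
        epochs=EPOCHS,
    )
    folds_history.append(history.history)
    print('Training done!')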
    folds_history.append(history.history)
    print('Training done!')



#####################
####### Fold 0 ######
#####################
Training...
Epoch 1/50
Epoch 00001: val_loss improved from inf to 92420.67188, saving model to fold-0.h5
Epoch 2/50
Epoch 00002: val_loss improved from 92420.67188 to 16185.80859, saving model to fold-0.h5
Epoch 3/50
Epoch 00003: val_loss improved from 16185.80859 to 4293.44482, saving model to fold-0.h5
Epoch 4/50
Epoch 00004: val_loss improved from 4293.44482 to 477.22949, saving model to fold-0.h5
Epoch 5/50
Epoch 00005: val_loss did not improve from 477.22949
Epoch 6/50
Epoch 00006: val_loss improved from 477.22949 to 244.35130, saving model to fold-0.h5
Epoch 7/50
Epoch 00007: val_loss improved from 244.35130 to 82.23602, saving model to fold-0.h5
Epoch 8/50
Epoch 00008: val_loss improved from 82.23602 to 5.42919, saving model to fold-0.h5
Epoch 9/50
Epoch 00009: val_loss improved from 5.42919 to 5.06479, saving model to fold-0.h5
Epoch 10/50
Epoch 00010: val_loss did not improve from 5.06479
Epoch 11/50
Epoch 00011

ResourceExhaustedError:  OOM when allocating tensor with shape[4,2048,16,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node functional_13/efficientnet-b5/top_activation/mul (defined at /opt/conda/lib/python3.7/site-packages/efficientnet/model.py:115) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_1142042]

Errors may have originated from an input operation.
Input Source operations connected to node functional_13/efficientnet-b5/top_activation/mul:
 functional_13/efficientnet-b5/top_bn/FusedBatchNormV3 (defined at <ipython-input-14-156379352ce4>:47)

Function call stack:
train_function


# CV Evaluation

In [15]:
if SAVE_BEST:
    mean_val_loss = np.mean([np.min(h['val_loss']) for h in folds_history])
else:
    mean_val_loss = np.mean([h['val_loss'][-1] for h in folds_history])
print('Our mean CV MAE is: ' + str(mean_val_loss))

Our mean CV MAE is: 3.4775497118631997
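The two branches of the CV cell summarise each fold differently. With `save_best_only=True` the checkpoint keeps the epoch with the lowest validation loss, so the per-fold minimum matches the saved weights; otherwise the last epoch's value is used. This sketch runs both summaries on made-up histories (illustrative numbers only, not results from this notebook).

```python
import numpy as np

# Toy per-fold Keras history dicts (made-up numbers for illustration)
folds_history = [
    {'val_loss': [5.0, 3.0, 4.0]},
    {'val_loss': [6.0, 2.0, 2.5]},
]

best = np.mean([np.min(h['val_loss']) for h in folds_history])  # SAVE_BEST = True
last = np.mean([h['val_loss'][-1] for h in folds_history])      # SAVE_BEST = False
print(best, last)  # 2.5 3.25
```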
