In this notebook I'll wrap the PyTorch model with `skorch` and fit baseline models.

## Issue: Segmentation fault
After several rounds of experiments, I found that to use `skorch` on my local laptop the model needs to be very small; otherwise the kernel restarts. When run from the command line, the error message is "zsh: segmentation fault".  
For example, with properties + morgan256 as input, the MLP cannot even use a hidden dimension of 128 at a batch size as low as 4. With the hidden dimension set to 64, the model works. There must be a bug.

To get the baseline MLP, I'll write a script and execute it on the cluster. Here I'll only use a very small model (nofp: properties only, no fingerprint) to explore methods.
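
To narrow down where the segmentation fault comes from, one option is the standard-library `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal. This is only a diagnostic sketch, not a fix, and the log file name is an assumption chosen for illustration.

```python
import faulthandler

# Hypothetical log file name; it must stay open for as long as the handler is active.
crash_log = open("segfault_traceback.log", "w")

# On a fatal signal such as SIGSEGV, write the traceback of every thread to the log.
faulthandler.enable(file=crash_log, all_threads=True)

# ... build the skorch regressor and call fit() after this point ...
```

For a script, the same handler can be switched on without code changes via `python -X faulthandler script.py`, which writes the traceback to stderr.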

In [21]:
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from skorch import NeuralNetRegressor
import pickle

In [22]:
%load_ext autoreload
%autoreload 2
import ivpk

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [23]:
target = "VDss"

In [24]:
(x_train, _), y_train, (x_val, _), y_val, _, _ = ivpk.data.all_datasets(
    target=target, 
    smiles_func=None, 
    #fpType="morgan", fpSize=256
)
x_train, y_train = x_train.astype(np.float32), y_train.reshape(-1, 1).astype(np.float32)
x_val, y_val = x_val.astype(np.float32), y_val.reshape(-1, 1).astype(np.float32)
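
The casts above matter because PyTorch layers use float32 parameters by default, and, as far as I know, `NeuralNetRegressor` expects a 2-dimensional target. A small sketch with stand-in arrays (not the real `ivpk` data) shows what the two calls do:

```python
import numpy as np

# Stand-in arrays; the real ones come from ivpk.data.all_datasets.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 11))        # NumPy defaults to float64
y = np.arange(6, dtype=np.float64)  # 1-D target, shape (6,)

x32 = x.astype(np.float32)                  # match the float32 model weights
y32 = y.reshape(-1, 1).astype(np.float32)   # one column per target, shape (6, 1)

print(x32.dtype, x32.shape)
print(y32.dtype, y32.shape)
```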

In [5]:
# train_dl, val_dl = ivpk.data.get_dataloaders(target=target, dests=("train", "val"), batch_size=8)

## Basic wrapper

In [15]:
import skorch

In [6]:
# mlp_reg = ivpk.models.SimpleRegHead(11+256, 512)
reg = NeuralNetRegressor(
    ivpk.models.SimpleRegHead, 
    module__in_dim=11, 
    module__hid_dim=8, 
    criterion=nn.MSELoss, 
    optimizer=torch.optim.SGD, 
    max_epochs=20, 
    lr=0.01, 
    batch_size=4, 
    train_split=None, 
#    iterator_train=train_dl, 
#    iterator_valid=val_dl, 
)

In [7]:
reg.fit(x_train, y_train)

  epoch    train_loss     dur
-------  ------------  ------
      1        3.5303  0.1075
      2        2.7825  0.1086
      3        2.6988  0.1054
      4        2.6517  0.1130
      5        2.6180  0.1216
      6        2.5925  0.1068
      7        2.5704  0.1102
      8        2.5492  0.1297
      9        2.5306  0.1160
     10        2.5150  0.1022
     11        2.5013  0.1092
     12        2.4888  0.1052
     13        2.4773  0.1112
     14        2.4668  0.1126
     15        2.4571  0.1052
     16        2.4480  0.1050
     17        2.4395  0.1049
     18        2.4315  0.1040
     19        2.4240  0.1389
     20        2.4168  0.1317


<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=11, out_features=8, bias=True)
      (1): GELU()
      (2): Dropout(p=0.0, inplace=False)
      (3): Linear(in_features=8, out_features=1, bias=True)
    )
  ),
)

In [8]:
reg.predict(x_val).shape

(229, 1)

## Callbacks

Metrics score callback.

In [16]:
from skorch.callbacks import EpochScoring

In [25]:
mae = EpochScoring("neg_mean_absolute_error", lower_is_better=False, on_train=True)

In [26]:
r2 = EpochScoring("r2", lower_is_better=False, on_train=True)
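
Both scorer names are scikit-learn scoring strings. As a reminder of what they report, here is a hand-rolled NumPy sketch of the two quantities on made-up numbers (the arrays are illustrative, not from the dataset). MAE is negated so that a larger score is always better, which is why `lower_is_better=False` is passed for both callbacks.

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

# "neg_mean_absolute_error": the mean absolute error with its sign flipped.
neg_mae = -np.mean(np.abs(y_true - y_pred))

# "r2": 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_score = 1.0 - ss_res / ss_tot

print(neg_mae, r2_score)
```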

In [27]:
reg = NeuralNetRegressor(
    ivpk.models.SimpleRegHead, 
    module__in_dim=11, 
    module__hid_dim=8, 
    criterion=nn.MSELoss, 
    optimizer=torch.optim.SGD, 
    max_epochs=10, 
    lr=0.01, 
    batch_size=4, 
    train_split=None, 
    callbacks=[r2, mae]
)

In [28]:
reg.fit(x_train, y_train)

  epoch    neg_mean_absolute_error      r2    train_loss     dur
-------  -------------------------  ------  ------------  ------
      1                    -1.4887  0.1974        3.6035  0.1743
      2                    -1.2491  0.3775        2.7950  0.1309
      3                    -1.2172  0.4058        2.6677  0.1147
      4                    -1.2049  0.4145        2.6287  0.1094
      5                    -1.1974  0.4198        2.6048  0.1372
      6                    -1.1923  0.4238        2.5872  0.1354
      7                    -1.1888  0.4269        2.5730  0.1390
      8                    -1.1861  0.4296        2.5607  0.1607
      9                    -1.1836  0.4322        2.5493  0.1453
     10                    -1.1810

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=11, out_features=8, bias=True)
      (1): GELU()
      (2): Dropout(p=0.0, inplace=False)
      (3): Linear(in_features=8, out_features=1, bias=True)
    )
  ),
)

Early stopping callback.  
Remember that early stopping is not compatible with grid search here: the grid search does its own train-validation split and the network is run with `train_split=None`, so the network has no internal validation set and there is no `valid_loss` to monitor. To select a suitable model in grid search, we can only try different values of `max_epochs`.

In [21]:
from skorch.callbacks import EarlyStopping

In [23]:
earlystop = EarlyStopping(patience=1)
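
To make the meaning of `patience` concrete, here is a minimal pure-Python sketch of patience-based stopping. It is a simplified illustration under my own assumptions, not skorch's implementation (which also supports an improvement threshold), and `stopping_epoch` is a hypothetical helper name.

```python
def stopping_epoch(valid_losses, patience=1):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    misses = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss: reset the counter
            misses = 0
        else:
            misses += 1   # no improvement this epoch
        if misses >= patience:
            return epoch
    return None

# Made-up validation losses: epoch 4 is the first epoch without improvement.
losses = [3.0, 2.7, 2.5, 2.6, 2.4]
stop_p1 = stopping_epoch(losses, patience=1)
stop_p2 = stopping_epoch(losses, patience=2)
```

With `patience=1` the run ends at the first epoch whose validation loss fails to improve, so a single noisy epoch is enough to stop training; a larger patience tolerates such bumps.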

In [26]:
reg = NeuralNetRegressor(
    ivpk.models.SimpleRegHead, 
    module__in_dim=11, 
    module__hid_dim=8, 
    criterion=nn.MSELoss, 
    optimizer=torch.optim.SGD, 
    max_epochs=50, 
    lr=0.01, 
    batch_size=4, 
#    train_split=None, 
    callbacks=[r2, earlystop]
)

In [27]:
reg.fit(x_train, y_train)

  epoch      r2    train_loss    valid_loss     dur
-------  ------  ------------  ------------  ------
      1  0.1999        3.6021        3.0378  0.1219
      2  0.3342        2.9975        2.6723  0.1274
      3  0.3554        2.9022        2.5569  0.1130
      4  0.3667        2.8510        2.4924  0.1137
      5  0.3772        2.8040        2.4422  0.1032
      6  0.3892        2.7497        2.3896  0.1019
      7  0.4021        2.6918        2.3579  0.1015
      8  0.4108        2.6528        2.3482  0.1021
      9  0.4163        2.6278        2.3432  0.1010
     10  0.4208        2.6076        2.3387  0.0993
     11  0.4249        2.5893        2.3341  0.1005
     12  0.4287

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=11, out_features=8, bias=True)
      (1): GELU()
      (2): Dropout(p=0.0, inplace=False)
      (3): Linear(in_features=8, out_features=1, bias=True)
    )
  ),
)

In [30]:
reg.predict(x_val).shape

(229, 1)

## GridSearch

In [31]:
from sklearn.model_selection import GridSearchCV

In [32]:
reg = NeuralNetRegressor(
    ivpk.models.SimpleRegHead, 
    module__in_dim=11, 
    module__hid_dim=8, 
    criterion=nn.MSELoss, 
    optimizer=torch.optim.SGD, 
    max_epochs=50, 
    lr=0.01, 
    batch_size=4, 
    train_split=None, 
)

In [33]:
# sample param grid for efficiency
param_grid = {
    'lr': [0.001, 0.005],
    'module__hid_dim': [4, 8],
    'module__dropout': [0, 0.2],
    'max_epochs': [2, 5]
}

In [34]:
gs = GridSearchCV(reg, param_grid, refit=True, cv=3, scoring='neg_mean_absolute_error')
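
Before launching the search it helps to know how many fits it implies. The sketch below enumerates the grid with `itertools.product`; it only counts combinations and does not train anything. The `+ 1` accounts for the final refit on all the data that `refit=True` requests.

```python
from itertools import product

param_grid = {
    "lr": [0.001, 0.005],
    "module__hid_dim": [4, 8],
    "module__dropout": [0, 0.2],
    "max_epochs": [2, 5],
}
cv = 3

# Every combination of one value per hyperparameter.
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_combos = len(combos)
n_fits = n_combos * cv + 1  # cv fits per combination, plus one refit of the best
print(n_combos, n_fits)
```

Each of these fits prints its own epoch table, which is why the output of `gs.fit` is so long.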

In [35]:
gs.fit(x_train, y_train)

  epoch    train_loss     dur
-------  ------------  ------
      1        4.3928  0.0791
      2        4.0993  0.0753
  epoch    train_loss     dur
-------  ------------  ------
      1        4.2986  0.0737
      2        4.0903  0.0751
  epoch    train_loss     dur
-------  ------------  ------
      1        4.3791  0.0766
      2        4.1695  0.0756
  epoch    train_loss     dur
-------  ------------  ------
      1        4.4790  0.0904
      2        4.2622  0.0891
  epoch    train_loss     dur
-------  ------------  ------
      1        4.1695  0.0826
      2        3.8662  0.0972
  epoch    train_loss     dur
-------  ------------  ------
      1        4.4571  0.0934
      2        4.1761  0.0882
  epoch    train_loss     dur
-------  ------------  ------
      1        4.5844  0.0775
      2        4.5220  0.0809
  epoch    train_loss     dur
----

GridSearchCV(cv=3,
             estimator=<class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class 'ivpk.models.SimpleRegHead'>,
  module__hid_dim=8,
  module__in_dim=11,
),
             param_grid={'lr': [0.001, 0.005], 'max_epochs': [2, 5],
                         'module__dropout': [0, 0.2],
                         'module__hid_dim': [4, 8]},
             scoring='neg_mean_absolute_error')

In [36]:
gs.best_estimator_

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=11, out_features=8, bias=True)
      (1): GELU()
      (2): Dropout(p=0, inplace=False)
      (3): Linear(in_features=8, out_features=1, bias=True)
    )
  ),
)

In [45]:
gs.best_params_

{'lr': 0.005, 'max_epochs': 5, 'module__dropout': 0, 'module__hid_dim': 8}

In [38]:
gs.best_score_

-1.21109934647878

In [44]:
gs.best_estimator_.history_

{'batches': [{'train_loss': 7.843193054199219, 'train_batch_size': 4},
  {'train_loss': 0.9014673233032227, 'train_batch_size': 4},
  {'train_loss': 0.8766525983810425, 'train_batch_size': 4},
  {'train_loss': 11.758108139038086, 'train_batch_size': 4},
  {'train_loss': 2.4747419357299805, 'train_batch_size': 4},
  {'train_loss': 4.248571395874023, 'train_batch_size': 4},
  {'train_loss': 4.872438430786133, 'train_batch_size': 4},
  {'train_loss': 0.09492123872041702, 'train_batch_size': 4},
  {'train_loss': 8.943922996520996, 'train_batch_size': 4},
  {'train_loss': 8.077860832214355, 'train_batch_size': 4},
  {'train_loss': 5.599372386932373, 'train_batch_size': 4},
  {'train_loss': 1.5913069248199463, 'train_batch_size': 4},
  {'train_loss': 2.7337045669555664, 'train_batch_size': 4},
  {'train_loss': 5.423847675323486, 'train_batch_size': 4},
  {'train_loss': 4.3838701248168945, 'train_batch_size': 4},
  {'train_loss': 5.449618339538574, 'train_batch_size': 4},
  {'train_loss': 5.0
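
The history holds one record per epoch, each with a `batches` list like the one printed above. As far as I understand skorch, the epoch-level `train_loss` is the batch-size-weighted mean of these per-batch losses. A sketch with made-up records (`epoch_loss` is a hypothetical helper, not a skorch function):

```python
# Illustrative batch records shaped like the entries of a `batches` list;
# the numbers are made up for the example.
batches = [
    {"train_loss": 2.0, "train_batch_size": 4},
    {"train_loss": 4.0, "train_batch_size": 2},
]

def epoch_loss(batches, key="train_loss", size_key="train_batch_size"):
    """Batch-size-weighted mean of the per-batch losses."""
    total = sum(b[key] * b[size_key] for b in batches)
    count = sum(b[size_key] for b in batches)
    return total / count

weighted = epoch_loss(batches)
print(weighted)
```

For the real object, skorch's history supports slicing such as `gs.best_estimator_.history_[:, 'train_loss']`, which returns the per-epoch values directly and is easier to read than the raw dump.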

## GridSearchCV on cluster

The parameter grid for GridSearchCV on the cluster, in YAML:
```
param_grid:
  module__hid_dim: [64, 128, 256, 512]
  module__dropout: [0, 0.1, 0.2, 0.3, 0.5]
  lr: [0.001, 0.01, 0.05]
  max_epochs: [20, 50, 100, 200]
```

The fitted `GridSearchCV` object (wrapping a `NeuralNetRegressor`, which wraps the MLP) is stored in `models/GridSearchCV` with the other GridSearchCV objects.

The results were slightly worse than those of the random forest regressor. I also tried morgan2048 as input: for the MLP, properties + morgan2048 does better than properties + morgan256, but is still slightly worse than the random forest regressor. These comparisons use only the train+val set; the test set remains unseen.

## Finalize MLP

In this part I'll use the best MLP hyperparameters from GridSearchCV to train the MLP on the train-val split. Note that we can reproduce the train-val split of `ivpk.data.all_datasets` by setting the same random_state in `skorch.dataset.CVSplit()` (renamed `ValidSplit` in later skorch releases).

The resulting MLP will be registered for later evaluation along with the other ML models.

Due to the skorch memory issue, the training will be performed on the cluster.

#### VDss

In [5]:
with open("models/GridSearchCV/VDss_MLP_gridsearch.pkl", "rb") as f:
    model = pickle.load(f)

In [9]:
model.best_estimator_

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=267, out_features=64, bias=True)
      (1): GELU()
      (2): Dropout(p=0.3, inplace=False)
      (3): Linear(in_features=64, out_features=1, bias=True)
    )
  ),
)

In [31]:
model.best_estimator_.batch_size

128

In [7]:
model.best_params_

{'lr': 0.05, 'max_epochs': 50, 'module__dropout': 0.3, 'module__hid_dim': 64}
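
The keys in `best_params_` follow the double-underscore convention that skorch shares with scikit-learn: `module__hid_dim` is forwarded to the module constructor as `hid_dim`, while unprefixed keys such as `lr` configure the net itself. The sketch below only illustrates that routing; `split_params` is a hypothetical helper, not part of skorch.

```python
def split_params(params, prefix="module"):
    """Separate `prefix__name` entries (module kwargs) from net-level kwargs."""
    marker = prefix + "__"
    module_kwargs = {k[len(marker):]: v for k, v in params.items() if k.startswith(marker)}
    net_kwargs = {k: v for k, v in params.items() if not k.startswith(marker)}
    return module_kwargs, net_kwargs

# Best hyperparameters reported above for VDss.
best_params = {"lr": 0.05, "max_epochs": 50, "module__dropout": 0.3, "module__hid_dim": 64}

module_kwargs, net_kwargs = split_params(best_params)
print(module_kwargs)
print(net_kwargs)
```

In practice no splitting is needed: the dict can be unpacked directly when building the final regressor, for example `NeuralNetRegressor(ivpk.models.SimpleRegHead, module__in_dim=267, **model.best_params_)`, with the remaining arguments (criterion, optimizer, batch size) set as in the grid search.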

#### CL

In [38]:
with open("models/GridSearchCV/CL_MLP_gridsearch.pkl", "rb") as f:
    model = pickle.load(f)

In [39]:
model.best_estimator_

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=SimpleRegHead(
    (main): Sequential(
      (0): Linear(in_features=267, out_features=512, bias=True)
      (1): GELU()
      (2): Dropout(p=0.3, inplace=False)
      (3): Linear(in_features=512, out_features=1, bias=True)
    )
  ),
)

In [40]:
model.best_estimator_.batch_size

128

In [41]:
model.best_params_

{'lr': 0.001,
 'max_epochs': 200,
 'module__dropout': 0.3,
 'module__hid_dim': 512}

Next I'll check all saved models to create a leaderboard and report results.