# Skorch Basic usage

*`skorch`* is designed to maximize interoperability between `sklearn` and `pytorch`. The aim is to keep 99% of the flexibility of `pytorch` while being able to leverage most features of `sklearn`. Below, we show the basic usage of `skorch` and how it can be combined with `sklearn`.


This notebook shows you how to use the basic functionality of `skorch`.

### Table of contents

* [Definition of the pytorch module](#Definition-of-the-pytorch-module)
* [Training a classifier](#Training-a-classifier-and-making-predictions)
  * [Dataset](#A-toy-binary-classification-task)
  * [pytorch module](#Definition-of-the-pytorch-classification-module)
  * [Model training](#Defining-and-training-the-neural-net-classifier)
  * [Inference](#Making-predictions,-classification)
* [Training a regressor](#Training-a-regressor)
  * [Dataset](#A-toy-regression-task)
  * [pytorch module](#Definition-of-the-pytorch-regression-module)
  * [Model training](#Defining-and-training-the-neural-net-regressor)
  * [Inference](#Making-predictions,-regression)
* [Saving and loading a model](#Saving-and-loading-a-model)
  * [Whole model](#Saving-the-whole-model)
  * [Only parameters](#Saving-only-the-model-parameters)
* [Usage with an sklearn Pipeline](#Usage-with-an-sklearn-Pipeline)
* [Callbacks](#Callbacks)
* [Grid search](#Usage-with-sklearn-GridSearchCV)
  * [Special prefixes](#Special-prefixes)
  * [Performing a grid search](#Performing-a-grid-search)

In [21]:
#! [ ! -z "$COLAB_GPU" ] && pip install torch skorch

In [22]:
import torch
from torch import nn
import torch.nn.functional as F

In [23]:
torch.manual_seed(0)
torch.cuda.manual_seed(0)

## Training a classifier and making predictions

### A toy binary classification task

We load a toy classification task from `sklearn`.

In [24]:
import numpy as np
from sklearn.datasets import make_classification

In [25]:
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)

In [26]:
X.shape, y.shape, y.mean()

((1000, 20), (1000,), 0.5)

### Definition of the `pytorch` classification `module`

We define a vanilla neural network with two hidden layers. The output layer should have 2 output units since there are two classes. In addition, it should have a softmax nonlinearity, because later, when calling `predict_proba`, the output from the `forward` call will be used.
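To illustrate why the softmax matters, here is a minimal sketch with made-up output-layer scores (the values are purely illustrative). It shows that the softmax turns each row of raw scores into non-negative values that sum to one, which is the form expected from `predict_proba`.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores (logits) for 3 samples and 2 classes.
logits = torch.tensor([[2.0, 0.5], [-1.0, 1.0], [0.3, -0.3]])

# Softmax over the last dimension, as in the module's forward method.
proba = F.softmax(logits, dim=-1)

print(proba.sum(dim=-1))     # every row sums to 1 (up to float precision)
print(proba.argmax(dim=-1))  # index of the most probable class per row
```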

In [27]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super().__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)  # dropout layer applied after the first hidden layer
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        X = F.softmax(self.output(X), dim=-1)
        return X

### Defining and training the neural net classifier

We use `NeuralNetClassifier` because we're dealing with a classification task. The first argument should be the `pytorch module`. As additional arguments, we pass the number of epochs and the learning rate (`lr`), but those are optional.

*Note*: To use the CUDA backend, pass `device='cuda'` as an additional argument.
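If the same notebook should run on machines with and without a GPU, one option is to pick the device at runtime. This is only a small sketch; the variable name `device` is our own choice.

```python
import torch

# Use the GPU when pytorch can see one, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
```

The resulting string can then be passed as `device=device` when constructing the net.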

In [28]:
from skorch import NeuralNetClassifier

In [29]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
#     device='cuda',  # uncomment this to train with CUDA
)

As in `sklearn`, we call `fit` passing the input data `X` and the targets `y`. By default, `NeuralNetClassifier` makes a `StratifiedKFold` split on the data (80/20) to track the validation loss. This is shown, as well as the train loss and the accuracy on the validation set.
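To see what an 80/20 stratified split looks like, the sketch below uses `sklearn`'s `StratifiedKFold` directly. It assumes the internal split behaves like the first split of a five-fold `StratifiedKFold`; the balanced toy labels `y_demo` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical balanced labels: 500 samples per class.
y_demo = np.array([0, 1] * 500)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

# Five stratified folds; the first split holds out one fold (20%).
skf = StratifiedKFold(n_splits=5)
train_idx, valid_idx = next(skf.split(X_demo, y_demo))

print(len(train_idx), len(valid_idx))  # 800 200
print(y_demo[valid_idx].mean())        # 0.5, the class balance is preserved
```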

In [30]:
net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6905       0.6150        0.6749  0.0145
      2        0.6740       0.6200        0.6668  0.0150
      3        0.6594       0.6750        0.6554  0.0147
      4        0.6482       0.6900        0.6452  0.0156
      5        0.6423       0.7050        0.6333  0.0132
      6        0.6231       0.7000        0.6188  0.0153
      7        0.6081       0.7100        0.6064  0.0159
      8        0.6003       0.7000        0.5940  0.0152
      9        0.5937       0.7250        0.5836  0.0139
     10        0.5830       0.7150        0.5725  0.0149
     11        0.5686       0.7100        0.5660  0.014

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)

In [31]:
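# Subclasses that only add a class attribute. `_hidden_neuron_cls` is not read
# by ClassifierModule, so the architecture stays the same as above.
class subClass(ClassifierModule):
    _hidden_neuron_cls = 5

class subsubClass(ClassifierModule):
    _hidden_neuron_cls = 6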

In [32]:
net2 = NeuralNetClassifier(
    subClass,
    max_epochs=20,
    lr=0.1,
#     device='cuda',  # uncomment this to train with CUDA
)

net3 = NeuralNetClassifier(
    subsubClass,
    max_epochs=20,
    lr=0.1,
#     device='cuda',  # uncomment this to train with CUDA
)

In [33]:
net2.fit(X, y)
net3.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6933       0.5700        0.6852  0.0145
      2        0.6855       0.6200        0.6736  0.0174
      3        0.6777       0.6200        0.6663  0.0134
      4        0.6603       0.6450        0.6597  0.0159
      5        0.6560       0.6650        0.6524  0.0159
      6        0.6534       0.6600        0.6465  0.0159
      7        0.6490       0.6650        0.6393  0.0164
      8        0.6305       0.6700        0.6305  0.0146
      9        0.6270       0.6800        0.6202  0.0135
     10        0.6222       0.6750        0.6114  0.0697
     11        0.6110       0.6800        0.6038  0.0150
     12

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=subsubClass(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)

Also, as in `sklearn`, you may call `predict` or `predict_proba` on the fitted model.

### Making predictions, classification

In [34]:
y_pred = net.predict(X[:5])
y_pred

array([0, 0, 0, 0, 0])

In [35]:
y_proba = net.predict_proba(X[:5])
y_proba

array([[0.5603605 , 0.4396395 ],
       [0.782588  , 0.21741197],
       [0.6924924 , 0.3075076 ],
       [0.8895971 , 0.1104029 ],
       [0.70746267, 0.2925373 ]], dtype=float32)
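The relation between the two outputs above can be checked by hand: the predicted class is the column with the highest probability. The sketch below copies the probabilities printed above into a NumPy array and takes the row-wise argmax.

```python
import numpy as np

# Probabilities copied from the predict_proba output above.
y_proba = np.array([
    [0.5603605, 0.4396395],
    [0.782588, 0.21741197],
    [0.6924924, 0.3075076],
    [0.8895971, 0.1104029],
    [0.70746267, 0.2925373],
], dtype=np.float32)

# The predicted class is the index of the largest probability in each row.
y_pred = y_proba.argmax(axis=1)
print(y_pred)  # [0 0 0 0 0]
```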

## Training a regressor

### A toy regression task

In [36]:
from sklearn.datasets import make_regression

In [37]:
X_regr, y_regr = make_regression(1000, 20, n_informative=10, random_state=0)
X_regr = X_regr.astype(np.float32)
y_regr = y_regr.astype(np.float32) / 100
y_regr = y_regr.reshape(-1, 1)

In [38]:
X_regr.shape, y_regr.shape, y_regr.min(), y_regr.max()

((1000, 20), (1000, 1), -6.4901485, 6.154505)

*Note*: Regression currently requires the target to be 2-dimensional, hence the need to reshape. This should be fixed with an upcoming version of pytorch.
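One plausible reason for this requirement (our assumption, not something stated above) is broadcasting: the module returns one column per sample, and subtracting a 1-dimensional target from a 2-dimensional prediction silently produces a full matrix instead of an element-wise difference. A small NumPy sketch with made-up values:

```python
import numpy as np

y_flat = np.array([1.0, 2.0, 3.0], dtype=np.float32)      # shape (3,)
y_col = y_flat.reshape(-1, 1)                              # shape (3, 1)
pred = np.array([[1.0], [2.0], [3.0]], dtype=np.float32)   # shape (3, 1), like the module output

# Matching shapes give an element-wise difference.
print((pred - y_col).shape)   # (3, 1)

# Mismatched shapes broadcast to a 3x3 matrix, which would distort a squared-error loss.
print((pred - y_flat).shape)  # (3, 3)
```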

### Definition of the `pytorch` regression `module`

Again, define a vanilla neural network with two hidden layers. The main difference is that the output layer only has one unit and does not apply a softmax nonlinearity.

In [39]:
class RegressorModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
    ):
        super().__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 1)  # single output unit, no softmax

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        X = self.output(X)
        return X

### Defining and training the neural net regressor

Training a regressor is almost the same as training a classifier. Mainly, we use `NeuralNetRegressor` instead of `NeuralNetClassifier` (this is the same terminology as in `sklearn`).

In [40]:
from skorch import NeuralNetRegressor

In [41]:
net_regr = NeuralNetRegressor(
    RegressorModule,
    max_epochs=20,
    lr=0.1,
#     device='cuda',  # uncomment this to train with CUDA
)

In [42]:
net_regr.fit(X_regr, y_regr)

  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        4.3479        3.0639  0.0383
      2        1.7830        0.5272  0.0148
      3        0.2620        0.1964  0.0170
      4        0.1332        0.1704  0.0163
      5        0.1209        0.1355  0.0168
      6        0.2293        0.5582  0.0182
      7        0.3251        0.1058  0.0177
      8        0.0747        0.0565  0.0156
      9        0.0347        0.0419  0.0168
     10        0.0299        0.0309  0.0156
     11        0.0202        0.0358  0.0167
     12        0.0328        0.0310  0.0168
     13        0.0290        0.0599  0.0173
     14        0.0635        0.0513  0.0170
     15        0.0502        0.0602  0.0164
     16        0.0475        0.0221  0.0161
     17        0.0168

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=RegressorModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=1, bias=True)
  ),
)

### Making predictions, regression

You may call `predict` or `predict_proba` on the fitted model. For regressions, both methods return the same value.

In [43]:
y_pred = net_regr.predict(X_regr[:5])
y_pred

array([[ 0.8809004 ],
       [-1.4545293 ],
       [-0.73125255],
       [-0.21497256],
       [-0.3453628 ]], dtype=float32)

## Saving and loading a model

Save and load either the whole model by using pickle or just the learned model parameters by calling `save_params` and `load_params`.

### Saving the whole model

In [44]:
import pickle

In [45]:
file_name = '/tmp/mymodel.pkl'

In [46]:
with open(file_name, 'wb') as f:
    pickle.dump(net, f)

In [47]:
with open(file_name, 'rb') as f:
    new_net = pickle.load(f)

### Saving only the model parameters

This only saves and loads the proper `module` parameters, meaning that hyperparameters such as `lr` and `max_epochs` are not saved. Therefore, to load the model, we have to re-initialize it beforehand.
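For comparison, the sketch below shows the plain `pytorch` idea of storing only the learned parameters via `state_dict`. It is meant as an analogy, assuming that saving only parameters works along similar lines; the small `nn.Linear` module is a hypothetical stand-in for a trained module, and an in-memory buffer replaces the file.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
module = nn.Linear(20, 2)  # hypothetical stand-in for a trained module

# Store only the learned parameters (weights and biases) in an in-memory buffer.
buffer = io.BytesIO()
torch.save(module.state_dict(), buffer)
buffer.seek(0)

# The architecture is not stored, so a module with the same layout must exist first.
new_module = nn.Linear(20, 2)
new_module.load_state_dict(torch.load(buffer))

X_demo = torch.randn(5, 20)
print(torch.equal(module(X_demo), new_module(X_demo)))  # True
```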

In [48]:
net.save_params(f_params=file_name)  # an open file object also works

In [49]:
# first initialize the model
new_net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
).initialize()

In [50]:
new_net.load_params(file_name)

## Usage with an `sklearn Pipeline`

It is possible to put the `NeuralNetClassifier` inside an `sklearn Pipeline`, as you would with any `sklearn` classifier.

In [51]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [52]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])

In [53]:
pipe.fit(X, y)

Re-initializing module.
Re-initializing criterion.
Re-initializing optimizer.
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6861       0.5700        0.6835  0.0134
      2        0.6831       0.5950        0.6811  0.0140
      3        0.6798       0.5900        0.6782  0.0154
      4        0.6766       0.6050        0.6751  0.0149
      5        0.6751       0.6050        0.6715  0.0154
      6        0.6656       0.6450        0.6663  0.0158
      7        0.6641       0.6250        0.6607  0.0146
      8        0.6532       0.6400        0.6540  0.0162
      9        0.6524       0.6500        0.6474  0.0149
     10        0.6399       0.6650        0.6408  0.0140
    

In [54]:
y_proba = pipe.predict_proba(X[:5])
y_proba

array([[0.36018655, 0.6398134 ],
       [0.694329  , 0.30567098],
       [0.66123945, 0.3387605 ],
       [0.7088052 , 0.2911948 ],
       [0.6906064 , 0.3093936 ]], dtype=float32)

To save the whole pipeline, including the pytorch module, use `pickle`.
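A minimal sketch of such a round trip is shown below. To keep it short and independent of the net defined above, `LogisticRegression` is used as a stand-in for the final step; the names `demo_pipe` and `restored` are our own.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(200, 20, n_informative=10, random_state=0)

# LogisticRegression is only a stand-in for the net step here.
demo_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
demo_pipe.fit(X_demo, y_demo)

# Serialize the fitted pipeline to bytes and restore it.
restored = pickle.loads(pickle.dumps(demo_pipe))
print((restored.predict(X_demo) == demo_pipe.predict(X_demo)).all())  # True
```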

## Callbacks

Adding a new callback to the model is straightforward. Below we show how to add a new callback that computes the area under the ROC curve (AUC).

In [55]:
from skorch.callbacks import EpochScoring

There is a scoring callback in skorch, `EpochScoring`, which we use for this. We have to specify which score to calculate. We have 3 choices:

* Passing a string: This should be a valid `sklearn` metric. For a list of all existing scores, look [here](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics).
* Passing `None`: If you implement your own `.score` method on your neural net, passing `scoring=None` will tell `skorch` to use that.
* Passing a function or callable: If we want to define our own scoring function, we pass a function with the signature `func(model, X, y) -> score`, which is then used.

Note that this works exactly the same as scoring in `sklearn` does.
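As a sketch of the third option, the function below follows the `func(model, X, y) -> score` signature. It is demonstrated with plain `sklearn` (`cross_val_score` and a `LogisticRegression` stand-in) to show that the same convention applies there; the function name `accuracy_scorer` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def accuracy_scorer(model, X, y):
    """Hypothetical custom scorer: fraction of correct predictions."""
    return float(np.mean(model.predict(X) == y))


X_demo, y_demo = make_classification(300, 20, n_informative=10, random_state=0)

# sklearn accepts the callable wherever a scoring string is accepted.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=3, scoring=accuracy_scorer)
print(scores)
```

A callable like this could be passed to `EpochScoring` through its `scoring` argument in place of a string.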

For our case here, since `sklearn` already implements AUC, we just pass the correct string `'roc_auc'`. We should also tell the callback that higher scores are better (to get the correct colors printed below -- by default, lower scores are assumed to be better). Furthermore, we may specify a `name` argument for `EpochScoring`, and whether to use training data (by setting `on_train=True`) or validation data (which is the default).

In [56]:
auc = EpochScoring(scoring='roc_auc', lower_is_better=False)

Finally, we pass the scoring callback to the `callbacks` parameter as a list and then call `fit`. Notice that we get the printed scores and color highlighting for free.

In [57]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    callbacks=[auc],
)

In [58]:
net.fit(X, y)

  epoch    roc_auc    train_loss    valid_acc    valid_loss     dur
-------  ---------  ------------  -----------  ------------  ------
      1     0.7071        0.6925       0.6150        0.6777  0.0146
      2     0.7544        0.6772       0.6450        0.6673  0.0129
      3     0.7713        0.6729       0.6800        0.6572  0.0149
      4     0.7810        0.6666       0.6900        0.6496  0.0136
      5     0.7865        0.6490       0.6850        0.6383  0.0148
      6     0.7930        0.6538       0.6900        0.6301  0.0151
      7     0.7903        0.6335       0.6900        0.6170  0.0152
      8     0.7881        0.6131       0.6900        0.6022  0.0161
      9     0.7891        0.6049       0.7000        0.5900  0.0137


<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)

For information on how to write custom callbacks, have a look at the [Advanced_Usage](https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Advanced_Usage.ipynb) notebook.

## Usage with sklearn `GridSearchCV`

### Special prefixes

The `NeuralNet` class allows you to directly access parameters of the `pytorch module` by using the `module__` prefix. For example, if you defined the `module` to have a `num_units` parameter, you can set it via the `module__num_units` argument. This is exactly the same logic that allows you to access estimator parameters in `sklearn Pipeline`s and `FeatureUnion`s.

This feature is useful in several ways. For one, it allows you to set those parameters in the model definition. Furthermore, it allows you to set parameters in an `sklearn GridSearchCV` as shown below.

In addition to the parameters prefixed by `module__`, you may access a couple of other attributes, such as those of the optimizer by using the `optimizer__` prefix (again, see below). All those special prefixes are stored in the `prefixes_` attribute:

In [59]:
print(', '.join(net.prefixes_))

iterator_train, iterator_valid, callbacks, dataset, module, criterion, optimizer


### Performing a grid search

Below we show how to perform a grid search over the learning rate (`lr`), the module's number of hidden units (`module__num_units`), the module's dropout rate (`module__dropout`), and whether the SGD optimizer should use Nesterov momentum or not (`optimizer__nesterov`).
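To get a feel for the size of such a search, the sketch below counts the combinations with `sklearn`'s `ParameterGrid`, using two candidate values per parameter. With 3-fold cross-validation, each combination is fitted three times.

```python
from sklearn.model_selection import ParameterGrid

params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}

# Every combination of the listed values is one candidate.
n_candidates = len(ParameterGrid(params))
print(n_candidates)      # 16
print(n_candidates * 3)  # 48 fits with cv=3
```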

In [60]:
from sklearn.model_selection import GridSearchCV

In [61]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    optimizer__momentum=0.9,
    verbose=0,
    train_split=False,
)

*Note*: We set the verbosity level to zero (`verbose=0`) to prevent too much print output from being shown. Also, we disable the skorch-internal train-validation split (`train_split=False`) because `GridSearchCV` already splits the training data for us. We only have to leave the skorch-internal split enabled for some specific uses, e.g. to perform `EarlyStopping`.

In [62]:
params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}

In [63]:
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)

In [64]:
gs.fit(X, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.

In [65]:
print(gs.best_score_, gs.best_params_)

0.8699897502292712 {'lr': 0.1, 'module__dropout': 0, 'module__num_units': 20, 'optimizer__nesterov': False}


Of course, we could further nest the `NeuralNetClassifier` within an `sklearn Pipeline`, in which case we just prefix the parameter by the name of the net (e.g. `net__module__num_units`).