# About 

This notebook demonstrates neural network (NN) classifiers provided by the __Reproducible experiment platform (REP)__ package. <br /> REP contains wrappers for the following NN libraries:
* __theanets__
* __neurolab__ 
* __pybrain__ 


### In this notebook we show how to:
* train a classifier
* get predictions
* measure quality
* use pretraining and partial fitting
* combine classifiers using meta-algorithms

Most of this is done in the same way as for other classifiers (see the notebook [01-howto-Classifiers.ipynb](https://github.com/yandex/rep/blob/master/howto/01-howto-Classifiers.ipynb)).

# Loading data

In [1]:
import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score

sig_data = pandas.read_csv('toy_datasets/toyMC_sig_mass.csv', sep='\t')
bck_data = pandas.read_csv('toy_datasets/toyMC_bck_mass.csv', sep='\t')

labels = numpy.array([1] * len(sig_data) + [0] * len(bck_data))
data = pandas.concat([sig_data, bck_data])

### First rows of our data

In [2]:
data[:5]

| Unnamed: 0 | CDF1 | CDF2 | CDF3 | DOCAone | DOCAthree | DOCAtwo | FlightDistance | FlightDistanceError | Hlt1Dec | Hlt2Dec | ... | p1_IP | p1_IPSig | p1_Laura_IsoBDT | p1_pt | p2_IP | p2_IPSig | p2_Laura_IsoBDT | p2_pt | peakingbkg | pt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 0.111337 | 0.012695 | 0.123426 | 162.650955 | 0.870942 | 0 | 0 | ... | 11.314665 | 83.196968 | -0.223668 | 699.066467 | 9.799975 | 64.790207 | -0.121159 | 521.628174 |  | 220.742111 |
| 1 | 0.759755 | 0.597375 | 0.389256 | 0.021781 | 0.094551 | 0.088421 | 4.193265 | 1.26228 | 0 | 0 | ... | 0.72007 | 7.237868 | -0.256142 | 587.628935 | 0.882111 | 8.834325 | -0.20322 | 532.67995 |  | 661.208843 |
| 2 | 1.0 | 0.796142 | 0.566286 | 0.011852 | 0.0044 | 0.009153 | 1.58061 | 0.261697 | 0 | 0 | ... | 0.362181 | 4.173097 | -0.252788 | 802.746495 | 0.42729 | 5.008959 | -0.409469 | 674.122342 |  | 1290.963982 |
| 3 | 0.716397 | 0.524712 | 0.279033 | 0.015171 | 0.0839 | 0.069127 | 7.884569 | 1.310151 | 0 | 0 | ... | 0.753449 | 6.615949 | -0.25355 | 564.203857 | 0.917409 | 8.695459 | -0.192284 | 537.791687 |  | 692.654175 |
| 4 | 1.0 | 0.996479 | 0.888159 | 0.005547 | 0.070438 | 0.064689 | -2.267649 | 0.139555 | 0 | 0 | ... | 0.589455 | 21.869143 | -0.254778 | 746.624928 | 0.388996 | 8.465344 | -0.217319 | 988.539221 |  | 1328.83784 |


### Splitting into train and test

In [3]:
# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.5)
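
To make the splitting step concrete, here is a minimal sketch of a random 50/50 split on toy data. The helper `split_half` is hypothetical and written only for illustration; it is not the implementation behind `rep.utils.train_test_split`, which may differ in details such as shuffling and handling of `pandas` objects.

```python
import random

def split_half(rows, labels, train_size=0.5, seed=0):
    """Shuffle row indices once and cut them into a train part and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # fixed seed gives a reproducible split
    n_train = int(round(train_size * len(indices)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    pick = lambda seq, idx: [seq[i] for i in idx]
    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# toy data (not the notebook's dataset)
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = [1, 1, 1, 0, 0, 0]
train_rows, test_rows, train_y, test_y = split_half(rows, labels)
```

Rows and labels are selected with the same indices, so every event keeps its own label after the split.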

# Neural nets

All nets inherit from __sklearn.base.BaseEstimator__ and have the same interface as the other wrappers in REP (see **01-howto-Classifiers** for details).

All of these neural network libraries **support**:

* classification
* multi-classification
* regression
* multi-target regression
* additional fitting (using `partial_fit` method)

and **don't support**:

* staged prediction methods
* weights for data

# Variables used in training

In [4]:
variables = ["FlightDistance", "FlightDistanceError", "IP", "VertexChi2", 
             "pt", "p0_pt", "p1_pt", "p2_pt", 'LifeTime','dira']

# Theanets

In [5]:
from rep.estimators import TheanetsClassifier
print TheanetsClassifier.__doc__

Classifier from Theanets library. 

    Parameters:
    -----------
    :param features: list of features to train model
    :type features: None or list(str)
    :param layers: a sequence of values specifying the **hidden** layer configuration for the network.
        For more information please see 'Specifying layers' in theanets documentation:
        http://theanets.readthedocs.org/en/latest/creating.html#creating-specifying-layers
        Note that theanets "layers" parameter included input and output layers in the sequence as well.
    :type layers: sequence of int, tuple, dict
    :param int input_layer: size of the input layer. If equals -1, the size is taken from the training dataset
    :param int output_layer: size of the output layer. If equals -1, the size is taken from the training dataset
    :param str hidden_activation: the name of an activation function to use on hidden network layers by default
    :param str output_activation: the name of an activation function to u

### Simple training

In [6]:
tn = TheanetsClassifier(features=variables, layers=[20], 
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1}])

tn.fit(train_data, train_labels)



TheanetsClassifier(decode_from=1,
          features=['FlightDistance', 'FlightDistanceError', 'IP', 'VertexChi2', 'pt', 'p0_pt', 'p1_pt', 'p2_pt', 'LifeTime', 'dira'],
          hidden_activation='logistic', hidden_dropouts=0, hidden_noise=0,
          input_dropouts=0, input_layer=-1, input_noise=0, layers=[20],
          output_activation='linear', output_layer=-1, random_state=42,
          scaler=StandardScaler(copy=True, with_mean=True, with_std=True),
          trainers=[{'learning_rate': 0.1, 'optimize': 'nag'}])

### Predicting probabilities, measuring the quality

In [7]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob

[[ 0.04263913  0.95736087]
 [ 0.91032442  0.08967558]
 [ 0.97220917  0.02779083]
 ..., 
 [ 0.73230451  0.26769549]
 [ 0.91418271  0.08581729]
 [ 0.9888496   0.0111504 ]]


In [8]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

ROC AUC 0.906091572446
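
As a reminder of what this number means: ROC AUC equals the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event. The sketch below computes it by brute force on toy values. The function `roc_auc_pairwise` is a hypothetical helper for illustration only; `roc_auc_score` from `sklearn` is what the notebook uses, and it is far more efficient.

```python
def roc_auc_pairwise(labels, scores):
    """Share of (signal, background) pairs where the signal scores higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# toy labels and scores (not taken from the notebook's data)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc_pairwise(toy_labels, toy_scores)   # 3 of 4 pairs are ordered correctly -> 0.75
```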


### Theanets multistage training 

In some cases we need to continue training, for example when new data becomes available or the current trainer no longer makes progress.

For this purpose there is the `partial_fit` method, which continues training with a different trainer or with different data.

In [9]:
tn = TheanetsClassifier(features=variables, layers=[10, 10], 
                        trainers=[{'optimize': 'rprop'}])

tn.fit(train_data, train_labels)
print('training complete')

training complete


####  Second stage of fitting

In [10]:
tn.partial_fit(train_data, train_labels, optimize='adadelta')

TheanetsClassifier(decode_from=1,
          features=['FlightDistance', 'FlightDistanceError', 'IP', 'VertexChi2', 'pt', 'p0_pt', 'p1_pt', 'p2_pt', 'LifeTime', 'dira'],
          hidden_activation='logistic', hidden_dropouts=0, hidden_noise=0,
          input_dropouts=0, input_layer=-1, input_noise=0, layers=[10, 10],
          output_activation='linear', output_layer=-1, random_state=42,
          scaler=StandardScaler(copy=True, with_mean=True, with_std=True),
          trainers=[{'optimize': 'rprop'}, {'optimize': 'adadelta'}])

In [11]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob

[[ 0.0256754   0.9743246 ]
 [ 0.96961642  0.03038358]
 [ 0.97005419  0.02994581]
 ..., 
 [ 0.60414954  0.39585046]
 [ 0.55504215  0.44495785]
 [ 0.9699952   0.0300048 ]]


In [12]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

ROC AUC 0.901973367888
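
The idea behind multistage training can be illustrated without REP. The sketch below is a toy one-feature logistic regression, not a theanets network. The class `TinyLogistic` and its parameters are hypothetical and exist only for this illustration. What it shares with the cells above is the contract: `fit` starts from scratch, while `partial_fit` continues from the current weights and may use different training settings.

```python
import math

class TinyLogistic(object):
    """One-feature logistic regression trained by full-batch gradient descent."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def _prob(self, x):
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

    def loss(self, xs, ys):
        eps = 1e-12                          # guards against log(0)
        return -sum(y * math.log(self._prob(x) + eps) +
                    (1 - y) * math.log(1 - self._prob(x) + eps)
                    for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs, ys, learning_rate=0.1, epochs=50):
        self.w, self.b = 0.0, 0.0            # fit always starts from scratch
        return self.partial_fit(xs, ys, learning_rate, epochs)

    def partial_fit(self, xs, ys, learning_rate=0.1, epochs=50):
        for _ in range(epochs):              # continues from the current weights
            grad_w = sum((self._prob(x) - y) * x for x, y in zip(xs, ys)) / len(xs)
            grad_b = sum((self._prob(x) - y) for x, y in zip(xs, ys)) / len(xs)
            self.w -= learning_rate * grad_w
            self.b -= learning_rate * grad_b
        return self

# toy data (not the notebook's dataset)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

model = TinyLogistic().fit(xs, ys, learning_rate=0.1, epochs=50)
loss_stage_1 = model.loss(xs, ys)
model.partial_fit(xs, ys, learning_rate=0.5, epochs=50)   # second stage, different settings
loss_stage_2 = model.loss(xs, ys)
```

On this toy problem the training loss after the second stage is lower than after the first. As the two ROC AUC values above show, a lower training loss does not guarantee a better score on test data.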


### Predictions of classes

In [13]:
tn.predict(test_data)

array([1, 0, 0, ..., 0, 0, 0])
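
The predicted labels are consistent with picking the most probable class in each row of `predict_proba`. The sketch below assumes that rule; it has not been checked against the REP source. The helper `labels_from_proba` is hypothetical.

```python
def labels_from_proba(proba_rows):
    """Return, for each row of class probabilities, the index of the largest one."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in proba_rows]

# first three rows of the probability output printed after the second training stage
proba_rows = [[0.0256754, 0.9743246],
              [0.96961642, 0.03038358],
              [0.97005419, 0.02994581]]
predicted = labels_from_proba(proba_rows)   # [1, 0, 0], matching the start of the array above
```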

## Neurolab

In [14]:
from rep.estimators import NeurolabClassifier
print NeurolabClassifier.__doc__

Classifier from neurolab library. 

    Parameters:
    -----------
    :param features: features used in training
    :type features: list[str] or None
    :param list[int] layers: sequence, number of units inside each **hidden** layer.
    :param string net_type: type of network
        One of 'feed-forward', 'single-layer', 'competing-layer', 'learning-vector',
        'elman-recurrent', 'hopfield-recurrent', 'hemming-recurrent'
    :param initf: layer initializers
    :type initf: anything implementing call(layer), e.g. nl.init.* or list[nl.init.*] of shape [n_layers]
    :param trainf: net train function, default value depends on type of network
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param random_state: ignored, added for uniformity.
    :param dict kwargs: additional arguments to net __init__, varies with different net_types

    .. seealso:: https://pythonhosted.org/neur

### Let's train a network using the Rprop algorithm

In [17]:
import neurolab
nl = NeurolabClassifier(features=variables, layers=[10], epochs=40, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')


### After training a neural network, you can still improve it by using partial fit on other data:
```
nl.partial_fit(new_train_data, new_train_labels)
```


### Predict probabilities and estimate quality

In [None]:
# predict probabilities for each class
prob = nl.predict_proba(test_data)
print prob

In [None]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

In [None]:
# predict labels
nl.predict(test_data)

## Pybrain

In [None]:
from rep.estimators import PyBrainClassifier
print PyBrainClassifier.__doc__

In [None]:
pb = PyBrainClassifier(features=variables, layers=[10, 2], hiddenclass=['TanhLayer', 'SigmoidLayer'])
pb.fit(train_data, train_labels)
print('training complete')

### Predict probabilities and estimate quality
As before, we could continue training on a new dataset:
```
pb.partial_fit(new_train_data, new_train_labels)
```


In [None]:
prob = pb.predict_proba(test_data)
print 'ROC AUC:', roc_auc_score(test_labels, prob[:, 1])

### Predict labels

In [None]:
pb.predict(test_data)

## Scaling of features
Initial scaling of features is frequently crucial for getting reasonable results from neural networks.

By default, all the networks use `StandardScaler` from `sklearn`, but you can use any other transformer, such as `MinMaxScaler` or one you wrote yourself, by passing it as the `scaler` parameter. All the networks support the `scaler` parameter in the same way.

In [None]:
from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)
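
To show what the two scalers do to a single feature, here is a plain-Python sketch (Python 3, standard library only). The helpers `standard_scale` and `min_max_scale` are hypothetical stand-ins written for illustration; in the notebook the scaling is done by the `sklearn` transformers themselves.

```python
import statistics

def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)          # constant columns (std == 0) are not handled here
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

pt = [220.7, 661.2, 1291.0, 692.7, 1328.8]   # rounded "pt" values from the first rows of the data
pt_standard = standard_scale(pt)             # zero mean, unit variance
pt_min_max = min_max_scale(pt)               # values between 0 and 1
```

Either way, features measured on very different scales (for example `pt` in the hundreds and `FlightDistanceError` around one) end up in comparable ranges before they reach the network.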

# Advantages of common interface

Let's build an ensemble of neural networks using the bagging meta-algorithm.

## Bagging over Theanets classifier (same can be done with any neural network)
In practice, one will need __many__ networks to get predictions better than those obtained with a single network.

In [None]:
from sklearn.ensemble import BaggingClassifier

base_tn = TheanetsClassifier(layers=[20], trainers=[{'min_improvement': 0.01}])
bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=3)
bagging_tn.fit(train_data[variables], train_labels)
print('training complete')

In [None]:
prob = bagging_tn.predict_proba(test_data[variables])
print 'AUC', roc_auc_score(test_labels, prob[:, 1])
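
For readers unfamiliar with bagging, the sketch below shows the mechanism on toy one-dimensional data: each base learner is trained on its own bootstrap sample, and the ensemble combines their outputs. The base learner here is a simple threshold rather than a neural network, and all function names are hypothetical. It also uses vote share as the probability, which is a simplification of how `BaggingClassifier` combines base estimators.

```python
import random

def fit_threshold(xs, ys):
    """Base learner: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0

def fit_bagging(xs, ys, n_estimators=3, seed=0):
    """Train each base learner on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n_estimators:
        idx = [rng.randrange(len(xs)) for _ in xs]
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        if 0 in sample_y and 1 in sample_y:   # skip samples that miss a class
            thresholds.append(fit_threshold(sample_x, sample_y))
    return thresholds

def predict_proba_bagging(thresholds, x):
    """Probability of class 1 = share of base learners voting for class 1."""
    return sum(1.0 for t in thresholds if x > t) / len(thresholds)

# toy data (not the notebook's dataset)
xs = [0.0, 0.2, 0.5, 0.8, 1.0, 3.0, 3.2, 3.5, 3.8, 4.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ensemble = fit_bagging(xs, ys, n_estimators=3)
```

Because every base learner sees a slightly different sample, their thresholds differ, and combining them smooths out the variance of any single learner.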

# Other advantages of common interface
There are many things you can do with neural networks now: 
* clone them
* get / set parameters as dictionaries
* use `grid_search` to play with sizes of hidden layers and other parameters
* build pipelines (`sklearn.pipeline`)
* use hierarchical training or training on subsets
* pass them over the network, train classifiers on other machines, or do distributed learning of ensembles
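
The first few items of this list rest on one convention: an estimator is fully described by its constructor parameters. The sketch below mimics that convention with a stand-in class. `ToyNet` and this `clone` are hypothetical and written only for illustration; with real REP classifiers you would use the estimator's own `get_params` / `set_params` and `sklearn.base.clone`.

```python
class ToyNet(object):
    """Stand-in estimator that only stores its constructor parameters."""

    def __init__(self, layers=(10,), learning_rate=0.1):
        self.layers = layers
        self.learning_rate = learning_rate

    def get_params(self):
        return {'layers': self.layers, 'learning_rate': self.learning_rate}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError('unknown parameter: %s' % name)
            setattr(self, name, value)
        return self

def clone(estimator):
    """Build a fresh, untrained copy that has the same parameters."""
    return type(estimator)(**estimator.get_params())

base = ToyNet(layers=(20,))
net_copy = clone(base)
net_copy.set_params(learning_rate=0.01)      # does not touch the original

# a bare-bones parameter grid: one fresh clone per candidate setting
grid = [clone(base).set_params(layers=layers) for layers in [(10,), (20,), (10, 10)]]
```

Grid search, pipelines and meta-algorithms such as bagging all rely on being able to copy and reconfigure an estimator in this way.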


And you can replace classifiers at any moment.