# Stacking

This notebook shows how to use stacking in a reasonably straightforward manner. Keep in mind that stacking can quickly become computationally expensive.

Note that stacking may sometimes fail to improve the overall performance of your model. As elsewhere in Data Science / Machine Learning, there is no "one magic recipe", just a number of good tools that may or may not work for your problem!

## The data

Our dataset corresponds to a set of attributes for customers of an online retail website and whether the customer returned to the shop after their purchases were recorded. We want to predict if the customer will return or not.

Load the dataset from `data/online_retail.csv`. You will need to set the column `CustomerID` as the index. Display the dataset with `.head()`.

The column `has_returned` needs to be converted to integer: convert False to 0 and True to 1:
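As a minimal sketch of this conversion, the snippet below uses a small made-up frame in place of the real dataset (the `toy` name and its values are purely illustrative). It relies on pandas' `astype(int)`, which maps `False` to 0 and `True` to 1.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration)
toy = pd.DataFrame(
    {"has_returned": [True, False, True]},
    index=pd.Index([101, 102, 103], name="CustomerID"),
)

# astype(int) maps False -> 0 and True -> 1
toy["has_returned"] = toy["has_returned"].astype(int)
print(toy["has_returned"].tolist())
```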

Get the features `X` and the target `y` (the `has_returned` column):

Use `train_test_split` from `sklearn.model_selection` to create X_train, X_test, y_train and y_test:
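The split could look like the sketch below. It runs on synthetic data from `make_classification` rather than the retail dataset, and the `_demo` names, the 25% test size and the stratification are assumptions chosen for illustration, not requirements of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data. With the retail DataFrame you would
# typically take y from the `has_returned` column and X from the other columns.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows for testing, keeping the class balance in both parts
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)
print(X_train_demo.shape, X_test_demo.shape)
```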

## Baseline RF classifier

To get a quick baseline, fit a random forest (RF) classifier with default settings (set `random_state=0` so that everyone gets the same results) and use `classification_report` to investigate how well the model is doing (don't forget to import the relevant libraries).
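For reference, the general shape of such a baseline is sketched here on synthetic data (the `_demo`, `X_tr` and `rf_demo` names are illustrative). Your own numbers on the retail dataset will differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the retail data
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Default settings, fixed seed for reproducibility
rf_demo = RandomForestClassifier(random_state=0)
rf_demo.fit(X_tr, y_tr)

# Precision, recall and F1 per class on the held-out rows
report = classification_report(y_te, rf_demo.predict(X_te))
print(report)
```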

In [None]:
# your code here to fit a RF classifier

# your code here to display a report on the model


## Stacking classifiers

Here you are going to use `mlxtend` (which you can install easily via `pip`). 

The key class in the library that you will use is `StackingCVClassifier` from `mlxtend.classifier`. You can look up [the documentation](http://rasbt.github.io/mlxtend/) to review the API but the basic usage is of the form:

```python
stack = StackingCVClassifier(classifiers=[clf1, clf2, clf3],
                             meta_classifier=clfmeta,
                             use_probas=True,
                             use_clones=False,
                             cv=5)
```

Here `classifiers` takes a list of `sklearn` classifiers and `meta_classifier` is the classifier used at the higher level. `use_probas=True` feeds the meta learner the probabilities output by the base classifiers instead of the predicted classes, which should provide more granular information to learn from. `use_clones=False` makes the stack use the classifier instances we provide instead of working on copies (so that we can easily access the individual classifiers). Finally, `cv` is the number of cross-validation folds used to train the model.

In this example, consider two base classifiers:

* KNN (with 2 neighbors)
* RF (with 100 estimators)

As a secondary classifier, consider a decision tree with a depth of 4. A simple tree should be enough since the meta learner only sees a handful of features: with `use_probas=True` and two classes, each of the two base classifiers contributes two probability columns, so four features in total.

Train the stack the usual `sklearn` way (`mlxtend` implements `fit` and `predict` just like `sklearn`), show the classification report, and compare it with the baseline.

In [None]:
# import the relevant libraries


In [None]:
# define the classifiers


In [None]:
# fit the stack and show the results
# you might have to pass X and y as numpy array
# as mlxtend can struggle with pandas DataFrame


## Cross validating a stack

`StackingCVClassifier` comes with its own built-in cross-validation: for a given training set, it runs cross-validation under the hood so that the meta learner is fitted on predictions made for parts of the dataset that were not used to train the first layer. This gives us confidence that the model is properly trained.
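To make the idea concrete, here is a hand-rolled sketch of out-of-fold stacking using only scikit-learn. It is an illustration of the principle on synthetic data, not `mlxtend`'s actual implementation, and every name in it is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

base_models = [
    KNeighborsClassifier(n_neighbors=2),
    RandomForestClassifier(n_estimators=100, random_state=0),
]

# Out-of-fold probabilities: each row is predicted by a model
# that was fitted on the other folds and never saw that row
oof = [
    cross_val_predict(model, X_demo, y_demo, cv=5, method="predict_proba")
    for model in base_models
]
meta_features = np.hstack(oof)  # 2 models x 2 classes -> 4 columns

# The meta learner is trained on those out-of-fold probabilities
meta_model = DecisionTreeClassifier(max_depth=4, random_state=0)
meta_model.fit(meta_features, y_demo)

# Base models are then refitted on all rows for use at prediction time
for model in base_models:
    model.fit(X_demo, y_demo)

print(meta_features.shape)
```

Because the meta learner never sees predictions made on rows a base model was trained on, it cannot simply learn to trust an overfitted base classifier.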

Now in order to compare hyperparameters, we still need to run cross-validation on top of it, so we don't choose parameters that simply do well on a specific train set that we have. To do so we can use `GridSearchCV` as we usually do. The key difference is that you have to pay attention to how parameters are named. The convention is:

```
nameofclassifier__parameter
meta-nameofclassifier__parameter
```

For example:

* `kneighborsclassifier__n_neighbors`

You can get the list of names given to your classifiers by checking the attributes `named_meta_classifier` and `named_classifiers` on your stacking object.
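The example name above matches the lowercased class name of the estimator. Assuming that pattern holds, the sketch below builds grid keys following the convention stated above; `stack_name` is a hypothetical helper, not part of `mlxtend`. The exact prefix used for the meta classifier may differ between `mlxtend` versions, so it is worth comparing the keys against `stack.get_params().keys()` on your own stacking object.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def stack_name(estimator):
    """Hypothetical helper: lowercased class name, as in `kneighborsclassifier`."""
    return type(estimator).__name__.lower()


knn, rf, meta = KNeighborsClassifier(), RandomForestClassifier(), DecisionTreeClassifier()

# Keys follow `nameofclassifier__parameter` and `meta-nameofclassifier__parameter`
# (assumption: verify against stack.get_params().keys() for your mlxtend version)
grid_keys = [
    f"{stack_name(knn)}__n_neighbors",
    f"{stack_name(rf)}__max_depth",
    f"meta-{stack_name(meta)}__max_depth",
]
print(grid_keys)
```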

The parameters you may want to tune are:

* number of neighbors for the KNN
* max_depth and min_samples_split of random forest
* max_depth of meta classifier

(watch out this may take some time)
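The overall search workflow can be sketched with scikit-learn's own `StackingClassifier`, used here as a stand-in so the example does not depend on `mlxtend`. Its parameter names (`knn__...`, `final_estimator__...`) follow scikit-learn's convention and are not the `mlxtend` names described above. The data are synthetic and the grid values are arbitrary and deliberately small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# scikit-learn's stacking estimator (a swapped-in analogue of the mlxtend stack)
sk_stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=2)),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=DecisionTreeClassifier(max_depth=4, random_state=0),
    stack_method="predict_proba",
    cv=3,  # inner folds used to build the meta learner's training data
)

# Small illustrative grid: every extra value multiplies the number of fits
grid = {
    "knn__n_neighbors": [2, 5],
    "rf__max_depth": [3, None],
    "final_estimator__max_depth": [2, 4],
}

# Outer cross-validation compares the hyperparameter combinations
search = GridSearchCV(sk_stack, grid, cv=3, scoring="accuracy")
search.fit(X_demo, y_demo)
print(search.best_params_)

# Apply the best parameters to the original stack and retrain it
sk_stack.set_params(**search.best_params_).fit(X_demo, y_demo)
```

Even this tiny grid trains 8 combinations over 3 outer folds, each with its own inner cross-validation, which is why the search gets slow quickly on real data.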

In [None]:
# add your code for GCV


What are your best parameters?

Retrain your stack with those parameters. You can use `set_params` to set the parameters of the different models in your stack.

Check the classification report:

In [None]:
# add your code here


### Plotting the meta learner (Decision Tree)

In [None]:
from IPython.display import Image  
from sklearn import tree
import pydotplus

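# With use_clones=False, `stack.meta_classifier` is the fitted tree itself.
# The feature names must follow the order of the `classifiers` list
# (two probability columns per base classifier); adjust them to match your stack.
dot_data = tree.export_graphviz(stack.meta_classifier,
                                out_file=None,
                                filled=True,
                                rounded=True,
                                proportion=True,
                                special_characters=True,
                                feature_names=["RF_P0", "RF_P1", "KNN_P0", "KNN_P1"])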

graph = pydotplus.graph_from_dot_data(dot_data)  
Image(graph.create_png())