# Model Ensembles

In this notebook we implement a stacking ensemble of models for the online shoppers dataset.


## The Code / Stacking

Our code is saved in the same folder as this report in Desicion_Tree.py, and the data set is in the same location as "online_shoppers_intention.csv". We created a set of helper functions to build our model.

In [1]:
from Desicion_Tree import *
from sklearn import model_selection
from sklearn.model_selection import GridSearchCV

In [2]:
data_frame_os = read_data_return_frame("online_shoppers_intention.csv")

In [3]:
x, y, class_names, feature_names = preprocess_df(data_frame_os)

The key idea of stacking is to train multiple classifiers and then combine them using a meta-learner or a voting mechanism (Module CEGE0004 Week-06 Lecture Slides).
In this case we use the classifiers below:
   * Decision tree classifier that we defined earlier
   * Naive Bayes classifiers
       * Bernoulli
       * Gaussian
       * Multinomial
   * KNeighborsClassifier
   * MLPClassifier
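
For comparison with the voting mechanism used in this notebook, the meta-learner variant can be sketched with scikit-learn's `StackingClassifier`. This is only an illustration under assumptions: the data is synthetic (`make_classification`), the two base models and the `LogisticRegression` meta-learner are arbitrary choices, and all `*_demo` names are hypothetical rather than part of our code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the online shoppers dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

# Base classifiers produce cross-validated predictions,
# which the final estimator (the meta-learner) is trained on
stack_demo = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier(max_depth=5, random_state=42)),
                ('gnb', GaussianNB())],
    final_estimator=LogisticRegression(),
    cv=5)
stack_demo.fit(Xd_train, yd_train)

stack_test_accuracy = stack_demo.score(Xd_test, yd_test)
print('stacking test accuracy:', stack_test_accuracy)
```

The meta-learner learns how much to trust each base classifier, whereas a voting ensemble weights them equally unless weights are supplied by hand.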
    
Let's start with the definition of these classifiers.

In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

In [5]:
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size = 0.25, random_state = 42)

## Decision Tree

In [6]:
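# Use the class imported in cell 4 rather than relying on a `tree` name from the star import
clf = DecisionTreeClassifier(criterion='gini')
# Hyperparameter values to try in the grid search
params = {'max_depth': [5, 6, 7, 8, None],
          'min_samples_leaf': [5, 6, 7, 8],
          'min_samples_split': [5, 6, 7, 8, 9]}
grid_search = GridSearchCV(estimator=clf, param_grid=params, scoring='f1')
grid_search.fit(x_train, y_train)
# Best tree found, refit on the whole training set
gcv_desc_tr = grid_search.best_estimator_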
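
To get a feel for the cost of this search, the number of candidate trees can be counted with `ParameterGrid`. This is a small sketch: `tree_params` copies the grid above, and `n_folds = 5` is an assumption based on the default of recent scikit-learn versions, since the cell does not set `cv`.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the decision tree cell
tree_params = {'max_depth': [5, 6, 7, 8, None],
               'min_samples_leaf': [5, 6, 7, 8],
               'min_samples_split': [5, 6, 7, 8, 9]}

# Every combination of the three lists is one candidate: 5 * 4 * 5
n_candidates = len(ParameterGrid(tree_params))

# Assumption: GridSearchCV uses its default of 5 folds here
n_folds = 5
n_fits = n_candidates * n_folds
print(n_candidates, 'candidates,', n_fits, 'cross-validation fits')
```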

## Naive Bayes Classifiers

We define a function that builds the different Naive Bayes classifiers.

In [7]:
def create_NB_classifier(x_train, y_train, classifier):
    """Return a Naive Bayes classifier; 'Multinomial' and 'Bernoulli' are tuned on alpha."""
    # Candidate smoothing values for the grid search
    param_grid = [{'alpha': [0.1, 0.5, 1.0, 1.5, 5, 10]}]

    if classifier == 'Gaussian':
        # GaussianNB has no alpha, so it is returned untuned and unfitted
        return GaussianNB()

    if classifier == 'Multinomial':
        estimator = MultinomialNB()
    elif classifier == 'Bernoulli':
        estimator = BernoulliNB()
    else:
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown classifier: {classifier}")

    grid_search = GridSearchCV(estimator, param_grid, cv=5, verbose=2, scoring='f1')
    grid_search.fit(x_train, y_train)
    return grid_search.best_estimator_

Now we create each Naive Bayes classifier.

In [8]:
%%capture
gcv_multiN = create_NB_classifier(x_train,y_train,'Multinomial')
gcv_bern = create_NB_classifier(x_train,y_train,'Bernoulli')
clf_gaus = create_NB_classifier(x_train,y_train,'Gaussian');

## KNeighborsClassifier

In [9]:
clf_Knn = KNeighborsClassifier(n_neighbors=1, metric='cosine', weights = 'uniform')

## Neural Network Classifier

We use the MLP classifier from sklearn, since a PyTorch model does not plug directly into the scikit-learn voting ensemble used here.

In [10]:
mlp = MLPClassifier(max_iter=20)
param_grid = {'hidden_layer_sizes': [(64, 64), (64, 16), (50, 50), (32, 32)],
              'activation': ["logistic", "relu", "tanh", "identity"]}
grid_search = GridSearchCV(mlp, param_grid=param_grid, cv=5, verbose=2, scoring='f1')
grid_search.fit(x_train, y_train)
clf_mlp = grid_search.best_estimator_

Fitting 5 folds for each of 16 candidates, totalling 80 fits

[CV] END ...activation=logistic, hidden_layer_sizes=(64, 64); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 64); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 64); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 64); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 64); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(50, 50); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END ...activation=logistic, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END ...activation=logistic, hidden_layer_sizes=(50, 50); total time=   0.7s
[CV] END ...activation=logistic, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END ...activation=logistic, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END ...activation=logistic, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END ...activation=logistic, hidden_layer_sizes=(32, 32); total time=   0.4s
[CV] END ...activation=logistic, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END ...activation=logistic, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 64); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 64); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 64); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 64); total time=   0.5s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 64); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 16); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 16); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 16); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END .......activation=relu, hidden_layer_sizes=(64, 16); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END .......activation=relu, hidden_layer_sizes=(50, 50); total time=   0.5s
[CV] END .......activation=relu, hidden_layer_sizes=(50, 50); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(50, 50); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(50, 50); total time=   0.3s
[CV] END .......activation=relu, hidden_layer_sizes=(32, 32); total time=   0.4s
[CV] END .......activation=relu, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END .......activation=relu, hidden_layer_sizes=(32, 32); total time=   0.3s
[CV] END .......activation=relu, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END .......activation=relu, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 64); total time=   0.8s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 64); total time=   0.8s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 64); total time=   0.8s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 64); total time=   0.8s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 64); total time=   0.9s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 16); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 16); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 16); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END .......activation=tanh, hidden_layer_sizes=(64, 16); total time=   0.5s
[CV] END .......activation=tanh, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(50, 50); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(50, 50); total time=   0.7s
[CV] END .......activation=tanh, hidden_layer_sizes=(50, 50); total time=   0.7s
[CV] END .......activation=tanh, hidden_layer_sizes=(50, 50); total time=   0.7s
[CV] END .......activation=tanh, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END .......activation=tanh, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END .......activation=tanh, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END .......activation=tanh, hidden_layer_sizes=(32, 32); total time=   0.6s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 64); total time=   0.5s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 64); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 64); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 64); total time=   0.2s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 64); total time=   0.5s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 16); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 16); total time=   0.3s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 16); total time=   0.3s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 16); total time=   0.3s
[CV] END ...activation=identity, hidden_layer_sizes=(64, 16); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(50, 50); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(50, 50); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(50, 50); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(50, 50); total time=   0.5s
[CV] END ...activation=identity, hidden_layer_sizes=(50, 50); total time=   0.5s
[CV] END ...activation=identity, hidden_layer_sizes=(32, 32); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(32, 32); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(32, 32); total time=   0.4s
[CV] END ...activation=identity, hidden_layer_sizes=(32, 32); total time=   0.5s
[CV] END ...activation=identity, hidden_layer_sizes=(32, 32); total time=   0.5s

### VotingClassifier

We now define the voting mechanism by using the VotingClassifier of scikit-learn.

In [11]:
from sklearn.ensemble import VotingClassifier

voting_clf = VotingClassifier(
    estimators=[('dtr', gcv_desc_tr),
                ('nb_MN', gcv_multiN),
                ('nb_BE', gcv_bern),
                ('nb_GA', clf_gaus),
                ('MLP', clf_mlp),
                ('Knn', clf_Knn)],
    voting='hard')

To use this voting classifier we provide a list of (id, classifier) pairs and a voting parameter, which can be set to 'hard' or 'soft'. With 'hard', the ensemble predicts the majority class label among the classifiers; with 'soft', it predicts the class with the largest sum of predicted probabilities (the argmax of the summed probabilities).
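
The difference between the two voting rules can be shown by hand on a single sample. The sketch below uses made-up probabilities for three hypothetical classifiers (the `probas` values are illustrative, not outputs of our models) and picks a case where hard and soft voting disagree.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one sample
# (rows: classifiers, columns: class 0 and class 1)
probas = np.array([[0.45, 0.55],
                   [0.40, 0.60],
                   [0.90, 0.10]])

# Hard voting: each classifier votes for its most likely class,
# and the class with the most votes wins
votes = probas.argmax(axis=1)                           # votes are [1, 1, 0]
hard_label = np.bincount(votes, minlength=2).argmax()   # class 1 wins 2 to 1

# Soft voting: sum the probabilities per class and take the argmax
summed = probas.sum(axis=0)                             # [1.75, 1.25]
soft_label = summed.argmax()                            # class 0 wins

print('hard:', hard_label, 'soft:', soft_label)
```

Two lukewarm votes for class 1 win under hard voting, while one very confident classifier tips soft voting towards class 0. Soft voting also requires every estimator to implement `predict_proba`.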

We then fit the voting classifier. (Module CEGE0004 Week-06 Lecture Slides)

In [12]:
voting_clf.fit(x_train, y_train)



VotingClassifier(estimators=[('dtr',
                              DecisionTreeClassifier(min_samples_leaf=5,
                                                     min_samples_split=7)),
                             ('nb_MN', MultinomialNB(alpha=0.1)),
                             ('nb_BE', BernoulliNB()), ('nb_GA', GaussianNB()),
                             ('MLP',
                              MLPClassifier(hidden_layer_sizes=(50, 50),
                                            max_iter=20)),
                             ('Knn',
                              KNeighborsClassifier(metric='cosine',
                                                   n_neighbors=1))])

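We also need to fit each classifier independently, because VotingClassifier fits clones of its estimators and leaves the originals untouched. The grid-searched models are already fitted (GridSearchCV refits the best estimator by default), so the loop mainly matters for clf_gaus and clf_Knn; clf_mlp is left out for the same reason.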

In [13]:
for clf in (gcv_desc_tr, gcv_multiN, gcv_bern, clf_gaus, clf_Knn):
    clf.fit(x_train, y_train)
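
The reason a separate fit is needed can be checked directly: `VotingClassifier.fit` trains copies of the estimators it is given. The sketch below illustrates this on synthetic data; `is_fitted` is a hypothetical helper written for this example, and the `*_demo` names are not part of our code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.exceptions import NotFittedError
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Synthetic stand-in data (assumption: not the online shoppers dataset)
X_demo, y_demo = make_classification(n_samples=200, random_state=0)

gnb_demo = GaussianNB()
knn_demo = KNeighborsClassifier(n_neighbors=3)
vote_demo = VotingClassifier(
    estimators=[('gnb', gnb_demo), ('knn', knn_demo)], voting='hard')
vote_demo.fit(X_demo, y_demo)

# The object we passed in is still unfitted ...
original_fitted = is_fitted(gnb_demo)
# ... while the clone stored inside the ensemble is fitted
clone_fitted = is_fitted(vote_demo.named_estimators_['gnb'])
print('original fitted:', original_fitted, '| clone fitted:', clone_fitted)
```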

### Measures

In [14]:
from sklearn.metrics import accuracy_score

for clf in (gcv_desc_tr, gcv_multiN, gcv_bern, clf_gaus, clf_mlp, clf_Knn, voting_clf):
    print(clf.__class__.__name__)
    y_pred = clf.predict(x_train)
    print('\ttrain:', accuracy_score(y_train, y_pred))
    y_pred = clf.predict(x_test)
    print('\ttest:', accuracy_score(y_test, y_pred))

DecisionTreeClassifier
	train: 0.9087271547528928
	test: 0.7894907557573791
MultinomialNB
	train: 0.7261814642586785
	test: 0.7252675964969186
BernoulliNB
	train: 0.8458959662593274
	test: 0.840739539409666
GaussianNB
	train: 0.7621931437222883
	test: 0.7632176451508271
MLPClassifier
	train: 0.8465448253487617
	test: 0.8410638987998702
KNeighborsClassifier
	train: 0.9998918568184276
	test: 0.7654881608822576
VotingClassifier
	train: 0.8699037525684006
	test: 0.8413882581900746


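The voting ensemble reaches 84.1% accuracy on the test data (0.8414), marginally the highest of all models. The closest individual classifiers are the MLP (0.8411) and Bernoulli Naive Bayes (0.8407), so the ensemble brings almost no gain over its best members. The decision tree and the 1-nearest-neighbour classifier overfit noticeably, with training accuracies of 0.909 and nearly 1.000 against test accuracies of 0.789 and 0.765.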
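
The grid searches in this notebook were scored on F1, while the table above reports accuracy. A possible way to report both side by side is sketched below. This is an illustration only: `report_scores` is a hypothetical helper, the data is a synthetic imbalanced stand-in, and the numbers it prints say nothing about the online shoppers results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB


def report_scores(models, x_tr, y_tr, x_te, y_te):
    """Hypothetical helper: fit each model, return (name, accuracy, f1) on the test split."""
    rows = []
    for model in models:
        model.fit(x_tr, y_tr)
        y_pred = model.predict(x_te)
        rows.append((model.__class__.__name__,
                     accuracy_score(y_te, y_pred),
                     # zero_division=0 avoids a warning if no positives are predicted
                     f1_score(y_te, y_pred, zero_division=0)))
    return rows


# Synthetic, imbalanced stand-in data (assumption: roughly 15% positives)
X_demo, y_demo = make_classification(
    n_samples=500, weights=[0.85, 0.15], random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

score_rows = report_scores([BernoulliNB(), GaussianNB()],
                           Xd_train, yd_train, Xd_test, yd_test)
for name, acc, f1 in score_rows:
    print(f"{name}\taccuracy: {acc:.3f}\tf1: {f1:.3f}")
```

On imbalanced data a model can score high accuracy by favouring the majority class, so the two columns can rank the classifiers differently.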

### References

Lipani, A. (2021). Module CEGE0004 Week-05 Practical Material. UCL (University College London).