## Introduction to Ensemble Methods
* The objective of ensemble methods is to combine the predictions of several base estimators (Linear Regression, Decision Tree, etc.) into a single model that generalizes better than any one of them.
* There are two types of ensemble methods:
    * Averaging methods: build several estimators independently and average their predictions. Example: RandomForest.
    * Boosting methods: build base estimators sequentially, each fitted on a re-weighted version of the data that gives more weight to the samples the previous estimators misclassified. Example: AdaBoost.

<img src='https://github.com/taruntiwarihp/raw_images/blob/master/ada.png?raw=true'>

### RandomForest
* A single decision tree tends to overfit, i.e. it has high variance.
* RandomForest is an averaging ensemble method whose prediction is a function (vote or average) of the predictions of 'n' decision trees.

<img src='https://github.com/taruntiwarihp/raw_images/blob/master/rf.png?raw=true'>

### RandomForest Algorithm
* The data consists of R rows and M features.
* A sample of the training rows is drawn (a bootstrap sample, i.e. sampling with replacement).
* A random subset of the features is selected (in scikit-learn, a new subset is drawn at every split).
* The two steps above are repeated to grow as many trees as configured (n_estimators).
* For classification, the final prediction is the majority vote of the trees.
* For regression, the final prediction is the mean (or median) of the individual tree predictions.
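The steps above can be sketched by hand with scikit-learn decision trees. This is a minimal illustration, not how RandomForestClassifier is implemented internally: the helpers fit_forest and predict_forest are hypothetical, the 25 trees and the random_state values are arbitrary, and scikit-learn's own RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by the hard majority vote used here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees decision trees, each on its own bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw R row indices with replacement.
        idx = rng.integers(0, n_rows, size=n_rows)
        # max_features="sqrt": each split only looks at a random subset of ~sqrt(M) features.
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Majority vote: the most frequent class label across the trees, per sample."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


digits = load_digits()
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target, random_state=0)
trees = fit_forest(trainX, trainY)
accuracy = (predict_forest(trees, testX) == testY).mean()
print("hand-rolled forest accuracy:", accuracy)
```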

### Objectives
1. Understanding Ensemble Methods
2. Bagging vs Boosting
3. Recap of RandomForest 
4. AdaBoost
5. VotingClassifier
6. GBT

### Ensemble Methods
* Base estimators: simple estimators such as Linear Regression, Decision Trees, Nearest Neighbours and Naive Bayes.
* Ensemble methods: combine several base estimators (of the same or of different types) into a single predictor.
* For example, a decision tree is the base estimator and RandomForest is the ensemble method built from it.
* Ensemble methods typically produce more robust models than any single base estimator.

### Types of Ensemble Methods
* Bagging: build several estimators independently (each on a bootstrap sample of the data) and average their predictions. Example: RandomForest.
* Boosting:
    * Every training sample carries a weight.
    * Initially, all samples have the same weight.
    * A sample's weight tells how important it is to classify that sample correctly.
    * Each new model is trained with higher weights on the samples that the previous model misclassified.
    * For prediction, all the weak classifiers are consulted and their votes are combined, weighted by how well each classifier performed during training.
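To contrast the two families on the digits data used in this notebook, here is a minimal scikit-learn sketch. The settings (50 estimators, max_depth=3 for the boosted trees, random_state=0) are arbitrary illustration choices, and no particular scores are implied. BaggingClassifier is used for the bagging side because it is the generic bagging ensemble.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target, random_state=0)

# The base estimator is passed positionally: its keyword is `base_estimator` in older
# scikit-learn releases and `estimator` from version 1.2 onwards.

# Bagging: 50 full-depth trees, each trained independently on a bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: 50 shallow trees trained one after another on re-weighted data.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0)

scores = {}
for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(trainX, trainY)
    scores[name] = model.score(testX, testY)
    print(name, scores[name])
```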

In [92]:
from sklearn.ensemble import AdaBoostClassifier,GradientBoostingClassifier,VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

### AdaBoost
* Boosting in general builds a model from the training data, then creates a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been added.
* AdaBoost was one of the first practical boosting algorithms.
* AdaBoost can be used for both classification and regression (AdaBoostClassifier and AdaBoostRegressor in scikit-learn).

#### Algorithm
* The core concept of AdaBoost is to fit weak learners (e.g. shallow decision trees) sequentially on repeatedly re-weighted versions of the data.
* Initially, every sample is assigned an equal weight (1/N for N samples).
* A base estimator is fitted on the weighted data, and the estimator itself receives a weight that grows as its weighted error shrinks.
* Weights of misclassified samples are increased and weights of correctly classified samples are decreased.
* The last two steps are repeated until all samples are classified correctly or the configured maximum number of estimators (n_estimators) is reached.
* Making predictions: the predictions of all estimators are combined through a weighted majority vote (or weighted sum) to produce the final prediction.
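The steps above can be traced in a from-scratch sketch of discrete AdaBoost for two classes. It is a minimal illustration under stated assumptions: labels are -1/+1, there is a single numeric feature, the weak learner is a one-threshold decision stump, and each stump's weight uses the classic formula alpha = 0.5 * ln((1 - error) / error). The function names are hypothetical. scikit-learn's AdaBoostClassifier instead uses multi-class variants (SAMME / SAMME.R).

```python
import math


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best one-threshold stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # The stump predicts `polarity` when x >= threshold, otherwise -polarity.
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, threshold, polarity) stumps."""
    n = len(xs)
    weights = [1.0 / n] * n  # initially every sample has the same weight
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # lower error -> bigger say in the vote
        ensemble.append((alpha, threshold, polarity))
        # Increase weights of misclassified samples, decrease the others, then renormalise.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = polarity if x >= threshold else -polarity
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
        if error <= 1e-10:  # training data perfectly classified: stop early
            break
    return ensemble


def predict_adaboost(ensemble, x):
    """Weighted majority vote: sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
model = adaboost(xs, ys, n_rounds=3)
print([predict_adaboost(model, x) for x in xs])  # → [1, 1, -1, -1, 1, 1]
```

No single stump gets more than 4 of the 6 points right, but three re-weighted stumps combined by a weighted vote classify all of them.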

In [2]:
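# AdaBoost with a full-depth decision tree as the base estimator (the default is a depth-1 stump).
# A full-depth tree can fit the training set perfectly, in which case boosting may stop after one round.
# scikit-learn >= 1.2 renames `base_estimator` to `estimator`; the old name was removed in 1.4.
adaboost = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100)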

In [4]:
digits = load_digits()

In [19]:
digits.data

array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ..., 10.,  0.,  0.],
       [ 0.,  0.,  0., ..., 16.,  9.,  0.],
       ...,
       [ 0.,  0.,  1., ...,  6.,  0.,  0.],
       [ 0.,  0.,  2., ..., 12.,  0.,  0.],
       [ 0.,  0., 10., ..., 12.,  1.,  0.]])

In [18]:
digits.data[5][3]

10.0

In [15]:
digits.target

2

### Split the data

In [34]:
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target)

In [35]:
adaboost.fit(trainX, trainY)

AdaBoostClassifier(base_estimator=RandomForestClassifier(), n_estimators=100)

In [36]:
adaboost.score(testX, testY)

0.9733333333333334

In [37]:
# AdaBoost with a RandomForest as the base estimator
adaboost = AdaBoostClassifier(base_estimator=RandomForestClassifier(n_estimators=100), n_estimators=100)

In [38]:
adaboost.fit(trainX, trainY)

AdaBoostClassifier(base_estimator=RandomForestClassifier(), n_estimators=100)

In [39]:
adaboost.score(testX, testY)

0.9733333333333334

In [40]:
# baseline: a single RandomForest without boosting
rf = RandomForestClassifier(n_estimators=100)

In [41]:
rf.fit(trainX, trainY)

RandomForestClassifier()

In [42]:
rf.score(testX, testY)

0.9733333333333334

In [43]:
# AdaBoost with LogisticRegression as the base estimator
adaboost = AdaBoostClassifier(n_estimators=10, base_estimator=LogisticRegression())

In [44]:
adaboost.fit(trainX, trainY)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

AdaBoostClassifier(base_estimator=LogisticRegression(), n_estimators=10)

In [45]:
adaboost.score(testX,testY) 

0.9355555555555556

In [46]:
# baseline: LogisticRegression on its own, without boosting
lr = LogisticRegression()

In [47]:
lr.fit(trainX, trainY)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression()

In [48]:
lr.score(testX, testY)

0.9577777777777777
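The ConvergenceWarning printed above comes from the default lbfgs solver stopping at its iteration limit, and the warning text itself suggests scaling the data or raising max_iter. A minimal sketch doing both follows. max_iter=1000 and random_state=0 are arbitrary choices, and because the split differs from the run above, the score is not directly comparable.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

digits = load_digits()
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target, random_state=0)

# Pixel values range from 0 to 16; MinMaxScaler maps each feature into [0, 1].
# max_iter=1000 (the default is 100) gives the lbfgs solver more room to converge.
lr_scaled = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
lr_scaled.fit(trainX, trainY)
scaled_score = lr_scaled.score(testX, testY)
print(scaled_score)
```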

### GradientBoostingClassifier
* A machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
* For ordinary least-squares linear regression with an intercept, the residuals always sum to 0.
* Residuals can be seen as the mistakes committed by the model.
* Tree-based models give no such guarantee, but if the residuals still show a pattern (e.g. their sum is not 0), that pattern can be leveraged to improve the model.
* So the intuition behind gradient boosting is to repeatedly fit a new weak model to the pattern left in the residuals and add it to the ensemble, until the residuals show no more pattern.
* Algorithmically, each new tree is fitted to the negative gradient of the loss function with respect to the current predictions (for squared-error loss this is exactly the residuals), so the training loss decreases step by step; the number of trees and the learning rate control how closely the model fits and should be tuned on held-out data.
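A from-scratch sketch of this idea for regression with squared-error loss, where the negative gradient is simply the residual. Assumptions: one numeric feature, regression stumps as the weak learners, 50 rounds and a learning rate of 0.1 (both arbitrary). The helper names are hypothetical, and this is not how GradientBoostingClassifier is implemented; for classification it fits trees to the gradient of the log-loss.

```python
def fit_regression_stump(xs, targets):
    """Least-squares stump: one threshold, predicting the mean target on each side."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # skip the smallest x so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss with regression stumps."""
    base = sum(ys) / len(ys)  # F0: the constant that minimises squared error
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step (learning_rate) in the direction of the fitted residuals.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_gb(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump(x) for stump in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 6.0, 6.0]
model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [model[0]] * len(ys))  # constant prediction only
boosted_mse = mse(ys, [predict_gb(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Because each stump is a least-squares fit to the current residuals and the step size is below 1, every round can only lower the training error.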

In [52]:
gbt = GradientBoostingClassifier(n_estimators=100)

In [53]:
gbt.fit(trainX, trainY)

GradientBoostingClassifier()

In [54]:
gbt.score(testX, testY)

0.9555555555555556

### VotingClassifier
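* So far, each ensemble has used a single type of base estimator.
* VotingClassifier combines different types of estimators.
* Hard voting: each estimator predicts a class label and the majority wins (the estimators can still be given weights).
* Soft voting: the estimators' predicted class probabilities are averaged (optionally weighted) and the class with the highest average wins; every estimator must implement predict_proba, which is why SVC is created with probability=True.
* How do we figure out the best combination of weights? One option is a grid search, as done with GridSearchCV below.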
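The two voting rules can be sketched in plain Python. The sketch mirrors the decision rule VotingClassifier applies (weighted count of labels for hard voting, weighted average of probabilities for soft voting). The three classifiers and their probabilities are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter


def hard_vote(labels, weights):
    """Weighted majority vote over predicted class labels (ties go to the smallest label)."""
    tally = Counter()
    for label, weight in zip(labels, weights):
        tally[label] += weight
    return max(sorted(tally), key=tally.get)


def soft_vote(probas, weights):
    """Class with the highest weighted average predicted probability."""
    n_classes = len(probas[0])
    total = sum(weights)
    avg = [sum(w * p[c] for p, w in zip(probas, weights)) / total for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


# Predicted probabilities for classes [0, 1] from three hypothetical classifiers.
probas = [
    [0.45, 0.55],  # classifier A: leans towards class 1
    [0.40, 0.60],  # classifier B: leans towards class 1
    [0.95, 0.05],  # classifier C: very confident in class 0
]
labels = [max(range(2), key=lambda c: p[c]) for p in probas]  # [1, 1, 0]

print(hard_vote(labels, [1, 1, 1]))  # 1: two votes against one
print(soft_vote(probas, [1, 1, 1]))  # 0: average probabilities are [0.6, 0.4]
print(hard_vote(labels, [1, 1, 3]))  # 0: classifier C's vote now counts three times
```

Hard and soft voting can disagree: soft voting lets one very confident estimator outweigh two lukewarm ones.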

In [75]:
estimators = [
    ('rf', RandomForestClassifier(n_estimators=20)),
    ('svc', SVC(kernel='rbf', probability=True)),
    ('knc', KNeighborsClassifier()),
    ('abc', AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=20)),
    ('lr', LogisticRegression())
]

In [76]:
vc = VotingClassifier(estimators=estimators, voting='hard')

In [77]:
vc.fit(trainX,trainY)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


VotingClassifier(estimators=[('rf', RandomForestClassifier(n_estimators=20)),
                             ('svc', SVC(probability=True)),
                             ('knc', KNeighborsClassifier()),
                             ('abc',
                              AdaBoostClassifier(base_estimator=DecisionTreeClassifier(),
                                                 n_estimators=20)),
                             ('lr', LogisticRegression())])

In [78]:
vc.estimators_

[RandomForestClassifier(n_estimators=20),
 SVC(probability=True),
 KNeighborsClassifier(),
 AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=20),
 LogisticRegression()]

In [79]:
vc.estimators

[('rf', RandomForestClassifier(n_estimators=20)),
 ('svc', SVC(probability=True)),
 ('knc', KNeighborsClassifier()),
 ('abc',
  AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=20)),
 ('lr', LogisticRegression())]

In [80]:
for (name, _), est in zip(vc.estimators, vc.estimators_):
    print(name, est.score(testX, testY))

rf 0.9622222222222222
svc 0.9888888888888889
knc 0.9866666666666667
abc 0.8422222222222222
lr 0.9577777777777777


In [81]:
vc.score(testX, testY)

0.9822222222222222

In [82]:
vc.predict(testX[:2])

array([9, 8])

In [84]:
estimators = [
    ('rf', RandomForestClassifier(n_estimators=20)),
    ('svc', SVC(kernel='rbf', probability=True)),
    ('knc', KNeighborsClassifier()),
    ('lr', LogisticRegression())
]

In [93]:
vc = VotingClassifier(estimators=estimators, voting='soft', weights=[3, 5, 4, 2])

In [94]:
vc.fit(trainX, trainY)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


VotingClassifier(estimators=[('rf', RandomForestClassifier(n_estimators=20)),
                             ('svc', SVC(probability=True)),
                             ('knc', KNeighborsClassifier()),
                             ('lr', LogisticRegression())],
                 voting='soft', weights=[3, 5, 4, 2])

In [95]:
for (name, _), est in zip(vc.estimators, vc.estimators_):
    print(name, est.score(testX, testY))

rf 0.9577777777777777
svc 0.9888888888888889
knc 0.9866666666666667
lr 0.9577777777777777


In [96]:
vc.score(testX, testY)

0.9888888888888889

### Hyperparameter Tuning

* n_estimators: number of trees; more trees generally help (with diminishing returns) but cost more compute.
* max_features: maximum number of features considered when splitting a node. A common choice for classification is sqrt(n_features); for regression, scikit-learn's RandomForestRegressor uses all n_features by default.
* n_jobs: set to -1 to use all CPU cores.

#### Advantages
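* Little data cleaning is required (e.g. no feature scaling), and some implementations handle missing values natively.
* Works well with high-dimensional datasets.
* Reduces variance, which especially helps high-variance base models such as deep decision trees.
* RandomForest reports feature importances (feature_importances_), so the important features can be identified and used for model training.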
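The hyperparameters and the feature-importance advantage listed above can be seen together in a short sketch on the digits data. n_estimators=200 and random_state=0 are arbitrary choices. scikit-learn's impurity-based feature_importances_ can favour high-cardinality features; sklearn.inspection.permutation_importance is an alternative.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # more trees: usually a small gain, at extra compute cost
    max_features="sqrt",  # number of features considered at each split
    n_jobs=-1,            # use all CPU cores
    random_state=0,
)
rf.fit(trainX, trainY)
print(rf.score(testX, testY))

# One impurity-based importance per feature (the 64 pixels); the values sum to 1.
top5 = rf.feature_importances_.argsort()[::-1][:5]
print("most important pixel indices:", top5)
```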

In [97]:
gs = GridSearchCV(vc, param_grid={'weights':[[3,4,5,2],[3,5,4,2]]},cv=5)

In [98]:
gs.fit(trainX,trainY)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

GridSearchCV(cv=5,
             estimator=VotingClassifier(estimators=[('rf',
                                                     RandomForestClassifier(n_estimators=20)),
                                                    ('svc',
                                                     SVC(probability=True)),
                                                    ('knc',
                                                     KNeighborsClassifier()),
                                                    ('lr',
                                                     LogisticRegression())],
                                        voting='soft', weights=[3, 5, 4, 2]),
             param_grid={'weights': [[3, 4, 5, 2], [3, 5, 4, 2]]})

In [99]:
gs.best_params_

{'weights': [3, 4, 5, 2]}

In [100]:
gs.best_score_

0.9888558446922758

### Pipeline 

In [103]:
pipeline = make_pipeline(MinMaxScaler(), AdaBoostClassifier(n_estimators=100, base_estimator=DecisionTreeClassifier()))

In [104]:
pipeline.fit(trainX, trainY)

Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                ('adaboostclassifier',
                 AdaBoostClassifier(base_estimator=DecisionTreeClassifier(),
                                    n_estimators=100))])

In [105]:
gs = GridSearchCV(pipeline, param_grid = {'adaboostclassifier__base_estimator':[DecisionTreeClassifier(), LogisticRegression()]}, cv=5) 

In [106]:
gs.fit(trainX, trainY)

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
                                       ('adaboostclassifier',
                                        AdaBoostClassifier(base_estimator=DecisionTreeClassifier(),
                                                           n_estimators=100))]),
             param_grid={'adaboostclassifier__base_estimator': [DecisionTreeClassifier(),
                                                                LogisticRegression()]})

In [107]:
gs.best_params_

{'adaboostclassifier__base_estimator': LogisticRegression()}

In [108]:
gs.best_score_

0.9139033457249071
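The grid search above works because make_pipeline names each step after its lowercased class name and nested parameters are addressed as step__parameter. The base-estimator parameter is called estimator in scikit-learn 1.2 and later and base_estimator in older releases. The sketch below picks whichever name the installed version exposes. Scaling is irrelevant for trees, since splits are unaffected by monotonic rescaling of a feature, but it helps LogisticRegression converge, which is likely why no ConvergenceWarning appears in the pipeline run above. The settings (10 estimators, depth-3 trees, cv=3, max_iter=1000) are arbitrary, chosen to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
trainX, testX, trainY, testY = train_test_split(digits.data, digits.target, random_state=0)

pipeline = make_pipeline(
    MinMaxScaler(),
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=10),
)

# Pick the parameter name the installed scikit-learn version understands.
params = pipeline.get_params()
if "adaboostclassifier__estimator" in params:
    key = "adaboostclassifier__estimator"
else:
    key = "adaboostclassifier__base_estimator"

gs = GridSearchCV(
    pipeline,
    param_grid={key: [DecisionTreeClassifier(max_depth=3), LogisticRegression(max_iter=1000)]},
    cv=3,
)
gs.fit(trainX, trainY)
print(gs.best_params_, gs.best_score_)
```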