# <font color="orange">AdaBoost Classification (ensemble method)</font>
<ul>
<li><p><strong>AdaBoost or Adaptive Boosting</strong> is an ensemble boosting classifier proposed by Yoav Freund and Robert Schapire in 1996.</p>
</li>
<li><p>It combines multiple weak classifiers to achieve higher accuracy than a single weak classifier.</p>
</li>
<li><p>AdaBoost is an iterative ensemble method: it builds a strong, high-accuracy classifier by combining multiple poorly performing classifiers.</p>
</li>
<li><p>The basic concept behind AdaBoost is to set the weights of the classifiers and of the training samples in each iteration so that unusual, hard-to-classify observations are predicted accurately.</p>
</li>
<li><p>Any machine learning algorithm can be used as the base classifier, provided it accepts weights on the training set.</p>
</li>
<li><p><strong>AdaBoost</strong> should meet two conditions:</p>
<ol>
<li><p>The classifier should be trained iteratively on various weighted training examples.</p>
</li>
<li><p>In each iteration, it tries to provide an excellent fit for these examples by minimizing training error.</p>
</li>
</ol>
</li>
</ul>
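
The requirement that the base classifier accept weights on the training set can be checked before building an ensemble. The snippet below is a minimal sketch, assuming scikit-learn's `has_fit_parameter` helper and three candidate estimators chosen here purely for illustration; it reports whether each estimator's `fit` method takes a `sample_weight` argument.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter

# Illustrative candidates (an assumption of this sketch, not a list from the text).
candidates = {
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=1),
    "SVC": SVC(kernel="linear"),
    "KNeighborsClassifier": KNeighborsClassifier(),
}

# A base classifier is usable for boosting only if fit() accepts sample_weight.
supports_weights = {
    name: has_fit_parameter(estimator, "sample_weight")
    for name, estimator in candidates.items()
}

for name, accepted in supports_weights.items():
    print(f"{name}: accepts sample_weight = {accepted}")
```

An estimator whose `fit` does not take `sample_weight` cannot be reweighted between boosting rounds, so it is not a suitable base classifier in this setting.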

<ul>
<li><p>To build an AdaBoost classifier, imagine that as a first base classifier we train a Decision Tree algorithm to make predictions on our training data.</p>
</li>
<li><p>Now, following the methodology of AdaBoost, the weight of the misclassified training instances is increased.</p>
</li>
<li><p>The second classifier is then trained on the data with the updated weights, and the procedure is repeated over and over again.</p>
</li>
<li><p>After each model's predictions, the weights of the misclassified instances are boosted so that the next model does a better job on them, and so on.</p>
</li>
<li><p>AdaBoost adds predictors to the ensemble one at a time, gradually making it better. The main disadvantage of this algorithm is that training cannot be parallelized, since each predictor can only be trained after the previous one has been trained and evaluated.</p>
</li>
<li><p>Below are the steps for performing the AdaBoost algorithm:</p>
<ol>
<li><p>Initially, all observations are given equal weights.</p>
</li>
<li><p>A model is built on a subset of data.</p>
</li>
<li><p>Using this model, predictions are made on the whole dataset.</p>
</li>
<li><p>Errors are calculated by comparing the predictions and actual values.</p>
</li>
<li><p>While creating the next model, higher weights are given to the data points which were predicted incorrectly.</p>
</li>
<li><p>Weights can be determined using the error value: the higher the error, the greater the weight assigned to the observation.</p>
</li>
<li><p>This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.</p>
</li>
</ol>
</li>
</ul>
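
The steps above can be sketched from scratch for the two-class case. This is a minimal illustration under stated assumptions, not the scikit-learn implementation: labels are encoded as -1/+1, the weak learner is a depth-1 decision tree, each model is fit on the full training set with sample weights (rather than on a resampled subset), and the helper names `fit_adaboost` and `predict_adaboost` as well as the synthetic dataset are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost sketch for labels in {-1, +1} using decision stumps."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # all observations start equal
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)       # weak learner sees the weights
        miss = stump.predict(X) != y                 # compare predictions to labels
        err = np.sum(weights[miss])                  # weighted training error
        if err >= 0.5:                               # no better than chance: stop
            break
        err = max(err, 1e-10)                        # avoid division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)      # vote of this weak learner
        # Raise the weights of misclassified points, lower the others.
        weights *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        weights /= weights.sum()                     # renormalize to sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_adaboost(stumps, alphas, X):
    """Weighted vote of all weak learners; the sign gives the class."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


# Synthetic two-class data (an assumption made for this illustration).
X_demo, y01 = make_classification(n_samples=400, n_features=10, random_state=0)
y_demo = 2 * y01 - 1                                 # map {0, 1} to {-1, +1}
X_train, y_train = X_demo[:300], y_demo[:300]
X_test, y_test = X_demo[300:], y_demo[300:]

stumps, alphas = fit_adaboost(X_train, y_train, n_rounds=20)
train_acc = np.mean(predict_adaboost(stumps, alphas, X_train) == y_train)
test_pred = predict_adaboost(stumps, alphas, X_test)
test_acc = np.mean(test_pred == y_test)
print(f"rounds used: {len(stumps)}, train accuracy: {train_acc:.2f}, "
      f"test accuracy: {test_acc:.2f}")
```

Because each round needs the weights produced by the previous one, the loop is inherently sequential, which is the parallelization limit mentioned above.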


<img src="../../../img/image_3_nwa5zf.webp">


In [10]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as PLT, cm as CMAP
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn import metrics
from sklearn.svm import SVC

In [2]:
DT = datasets.load_digits()

In [3]:
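# Flatten each 8x8 digit image into a 64-element feature vector.
X = DT.images.reshape(len(DT.images),-1)
# Target labels: the digit (0-9) shown in each image.
Y = DT.target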

In [6]:
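# With no base estimator given, AdaBoostClassifier uses a depth-1 decision tree (a decision stump).
adaboost_classifier = AdaBoostClassifier(n_estimators=1000)
# Train on the first 1000 samples; the remaining 797 are held out for testing.
adaboost_classifier.fit(X[:1000],Y[:1000])

predicted = adaboost_classifier.predict(X[1000:])
expected = Y[1000:]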

report = metrics.classification_report(expected,predicted)
print(report)

              precision    recall  f1-score   support

           0       0.89      0.96      0.93        79
           1       0.00      0.00      0.00        80
           2       0.00      0.00      0.00        77
           3       0.13      0.97      0.22        79
           4       0.00      0.00      0.00        83
           5       0.00      0.00      0.00        82
           6       0.47      0.34      0.39        80
           7       0.00      0.00      0.00        80
           8       0.00      0.00      0.00        76
           9       0.76      0.38      0.51        81

    accuracy                           0.26       797
   macro avg       0.22      0.27      0.20       797
weighted avg       0.22      0.26      0.20       797





In [50]:
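# Use a linear-kernel SVC as the base classifier instead of the default decision stump.
# Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`; use that name on newer versions.
adaboost_svc_classifier = AdaBoostClassifier(base_estimator=SVC(probability=True,kernel='linear'),n_estimators=500)
adaboost_svc_classifier.fit(X[:1000],Y[:1000])

AdaBoostClassifier(base_estimator=SVC(kernel='linear', probability=True),
                   n_estimators=500)

In [51]:
predicted = adaboost_svc_classifier.predict(X[1000:])
expected = Y[1000:]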

report = metrics.classification_report(expected,predicted)
print(report)

              precision    recall  f1-score   support

           0       1.00      0.99      0.99        79
           1       0.96      0.89      0.92        80
           2       1.00      0.99      0.99        77
           3       0.97      0.84      0.90        79
           4       0.98      0.95      0.96        83
           5       0.92      0.99      0.95        82
           6       0.99      0.99      0.99        80
           7       0.94      0.96      0.95        80
           8       0.89      0.93      0.91        76
           9       0.86      0.95      0.90        81

    accuracy                           0.95       797
   macro avg       0.95      0.95      0.95       797
weighted avg       0.95      0.95      0.95       797

