## Code 

Let’s see how to train an AdaBoost classifier in Python with scikit-learn. To start, we import the following libraries.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
```

In this example, we’ll use AdaBoost to classify tumors as malignant or benign. We load the breast cancer dataset that ships with scikit-learn, keeping the features in a DataFrame and the labels as a categorical array of class names.

```python
breast_cancer = load_breast_cancer()
X = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
y = pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)
```
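Before modeling, it can help to check what was loaded. The sketch below repeats the loading step and inspects the feature matrix and the class balance; `collections.Counter` is used only for this illustration.

```python
from collections import Counter

import pandas as pd
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()
X = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
# from_codes maps each integer code to its name: 0 -> 'malignant', 1 -> 'benign'
y = pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)

print(X.shape)                              # (569, 30): 569 tumors, 30 numeric features
print(breast_cancer.target_names.tolist())  # ['malignant', 'benign']
print(Counter(y))                           # 357 benign, 212 malignant
```

The classes are somewhat imbalanced (roughly 63% benign), which is worth keeping in mind when reading the confusion matrix at the end.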

The target is a categorical label, so we encode it as integers. `LabelEncoder` assigns codes in alphabetical order of the class names, so benign becomes 0 and malignant becomes 1.

```python
encoder = LabelEncoder()
binary_encoded_y = pd.Series(encoder.fit_transform(y))
```
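To confirm which class received which code, we can inspect the fitted encoder’s `classes_` attribute. The sketch below also checks that the encoding is the reverse of the raw dataset codes, where 0 means malignant.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import LabelEncoder

breast_cancer = load_breast_cancer()
y = pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)

encoder = LabelEncoder()
binary_encoded_y = pd.Series(encoder.fit_transform(y))

# classes_[i] is the class name that was encoded as i (sorted alphabetically)
print(encoder.classes_.tolist())  # ['benign', 'malignant']

# Raw target: 0 = malignant, 1 = benign. Encoded: 0 = benign, 1 = malignant.
print((binary_encoded_y.to_numpy() == 1 - breast_cancer.target).all())  # True
```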

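We split the data into training and test sets so that we can evaluate the model on tumors it did not see during training. By default, `train_test_split` holds out 25% of the rows for testing, and `random_state=1` makes the split reproducible.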

```python
train_X, test_X, train_y, test_y = train_test_split(X, binary_encoded_y, random_state=1)
```
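A quick check of the split sizes: with 569 rows and the default 25% test fraction (rounded up), the test set has 143 tumors, the same total as the confusion matrix at the end. In this sketch, `1 - breast_cancer.target` is used as a shortcut that gives the same 0/1 encoding as the `LabelEncoder` step (benign 0, malignant 1).

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
y = pd.Series(1 - breast_cancer.target)  # shortcut: 1 = malignant, 0 = benign

train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)

print(len(train_X), len(test_X))  # 426 143
# Fraction of malignant tumors in each split (the split is not stratified by default)
print(round(train_y.mean(), 3), round(test_y.mean(), 3))
```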

Next, we construct the model and fit it to the training set. Setting `max_depth=1` on the base `DecisionTreeClassifier` makes each weak learner a decision stump: a tree with a single decision node and two leaves. `n_estimators` sets the maximum number of stumps in the ensemble; scikit-learn stops boosting early if it reaches a perfect fit.
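To make the role of the stumps and of `n_estimators` concrete, here is a minimal from-scratch sketch of classic two-class (discrete) AdaBoost. It is an illustration only, not scikit-learn’s internal code path; the helper names `fit_adaboost` and `predict_adaboost` are hypothetical, labels are assumed to be in {-1, +1}, and 50 rounds are used instead of 200 to keep it quick.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_estimators=50):
    """Hypothetical helper: discrete AdaBoost with stumps, labels in {-1, +1}."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        # max_depth=1 gives a stump: one decision node and two leaves
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        # weighted error of this stump (weights sum to 1)
        err = np.clip(np.sum(weights[pred != y]), 1e-10, 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's weight in the final vote
        # up-weight misclassified samples, down-weight correct ones, renormalize
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_adaboost(stumps, alphas, X):
    """Hypothetical helper: sign of the alpha-weighted sum of stump predictions."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)


data = load_breast_cancer()
y_pm = np.where(data.target == 0, 1, -1)  # +1 = malignant, -1 = benign
X_train, X_test, y_train, y_test = train_test_split(data.data, y_pm, random_state=1)

stumps, alphas = fit_adaboost(X_train, y_train, n_estimators=50)
accuracy = np.mean(predict_adaboost(stumps, alphas, X_test) == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Each round refits a stump on reweighted data, so later stumps concentrate on the tumors that earlier stumps got wrong; `AdaBoostClassifier` automates this loop.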

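```python
classifier = AdaBoostClassifier(
    # weak learner: a decision stump (the first argument is named `estimator`
    # in recent scikit-learn releases and `base_estimator` in older ones)
    DecisionTreeClassifier(max_depth=1),
    n_estimators=200  # maximum number of boosting rounds
)
classifier.fit(train_X, train_y)
```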


We then use the fitted model to predict whether each tumor in the test set is malignant or benign.

```python
predictions = classifier.predict(test_X)
```
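Each prediction is a weighted vote over all fitted stumps. To see how test accuracy evolves as stumps are added, scikit-learn’s `AdaBoostClassifier.staged_score` yields the score after each boosting round. This sketch refits the same model; `1 - data.target` is a shortcut for the benign 0 / malignant 1 encoding used above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
train_X, test_X, train_y, test_y = train_test_split(
    data.data, 1 - data.target, random_state=1  # shortcut: 1 = malignant
)

classifier = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200)
classifier.fit(train_X, train_y)

# staged_score yields test accuracy after 1, 2, ..., n stumps have been added
scores = list(classifier.staged_score(test_X, test_y))
for n in (1, 10, 50, len(scores)):
    print(f"{n:>3} stumps: accuracy {scores[n - 1]:.3f}")
```

A curve like this is a simple way to judge whether 200 rounds is more than the problem needs.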

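Finally, we evaluate the model with a confusion matrix. In scikit-learn’s `confusion_matrix`, rows are true labels and columns are predicted labels, both ordered 0 (benign) then 1 (malignant). Treating malignant as the positive class, the model produced 86 true negatives, 2 false positives, 3 false negatives, and 52 true positives.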

```python
confusion_matrix(test_y, predictions)
```

```
array([[86,  2],
       [ 3, 52]])
```
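The usual summary metrics follow directly from these four counts. The sketch below hard-codes the matrix reported above and computes accuracy, precision, and recall for the malignant class.

```python
import numpy as np

# Confusion matrix reported above: rows = true label, columns = predicted label,
# both ordered [0 = benign, 1 = malignant].
cm = np.array([[86, 2],
               [3, 52]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # fraction of all test tumors classified correctly
precision = tp / (tp + fp)        # of tumors flagged malignant, fraction truly malignant
recall = tp / (tp + fn)           # of truly malignant tumors, fraction that were caught

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.965 precision=0.963 recall=0.945
```

For a screening task, recall on the malignant class is often the number to watch, since each false negative is a missed malignant tumor.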