# Bagging


Bagging (bootstrap aggregating) is an ensemble method used to reduce the variance of an estimator. The idea is to create several subsets of the training sample by drawing observations at random with replacement. Each subset is used to train its own decision tree, so we end up with an ensemble of different models. Their predictions are then combined, by averaging for regression or by voting for classification, which is more robust than relying on a single decision tree.
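To make the "random subsets drawn with replacement" step concrete, here is a minimal plain-Python sketch of bootstrap sampling. The helper name `bootstrap_indices` and the sizes used are assumptions chosen for illustration; this is not how scikit-learn implements it internally.

```python
import random


def bootstrap_indices(n_samples, n_subsets, seed=0):
    """Return n_subsets lists of row indices, each drawn with replacement.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # random.choices samples WITH replacement, so an index can repeat
    # and some indices are never drawn at all.
    return [rng.choices(range(n_samples), k=n_samples) for _ in range(n_subsets)]


subsets = bootstrap_indices(n_samples=1000, n_subsets=5)
for i, indices in enumerate(subsets):
    unique_fraction = len(set(indices)) / len(indices)
    print(f"subset {i}: {unique_fraction:.1%} of the original rows appear at least once")
```

Each subset has the same size as the original sample but typically contains only around 63% of the distinct rows (roughly 1 - 1/e), which is what makes the trees trained on them differ from one another.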

# Bagging Classifier


A bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then building an ensemble out of it.
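The two aggregation strategies mentioned above, voting and averaging, can be sketched in a few lines of plain Python. The function names and the prediction values below are made up for illustration; they stand in for the outputs of three already-trained base classifiers on four samples.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine hard label predictions (one list of labels per model)."""
    combined = []
    for labels in zip(*predictions_per_model):
        # Pick the label predicted by the most models for this sample.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def average_probabilities(probabilities_per_model, threshold=0.5):
    """Combine soft predictions (one list of P(class 1) values per model)."""
    combined = []
    for probs in zip(*probabilities_per_model):
        mean_prob = sum(probs) / len(probs)
        combined.append(1 if mean_prob >= threshold else 0)
    return combined


# Hypothetical outputs of three base classifiers on four samples.
hard_preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
soft_preds = [
    [0.2, 0.9, 0.6, 0.1],
    [0.4, 0.8, 0.3, 0.2],
    [0.7, 0.7, 0.4, 0.3],
]

voted = majority_vote(hard_preds)
averaged = average_probabilities(soft_preds)
print(voted)
print(averaged)
```

With an odd number of models and two classes there are no ties to break in the vote. Averaging probabilities uses more information than hard voting, since a confident model counts for more than a hesitant one.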

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Reading the dataset
df = pd.read_csv('smsspamcollection.txt', sep='\t', header=None, names=['label', 'sms_message'])

# Encoding the labels as integers: ham -> 0, spam -> 1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Splitting the dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['sms_message'], 
                                                    df['label'], 
                                                    random_state=1)

# Instantiating the CountVectorizer
count_vector = CountVectorizer()

# Learning the vocabulary from the training data and returning the document-term matrix.
training_data = count_vector.fit_transform(X_train)

# Transforming the testing data with the vocabulary learned above (no refitting).
testing_data = count_vector.transform(X_test)

In [5]:
# Importing the Bagging Classifier
from sklearn.ensemble import BaggingClassifier

# Instantiating a BaggingClassifier with 210 base estimators (decision trees by default).
bag_mod = BaggingClassifier(n_estimators=210)


# Fitting your BaggingClassifier to the training data
bag_mod.fit(training_data, y_train)


# Predicting using BaggingClassifier on the test data
bag_preds = bag_mod.predict(testing_data) 

In [6]:
# Scoring the model
print('Accuracy score: ', accuracy_score(y_test, bag_preds))
print('Precision score: ', precision_score(y_test, bag_preds))
print('Recall score: ', recall_score(y_test, bag_preds))
print('F1 score: ', f1_score(y_test, bag_preds))

Accuracy score:  0.9741564967695621
Precision score:  0.9116022099447514
Recall score:  0.8918918918918919
F1 score:  0.9016393442622951
