# Notebook 3 - Ensemble Techniques

In this notebook, I develop stacking and voting classifiers that combine the predictions of individual models to improve robustness, reliability, and predictive strength. The ensemble models are saved for later analysis.

A separate notebook also keeps the project modular and works around long execution times and session timeout limits.

In [1]:
!pip install -q catboost

In [2]:
## Import the necessary libraries required for the task

## Data Manipulation and Visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# 'seaborn-whitegrid' is deprecated; matplotlib >= 3.6 ships it as 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')
import seaborn as sns

# Turning off warnings
import warnings
warnings.simplefilter('ignore')
## Various libraries for preprocessing, modeling, and evaluation
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from lightgbm import LGBMClassifier, Dataset
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score, RandomizedSearchCV

## Utils
import os
import time
from joblib import dump, load

# Setting a maximum width for columns display in pandas dataframe
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is no longer accepted
pd.set_option('display.max_columns', None)


In [12]:
def holdoutperformance(model_name, model, X_train, y_train, X_hold_out, y_hold_out):
    """Fit `model`, report hold-out ROC-AUC and classification report, then save the model to disk."""

    # Train the model on the training set
    model.fit(X_train, y_train)

    # Predict on the hold-out set
    y_pred = model.predict(X_hold_out)
    y_pred_proba = model.predict_proba(X_hold_out)[:, 1] if hasattr(model, "predict_proba") else None

    # Calculate ROC-AUC if probabilities are available
    roc_auc = roc_auc_score(y_hold_out, y_pred_proba) if y_pred_proba is not None else None

    # Output the performance metrics
    cls_report = classification_report(y_hold_out, y_pred)

    print(f"{model_name} - ROC-AUC on hold-out set: {roc_auc}")
    print(f"{model_name} - Classification Report on hold-out set: \n{cls_report}")

    # Save the model to disk
    model_filename = f'/content/drive/MyDrive/quantspark/models/{model_name}_model.joblib'
    dump(model, model_filename)

    print(f"Optimized model saved to {model_filename}")

    return model

In [None]:
# Read train-validation and holdout datasets
DATASET_READPATH = "/content/drive/MyDrive/quantspark/datasets"
X_train_val = pd.read_csv(os.path.join(DATASET_READPATH,"X_train_val.csv"))
X_hold_out = pd.read_csv(os.path.join(DATASET_READPATH,"X_hold_out.csv"))
y_train_val = pd.read_csv(os.path.join(DATASET_READPATH,"y_train_val.csv"))
y_hold_out = pd.read_csv(os.path.join(DATASET_READPATH,"y_hold_out.csv"))

In [14]:
# Load Optimized models
MODEL_READPATH = "/content/drive/MyDrive/quantspark/models"
CBClf = load(os.path.join(MODEL_READPATH, "CBClf_opt_model.joblib"))
DTClf = load(os.path.join(MODEL_READPATH, "DTClf_opt_model.joblib"))
GBClf = load(os.path.join(MODEL_READPATH, "GBClf_opt_model.joblib"))
RFClf = load(os.path.join(MODEL_READPATH, "RFClf_opt_model.joblib"))

Ensemble methods combine multiple learning algorithms, aiming for better predictive performance than any of the constituent algorithms achieves alone. I use two popular ensemble methods, stacking and voting:
1. Stacking Classifier: stacks multiple models (Gradient Boosting, Random Forest, CatBoost, Decision Tree) and uses their predictions as input to a final estimator (Logistic Regression), which makes the final prediction.
2. Voting Classifier: combines predictions from multiple models by majority voting (each model votes for a class, and the class with the most votes is the final prediction).

Both methods aim to improve predictive performance by combining the strengths of individual models.
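
The two strategies can be sketched on a small synthetic dataset. This is only an illustration, not the tuned models used in this notebook: the data from `make_classification`, the untuned base estimators, and every parameter value here are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem (assumption: a stand-in for the churn data)
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Untuned base models (assumption: the notebook loads tuned ones from disk instead)
base_estimators = [
    ("gb", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
]

# Stacking: cross-validated predictions of the base models become the
# input features of the final estimator (logistic regression)
stack = StackingClassifier(
    estimators=base_estimators, final_estimator=LogisticRegression(), cv=3
)
stack.fit(X_tr, y_tr)
stack_proba = stack.predict_proba(X_te)[:, 1]

# Hard voting: each base model casts one vote and the majority class wins
vote = VotingClassifier(estimators=base_estimators, voting="hard")
vote.fit(X_tr, y_tr)
vote_pred = vote.predict(X_te)

print("stacking probabilities:", stack_proba[:5])
print("voting labels:", vote_pred[:5])
```

Note that hard voting returns class labels only. To get probabilities from a voting ensemble (for ROC-AUC, for example), `voting="soft"` averages the base models' predicted probabilities instead.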

In [15]:
%%time
stacking_model = StackingClassifier([('gbr', GBClf),
                    ('rf',  RFClf),
                    ('cb',CBClf),
                    ('dt', DTClf)], n_jobs=-1)
holdoutperformance("StackingClf", stacking_model, X_train_val, y_train_val, X_hold_out, y_hold_out)

StackingClf - ROC-AUC on hold-out set: 0.999983471347581
StackingClf - Classification Report on hold-out set: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     19866
           1       0.98      1.00      0.99       134

    accuracy                           1.00     20000
   macro avg       0.99      1.00      0.99     20000
weighted avg       1.00      1.00      1.00     20000

Optimized model saved to /content/drive/MyDrive/quantspark/models/StackingClf_model.joblib
CPU times: user 7.54 s, sys: 1.77 s, total: 9.31 s
Wall time: 11min 27s


In [16]:
%%time
voting_model = VotingClassifier([('gbr', GBClf),
                    ('rf',  RFClf),
                    ('cb',CBClf),
                    ('dt', DTClf)], n_jobs=-1)
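# Evaluate voting_model here. The recorded output below was produced with
# stacking_model passed by mistake, which is why it matches the stacking run.
holdoutperformance("VotingClf", voting_model, X_train_val, y_train_val, X_hold_out, y_hold_out)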

VotingClf - ROC-AUC on hold-out set: 0.999983471347581
VotingClf - Classification Report on hold-out set: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     19866
           1       0.98      1.00      0.99       134

    accuracy                           1.00     20000
   macro avg       0.99      1.00      0.99     20000
weighted avg       1.00      1.00      1.00     20000

Optimized model saved to /content/drive/MyDrive/quantspark/models/VotingClf_model.joblib
CPU times: user 6.5 s, sys: 1.67 s, total: 8.17 s
Wall time: 11min 14s


- Here, we can observe that the voting and stacking classifiers also yield an impressive F1-score of 0.99 for class 1 (churn) on the hold-out set. At first glance, this is indistinguishable from the performance of individual classifiers such as CatBoost.
- However, after further validation with techniques such as calibration analysis in Notebook 5, we come to understand that the ensemble models are more confident in their predictions and that their predicted probabilities are more trustworthy.
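
Notebook 5 is not reproduced here, so the following is only a sketch of the kind of calibration check mentioned above. The synthetic data and the logistic regression model are assumptions for illustration; in practice the saved ensemble and the hold-out set would take their place.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary problem (assumption: a stand-in for the hold-out data)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Any classifier exposing predict_proba can be checked the same way
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Brier score: mean squared difference between predicted probability
# and the actual outcome (lower is better)
brier = brier_score_loss(y_te, proba)
print(f"Brier score: {brier:.4f}")

# Reliability curve: observed positive rate vs. mean predicted probability
# per bin; a well-calibrated model stays close to the diagonal
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=5)
for p, f in zip(mean_pred, frac_pos):
    print(f"mean predicted = {p:.2f}, observed positive rate = {f:.2f}")
```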

In the next notebook, we experiment with a neural network approach to predict customer churn.