# Stacking Classifier
A Stacking Classifier is an ensemble learning technique that combines multiple classifiers (base estimators) by training a meta-classifier (the final estimator) on their predictions. Because the meta-classifier learns how to weight each base model's output, the ensemble can draw on the strengths of several different models.
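
As a minimal sketch of the idea, the snippet below fits a small stack on synthetic data. The data, the choice of base estimators and all variable names are assumptions made for illustration; they are not the dataset or models used later in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Two different base estimators feed their predictions to one meta-classifier
stack_demo = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # the meta-classifier is trained on 5-fold out-of-fold predictions
)
stack_demo.fit(X_tr, y_tr)

demo_pred = stack_demo.predict(X_te)
demo_accuracy = stack_demo.score(X_te, y_te)
print(f"Held-out accuracy: {demo_accuracy:.3f}")
```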

## Advantages:
- Improved Accuracy: A meta-classifier that learns from several models' predictions often performs better than any single base model.
- Flexibility: Any mix of model types can serve as base learners.
- Versatility: Stacking applies to both classification (`StackingClassifier`) and regression (`StackingRegressor`).
- Reduction in Overfitting: Combining multiple models can reduce overfitting compared to a single model.

## Disadvantages:
- Complexity: Requires careful tuning of multiple models and their interactions.
- Computationally Intensive: Training multiple models and a meta-classifier can be resource-intensive.
- Data Splitting: The meta-classifier must be trained on out-of-fold predictions from the base models; training it on predictions for data the base models have already seen causes data leakage.
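
The data-splitting point can be sketched by hand. The snippet below builds the meta-classifier's training features from out-of-fold predictions with `cross_val_predict`, which is roughly what `StackingClassifier` does internally through its `cv` parameter. The synthetic data and the names `base_models`, `meta_features` and `meta_model` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

base_models = [DecisionTreeClassifier(max_depth=3, random_state=0), GaussianNB()]

# For each base model, predict every sample with a model that never saw it
# during fitting (out-of-fold), keeping the positive-class probability.
oof_columns = [
    cross_val_predict(m, X_demo, y_demo, cv=5, method="predict_proba")[:, 1]
    for m in base_models
]
meta_features = np.column_stack(oof_columns)

# The meta-classifier is trained only on these leakage-free predictions
meta_model = LogisticRegression().fit(meta_features, y_demo)
print(meta_features.shape)
```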

## Use Cases:
- Image Classification: Combining models for more robust image recognition.
- Credit Scoring: Using multiple models to predict credit risk.
- Medical Diagnosis: Combining different diagnostic models for more accurate predictions.
- Customer Churn Prediction: Combining models to predict customer behavior.

## Scaling (depends on the base estimators)
Stacking itself does not require feature scaling, but some base estimators do. For example, SVMs are sensitive to feature scale and should be given scaled input, while decision trees are not affected by it.
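
One way to handle this, sketched below under assumed synthetic data, is to wrap only the scale-sensitive base estimator in a pipeline with its own `StandardScaler`. The scaler is then refit inside every cross-validation fold, and the decision tree receives the raw features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with one feature on a much larger scale (an assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_demo[:, 0] *= 1000.0

scaled_stack = StackingClassifier(
    estimators=[
        # SVC gets scaled input through its own pipeline
        ("svc", make_pipeline(StandardScaler(), SVC())),
        # The tree is scale-invariant, so it needs no scaler
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
scaled_stack.fit(X_demo, y_demo)
scaled_pred = scaled_stack.predict(X_demo)
```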

## Encoding (necessary)
Categorical features must be encoded as numerical values before training, because the estimators used here expect numeric input features.
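
A minimal encoding sketch follows. The toy DataFrame and its `site` column are hypothetical and exist only to show the calls; only the `diagnosis` column name comes from this notebook. scikit-learn classifiers generally accept string class labels, so encoding the target is shown as an option rather than a requirement.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one numeric feature, one categorical feature, one label
toy = pd.DataFrame({
    "radius": [12.1, 20.3, 14.8, 18.2],
    "site": ["left", "right", "left", "right"],
    "diagnosis": ["B", "M", "B", "M"],
})

# One-hot encode the categorical column and pass the numeric column through
encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), ["site"])],
    remainder="passthrough",
)
X_encoded = encoder.fit_transform(toy[["radius", "site"]])

# Optionally map the string labels to integers (classes are sorted: B -> 0, M -> 1)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(toy["diagnosis"])
print(X_encoded.shape, y_encoded)
```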

# Import Libraries

In [19]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from scipy.stats import uniform, loguniform
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler

In [20]:
df = pd.read_csv('Breast_Cancer.csv')
x = df.drop('diagnosis',axis=1)
y = df['diagnosis']

In [21]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Scale data

In [22]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# 1. Stacking with Decision Tree Base Estimators

## Grid Search

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Define the base and meta learners
base_learners = [('dt1', DecisionTreeClassifier()), ('dt2', DecisionTreeClassifier())]
meta_learner = LogisticRegression()

# Create the StackingClassifier with two decision-tree base learners
Stacking_clas = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)

# Define parameter grid for GridSearchCV
param_grid = {
    'dt1__max_depth': [3, 5, 7],
    'dt2__max_depth': [3, 5, 7],
    'final_estimator__fit_intercept': [True, False]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(Stacking_clas, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(x_train, y_train)

In [None]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

In [None]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)
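
The commented-out prediction step can be followed by an evaluation on held-out data. The sketch below runs a deliberately tiny search on synthetic data so that it is self-contained; the data, the reduced grid and the names `demo_search` and `demo_acc` are assumptions. With the notebook's own objects, the last three lines would use `model`, `x_test` and `y_test` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_stack = StackingClassifier(
    estimators=[
        ("dt1", DecisionTreeClassifier(random_state=0)),
        ("dt2", DecisionTreeClassifier(random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)
demo_search = GridSearchCV(demo_stack, {"dt1__max_depth": [3, 5]}, cv=3)
demo_search.fit(X_tr, y_tr)

# Evaluate the refitted best model on data it has never seen
best_model = demo_search.best_estimator_
y_pred_demo = best_model.predict(X_te)
demo_acc = accuracy_score(y_te, y_pred_demo)
print(f"Test accuracy: {demo_acc:.3f}")
print(classification_report(y_te, y_pred_demo))
```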

## Randomized Search

In [13]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Define the base and meta learners
base_learners = [('dt1', DecisionTreeClassifier()), ('dt2', DecisionTreeClassifier())]
meta_learner = LogisticRegression()

# Create the StackingClassifier with two decision-tree base learners
Stacking_clas = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'dt1__max_depth': [3, 5, 7],
    'dt2__max_depth': [3, 5, 7],
    'final_estimator__fit_intercept': [True, False]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(Stacking_clas, param_distributions=param_dist, n_iter=10, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search.fit(x_train, y_train)

In [14]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 6
Best Hyperparameters: {'final_estimator__fit_intercept': True, 'dt2__max_depth': 7, 'dt1__max_depth': 7}
Best Cross-Validated Score: 0.9318681318681319


In [15]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)
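
Randomized search can also sample from continuous or integer distributions instead of fixed lists, which is where `scipy.stats` distributions such as `loguniform` come in. The following is a sketch under assumptions: the synthetic data, the searched parameters (`min_samples_leaf`, `C`) and their ranges are illustrative choices, not values taken from this notebook.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

dist_stack = StackingClassifier(
    estimators=[
        ("dt1", DecisionTreeClassifier(random_state=0)),
        ("dt2", DecisionTreeClassifier(random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Distributions are sampled n_iter times instead of enumerating a grid
param_dist_demo = {
    "dt1__max_depth": randint(2, 8),              # integers 2..7
    "dt2__min_samples_leaf": randint(1, 10),      # integers 1..9
    "final_estimator__C": loguniform(1e-2, 1e2),  # log-uniform regularization strength
}

dist_search = RandomizedSearchCV(
    dist_stack, param_distributions=param_dist_demo,
    n_iter=5, cv=3, random_state=42,
)
dist_search.fit(X_demo, y_demo)
print(dist_search.best_params_)
```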

## Train StackingClassifier without search

In [16]:
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Define the base and meta learners
base_learners = [('dt1', DecisionTreeClassifier()), ('dt2', DecisionTreeClassifier())]
meta_learner = LogisticRegression()

# Create the StackingClassifier with two decision-tree base learners;
# cv=5 trains the meta-learner on 5-fold out-of-fold predictions
model = StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)
# model.fit(x_train, y_train)

# 2. Stacking with a Single Estimator Type (Support Vector Classifier)

## Grid Search

In [25]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression


# Define the base and meta learners
base_learners = [('svc1', SVC(probability=True)), ('svc2', SVC(probability=True))]
meta_learner = LogisticRegression()

# Create the StackingClassifier
Stacking_clas_svc = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)


# Define parameter grid for GridSearchCV
# 'degree' must be a non-negative integer (fractional values such as 0.1
# raise InvalidParameterError) and is only used by the 'poly' kernel
param_grid = {
    'svc1__C': [0.1, 1, 10],
    'svc1__degree': [2, 3, 4],
    'svc1__kernel': ['linear', 'poly', 'rbf'],
    'svc2__C': [0.1, 1, 10],
    'svc2__degree': [2, 3, 4],
    'svc2__kernel': ['linear', 'poly', 'rbf'],
    'final_estimator__fit_intercept': [True, False]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(Stacking_clas_svc, param_grid, cv=5, n_jobs=-1, error_score='raise')

# Train the grid search
grid_search.fit(x_train, y_train)

In [None]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

In [None]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)
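
Since `degree` only affects the `'poly'` kernel, searching it together with `'linear'` and `'rbf'` repeats identical fits. `GridSearchCV` also accepts a list of dictionaries, so `degree` can be restricted to the polynomial kernel. The sketch below only counts candidates with `ParameterGrid`; the specific degree values are an assumption.

```python
from sklearn.model_selection import ParameterGrid

# Single dict: every kernel is paired with every degree
full_grid = {
    "svc1__kernel": ["linear", "poly", "rbf"],
    "svc1__degree": [2, 3, 4],
    "svc1__C": [0.1, 1, 10],
}

# List of dicts: degree is searched only for the 'poly' kernel
split_grid = [
    {"svc1__kernel": ["linear", "rbf"], "svc1__C": [0.1, 1, 10]},
    {"svc1__kernel": ["poly"], "svc1__degree": [2, 3, 4], "svc1__C": [0.1, 1, 10]},
]

n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))
print(n_full, n_split)  # 27 15
```

Passing `split_grid` as `param_grid` would cut the number of candidates for one SVC from 27 to 15 without dropping any distinct model.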

## Randomized Search

In [None]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Define the base and meta learners
base_learners = [('svc1', SVC(probability=True)), ('svc2', SVC(probability=True))]
meta_learner = LogisticRegression()

# Create the StackingClassifier
Stacking_clas_svc = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)

# Define parameter distribution for RandomizedSearchCV
# 'degree' must be a non-negative integer and is only used by the 'poly' kernel
param_dist = {
    'svc1__C': [0.1, 1, 10],
    'svc1__degree': [2, 3, 4],
    'svc1__kernel': ['linear', 'poly', 'rbf'],
    'svc2__C': [0.1, 1, 10],
    'svc2__degree': [2, 3, 4],
    'svc2__kernel': ['linear', 'poly', 'rbf'],
    'final_estimator__fit_intercept': [True, False]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(Stacking_clas_svc, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search.fit(x_train, y_train)

In [None]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

In [None]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)

## Train StackingClassifier without search

In [26]:
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Define the base and meta learners
base_learners = [('svc1', SVC(probability=True)), ('svc2', SVC(probability=True))]
meta_learner = LogisticRegression()

model = StackingClassifier(estimators=base_learners, final_estimator=meta_learner,cv=5)
# model.fit(x_train, y_train)
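
After fitting, the stack can be inspected to see what the meta-learner actually receives. The sketch below uses synthetic data and two SVCs with different kernels, both of which are assumptions for illustration. `transform` returns the base learners' outputs, and `final_estimator_.coef_` shows how the logistic regression weights them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

svc_stack = StackingClassifier(
    estimators=[
        ("svc_linear", SVC(kernel="linear")),
        ("svc_rbf", SVC(kernel="rbf")),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
svc_stack.fit(X_demo, y_demo)

# One column per base learner for a binary problem
meta_X = svc_stack.transform(X_demo)
meta_coef = svc_stack.final_estimator_.coef_

print("Stack methods:", svc_stack.stack_method_)
print("Meta-feature shape:", meta_X.shape)
print("Meta-learner weights:", meta_coef)
```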

# 3. Stacking with Multiple Estimators (SVC, Decision Tree, GaussianNB)

## Grid Search

In [29]:
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Create the individual classifiers; SVC and GaussianNB each get their own scaler in a pipeline
svc_pipeline = make_pipeline(StandardScaler(), SVC(probability=True))
decision_tree = DecisionTreeClassifier()
gaussian_NB_pipeline = make_pipeline(StandardScaler(), GaussianNB())

# StackingClassifier expects a list of (name, estimator) tuples, not a VotingClassifier
base_learners = [
    ('svc', svc_pipeline),
    ('decision_tree', decision_tree),
    ('gaussian_NB', gaussian_NB_pipeline)
]

meta_learner = LogisticRegression()

# Create the StackingClassifier from the three base learners
Stacking_clas_voting = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)

# Define parameter grid for GridSearchCV.
# make_pipeline names each step after its lowercased class name ('svc', 'gaussiannb'),
# so nested parameters follow the pattern <estimator name>__<step name>__<parameter>
param_grid = {
    'svc__svc__C': [0.1, 1, 10],
    'svc__svc__gamma': [0.1, 0.2, 0.5],
    'svc__svc__kernel': ['linear', 'poly', 'rbf'],
    'decision_tree__max_depth': [3, 5, 7],
    'gaussian_NB__gaussiannb__var_smoothing': [1e-9, 1e-8],
    'final_estimator__fit_intercept': [True, False]
}

# Initialize GridSearchCV
grid_search_voting = GridSearchCV(Stacking_clas_voting, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search_voting.fit(x_train, y_train)

In [None]:
print("Best Hyperparameter Index:", grid_search_voting.best_index_)
print("Best Hyperparameters:", grid_search_voting.best_params_)
print("Best Cross-Validated Score:", grid_search_voting.best_score_)

In [None]:
# Get the model with best hyperparameters
model = grid_search_voting.best_estimator_
# y_pred = model.predict(x_test)
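
Errors of the form `Invalid parameter ... for estimator` usually mean that a key in the search space does not match the estimator's nested parameter names. A quick way to check, sketched below, is to list the names returned by `get_params()` before building the grid. The estimator names mirror the ones used in this section; the filtering by substring is only for readability.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

check_stack = StackingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("decision_tree", DecisionTreeClassifier()),
        ("gaussian_NB", make_pipeline(StandardScaler(), GaussianNB())),
    ],
    final_estimator=LogisticRegression(),
)

# Every key usable in param_grid / param_distributions appears in get_params()
valid_names = sorted(check_stack.get_params().keys())

# Show only the names relevant to the search space used in this section
for name in valid_names:
    if any(key in name for key in ("__C", "max_depth", "var_smoothing", "fit_intercept")):
        print(name)
```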

## Randomized Search

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# Create the individual classifiers
svc = SVC(probability=True)
decision_tree = DecisionTreeClassifier()
gaussian_NB = GaussianNB()

# StackingClassifier takes the base learners as a list of (name, estimator) tuples
base_learners = [
    ('SVC', svc),
    ('decision_tree', decision_tree),
    ('gaussian_NB', gaussian_NB)
]

# Create the StackingClassifier (it has no 'algorithm', 'n_estimators' or 'random_state' parameter)
Stacking_clas_voting = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())

# Define parameter distribution for RandomizedSearchCV;
# keys follow the pattern <estimator name>__<parameter>
param_dist = {
    'SVC__C': [0.1, 1, 10, 100],
    'SVC__gamma': [0.1, 0.2, 0.5, 1.0],
    'SVC__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'decision_tree__max_depth': [None, 10, 20, 30, 40, 50],
    'decision_tree__min_samples_split': [2, 5, 10, 15],
    'gaussian_NB__var_smoothing': np.logspace(-9, 0, 10)
}

# Initialize RandomizedSearchCV
random_search_voting = RandomizedSearchCV(Stacking_clas_voting, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search_voting.fit(x_train, y_train)


In [None]:
print("Best Hyperparameter Index:", random_search_voting.best_index_)
print("Best Hyperparameters:", random_search_voting.best_params_)
print("Best Cross-Validated Score:", random_search_voting.best_score_)

In [None]:
model = random_search_voting.best_estimator_
# y_pred = model.predict(x_test)

## Train StackingClassifier without search

In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Create the individual classifiers with fixed hyperparameters
svc = SVC(kernel='linear', gamma=1, C=1)
decision_tree = DecisionTreeClassifier(max_depth=5, min_samples_split=2)
gaussian_NB = GaussianNB(var_smoothing=0.001)

# StackingClassifier takes the base learners as a list of (name, estimator) tuples
base_learners = [
    ('SVC', svc),
    ('decision_tree', decision_tree),
    ('gaussian_NB', gaussian_NB)
]

model = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression(), cv=5)
# model.fit(x_train, y_train)

