# Use of the stacking method

We will show how to use the stacking class for classification. The mechanics are exactly the same for regression.

In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv("pulsar_stars.csv")
df.head(10)

Unnamed: 0,Mean_ip,Std_ip,Excess_kurtosis_ip,Skewness_ip,Mean_DM,Std_DM,Excess_kurtosis_DM,Skewness_DM,target
0,140.5625,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225,0
1,102.507812,58.88243,0.465318,-0.515088,1.677258,14.860146,10.576487,127.39358,0
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909,0
3,136.75,57.178449,-0.068415,-0.636238,3.642977,20.95928,6.896499,53.593661,0
4,88.726562,40.672225,0.600866,1.123492,1.17893,11.46872,14.269573,252.567306,0
5,93.570312,46.698114,0.531905,0.416721,1.636288,14.545074,10.621748,131.394004,0
6,119.484375,48.765059,0.03146,-0.112168,0.999164,9.279612,19.20623,479.756567,0
7,130.382812,39.844056,-0.158323,0.38954,1.220736,14.378941,13.539456,198.236457,0
8,107.25,52.627078,0.452688,0.170347,2.33194,14.486853,9.001004,107.972506,0
9,107.257812,39.496488,0.465882,1.162877,4.079431,24.980418,7.39708,57.784738,0


This dataset comes from Kaggle; I only renamed the columns for better readability. We want to predict the *target* variable from the other variables. Let's build the training and test sets:

In [2]:
X = df.drop("target", axis=1)
y = df["target"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
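
The split above passes `stratify=y`. As a minimal sketch of what that option does, the snippet below uses synthetic labels (not the pulsar data; `X_demo` and `y_demo` are placeholder names) and compares the share of positives in each part of the split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 100 positives out of 1000 rows (10%), not the pulsar data.
y_demo = np.array([1] * 100 + [0] * 900)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

# With stratification, both parts keep roughly the original class ratio.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"positives in train: {train_rate:.3f}, in test: {test_rate:.3f}")
```

Without `stratify`, the two rates can drift apart by chance, which matters more the rarer the positive class is.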

We used the *stratify* argument because the dataset is imbalanced. The *StackedClassifier* class handles imbalanced datasets by using *StratifiedKFold* from *sklearn.model_selection*. We need to import the class and define the stacked model:

In [3]:
from StackedClassifier import StackedClassifier


from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = [RandomForestClassifier(n_estimators=100), 
          XGBClassifier(n_estimators=100),
          LGBMClassifier(n_estimators=100)]
meta_model = LGBMClassifier(n_estimators=200)

model = StackedClassifier(models, meta_model)
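
The source of *StackedClassifier* is not shown here, so the following is only a sketch of how stacking with *StratifiedKFold* is commonly done, not the class's actual implementation. It assumes out-of-fold probabilities from the base models are used as features for the meta-model, and it swaps in small scikit-learn models and synthetic data so that it runs on its own:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 10% positives), standing in for the real dataset.
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Illustrative base models and meta-model (not the ones used in this notebook).
base_models = [DecisionTreeClassifier(max_depth=3, random_state=0),
               LogisticRegression(max_iter=1000)]
meta_model = LogisticRegression(max_iter=1000)

# Out-of-fold predictions: each training row is scored by a model that never saw it.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros((len(X_tr), len(base_models)))
for j, base in enumerate(base_models):
    for fit_idx, hold_idx in skf.split(X_tr, y_tr):
        fold_model = clone(base).fit(X_tr[fit_idx], y_tr[fit_idx])
        oof[hold_idx, j] = fold_model.predict_proba(X_tr[hold_idx])[:, 1]

# The meta-model learns from the out-of-fold probabilities.
meta_model.fit(oof, y_tr)

# At prediction time, base models refit on the full training set feed the meta-model.
fitted = [clone(m).fit(X_tr, y_tr) for m in base_models]
test_features = np.column_stack([m.predict_proba(X_te)[:, 1] for m in fitted])
y_pred = meta_model.predict(test_features)
```

Stratified folds keep the class ratio roughly constant in every fold, so each fold model sees positives even when they are rare.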

We now train the stacked model and test it:

In [4]:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_true=y_test, y_pred=y_pred))

[[4022   43]
 [  68  342]]


The *StackedClassifier* class also has an *evaluate* method, which reports how each base classifier and the stacked model performed:

In [5]:
from sklearn.metrics import f1_score

model.evaluate(X_test, y_test, metric=f1_score)

Model 0, performance : 0.8698
Model 1, performance : 0.8734
Model 2, performance : 0.8709
Stacked model performance : 0.8604
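
The stacked model's score can be checked by hand against the confusion matrix printed earlier. The sketch below assumes *f1_score* is applied with its defaults (class 1 as the positive class) to the same predictions:

```python
# Counts copied from the confusion matrix printed earlier.
# scikit-learn puts true labels on rows and predicted labels on columns.
tn, fp = 4022, 43
fn, tp = 68, 342

precision = tp / (tp + fp)   # share of predicted pulsars that are real
recall = tp / (tp + fn)      # share of real pulsars that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints: precision=0.8883 recall=0.8341 f1=0.8604
```

The result matches the 0.8604 reported for the stacked model, which is consistent with that assumption.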
