## What is Ensemble Learning?
Ensemble learning is a machine learning technique that aggregates the predictions of two or more learners (e.g., regression models, neural networks) with the aim of producing better predictions than any one of them would alone.

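## Why Ensemble Learning?
1. Improved accuracy: combining predictions can reduce error, because the individual models' mistakes partly cancel out.
2. Reduced overfitting: averaging several models lowers the variance of the final prediction compared with relying on a single high-variance model.
3. Robustness: ensembles tend to perform better on noisy or complex data.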
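
A quick calculation makes the accuracy claim concrete. The sketch below rests on an idealized assumption that rarely holds exactly in practice: every classifier is correct with the same probability `p`, and their errors are independent. A majority vote is then correct whenever more than half of the voters are correct, which is a binomial sum. The function name `majority_vote_accuracy` is purely illustrative.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n classifiers is correct.

    Idealized assumption: each classifier is correct with probability p,
    independently of the others; n should be odd so there are no ties.
    """
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
```

Under these assumptions, three 70%-accurate voters reach 78.4% and five reach about 83.7%. Real base models trained on the same data make correlated errors, so the gain is typically smaller, which is why ensembles favour diverse learners.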

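## Example Project

The example below predicts employee attrition (voluntary resignation vs. current employee) from four numeric features, then compares a stacking classifier with hard and soft voting classifiers built from the same three base models.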

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load data (every column is read as a string; numeric columns are converted below)
df = pd.read_csv('datasets/Ensemble.csv', low_memory=False, dtype=str)

# Convert the numeric feature columns: unparseable values become NaN,
# then all missing values are filled with the column median
numeric_cols = ['Age', 'DistanceFromHome', 'MonthlyIncome', 'TotalWorkingYears']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(df[col].median())

# Fill missing target labels with the most frequent label, then encode them:
# 1 = voluntary resignation, 0 = current employee
df['Attrition'] = df['Attrition'].fillna(df['Attrition'].mode()[0])
df['Attrition'] = df['Attrition'].map({'Voluntary Resignation': 1, 'Current employee': 0})

# Keep only the selected features and the target
df = df[numeric_cols + ['Attrition']]

# Split into 80% training data and 20% test data
X = df.drop('Attrition', axis=1)
y = df['Attrition']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
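
The cleaning steps above can be tried on a small hand-made frame. The values below are invented for illustration and are not taken from `Ensemble.csv`; the sketch shows how `errors='coerce'` turns an unparseable string into `NaN`, how the median fill replaces the gaps, and how the label map encodes the target.

```python
import pandas as pd

# Toy data invented for illustration; every column holds strings, as with dtype=str
toy = pd.DataFrame({
    'Age': ['25', 'unknown', '40', None],
    'Attrition': ['Current employee', None, 'Voluntary Resignation', 'Current employee'],
})

# 'unknown' cannot be parsed, so errors='coerce' turns it into NaN (None becomes NaN too)
toy['Age'] = pd.to_numeric(toy['Age'], errors='coerce')
# Fill the gaps with the median of the parsed values: median(25, 40) = 32.5
toy['Age'] = toy['Age'].fillna(toy['Age'].median())

# The missing label gets the most frequent label, then labels are encoded as 1/0
toy['Attrition'] = toy['Attrition'].fillna(toy['Attrition'].mode()[0])
toy['Attrition'] = toy['Attrition'].map({'Voluntary Resignation': 1, 'Current employee': 0})

print(toy)
```

Any label not listed in the mapping dictionary would become `NaN` after `.map`, so it is worth checking the distinct values of `Attrition` before encoding.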


```python
# Imports for the stacking classifier and for evaluation
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
```


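```python
# Base learners shared by the stacking and voting ensembles.
# probability=True lets SVC output class probabilities, which soft voting needs.
base_models = [
    ('rf', RandomForestClassifier(random_state=42)),
    ('gb', GradientBoostingClassifier(random_state=42)),
    ('svc', SVC(probability=True, random_state=42)),
]
```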
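
One caveat about these base models: `SVC` with its default RBF kernel is sensitive to feature scale, and `MonthlyIncome` is likely on a much larger scale than `Age` or `DistanceFromHome`. A common remedy, sketched here as an optional variant and not what the notebook ran, is to put a `StandardScaler` in front of the SVC with `make_pipeline`; the tree-based models do not need scaling. The synthetic data from `make_classification` is an assumption so the snippet runs on its own, and it does not reproduce the accuracies reported in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 4 numeric features, binary target.
# One column is multiplied to mimic a large-scale feature such as income.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
X[:, 2] *= 10_000

# Same base learners as above, except that the SVC sees standardized features
scaled_base_models = [
    ('rf', RandomForestClassifier(random_state=42)),
    ('gb', GradientBoostingClassifier(random_state=42)),
    ('svc', make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
]

# A pipeline is itself an estimator, so it plugs into the ensemble unchanged
clf = VotingClassifier(estimators=scaled_base_models, voting='soft')
clf.fit(X, y)
print(clf.predict(X[:5]))
```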


```python
# Meta-model (final estimator): learns how to combine the base models' predictions
meta_model = LogisticRegression()
```

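```python
# Stacking classifier: the base models' cross-validated predictions
# become the input features of the meta-model
stack_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stack_clf.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred_stack = stack_clf.predict(X_test)
print(f"Stacking accuracy: {accuracy_score(y_test, y_pred_stack)}")
```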
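
To see roughly what `StackingClassifier` does internally, here is a simplified manual version. By default scikit-learn trains the meta-model on 5-fold cross-validated predictions of each base model (class probabilities for these models, keeping one column per model in the binary case) and refits every base model on the full training set for prediction time. The sketch mirrors that idea with `cross_val_predict`; it runs on synthetic data from `make_classification` (an assumption standing in for the attrition data) and is meant for understanding, not as a replacement for `StackingClassifier`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), split 80/20 like the notebook
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ('rf', RandomForestClassifier(random_state=42)),
    ('gb', GradientBoostingClassifier(random_state=42)),
    ('svc', SVC(probability=True, random_state=42)),
]

# Level 0: out-of-fold P(class 1) from each base model becomes one meta-feature column
meta_train = np.column_stack([
    cross_val_predict(clone(model), X_train, y_train, cv=5, method='predict_proba')[:, 1]
    for _, model in base_models
])

# Refit each base model on the full training set to build test-time meta-features
fitted = [clone(model).fit(X_train, y_train) for _, model in base_models]
meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in fitted])

# Level 1: the meta-model learns how much to trust each base model
meta_model = LogisticRegression().fit(meta_train, y_train)
y_pred = meta_model.predict(meta_test)
print(f"Manual stacking accuracy: {accuracy_score(y_test, y_pred):.3f}")
print("Meta-model weights per base model:", meta_model.coef_.round(2))
```

Using out-of-fold predictions matters: if the meta-model were trained on predictions the base models made for rows they had already seen, it would learn from overly optimistic inputs and over-trust the most overfit model.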

```python
# Voting classifiers
from sklearn.ensemble import VotingClassifier

# Hard voting: each base model casts one vote (its predicted label); the majority wins
vote_clf_hard = VotingClassifier(estimators=base_models, voting='hard')
vote_clf_hard.fit(X_train, y_train)
y_pred_vote_hard = vote_clf_hard.predict(X_test)
print(f"Hard Voting accuracy: {accuracy_score(y_test, y_pred_vote_hard)}")

# Soft voting: average the base models' predicted probabilities
# and pick the class with the highest average
vote_clf_soft = VotingClassifier(estimators=base_models, voting='soft')
vote_clf_soft.fit(X_train, y_train)
y_pred_vote_soft = vote_clf_soft.predict(X_test)
print(f"Soft Voting accuracy: {accuracy_score(y_test, y_pred_vote_soft)}")
```

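Running the voting cell prints:

```text
Hard Voting accuracy: 0.865401023890785
Soft Voting accuracy: 0.8833191126279863
```

On this train/test split, soft voting is about 1.8 percentage points more accurate than hard voting.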
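
The difference between the two voting modes comes from how the votes are combined. The probabilities below are made up for illustration: each row is one sample, and each column is one model's P(class 1). Hard voting counts predicted labels, while soft voting averages probabilities, so one very confident model can outweigh two lukewarm ones. The sketch uses equal weights; `VotingClassifier` also accepts an optional `weights` parameter.

```python
import numpy as np

# Made-up P(class 1) from three models (rows = samples, columns = models)
proba_class1 = np.array([
    [0.45, 0.45, 0.95],  # two models lean to class 0, one is very sure of class 1
    [0.60, 0.70, 0.05],  # two models lean to class 1, one is very sure of class 0
])

# Hard voting: each model predicts the more probable class, then the majority label wins
labels = (proba_class1 > 0.5).astype(int)
hard = (labels.sum(axis=1) >= 2).astype(int)

# Soft voting: average the probabilities first, then pick the more probable class
soft = (proba_class1.mean(axis=1) > 0.5).astype(int)

print("hard:", hard)  # hard: [0 1]
print("soft:", soft)  # soft: [1 0]
```

Because soft voting trusts the probability values themselves, the scikit-learn documentation recommends it for ensembles of well-calibrated classifiers.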
