# 🎒 Bootstrap Aggregating (Bagging): Ensemble Learning Fundamentals
# ===================================================================

## 📚 **Learning Objectives**
By the end of this comprehensive tutorial, you will:

🎯 **Core Understanding:**
- **Master** the Bootstrap Aggregating (Bagging) concept and methodology
- **Understand** why ensemble methods often outperform single models
- **Learn** the mathematical foundations of bootstrap sampling
- **Grasp** the bias-variance trade-off in ensemble learning

🛠️ **Practical Skills:**
- **Implement** bagging with scikit-learn's ensemble estimators
- **Create** diverse base learners for optimal ensemble performance
- **Tune** bagging hyperparameters for specific datasets
- **Evaluate** ensemble performance vs individual models

📊 **Advanced Techniques:**
- **Compare** bagging with other ensemble methods
- **Analyze** feature importance in bagged models
- **Handle** overfitting through ensemble diversity
- **Optimize** computational efficiency in ensemble training

---

## 🧠 **What is Bootstrap Aggregating (Bagging)?**

**Bagging** is an ensemble learning technique that trains several models on bootstrap resamples of the training data and combines their predictions. The combined prediction is typically more stable, and often more accurate, than that of any single model.

### 🎯 **The Core Idea:**
> **"If one model is good, many diverse models working together are better!"**

**The Process:**
1. **🎲 Bootstrap Sampling**: Create diverse training sets through random sampling with replacement
2. **🤖 Train Multiple Models**: Train the same algorithm on different bootstrap samples
3. **🗳️ Aggregate Predictions**: Combine predictions through voting (classification) or averaging (regression)
4. **🎯 Final Prediction**: Output the ensemble decision

### 📐 **Mathematical Foundation:**

**Bootstrap Sample Creation:**
- Given dataset D with n samples
- Create bootstrap sample D' by sampling n samples WITH replacement from D
- Each D' is slightly different, creating model diversity
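
The sampling step can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's own implementation: the helper name `bootstrap_indices` and the sample size are hypothetical. Because rows are drawn with replacement, a bootstrap sample contains on average about 63.2% (1 − 1/e) of the distinct rows of D; the rows that are never drawn are called "out-of-bag".

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly at random, with replacement."""
    return rng.integers(0, n_samples, size=n_samples)

rng = np.random.default_rng(42)
n = 1000  # illustrative dataset size (assumption)
idx = bootstrap_indices(n, rng)

# Rows drawn at least once vs. rows never drawn (out-of-bag)
in_bag = np.unique(idx)
oob = np.setdiff1d(np.arange(n), in_bag)

unique_fraction = in_bag.size / n
print(f"distinct rows in D': {unique_fraction:.3f}")
print(f"out-of-bag rows:     {oob.size / n:.3f}")
```

Indexing the data with `X[idx], y[idx]` then gives one bootstrap training set; repeating the draw gives the next one.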

**Prediction Aggregation:**
- Given M models h₁, ..., h_M, each trained on its own bootstrap sample
- **Classification**: Majority voting
  ```
  ŷ = mode(h₁(x), h₂(x), ..., h_M(x))
  ```
- **Regression**: Average prediction
  ```
  ŷ = (1/M) × Σᵢ₌₁ᴹ hᵢ(x)
  ```
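
Both aggregation rules can be written directly on an array of per-model predictions. The sketch below uses small made-up prediction arrays (rows are models, columns are samples) and a hypothetical helper `majority_vote`; it assumes class labels are non-negative integers.

```python
import numpy as np

# Hypothetical class predictions: 5 models (rows) x 4 samples (columns)
class_preds = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])

def majority_vote(preds):
    """Most frequent label per column; ties go to the smallest label."""
    n_classes = preds.max() + 1
    # Count the votes for each class, one column (sample) at a time
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)

# Hypothetical regression predictions: 3 models (rows) x 2 samples (columns)
reg_preds = np.array([
    [2.0, 3.5],
    [2.4, 3.1],
    [1.9, 3.3],
])

y_class = majority_vote(class_preds)  # mode over the models
y_reg = reg_preds.mean(axis=0)        # (1/M) * sum over the models
print(y_class, y_reg)
```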

### 🌟 **Why Bagging Works: The Bias-Variance Trade-off**

**Individual Model Issues:**
- **High Variance**: Small changes in the training data cause large changes in the fitted model
- **Overfitting**: The model learns noise that is specific to its training set

**Bagging Solution:**
- **Variance Reduction**: Averaging reduces prediction variance
- **Bias Preservation**: Generally maintains the same bias as base learner
- **Improved Generalization**: More stable predictions on unseen data
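
One standard way to quantify the variance reduction, under the simplifying assumption (introduced here, not stated above) that the M base predictions hᵢ(x) each have variance σ² and that every pair has correlation ρ:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{i=1}^{M} h_i(x)\right)
  = \frac{1}{M^2}\left(M\sigma^2 + M(M-1)\,\rho\,\sigma^2\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2
```

The second term shrinks as M grows while the first does not, which is why the diversity of the models (a small ρ) matters as much as their number. Models trained on bootstrap samples of the same data share many rows and are therefore positively correlated, so the reduction is smaller than the factor 1/M that fully independent models would give.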

### 🏆 **Benefits of Bagging:**
1. **📈 Better Accuracy**: Often outperforms single models
2. **🛡️ Overfitting Reduction**: Averaging smooths out individual model quirks
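3. **🔧 Model Agnostic**: Works with any base learner, and helps most with high-variance learners such as unpruned decision trees
4. **⚡ Parallelizable**: Models can be trained independently
5. **📊 Uncertainty Estimation**: Disagreement among the models' predictions gives a rough indication of confidence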
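
A minimal sketch of these points with scikit-learn's `BaggingClassifier`, comparing one decision tree with 50 bagged trees. The synthetic dataset, its size, and the seed are assumptions chosen for illustration; the base learner is passed positionally so the snippet does not depend on the keyword name, which differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; sizes and seed are illustrative assumptions
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

single_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),  # base learner
    n_estimators=50,   # number of bootstrap samples, one model per sample
    bootstrap=True,    # draw rows with replacement
    oob_score=True,    # score each row with the models that never saw it
    n_jobs=1,          # raise to -1 to train the models on all CPU cores
    random_state=42,
).fit(X_tr, y_tr)

tree_acc = single_tree.score(X_te, y_te)
bag_acc = bagged_trees.score(X_te, y_te)
print(f"single tree test accuracy:  {tree_acc:.3f}")
print(f"bagged trees test accuracy: {bag_acc:.3f}")
print(f"out-of-bag accuracy:        {bagged_trees.oob_score_:.3f}")
```

The out-of-bag score is an estimate of generalization accuracy that needs no separate validation set, since every row is left out of roughly a third of the bootstrap samples.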

---

## 🎨 **Real-World Analogy**

Think of bagging like **asking multiple experts for advice**:

**Single Expert (Single Model):**
- One doctor diagnoses your symptoms
- Risk: Could miss important details or be biased

**Panel of Experts (Bagging):**
- 10 doctors independently examine you (different bootstrap samples)
- Each sees slightly different information
- Final diagnosis: Majority vote of all doctors
- Result: More reliable, less prone to individual error

---

## 📋 **Chapter Overview**

This notebook will guide you through:

1. **🔧 Environment Setup** - Import libraries and prepare tools
2. **📊 Data Preparation** - Create and explore datasets for bagging
3. **🎲 Bootstrap Sampling** - Understand bootstrap sample creation
4. **🤖 Bagging Implementation** - Build bagging ensembles from scratch
5. **📈 Performance Analysis** - Compare single vs ensemble models
6. **⚙️ Hyperparameter Tuning** - Optimize bagging parameters
7. **🎯 Real-World Applications** - Apply to practical datasets
8. **📚 Advanced Concepts** - Out-of-bag error, feature importance

Let's dive into the world of ensemble learning! 🚀

In [None]:
# 🔧 ENVIRONMENT SETUP: BAGGING TOOLKIT
# =====================================

# Essential libraries for comprehensive bagging analysis
import pandas as pd              # Data manipulation and analysis
import numpy as np               # Numerical computing and array operations
import matplotlib.pyplot as plt  # Static plotting and visualization
import seaborn as sns           # Statistical data visualization with beautiful defaults
from sklearn.datasets import make_classification, load_iris, load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')  # Clean output for better learning experience

# Configure matplotlib for beautiful plots
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

print("🔧 BAGGING ENVIRONMENT SETUP COMPLETE!")
print("=" * 50)
print("✅ Data Analysis: pandas, numpy")
print("✅ Visualization: matplotlib, seaborn") 
print("✅ Machine Learning: scikit-learn ensemble methods")
print("✅ Evaluation Metrics: accuracy, classification reports")
print("✅ Data Processing: scaling, train/test split")
print()
print("🎯 Ready to explore Bootstrap Aggregating!")
print("📚 All tools loaded for comprehensive ensemble learning")

# Set random seed for reproducible results
np.random.seed(42)
print(f"\n🎲 Random seed set to 42 for reproducible experiments")
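
The cells that follow use `X_train`, `X_test`, `y_train`, and `y_test`, but the cell that creates them is not shown. The sketch below would produce arrays with the shapes reported in the next cell (800 training rows, 200 test rows, 20 features). It assumes a synthetic `make_classification` dataset of 1,000 samples and an 80/20 split; the generator arguments and seed are guesses, so the resulting values and scores will not necessarily match the outputs recorded in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumed dataset: 1,000 samples with 20 features (binary target by default)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Assumed 80/20 split, which yields 800 training rows and 200 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_test.shape, X_train.shape)
```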

In [2]:
X_test.shape, X_train.shape

((200, 20), (800, 20))

In [3]:
X_train

array([[-0.64628725,  2.56641658, -0.59358153, ...,  0.90827206,
        -1.71695976, -0.38555497],
       [-1.50444431,  0.39575313, -0.83153588, ...,  2.47800544,
         1.29962918, -0.59207261],
       [-0.39151879, -0.09869076,  1.32459307, ...,  0.66345104,
         0.35720182, -0.98291706],
       ...,
       [-0.00383186,  0.09203615, -0.70121505, ..., -2.16249308,
        -0.33068834,  0.07266646],
       [ 2.45888615, -0.35941149,  0.02597242, ..., -0.056518  ,
        -0.59516384, -0.51703825],
       [ 1.84222699, -0.46434485, -0.05381296, ...,  0.80643863,
         0.40912571, -0.19642545]], shape=(800, 20))

In [4]:
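# Base learners for the ensemble. LogisticRegression and DecisionTreeClassifier were
# already imported in the setup cell; they are repeated so this cell runs on its own.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC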

In [5]:
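# One instance of each base learner, all with default hyperparameters
nb_clf = GaussianNB()
lr_clf = LogisticRegression()
dt_clf = DecisionTreeClassifier()
svm_clf = SVC(kernel='linear')  # hard voting only needs predicted class labels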


In [7]:
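# Hard voting: different algorithms, each fit on the same training data, vote on the label.
# (Bagging, by contrast, fits one algorithm on different bootstrap samples.)
from sklearn.ensemble import VotingClassifier
estimators = [('decision_tree', dt_clf), ('naive_bayes', nb_clf),
              ('log_reg', lr_clf), ('svm', svm_clf)]
ensemble_clf = VotingClassifier(estimators=estimators, voting='hard')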

In [8]:
ensemble_clf

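**VotingClassifier**

| Parameter | Value |
|---|---|
| estimators | [('decision_tree', ...), ('naive_bayes', ...), ...] |
| voting | 'hard' |
| weights | None |
| n_jobs | None |
| flatten_transform | True |
| verbose | False |

**decision_tree: DecisionTreeClassifier**

| Parameter | Value |
|---|---|
| criterion | 'gini' |
| splitter | 'best' |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | None |
| random_state | None |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |

**naive_bayes: GaussianNB**

| Parameter | Value |
|---|---|
| priors | None |
| var_smoothing | 1e-09 |

**log_reg: LogisticRegression**

| Parameter | Value |
|---|---|
| penalty | 'l2' |
| dual | False |
| tol | 0.0001 |
| C | 1.0 |
| fit_intercept | True |
| intercept_scaling | 1 |
| class_weight | None |
| random_state | None |
| solver | 'lbfgs' |
| max_iter | 100 |

**svm: SVC**

| Parameter | Value |
|---|---|
| C | 1.0 |
| kernel | 'linear' |
| degree | 3 |
| gamma | 'scale' |
| coef0 | 0.0 |
| shrinking | True |
| probability | False |
| tol | 0.001 |
| cache_size | 200 |
| class_weight | None |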


In [9]:
ensemble_clf.fit(X_train,y_train)


In [10]:
y_pred = ensemble_clf.predict(X_test)

In [11]:
y_pred

array([0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0,
       1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
       1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 1])

In [12]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
accuracy_score(y_test, y_pred)

0.875

In [13]:
# print() renders the report's line breaks instead of showing the raw string
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.87      0.86      0.86        90
           1       0.88      0.89      0.89       110

    accuracy                           0.88       200
   macro avg       0.87      0.87      0.87       200
weighted avg       0.87      0.88      0.87       200
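
To judge whether the vote actually helps, the ensemble's score can be set beside the score of each base learner on the same split. The sketch below is self-contained, so it regenerates stand-in data with assumed `make_classification` arguments; with the notebook's own `X_train`, `X_test`, `y_train`, and `y_test` in memory, the two data lines can be dropped. Fixing `random_state` on the tree and raising `max_iter` on the logistic regression are choices made here for repeatability, not settings taken from the cells above.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption); replace with the notebook's own split if available
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ("decision_tree", DecisionTreeClassifier(random_state=42)),
    ("naive_bayes", GaussianNB()),
    ("log_reg", LogisticRegression(max_iter=1000)),
    ("svm", SVC(kernel="linear")),
]

# Test accuracy of each base learner on its own (cloned so the originals stay unfitted)
scores = {}
for name, model in base_models:
    scores[name] = clone(model).fit(X_train, y_train).score(X_test, y_test)

# Test accuracy of the hard-voting ensemble built from the same learners
voter = VotingClassifier(estimators=base_models, voting="hard")
scores["hard_voting"] = voter.fit(X_train, y_train).score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name:>14}: {acc:.3f}")
```

A single train/test split gives a noisy comparison; `cross_val_score`, already imported in the setup cell, is a more reliable way to rank the models.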