
# **Theoretical Questions**

### **1. What is Boosting in Machine Learning?**

Boosting is an **ensemble meta-algorithm** that converts a set of "weak learners" into a "strong learner." Unlike Bagging (where models are built in parallel), Boosting builds models **sequentially**. Each new model attempts to correct the errors made by the previous models.

### **2. How does Boosting differ from Bagging?**

| Feature | Bagging (e.g., Random Forest) | Boosting (e.g., AdaBoost, XGBoost) |
| --- | --- | --- |
| **Model Building** | Parallel (independent models). | Sequential (dependent models). |
| **Goal** | Primarily to **reduce variance** (overfitting). | Primarily to **reduce bias** (underfitting). |
| **Data Sampling** | Uses Bootstrapping (random sampling with replacement). | Reweights samples or fits residuals so each round focuses on the errors of previous rounds. |
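
The contrast in the table can be tried out with a small sketch. It assumes scikit-learn's `BaggingClassifier` and `AdaBoostClassifier` as representatives of each family and uses the breast cancer dataset purely for illustration; the variable names are made up for this example, and the scores say nothing general about which approach is better.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X_cmp, y_cmp = load_breast_cancer(return_X_y=True)
X_cmp_train, X_cmp_test, y_cmp_train, y_cmp_test = train_test_split(
    X_cmp, y_cmp, test_size=0.25, random_state=0
)

# Bagging: each tree sees its own bootstrap sample and is independent of the others
bagging = BaggingClassifier(n_estimators=50, random_state=0)
bagging.fit(X_cmp_train, y_cmp_train)

# Boosting: each stump is fitted after, and depends on, the stumps before it
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting.fit(X_cmp_train, y_cmp_train)

bagging_acc = bagging.score(X_cmp_test, y_cmp_test)
boosting_acc = boosting.score(X_cmp_test, y_cmp_test)
print(f"Bagging accuracy:  {bagging_acc:.3f}")
print(f"Boosting accuracy: {boosting_acc:.3f}")
```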

### **3. What is the key idea behind AdaBoost?**

AdaBoost (Adaptive Boosting) works by **reweighting**. It assigns higher weights to data points that were misclassified by the previous weak learner. The subsequent learner is forced to focus more on these "difficult" cases.

### **4. Explain the working of AdaBoost with an example.**

Imagine a binary classification task (Circles vs. Squares):

1. **Round 1:** A simple "stump" (a one-level decision tree) splits the data. It gets 3 points wrong.
2. **Reweight:** Those 3 wrong points are given a higher weight (they become "heavier").
3. **Round 2:** The next stump tries to split the data but prioritizes getting those 3 heavy points right.
4. **Final Vote:** All stumps are combined. Stumps with higher accuracy are given more "say" in the final prediction.
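
The four steps above can be sketched from scratch. This is a simplified illustration of the classic binary AdaBoost update, not scikit-learn's implementation; the synthetic dataset, the -1/+1 label encoding, and the choice of 10 rounds are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem with labels recoded to -1/+1 (an illustrative choice)
X_toy, y01 = make_classification(n_samples=200, n_features=5, random_state=0)
y_toy = np.where(y01 == 1, 1, -1)

n_rounds = 10
weights = np.full(len(y_toy), 1 / len(y_toy))  # every point starts equally "heavy"
stumps, alphas = [], []

for _ in range(n_rounds):
    # Rounds 1, 2, ...: fit a one-level tree on the currently weighted data
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_toy, y_toy, sample_weight=weights)
    pred = stump.predict(X_toy)

    # Weighted error rate, clipped to avoid division by zero
    err = np.clip(weights[pred != y_toy].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # the stump's "say" in the final vote

    # Reweight: misclassified points get heavier, correct ones lighter
    weights *= np.exp(-alpha * y_toy * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final vote: sign of the alpha-weighted sum of stump predictions
scores = sum(a * s.predict(X_toy) for a, s in zip(alphas, stumps))
train_acc = (np.sign(scores) == y_toy).mean()
print(f"Training accuracy of the combined stumps: {train_acc:.2f}")
```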

### **5. What is Gradient Boosting vs. AdaBoost?**

While AdaBoost reweights data points, **Gradient Boosting** fits each new model to the negative gradient of the loss function with respect to the current ensemble's predictions (the *pseudo-residuals*). For squared-error loss these are exactly the ordinary **Residuals** (errors), so each new model tries to predict the *error* of the combined ensemble so far.
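
A minimal sketch of this idea for regression with squared-error loss, where the negative gradient is simply the residual. The synthetic sine data, the tree depth, the learning rate of 0.1, and the 50 rounds are arbitrary choices for illustration; library implementations add many refinements on top of this loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_gb = rng.uniform(-3, 3, size=(200, 1))
y_gb = np.sin(X_gb[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction (the mean minimizes squared error)
prediction = np.full_like(y_gb, y_gb.mean())
initial_mse = np.mean((y_gb - prediction) ** 2)
trees = []

for _ in range(n_rounds):
    # Residuals = negative gradient of 0.5 * (y - F)^2 with respect to F
    residuals = y_gb - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_gb, residuals)
    # Move the ensemble prediction a small step toward the residuals
    prediction += learning_rate * tree.predict(X_gb)
    trees.append(tree)

final_mse = np.mean((y_gb - prediction) ** 2)
print(f"Training MSE before boosting: {initial_mse:.4f}, after: {final_mse:.4f}")
```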

### **6. What is the loss function in Gradient Boosting?**

The loss function measures how far the model's predictions are from the actual values.

* **Regression:** Usually Mean Squared Error (MSE) or Mean Absolute Error (MAE).
* **Classification:** Usually Log-Loss (Cross-Entropy).
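
To make these losses concrete, here is a small NumPy sketch that computes each one by hand. The arrays are made-up values for illustration, and the log-loss shown is the binary form.

```python
import numpy as np

# Regression: made-up targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mse = np.mean((y_true - y_pred) ** 2)   # mean of squared errors -> 0.375
mae = np.mean(np.abs(y_true - y_pred))  # mean of absolute errors -> 0.5

# Binary classification: true labels and predicted probabilities of class 1
labels = np.array([1, 0, 1, 1])
probs = np.array([0.9, 0.2, 0.6, 0.4])
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(f"MSE: {mse:.3f}, MAE: {mae:.3f}, Log-Loss: {bce:.4f}")
```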

### **7. How does XGBoost improve over traditional Gradient Boosting?**

XGBoost (Extreme Gradient Boosting) introduces several optimizations:

* **Regularization:** Includes L1 and L2 regularization to prevent overfitting.
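* **Parallel Processing:** Stores pre-sorted feature columns in a block structure so that split finding within each tree can be parallelized across features; the trees themselves are still built sequentially.
* **Tree Pruning:** Grows each tree to `max_depth`, then prunes backward, removing splits whose loss reduction falls below the `gamma` threshold.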
* **Missing Values:** Automatically learns the best direction to handle missing data.

### **8. What is the difference between XGBoost and CatBoost?**

* **XGBoost:** General purpose and very fast. Categorical variables have traditionally needed manual encoding (such as One-Hot Encoding), although recent versions offer native categorical support through `enable_categorical`.
* **CatBoost:** Developed by Yandex and optimized for **categorical data**. It encodes categories internally with ordered target statistics and uses "Ordered Boosting" to limit target leakage, which also lets it handle high-cardinality features efficiently.

---

## **Practical Implementation (Python)**

### **9. AdaBoost Classifier (Question 30)**

```python
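from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out a test set so accuracy is not measured on the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
clf.fit(X_train, y_train)
print(f"AdaBoost Accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")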

```

### **10. Gradient Boosting Feature Importance (Question 32)**

Feature importance tells you which variables contributed most to the model's decisions.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
gbc = GradientBoostingClassifier().fit(data.data, data.target)

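# Visualize the 10 most important features (not just the first 10 columns)
import numpy as np
import matplotlib.pyplot as plt
top = np.argsort(gbc.feature_importances_)[-10:]  # indices of the 10 largest importances
plt.barh(data.feature_names[top], gbc.feature_importances_[top])
plt.title("Top 10 Feature Importances")
plt.show()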

```

### **11. XGBoost vs. Gradient Boosting Accuracy (Question 34)**

```python
from xgboost import XGBClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Assuming X_train, X_test, y_train, y_test are defined (e.g., from train_test_split)
xgb = XGBClassifier().fit(X_train, y_train)
gbc = GradientBoostingClassifier().fit(X_train, y_train)

print(f"XGBoost: {xgb.score(X_test, y_test)}")
print(f"GradBoost: {gbc.score(X_test, y_test)}")

```

### **12. XGBoost Learning Rate Tuning (Question 41)**

The learning rate (or "shrinkage") scales the contribution of each tree.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

params = {'learning_rate': [0.01, 0.1, 0.2, 0.3]}
grid = GridSearchCV(XGBRegressor(), params, cv=3)
grid.fit(X, y)
print(f"Best Learning Rate: {grid.best_params_}")

```

### **13. CatBoost with Class Weighting (Question 42)**

For imbalanced datasets, `auto_class_weights` helps the model not ignore the minority class.

```python
from catboost import CatBoostClassifier

# 'Balanced' calculates weights inversely proportional to class frequencies
model = CatBoostClassifier(auto_class_weights='Balanced', verbose=0)
model.fit(X_train, y_train)

```



# **Practical Questions**

#### **14. Train an AdaBoost Classifier and print accuracy**

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = AdaBoostClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")

```

#### **15. AdaBoost Regressor: Mean Absolute Error (MAE)**

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error

# Reuses the iris train/test split from above, treating the class labels
# (0, 1, 2) as numeric targets purely for demonstration
model_reg = AdaBoostRegressor(n_estimators=50, random_state=42)
model_reg.fit(X_train, y_train)
print(f"MAE: {mean_absolute_error(y_test, model_reg.predict(X_test)):.2f}")

```

#### **16. Gradient Boosting Classifier: Feature Importance**

Gradient Boosting allows you to see which features were most influential in reducing the loss function.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
# Fit on the breast cancer data loaded above rather than the earlier iris split
gbc = GradientBoostingClassifier(random_state=42).fit(cancer.data, cancer.target)
print("Feature Importances:", gbc.feature_importances_)

```

#### **17. Gradient Boosting Regressor: R-Squared Score**

```python
from sklearn.ensemble import GradientBoostingRegressor
gbr = GradientBoostingRegressor().fit(X_train, y_train)
print(f"R-Squared: {gbr.score(X_test, y_test):.2f}")

```

#### **18. XGBoost vs. Gradient Boosting Accuracy**

```python
from xgboost import XGBClassifier
from sklearn.ensemble import GradientBoostingClassifier

xgb = XGBClassifier().fit(X_train, y_train)
gbc = GradientBoostingClassifier().fit(X_train, y_train)
print(f"XGB Score: {xgb.score(X_test, y_test)}, GBC Score: {gbc.score(X_test, y_test)}")

```

#### **19. CatBoost Classifier: F1-Score**

```python
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

cat = CatBoostClassifier(verbose=0).fit(X_train, y_train)
print(f"F1-Score: {f1_score(y_test, cat.predict(X_test), average='macro'):.2f}")

```

#### **20. XGBoost Regressor: Mean Squared Error (MSE)**

```python
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

xgbr = XGBRegressor().fit(X_train, y_train)
print(f"MSE: {mean_squared_error(y_test, xgbr.predict(X_test)):.2f}")

```

#### **21. AdaBoost Classifier: Visualize Feature Importance**

```python
import matplotlib.pyplot as plt
plt.bar(range(len(model.feature_importances_)), model.feature_importances_)
plt.title("AdaBoost Feature Importance")
plt.show()

```

#### **22. Gradient Boosting: Learning Curves**

Learning curves help detect if the model is overfitting or underfitting as the number of iterations increases.
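
One way to draw such a curve is with `staged_predict_proba`, which yields the ensemble's predictions after each boosting iteration. The sketch below is illustrative only: the breast cancer dataset, the 200 iterations, and the `_lc` variable names are assumptions made for the example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X_lc, y_lc = load_breast_cancer(return_X_y=True)
X_lc_train, X_lc_test, y_lc_train, y_lc_test = train_test_split(
    X_lc, y_lc, test_size=0.25, random_state=42
)

gb_lc = GradientBoostingClassifier(n_estimators=200, random_state=42)
gb_lc.fit(X_lc_train, y_lc_train)

# Log-loss after each boosting iteration, on training and held-out data
train_loss = [log_loss(y_lc_train, p) for p in gb_lc.staged_predict_proba(X_lc_train)]
test_loss = [log_loss(y_lc_test, p) for p in gb_lc.staged_predict_proba(X_lc_test)]

plt.plot(train_loss, label="Train")
plt.plot(test_loss, label="Test")
plt.xlabel("Boosting iteration")
plt.ylabel("Log-Loss")
plt.title("Gradient Boosting Learning Curves")
plt.legend()
plt.show()
```

A test curve that flattens or rises while the training curve keeps falling suggests overfitting; two curves that both stay high suggest underfitting.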

#### **23. XGBoost: Visualize Feature Importance**

```python
from xgboost import plot_importance
plot_importance(xgb)
plt.show()

```

#### **24. CatBoost: Confusion Matrix**

```python
from sklearn.metrics import confusion_matrix
import seaborn as sns

cm = confusion_matrix(y_test, cat.predict(X_test))
sns.heatmap(cm, annot=True)
plt.show()

```

#### **25. AdaBoost: Number of Estimators vs. Accuracy**

Vary `n_estimators` (e.g., 10, 50, 100, 500) and plot accuracy to find the point of diminishing returns.
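
A minimal sketch of that experiment, assuming the breast cancer dataset and a single train/test split (a cross-validated score would be more reliable); the `_ne` variable names are made up for this example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_ne, y_ne = load_breast_cancer(return_X_y=True)
X_ne_train, X_ne_test, y_ne_train, y_ne_test = train_test_split(
    X_ne, y_ne, test_size=0.25, random_state=42
)

estimator_counts = [10, 50, 100, 500]
accuracies = []
for n in estimator_counts:
    # Train a fresh model for each ensemble size and record test accuracy
    ada = AdaBoostClassifier(n_estimators=n, random_state=42)
    ada.fit(X_ne_train, y_ne_train)
    accuracies.append(ada.score(X_ne_test, y_ne_test))

plt.plot(estimator_counts, accuracies, marker="o")
plt.xscale("log")
plt.xlabel("n_estimators")
plt.ylabel("Test accuracy")
plt.title("AdaBoost: Number of Estimators vs. Accuracy")
plt.show()
```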

#### **26. Gradient Boosting: ROC Curve**

```python
from sklearn.metrics import RocCurveDisplay
# Note: plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it

```

#### **27. XGBoost Regressor: Learning Rate Tuning (GridSearchCV)**

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor
params = {'learning_rate': [0.01, 0.1, 0.3], 'n_estimators': [50, 100]}
grid = GridSearchCV(XGBRegressor(), params).fit(X_train, y_train)
print("Best Params:", grid.best_params_)

```

#### **28. CatBoost: Class Weighting for Imbalance**

Use `class_weights=[1, 5]` (where 5 is the weight for the minority class) to penalize the model more for missing minority samples.
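
Weights like these can be derived from the class counts. The sketch below uses a hypothetical label array with a 50:10 imbalance and scales each weight by the majority-to-class ratio. That scaling is one common convention and is assumed here for illustration.

```python
import numpy as np

# Hypothetical imbalanced labels: 50 majority (0) and 10 minority (1) samples
y_imb = np.array([0] * 50 + [1] * 10)

counts = np.bincount(y_imb)                        # samples per class: [50, 10]
class_weights = (counts.max() / counts).tolist()   # majority / class count
print(class_weights)  # [1.0, 5.0]

# The list could then be passed on, for example:
# CatBoostClassifier(class_weights=class_weights, verbose=0)
```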

#### **29. AdaBoost: Effect of Learning Rates**

Compare performance with `learning_rate=0.01` vs. `learning_rate=1.0`. A lower rate usually requires more estimators.
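
A small sketch of that comparison, crossing the two learning rates with two ensemble sizes. The breast cancer dataset, the sizes 50 and 500, and the `_lr` variable names are assumptions for the example; results on a single split are only indicative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_lr, y_lr = load_breast_cancer(return_X_y=True)
X_lr_train, X_lr_test, y_lr_train, y_lr_test = train_test_split(
    X_lr, y_lr, test_size=0.25, random_state=42
)

results = {}
for lr in (0.01, 1.0):
    for n in (50, 500):
        # Same data and seed; only learning_rate and n_estimators change
        ada_lr = AdaBoostClassifier(n_estimators=n, learning_rate=lr, random_state=42)
        ada_lr.fit(X_lr_train, y_lr_train)
        results[(lr, n)] = ada_lr.score(X_lr_test, y_lr_test)

for (lr, n), acc in results.items():
    print(f"learning_rate={lr:<4} n_estimators={n:<3} accuracy={acc:.3f}")
```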

#### **30. XGBoost: Multi-class Log-Loss**

```python
from sklearn.metrics import log_loss
probs = xgb.predict_proba(X_test)
print(f"Log-Loss: {log_loss(y_test, probs):.4f}")

```

---
