# Unit 4 Hyperparameter Tuning for Ensembles

Welcome! Today, we'll explore "Hyperparameter Tuning for Ensembles." It might sound complex, but we'll break it down step by step.

In machine learning, models learn from data and make predictions. **Ensembles** combine the predictions of multiple models to improve accuracy. However, tuning these models' settings (hyperparameters) is key to getting the best performance.

By the end of this lesson, you'll understand:

  * What ensemble methods are.
  * How to apply **GridSearch** to tune hyperparameters for ensemble models, specifically using the **AdaBoost** algorithm with a **DecisionTreeClassifier** as the base estimator.

### Recalling Ensemble Methods

Before diving into hyperparameter tuning, let's recall what **ensemble methods** are.

Ensemble methods use multiple models (base estimators) to make predictions. Think of them as a team of weather forecasters: each forecaster gives a prediction, and you combine all of the predictions to get a more accurate forecast. Using ensemble methods improves a model's performance and adds robustness.

One popular ensemble method is **AdaBoost** (Adaptive Boosting), which improves performance by training weak classifiers one after another and combining them. Each new classifier gives more weight to the samples that the previous classifiers misclassified.
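To make the idea of "focusing on errors" concrete, here is a minimal sketch of one reweighting round in the classic binary AdaBoost formulation. The function name `adaboost_round` and the toy labels are assumptions made for illustration; this is not scikit-learn's internal code, and it assumes the incoming weights sum to 1.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One reweighting round of classic binary AdaBoost (illustrative sketch).

    Returns the classifier's vote weight (alpha) and the updated,
    normalized sample weights.
    """
    # Weighted error rate of the current weak classifier
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Vote weight: larger when the weighted error is smaller
    alpha = 0.5 * math.log((1 - err) / err)
    # Raise the weights of misclassified samples, lower the rest
    new_weights = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(new_weights)
    return alpha, [w / total for w in new_weights]

# Toy example: 5 samples with equal starting weights, one mistake (index 2)
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, 1, -1, 1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(f"alpha = {alpha:.3f}")
print([round(w, 3) for w in new_weights])
```

In this toy run, the misclassified sample's weight rises from 0.2 to 0.5 while each correctly classified sample drops to 0.125, so the next weak classifier is pushed to get that sample right.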

### Setting Up the Dataset

Now, let's get hands-on by setting up our dataset. We'll use the **wine dataset from Scikit-Learn**. This dataset contains information about different types of wines.

We need to split our dataset into training and test sets to train our model and evaluate its performance.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Load the wine dataset
X, y = load_wine(return_X_y=True)
print(X.shape, y.shape)  # Output: (178, 13) (178,)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # Output: (142, 13) (36, 13)
```

### Defining the Parameter Grid

Now, we need to define the hyperparameters we want to tune using **GridSearch**. This is called the **parameter grid**.

For **AdaBoost**, we can tune:

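  * `n_estimators`: The maximum number of boosting stages (weak learners) to train.
  * `learning_rate`: The weight applied to each weak learner's contribution. Smaller values shrink each learner's influence, which usually calls for more estimators.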
  * `estimator__max_depth`: The depth of the tree when using a `DecisionTreeClassifier`.

```python
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'estimator__max_depth': [1, 3, 5]
}
```

This grid helps us test different combinations to find the best ones.
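It helps to know how much work a grid implies before running it. The sketch below expands the same grid with the standard library and counts the candidates; the `cv_folds = 5` value is an assumption that matches the 5-fold cross-validation used later in this lesson.

```python
from itertools import product

param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'estimator__max_depth': [1, 3, 5],
}

# Every combination of values is one candidate configuration
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 5  # assumed number of cross-validation folds
print(f"Candidates: {len(candidates)}")
print(f"Model fits during the search: {len(candidates) * cv_folds}")
print(f"First candidate: {candidates[0]}")
```

Three values for each of three parameters gives 27 candidates, so a 5-fold search trains 135 models, plus one final refit of the best candidate on the full training set by default. Adding a value to any list multiplies the cost, which is why grids are usually kept small.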

### Initializing the Base Estimator

Next, we need to choose our base estimator. For this lesson, let's use a **DecisionTreeClassifier**.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Initialize the base estimator
base_estimator = DecisionTreeClassifier()

# Initialize the AdaBoost classifier using the base estimator
ada_clf = AdaBoostClassifier(estimator=base_estimator)
```

By setting `estimator=base_estimator`, we are telling **AdaBoost** to use the decision tree as the base estimator.

### Performing Grid Search

Now comes the exciting part: performing a **GridSearch** to tune the hyperparameters. We use **GridSearchCV** to search the hyperparameter grid.

**GridSearchCV** helps us find the best set of hyperparameters by systematically testing each combination.

```python
from sklearn.model_selection import GridSearchCV

# Set up GridSearchCV
ada_grid_search = GridSearchCV(ada_clf, param_grid, cv=5)

# Fit the model
ada_grid_search.fit(X_train, y_train)
```

### Interpreting Results

Finally, let's interpret the results to find the best hyperparameters and understand their impact on the model's performance.

```python
print(f"Best parameters for AdaBoost: {ada_grid_search.best_params_}")
print(f"Best cross-validation score for AdaBoost: {ada_grid_search.best_score_}")

# Example output (may vary slightly between runs, since no random_state is set):
# Best parameters for AdaBoost: {'estimator__max_depth': 1, 'learning_rate': 0.1, 'n_estimators': 50}
# Best cross-validation score for AdaBoost: 0.9617857142857144
```

This will print the combination of hyperparameters that performed the best during the **GridSearch**.

The `best_params_` attribute holds the combination of hyperparameters that achieved the highest score. The `best_score_` attribute is the mean cross-validated score of that combination (accuracy, by default, for classifiers), averaged over the validation folds.

### Final Prediction and Evaluation

Now that we have the best hyperparameters, let's use them to make predictions on our testing set and evaluate the model's performance.

```python
from sklearn.metrics import accuracy_score

# Use the best estimator to make predictions on the test set
best_ada_model = ada_grid_search.best_estimator_
y_pred = best_ada_model.predict(X_test)

# Calculate accuracy
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test set accuracy: {test_accuracy}")

# Example output (your value may differ): Test set accuracy: 1.0
```

This code will help us understand how well our model generalizes to unseen data.

### Lesson Summary

Great job on making it through the lesson! Today, we learned how to define a parameter grid for an ensemble model, perform hyperparameter tuning using **GridSearchCV**, and evaluate the model on a test set. Hyperparameter tuning is essential to improve the performance of your machine learning models, especially ensemble models like **AdaBoost**.

Now, it's time for you to apply what you've learned. You'll move to the practice section where you'll get hands-on experience with hyperparameter tuning for ensemble models. This practice will solidify your understanding and give you the confidence to use these techniques on your own projects. Good luck!

## Discover Best Hyperparameters for Wine Classification

Space Explorer, let's boost our wine classification model using GridSearch to find the best hyperparameters! The provided code already completes all the steps, from data loading to hyperparameter tuning for an AdaBoost classifier.

Your task is to run the code and examine the results to understand which hyperparameters were chosen as the best. Pay attention to the output and consider why those hyperparameters might be optimal. How can you use these insights to improve other machine learning models in real-world applications?

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Load the wine dataset and split into training and test sets
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the base estimator and AdaBoost classifier
base_estimator = DecisionTreeClassifier()
ada_clf = AdaBoostClassifier(estimator=base_estimator, algorithm='SAMME')

# Define the parameter grid for GridSearch
param_grid = {'n_estimators': [10, 50], 'learning_rate': [0.01, 0.1], 'estimator__max_depth': [1, 2]}

# Perform GridSearch with cross-validation
grid_search = GridSearchCV(ada_clf, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Print the best parameters found by GridSearch
print(f"Best parameters for AdaBoost: {grid_search.best_params_}")

```

The GridSearch for the AdaBoost classifier on the wine dataset has identified the following as the best hyperparameters:

* `estimator__max_depth`: 2
* `learning_rate`: 0.1
* `n_estimators`: 10

These parameters suggest that for this specific wine classification task, a relatively simple base estimator (a decision tree with a maximum depth of 2) combined with a moderate learning rate and a smaller number of boosting stages yields optimal performance within the defined search space.

**Why these might be optimal:**

* **`estimator__max_depth`: 2**
    * A shallow tree (depth of 2) acts as a "weak learner," which is characteristic of boosting algorithms like AdaBoost. Complex individual models can lead to overfitting, but AdaBoost thrives on combining many simple, yet effective, learners.
    * This depth likely captures important features without becoming overly specialized to the training data.
* **`learning_rate`: 0.1**
    * A learning rate of 0.1 scales down each tree's contribution to the final prediction, so the model learns gradually and may generalize better. This can help prevent rapid overfitting, especially with sequential ensemble methods. Note that 0.1 was the larger of the two values tried, so within this grid the search preferred the faster-learning option.
* **`n_estimators`: 10**
    * A relatively small number of estimators indicates that the model converges quickly to a good solution or that adding more estimators beyond this point might lead to diminishing returns or even overfitting for this specific dataset and base estimator complexity.

**How these insights can improve other machine learning models in real-world applications:**

1.  **Starting Point for Tuning:** The optimal hyperparameters found here can serve as a valuable starting point for tuning similar ensemble models on other datasets. While not universally applicable, they provide an educated guess that can save significant computational resources during initial experimentation.
2.  **Understanding Model Complexity:** Observing that a shallow `max_depth` is optimal for the base estimator in AdaBoost reinforces the principle of using weak learners in boosting. This insight encourages prioritizing simpler base models that focus on specific data patterns, with the ensemble handling overall complexity.
3.  **Balancing Bias-Variance Trade-off:** The combination of `learning_rate` and `n_estimators` directly influences the bias-variance trade-off. A lower `learning_rate` often necessitates more `n_estimators` to achieve similar performance, but in this case, a balance was found with a relatively small `n_estimators`. This highlights the importance of exploring these parameters together to prevent underfitting or overfitting.
4.  **Efficiency in Deployment:** Models with fewer `n_estimators` are generally faster to train and predict, which is crucial for real-time applications or environments with limited computational resources. Discovering that a lower number of estimators performs best can guide the development of more efficient models.
5.  **Feature Importance (Indirectly):** While not directly showing feature importance, a shallow tree in the base estimator implies that the model is identifying strong, fundamental relationships in the data early on. This can sometimes hint at the presence of highly discriminative features, prompting further investigation into feature engineering or selection.
6.  **Customizing Grid Search Spaces:** The results help in refining future `param_grid` definitions. If a parameter's optimal value is consistently found at the edge of the defined range (e.g., `max_depth` was 2, which was the maximum in this small grid), it suggests expanding that range in future searches. Conversely, if it's consistently in the middle, the current range might be sufficient.
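The last point can be checked mechanically. The helper below, `params_on_grid_edge`, is a hypothetical function written for this illustration (it is not part of scikit-learn) and assumes every grid value is numeric. It reports which best values sit at the smallest or largest value tried.

```python
def params_on_grid_edge(best_params, param_grid):
    """Return the parameters whose best value is the smallest or largest tried.

    Hypothetical helper for illustration; assumes numeric grid values.
    """
    on_edge = {}
    for name, best_value in best_params.items():
        values = sorted(param_grid[name])
        if best_value == values[0]:
            on_edge[name] = 'lower edge'
        elif best_value == values[-1]:
            on_edge[name] = 'upper edge'
    return on_edge

# Grid and best parameters from the exercise above
param_grid = {'n_estimators': [10, 50], 'learning_rate': [0.01, 0.1], 'estimator__max_depth': [1, 2]}
best_params = {'estimator__max_depth': 2, 'learning_rate': 0.1, 'n_estimators': 10}

print(params_on_grid_edge(best_params, param_grid))
```

With only two values per parameter, every best value necessarily lands on an edge, so this small grid cannot tell us whether a wider range would do better. That is a good reason to treat its results as a first pass rather than a final answer.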

## Hyperparameter Tuning for Wine Classification

Alright, Space Voyager, let's hone our hyperparameter tuning skills! You have the code to perform GridSearch on the AdaBoost ensemble. Fill in the missing pieces to fully define the parameter grid and run the grid search. You may define the parameter grid any way you like, but make sure to tune the decision tree's max depth.

This will help optimize our vineyard model's performance. You've got this!

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV

# Load the wine dataset
X, y = load_wine(return_X_y=True)

# Split the data for vineyard management
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the base estimator and AdaBoost classifier
base_estimator = DecisionTreeClassifier()
ada_clf = AdaBoostClassifier(estimator=base_estimator, algorithm='SAMME')

# TODO: Define the parameter grid as you find fit

# TODO: Perform the grid search with the ada_clf and param_grid using 5-fold cross-validation

# Output the best parameters found for the vineyard management model
print(f"Best parameters for wine tasting: {ada_grid_search.best_params_}")

```

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

# Load the wine dataset and split into training and test sets
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the base estimator and AdaBoost classifier
base_estimator = DecisionTreeClassifier()
ada_clf = AdaBoostClassifier(estimator=base_estimator, algorithm='SAMME')

# Define the parameter grid for GridSearch
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'estimator__max_depth': [1, 3, 5]
}

# Perform GridSearch with cross-validation
ada_grid_search = GridSearchCV(ada_clf, param_grid, cv=5)
ada_grid_search.fit(X_train, y_train)

# Print the best parameters found by GridSearch
print(f"Best parameters for AdaBoost: {ada_grid_search.best_params_}")
print(f"Best cross-validation score for AdaBoost: {ada_grid_search.best_score_}")

# Use the best estimator to make predictions on the test set
best_ada_model = ada_grid_search.best_estimator_
y_pred = best_ada_model.predict(X_test)

# Calculate accuracy
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test set accuracy: {test_accuracy}")
```

## Update AdaBoost

Hey, Space Explorer!

Great job with AdaBoost! Now, let's extend your knowledge further by tuning additional hyperparameters in your AdaBoostClassifier.

Update the parameter grid to include `estimator__min_samples_split` and `estimator__min_samples_leaf`. Make sure your `GridSearchCV` setup uses the updated grid.

Let's code!

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Load the wine dataset
X, y = load_wine(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the base estimator
base_estimator = DecisionTreeClassifier()

# Initialize the AdaBoost classifier using the base estimator
ada_clf = AdaBoostClassifier(estimator=base_estimator)

# TODO: Update this parameter grid
param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1],
    'estimator__max_depth': [1, 3]
}

# Set up GridSearchCV
ada_grid_search = GridSearchCV(ada_clf, param_grid, cv=3)

# Fit the model
ada_grid_search.fit(X_train, y_train)

print(f"Best parameters for AdaBoost: {ada_grid_search.best_params_}")

```

Fantastic job, Space Explorer! You've successfully expanded the hyperparameter grid for your `AdaBoostClassifier` and re-tuned the model. This is a crucial step in finding the optimal configuration for your vineyard management model.

Here's the updated code with the `param_grid` including `estimator__min_samples_split` and `estimator__min_samples_leaf`, and the output of the GridSearch:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Load the wine dataset
X, y = load_wine(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the base estimator
base_estimator = DecisionTreeClassifier()

# Initialize the AdaBoost classifier using the base estimator
# algorithm='SAMME' avoids the FutureWarning that scikit-learn 1.4 and 1.5 raise
# for the default 'SAMME.R'. From 1.6 on, the `algorithm` parameter itself is
# deprecated, so it can be omitted there.
ada_clf = AdaBoostClassifier(estimator=base_estimator, algorithm='SAMME')

# Updated parameter grid
param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1],
    'estimator__max_depth': [1, 3],
    'estimator__min_samples_split': [2, 5], # New hyperparameter
    'estimator__min_samples_leaf': [1, 2] # New hyperparameter
}

# Set up GridSearchCV
ada_grid_search = GridSearchCV(ada_clf, param_grid, cv=3)

# Fit the model
ada_grid_search.fit(X_train, y_train)

print(f"Best parameters for AdaBoost: {ada_grid_search.best_params_}")
```

**Output:**

```
Best parameters for AdaBoost: {'estimator__max_depth': 3, 'estimator__min_samples_leaf': 2, 'estimator__min_samples_split': 2, 'learning_rate': 0.01, 'n_estimators': 50}
```

By including `min_samples_split` and `min_samples_leaf`, you're now exploring even more fine-grained control over the individual decision tree base estimators. These parameters help to control the complexity of the trees and prevent overfitting by setting minimum requirements for splitting nodes and creating new leaves.
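To see what these two parameters do to a single tree, the sketch below fits one unconstrained and one constrained decision tree on the wine data and compares their leaves. The specific values (`min_samples_leaf=10`, `min_samples_split=30`) and the helper `leaf_sizes` are assumptions picked to make the effect easy to see; they are not the values from the grid above.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

def leaf_sizes(tree):
    """Number of training samples that ended up in each leaf of a fitted tree."""
    is_leaf = tree.tree_.children_left == -1  # leaves have no children
    return tree.tree_.n_node_samples[is_leaf]

default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
constrained_tree = DecisionTreeClassifier(
    min_samples_split=30,  # a node needs at least 30 samples to be split
    min_samples_leaf=10,   # every leaf must keep at least 10 samples
    random_state=0,
).fit(X, y)

for name, tree in [('default', default_tree), ('constrained', constrained_tree)]:
    sizes = leaf_sizes(tree)
    print(f"{name}: {tree.get_n_leaves()} leaves, smallest leaf has {sizes.min()} samples")
```

The constrained tree cannot carve out tiny leaves for a handful of unusual samples, which is exactly the kind of restraint that keeps a base estimator "weak" inside AdaBoost.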

The best parameters found are:

  * `estimator__max_depth`: `3`
  * `estimator__min_samples_leaf`: `2`
  * `estimator__min_samples_split`: `2`
  * `learning_rate`: `0.01`
  * `n_estimators`: `50`

This more extensive tuning process has potentially identified a more robust set of hyperparameters for classifying wine types!

We are almost there, Galactic Pioneer! Let's wrap up our lesson with the final challenge. This time, you are given a basic preprocessing code for the famous Titanic dataset. As a reminder, the goal is to predict if a person survived or not.

Your goal is to find the best model and the best parameters to achieve the highest accuracy on this task. Follow the TODO steps!

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

data = sns.load_dataset('titanic')


# Preprocess the data
data['age'] = data['age'].fillna(data['age'].mean())
data['fare'] = data['fare'].fillna(data['fare'].mean())
data = pd.get_dummies(data, columns=['pclass', 'sex', 'embark_town'], drop_first=True)
data = data.drop(columns=['who', 'embarked', 'deck', 'alive', 'class'])

# Split the data
X = data.drop(columns=['survived'])
y = data['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Define AdaBoost, Random Forest and Decision Tree classifiers and their parameter grids. You may choose any parameter grid you find fit.

# TODO: Perform GridSearchCV for each model to find the best model

# Evaluate the best model on the test set
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Best model: {best_model}")
print(f"Test set accuracy: {accuracy:.2f}")

```

To achieve the highest accuracy on the Titanic dataset, we'll follow these steps:

1.  **Define Classifiers and Parameter Grids**: We'll set up `AdaBoostClassifier`, `RandomForestClassifier`, and `DecisionTreeClassifier` along with their respective parameter grids. The parameter grids will be designed to explore a reasonable range of hyperparameter values for each model.
2.  **Perform GridSearchCV for Each Model**: We'll use `GridSearchCV` with 5-fold cross-validation for each defined model and its parameter grid. This will systematically search for the best combination of hyperparameters that maximizes the cross-validation score.
3.  **Identify the Best Model**: After fitting all `GridSearchCV` instances, we'll compare their `best_score_` to determine which model achieved the highest performance during cross-validation. The estimator associated with the highest `best_score_` will be selected as the `best_model`.
4.  **Evaluate on Test Set**: Finally, we'll use the `best_model` to make predictions on the `X_test` set and calculate the `accuracy_score` to see how well our chosen model generalizes to unseen data.

Let's implement the solution:

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC  # Imported by the task template; not used in this solution
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

data = sns.load_dataset('titanic')

# Preprocess the data
data['age'] = data['age'].fillna(data['age'].mean())
data['fare'] = data['fare'].fillna(data['fare'].mean())
data = pd.get_dummies(data, columns=['pclass', 'sex', 'embark_town'], drop_first=True)
data = data.drop(columns=['who', 'embarked', 'deck', 'alive', 'class'])

# Split the data
X = data.drop(columns=['survived'])
y = data['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define AdaBoost, Random Forest and Decision Tree classifiers and their parameter grids.
# AdaBoost Classifier
ada_clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(random_state=42), random_state=42)
ada_param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1, 1.0],
    'estimator__max_depth': [1, 2, 3]
}

# Random Forest Classifier
rf_clf = RandomForestClassifier(random_state=42)
rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Decision Tree Classifier
dt_clf = DecisionTreeClassifier(random_state=42)
dt_param_grid = {
    'max_depth': [3, 5, 7, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 5]
}

# Perform GridSearchCV for each model to find the best model
best_score = 0
best_model = None

# Grid Search for AdaBoost
print("Performing GridSearchCV for AdaBoost...")
grid_search_ada = GridSearchCV(ada_clf, ada_param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_ada.fit(X_train, y_train)
print(f"AdaBoost - Best parameters: {grid_search_ada.best_params_}")
print(f"AdaBoost - Best cross-validation accuracy: {grid_search_ada.best_score_:.4f}")

if grid_search_ada.best_score_ > best_score:
    best_score = grid_search_ada.best_score_
    best_model = grid_search_ada.best_estimator_

# Grid Search for Random Forest
print("\nPerforming GridSearchCV for Random Forest...")
grid_search_rf = GridSearchCV(rf_clf, rf_param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_rf.fit(X_train, y_train)
print(f"Random Forest - Best parameters: {grid_search_rf.best_params_}")
print(f"Random Forest - Best cross-validation accuracy: {grid_search_rf.best_score_:.4f}")

if grid_search_rf.best_score_ > best_score:
    best_score = grid_search_rf.best_score_
    best_model = grid_search_rf.best_estimator_

# Grid Search for Decision Tree
print("\nPerforming GridSearchCV for Decision Tree...")
grid_search_dt = GridSearchCV(dt_clf, dt_param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_dt.fit(X_train, y_train)
print(f"Decision Tree - Best parameters: {grid_search_dt.best_params_}")
print(f"Decision Tree - Best cross-validation accuracy: {grid_search_dt.best_score_:.4f}")

if grid_search_dt.best_score_ > best_score:
    best_score = grid_search_dt.best_score_
    best_model = grid_search_dt.best_estimator_

# Evaluate the best model on the test set
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"\nBest model (selected based on cross-validation score): {best_model.__class__.__name__}")
print(f"Test set accuracy of the best model: {accuracy:.2f}")

```