# Model Tuning: A Comprehensive Guide

---

<center><h2>Lesson 05</h2></center>

## Learning Objectives

By the end of this lesson, you will be able to:

1. **Understand the fundamentals** of model tuning and hyperparameter optimization
2. **Implement and tune** Multi-Layer Perceptrons (MLPs) for classification and regression
3. **Apply Random Forest** algorithms with proper hyperparameter tuning strategies
4. **Utilize Support Vector Machines (SVM)** with different kernels and optimization techniques
5. **Deploy XGBoost** models with advanced hyperparameter tuning methods
6. **Interpret model predictions** using SHAP (SHapley Additive exPlanations) values
7. **Compare and select** appropriate models for different types of problems

---

## Table of Contents

1. [Introduction to Model Tuning](#introduction)
2. [Dataset Preparation](#dataset)
3. [Multi-Layer Perceptrons (MLPs)](#mlp)
4. [Random Forest](#random-forest)
5. [Support Vector Machines (SVM)](#svm)
6. [XGBoost](#xgboost)
7. [Model Interpretability with SHAP](#shap)
8. [Model Comparison and Selection](#comparison)
9. [Conclusion and Best Practices](#conclusion)

## 1. Introduction to Model Tuning {#introduction}

### What is Model Tuning?

Model tuning, also known as hyperparameter optimization, is the process of finding the optimal configuration of hyperparameters for a machine learning model to achieve the best performance on your specific dataset.

### Key Concepts:

- **Parameters**: Learned from data during training (e.g., weights in neural networks)
- **Hyperparameters**: Set before training and control the learning process (e.g., learning rate, regularization strength)
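- **Cross-validation**: Technique that estimates performance on unseen data by repeatedly training on some folds and scoring on the held-out fold; it helps detect overfitting rather than prevent it
- **Grid Search**: Exhaustive search over every combination of the specified parameter values
- **Random Search**: Random sampling of a fixed number of hyperparameter combinations
- **Bayesian Optimization**: Sequential search that fits a probabilistic surrogate model to past evaluations and uses it to choose the next configuration to try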
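
To make the difference between grid search and random search concrete, the sketch below enumerates a small search space with scikit-learn's `ParameterGrid` and draws a fixed budget of candidates from the same space with `ParameterSampler`. The search space itself is a made-up example, not a recommendation.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, chosen only for illustration
space = {
    "alpha": [0.0001, 0.001, 0.01],
    "hidden_layer_sizes": [(50,), (100,)],
    "activation": ["relu", "tanh"],
}

# Grid search evaluates every combination: 3 * 2 * 2 = 12 candidates
grid_candidates = list(ParameterGrid(space))

# Random search evaluates a fixed budget of sampled candidates
random_candidates = list(ParameterSampler(space, n_iter=5, random_state=42))

print(f"Grid search candidates:   {len(grid_candidates)}")
print(f"Random search candidates: {len(random_candidates)}")
```

With 5-fold cross-validation, each candidate costs five model fits, so the grid grows multiplicatively with every added hyperparameter while the random-search budget stays fixed.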

### Why is Model Tuning Important?

1. **Performance Improvement**: Proper tuning can significantly boost model accuracy
2. **Generalization**: Helps models perform well on unseen data
3. **Efficiency**: A well-chosen configuration can reach the same accuracy with less training time or a smaller model
4. **Robustness**: Creates more stable and reliable models

## 2. Dataset Preparation {#dataset}

Let's start by importing necessary libraries and preparing our dataset. We'll use multiple datasets to demonstrate different aspects of each algorithm.

In [None]:
# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_regression, load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, mean_squared_error, r2_score

# Model imports
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.svm import SVC, SVR
import xgboost as xgb

# SHAP for model interpretability
import shap

# Bayesian optimization
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
%matplotlib inline

print("All libraries imported successfully!")

In [None]:
# Create synthetic datasets for demonstration
print("Creating datasets...")

# Classification dataset
X_class, y_class = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    n_classes=3,
    random_state=42
)

# Regression dataset
X_reg, y_reg = make_regression(
    n_samples=1000,
    n_features=15,
    noise=0.1,
    random_state=42
)

# Load real datasets
breast_cancer = load_breast_cancer()
X_bc, y_bc = breast_cancer.data, breast_cancer.target

# Split datasets
X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42, stratify=y_class
)

X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

X_bc_train, X_bc_test, y_bc_train, y_bc_test = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=42, stratify=y_bc
)

# Scale features
scaler_class = StandardScaler()
scaler_reg = StandardScaler()
scaler_bc = StandardScaler()

X_class_train_scaled = scaler_class.fit_transform(X_class_train)
X_class_test_scaled = scaler_class.transform(X_class_test)

X_reg_train_scaled = scaler_reg.fit_transform(X_reg_train)
X_reg_test_scaled = scaler_reg.transform(X_reg_test)

X_bc_train_scaled = scaler_bc.fit_transform(X_bc_train)
X_bc_test_scaled = scaler_bc.transform(X_bc_test)

print(f"Classification dataset: {X_class.shape[0]} samples, {X_class.shape[1]} features")
print(f"Regression dataset: {X_reg.shape[0]} samples, {X_reg.shape[1]} features")
print(f"Breast cancer dataset: {X_bc.shape[0]} samples, {X_bc.shape[1]} features")
print("\nDatasets prepared and scaled successfully!")

## 3. Multi-Layer Perceptrons (MLPs) {#mlp}

### Theory

Multi-Layer Perceptrons are feedforward artificial neural networks that consist of:
- **Input layer**: Receives the input features
- **Hidden layers**: Perform transformations using activation functions
- **Output layer**: Produces the final prediction
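
The three layer types can be sketched as a forward pass in plain NumPy. This is a minimal illustration with random, untrained weights; the layer sizes and variable names are assumptions made for the example, and scikit-learn's `MLPClassifier` handles all of this internally.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    """ReLU activation: element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Hypothetical sizes: 4 input features, 8 hidden units, 3 classes
n_features, n_hidden, n_classes = 4, 8, 3
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

X = rng.normal(size=(5, n_features))  # input layer: 5 example rows

hidden = relu(X @ W1 + b1)            # hidden layer: linear map + activation
probs = softmax(hidden @ W2 + b2)     # output layer: class probabilities

print(probs.shape)
print(probs.sum(axis=1))
```

Training adjusts `W1`, `b1`, `W2`, and `b2`; the choice of layer sizes and activation function stays fixed during training, which is what makes them hyperparameters.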

### Key Hyperparameters:
- `hidden_layer_sizes`: Architecture of hidden layers
- `activation`: Activation function ('relu', 'tanh', 'logistic')
- `solver`: Optimization algorithm ('adam', 'lbfgs', 'sgd')
- `alpha`: L2 regularization parameter
- `learning_rate`: Learning rate schedule for weight updates (only used when `solver='sgd'`)
- `max_iter`: Maximum number of iterations
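
As a quick check of how `hidden_layer_sizes` shapes the model, the sketch below fits a small `MLPClassifier` on a throwaway synthetic dataset and inspects the learned weight matrices. The dataset and settings are assumptions chosen to keep the example fast, not tuned values.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small illustrative dataset: 20 features, 3 classes
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=42
)

# Two hidden layers with 50 and 25 units
clf = MLPClassifier(hidden_layer_sizes=(50, 25), alpha=0.001, max_iter=50, random_state=42)

# max_iter is kept low for speed, so silence the expected convergence warning
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf.fit(X, y)

# One weight matrix per layer transition: input->50, 50->25, 25->3 outputs
weight_shapes = [w.shape for w in clf.coefs_]
n_learned = sum(w.size for w in clf.coefs_) + sum(b.size for b in clf.intercepts_)

print(weight_shapes)
print(f"Learned parameters: {n_learned}")
```

The hyperparameter `hidden_layer_sizes` fixes the number of learned parameters before training starts; the values inside `coefs_` and `intercepts_` are what training actually learns.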

In [None]:
print("=== Multi-Layer Perceptrons (MLPs) ===")
print()

# MLP Classification
print("1. MLP Classification")

# Define hyperparameter grid for MLP Classification
# Note: 'learning_rate' (the schedule) only affects solver='sgd'. With 'adam'
# and 'lbfgs' it would double the grid without changing any fit, so it is omitted.
mlp_class_param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (50, 25), (100, 50), (100, 50, 25)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'lbfgs'],
    'alpha': [0.0001, 0.001, 0.01]
}

# Create MLP classifier
mlp_class = MLPClassifier(max_iter=1000, random_state=42)

# Perform grid search
print("Performing grid search for MLP Classification...")
mlp_class_grid = GridSearchCV(
    mlp_class, 
    mlp_class_param_grid, 
    cv=5, 
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

mlp_class_grid.fit(X_class_train_scaled, y_class_train)

print(f"Best parameters: {mlp_class_grid.best_params_}")
print(f"Best cross-validation score: {mlp_class_grid.best_score_:.4f}")

# Evaluate on test set
mlp_class_pred = mlp_class_grid.predict(X_class_test_scaled)
mlp_class_accuracy = accuracy_score(y_class_test, mlp_class_pred)
print(f"Test accuracy: {mlp_class_accuracy:.4f}")
print()

In [None]:
# MLP Regression
print("2. MLP Regression")

# Define hyperparameter grid for MLP Regression
mlp_reg_param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (50, 25), (100, 50)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'lbfgs'],
    'alpha': [0.0001, 0.001, 0.01]
}

# Create MLP regressor
mlp_reg = MLPRegressor(max_iter=1000, random_state=42)

# Perform grid search
print("Performing grid search for MLP Regression...")
mlp_reg_grid = GridSearchCV(
    mlp_reg, 
    mlp_reg_param_grid, 
    cv=5, 
    scoring='neg_mean_squared_error',
    n_jobs=-1
)

mlp_reg_grid.fit(X_reg_train_scaled, y_reg_train)

print(f"Best parameters: {mlp_reg_grid.best_params_}")
print(f"Best cross-validation score (negative MSE): {mlp_reg_grid.best_score_:.4f}")

# Evaluate on test set
mlp_reg_pred = mlp_reg_grid.predict(X_reg_test_scaled)
mlp_reg_mse = mean_squared_error(y_reg_test, mlp_reg_pred)
mlp_reg_r2 = r2_score(y_reg_test, mlp_reg_pred)
print(f"Test MSE: {mlp_reg_mse:.4f}")
print(f"Test R²: {mlp_reg_r2:.4f}")
print()

## 4. Random Forest {#random-forest}

### Theory

Random Forest is an ensemble method that combines multiple decision trees using:
- **Bootstrap Aggregating (Bagging)**: Each tree is trained on a bootstrap sample of the training data (rows drawn at random with replacement)
- **Feature Randomness**: Each split considers only a random subset of features
- **Voting/Averaging**: Final prediction is made by majority vote (classification) or average (regression)
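
The bagging step can be illustrated with a few lines of NumPy. The sketch below draws one bootstrap sample of row indices and measures how much of the data a single tree would actually see; the sample size of 1000 is an assumption that mirrors the synthetic datasets in this lesson.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000

# A bootstrap sample draws n_samples row indices WITH replacement
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Some rows appear several times, others not at all
unique_fraction = np.unique(bootstrap_idx).size / n_samples

# Rows never drawn are "out-of-bag" for this tree
out_of_bag = np.setdiff1d(np.arange(n_samples), bootstrap_idx)

print(f"Unique rows in bootstrap sample: {unique_fraction:.1%}")
print(f"Out-of-bag rows: {out_of_bag.size}")
```

For large datasets the unique fraction approaches 1 - 1/e, roughly 63%, so each tree sees a different subset of about two thirds of the rows. That variation between trees is what averaging exploits.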

### Key Hyperparameters:
- `n_estimators`: Number of trees in the forest
- `max_depth`: Maximum depth of trees
- `min_samples_split`: Minimum samples required to split a node
- `min_samples_leaf`: Minimum samples required at a leaf node
- `max_features`: Number of features to consider for best split
- `bootstrap`: Whether to use bootstrap sampling
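
The string options for `max_features` translate into a concrete number of candidate features per split. The short sketch below mirrors how scikit-learn resolves them, as far as I understand its behaviour (truncating to an integer, with a minimum of 1); treat the exact rounding as an assumption.

```python
import math

n_features = 20  # matches the synthetic classification dataset in this lesson

# Assumed to mirror scikit-learn's resolution of the string options
sqrt_features = max(1, int(math.sqrt(n_features)))
log2_features = max(1, int(math.log2(n_features)))

print(f"max_features='sqrt' -> {sqrt_features} features per split")
print(f"max_features='log2' -> {log2_features} features per split")
print(f"max_features=None   -> {n_features} features per split")
```

With 20 features, `'sqrt'` and `'log2'` both resolve to 4, so a search over both options on this dataset effectively tests the same setting twice.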

In [None]:
print("=== Random Forest ===")
print()

# Random Forest Classification
print("1. Random Forest Classification")

# Define hyperparameter grid
rf_class_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}

# Create Random Forest classifier
rf_class = RandomForestClassifier(random_state=42)

# Use RandomizedSearchCV for efficiency
print("Performing randomized search for Random Forest Classification...")
rf_class_random = RandomizedSearchCV(
    rf_class, 
    rf_class_param_grid, 
    n_iter=50,
    cv=5, 
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

rf_class_random.fit(X_class_train, y_class_train)

print(f"Best parameters: {rf_class_random.best_params_}")
print(f"Best cross-validation score: {rf_class_random.best_score_:.4f}")

# Evaluate on test set
rf_class_pred = rf_class_random.predict(X_class_test)
rf_class_accuracy = accuracy_score(y_class_test, rf_class_pred)
print(f"Test accuracy: {rf_class_accuracy:.4f}")
print()

In [None]:
# Feature importance visualization
print("Feature Importance Analysis")

# Get feature importances
feature_importance = rf_class_random.best_estimator_.feature_importances_

# Create feature importance plot
plt.figure(figsize=(10, 6))
feature_names = [f'Feature_{i}' for i in range(len(feature_importance))]
indices = np.argsort(feature_importance)[::-1][:10]  # Top 10 features

plt.bar(range(10), feature_importance[indices])
plt.xticks(range(10), [feature_names[i] for i in indices], rotation=45)
plt.title('Top 10 Feature Importances - Random Forest')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.tight_layout()
plt.show()
print()

In [None]:
# Random Forest Regression
print("2. Random Forest Regression")

# Define hyperparameter grid for regression
rf_reg_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create Random Forest regressor
rf_reg = RandomForestRegressor(random_state=42)

# Perform randomized search
print("Performing randomized search for Random Forest Regression...")
rf_reg_random = RandomizedSearchCV(
    rf_reg, 
    rf_reg_param_grid, 
    n_iter=30,
    cv=5, 
    scoring='neg_mean_squared_error',
    n_jobs=-1,
    random_state=42
)

rf_reg_random.fit(X_reg_train, y_reg_train)

print(f"Best parameters: {rf_reg_random.best_params_}")
print(f"Best cross-validation score (negative MSE): {rf_reg_random.best_score_:.4f}")

# Evaluate on test set
rf_reg_pred = rf_reg_random.predict(X_reg_test)
rf_reg_mse = mean_squared_error(y_reg_test, rf_reg_pred)
rf_reg_r2 = r2_score(y_reg_test, rf_reg_pred)
print(f"Test MSE: {rf_reg_mse:.4f}")
print(f"Test R²: {rf_reg_r2:.4f}")
print()

## 5. Support Vector Machines (SVM) {#svm}

### Theory

Support Vector Machines work by:
- Finding the optimal hyperplane that separates classes with maximum margin
- Using kernel functions to transform data into higher-dimensional spaces
- Focusing on support vectors (data points closest to the decision boundary)
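
A small linear example makes these ideas tangible. The sketch below fits a linear `SVC` on two synthetic clusters (the cluster centers are arbitrary choices for illustration), then reads off the support vectors and the margin width, which for a linear SVM is 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated 2-D clusters; centers are arbitrary illustrative values
X, y = make_blobs(
    n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=42
)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Hyperplane is w . x + b = 0; margin width is 2 / ||w||
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Only the support vectors determine the fitted boundary
n_support = int(clf.n_support_.sum())

print(f"Support vectors: {n_support} of {len(X)} training points")
print(f"Margin width: {margin_width:.3f}")
```

Only a small share of the training points typically ends up as support vectors on data like this, and those points alone determine the fitted boundary.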

### Key Hyperparameters:
- `C`: Regularization parameter (inverse regularization strength): larger values penalize misclassified training points more heavily, smaller values favor a wider margin and a smoother decision boundary
- `kernel`: Kernel function ('linear', 'poly', 'rbf', 'sigmoid')
- `gamma`: Kernel coefficient for 'rbf', 'poly', and 'sigmoid'
- `degree`: Degree of polynomial kernel
- `coef0`: Independent term in the kernel function (only used by 'poly' and 'sigmoid')
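
To show what `gamma` does in the RBF kernel, the sketch below computes the two string settings by hand, following the formulas in the scikit-learn documentation (`'scale'` is `1 / (n_features * X.var())`, `'auto'` is `1 / n_features`). The data and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three features on very different scales (illustrative data)
X_raw = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))

def gamma_values(X):
    """Return (gamma for 'scale', gamma for 'auto') per the documented formulas."""
    n_features = X.shape[1]
    return 1.0 / (n_features * X.var()), 1.0 / n_features

def rbf_kernel(a, b, gamma):
    """RBF kernel between two points: exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Standardize each column to zero mean and unit variance
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

raw_scale, raw_auto = gamma_values(X_raw)
std_scale, std_auto = gamma_values(X_std)

print(f"Raw data:          scale={raw_scale:.6f}, auto={raw_auto:.6f}")
print(f"Standardized data: scale={std_scale:.6f}, auto={std_auto:.6f}")
print(f"Kernel of a point with itself: {rbf_kernel(X_std[0], X_std[0], std_scale)}")
```

On fully standardized data the overall variance is 1, so `'scale'` and `'auto'` coincide. Inside cross-validation folds they can differ slightly, because each fold has a slightly different variance.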

In [None]:
print("=== Support Vector Machines (SVM) ===")
print()

# SVM Classification
print("1. SVM Classification")

# Define hyperparameter grid for different kernels
svm_class_param_grid = [
    {
        'kernel': ['linear'],
        'C': [0.01, 0.1, 1, 10, 100]
    },
    {
        'kernel': ['rbf'],
        'C': [0.01, 0.1, 1, 10, 100],
        'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]
    },
    {
        'kernel': ['poly'],
        'C': [0.1, 1, 10],
        'degree': [2, 3, 4],
        'gamma': ['scale', 'auto']
    }
]

# Create SVM classifier
# probability=True enables predict_proba, which the SHAP section relies on
svm_class = SVC(probability=True, random_state=42)

# Perform grid search
print("Performing grid search for SVM Classification...")
svm_class_grid = GridSearchCV(
    svm_class, 
    svm_class_param_grid, 
    cv=5, 
    scoring='accuracy',
    n_jobs=-1
)

svm_class_grid.fit(X_bc_train_scaled, y_bc_train)

print(f"Best parameters: {svm_class_grid.best_params_}")
print(f"Best cross-validation score: {svm_class_grid.best_score_:.4f}")

# Evaluate on test set
svm_class_pred = svm_class_grid.predict(X_bc_test_scaled)
svm_class_accuracy = accuracy_score(y_bc_test, svm_class_pred)
print(f"Test accuracy: {svm_class_accuracy:.4f}")

# Classification report
print("\nClassification Report:")
print(classification_report(y_bc_test, svm_class_pred, target_names=['Malignant', 'Benign']))
print()

In [None]:
# SVM Regression
print("2. SVM Regression")

# Define hyperparameter grid for SVR
svm_reg_param_grid = [
    {
        'kernel': ['linear'],
        'C': [0.1, 1, 10, 100],
        'epsilon': [0.01, 0.1, 0.2]
    },
    {
        'kernel': ['rbf'],
        'C': [0.1, 1, 10, 100],
        'gamma': ['scale', 'auto', 0.001, 0.01, 0.1],
        'epsilon': [0.01, 0.1, 0.2]
    }
]

# Create SVR
svm_reg = SVR()

# Perform grid search
print("Performing grid search for SVM Regression...")
svm_reg_grid = GridSearchCV(
    svm_reg, 
    svm_reg_param_grid, 
    cv=5, 
    scoring='neg_mean_squared_error',
    n_jobs=-1
)

svm_reg_grid.fit(X_reg_train_scaled, y_reg_train)

print(f"Best parameters: {svm_reg_grid.best_params_}")
print(f"Best cross-validation score (negative MSE): {svm_reg_grid.best_score_:.4f}")

# Evaluate on test set
svm_reg_pred = svm_reg_grid.predict(X_reg_test_scaled)
svm_reg_mse = mean_squared_error(y_reg_test, svm_reg_pred)
svm_reg_r2 = r2_score(y_reg_test, svm_reg_pred)
print(f"Test MSE: {svm_reg_mse:.4f}")
print(f"Test R²: {svm_reg_r2:.4f}")
print()

## 6. XGBoost {#xgboost}

### Theory

XGBoost (eXtreme Gradient Boosting) is an optimized gradient boosting framework that:
- Uses gradient boosting to combine weak learners (decision trees)
- Implements advanced regularization techniques
- Optimizes for speed and performance
- Handles missing values automatically

### Key Hyperparameters:
- `n_estimators`: Number of boosting rounds
- `learning_rate`: Step size shrinkage to prevent overfitting
- `max_depth`: Maximum depth of trees
- `min_child_weight`: Minimum sum of instance weight needed in a child
- `subsample`: Subsample ratio of training instances
- `colsample_bytree`: Subsample ratio of columns when constructing each tree
- `reg_alpha`: L1 regularization term
- `reg_lambda`: L2 regularization term
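
The roles of `n_estimators`, `learning_rate`, and `max_depth` are easier to see in a stripped-down boosting loop. The sketch below is a simplified gradient boosting for squared error built from scikit-learn decision trees. It is not XGBoost's actual algorithm (there are no second-order gradients, regularization terms, or subsampling), and the `boost` helper is a hypothetical name.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

def boost(X, y, n_estimators, learning_rate, max_depth):
    """Simplified gradient boosting for squared error; returns training MSE per round."""
    prediction = np.full(len(y), y.mean())      # start from the mean
    mse_history = [np.mean((y - prediction) ** 2)]
    for _ in range(n_estimators):
        residuals = y - prediction              # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                  # weak learner fits the residuals
        prediction = prediction + learning_rate * tree.predict(X)  # shrunken update
        mse_history.append(np.mean((y - prediction) ** 2))
    return mse_history

slow = boost(X, y, n_estimators=50, learning_rate=0.05, max_depth=2)
fast = boost(X, y, n_estimators=50, learning_rate=0.3, max_depth=2)

print(f"Training MSE after 50 rounds, learning_rate=0.05: {slow[-1]:.4f}")
print(f"Training MSE after 50 rounds, learning_rate=0.30: {fast[-1]:.4f}")
```

A smaller learning rate needs more rounds to reach the same training error, which is why `learning_rate` and `n_estimators` are usually tuned together.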

In [None]:
print("=== XGBoost ===")
print()

# XGBoost Classification
print("1. XGBoost Classification")

# Define hyperparameter space for Bayesian optimization
xgb_class_param_space = {
    'n_estimators': Integer(50, 500),
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),
    'max_depth': Integer(3, 10),
    'min_child_weight': Integer(1, 10),
    'subsample': Real(0.6, 1.0),
    'colsample_bytree': Real(0.6, 1.0),
    'reg_alpha': Real(0.0, 1.0),
    'reg_lambda': Real(0.0, 1.0)
}

# Create XGBoost classifier
xgb_class = xgb.XGBClassifier(
    random_state=42,
    eval_metric='mlogloss'
)

# Perform Bayesian optimization
print("Performing Bayesian optimization for XGBoost Classification...")
xgb_class_bayes = BayesSearchCV(
    xgb_class,
    xgb_class_param_space,
    n_iter=30,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

xgb_class_bayes.fit(X_class_train, y_class_train)

print(f"Best parameters: {xgb_class_bayes.best_params_}")
print(f"Best cross-validation score: {xgb_class_bayes.best_score_:.4f}")

# Evaluate on test set
xgb_class_pred = xgb_class_bayes.predict(X_class_test)
xgb_class_accuracy = accuracy_score(y_class_test, xgb_class_pred)
print(f"Test accuracy: {xgb_class_accuracy:.4f}")
print()

In [None]:
# Feature importance for XGBoost
print("XGBoost Feature Importance Analysis")

# Get feature importances
xgb_feature_importance = xgb_class_bayes.best_estimator_.feature_importances_

# Create feature importance plot
plt.figure(figsize=(10, 6))
feature_names = [f'Feature_{i}' for i in range(len(xgb_feature_importance))]
indices = np.argsort(xgb_feature_importance)[::-1][:10]

plt.bar(range(10), xgb_feature_importance[indices])
plt.xticks(range(10), [feature_names[i] for i in indices], rotation=45)
plt.title('Top 10 Feature Importances - XGBoost')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.tight_layout()
plt.show()
print()

In [None]:
# XGBoost Regression
print("2. XGBoost Regression")

# Define hyperparameter space for regression
xgb_reg_param_space = {
    'n_estimators': Integer(50, 300),
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),
    'max_depth': Integer(3, 8),
    'min_child_weight': Integer(1, 6),
    'subsample': Real(0.7, 1.0),
    'colsample_bytree': Real(0.7, 1.0)
}

# Create XGBoost regressor
xgb_reg = xgb.XGBRegressor(random_state=42)

# Perform Bayesian optimization
print("Performing Bayesian optimization for XGBoost Regression...")
xgb_reg_bayes = BayesSearchCV(
    xgb_reg,
    xgb_reg_param_space,
    n_iter=25,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1,
    random_state=42
)

xgb_reg_bayes.fit(X_reg_train, y_reg_train)

print(f"Best parameters: {xgb_reg_bayes.best_params_}")
print(f"Best cross-validation score (negative MSE): {xgb_reg_bayes.best_score_:.4f}")

# Evaluate on test set
xgb_reg_pred = xgb_reg_bayes.predict(X_reg_test)
xgb_reg_mse = mean_squared_error(y_reg_test, xgb_reg_pred)
xgb_reg_r2 = r2_score(y_reg_test, xgb_reg_pred)
print(f"Test MSE: {xgb_reg_mse:.4f}")
print(f"Test R²: {xgb_reg_r2:.4f}")
print()

## 7. Model Interpretability with SHAP {#shap}

### Theory

SHAP (SHapley Additive exPlanations) is a unified framework for explaining machine learning model predictions by:
- Computing Shapley values from cooperative game theory
- Providing consistent and locally accurate explanations
- Showing how each feature contributes to individual predictions
- Offering both global and local interpretability

### Types of SHAP Explanations:
- **Local explanations**: Why did the model make a specific prediction?
- **Global explanations**: How does each feature affect the model overall?
- **Feature interactions**: How do features work together?
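
To see what a Shapley value is before using the `shap` library, the sketch below computes exact values by brute force for a toy three-feature model. The model, background data, and helper names are all hypothetical, and the enumeration is exponential in the number of features, so it is only workable for very small examples.

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(42)
background = rng.normal(size=(100, 3))  # hypothetical background data
weights = np.array([2.0, -1.0, 0.5])

def model(X):
    """Toy linear model standing in for any fitted predictor."""
    return X @ weights + 3.0

def coalition_value(x, subset):
    """Mean model output with features in `subset` fixed to x, the rest taken from the background."""
    X = background.copy()
    idx = list(subset)
    if idx:
        X[:, idx] = x[idx]
    return model(X).mean()

def shapley_values(x):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                gain = coalition_value(x, subset + (i,)) - coalition_value(x, subset)
                phi[i] += weight * gain
    return phi

x = np.array([1.0, 2.0, -1.0])
phi = shapley_values(x)
base_value = model(background).mean()

print("Shapley values:", phi)
print("Base value + contributions:", base_value + phi.sum())
print("Model prediction:          ", model(x[None, :])[0])
```

The last two printed numbers agree: the contributions sum to the difference between the prediction and the average prediction. This is the local accuracy property that makes SHAP explanations additive.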

In [None]:
print("=== SHAP Model Interpretability ===")
print()

# Initialize SHAP
shap.initjs()

# 1. SHAP for Random Forest (Tree Explainer)
print("1. SHAP Analysis for Random Forest")

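# Create Tree Explainer for Random Forest
rf_explainer = shap.TreeExplainer(rf_class_random.best_estimator_)
rf_shap_values = rf_explainer.shap_values(X_class_test)

# Older SHAP releases return a list with one array per class; newer releases
# return a single array of shape (n_samples, n_features, n_classes)
if isinstance(rf_shap_values, list):
    rf_shap_class0 = rf_shap_values[0]
else:
    rf_shap_class0 = rf_shap_values[:, :, 0]

# Summary plot
plt.figure(figsize=(10, 6))
shap.summary_plot(rf_shap_class0, X_class_test, feature_names=[f'Feature_{i}' for i in range(X_class_test.shape[1])], show=False)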
plt.title('SHAP Summary Plot - Random Forest (Class 0)')
plt.tight_layout()
plt.show()
print()

In [None]:
# 2. SHAP for XGBoost
print("2. SHAP Analysis for XGBoost")

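# Create Tree Explainer for XGBoost
xgb_explainer = shap.TreeExplainer(xgb_class_bayes.best_estimator_)
xgb_shap_values = xgb_explainer.shap_values(X_class_test)

# Newer SHAP releases return one (n_samples, n_features, n_classes) array for
# multi-class models; split it into one array per class for the summary plot
if not isinstance(xgb_shap_values, list) and np.ndim(xgb_shap_values) == 3:
    xgb_shap_values = [xgb_shap_values[:, :, k] for k in range(xgb_shap_values.shape[2])]

# Summary plot (a list of per-class arrays produces a stacked bar chart)
plt.figure(figsize=(10, 6))
shap.summary_plot(xgb_shap_values, X_class_test, feature_names=[f'Feature_{i}' for i in range(X_class_test.shape[1])], show=False)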
plt.title('SHAP Summary Plot - XGBoost')
plt.tight_layout()
plt.show()
print()

In [None]:
# 3. SHAP for SVM (using Kernel Explainer)
print("3. SHAP Analysis for SVM")

# Create a smaller sample for SVM SHAP analysis (KernelExplainer is computationally expensive)
sample_size = 50
sample_indices = np.random.choice(X_bc_test_scaled.shape[0], sample_size, replace=False)
X_sample = X_bc_test_scaled[sample_indices]

# Create Kernel Explainer for SVM
svm_explainer = shap.KernelExplainer(
    svm_class_grid.best_estimator_.predict_proba, 
    X_bc_train_scaled[:100]  # Use a small background dataset
)

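print("Computing SHAP values for SVM (this may take a moment...)")
svm_shap_values = svm_explainer.shap_values(X_sample[:10])  # Analyze first 10 samples

# Select the values for class 1 (benign); handle both the per-class list of
# older SHAP releases and the 3-D array of newer ones
if isinstance(svm_shap_values, list):
    svm_shap_benign = svm_shap_values[1]
else:
    svm_shap_benign = svm_shap_values[:, :, 1]

# Use the real feature names shipped with the breast cancer dataset
feature_names_bc = list(breast_cancer.feature_names)

# Summary plot
plt.figure(figsize=(10, 6))
shap.summary_plot(svm_shap_benign, X_sample[:10], feature_names=feature_names_bc, show=False)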
plt.title('SHAP Summary Plot - SVM (Benign Class)')
plt.tight_layout()
plt.show()
print()

In [None]:
# 4. Local explanation example
print("4. Local SHAP Explanation Example")

# Select a single instance for detailed explanation
instance_idx = 0
single_instance = X_class_test[instance_idx:instance_idx+1]

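# Get SHAP values for single instance
single_shap_values = xgb_explainer.shap_values(single_instance)

# The model is multi-class, so explain one class: the one predicted for this
# instance. Handle both the per-class list (older SHAP) and 3-D array (newer SHAP).
class_idx = int(xgb_class_bayes.best_estimator_.predict(single_instance)[0])
if isinstance(single_shap_values, list):
    instance_shap = single_shap_values[class_idx][0]
else:
    instance_shap = single_shap_values[0, :, class_idx]

# Create waterfall plot (requires 1-D values and a scalar base value)
plt.figure(figsize=(10, 6))
shap.waterfall_plot(
    shap.Explanation(
        values=instance_shap, 
        base_values=np.ravel(xgb_explainer.expected_value)[class_idx], 
        data=single_instance[0],
        feature_names=[f'Feature_{i}' for i in range(single_instance.shape[1])]
    ),
    show=False
)
plt.title(f'SHAP Waterfall Plot - XGBoost (Instance {instance_idx}, Class {class_idx})')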
plt.tight_layout()
plt.show()

# Print prediction details
prediction = xgb_class_bayes.best_estimator_.predict(single_instance)[0]
probability = xgb_class_bayes.best_estimator_.predict_proba(single_instance)[0]
print(f"Predicted class: {prediction}")
print(f"Class probabilities: {probability}")
print(f"Actual class: {y_class_test[instance_idx]}")
print()

## 8. Model Comparison and Selection {#comparison}

Let's create a comprehensive comparison of all the models we've tuned to help guide model selection decisions.

In [None]:
print("=== Model Comparison and Selection ===")
print()

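# Create comparison results
# Caution: the SVM classifier was tuned on the breast cancer dataset (binary),
# while the other three used the synthetic 3-class dataset, so its accuracy
# is not directly comparable with theirs.
classification_results = {
    'Model': ['MLP', 'Random Forest', 'SVM', 'XGBoost'],
    'Dataset': ['Synthetic', 'Synthetic', 'Breast cancer', 'Synthetic'],
    'Test Accuracy': [mlp_class_accuracy, rf_class_accuracy, svm_class_accuracy, xgb_class_accuracy],
    'CV Score': [mlp_class_grid.best_score_, rf_class_random.best_score_, svm_class_grid.best_score_, xgb_class_bayes.best_score_]
}

# All four regression models were evaluated on the same synthetic regression test set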
regression_results = {
    'Model': ['MLP', 'Random Forest', 'SVM', 'XGBoost'],
    'Test MSE': [mlp_reg_mse, rf_reg_mse, svm_reg_mse, xgb_reg_mse],
    'Test R²': [mlp_reg_r2, rf_reg_r2, svm_reg_r2, xgb_reg_r2]
}

# Create DataFrames
class_df = pd.DataFrame(classification_results)
reg_df = pd.DataFrame(regression_results)

print("Classification Results:")
print(class_df.round(4))
print()

print("Regression Results:")
print(reg_df.round(4))
print()

# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Classification accuracy comparison
axes[0, 0].bar(class_df['Model'], class_df['Test Accuracy'])
axes[0, 0].set_title('Classification Test Accuracy')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].tick_params(axis='x', rotation=45)

# Classification CV score comparison
axes[0, 1].bar(class_df['Model'], class_df['CV Score'])
axes[0, 1].set_title('Classification CV Score')
axes[0, 1].set_ylabel('CV Score')
axes[0, 1].tick_params(axis='x', rotation=45)

# Regression MSE comparison
axes[1, 0].bar(reg_df['Model'], reg_df['Test MSE'])
axes[1, 0].set_title('Regression Test MSE')
axes[1, 0].set_ylabel('MSE')
axes[1, 0].tick_params(axis='x', rotation=45)

# Regression R² comparison
axes[1, 1].bar(reg_df['Model'], reg_df['Test R²'])
axes[1, 1].set_title('Regression Test R²')
axes[1, 1].set_ylabel('R²')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
print()

In [None]:
# Model selection guidance
print("=== Model Selection Guidance ===")
print()

guidance = {
    'Model': ['MLP', 'Random Forest', 'SVM', 'XGBoost'],
    'Best Use Cases': [
        'Complex non-linear patterns, large datasets, feature learning',
        'Interpretable models, handles mixed data types, robust to outliers',
        'High-dimensional data, small datasets, strong theoretical foundation',
        'Structured data, competitions, handles missing values, high performance'
    ],
    'Pros': [
        'Flexible, learns complex patterns, no assumptions about data distribution',
        'Feature importance, handles overfitting well, fast training',
        'Memory efficient, works well with small datasets, strong generalization',
        'High performance, built-in regularization, handles missing values'
    ],
    'Cons': [
        'Black box, requires tuning, sensitive to scaling',
        'Can still overfit noisy data with deep trees, impurity-based importances biased toward high-cardinality features',
        'Slow on large datasets, sensitive to feature scaling, difficult to interpret',
        'Many hyperparameters, can overfit, requires careful tuning'
    ]
}

guidance_df = pd.DataFrame(guidance)
print(guidance_df.to_string(index=False))
print()

## 9. Conclusion and Best Practices {#conclusion}

### Key Takeaways

1. **Model Selection**: No single model works best for all problems. Consider:
   - Dataset size and dimensionality
   - Interpretability requirements
   - Training time constraints
   - Performance requirements

2. **Hyperparameter Tuning Strategies**:
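   - **Grid Search**: Exhaustive, but the cost grows multiplicatively with each added hyperparameter
   - **Random Search**: Fixed evaluation budget, usually more efficient for high-dimensional spaces
   - **Bayesian Optimization**: Uses results of earlier trials to choose the next configuration, useful when each fit is expensive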

3. **Cross-Validation**: Always use cross-validation to:
   - Get reliable performance estimates
   - Reduce the risk of overfitting to a single validation split
   - Compare models fairly

4. **Feature Scaling**:
   - Critical for neural networks (MLPs)
   - Critical for Support Vector Machines
   - Less important for tree-based methods (Random Forest, XGBoost)

5. **Model Interpretability**: Use SHAP for:
   - Understanding model decisions
   - Debugging model behavior
   - Building trust with stakeholders
   - Identifying important features
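
One common way to combine the cross-validation and feature-scaling takeaways is to put the scaler inside a scikit-learn `Pipeline`, so that it is refit on the training folds of every split and never sees the held-out fold. The sketch below is a minimal illustration on the breast cancer dataset; the grid values are arbitrary assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaler and model travel together, so scaling is learned per training fold
pipeline = make_pipeline(StandardScaler(), SVC())

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(f"Best parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")
```

Scaling the full training set once before the search, as the earlier cells do for simplicity, lets each validation fold influence the scaler's statistics. The effect is usually small, but the pipeline removes it entirely.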

### Best Practices

1. **Start Simple**: Begin with simpler models before moving to complex ones
2. **Baseline Models**: Always establish a baseline (e.g., majority class, mean prediction)
3. **Data Quality**: Ensure high-quality data through proper preprocessing
4. **Validation Strategy**: Use appropriate validation techniques for your problem type
5. **Performance Metrics**: Choose metrics that align with your business objectives
6. **Computational Resources**: Consider training time and inference speed requirements
7. **Model Monitoring**: Monitor model performance in production and retrain when necessary
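
A majority-class baseline takes only a few lines with scikit-learn's `DummyClassifier`. The sketch below is one possible way to set it up on the breast cancer dataset and shows the accuracy any tuned model must beat to be worth its complexity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Always predicts the most frequent class seen during training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

# Share of the majority class (benign, label 1) in the test set
benign_share = (y_test == 1).mean()

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Share of benign cases in test set: {benign_share:.4f}")
```

The baseline accuracy equals the share of the majority class, so on imbalanced data a high accuracy on its own says little about model quality.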

### Next Steps

- Experiment with ensemble methods (combining multiple models)
- Explore deep learning frameworks for more complex neural networks
- Learn about automated machine learning (AutoML) tools
- Study domain-specific modeling techniques
- Practice with real-world datasets from your field of interest

## Summary

In this comprehensive lesson, we covered:

- **Model Tuning Fundamentals**: Understanding hyperparameters vs parameters
- **Multi-Layer Perceptrons**: Neural networks for complex pattern recognition
- **Random Forest**: Ensemble method with built-in feature importance
- **Support Vector Machines**: Powerful method for high-dimensional data
- **XGBoost**: State-of-the-art gradient boosting framework
- **SHAP**: Model interpretability and explanation framework
- **Model Comparison**: Systematic approach to model selection

Each algorithm has its strengths and ideal use cases. The key to successful machine learning is understanding when to apply each method and how to tune it effectively for your specific problem.

**Remember**: The best model is not always the most complex one, but the one that generalizes well to new data while meeting your specific requirements for interpretability, speed, and accuracy.