# Lab 11: Advanced Model Evaluation and Hyperparameter Tuning

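In this lab, we train a Random Forest classifier on the scikit-learn Breast Cancer dataset, evaluate it with metrics beyond accuracy, compare k-fold and stratified k-fold cross-validation, and tune its hyperparameters with Grid Search and Random Search.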

**Learning Objectives:**
- Understand and compute advanced model evaluation metrics beyond accuracy (e.g., precision, recall, F1-score, ROC-AUC).
- Implement k-fold and stratified k-fold cross-validation to ensure robust model evaluation.
- Perform hyperparameter tuning using Grid Search and Random Search.

## Step 1: Import Libraries and Load Dataset

In [1]:
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data  # Feature matrix: 569 samples, 30 numeric features
y = data.target  # Labels: 0 = malignant, 1 = benign

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Training samples: {X_train.shape[0]}, Testing samples: {X_test.shape[0]}")

Training samples: 455, Testing samples: 114
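
Before evaluating anything, it helps to confirm what the labels mean and how balanced the classes are, since this determines how precision and recall should be read later. The following is a minimal sketch under the same split settings as above; the variable names `class_counts`, `train_pos_rate`, and `test_pos_rate` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# target_names is indexed by label: index 0 -> 'malignant', index 1 -> 'benign'
class_names = [str(name) for name in data.target_names]
print("Classes:", class_names)

# Count how many samples carry each label in the full dataset
class_counts = np.bincount(y)
print("Samples per class:", {n: int(c) for n, c in zip(class_names, class_counts)})

# With stratify=y, the share of label 1 should be nearly equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
train_pos_rate = y_train.mean()
test_pos_rate = y_test.mean()
print(f"Share of label 1 in train: {train_pos_rate:.3f}, in test: {test_pos_rate:.3f}")
```

Because the split is stratified, both shares should sit close to the overall class ratio; without `stratify=y` they could drift apart on a dataset this small.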


## Step 2: Evaluate Model with Advanced Metrics

In [2]:
# Train a simple Random Forest Classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # Predicted probability of class 1 (benign), used for ROC-AUC

# Calculate evaluation metrics
# Note: precision, recall, and F1 default to pos_label=1, so they describe the benign class
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_proba)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC Score: {roc_auc:.2f}")

# Display a detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.96
Precision: 0.96
Recall: 0.97
F1 Score: 0.97
ROC-AUC Score: 0.99

Classification Report:
               precision    recall  f1-score   support

           0       0.95      0.93      0.94        42
           1       0.96      0.97      0.97        72

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114
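
The single-number metrics above all derive from the confusion matrix, and by default they treat label 1 (benign) as the positive class. The sketch below, which retrains the same model so that it runs on its own, recomputes precision, recall, and F1 by hand and then reports recall for the malignant class via `pos_label=0`. The names `precision_manual`, `recall_manual`, `f1_manual`, and `recall_malignant` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# For binary labels {0, 1}, ravel() yields: true neg, false pos, false neg, true pos
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Metrics for the positive class (label 1, benign), computed from the counts
precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
print(f"Precision={precision_manual:.2f}, Recall={recall_manual:.2f}, F1={f1_manual:.2f}")

# Recall for the malignant class (label 0): the share of malignant cases that were caught
recall_malignant = recall_score(y_test, y_pred, pos_label=0)
print(f"Recall (malignant): {recall_malignant:.2f}")
```

In a screening setting, a missed malignant case is usually the costlier error, so the malignant-class recall may be the more relevant figure to track than the default one.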



## Step 3: Cross-Validation Techniques

In [3]:
# Perform k-fold cross-validation on the training set only (the test set stays held out)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_accuracies = []

for train_index, val_index in kf.split(X_train):
    X_fold_train, X_fold_val = X_train[train_index], X_train[val_index]
    y_fold_train, y_fold_val = y_train[train_index], y_train[val_index]

    # Use a separate name so the model trained in Step 2 is not overwritten
    fold_model = RandomForestClassifier(random_state=42)
    fold_model.fit(X_fold_train, y_fold_train)
    y_fold_pred = fold_model.predict(X_fold_val)
    fold_accuracies.append(accuracy_score(y_fold_val, y_fold_pred))

print(f"K-Fold Cross-Validation Accuracies: {fold_accuracies}")
print(f"Mean Accuracy: {np.mean(fold_accuracies):.2f}")

K-Fold Cross-Validation Accuracies: [0.9340659340659341, 0.967032967032967, 0.967032967032967, 0.989010989010989, 0.9230769230769231]
Mean Accuracy: 0.96
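
The manual loop above makes each step explicit. As a more compact alternative, scikit-learn's `cross_val_score` accepts the same `KFold` splitter and handles the fit/predict/score cycle internally. This is a minimal sketch under the same settings; the variable name `scores` is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same splitter as the manual loop: 5 shuffled folds with a fixed seed
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# cross_val_score clones the estimator for each fold, so no state leaks between folds
scores = cross_val_score(
    RandomForestClassifier(random_state=42),
    X_train,
    y_train,
    cv=kf,
    scoring="accuracy",
)

print("Fold accuracies:", [round(float(s), 4) for s in scores])
print(f"Mean accuracy: {scores.mean():.2f} (std: {scores.std():.2f})")
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the estimate varies from fold to fold.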


In [4]:
# Perform stratified k-fold cross-validation (each fold keeps roughly the same class ratio as y_train)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_accuracies = []

for train_index, val_index in skf.split(X_train, y_train):
    X_fold_train, X_fold_val = X_train[train_index], X_train[val_index]
    y_fold_train, y_fold_val = y_train[train_index], y_train[val_index]

    # Use a separate name so the model trained in Step 2 is not overwritten
    fold_model = RandomForestClassifier(random_state=42)
    fold_model.fit(X_fold_train, y_fold_train)
    y_fold_pred = fold_model.predict(X_fold_val)
    stratified_accuracies.append(accuracy_score(y_fold_val, y_fold_pred))

print(f"Stratified K-Fold Accuracies: {stratified_accuracies}")
print(f"Mean Stratified Accuracy: {np.mean(stratified_accuracies):.2f}")

Stratified K-Fold Accuracies: [0.967032967032967, 0.9560439560439561, 0.9340659340659341, 0.967032967032967, 0.989010989010989]
Mean Stratified Accuracy: 0.96
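
The two loops above track accuracy only, while Step 2 reports several metrics. One way to get all of them across folds is `cross_validate`, which accepts a list of scorer names. The sketch below assumes the same stratified splitter; `scoring` and `cv_results` are names chosen here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Built-in scorer names; precision, recall, and f1 treat label 1 as the positive class
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

cv_results = cross_validate(
    RandomForestClassifier(random_state=42),
    X_train,
    y_train,
    cv=skf,
    scoring=scoring,
)

# Each metric is stored under the key "test_<name>" as an array with one score per fold
for metric in scoring:
    fold_scores = cv_results[f"test_{metric}"]
    print(f"{metric:>9}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```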


## Step 4: Hyperparameter Tuning

In [5]:
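# Perform Grid Search for hyperparameter tuning
# The grid has 3 x 3 x 3 = 27 combinations; with cv=5 that is 135 fits plus one final refit
# For a classifier, an integer cv uses stratified k-fold splits without shuffling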
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best Parameters from Grid Search:", grid_search.best_params_)
print("Best Cross-Validation Accuracy:", grid_search.best_score_)

Best Parameters from Grid Search: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 200}
Best Cross-Validation Accuracy: 0.9604395604395606
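
The best cross-validation score is still an estimate made on the training data. A reasonable follow-up is to inspect how close the other candidates came and then score the refit best model once on the held-out test set. The sketch below uses a deliberately reduced grid (`small_grid`, a hypothetical name, with 4 combinations instead of 27) so that it runs quickly; it is not the grid from the cell above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumption: a smaller grid than in the lab, chosen only to keep the example fast
small_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    small_grid,
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X_train, y_train)

# cv_results_ holds one row per parameter combination
results = pd.DataFrame(grid_search.cv_results_)
columns = [
    "param_n_estimators",
    "param_max_depth",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
print(results[columns].sort_values("rank_test_score").to_string(index=False))

# best_estimator_ is refit on all of X_train (refit=True is the default)
best_model = grid_search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(f"Held-out test accuracy of the tuned model: {test_accuracy:.2f}")
```

When the top candidates differ by less than their fold-to-fold standard deviation, the ranking among them is probably not meaningful, and a simpler or cheaper setting may be the better pick.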


In [6]:
# Perform Random Search for hyperparameter tuning
# The lists below define 4 x 4 x 4 = 64 combinations; n_iter=10 samples only 10 of them
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10, 15]
}

random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

print("Best Parameters from Random Search:", random_search.best_params_)
print("Best Cross-Validation Accuracy:", random_search.best_score_)

Best Parameters from Random Search: {'n_estimators': 50, 'min_samples_split': 5, 'max_depth': None}
Best Cross-Validation Accuracy: 0.9582417582417584
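
Random Search is not limited to fixed lists: `RandomizedSearchCV` also accepts distributions, which lets it try values between the grid points. The following sketch assumes `scipy.stats.randint` for the integer-valued parameters and uses fewer iterations and folds than the cell above (`n_iter=5`, `cv=3`) purely to keep the run short; the ranges mirror the lists used in the lab.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# randint(low, high) samples integers from low up to, but not including, high
param_dist = {
    "n_estimators": randint(50, 201),      # any integer from 50 to 200
    "max_depth": [None, 10, 20, 30],       # lists are sampled uniformly
    "min_samples_split": randint(2, 16),   # any integer from 2 to 15
}

# Assumption: n_iter and cv are reduced here for speed, not as a recommendation
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print(f"Best cross-validation accuracy: {random_search.best_score_:.4f}")
```

Because only a sample of the space is evaluated, the result depends on `random_state` and `n_iter`; a Random Search score slightly below the Grid Search score, as in the outputs above, does not by itself indicate a worse method.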
