# 2.4 **Evaluate** and Tune Decision Trees - Predict Student Departure with Optimized Decision Trees

## Model Cycle: The 5 Key Steps

### 1. Build the Model: Create the pipeline with a decision tree classifier.
### 2. Train the Model: Fit the model on the training data.
### 3. Generate Predictions: Use the trained model to make predictions.
### **4. Evaluate the Model: Assess performance using evaluation metrics.**
### **5. Improve the Model: Tune hyperparameters for optimal performance.**

## Introduction

In the previous notebooks, we built and trained decision tree models. Now we complete the ML cycle by evaluating model performance and tuning hyperparameters to optimize results.

Decision trees have several hyperparameters that control model complexity. Finding the right balance is crucial: too simple and the model underfits, too complex and it overfits.
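To make "model complexity" concrete, here is a minimal sketch on synthetic data. It assumes `make_classification` as a stand-in for the student data, and `tree_size` is a hypothetical helper written only for this illustration. It fits trees with different settings and reports depth, number of leaves, and the size of the smallest leaf, using scikit-learn's `get_depth()`, `get_n_leaves()` and `tree_.n_node_samples`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student data (assumption: 500 rows, 7 numeric features)
X, y = make_classification(n_samples=500, n_features=7, n_informative=4,
                           random_state=0)

def tree_size(**params):
    """Hypothetical helper: fit a tree with the given hyperparameters and
    return (depth, number of leaves, training samples in the smallest leaf)."""
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X, y)
    is_leaf = tree.tree_.children_left == -1          # leaves have no children
    smallest_leaf = tree.tree_.n_node_samples[is_leaf].min()
    return tree.get_depth(), tree.get_n_leaves(), smallest_leaf

for params in [{}, {'max_depth': 3}, {'min_samples_leaf': 20}]:
    depth, leaves, smallest = tree_size(**params)
    print(f"{str(params):25s} depth={depth:2d} leaves={leaves:3d} smallest leaf={smallest}")
```

The unconstrained tree keeps splitting until its leaves are pure, while `max_depth` caps the number of levels and `min_samples_leaf` guarantees every leaf is backed by at least that many training rows.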

### Learning Objectives

By the end of this notebook, you will be able to:

1. Evaluate decision tree performance using classification metrics
2. Understand the role of key hyperparameters in controlling overfitting
3. Use GridSearchCV to find optimal hyperparameter values
4. Compare decision trees to logistic regression
5. Select and save the best model for deployment

## 1. Load Dependencies and Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pandas as pd
import numpy as np
import pickle
import os

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, ConfusionMatrixDisplay,
    roc_curve, auc, precision_recall_curve, roc_auc_score,
    classification_report
)

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt

pd.options.display.max_columns = None

In [None]:
# Set up file paths
root_filepath = '/content/drive/MyDrive/projects/Applied-Data-Analytics-For-Higher-Education-Course-2/'
data_filepath = f'{root_filepath}data/'
course3_filepath = f'{root_filepath}course_3/'
models_path = f'{course3_filepath}models/'

In [None]:
# Load training and testing data
df_training = pd.read_csv(f'{data_filepath}training.csv')
df_testing = pd.read_csv(f'{data_filepath}testing.csv')

print(f"Training data shape: {df_training.shape}")
print(f"Testing data shape: {df_testing.shape}")
print("\nTraining target distribution:")
print(df_training['SEM_3_STATUS'].value_counts(normalize=True))
print("\nTesting target distribution:")
print(df_testing['SEM_3_STATUS'].value_counts(normalize=True))

In [None]:
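# Define feature matrices and targets.
# X keeps every column (including SEM_3_STATUS); the pipelines' ColumnTransformer
# selects only the listed feature columns and drops the rest (remainder='drop').
X_train = df_training
y_train = df_training['SEM_3_STATUS']

X_test = df_testing
y_test = df_testing['SEM_3_STATUS']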

## 2. Load Trained Models

In [None]:
# Load the trained decision tree models
basic_dt_model = pickle.load(open(f'{models_path}basic_decision_tree_trained.pkl', 'rb'))
constrained_dt_model = pickle.load(open(f'{models_path}constrained_decision_tree_trained.pkl', 'rb'))
balanced_dt_model = pickle.load(open(f'{models_path}balanced_decision_tree_trained.pkl', 'rb'))

# Load feature names
feature_names_dict = pickle.load(open(f'{models_path}decision_tree_feature_names.pkl', 'rb'))
feature_names = feature_names_dict['feature_names']

print("Trained models loaded successfully!")

In [None]:
# Store models for comparison
models = {
    'Basic (Unconstrained)': basic_dt_model,
    'Constrained (max_depth=5)': constrained_dt_model,
    'Balanced (class_weight)': balanced_dt_model
}

## 3. Baseline Evaluation

Let's evaluate our trained models on the test set using the metrics we learned in Course 2.

### 3.1 Predictions and Confusion Matrix

In [None]:
# Generate predictions for all models
predictions = {}
probabilities = {}

for name, model in models.items():
    predictions[name] = model.predict(X_test)
    # classes_ are stored sorted (['E', 'N']), so column 1 holds P(class = 'N')
    probabilities[name] = model.predict_proba(X_test)[:, 1]

print("Predictions generated for all models.")
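The `[:, 1]` indexing above relies on scikit-learn storing class labels in sorted order in `classes_`. A minimal sketch on toy data (an assumption, not the student data) shows the ordering and a slightly more defensive way to pick the `'N'` column by name:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny toy data (assumption): one feature, string labels like SEM_3_STATUS
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(['E', 'E', 'N', 'N'])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.classes_)  # sorted labels: 'E' first, then 'N'

# Look up the column for 'N' by label instead of hard-coding index 1
n_col = list(clf.classes_).index('N')
prob_n = clf.predict_proba(X)[:, n_col]
print(prob_n)
```

Looking the column up by label keeps the code correct even if the label set changes later.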

In [None]:
# Plot confusion matrices for all models
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, (name, preds) in enumerate(predictions.items()):
    cm = confusion_matrix(y_test, preds)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['E', 'N'])
    disp.plot(ax=axes[idx], colorbar=False)
    axes[idx].set_title(f'{name}')

plt.suptitle('Confusion Matrices on Test Set', fontsize=14)
plt.tight_layout()
plt.show()
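Reading these plots depends on scikit-learn's layout: rows are the actual class and columns the predicted class, in the order given by `labels`. A small sketch with hypothetical labels shows how to unpack the four cells when `'N'` is treated as the positive class:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only
y_true = ['E', 'E', 'E', 'N', 'N', 'E', 'N', 'E']
y_pred = ['E', 'N', 'E', 'N', 'E', 'E', 'N', 'E']

# Passing labels fixes the order: rows = actual, columns = predicted
cm = confusion_matrix(y_true, y_pred, labels=['E', 'N'])
tn, fp, fn, tp = cm.ravel()  # valid because 'N' (the positive class) is listed second
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```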

### 3.2 Classification Metrics

In [None]:
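# Calculate metrics for all models
def calculate_metrics(y_true, y_pred, y_prob):
    """Calculate classification metrics with 'N' as the positive class.

    y_prob must be the predicted probability of class 'N'.
    """
    # roc_auc_score ignores `labels` for binary targets, so encode 'N' as 1 explicitly
    y_true_binary = (np.asarray(y_true) == 'N').astype(int)
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision (N)': precision_score(y_true, y_pred, pos_label='N'),
        'Recall (N)': recall_score(y_true, y_pred, pos_label='N'),
        'F1-Score (N)': f1_score(y_true, y_pred, pos_label='N'),
        'AUC-ROC': roc_auc_score(y_true_binary, y_prob)
    }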

# Calculate for all models
metrics_results = []
for name, preds in predictions.items():
    metrics = calculate_metrics(y_test, preds, probabilities[name])
    metrics['Model'] = name
    metrics_results.append(metrics)

metrics_df = pd.DataFrame(metrics_results)
metrics_df = metrics_df[['Model', 'Accuracy', 'Precision (N)', 'Recall (N)', 'F1-Score (N)', 'AUC-ROC']]

# Add null rate for reference
null_rate = y_test.value_counts(normalize=True).max()
print(f"Null Rate (baseline accuracy): {null_rate:.1%}")
print("\nModel Performance on Test Set:")
print(metrics_df.to_string(index=False))
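As a reminder of what the class-N metrics measure, here is the arithmetic from hypothetical confusion-matrix counts (made-up numbers, not results from this notebook), cross-checked against scikit-learn:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical confusion-matrix counts, with 'N' as the positive class
tp, fp, fn, tn = 30, 20, 10, 40

precision = tp / (tp + fp)  # of the students predicted 'N', the share that really were 'N'
recall = tp / (tp + fn)     # of the students who really were 'N', the share we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Rebuild label lists that produce exactly these counts and cross-check with scikit-learn
y_true = ['N'] * tp + ['E'] * fp + ['N'] * fn + ['E'] * tn
y_pred = ['N'] * tp + ['N'] * fp + ['E'] * fn + ['E'] * tn

print(f"by hand: precision={precision:.3f}, recall={recall:.3f}, F1={f1:.3f}")
print(f"sklearn: precision={precision_score(y_true, y_pred, pos_label='N'):.3f}, "
      f"recall={recall_score(y_true, y_pred, pos_label='N'):.3f}, "
      f"F1={f1_score(y_true, y_pred, pos_label='N'):.3f}")
```

With these counts precision is 0.6, recall is 0.75, and F1 is about 0.667.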

In [None]:
# Visualize metrics comparison
fig = go.Figure()

metrics_to_plot = ['Accuracy', 'Precision (N)', 'Recall (N)', 'F1-Score (N)', 'AUC-ROC']
colors = ['coral', 'steelblue', 'seagreen']

for i, row in metrics_df.iterrows():
    fig.add_trace(go.Bar(
        name=row['Model'],
        x=metrics_to_plot,
        y=[row[m] for m in metrics_to_plot],
        marker_color=colors[i]
    ))

fig.add_hline(y=null_rate, line_dash="dash", line_color="gray",
              annotation_text=f"Null Rate (accuracy baseline): {null_rate:.1%}")

fig.update_layout(
    title='Decision Tree Model Performance Comparison',
    yaxis_title='Score',
    barmode='group',
    height=500,
    yaxis=dict(range=[0, 1])
)

fig.show()
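The null-rate line is a reference for **accuracy** only. A sketch with a hypothetical 80/20 target and scikit-learn's `DummyClassifier` shows why: always predicting the majority class reaches the null rate in accuracy but has zero recall for `'N'`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical target: 80% 'E', 20% 'N'
y = np.array(['E'] * 80 + ['N'] * 20)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

# The null rate is the accuracy of always predicting the majority class
dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
y_pred = dummy.predict(X)
null_rate = accuracy_score(y, y_pred)
recall_n = recall_score(y, y_pred, pos_label='N')
print(f"accuracy={null_rate:.2f}, recall (N)={recall_n:.2f}")
```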

### 3.3 ROC and Precision-Recall Curves

In [None]:
# Plot ROC curves for all models
fig = make_subplots(rows=1, cols=2, subplot_titles=('ROC Curves', 'Precision-Recall Curves'))

colors = ['coral', 'steelblue', 'seagreen']

for idx, (name, probs) in enumerate(probabilities.items()):
    # ROC Curve
    fpr, tpr, _ = roc_curve(y_test, probs, pos_label='N')
    roc_auc = auc(fpr, tpr)
    fig.add_trace(go.Scatter(
        x=fpr, y=tpr, mode='lines',
        name=f'{name} (AUC={roc_auc:.2f})',
        line=dict(color=colors[idx], width=2)
    ), row=1, col=1)
    
    # Precision-Recall Curve
    precision, recall, _ = precision_recall_curve(y_test, probs, pos_label='N')
    pr_auc = auc(recall, precision)
    fig.add_trace(go.Scatter(
        x=recall, y=precision, mode='lines',
        name=f'{name} (AUC={pr_auc:.2f})',
        line=dict(color=colors[idx], width=2),
        showlegend=False
    ), row=1, col=2)

# Add diagonal for ROC
fig.add_trace(go.Scatter(
    x=[0, 1], y=[0, 1], mode='lines',
    line=dict(dash='dash', color='gray'),
    name='Random',
    showlegend=False
), row=1, col=1)

# Add baseline for PR
baseline_precision = (y_test == 'N').mean()
fig.add_trace(go.Scatter(
    x=[0, 1], y=[baseline_precision, baseline_precision], mode='lines',
    line=dict(dash='dash', color='gray'),
    name='Baseline',
    showlegend=False
), row=1, col=2)

fig.update_xaxes(title_text='False Positive Rate', row=1, col=1)
fig.update_yaxes(title_text='True Positive Rate', row=1, col=1)
fig.update_xaxes(title_text='Recall', row=1, col=2)
fig.update_yaxes(title_text='Precision', row=1, col=2)

fig.update_layout(height=450, title_text='ROC and Precision-Recall Curves')
fig.show()
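The ROC legend value from `auc(fpr, tpr)` is the same quantity `roc_auc_score` reports. The PR value from `auc(recall, precision)` uses trapezoidal interpolation, which can be optimistic; scikit-learn's `average_precision_score` is a common step-wise alternative. A sketch on random, hypothetical scores:

```python
import numpy as np
from sklearn.metrics import (auc, average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)

rng = np.random.default_rng(0)
# Hypothetical binary target (1 = 'N') and noisy scores shifted upward for positives
y = rng.integers(0, 2, size=200)
scores = y * 0.5 + rng.random(200)

fpr, tpr, _ = roc_curve(y, scores)
print(f"auc(fpr, tpr)={auc(fpr, tpr):.4f}  roc_auc_score={roc_auc_score(y, scores):.4f}")

precision, recall, _ = precision_recall_curve(y, scores)
pr_trapezoid = auc(recall, precision)    # linear interpolation between PR points
ap = average_precision_score(y, scores)  # step-wise summary of the PR curve
print(f"PR trapezoid={pr_trapezoid:.4f}  average precision={ap:.4f}")
```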

## 4. Hyperparameter Tuning

Now let's tune the decision tree hyperparameters to find the optimal configuration.

### 4.1 Understanding Key Hyperparameters

The main hyperparameters that control decision tree complexity are:

| Parameter | Description | Effect on Model |
|:----------|:------------|:----------------|
| `max_depth` | Maximum depth of tree | Lower = simpler, reduces variance |
| `min_samples_split` | Minimum samples to split a node | Higher = simpler, reduces variance |
| `min_samples_leaf` | Minimum samples in leaf nodes | Higher = simpler, reduces variance |
| `max_leaf_nodes` | Maximum number of leaf nodes | Lower = simpler, reduces variance |
| `class_weight` | Weights for classes | 'balanced' helps with imbalanced data |
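For `class_weight='balanced'`, scikit-learn weights each class by `n_samples / (n_classes * count_of_class)`, so rarer classes count more in each split's impurity. A sketch with hypothetical 80/20 labels, computed both with `compute_class_weight` and by hand:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 80 'E' and 20 'N'
y = np.array(['E'] * 80 + ['N'] * 20)
classes = np.unique(y)

# The 'balanced' rule: n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
manual = len(y) / (len(classes) * np.array([np.sum(y == c) for c in classes]))
print(dict(zip(classes, weights)))
```

Here each `'N'` row counts four times as much as an `'E'` row (2.5 vs 0.625), which offsets the 4:1 imbalance.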

In [None]:
# Visualize the effect of max_depth on model performance
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Create preprocessor (same as in 2.2)
numerical_columns = [
    'HS_GPA', 'GPA_1', 'GPA_2', 'DFW_RATE_1', 'DFW_RATE_2',
    'UNITS_ATTEMPTED_1', 'UNITS_ATTEMPTED_2'
]

categorical_columns = ['GENDER', 'RACE_ETHNICITY', 'FIRST_GEN_STATUS']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', 'passthrough', numerical_columns),
        ('cat', OneHotEncoder(handle_unknown='ignore', 
                              drop=['Female', 'Other', 'Unknown'], 
                              sparse_output=False), categorical_columns)
    ],
    remainder='drop'
)

# Test different max_depth values
depths = [1, 2, 3, 4, 5, 6, 7, 8, 10, 15, None]
train_scores = []
test_scores = []

for depth in depths:
    model = Pipeline([
        ('preprocessing', preprocessor),
        ('classifier', DecisionTreeClassifier(
            max_depth=depth,
            class_weight='balanced',
            random_state=42
        ))
    ])
    
    model.fit(X_train, y_train)
    train_scores.append(f1_score(y_train, model.predict(X_train), pos_label='N'))
    test_scores.append(f1_score(y_test, model.predict(X_test), pos_label='N'))

# Convert depths to strings so Plotly treats them as categories (str(None) gives 'None')
depth_labels = [str(d) for d in depths]

In [None]:
# Plot validation curve for max_depth
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=depth_labels, y=train_scores,
    mode='lines+markers',
    name='Training F1',
    line=dict(color='blue', width=2)
))

fig.add_trace(go.Scatter(
    x=depth_labels, y=test_scores,
    mode='lines+markers',
    name='Test F1',
    line=dict(color='orange', width=2)
))

fig.add_annotation(
    x=depth_labels[-1], y=train_scores[-1],
    text='Overfitting Zone',
    showarrow=True,
    arrowhead=2
)

fig.update_layout(
    title='Validation Curve: Effect of max_depth on F1 Score (Class N)',
    xaxis_title='max_depth',
    yaxis_title='F1 Score',
    height=450
)

fig.show()

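**Observation**: Notice how the gap between training and test performance grows as depth increases. This is the classic sign of overfitting: a deep tree memorizes the training data but fails to generalize. Because this curve is scored on the test set, treat it as an illustration only; choosing `max_depth` from it would leak test information into model selection, which is why the next section tunes hyperparameters with cross-validation on the training data.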
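A leakage-free version of the same curve scores each depth with cross-validation on training data only. This sketch assumes synthetic data from `make_classification` (labels mapped to `'E'`/`'N'`) and uses scikit-learn's `validation_curve` with a `make_scorer(f1_score, pos_label='N')` scorer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); labels mapped to 'E'/'N' like SEM_3_STATUS
X, y_int = make_classification(n_samples=600, n_features=7, n_informative=4,
                               weights=[0.75], random_state=0)
y = np.where(y_int == 1, 'N', 'E')

depths = [1, 2, 3, 5, 8, None]
f1_n = make_scorer(f1_score, pos_label='N')  # F1 for the 'N' class
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(class_weight='balanced', random_state=42),
    X, y, param_name='max_depth', param_range=depths, cv=cv, scoring=f1_n,
)

# Each row holds one depth; average across the 5 folds
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={str(d):>4}: train F1={tr:.3f}, CV F1={va:.3f}")
```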

### 4.2 Grid Search with Cross-Validation

We'll use GridSearchCV to systematically search for the best hyperparameter combination.

In [None]:
# Define the parameter grid
param_grid = {
    'classifier__max_depth': [3, 4, 5, 6, 7, 8],
    'classifier__min_samples_split': [10, 20, 30, 50],
    'classifier__min_samples_leaf': [5, 10, 15, 20],
    'classifier__class_weight': ['balanced', None]
}

print("Parameter Grid:")
for param, values in param_grid.items():
    print(f"  {param}: {values}")

total_combinations = 1
for values in param_grid.values():
    total_combinations *= len(values)
print(f"\nTotal combinations to test: {total_combinations}")
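scikit-learn can also count the candidates directly with `ParameterGrid`. The grid is repeated here so the example stands alone; with 5 folds, each candidate is fit 5 times (GridSearchCV then does one extra refit of the best candidate on the full training set):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the cell above, repeated so this example is self-contained
param_grid = {
    'classifier__max_depth': [3, 4, 5, 6, 7, 8],
    'classifier__min_samples_split': [10, 20, 30, 50],
    'classifier__min_samples_leaf': [5, 10, 15, 20],
    'classifier__class_weight': ['balanced', None],
}
n_candidates = len(ParameterGrid(param_grid))
n_splits = 5  # folds used by the StratifiedKFold in the next cell
print(f"{n_candidates} candidates x {n_splits} folds = {n_candidates * n_splits} fits")
```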

In [None]:
# Create base pipeline for tuning
base_pipeline = Pipeline([
    ('preprocessing', preprocessor),
    ('classifier', DecisionTreeClassifier(random_state=42))
])

# Set up stratified 5-fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# The built-in 'f1' scorer treats label 1 as the positive class, so it cannot
# score the string labels 'E'/'N' directly. Encode 'N' (the class we want to
# detect) as 1 and 'E' as 0 before fitting.
y_train_binary = (y_train == 'N').astype(int)
y_test_binary = (y_test == 'N').astype(int)

# Perform grid search
print("Running Grid Search (this may take a moment)...")
grid_search = GridSearchCV(
    base_pipeline,
    param_grid,
    cv=cv,
    scoring='f1',  # F1 score of the positive class (1 = 'N')
    n_jobs=-1,
    return_train_score=True
)

grid_search.fit(X_train, y_train_binary)

print("Grid Search Complete!")
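An alternative to the 0/1 conversion is a custom scorer that names the positive label, so the search fits on `'E'`/`'N'` directly and `best_estimator_` predicts the original labels. A sketch on synthetic data (an assumption), with a deliberately small grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption) with string labels like SEM_3_STATUS
X, y_int = make_classification(n_samples=400, n_features=7, n_informative=4,
                               weights=[0.75], random_state=1)
y = np.where(y_int == 1, 'N', 'E')

# Tell the scorer which string label is the positive class
f1_n = make_scorer(f1_score, pos_label='N')

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    {'max_depth': [2, 4, 6], 'class_weight': ['balanced', None]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring=f1_n,
)
search.fit(X, y)  # no 0/1 conversion needed
print(search.best_params_, round(search.best_score_, 3))
print(search.best_estimator_.classes_)  # predictions stay 'E'/'N'
```

With this approach, the label conversions in the evaluation cells and the separate refit before saving would not be needed.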

In [None]:
# Display best parameters
print("Best Parameters:")
for param, value in grid_search.best_params_.items():
    print(f"  {param}: {value}")

print(f"\nBest Cross-Validation F1 Score: {grid_search.best_score_:.4f}")
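`best_score_` is the mean validation F1 across the five folds. Because the folds come from `StratifiedKFold`, each validation fold keeps the same share of `'N'` as the full training set. A sketch with hypothetical 80/20 labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 80 'E' and 20 'N'
y = np.array(['E'] * 80 + ['N'] * 20)
X = np.zeros((len(y), 1))  # only y drives the stratification

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
shares = []
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y), start=1):
    share_n = np.mean(y[val_idx] == 'N')
    shares.append(share_n)
    print(f"fold {fold}: {len(val_idx)} validation rows, share of 'N' = {share_n:.2f}")
```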

### 4.3 Analyzing Tuning Results

In [None]:
# Convert results to DataFrame
results_df = pd.DataFrame(grid_search.cv_results_)

# Get top 10 configurations
top_10 = results_df.nsmallest(10, 'rank_test_score')[[
    'param_classifier__max_depth',
    'param_classifier__min_samples_split',
    'param_classifier__min_samples_leaf',
    'param_classifier__class_weight',
    'mean_test_score',
    'std_test_score',
    'mean_train_score'
]]

top_10.columns = ['max_depth', 'min_samples_split', 'min_samples_leaf', 
                  'class_weight', 'CV F1 (mean)', 'CV F1 (std)', 'Train F1']

print("Top 10 Hyperparameter Configurations:")
print(top_10.to_string(index=False))

In [None]:
# Visualize hyperparameter effects
fig = make_subplots(rows=2, cols=2, subplot_titles=(
    'Effect of max_depth', 'Effect of min_samples_split',
    'Effect of min_samples_leaf', 'Effect of class_weight'
))

# Group by each parameter and average the CV score over all other settings
# (the marginal effect of that parameter)
for idx, param in enumerate(['max_depth', 'min_samples_split', 'min_samples_leaf', 'class_weight']):
    row = idx // 2 + 1
    col = idx % 2 + 1

    # dropna=False keeps the class_weight=None group, which groupby would otherwise drop
    grouped = results_df.groupby(f'param_classifier__{param}', dropna=False)['mean_test_score'].mean()

    fig.add_trace(go.Bar(
        x=['None' if pd.isna(x) else str(x) for x in grouped.index],
        y=grouped.values,
        marker_color='steelblue'
    ), row=row, col=col)

fig.update_layout(height=600, title_text='Hyperparameter Effects on Cross-Validation F1 Score',
                  showlegend=False)
fig.show()

## 5. Evaluating the Tuned Model

In [None]:
# Get the best model
best_model = grid_search.best_estimator_

# Make predictions (need to convert back to original labels for consistency)
y_pred_tuned = best_model.predict(X_test)
y_prob_tuned = best_model.predict_proba(X_test)[:, 1]

# Convert predictions back to original labels
y_pred_tuned_labels = np.where(y_pred_tuned == 1, 'N', 'E')

print("Tuned Model Evaluation on Test Set:")
print("="*50)
print(classification_report(y_test, y_pred_tuned_labels))

In [None]:
# Calculate metrics for tuned model
tuned_metrics = {
    'Model': 'Tuned Decision Tree',
    'Accuracy': accuracy_score(y_test, y_pred_tuned_labels),
    'Precision (N)': precision_score(y_test, y_pred_tuned_labels, pos_label='N'),
    'Recall (N)': recall_score(y_test, y_pred_tuned_labels, pos_label='N'),
    'F1-Score (N)': f1_score(y_test, y_pred_tuned_labels, pos_label='N'),
    'AUC-ROC': roc_auc_score(y_test_binary, y_prob_tuned)
}

# Add to comparison (build a new list so re-running this cell doesn't append duplicates)
metrics_df_updated = pd.DataFrame(metrics_results + [tuned_metrics])
metrics_df_updated = metrics_df_updated[['Model', 'Accuracy', 'Precision (N)', 'Recall (N)', 'F1-Score (N)', 'AUC-ROC']]

print("\nUpdated Model Comparison:")
print(metrics_df_updated.to_string(index=False))

In [None]:
# Confusion matrix for tuned model
cm_tuned = confusion_matrix(y_test, y_pred_tuned_labels, labels=['E', 'N'])
fig, ax = plt.subplots(figsize=(6, 5))
disp = ConfusionMatrixDisplay(confusion_matrix=cm_tuned, display_labels=['E', 'N'])
disp.plot(ax=ax, colorbar=False)  # draw on our axes instead of opening a second, empty figure
ax.set_title('Confusion Matrix: Tuned Decision Tree')
plt.tight_layout()
plt.show()

## 6. Model Comparison

### 6.1 Decision Trees vs Logistic Regression

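Let's compare our best decision tree to the baseline logistic regression model from Course 2 and Module 1. The saved pipeline is refit on the same training data so both model types are compared on equal terms.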

In [None]:
# Load baseline logistic regression for comparison
try:
    with open(f'{root_filepath}models/baseline_logistic_model.pkl', 'rb') as f:
        baseline_lr = pickle.load(f)
    # Refit on this notebook's training data so both model types see the same data
    baseline_lr.fit(X_train, y_train)

    # Get predictions; look up the 'N' probability column by label
    lr_pred = baseline_lr.predict(X_test)
    n_col = list(baseline_lr.classes_).index('N')
    lr_prob = baseline_lr.predict_proba(X_test)[:, n_col]

    lr_metrics = {
        'Model': 'Logistic Regression (Baseline)',
        'Accuracy': accuracy_score(y_test, lr_pred),
        'Precision (N)': precision_score(y_test, lr_pred, pos_label='N'),
        'Recall (N)': recall_score(y_test, lr_pred, pos_label='N'),
        'F1-Score (N)': f1_score(y_test, lr_pred, pos_label='N'),
        # roc_auc_score ignores `labels` for binary targets, so encode 'N' as 1
        'AUC-ROC': roc_auc_score((y_test == 'N').astype(int), lr_prob)
    }

    print("Logistic Regression (Baseline) loaded and evaluated.")
except Exception as e:  # e.g. missing file or incompatible pipeline
    print(f"Note: Could not load baseline logistic regression model ({e}).")
    lr_metrics = None

In [None]:
# Create comprehensive comparison (rows of metrics_df_updated are in model order)
short_names = ['Basic DT', 'Constrained DT', 'Balanced DT', 'Tuned DT']
comparison_data = [
    {'Model': short_name,
     'Accuracy': row['Accuracy'],
     'F1 (N)': row['F1-Score (N)'],
     'AUC-ROC': row['AUC-ROC'],
     'Type': 'Decision Tree'}
    for short_name, (_, row) in zip(short_names, metrics_df_updated.iterrows())
]

if lr_metrics:
    comparison_data.append({
        'Model': 'Logistic Reg', 'Accuracy': lr_metrics['Accuracy'],
        'F1 (N)': lr_metrics['F1-Score (N)'], 'AUC-ROC': lr_metrics['AUC-ROC'],
        'Type': 'Logistic Regression'
    })

comparison_df = pd.DataFrame(comparison_data)
print("\nComprehensive Model Comparison:")
print(comparison_df.to_string(index=False))

In [None]:
# Visualize comparison
fig = go.Figure()

colors = {'Decision Tree': 'steelblue', 'Logistic Regression': 'coral'}

fig.add_trace(go.Bar(
    name='F1 Score (N)',
    x=comparison_df['Model'],
    y=comparison_df['F1 (N)'],
    marker_color=[colors[t] for t in comparison_df['Type']]
))

fig.add_trace(go.Scatter(
    name='AUC-ROC',
    x=comparison_df['Model'],
    y=comparison_df['AUC-ROC'],
    mode='markers+lines',
    marker=dict(size=12, color='green'),
    line=dict(color='green', width=2)
))

fig.update_layout(
    title='Model Performance Comparison: Decision Trees vs Logistic Regression',
    yaxis_title='Score',
    height=500,
    yaxis=dict(range=[0, 1])
)

fig.show()

### 6.2 Overfitting Analysis

In [None]:
# Compare training vs test performance for each model
def get_train_test_gap(model, X_train, y_train, X_test, y_test):
    """Calculate train/test performance gap."""
    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)
    
    train_f1 = f1_score(y_train, train_pred, pos_label='N')
    test_f1 = f1_score(y_test, test_pred, pos_label='N')
    
    return train_f1, test_f1, train_f1 - test_f1

overfit_analysis = []
for name, model in models.items():
    train_f1, test_f1, gap = get_train_test_gap(model, X_train, y_train, X_test, y_test)
    overfit_analysis.append({
        'Model': name,
        'Train F1': train_f1,
        'Test F1': test_f1,
        'Gap (Overfitting)': gap
    })

# Add tuned model (convert labels back)
train_pred_tuned = best_model.predict(X_train)
train_pred_tuned_labels = np.where(train_pred_tuned == 1, 'N', 'E')
train_f1_tuned = f1_score(y_train, train_pred_tuned_labels, pos_label='N')
test_f1_tuned = f1_score(y_test, y_pred_tuned_labels, pos_label='N')

overfit_analysis.append({
    'Model': 'Tuned Decision Tree',
    'Train F1': train_f1_tuned,
    'Test F1': test_f1_tuned,
    'Gap (Overfitting)': train_f1_tuned - test_f1_tuned
})

overfit_df = pd.DataFrame(overfit_analysis)
print("\nOverfitting Analysis:")
print(overfit_df.to_string(index=False))

In [None]:
# Visualize overfitting
fig = go.Figure()

fig.add_trace(go.Bar(
    name='Train F1',
    x=overfit_df['Model'],
    y=overfit_df['Train F1'],
    marker_color='lightblue'
))

fig.add_trace(go.Bar(
    name='Test F1',
    x=overfit_df['Model'],
    y=overfit_df['Test F1'],
    marker_color='steelblue'
))

fig.update_layout(
    title='Training vs Test Performance (Overfitting Analysis)',
    yaxis_title='F1 Score',
    barmode='group',
    height=450
)

fig.show()

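**Interpretation**: A large gap between training and test performance indicates overfitting. Compare the `Gap (Overfitting)` column above: the basic (unconstrained) tree, which can keep splitting until every leaf is pure, should show the largest gap, while the tuned model should show a smaller one, indicating better generalization.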
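Why does the unconstrained tree show the largest gap? With continuous features it can isolate every training row, so its training F1 is typically a perfect 1.0 no matter how noisy the labels are. A sketch on synthetic data with 5% label noise (an assumption, not the student data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); 'N' is the positive class
X, y_int = make_classification(n_samples=800, n_features=7, n_informative=4,
                               weights=[0.75], flip_y=0.05, random_state=0)
y = np.where(y_int == 1, 'N', 'E')
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=42)

results = {}
for label, params in [('unconstrained', {}), ('max_depth=4', {'max_depth': 4})]:
    tree = DecisionTreeClassifier(random_state=42, **params).fit(X_tr, y_tr)
    train_f1 = f1_score(y_tr, tree.predict(X_tr), pos_label='N')
    test_f1 = f1_score(y_te, tree.predict(X_te), pos_label='N')
    results[label] = (train_f1, test_f1)
    print(f"{label:14s} train F1={train_f1:.3f} test F1={test_f1:.3f} "
          f"gap={train_f1 - test_f1:.3f}")
```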

## 7. Final Model Selection

In [None]:
# Create final recommendation table
final_comparison = metrics_df_updated.copy()
final_comparison['Interpretability'] = ['Medium', 'High', 'High', 'High']
final_comparison['Overfitting Risk'] = ['High', 'Low', 'Low', 'Low']
final_comparison['Recommended For'] = [
    'Not recommended',
    'General use',
    'Minority class focus',
    'Optimal performance'
]

print("\nFinal Model Comparison and Recommendations:")
print(final_comparison[['Model', 'F1-Score (N)', 'AUC-ROC', 'Interpretability', 'Overfitting Risk', 'Recommended For']].to_string(index=False))

In [None]:
# Display best model configuration
print("\nBest Model Configuration:")
print("="*50)
print("Model: Tuned Decision Tree")
print(f"\nHyperparameters:")
for param, value in grid_search.best_params_.items():
    print(f"  {param.replace('classifier__', '')}: {value}")

print(f"\nPerformance on Test Set:")
print(f"  - F1 Score (Class N): {test_f1_tuned:.3f}")
print(f"  - AUC-ROC: {tuned_metrics['AUC-ROC']:.3f}")
print(f"  - Recall (Class N): {tuned_metrics['Recall (N)']:.3f}")

## 8. Save Best Model

In [None]:
# Save the tuned model.
# grid_search.best_estimator_ was fit on 0/1 labels, so refit the best
# configuration on the original 'E'/'N' labels for production use.
best_params = grid_search.best_params_

final_model = Pipeline([
    ('preprocessing', preprocessor),
    ('classifier', DecisionTreeClassifier(
        max_depth=best_params['classifier__max_depth'],
        min_samples_split=best_params['classifier__min_samples_split'],
        min_samples_leaf=best_params['classifier__min_samples_leaf'],
        class_weight=best_params['classifier__class_weight'],
        random_state=42
    ))
])

final_model.fit(X_train, y_train)

# Save model (the with-block closes the file, so the pickle is fully written)
filepath = f'{models_path}tuned_decision_tree_final.pkl'
with open(filepath, 'wb') as f:
    pickle.dump(final_model, f)
print(f"Saved tuned model to: {filepath}")

In [None]:
# Verify the saved model by reloading it and re-scoring the test set
with open(filepath, 'rb') as f:
    loaded_model = pickle.load(f)
verify_pred = loaded_model.predict(X_test)
verify_f1 = f1_score(y_test, verify_pred, pos_label='N')
print(f"Verification - Test F1 Score: {verify_f1:.3f}")
print(f"Matches tuned model test F1 ({test_f1_tuned:.3f}): {np.isclose(verify_f1, test_f1_tuned)}")
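A stricter round-trip check compares predictions row by row rather than a single summary score. This sketch uses a small synthetic pipeline and a temporary directory (both assumptions for illustration). Keep in mind that pickles should only be loaded from trusted sources and generally need a matching scikit-learn version.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Small synthetic pipeline (assumption) standing in for the tuned model
X, y_int = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y_int == 1, 'N', 'E')
model = Pipeline([('classifier', DecisionTreeClassifier(max_depth=3, random_state=42))])
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.pkl')
    with open(path, 'wb') as f:   # closing the file guarantees the bytes are written
        pickle.dump(model, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

same = np.array_equal(model.predict(X), restored.predict(X))
print(f"Predictions identical after reload: {same}")
```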

## 9. Summary

In this notebook, we completed the ML cycle for decision trees by evaluating and tuning our models.

In [None]:
# Final summary table
summary_data = {
    'Metric': ['Tuned DT Test F1 Score (N)', 'Tuned DT Test AUC-ROC', 'Tuned DT Test Recall (N)',
               'Optimal max_depth', 'Optimal min_samples_split',
               'Optimal min_samples_leaf', 'Optimal class_weight'],
    'Value': [
        f"{test_f1_tuned:.3f}",
        f"{tuned_metrics['AUC-ROC']:.3f}",
        f"{tuned_metrics['Recall (N)']:.3f}",
        str(best_params['classifier__max_depth']),
        str(best_params['classifier__min_samples_split']),
        str(best_params['classifier__min_samples_leaf']),
        str(best_params['classifier__class_weight'])
    ]
}

summary_df = pd.DataFrame(summary_data)
print("\nFinal Summary:")
print(summary_df.to_string(index=False))

### Key Takeaways

| Topic | Key Learning |
|:------|:-------------|
| **Hyperparameter Tuning** | max_depth, min_samples_split, min_samples_leaf control complexity |
| **Grid Search** | Systematic search finds optimal hyperparameter combinations |
| **Cross-Validation** | Essential for reliable hyperparameter selection |
| **Overfitting** | Unconstrained trees overfit; constraints improve generalization |
| **Class Imbalance** | class_weight='balanced' helps identify minority class |

### ML Cycle Summary for Decision Trees

| Step | What We Did |
|:-----|:------------|
| **1. Build** | Created pipelines with DecisionTreeClassifier |
| **2. Train** | Fit trees using recursive partitioning |
| **3. Predict** | Generated predictions using tree rules |
| **4. Evaluate** | Assessed using accuracy, F1, AUC, confusion matrices |
| **5. Improve** | Tuned hyperparameters with GridSearchCV |

### When to Use Decision Trees

| Use Decision Trees When... | Consider Alternatives When... |
|:---------------------------|:------------------------------|
| Interpretability is critical | Maximum predictive accuracy needed |
| Stakeholders need explanations | Smooth probability estimates needed |
| Feature interactions matter | Linear relationships dominate |
| Building ensemble methods | Few features available |

### Next Steps

In the next module, we will explore **ensemble methods** that combine multiple decision trees:
- **Random Forests**: Reduce variance through bagging
- **Gradient Boosting**: Reduce bias through boosting

These methods often achieve better predictive performance while maintaining some interpretability.

**Proceed to:** `Module 3: Ensemble Methods`