# 6.1 Systematic Model Comparison Framework

## Course 3: Advanced Classification Models for Student Success

## Introduction

Throughout Course 3, we have explored five distinct families of machine learning models for predicting student departure:

1. **Module 1**: Regularized Logistic Regression (L1, L2, ElasticNet)
2. **Module 2**: Decision Trees
3. **Module 3**: Random Forests
4. **Module 4**: Gradient Boosting (XGBoost, LightGBM, CatBoost)
5. **Module 5**: Neural Networks (MLPClassifier)

Each model family has unique strengths and weaknesses. In this module, we bring all models together for a systematic, head-to-head comparison. This framework will help you make informed decisions about which model to deploy in your institution's student success initiatives.

### Learning Objectives

By the end of this notebook, you will be able to:

1. Train and evaluate all five model families on the same dataset
2. Create comprehensive comparison tables across multiple metrics
3. Visualize model performance using radar charts and comparison plots
4. Understand the trade-offs between accuracy, interpretability, and training time
5. Apply model selection criteria appropriate for higher education contexts

## 1. Setup and Data Preparation

### 1.1 Import Libraries

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Preprocessing
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Metrics
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve, precision_recall_curve, average_precision_score,
    confusion_matrix, classification_report, brier_score_loss, log_loss
)

# Timing
import time

# Set random seed for reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

print("All libraries imported successfully!")

### 1.2 Load Training and Testing Data

In [None]:
# Load the training and testing datasets
train_df = pd.read_csv('../../data/training.csv')
test_df = pd.read_csv('../../data/testing.csv')

print(f"Training set: {train_df.shape[0]:,} students, {train_df.shape[1]} columns")
print(f"Testing set: {test_df.shape[0]:,} students, {test_df.shape[1]} columns")
print(f"\nTarget variable distribution (Training):")
print(train_df['SEM_3_STATUS'].value_counts())

In [None]:
# Create binary target variable: 1 = Departed, 0 = Enrolled
train_df['DEPARTED'] = (train_df['SEM_3_STATUS'] != 'E').astype(int)
test_df['DEPARTED'] = (test_df['SEM_3_STATUS'] != 'E').astype(int)

print(f"Departure rate (Training): {train_df['DEPARTED'].mean():.2%}")
print(f"Departure rate (Testing): {test_df['DEPARTED'].mean():.2%}")

### 1.3 Define Feature Sets

In [None]:
# Define feature categories
numeric_features = [
    'HS_GPA', 'HS_MATH_GPA', 'HS_ENGL_GPA',
    'UNITS_ATTEMPTED_1', 'UNITS_ATTEMPTED_2',
    'UNITS_COMPLETED_1', 'UNITS_COMPLETED_2',
    'DFW_UNITS_1', 'DFW_UNITS_2',
    'GPA_1', 'GPA_2',
    'DFW_RATE_1', 'DFW_RATE_2',
    'GRADE_POINTS_1', 'GRADE_POINTS_2'
]

categorical_features = [
    'RACE_ETHNICITY', 'GENDER', 'FIRST_GEN_STATUS', 'COLLEGE'
]

target = 'DEPARTED'

print(f"Number of numeric features: {len(numeric_features)}")
print(f"Number of categorical features: {len(categorical_features)}")
print(f"Total features: {len(numeric_features) + len(categorical_features)}")

In [None]:
# Prepare data for modeling
# One-hot encode categorical variables
train_encoded = pd.get_dummies(train_df[numeric_features + categorical_features], 
                               columns=categorical_features, drop_first=True)
test_encoded = pd.get_dummies(test_df[numeric_features + categorical_features], 
                              columns=categorical_features, drop_first=True)

# Align columns between train and test
train_encoded, test_encoded = train_encoded.align(test_encoded, join='left', axis=1, fill_value=0)

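# Handle any missing values using medians computed on the training set only,
# so that no information from the test set leaks into preprocessing
train_medians = train_encoded.median()
train_encoded = train_encoded.fillna(train_medians)
test_encoded = test_encoded.fillna(train_medians)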

# Prepare X and y
X_train = train_encoded
y_train = train_df[target]
X_test = test_encoded
y_test = test_df[target]

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"\nFeature columns: {X_train.columns.tolist()[:10]}... (showing first 10)")
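
An alternative to the manual `get_dummies` / `align` / `fillna` steps above is to wrap preprocessing in a scikit-learn `Pipeline`, so that medians, scaling statistics, and category levels are all learned from the training data only. The following is a minimal sketch on a toy DataFrame. The rows and the unseen category `'D'` are made up for illustration, and this is not the preprocessing the notebook actually uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (assumption): one numeric and one categorical column
toy_train = pd.DataFrame({
    'GPA_1': [3.2, 2.1, np.nan, 3.8, 1.9, 2.7],
    'COLLEGE': ['A', 'B', 'A', 'C', 'B', 'A'],
    'DEPARTED': [0, 1, 0, 0, 1, 1],
})
toy_test = pd.DataFrame({'GPA_1': [np.nan, 3.0], 'COLLEGE': ['D', 'A']})  # 'D' is unseen

# Numeric columns: impute with the training median, then standardize
numeric_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])

# Categorical columns: one-hot encode; unseen levels become all-zero rows
preprocess = ColumnTransformer([
    ('num', numeric_pipe, ['GPA_1']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['COLLEGE']),
])

clf = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])
clf.fit(toy_train[['GPA_1', 'COLLEGE']], toy_train['DEPARTED'])

toy_probs = clf.predict_proba(toy_test)[:, 1]
print(toy_probs)
```

Because the imputer and scaler live inside the pipeline, the same object can later be passed to cross-validation utilities without refitting preprocessing by hand.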

In [None]:
# Scale features for models that require it
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Features scaled successfully!")
print(f"Scaled X_train mean: {X_train_scaled.mean():.6f}")
print(f"Scaled X_train std: {X_train_scaled.std():.6f}")

## 2. Building All Models

We now train each model family with fixed hyperparameters that performed well in our exploration in previous modules, so the comparison reflects each family's typical tuned performance rather than its defaults.
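
The training cells in this section all repeat the same pattern: start a timer, fit, record the elapsed time. A minimal sketch of a helper that could factor this out is shown here. `fit_and_time` is a hypothetical name, not part of the original notebook, and the sketch uses `time.perf_counter`, which is generally better suited than `time.time` to measuring short intervals. The data is synthetic.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def fit_and_time(model, X, y):
    """Fit `model` on (X, y) and return (fitted_model, elapsed_seconds)."""
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start

# Synthetic data standing in for the course dataset (assumption for illustration)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

demo_model, demo_seconds = fit_and_time(LogisticRegression(max_iter=1000), X_demo, y_demo)
print(f"Training completed in {demo_seconds:.3f} seconds")
```

Note that a single wall-clock measurement is noisy; repeating the fit a few times and reporting the median would give a steadier comparison.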

In [None]:
# Dictionary to store all models and their results
models = {}
results = {}
training_times = {}

### 2.1 Regularized Logistic Regression

In [None]:
# L2 Regularized Logistic Regression (Ridge)
print("Training L2 Regularized Logistic Regression...")
start_time = time.time()

lr_l2 = LogisticRegression(
    penalty='l2',
    C=0.1,  # Inverse of regularization strength
    solver='lbfgs',
    max_iter=1000,
    random_state=RANDOM_STATE
)
lr_l2.fit(X_train_scaled, y_train)

training_times['Logistic Regression (L2)'] = time.time() - start_time
models['Logistic Regression (L2)'] = lr_l2

print(f"Training completed in {training_times['Logistic Regression (L2)']:.2f} seconds")

In [None]:
# L1 Regularized Logistic Regression (Lasso)
print("Training L1 Regularized Logistic Regression...")
start_time = time.time()

lr_l1 = LogisticRegression(
    penalty='l1',
    C=0.1,
    solver='saga',
    max_iter=1000,
    random_state=RANDOM_STATE
)
lr_l1.fit(X_train_scaled, y_train)

training_times['Logistic Regression (L1)'] = time.time() - start_time
models['Logistic Regression (L1)'] = lr_l1

print(f"Training completed in {training_times['Logistic Regression (L1)']:.2f} seconds")

In [None]:
# ElasticNet Logistic Regression
print("Training ElasticNet Logistic Regression...")
start_time = time.time()

lr_elastic = LogisticRegression(
    penalty='elasticnet',
    C=0.1,
    solver='saga',
    l1_ratio=0.5,  # Balance between L1 and L2
    max_iter=1000,
    random_state=RANDOM_STATE
)
lr_elastic.fit(X_train_scaled, y_train)

training_times['Logistic Regression (ElasticNet)'] = time.time() - start_time
models['Logistic Regression (ElasticNet)'] = lr_elastic

print(f"Training completed in {training_times['Logistic Regression (ElasticNet)']:.2f} seconds")

### 2.2 Decision Tree Classifier

In [None]:
# Decision Tree with optimized hyperparameters
print("Training Decision Tree Classifier...")
start_time = time.time()

dt = DecisionTreeClassifier(
    max_depth=8,
    min_samples_split=20,
    min_samples_leaf=10,
    max_features='sqrt',
    class_weight='balanced',
    random_state=RANDOM_STATE
)
dt.fit(X_train, y_train)  # No scaling needed for tree-based models

training_times['Decision Tree'] = time.time() - start_time
models['Decision Tree'] = dt

print(f"Training completed in {training_times['Decision Tree']:.2f} seconds")

### 2.3 Random Forest Classifier

In [None]:
# Random Forest with optimized hyperparameters
print("Training Random Forest Classifier...")
start_time = time.time()

rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=12,
    min_samples_split=10,
    min_samples_leaf=5,
    max_features='sqrt',
    class_weight='balanced',
    n_jobs=-1,
    random_state=RANDOM_STATE
)
rf.fit(X_train, y_train)

training_times['Random Forest'] = time.time() - start_time
models['Random Forest'] = rf

print(f"Training completed in {training_times['Random Forest']:.2f} seconds")

### 2.4 Gradient Boosting Models

In [None]:
# Scikit-learn Gradient Boosting
print("Training Gradient Boosting Classifier...")
start_time = time.time()

gb = GradientBoostingClassifier(
    n_estimators=150,
    learning_rate=0.1,
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    subsample=0.8,
    random_state=RANDOM_STATE
)
gb.fit(X_train, y_train)

training_times['Gradient Boosting'] = time.time() - start_time
models['Gradient Boosting'] = gb

print(f"Training completed in {training_times['Gradient Boosting']:.2f} seconds")

In [None]:
# Try to import XGBoost (optional)
try:
    from xgboost import XGBClassifier
    
    print("Training XGBoost Classifier...")
    start_time = time.time()
    
    xgb = XGBClassifier(
        n_estimators=150,
        learning_rate=0.1,
        max_depth=5,
        min_child_weight=3,
        subsample=0.8,
        colsample_bytree=0.8,
        scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),  # Negatives / positives
        eval_metric='logloss',
        random_state=RANDOM_STATE
    )
    xgb.fit(X_train, y_train)
    
    training_times['XGBoost'] = time.time() - start_time
    models['XGBoost'] = xgb
    
    print(f"Training completed in {training_times['XGBoost']:.2f} seconds")
    
except ImportError:
    print("XGBoost not installed. Skipping...")

In [None]:
# Try to import LightGBM (optional)
try:
    from lightgbm import LGBMClassifier
    
    print("Training LightGBM Classifier...")
    start_time = time.time()
    
    lgbm = LGBMClassifier(
        n_estimators=150,
        learning_rate=0.1,
        max_depth=5,
        num_leaves=31,
        min_child_samples=20,
        subsample=0.8,
        colsample_bytree=0.8,
        class_weight='balanced',
        random_state=RANDOM_STATE,
        verbose=-1
    )
    lgbm.fit(X_train, y_train)
    
    training_times['LightGBM'] = time.time() - start_time
    models['LightGBM'] = lgbm
    
    print(f"Training completed in {training_times['LightGBM']:.2f} seconds")
    
except ImportError:
    print("LightGBM not installed. Skipping...")

### 2.5 Neural Network Classifier

In [None]:
# Neural Network (MLP)
print("Training Neural Network (MLP) Classifier...")
start_time = time.time()

nn = MLPClassifier(
    hidden_layer_sizes=(64, 32, 16),
    activation='relu',
    solver='adam',
    alpha=0.001,  # L2 regularization
    batch_size=32,
    learning_rate='adaptive',
    learning_rate_init=0.001,
    max_iter=500,
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=20,
    random_state=RANDOM_STATE
)
nn.fit(X_train_scaled, y_train)

training_times['Neural Network'] = time.time() - start_time
models['Neural Network'] = nn

print(f"Training completed in {training_times['Neural Network']:.2f} seconds")

In [None]:
# Summary of all trained models
print("="*60)
print("MODEL TRAINING SUMMARY")
print("="*60)
print(f"{'Model':<35} {'Training Time (s)':<20}")
print("-"*60)
for model_name, train_time in sorted(training_times.items(), key=lambda x: x[1]):
    print(f"{model_name:<35} {train_time:>15.3f}")
print("="*60)
print(f"Total models trained: {len(models)}")

## 3. Comprehensive Model Evaluation

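Now we evaluate all models on the test set using multiple metrics. Threshold-based metrics (accuracy, precision, recall, F1) use each model's default 0.5 probability cutoff, and only some of the models above were trained with class weighting, so compare those columns with care. ROC-AUC and average precision do not depend on the cutoff.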

### 3.1 Performance Metrics Comparison

In [None]:
def evaluate_model(model, X_test, y_test, model_name, requires_scaling=False, X_test_scaled=None):
    """
    Comprehensive model evaluation returning multiple metrics.
    """
    # Select appropriate test set
    if requires_scaling and X_test_scaled is not None:
        X_eval = X_test_scaled
    else:
        X_eval = X_test
    
    # Get predictions
    y_pred = model.predict(X_eval)
    y_prob = model.predict_proba(X_eval)[:, 1]
    
    # Calculate metrics
    metrics = {
        'Model': model_name,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, zero_division=0),
        'Recall': recall_score(y_test, y_pred, zero_division=0),
        'F1 Score': f1_score(y_test, y_pred, zero_division=0),
        'ROC-AUC': roc_auc_score(y_test, y_prob),
        'Avg Precision': average_precision_score(y_test, y_prob),
        'Brier Score': brier_score_loss(y_test, y_prob),
        'Log Loss': log_loss(y_test, y_prob)
    }
    
    return metrics, y_pred, y_prob

In [None]:
# Models that require scaling
scaling_required = {
    'Logistic Regression (L2)': True,
    'Logistic Regression (L1)': True,
    'Logistic Regression (ElasticNet)': True,
    'Decision Tree': False,
    'Random Forest': False,
    'Gradient Boosting': False,
    'XGBoost': False,
    'LightGBM': False,
    'Neural Network': True
}

# Evaluate all models
all_results = []
predictions = {}
probabilities = {}

for model_name, model in models.items():
    requires_scaling = scaling_required.get(model_name, False)
    metrics, y_pred, y_prob = evaluate_model(
        model, X_test, y_test, model_name, 
        requires_scaling=requires_scaling, 
        X_test_scaled=X_test_scaled
    )
    all_results.append(metrics)
    predictions[model_name] = y_pred
    probabilities[model_name] = y_prob

# Create results DataFrame
results_df = pd.DataFrame(all_results)
results_df = results_df.set_index('Model')

# Add training time
results_df['Training Time (s)'] = results_df.index.map(training_times)

print("Model evaluation complete!")

In [None]:
# Display comprehensive results table
print("="*100)
print("COMPREHENSIVE MODEL COMPARISON - PERFORMANCE METRICS")
print("="*100)
display_cols = ['Accuracy', 'Precision', 'Recall', 'F1 Score', 'ROC-AUC', 'Avg Precision']
print(results_df[display_cols].round(4).to_string())
print("="*100)
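
Small differences between models in a table like this may be within sampling noise. One common way to check is a paired bootstrap over the test set. The sketch below estimates a 95% percentile interval for the difference in ROC-AUC between two sets of predicted probabilities. `bootstrap_auc_diff` is a hypothetical helper, and the scores are synthetic stand-ins rather than outputs of the models trained above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_true, prob_a, prob_b, n_boot=1000, seed=42):
    """Percentile 95% CI for AUC(a) - AUC(b) using paired bootstrap resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    prob_a = np.asarray(prob_a)
    prob_b = np.asarray(prob_b)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # Resample rows with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample contains a single class
        diffs.append(
            roc_auc_score(y_true[idx], prob_a[idx])
            - roc_auc_score(y_true[idx], prob_b[idx])
        )
    return np.percentile(diffs, [2.5, 97.5])

# Synthetic scores: one informative model, one uninformative model
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=400)
strong = np.clip(0.5 * y_demo + rng.normal(0.25, 0.2, size=400), 0, 1)
weak = rng.uniform(0, 1, size=400)

ci_low, ci_high = bootstrap_auc_diff(y_demo, strong, weak)
print(f"95% CI for AUC difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

In this notebook the call would presumably take `y_test` and two entries of the `probabilities` dictionary. An interval that includes zero suggests the ranking between those two models is not well established.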

In [None]:
# Create performance comparison bar chart
metrics_to_plot = ['Accuracy', 'Precision', 'Recall', 'F1 Score', 'ROC-AUC']

fig = go.Figure()

colors = px.colors.qualitative.Set2

for i, metric in enumerate(metrics_to_plot):
    fig.add_trace(go.Bar(
        name=metric,
        x=results_df.index,
        y=results_df[metric],
        marker_color=colors[i % len(colors)]
    ))

fig.update_layout(
    title='Model Performance Comparison Across Multiple Metrics',
    xaxis_title='Model',
    yaxis_title='Score',
    barmode='group',
    height=500,
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1),
    xaxis_tickangle=-45
)

fig.show()

### 3.2 Training Time Analysis

In [None]:
# Create training time comparison chart
time_df = pd.DataFrame({
    'Model': list(training_times.keys()),
    'Training Time (s)': list(training_times.values())
}).sort_values('Training Time (s)', ascending=True)

fig = go.Figure()

fig.add_trace(go.Bar(
    x=time_df['Training Time (s)'],
    y=time_df['Model'],
    orientation='h',
    marker_color='steelblue',
    text=time_df['Training Time (s)'].round(2),
    textposition='outside'
))

fig.update_layout(
    title='Model Training Time Comparison',
    xaxis_title='Training Time (seconds)',
    yaxis_title='Model',
    height=450,
    margin=dict(l=200)
)

fig.show()

### 3.3 ROC Curve Comparison

In [None]:
# Plot ROC curves for all models
fig = go.Figure()

colors = px.colors.qualitative.Plotly

for i, (model_name, y_prob) in enumerate(probabilities.items()):
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc = roc_auc_score(y_test, y_prob)
    
    fig.add_trace(go.Scatter(
        x=fpr, y=tpr,
        mode='lines',
        name=f'{model_name} (AUC={auc:.3f})',
        line=dict(color=colors[i % len(colors)], width=2)
    ))

# Add diagonal reference line
fig.add_trace(go.Scatter(
    x=[0, 1], y=[0, 1],
    mode='lines',
    name='Random Classifier',
    line=dict(color='gray', dash='dash', width=1)
))

fig.update_layout(
    title='ROC Curve Comparison - All Models',
    xaxis_title='False Positive Rate',
    yaxis_title='True Positive Rate',
    height=600,
    legend=dict(x=0.6, y=0.1),
    xaxis=dict(constrain='domain'),
    yaxis=dict(scaleanchor='x', scaleratio=1)
)

fig.show()

### 3.4 Precision-Recall Comparison

In [None]:
# Plot Precision-Recall curves for all models
fig = go.Figure()

colors = px.colors.qualitative.Plotly

for i, (model_name, y_prob) in enumerate(probabilities.items()):
    precision, recall, _ = precision_recall_curve(y_test, y_prob)
    ap = average_precision_score(y_test, y_prob)
    
    fig.add_trace(go.Scatter(
        x=recall, y=precision,
        mode='lines',
        name=f'{model_name} (AP={ap:.3f})',
        line=dict(color=colors[i % len(colors)], width=2)
    ))

# Add baseline (prevalence)
prevalence = y_test.mean()
fig.add_hline(y=prevalence, line_dash='dash', line_color='gray',
              annotation_text=f'Baseline (prevalence={prevalence:.2%})')

fig.update_layout(
    title='Precision-Recall Curve Comparison - All Models',
    xaxis_title='Recall (True Positive Rate)',
    yaxis_title='Precision (Positive Predictive Value)',
    height=600,
    legend=dict(x=0.02, y=0.02),
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 1])
)

fig.show()

## 4. Multi-Dimensional Model Comparison

Beyond raw performance metrics, models differ in interpretability, computational requirements, and suitability for different use cases.

### 4.1 Radar Chart: Model Capabilities

In [None]:
# Define interpretability and other qualitative scores (1-10 scale)
# These are subjective, heuristic ratings for discussion, not measured quantities
qualitative_scores = {
    'Logistic Regression (L2)': {
        'Interpretability': 9,
        'Training Speed': 10,
        'Prediction Speed': 10,
        'Handles Non-linearity': 3,
        'Feature Interactions': 2,
        'Robustness to Outliers': 5
    },
    'Logistic Regression (L1)': {
        'Interpretability': 9,
        'Training Speed': 9,
        'Prediction Speed': 10,
        'Handles Non-linearity': 3,
        'Feature Interactions': 2,
        'Robustness to Outliers': 5
    },
    'Decision Tree': {
        'Interpretability': 10,
        'Training Speed': 9,
        'Prediction Speed': 9,
        'Handles Non-linearity': 8,
        'Feature Interactions': 8,
        'Robustness to Outliers': 7
    },
    'Random Forest': {
        'Interpretability': 5,
        'Training Speed': 6,
        'Prediction Speed': 7,
        'Handles Non-linearity': 9,
        'Feature Interactions': 9,
        'Robustness to Outliers': 8
    },
    'Gradient Boosting': {
        'Interpretability': 4,
        'Training Speed': 5,
        'Prediction Speed': 7,
        'Handles Non-linearity': 9,
        'Feature Interactions': 9,
        'Robustness to Outliers': 7
    },
    'XGBoost': {
        'Interpretability': 4,
        'Training Speed': 7,
        'Prediction Speed': 8,
        'Handles Non-linearity': 10,
        'Feature Interactions': 10,
        'Robustness to Outliers': 7
    },
    'LightGBM': {
        'Interpretability': 4,
        'Training Speed': 8,
        'Prediction Speed': 9,
        'Handles Non-linearity': 10,
        'Feature Interactions': 10,
        'Robustness to Outliers': 7
    },
    'Neural Network': {
        'Interpretability': 2,
        'Training Speed': 4,
        'Prediction Speed': 8,
        'Handles Non-linearity': 10,
        'Feature Interactions': 10,
        'Robustness to Outliers': 5
    }
}

# Only include models that we actually trained
qualitative_scores = {k: v for k, v in qualitative_scores.items() if k in models}

In [None]:
# Create radar chart for model comparison
categories = ['Interpretability', 'Training Speed', 'Prediction Speed', 
              'Handles Non-linearity', 'Feature Interactions', 'Robustness to Outliers']

fig = go.Figure()

colors = px.colors.qualitative.Plotly

# Select key models for radar chart (to avoid clutter)
key_models = ['Logistic Regression (L2)', 'Decision Tree', 'Random Forest', 
              'Gradient Boosting', 'Neural Network']

for i, model_name in enumerate(key_models):
    if model_name in qualitative_scores:
        values = [qualitative_scores[model_name][cat] for cat in categories]
        values.append(values[0])  # Close the polygon
        
        fig.add_trace(go.Scatterpolar(
            r=values,
            theta=categories + [categories[0]],
            name=model_name,
            line=dict(color=colors[i % len(colors)], width=2),
            fill='toself',
            fillcolor=colors[i % len(colors)],
            opacity=0.3
        ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 10]
        )
    ),
    title='Radar Chart: Model Capabilities Comparison',
    height=600,
    showlegend=True
)

fig.show()

**Interpretation:**

- **Logistic Regression**: Excels in interpretability and speed, but limited in capturing complex patterns
- **Decision Tree**: Highly interpretable and handles non-linearity well
- **Random Forest**: Strong overall performance but reduced interpretability
- **Gradient Boosting**: Top predictive power, but slower and less interpretable
- **Neural Network**: Flexible but acts as a "black box"
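
To make the interpretability point for logistic regression concrete: exponentiating a coefficient gives an odds ratio, the multiplicative change in the odds of departure for a one-unit increase in that feature (one standard deviation, when features are standardized). The sketch below uses synthetic data with placeholder feature names (`gpa_like`, `dfw_like`); it is not fitted on the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 2000

# Placeholder features: the first lowers departure odds, the second raises them
X_demo = pd.DataFrame({
    'gpa_like': rng.normal(3.0, 0.5, n),
    'dfw_like': rng.normal(0.2, 0.1, n),
})
logit = (-1.5 * (X_demo['gpa_like'] - 3.0) / 0.5
         + 1.0 * (X_demo['dfw_like'] - 0.2) / 0.1)
y_demo = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so each odds ratio refers to a one-standard-deviation increase
X_scaled = StandardScaler().fit_transform(X_demo)
lr = LogisticRegression(C=0.1, max_iter=1000).fit(X_scaled, y_demo)

odds_ratios = pd.Series(np.exp(lr.coef_[0]), index=X_demo.columns).sort_values()
print(odds_ratios.round(2))
```

An odds ratio below 1 indicates a feature associated with lower departure odds, and a value above 1 indicates higher odds. Regularization shrinks coefficients toward zero, so these ratios are pulled toward 1 compared with an unpenalized fit.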

### 4.2 Interpretability vs Performance Trade-off

In [None]:
# Create interpretability vs AUC scatter plot
plot_data = []
for model_name in models.keys():
    if model_name in qualitative_scores:
        plot_data.append({
            'Model': model_name,
            'Interpretability': qualitative_scores[model_name]['Interpretability'],
            'ROC-AUC': results_df.loc[model_name, 'ROC-AUC'],
            'Training Time': training_times[model_name]
        })

plot_df = pd.DataFrame(plot_data)

fig = px.scatter(
    plot_df,
    x='Interpretability',
    y='ROC-AUC',
    size='Training Time',
    color='Model',
    text='Model',
    title='Interpretability vs. Predictive Performance Trade-off',
    labels={
        'Interpretability': 'Interpretability Score (1-10)',
        'ROC-AUC': 'ROC-AUC Score',
        'Training Time': 'Training Time (s)'
    },
    height=600
)

fig.update_traces(textposition='top center')
fig.update_layout(
    showlegend=False,
    xaxis=dict(range=[0, 11]),
    yaxis=dict(range=[0.5, 1.0])
)

# Add quadrant labels
fig.add_annotation(x=2, y=0.95, text="High Performance,<br>Low Interpretability",
                   showarrow=False, font=dict(size=10, color='gray'))
fig.add_annotation(x=8, y=0.95, text="High Performance,<br>High Interpretability",
                   showarrow=False, font=dict(size=10, color='gray'))
fig.add_annotation(x=2, y=0.55, text="Low Performance,<br>Low Interpretability",
                   showarrow=False, font=dict(size=10, color='gray'))
fig.add_annotation(x=8, y=0.55, text="Low Performance,<br>High Interpretability",
                   showarrow=False, font=dict(size=10, color='gray'))

fig.show()

**Key Insight**: The ideal model sits in the upper-right quadrant (high performance and high interpretability). In practice, the two often trade off against each other. Bubble size represents training time: larger bubbles indicate longer training.
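
One way to make this trade-off operational is to keep only the models that are not dominated, that is, models for which no other model is at least as good on both axes and strictly better on one. The sketch below computes such a Pareto front. `pareto_front` is a hypothetical helper, and the numbers are invented for illustration; they are not results from this notebook.

```python
import pandas as pd

def pareto_front(df, cols):
    """Return rows not dominated by any other row (higher is better in every column)."""
    keep = []
    for i, row in df.iterrows():
        others = df.drop(index=i)
        # Another row dominates if it is >= everywhere and > somewhere
        dominated = (
            (others[cols] >= row[cols]).all(axis=1)
            & (others[cols] > row[cols]).any(axis=1)
        ).any()
        keep.append(not dominated)
    return df[keep]

# Illustrative numbers only (assumption)
demo = pd.DataFrame(
    {
        'Interpretability': [9, 10, 5, 4, 2],
        'ROC-AUC': [0.80, 0.75, 0.84, 0.86, 0.83],
    },
    index=['Logistic Regression', 'Decision Tree', 'Random Forest',
           'Gradient Boosting', 'Neural Network'],
)

front = pareto_front(demo, ['Interpretability', 'ROC-AUC'])
print(front)
```

With these made-up numbers, the neural network drops out because the random forest scores higher on both axes; the remaining four models each represent a defensible trade-off.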

### 4.3 Comprehensive Comparison Table

In [None]:
# Create comprehensive comparison table
comparison_data = []

for model_name in models.keys():
    row = {
        'Model': model_name,
        'ROC-AUC': results_df.loc[model_name, 'ROC-AUC'],
        'F1 Score': results_df.loc[model_name, 'F1 Score'],
        'Recall': results_df.loc[model_name, 'Recall'],
        'Training Time (s)': training_times[model_name],
        'Interpretability': qualitative_scores.get(model_name, {}).get('Interpretability', 'N/A'),
        'Scaling Required': 'Yes' if scaling_required.get(model_name, False) else 'No'
    }
    comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
comparison_df = comparison_df.sort_values('ROC-AUC', ascending=False)

print("="*100)
print("COMPREHENSIVE MODEL COMPARISON TABLE")
print("="*100)
print(comparison_df.to_string(index=False))
print("="*100)

In [None]:
# Create heatmap of performance metrics
heatmap_metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score', 'ROC-AUC', 'Avg Precision']
heatmap_data = results_df[heatmap_metrics]

# Normalize each column to 0-1 so colors are comparable across metrics;
# the cell text still shows the raw metric values
heatmap_normalized = (heatmap_data - heatmap_data.min()) / (heatmap_data.max() - heatmap_data.min())

fig = go.Figure(data=go.Heatmap(
    z=heatmap_normalized.values,
    x=heatmap_metrics,
    y=heatmap_data.index,
    colorscale='RdYlGn',
    text=np.round(heatmap_data.values, 3),
    texttemplate='%{text}',
    textfont={'size': 10},
    colorbar=dict(title='Normalized score'),
    hovertemplate='Model: %{y}<br>Metric: %{x}<br>Value: %{text}<extra></extra>'
))

fig.update_layout(
    title='Model Performance Heatmap',
    xaxis_title='Metric',
    yaxis_title='Model',
    height=500
)

fig.show()

## 5. Higher Education Context Analysis

Model selection in higher education involves more than just performance metrics. We must consider institutional constraints, stakeholder needs, and ethical implications.

### 5.1 Model Selection Criteria for Universities

In [None]:
# Define higher education specific criteria
he_criteria = {
    'Explainability to Advisors': {
        'Description': 'Can academic advisors understand and explain predictions to students?',
        'Logistic Regression (L2)': 9,
        'Decision Tree': 10,
        'Random Forest': 5,
        'Gradient Boosting': 3,
        'Neural Network': 1
    },
    'Actionable Insights': {
        'Description': 'Does the model reveal what factors advisors can influence?',
        'Logistic Regression (L2)': 8,
        'Decision Tree': 9,
        'Random Forest': 6,
        'Gradient Boosting': 5,
        'Neural Network': 2
    },
    'Regulatory Compliance': {
        'Description': 'Can decisions be audited and explained for compliance?',
        'Logistic Regression (L2)': 10,
        'Decision Tree': 9,
        'Random Forest': 5,
        'Gradient Boosting': 4,
        'Neural Network': 2
    },
    'Integration Ease': {
        'Description': 'How easily can the model be integrated into existing systems?',
        'Logistic Regression (L2)': 10,
        'Decision Tree': 9,
        'Random Forest': 7,
        'Gradient Boosting': 6,
        'Neural Network': 5
    },
    'Maintenance Burden': {
        'Description': 'How much effort is needed to maintain and update the model?',
        'Logistic Regression (L2)': 9,
        'Decision Tree': 8,
        'Random Forest': 6,
        'Gradient Boosting': 5,
        'Neural Network': 4
    }
}

# Display criteria
print("Higher Education Model Selection Criteria:")
print("="*80)
for criterion, data in he_criteria.items():
    print(f"\n{criterion}:")
    print(f"  {data['Description']}")

In [None]:
# Create Higher Education suitability radar chart
he_models = ['Logistic Regression (L2)', 'Decision Tree', 'Random Forest', 
             'Gradient Boosting', 'Neural Network']
he_categories = list(he_criteria.keys())

fig = go.Figure()

colors = px.colors.qualitative.Set1

for i, model_name in enumerate(he_models):
    if model_name in models:
        values = [he_criteria[cat].get(model_name, 5) for cat in he_categories]
        values.append(values[0])  # Close polygon
        
        fig.add_trace(go.Scatterpolar(
            r=values,
            theta=he_categories + [he_categories[0]],
            name=model_name,
            line=dict(color=colors[i % len(colors)], width=2)
        ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 10]
        )
    ),
    title='Higher Education Suitability: Model Comparison',
    height=600,
    showlegend=True
)

fig.show()
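
The radar chart shows each criterion separately. If an institution wants a single ranking, one simple option is a weighted sum of the ratings. The sketch below copies two of the `he_criteria` ratings defined above and applies hypothetical weights; the weights are an assumption for illustration and would need to be set by each institution.

```python
import pandas as pd

# Ratings copied from two of the he_criteria entries above
ratings = pd.DataFrame({
    'Explainability to Advisors': {
        'Logistic Regression (L2)': 9, 'Decision Tree': 10,
        'Random Forest': 5, 'Gradient Boosting': 3, 'Neural Network': 1,
    },
    'Maintenance Burden': {
        'Logistic Regression (L2)': 9, 'Decision Tree': 8,
        'Random Forest': 6, 'Gradient Boosting': 5, 'Neural Network': 4,
    },
})

# Hypothetical institutional weights (assumption); they should sum to 1
weights = pd.Series({'Explainability to Advisors': 0.7, 'Maintenance Burden': 0.3})

# Multiply each column by its weight, then sum across criteria per model
weighted_score = (ratings * weights).sum(axis=1).sort_values(ascending=False)
print(weighted_score.round(2))
```

Because the underlying ratings are subjective, the resulting ranking is best treated as a conversation aid rather than a decision rule.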

### 5.2 When to Use Which Model

In [None]:
# Model recommendation framework
recommendations = {
    'Scenario': [
        'Early Warning System for Advisors',
        'Institutional Research Reports',
        'Grant-funded Research Project',
        'Real-time Dashboard Integration',
        'Small Institution (<5,000 students)',
        'Large Institution (>30,000 students)',
        'Maximum Recall for At-Risk Students',
        'Balanced Precision and Recall',
        'Limited IT Resources',
        'Advanced Data Science Team'
    ],
    'Recommended Model': [
        'Decision Tree or Logistic Regression',
        'Logistic Regression (high interpretability)',
        'Gradient Boosting / XGBoost (max performance)',
        'Logistic Regression (fast predictions)',
        'Logistic Regression or Decision Tree',
        'Random Forest or Gradient Boosting',
        'Tune threshold on any high-recall model',
        'Random Forest or Gradient Boosting',
        'Logistic Regression (minimal maintenance)',
        'XGBoost / LightGBM / Neural Network'
    ],
    'Rationale': [
        'Advisors need to understand and explain predictions to students',
        'Coefficients directly show factor importance for reports',
        'Research publications value performance over interpretability',
        'Logistic regression has fastest inference time',
        'Simpler models less prone to overfitting on smaller datasets',
        'Ensemble methods scale well and capture complex patterns',
        'Adjust classification threshold to prioritize recall',
        'Ensemble methods typically achieve best F1 scores',
        'Simpler models require less tuning and monitoring',
        'Complex models require expertise to tune and maintain'
    ]
}

rec_df = pd.DataFrame(recommendations)

print("MODEL RECOMMENDATION GUIDE FOR HIGHER EDUCATION")
print("="*120)
for i, row in rec_df.iterrows():
    print(f"\nScenario: {row['Scenario']}")
    print(f"  Recommended: {row['Recommended Model']}")
    print(f"  Rationale: {row['Rationale']}")
print("\n" + "="*120)
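
The "tune threshold" recommendation can be sketched as follows: compute the precision-recall curve, then pick the highest probability cutoff that still reaches a target recall. `threshold_for_recall` is a hypothetical helper, the 0.80 target is an arbitrary example, and the scores are synthetic. In practice the threshold would be chosen on a validation split, not on the test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, recall_score

def threshold_for_recall(y_true, y_prob, target_recall=0.80):
    """Return the highest threshold whose recall is at least `target_recall`."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # recall has one more entry than thresholds; drop the final (recall = 0) point
    ok = recall[:-1] >= target_recall
    return thresholds[ok].max()

# Synthetic labels and predicted probabilities (assumption for illustration)
rng = np.random.default_rng(42)
y_demo = rng.integers(0, 2, size=1000)
prob_demo = np.clip(0.3 * y_demo + rng.normal(0.35, 0.15, size=1000), 0, 1)

threshold = threshold_for_recall(y_demo, prob_demo, target_recall=0.80)
y_pred = (prob_demo >= threshold).astype(int)
print(f"Threshold: {threshold:.3f}, recall: {recall_score(y_demo, y_pred):.3f}")
```

Lowering the threshold raises recall at the cost of precision, so the target should reflect how many students advisors can realistically contact.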

In [None]:
# Print a text-based decision flowchart
print("\n" + "="*80)
print("MODEL SELECTION DECISION FLOWCHART")
print("="*80)
print("""
START: What is your primary constraint?
|
+-- INTERPRETABILITY is critical?
|   |
|   +-- Yes --> Need feature selection?
|   |           |
|   |           +-- Yes --> Logistic Regression (L1)
|   |           +-- No  --> Decision Tree or Logistic Regression (L2)
|   |
|   +-- No --> Continue below
|
+-- MAXIMUM PERFORMANCE is the goal?
|   |
|   +-- Yes --> Have large dataset (>10K samples)?
|   |           |
|   |           +-- Yes --> XGBoost / LightGBM
|   |           +-- No  --> Random Forest
|   |
|   +-- No --> Continue below
|
+-- TRAINING TIME is a constraint?
|   |
|   +-- Yes --> Logistic Regression or Decision Tree
|   +-- No  --> Random Forest or Gradient Boosting
|
+-- Default Recommendation:
    --> Random Forest (good balance of all factors)

""")
print("="*80)

## 6. Summary

In [None]:
# Final summary table
print("="*100)
print("FINAL MODEL COMPARISON SUMMARY")
print("="*100)

# Rank models by AUC
ranked_df = results_df.sort_values('ROC-AUC', ascending=False)

print("\nModels Ranked by ROC-AUC:")
print("-"*60)
for i, (model_name, row) in enumerate(ranked_df.iterrows(), 1):
    print(f"{i}. {model_name}: AUC = {row['ROC-AUC']:.4f}, F1 = {row['F1 Score']:.4f}")

# Identify best model by different criteria
print("\n" + "-"*60)
print("BEST MODEL BY CRITERION:")
print("-"*60)
print(f"Best ROC-AUC: {results_df['ROC-AUC'].idxmax()} ({results_df['ROC-AUC'].max():.4f})")
print(f"Best F1 Score: {results_df['F1 Score'].idxmax()} ({results_df['F1 Score'].max():.4f})")
print(f"Best Recall: {results_df['Recall'].idxmax()} ({results_df['Recall'].max():.4f})")
print(f"Best Precision: {results_df['Precision'].idxmax()} ({results_df['Precision'].max():.4f})")
print(f"Fastest Training: {min(training_times, key=training_times.get)} ({min(training_times.values()):.3f}s)")
print("="*100)

### Key Takeaways

| Model Family | Strengths | Weaknesses | Best For |
|:-------------|:----------|:-----------|:---------|
| **Logistic Regression** | Highly interpretable, fast, coefficients show importance | Limited non-linearity | Reports, compliance, small data |
| **Decision Tree** | Very interpretable, visual, handles non-linearity | Prone to overfitting | Advisor tools, simple rules |
| **Random Forest** | Good performance, robust, feature importance | Less interpretable | Balanced needs, medium-large data |
| **Gradient Boosting** | Top performance, handles complex patterns | Slow training, black-box | Research, maximum performance |
| **Neural Network** | Flexible, powerful with lots of data | Black-box, needs tuning | Large data, complex patterns |

### Recommendations for Higher Education

1. **For most institutions**: Start with **Logistic Regression** or **Random Forest**; they typically offer a good balance of performance and interpretability

2. **For advisor-facing tools**: Use **Decision Trees** or **Logistic Regression** where explanations are critical

3. **For institutional research**: **Logistic Regression** provides clear coefficient interpretations for reports

4. **For maximum performance**: **Gradient Boosting** (XGBoost/LightGBM) typically achieves the strongest predictive metrics on tabular data

5. **Consider ensemble approaches**: Combine predictions from multiple models for robust results
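
As an illustration of the ensemble suggestion, the sketch below averages predicted probabilities from a logistic regression and a random forest using scikit-learn's `VotingClassifier` with soft voting. The data is synthetic and the hyperparameters are placeholders; this combination was not evaluated in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the course dataset (assumption)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=12, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

# Soft voting averages each member's predicted class probabilities
ensemble = VotingClassifier(
    estimators=[
        ('lr', make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting='soft',
)
ensemble.fit(X_tr, y_tr)

ensemble_prob = ensemble.predict_proba(X_te)[:, 1]
ensemble_auc = roc_auc_score(y_te, ensemble_prob)
print(f"Soft-voting ROC-AUC: {ensemble_auc:.3f}")
```

Wrapping the logistic regression in its own scaling pipeline lets both members receive the same unscaled input. An ensemble inherits the interpretability of its least interpretable member, so this option fits the maximum-performance scenarios better than advisor-facing tools.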

### Next Steps

In the next notebook, we will select the best model for deployment, prepare it for production use, and discuss deployment considerations for higher education contexts.

**Proceed to:** `6.2 Final Model Selection and Deployment`