# 5.3 **Train** Neural Networks - Predict Student Departure

## Model Cycle: The 5 Key Steps

### 1. Build the Model: Create the neural network architecture with Keras.
### **2. Train the Model: Fit the model on the training data.**
### **3. Generate Predictions: Use the trained model to make predictions.**
### 4. Evaluate the Model: Assess performance using evaluation metrics.
### 5. Improve the Model: Tune hyperparameters for optimal performance.

## Introduction

In the previous notebook, we built neural network architectures using Keras. Now we train these models on our student departure data.

Training neural networks involves several concepts that differ from traditional machine learning:
- **Epochs**: Multiple passes through the entire dataset
- **Batch size**: Processing data in chunks
- **Callbacks**: Hooks to monitor and control training
- **Early stopping**: Preventing overfitting automatically

### Learning Objectives

By the end of this notebook, you will be able to:

1. Understand epochs, batch sizes, and their effects on training
2. Use callbacks to monitor and control the training process
3. Implement early stopping to prevent overfitting
4. Visualize training history with loss curves
5. Generate predictions from trained neural networks

## 1. Load Dependencies and Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import pickle
import time

# Visualization
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, History

# Display settings
pd.options.display.max_columns = None

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")

In [None]:
# Set up file paths
root_filepath = '/content/drive/MyDrive/projects/Applied-Data-Analytics-For-Higher-Education-Course-2/'
data_filepath = f'{root_filepath}data/'
course3_filepath = f'{root_filepath}course_3/'
module5_filepath = f'{course3_filepath}module_5/'
models_path = f'{module5_filepath}models/'

In [None]:
# Load the saved data splits from previous notebook
with open(f'{models_path}data_splits.pkl', 'rb') as f:
    data_splits = pickle.load(f)

X_train = data_splits['X_train']
X_val = data_splits['X_val']
y_train = data_splits['y_train']
y_val = data_splits['y_val']

print("Data loaded successfully!")
print(f"\nTraining set: {X_train.shape[0]} samples, {X_train.shape[1]} features")
print(f"Validation set: {X_val.shape[0]} samples")
print(f"\nTarget distribution (training):")
print(f"  Class 0 (Retained): {(y_train == 0).sum()} ({(y_train == 0).mean()*100:.1f}%)")
print(f"  Class 1 (Departed): {(y_train == 1).sum()} ({(y_train == 1).mean()*100:.1f}%)")

In [None]:
# Load feature information
with open(f'{models_path}feature_info.pkl', 'rb') as f:
    feature_info = pickle.load(f)
input_dim = feature_info['input_dim']
print(f"Input dimension: {input_dim}")

## 2. Understanding Training Concepts

### 2.1 Epochs

An **epoch** is one complete pass through the entire training dataset.

- **1 epoch** = model has seen every training sample once
- **10 epochs** = model has seen every training sample 10 times

**How many epochs?**
- Too few: Model hasn't learned enough (underfitting)
- Too many: Model memorizes training data (overfitting)
- Solution: Use **early stopping** to find the sweet spot

In [None]:
# Visualize the concept of epochs
np.random.seed(42)

# Simulate training and validation loss over epochs
epochs = np.arange(1, 101)
train_loss = 0.7 * np.exp(-epochs/20) + 0.1 + np.random.normal(0, 0.02, len(epochs))
val_loss = 0.7 * np.exp(-epochs/25) + 0.15 + 0.003 * epochs + np.random.normal(0, 0.02, len(epochs))

# Find optimal stopping point
optimal_epoch = np.argmin(val_loss) + 1

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=epochs, y=train_loss,
    mode='lines',
    name='Training Loss',
    line=dict(color='blue', width=2)
))

fig.add_trace(go.Scatter(
    x=epochs, y=val_loss,
    mode='lines',
    name='Validation Loss',
    line=dict(color='red', width=2)
))

# Add regions
fig.add_vrect(x0=1, x1=20, fillcolor='yellow', opacity=0.2, layer='below', line_width=0)
fig.add_vrect(x0=optimal_epoch, x1=100, fillcolor='red', opacity=0.1, layer='below', line_width=0)

fig.add_vline(x=optimal_epoch, line_dash='dash', line_color='green', 
              annotation_text=f'Optimal: Epoch {optimal_epoch}')

fig.add_annotation(x=10, y=0.55, text='Underfitting', showarrow=False, font=dict(size=12))
fig.add_annotation(x=70, y=0.55, text='Overfitting', showarrow=False, font=dict(size=12))

fig.update_layout(
    title='Training vs. Validation Loss Over Epochs',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    height=450
)

fig.show()

### 2.2 Batch Size

**Batch size** is the number of samples processed before updating weights.

| Batch Size | Description | Trade-offs |
|:-----------|:------------|:-----------|
| **1** (Stochastic) | Update after each sample | Very noisy gradients; many updates per epoch |
| **32-256** (Mini-batch) | Update after small batches | Balance of speed and stability |
| **All data** (Batch) | Update after entire dataset | Stable gradients; one update per epoch, high memory use |

**Common choices**: 32, 64, 128, or 256. We'll use **32** as our default.

In [None]:
# Visualize batch size effects
n_samples = X_train.shape[0]
batch_sizes = [1, 16, 32, 64, 128, 256, n_samples]

data = []
for bs in batch_sizes:
    iterations_per_epoch = np.ceil(n_samples / bs)
    data.append({
        'Batch Size': bs if bs != n_samples else 'Full',
        'Iterations per Epoch': int(iterations_per_epoch),
        'Memory Usage': 'Low' if bs <= 32 else ('Medium' if bs <= 128 else 'High'),
        'Update Stability': 'Noisy' if bs <= 16 else ('Balanced' if bs <= 128 else 'Stable')
    })

batch_df = pd.DataFrame(data)
print(f"Batch Size Analysis (Training samples: {n_samples})")
batch_df

In [None]:
# Visualize gradient updates with different batch sizes
np.random.seed(42)

# Simulate optimization path with different batch sizes
def simulate_optimization(batch_size, n_steps=50):
    path = [(5, 5)]  # Starting point
    noise_scale = 1.0 / np.sqrt(batch_size)  # Smaller batch = more noise
    
    for _ in range(n_steps):
        x, y = path[-1]
        # Move toward minimum (0, 0) with noise
        dx = -0.1 * x + np.random.normal(0, noise_scale)
        dy = -0.1 * y + np.random.normal(0, noise_scale)
        path.append((x + dx, y + dy))
    
    return path

fig = make_subplots(rows=1, cols=3, subplot_titles=(
    'Small Batch (8)', 'Medium Batch (64)', 'Large Batch (256)'
))

batch_configs = [(8, 'red'), (64, 'green'), (256, 'blue')]

for col, (bs, color) in enumerate(batch_configs, 1):
    path = simulate_optimization(bs)
    x_path = [p[0] for p in path]
    y_path = [p[1] for p in path]
    
    fig.add_trace(go.Scatter(
        x=x_path, y=y_path,
        mode='lines+markers',
        line=dict(color=color, width=1),
        marker=dict(size=4),
        showlegend=False
    ), row=1, col=col)
    
    # Add target (minimum)
    fig.add_trace(go.Scatter(
        x=[0], y=[0],
        mode='markers',
        marker=dict(size=15, color='gold', symbol='star'),
        showlegend=False
    ), row=1, col=col)

fig.update_xaxes(range=[-3, 6])
fig.update_yaxes(range=[-3, 6])
fig.update_layout(
    title='Optimization Path with Different Batch Sizes (Star = Minimum)',
    height=350
)

fig.show()

**Interpretation**:
- **Small batch**: Noisy path but explores more (can escape local minima)
- **Medium batch**: Good balance of exploration and stability
- **Large batch**: Stable path but may converge to sharp minima
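
The `1.0 / np.sqrt(batch_size)` noise scale used in the simulation above reflects a general statistical fact: the standard deviation of the mean of B independent samples shrinks like 1/√B. The sketch below checks this on synthetic per-sample "gradients". These values, and the helper `minibatch_std`, are assumptions made for illustration and are not taken from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample "gradients": true mean 1.0, standard deviation 2.0
per_sample_grads = rng.normal(loc=1.0, scale=2.0, size=200_000)

def minibatch_std(grads, batch_size):
    """Standard deviation of the mini-batch mean, measured across many batches."""
    n_batches = len(grads) // batch_size
    batches = grads[:n_batches * batch_size].reshape(n_batches, batch_size)
    return batches.mean(axis=1).std()

for bs in (8, 64, 256):
    observed = minibatch_std(per_sample_grads, bs)
    predicted = 2.0 / np.sqrt(bs)  # sigma / sqrt(B)
    print(f"batch_size={bs:>3}: observed std={observed:.3f}, predicted={predicted:.3f}")
```

Larger batches give a less noisy estimate of the average gradient, which is why the large-batch path in the figure looks smoother.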

### 2.3 Iterations vs. Epochs

Important distinction:
- **Iteration** (or step): One batch processed, one weight update
- **Epoch**: All batches processed, entire dataset seen once

$$\text{Iterations per epoch} = \left\lceil \frac{\text{Number of samples}}{\text{Batch size}} \right\rceil$$
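
The ceiling in the formula accounts for a final batch that may be smaller than the rest. As a small worked example, assuming a hypothetical dataset of 1,000 samples (the helper `batch_sizes_for_epoch` is illustrative only):

```python
import math

def batch_sizes_for_epoch(n_samples, batch_size):
    """Return the size of every batch in one epoch (the last may be smaller)."""
    n_batches = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]

sizes = batch_sizes_for_epoch(n_samples=1000, batch_size=32)  # hypothetical dataset size
print(len(sizes))   # 32 iterations per epoch
print(sizes[-1])    # 8 samples in the final, partial batch
```

Here 31 full batches cover 992 samples, and the remaining 8 samples form the 32nd iteration.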

In [None]:
# Calculate iterations for our data
batch_size = 32
n_samples = X_train.shape[0]
iterations_per_epoch = int(np.ceil(n_samples / batch_size))

print("Training Configuration:")
print("="*50)
print(f"Training samples: {n_samples}")
print(f"Batch size: {batch_size}")
print(f"Iterations per epoch: {iterations_per_epoch}")
print(f"\nFor 100 epochs:")
print(f"  Total iterations: {iterations_per_epoch * 100:,}")
print(f"  Total weight updates: {iterations_per_epoch * 100:,}")

## 3. Callbacks: Monitoring and Controlling Training

### 3.1 What are Callbacks?

**Callbacks** are functions called at specific points during training:
- At the start/end of training
- At the start/end of each epoch
- At the start/end of each batch

**Common uses:**
- Stop training early if no improvement
- Save model weights at checkpoints
- Adjust learning rate during training
- Log metrics to visualization tools

### 3.2 Early Stopping

**Early stopping** prevents overfitting by stopping training when validation performance stops improving.

```python
EarlyStopping(
    monitor='val_loss',      # What metric to watch
    patience=10,             # Epochs to wait before stopping
    restore_best_weights=True,  # Restore best model weights
    min_delta=0.001          # Minimum change to qualify as improvement
)
```
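
To make the roles of `patience` and `min_delta` concrete, here is a simplified sketch of the bookkeeping behind early stopping on a loss that should decrease. It is a plain-Python illustration, not Keras's actual implementation, and the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.001):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is None if training would run through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:   # improvement large enough to count
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1                      # another epoch without improvement
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

losses = [0.70, 0.60, 0.55, 0.54, 0.56, 0.57, 0.58]  # made-up validation losses
print(early_stopping_epoch(losses, patience=3, min_delta=0.001))  # (4, 7)
```

The best loss occurs at epoch 4, and training stops at epoch 7 after three epochs without improvement. With `restore_best_weights=True`, the weights from epoch 4 would be the ones kept.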

In [None]:
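# Create an early stopping callback to inspect its settings
# (the training cells in Section 4 build their own instances, without min_delta)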
early_stopping = EarlyStopping(
    monitor='val_loss',          # Monitor validation loss
    patience=15,                  # Wait 15 epochs for improvement
    restore_best_weights=True,    # Restore best weights when stopped
    min_delta=0.001,              # Minimum change to count as improvement
    verbose=1                     # Print when early stopping triggers
)

print("Early Stopping Configuration:")
print(f"  Monitor: {early_stopping.monitor}")
print(f"  Patience: {early_stopping.patience} epochs")
print(f"  Restore best weights: {early_stopping.restore_best_weights}")

In [None]:
# Visualize how early stopping works
np.random.seed(42)

epochs = np.arange(1, 81)
val_loss = 0.6 * np.exp(-epochs/20) + 0.15 + 0.002 * epochs + np.random.normal(0, 0.015, len(epochs))

# Approximate where early stopping would trigger (patience=15):
# take the best epoch among the first 50, then add the patience window
best_epoch = np.argmin(val_loss[:50]) + 1  # Minimum in first 50 epochs
stop_epoch = best_epoch + 15  # Patience of 15

fig = go.Figure()

# Plot full loss curve (faded for epochs after stopping)
fig.add_trace(go.Scatter(
    x=epochs, y=val_loss,
    mode='lines',
    name='Validation Loss (if no early stopping)',
    line=dict(color='lightcoral', width=2, dash='dot')
))

# Plot loss up to early stopping
fig.add_trace(go.Scatter(
    x=epochs[:stop_epoch], y=val_loss[:stop_epoch],
    mode='lines',
    name='Validation Loss (with early stopping)',
    line=dict(color='red', width=2)
))

# Mark best epoch
fig.add_trace(go.Scatter(
    x=[best_epoch], y=[val_loss[best_epoch-1]],
    mode='markers',
    name=f'Best Epoch ({best_epoch})',
    marker=dict(size=15, color='green', symbol='star')
))

# Mark stopping point
fig.add_trace(go.Scatter(
    x=[stop_epoch], y=[val_loss[stop_epoch-1]],
    mode='markers',
    name=f'Early Stop ({stop_epoch})',
    marker=dict(size=12, color='orange', symbol='x')
))

# Add patience region
fig.add_vrect(x0=best_epoch, x1=stop_epoch, fillcolor='yellow', opacity=0.2, 
              layer='below', line_width=0,
              annotation_text='Patience Period (15 epochs)', annotation_position='top')

fig.update_layout(
    title='How Early Stopping Works',
    xaxis_title='Epoch',
    yaxis_title='Validation Loss',
    height=450
)

fig.show()

### 3.3 Model Checkpoint

**ModelCheckpoint** saves the model (or only its weights) during training. With `save_best_only=True`, it saves only when the monitored metric improves.

```python
ModelCheckpoint(
    filepath='best_model.keras',  # Where to save
    monitor='val_loss',           # What metric to watch
    save_best_only=True,          # Only save if improved
    save_weights_only=False       # False = save the entire model; True = weights only
)
```

In [None]:
# Create model checkpoint callback
import os
os.makedirs(models_path, exist_ok=True)

def create_checkpoint(model_name):
    return ModelCheckpoint(
        filepath=f'{models_path}{model_name}_best.keras',
        monitor='val_loss',
        save_best_only=True,
        save_weights_only=False,
        verbose=1
    )

print("Model checkpoint will save the best model to:") 
print(f"  {models_path}[model_name]_best.keras")

### 3.4 Learning Rate Scheduler

**ReduceLROnPlateau** multiplies the learning rate by `factor` when the monitored metric has not improved for `patience` epochs, never going below `min_lr`.
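
The effect on the learning rate can be sketched in plain Python. This is a simplified illustration that ignores the callback's `min_delta` and `cooldown` options; the function `simulate_reduce_lr` and the loss values are hypothetical.

```python
def simulate_reduce_lr(val_losses, initial_lr=0.001, factor=0.5, patience=5, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified sketch)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Hypothetical plateau: loss improves for 3 epochs, then stalls for 10
losses = [0.60, 0.50, 0.45] + [0.46] * 10
print(simulate_reduce_lr(losses))
```

In this example the rate stays at 0.001 through epoch 7, halves to 0.0005 at epoch 8 (the fifth stalled epoch), and halves again at epoch 13.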

In [None]:
# Create learning rate reducer callback
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,       # Reduce LR by half
    patience=5,       # Wait 5 epochs
    min_lr=0.00001,   # Minimum learning rate
    verbose=1
)

print("Learning Rate Reducer Configuration:")
print(f"  Monitor: {reduce_lr.monitor}")
print(f"  Factor: {reduce_lr.factor} (reduce by half)")
print(f"  Patience: {reduce_lr.patience} epochs")
print(f"  Minimum LR: {reduce_lr.min_lr}")

## 4. Train the Neural Networks

### 4.1 Recreate the Models

Let's recreate our three model architectures from the previous notebook.
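
Before training, it can help to know how large each architecture is. A fully connected layer has one weight per input-unit pair plus one bias per unit, so the totals can be computed by hand. The sketch below assumes a hypothetical `example_input_dim` of 10; the real value comes from `feature_info['input_dim']`, and `dense_param_count` is an illustrative helper, not part of Keras.

```python
def dense_param_count(input_dim, layer_units):
    """Count trainable parameters in a stack of fully connected layers.

    Each Dense layer has (inputs * units) weights plus one bias per unit.
    """
    total = 0
    n_inputs = input_dim
    for units in layer_units:
        total += n_inputs * units + units
        n_inputs = units  # this layer's outputs feed the next layer
    return total

example_input_dim = 10  # hypothetical; the real value comes from feature_info['input_dim']
architectures = {
    'simple_nn': [8, 1],
    'deep_nn': [16, 8, 4, 1],
    'wide_nn': [32, 16, 1],
}
for name, units in architectures.items():
    print(f"{name}: {dense_param_count(example_input_dim, units)} parameters")
```

For the actual models, `model.count_params()` or `model.summary()` reports the same totals.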

In [None]:
# Model creation functions
def create_simple_model(input_dim):
    """Simple neural network: 1 hidden layer."""
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ], name='simple_nn')
    return model

def create_deep_model(input_dim):
    """Deep neural network: 3 hidden layers."""
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(16, activation='relu'),
        Dense(8, activation='relu'),
        Dense(4, activation='relu'),
        Dense(1, activation='sigmoid')
    ], name='deep_nn')
    return model

def create_wide_model(input_dim):
    """Wide neural network: 2 wide hidden layers."""
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(32, activation='relu'),
        Dense(16, activation='relu'),
        Dense(1, activation='sigmoid')
    ], name='wide_nn')
    return model

def compile_model(model, learning_rate=0.001):
    """Compile model for binary classification."""
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            tf.keras.metrics.Precision(name='precision'),
            tf.keras.metrics.Recall(name='recall'),
            tf.keras.metrics.AUC(name='auc')
        ]
    )
    return model

print("Model creation functions defined.")

### 4.2 Training Configuration
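
The configuration in this section weights each class by `total / (2 * class_count)`, so the rarer class counts more in the loss. A small worked example with made-up class counts (the function name `balanced_class_weights` is illustrative):

```python
def balanced_class_weights(n_class_0, n_class_1):
    """Weight each class by total / (2 * class_count)."""
    total = n_class_0 + n_class_1
    return {0: total / (2 * n_class_0), 1: total / (2 * n_class_1)}

# Hypothetical counts: 800 retained, 200 departed
weights = balanced_class_weights(800, 200)
print(weights)  # {0: 0.625, 1: 2.5}

# Each class now contributes the same total weight to the loss
print(800 * weights[0], 200 * weights[1])  # 500.0 500.0
```

With these weights, a misclassified departed student costs four times as much as a misclassified retained student in this example.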

In [None]:
# Training configuration
EPOCHS = 100           # Maximum epochs (early stopping will likely stop before)
BATCH_SIZE = 32        # Samples per batch
LEARNING_RATE = 0.001  # Initial learning rate

# Class weights to handle imbalance: weight = total / (n_classes * class_count),
# so the rarer class counts more in the loss
n_class_0 = (y_train == 0).sum()
n_class_1 = (y_train == 1).sum()
total = len(y_train)

class_weights = {
    0: total / (2 * n_class_0),
    1: total / (2 * n_class_1)
}

print("Training Configuration:")
print("="*50)
print(f"Max epochs: {EPOCHS}")
print(f"Batch size: {BATCH_SIZE}")
print(f"Learning rate: {LEARNING_RATE}")
print(f"\nClass weights (to handle imbalance):")
print(f"  Class 0 (Retained): {class_weights[0]:.4f}")
print(f"  Class 1 (Departed): {class_weights[1]:.4f}")

### 4.3 Train Simple Neural Network

In [None]:
# Create and compile Simple NN
model_simple = create_simple_model(input_dim)
model_simple = compile_model(model_simple, LEARNING_RATE)

# Create callbacks for this model
callbacks_simple = [
    EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True, verbose=1),
    create_checkpoint('simple_nn'),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.00001, verbose=1)
]

print("Simple Neural Network:")
model_simple.summary()

In [None]:
# Train Simple NN
print("\nTraining Simple Neural Network...")
print("="*50)

start_time = time.time()

history_simple = model_simple.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(X_val, y_val),
    class_weight=class_weights,
    callbacks=callbacks_simple,
    verbose=1
)

training_time_simple = time.time() - start_time
print(f"\nTraining completed in {training_time_simple:.2f} seconds")
print(f"Final epoch: {len(history_simple.history['loss'])}")

### 4.4 Train Deep Neural Network

In [None]:
# Create and compile Deep NN
model_deep = create_deep_model(input_dim)
model_deep = compile_model(model_deep, LEARNING_RATE)

# Create callbacks for this model
callbacks_deep = [
    EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True, verbose=1),
    create_checkpoint('deep_nn'),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.00001, verbose=1)
]

print("Deep Neural Network:")
model_deep.summary()

In [None]:
# Train Deep NN
print("\nTraining Deep Neural Network...")
print("="*50)

start_time = time.time()

history_deep = model_deep.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(X_val, y_val),
    class_weight=class_weights,
    callbacks=callbacks_deep,
    verbose=1
)

training_time_deep = time.time() - start_time
print(f"\nTraining completed in {training_time_deep:.2f} seconds")
print(f"Final epoch: {len(history_deep.history['loss'])}")

### 4.5 Train Wide Neural Network

In [None]:
# Create and compile Wide NN
model_wide = create_wide_model(input_dim)
model_wide = compile_model(model_wide, LEARNING_RATE)

# Create callbacks for this model
callbacks_wide = [
    EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True, verbose=1),
    create_checkpoint('wide_nn'),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.00001, verbose=1)
]

print("Wide Neural Network:")
model_wide.summary()

In [None]:
# Train Wide NN
print("\nTraining Wide Neural Network...")
print("="*50)

start_time = time.time()

history_wide = model_wide.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(X_val, y_val),
    class_weight=class_weights,
    callbacks=callbacks_wide,
    verbose=1
)

training_time_wide = time.time() - start_time
print(f"\nTraining completed in {training_time_wide:.2f} seconds")
print(f"Final epoch: {len(history_wide.history['loss'])}")

## 5. Visualize Training History

### 5.1 Loss Curves

Loss curves show how the loss changes during training. They help diagnose:
- **Underfitting**: Both losses high and not decreasing
- **Overfitting**: Training loss low, validation loss increasing
- **Good fit**: Both losses decrease and stabilize

In [None]:
def plot_training_history(history, title='Training History'):
    """
    Plot training and validation loss/metrics.
    
    Parameters:
    -----------
    history : keras.callbacks.History
        Training history object
    title : str
        Plot title
    """
    hist = history.history
    epochs = range(1, len(hist['loss']) + 1)
    
    fig = make_subplots(rows=2, cols=2, subplot_titles=(
        'Loss', 'Accuracy', 'Precision', 'AUC'
    ))
    
    # Loss
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['loss'], name='Train Loss', 
                             line=dict(color='blue')), row=1, col=1)
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['val_loss'], name='Val Loss', 
                             line=dict(color='red')), row=1, col=1)
    
    # Accuracy
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['accuracy'], name='Train Acc', 
                             line=dict(color='blue'), showlegend=False), row=1, col=2)
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['val_accuracy'], name='Val Acc', 
                             line=dict(color='red'), showlegend=False), row=1, col=2)
    
    # Precision
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['precision'], name='Train Prec', 
                             line=dict(color='blue'), showlegend=False), row=2, col=1)
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['val_precision'], name='Val Prec', 
                             line=dict(color='red'), showlegend=False), row=2, col=1)
    
    # AUC
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['auc'], name='Train AUC', 
                             line=dict(color='blue'), showlegend=False), row=2, col=2)
    fig.add_trace(go.Scatter(x=list(epochs), y=hist['val_auc'], name='Val AUC', 
                             line=dict(color='red'), showlegend=False), row=2, col=2)
    
    fig.update_xaxes(title='Epoch')
    fig.update_layout(
        title=title,
        height=600,
        showlegend=True
    )
    
    return fig

# Plot Simple NN history
fig = plot_training_history(history_simple, 'Simple Neural Network Training History')
fig.show()

In [None]:
# Plot Deep NN history
fig = plot_training_history(history_deep, 'Deep Neural Network Training History')
fig.show()

In [None]:
# Plot Wide NN history
fig = plot_training_history(history_wide, 'Wide Neural Network Training History')
fig.show()

### 5.2 Metric Curves

Let's compare all three models' loss curves side by side, then their validation AUC over epochs.

In [None]:
# Compare all models' loss curves
histories = {
    'Simple NN': history_simple,
    'Deep NN': history_deep,
    'Wide NN': history_wide
}

colors = {'Simple NN': 'blue', 'Deep NN': 'green', 'Wide NN': 'orange'}

fig = make_subplots(rows=1, cols=2, subplot_titles=('Training Loss', 'Validation Loss'))

for name, history in histories.items():
    epochs = range(1, len(history.history['loss']) + 1)
    color = colors[name]
    
    # Training loss
    fig.add_trace(go.Scatter(
        x=list(epochs), y=history.history['loss'],
        name=f'{name}', line=dict(color=color, width=2)
    ), row=1, col=1)
    
    # Validation loss
    fig.add_trace(go.Scatter(
        x=list(epochs), y=history.history['val_loss'],
        name=f'{name}', line=dict(color=color, width=2),
        showlegend=False
    ), row=1, col=2)

fig.update_xaxes(title='Epoch')
fig.update_yaxes(title='Loss')
fig.update_layout(
    title='Model Comparison: Loss Curves',
    height=400
)

fig.show()

In [None]:
# Compare validation AUC
fig = go.Figure()

for name, history in histories.items():
    epochs = range(1, len(history.history['val_auc']) + 1)
    color = colors[name]
    
    fig.add_trace(go.Scatter(
        x=list(epochs), y=history.history['val_auc'],
        name=name, line=dict(color=color, width=2)
    ))

fig.update_layout(
    title='Model Comparison: Validation AUC Over Epochs',
    xaxis_title='Epoch',
    yaxis_title='Validation AUC',
    height=400
)

fig.show()

### 5.3 Detecting Overfitting

Signs of overfitting:
1. Training loss continues to decrease
2. Validation loss starts to increase
3. Large gap between training and validation metrics
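
When early stopping uses `restore_best_weights=True`, the weights that are kept may come from an earlier epoch than the last one run, so it is worth looking at the best validation epoch as well as the final one. A minimal sketch on a made-up history dictionary (the helper `summarize_history` and the values in `toy_hist` are hypothetical):

```python
import numpy as np

def summarize_history(hist):
    """Compare the best validation epoch with the last epoch run.

    `hist` is a dict of per-epoch lists, like `History.history` in Keras.
    """
    val_loss = np.asarray(hist['val_loss'])
    best_idx = int(np.argmin(val_loss))
    return {
        'best_epoch': best_idx + 1,
        'last_epoch': len(val_loss),
        'val_loss_at_best': float(val_loss[best_idx]),
        'val_loss_at_last': float(val_loss[-1]),
        'loss_gap_at_best': float(val_loss[best_idx] - hist['loss'][best_idx]),
    }

# Hypothetical history showing the overfitting pattern
toy_hist = {
    'loss':     [0.70, 0.55, 0.45, 0.38, 0.33, 0.29],
    'val_loss': [0.72, 0.58, 0.50, 0.49, 0.52, 0.56],
}
print(summarize_history(toy_hist))
```

In this toy history, training loss keeps falling while validation loss bottoms out at epoch 4 and then rises, which matches signs 1 and 2 above.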

In [None]:
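# Check for overfitting by comparing train vs. validation metrics at the last epoch run.
# Note: with restore_best_weights=True, the kept weights may come from an earlier (best) epoch,
# and class_weight affects the training loss only, so the loss gap is approximate.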
overfitting_analysis = []

for name, history in histories.items():
    hist = history.history
    final_epoch = len(hist['loss'])
    
    overfitting_analysis.append({
        'Model': name,
        'Final Epoch': final_epoch,
        'Train Loss': f"{hist['loss'][-1]:.4f}",
        'Val Loss': f"{hist['val_loss'][-1]:.4f}",
        'Loss Gap': f"{hist['val_loss'][-1] - hist['loss'][-1]:.4f}",
        'Train AUC': f"{hist['auc'][-1]:.4f}",
        'Val AUC': f"{hist['val_auc'][-1]:.4f}",
        'AUC Gap': f"{hist['auc'][-1] - hist['val_auc'][-1]:.4f}"
    })

overfitting_df = pd.DataFrame(overfitting_analysis)
print("Overfitting Analysis (Larger gaps may indicate overfitting):")
overfitting_df

In [None]:
# Visualize train vs validation gap
fig = go.Figure()

model_names = list(histories.keys())

for name, history in histories.items():
    train_auc = history.history['auc'][-1]
    val_auc = history.history['val_auc'][-1]
    
    fig.add_trace(go.Bar(
        name='Training AUC',
        x=[name],
        y=[train_auc],
        marker_color='lightblue',
        showlegend=(name == model_names[0])
    ))
    
    fig.add_trace(go.Bar(
        name='Validation AUC',
        x=[name],
        y=[val_auc],
        marker_color='darkblue',
        showlegend=(name == model_names[0])
    ))

fig.update_layout(
    title='Training vs. Validation AUC (Check for Overfitting)',
    xaxis_title='Model',
    yaxis_title='AUC',
    barmode='group',
    yaxis=dict(range=[0.5, 1.0]),
    height=400
)

fig.show()

## 6. Generate Predictions

### 6.1 Making Predictions

For a model with a sigmoid output, predictions come in two forms:
- **Probabilities**: `model.predict(X)` returns scores between 0 and 1
- **Class labels**: obtained by thresholding the probabilities (typically at 0.5)
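
The thresholding step can be sketched with NumPy alone. The probabilities below are made up, shaped `(n_samples, 1)` as a single-unit sigmoid output would be, and `to_classes` is an illustrative helper.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1)
probs = np.array([[0.10], [0.45], [0.50], [0.51], [0.90]])

def to_classes(probs, threshold=0.5):
    """Label a sample 1 (Departed) when its probability exceeds the threshold."""
    return (probs > threshold).astype(int).flatten()

print(to_classes(probs))                 # [0 0 0 1 1]
print(to_classes(probs, threshold=0.4))  # [0 1 1 1 1]
```

Lowering the threshold flags more students as likely to depart, which typically trades precision for recall.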

In [None]:
# Generate predictions on validation set
predictions = {}
probabilities = {}

models = {
    'Simple NN': model_simple,
    'Deep NN': model_deep,
    'Wide NN': model_wide
}

for name, model in models.items():
    # Get probability predictions
    probs = model.predict(X_val, verbose=0)
    probabilities[name] = probs.flatten()
    
    # Convert to class predictions (threshold = 0.5)
    predictions[name] = (probs > 0.5).astype(int).flatten()
    
    print(f"{name}:")
    print(f"  Predicted 0 (Retained): {(predictions[name] == 0).sum()}")
    print(f"  Predicted 1 (Departed): {(predictions[name] == 1).sum()}")
    print()

### 6.2 Prediction Probabilities

Let's examine the distribution of predicted probabilities.

In [None]:
# Plot probability distributions
fig = make_subplots(rows=1, cols=3, subplot_titles=list(models.keys()))

for col, (name, probs) in enumerate(probabilities.items(), 1):
    # Actual 0 (Retained)
    fig.add_trace(go.Histogram(
        x=probs[y_val == 0],
        name='Actual: Retained',
        opacity=0.7,
        marker_color='blue',
        showlegend=(col == 1)
    ), row=1, col=col)
    
    # Actual 1 (Departed)
    fig.add_trace(go.Histogram(
        x=probs[y_val == 1],
        name='Actual: Departed',
        opacity=0.7,
        marker_color='red',
        showlegend=(col == 1)
    ), row=1, col=col)
    
    # Add threshold line
    fig.add_vline(x=0.5, line_dash='dash', line_color='green', row=1, col=col)

fig.update_xaxes(title='Predicted Probability')
fig.update_yaxes(title='Count')
fig.update_layout(
    title='Predicted Probability Distribution by Actual Class',
    barmode='overlay',
    height=400
)

fig.show()

In [None]:
# Show some example predictions
print("Sample Predictions (Simple NN):")
print("="*60)

# Use a NumPy array so positional indexing works whether y_val is a Series or an array
y_val_arr = np.asarray(y_val).ravel()
sample_indices = np.random.choice(len(X_val), 10, replace=False)

for idx in sample_indices:
    actual = "Departed" if y_val_arr[idx] == 1 else "Retained"
    predicted = "Departed" if predictions['Simple NN'][idx] == 1 else "Retained"
    prob = probabilities['Simple NN'][idx]
    correct = "YES" if y_val_arr[idx] == predictions['Simple NN'][idx] else "NO"
    
    print(f"  Sample {idx}: Actual={actual}, Predicted={predicted} (prob={prob:.3f}) - Correct: {correct}")

## 7. Compare Training Results

In [None]:
# Comprehensive comparison
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

training_times = {
    'Simple NN': training_time_simple,
    'Deep NN': training_time_deep,
    'Wide NN': training_time_wide
}

comparison_results = []

for name in models.keys():
    y_pred = predictions[name]
    y_prob = probabilities[name]
    history = histories[name]
    
    comparison_results.append({
        'Model': name,
        'Parameters': models[name].count_params(),
        'Epochs': len(history.history['loss']),
        'Training Time (s)': f"{training_times[name]:.2f}",
        'Val Accuracy': f"{accuracy_score(y_val, y_pred):.4f}",
        'Val Precision': f"{precision_score(y_val, y_pred):.4f}",
        'Val Recall': f"{recall_score(y_val, y_pred):.4f}",
        'Val F1': f"{f1_score(y_val, y_pred):.4f}",
        'Val AUC': f"{roc_auc_score(y_val, y_prob):.4f}"
    })

comparison_df = pd.DataFrame(comparison_results)
print("Training Results Comparison:")
comparison_df

In [None]:
# Visualize comparison
metrics = ['Val Accuracy', 'Val Precision', 'Val Recall', 'Val F1', 'Val AUC']

fig = go.Figure()

colors_list = list(colors.values())
for i, metric in enumerate(metrics):
    values = [float(comparison_df[comparison_df['Model'] == name][metric].values[0]) 
              for name in models.keys()]
    
    fig.add_trace(go.Bar(
        name=metric.replace('Val ', ''),
        x=list(models.keys()),
        y=values
    ))

fig.update_layout(
    title='Neural Network Performance Comparison (Validation Set)',
    xaxis_title='Model',
    yaxis_title='Score',
    barmode='group',
    yaxis=dict(range=[0, 1]),
    height=450
)

fig.show()

In [None]:
# Training efficiency: Parameters vs. Performance
fig = go.Figure()

params = [models[name].count_params() for name in models.keys()]
aucs = [float(comparison_df[comparison_df['Model'] == name]['Val AUC'].values[0]) 
        for name in models.keys()]

fig.add_trace(go.Scatter(
    x=params,
    y=aucs,
    mode='markers+text',
    text=list(models.keys()),
    textposition='top center',
    marker=dict(size=20, color=colors_list)
))

fig.update_layout(
    title='Model Efficiency: Parameters vs. Performance',
    xaxis_title='Number of Parameters',
    yaxis_title='Validation AUC',
    height=400
)

fig.show()

## 8. Save Trained Models

In [None]:
# Save all trained models
for name, model in models.items():
    filename = name.lower().replace(' ', '_')
    model.save(f'{models_path}{filename}_trained.keras')
    print(f"Saved: {models_path}{filename}_trained.keras")

In [None]:
# Save training histories for later analysis
histories_data = {name: history.history for name, history in histories.items()}
with open(f'{models_path}training_histories.pkl', 'wb') as f:
    pickle.dump(histories_data, f)
print(f"Saved training histories to: {models_path}training_histories.pkl")

In [None]:
# Save comparison results
comparison_df.to_csv(f'{models_path}training_comparison.csv', index=False)
print(f"Saved comparison results to: {models_path}training_comparison.csv")

In [None]:
# Verify saved files
import os
print("Saved files:")
print("="*60)
for file in sorted(os.listdir(models_path)):
    filepath = f'{models_path}{file}'
    size = os.path.getsize(filepath)
    print(f"  {file}: {size/1024:.1f} KB")

## 9. Summary

In this notebook, we trained three neural network models for student departure prediction.

### Training Concepts

| Concept | Description | Our Settings |
|:--------|:------------|:-------------|
| **Epochs** | Complete passes through data | Max 100 (early stopped) |
| **Batch Size** | Samples per weight update | 32 |
| **Early Stopping** | Stops when no improvement | Patience: 15 epochs |
| **Learning Rate** | Step size for updates | 0.001 (with reduction) |

### Callbacks Used

| Callback | Purpose | Settings |
|:---------|:--------|:---------|
| **EarlyStopping** | Prevent overfitting | patience=15, restore_best_weights=True |
| **ModelCheckpoint** | Save best model | save_best_only=True |
| **ReduceLROnPlateau** | Adaptive learning rate | factor=0.5, patience=5 |

### Training Results

| Model | Parameters | Epochs | Val AUC |
|:------|:-----------|:-------|:--------|
| Simple NN | ~100 | Variable | Check results above |
| Deep NN | ~300 | Variable | Check results above |
| Wide NN | ~700 | Variable | Check results above |

### Key Takeaways

1. **Loss curves** help diagnose training issues (overfitting, underfitting)
2. **Early stopping** saves training time and limits overfitting
3. **Class weights** help with imbalanced data
4. **More parameters don't always mean better performance**

### Next Steps

In the next notebook, we will:
- Evaluate models on the test set
- Compare with tree-based models (Random Forest)
- Tune hyperparameters (architecture, regularization)
- Apply dropout regularization

**Proceed to:** `5.4 Evaluate and Tune Neural Networks`