# 5.2 **Build** a Neural Network with Keras - Predict Student Departure

## Model Cycle: The 5 Key Steps

### **1. Build the Model : Create the Neural Network architecture with Keras.**  
### 2. Train the Model : Fit the model on the training data.  
### 3. Generate Predictions : Use the trained model to make predictions.  
### 4. Evaluate the Model : Assess performance using evaluation metrics.  
### 5. Improve the Model : Tune hyperparameters for optimal performance.

## Introduction

In the previous notebook, we learned the theory behind neural networks. Now we put that knowledge into practice by building neural network models using **TensorFlow** and **Keras**.

Keras provides a simple, intuitive API for creating neural networks. We will use the **Sequential API** to stack layers and build models for predicting student departure.

### Learning Objectives

By the end of this notebook, you will be able to:

1. Understand the TensorFlow/Keras ecosystem
2. Create neural networks using the Sequential API
3. Configure Dense (fully connected) layers with appropriate activations
4. Compile models with loss functions, optimizers, and metrics
5. Visualize and summarize model architectures

## 1. Load Dependencies and Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import pickle

# Visualization
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Scikit-learn for preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Dropout

# Display settings
pd.options.display.max_columns = None

# Check TensorFlow version
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

In [None]:
# Set up file paths
root_filepath = '/content/drive/MyDrive/projects/Applied-Data-Analytics-For-Higher-Education-Course-2/'
data_filepath = f'{root_filepath}data/'
course3_filepath = f'{root_filepath}course_3/'
module5_filepath = f'{course3_filepath}module_5/'

In [None]:
# Load training data
df_training = pd.read_csv(f'{data_filepath}training.csv')

print(f"Training data shape: {df_training.shape}")
print(f"\nTarget distribution:")
print(df_training['SEM_3_STATUS'].value_counts(normalize=True))
class_counts = df_training['SEM_3_STATUS'].value_counts()  # sorted from most to least frequent
print(f"\nClass imbalance ratio (majority:minority): {class_counts.iloc[0] / class_counts.iloc[1]:.2f}:1")

In [None]:
# Preview the data
df_training.head()

## 2. Introduction to TensorFlow and Keras

### 2.1 What is TensorFlow?

**TensorFlow** is Google's open-source machine learning framework. It provides:

- Efficient numerical computation
- Automatic differentiation (computing gradients for backpropagation)
- GPU acceleration for faster training
- Tools for deploying models to production

**Tensors** are multi-dimensional arrays - the fundamental data structure:
- 0D tensor: scalar (single number)
- 1D tensor: vector (list of numbers)
- 2D tensor: matrix (table of numbers)
- 3D+ tensor: higher-dimensional arrays

In [None]:
# Demonstrate tensors
# 0D: scalar
scalar = tf.constant(5)
print(f"Scalar (0D): {scalar}, shape: {scalar.shape}")

# 1D: vector
vector = tf.constant([1, 2, 3, 4, 5])
print(f"Vector (1D): {vector}, shape: {vector.shape}")

# 2D: matrix (like a batch of samples with features)
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])
print(f"Matrix (2D): \n{matrix}, shape: {matrix.shape}")

# Example: A batch of 3 students with 4 features each
student_batch = tf.constant([
    [3.5, 0.1, 15, 1],  # Student 1: GPA, DFW_rate, units, first_gen
    [2.8, 0.3, 12, 0],  # Student 2
    [3.9, 0.0, 18, 1],  # Student 3
])
print(f"\nStudent batch shape: {student_batch.shape}")
print(f"(3 students, 4 features each)")

### 2.2 Keras: The High-Level API

**Keras** is TensorFlow's high-level API that makes building neural networks simple and intuitive.

**Why use Keras?**
- Simple, readable code
- Modular and composable
- Works for both beginners and experts
- Integrated directly into TensorFlow

**Two main ways to build models in Keras:**
1. **Sequential API**: For simple, linear stacks of layers (what we'll use)
2. **Functional API**: For complex architectures with multiple inputs/outputs

In [None]:
# Compare the two approaches conceptually
print("Sequential API (what we'll use):")
print("""  
model = Sequential([
    Input(shape=(10,)),
    Dense(16, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])
""")

print("\nFunctional API (for complex architectures):")
print("""
inputs = Input(shape=(10,))
x = Dense(16, activation='relu')(inputs)
x = Dense(8, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)
""")

## 3. The Sequential API

### 3.1 Creating a Sequential Model

The Sequential model is a linear stack of layers. You can create it in two ways:

**Method 1**: Pass a list of layers to the constructor
```python
model = Sequential([
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
```

**Method 2**: Add layers one by one
```python
model = Sequential()
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
```

Both methods produce the same result. We'll primarily use Method 1 for clarity.
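Conceptually, a linear stack just feeds each layer's output into the next layer. The NumPy sketch below mimics that behaviour for the two-layer model shown above. The `forward` helper, the random weights, and the 10-feature input are illustrative assumptions, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each "layer" is (weights, biases, activation); sizes mirror Dense(16) -> Dense(1)
layers = [
    (rng.normal(size=(10, 16)), np.zeros(16), relu),
    (rng.normal(size=(16, 1)), np.zeros(1), sigmoid),
]

def forward(x, layers):
    """Apply the layers in order: the output of one is the input of the next."""
    for weights, biases, activation in layers:
        x = activation(x @ weights + biases)
    return x

batch = rng.normal(size=(3, 10))  # 3 samples, 10 features (hypothetical)
probs = forward(batch, layers)
print(probs.shape)  # one probability per sample
```

Whichever construction method is used, the resulting model applies its layers in exactly this top-to-bottom order.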

In [None]:
# Create a simple Sequential model
simple_model = Sequential([
    Input(shape=(10,)),  # Input layer: 10 features
    Dense(8, activation='relu'),  # Hidden layer: 8 neurons, ReLU activation
    Dense(1, activation='sigmoid')  # Output layer: 1 neuron, Sigmoid for binary classification
])

# View model summary
print("Simple Neural Network Model:")
simple_model.summary()

**Understanding the Summary:**

- **Layer**: Name and type of each layer
- **Output Shape**: Shape of the output from each layer (None = batch size, determined at runtime)
- **Param #**: Number of parameters (weights + biases) in that layer

For a Dense layer: `params = (input_size * output_size) + output_size`
- Example: 10 inputs -> 8 neurons = (10 * 8) + 8 = 88 parameters
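The formula above can be checked by hand for the simple model. The helper names `dense_params` and `stack_params` below are hypothetical and only restate the arithmetic; they are not part of Keras.

```python
def dense_params(input_size, units, use_bias=True):
    """Weights (input_size * units) plus one bias per unit."""
    return input_size * units + (units if use_bias else 0)

def stack_params(input_size, layer_units):
    """Total parameters for consecutive Dense layers."""
    total = 0
    for units in layer_units:
        total += dense_params(input_size, units)
        input_size = units  # this layer's output feeds the next layer
    return total

hidden = dense_params(10, 8)      # hidden layer: 10 inputs -> 8 neurons
output = dense_params(8, 1)       # output layer: 8 inputs -> 1 neuron
total = stack_params(10, [8, 1])  # whole model
print(hidden, output, total)      # 88 9 97
```

The total should match the "Total params" line that `summary()` reports for the same architecture.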

### 3.2 Adding Layers

Keras provides many layer types. For tabular data, the most important is the **Dense** layer.

Common layer types in Keras:

| Layer | Use Case | Description |
|:------|:---------|:------------|
| `Dense` | All neural networks | Fully connected layer |
| `Dropout` | Regularization | Randomly drops neurons during training |
| `BatchNormalization` | Training stability | Normalizes layer inputs |
| `Conv2D` | Image data | Convolutional layer |
| `LSTM` | Sequential data | Long Short-Term Memory |

For our student departure prediction, we'll use `Dense` and `Dropout` layers.
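To make the `Dropout` row in the table concrete, here is a minimal NumPy sketch of "inverted" dropout: during training a random fraction `rate` of the activations is zeroed and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The `dropout` function and its arguments are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero a random fraction `rate` of x and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True where the activation survives
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(42)
activations = np.ones((4, 8))  # hypothetical hidden-layer outputs

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out))  # only zeros and rescaled values remain
print(np.array_equal(infer_out, activations))
```

Because a different random mask is drawn on every call, the network cannot rely on any single neuron, which is what makes dropout act as a regularizer.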

## 4. Understanding Dense Layers

### 4.1 Dense Layer Parameters

The `Dense` layer is the workhorse of neural networks for tabular data.

```python
Dense(
    units,              # Number of neurons (required)
    activation=None,    # Activation function
    use_bias=True,      # Include bias term?
    kernel_initializer='glorot_uniform',  # How to initialize weights
    bias_initializer='zeros',             # How to initialize biases
    kernel_regularizer=None,              # L1/L2 regularization on weights
    bias_regularizer=None,                # Regularization on biases
)
```

**Most important parameters:**
- `units`: Number of neurons in the layer
- `activation`: Which activation function to use
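As a rough sketch of what a `Dense` layer computes, the NumPy code below evaluates `activation(x @ kernel + bias)` for one sample. The weights are drawn from the Glorot uniform range, `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`, and the biases start at zero, matching the defaults listed above. The feature values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, units = 4, 3

# kernel_initializer='glorot_uniform': uniform samples in [-limit, limit]
limit = np.sqrt(6.0 / (n_inputs + units))
kernel = rng.uniform(-limit, limit, size=(n_inputs, units))
bias = np.zeros(units)  # bias_initializer='zeros'

# One hypothetical student: GPA, DFW rate, units attempted, first-gen flag
x = np.array([[3.5, 0.1, 15.0, 1.0]])

z = x @ kernel + bias        # weighted sum for each of the 3 neurons
output = np.maximum(z, 0.0)  # activation='relu'

print(kernel.shape, output.shape)  # (4, 3) (1, 3)
```

Here `units` sets the number of columns in the kernel, and therefore the width of the layer's output.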

In [None]:
# Demonstrate Dense layer configurations
print("Common Dense Layer Configurations:")
print("="*60)

# Hidden layer with ReLU
layer1 = Dense(16, activation='relu')
print(f"\n1. Hidden layer: Dense(16, activation='relu')")
print(f"   - 16 neurons with ReLU activation")
print(f"   - Good default for hidden layers")

# Output layer for binary classification
layer2 = Dense(1, activation='sigmoid')
print(f"\n2. Binary output: Dense(1, activation='sigmoid')")
print(f"   - 1 neuron with Sigmoid activation")
print(f"   - Outputs probability between 0 and 1")

# Output layer for multi-class classification
layer3 = Dense(5, activation='softmax')
print(f"\n3. Multi-class output: Dense(5, activation='softmax')")
print(f"   - 5 neurons (one per class) with Softmax activation")
print(f"   - Outputs probabilities that sum to 1")

# Dense with L2 regularization
from tensorflow.keras.regularizers import l2
layer4 = Dense(16, activation='relu', kernel_regularizer=l2(0.01))
print(f"\n4. Regularized: Dense(16, activation='relu', kernel_regularizer=l2(0.01))")
print(f"   - Adds L2 penalty to weights (prevents overfitting)")

### 4.2 Activation Functions in Keras

Keras provides all common activation functions as strings or through the `activations` module.
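For intuition, the most common activations are simple element-wise formulas. The NumPy versions below are a sketch of the math only; in a model you would still pass the string name (for example `activation='relu'`) to the layer.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])

print("relu:   ", relu(z))     # negatives become 0
print("sigmoid:", sigmoid(z))  # squashed into (0, 1)
print("tanh:   ", np.tanh(z))  # squashed into (-1, 1)
print("softmax:", softmax(z))  # non-negative, sums to 1
```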

In [None]:
# Visualize activation functions available in Keras
x = np.linspace(-5, 5, 200)

# Get activations from Keras
activations = {
    'relu': tf.keras.activations.relu(x).numpy(),
    'sigmoid': tf.keras.activations.sigmoid(x).numpy(),
    'tanh': tf.keras.activations.tanh(x).numpy(),
    'softplus': tf.keras.activations.softplus(x).numpy(),
    'elu': tf.keras.activations.elu(x).numpy(),
    'selu': tf.keras.activations.selu(x).numpy(),
}

fig = make_subplots(rows=2, cols=3, subplot_titles=list(activations.keys()))

colors = px.colors.qualitative.Set2
for idx, (name, y) in enumerate(activations.items()):
    row = idx // 3 + 1
    col = idx % 3 + 1
    fig.add_trace(go.Scatter(
        x=x, y=y, mode='lines',
        line=dict(color=colors[idx], width=3),
        name=name, showlegend=False
    ), row=row, col=col)

fig.update_xaxes(title='z')
fig.update_yaxes(title='f(z)')
fig.update_layout(
    title='Activation Functions Available in Keras',
    height=500
)

fig.show()

In [None]:
# Activation function guide for our problem
activation_guide = pd.DataFrame({
    'Activation': ['relu', 'sigmoid', 'tanh', 'softmax', 'linear'],
    'Use Case': [
        'Hidden layers (default choice)',
        'Binary classification output',
        'Hidden layers (alternative to ReLU)',
        'Multi-class classification output',
        'Regression output'
    ],
    'Output Range': [
        '[0, infinity)',
        '(0, 1)',
        '(-1, 1)',
        '(0, 1) - sums to 1',
        '(-infinity, infinity)'
    ],
    'For Student Departure': [
        'YES - use in hidden layers',
        'YES - use in output layer',
        'Alternative for hidden layers',
        'NO - intended for multi-class outputs',
        'NO - intended for regression outputs'
    ]
})

print("Activation Function Guide for Student Departure Prediction:")
activation_guide

## 5. Build Neural Networks for Student Departure

### 5.1 Data Preprocessing Pipeline

Neural networks require:
1. **Scaled numeric features**: Gradient-based training is sensitive to feature scales, so features with large ranges can dominate the weight updates
2. **Encoded categorical features**: Convert categories to numbers
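The two requirements above boil down to three small transformations. The sketch below applies them by hand with NumPy on made-up values, purely to show the arithmetic behind min-max scaling, standardization, and one-hot encoding; the arrays are hypothetical and not taken from the training data.

```python
import numpy as np

gpa = np.array([2.0, 3.0, 4.0])                # hypothetical GPA values
units = np.array([12.0, 15.0, 18.0])           # hypothetical units attempted
gender = np.array(['Male', 'Female', 'Male'])  # hypothetical category labels

# Min-max scaling: map the observed range onto [0, 1]
gpa_minmax = (gpa - gpa.min()) / (gpa.max() - gpa.min())

# Standardization: subtract the mean, divide by the (population) standard deviation
units_standard = (units - units.mean()) / units.std()

# One-hot encoding: one 0/1 column per category, in sorted order
categories = np.unique(gender)
gender_onehot = (gender[:, None] == categories[None, :]).astype(float)

print(gpa_minmax)
print(units_standard)
print(categories, gender_onehot, sep="\n")
```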

We'll use the same preprocessing as our previous models for fair comparison.

In [None]:
# Define feature groups
minmax_columns = [
    'HS_GPA',
    'GPA_1', 'GPA_2',
    'DFW_RATE_1', 'DFW_RATE_2'
]

standard_columns = [
    'UNITS_ATTEMPTED_1', 'UNITS_ATTEMPTED_2'
]

categorical_columns = [
    'GENDER',
    'RACE_ETHNICITY',
    'FIRST_GEN_STATUS'
]

# All features
all_features = minmax_columns + standard_columns + categorical_columns
print(f"Feature groups:")
print(f"  MinMax scaled: {len(minmax_columns)} features")
print(f"  Standard scaled: {len(standard_columns)} features")
print(f"  Categorical (one-hot): {len(categorical_columns)} features")

In [None]:
# Build the preprocessor
preprocessor = ColumnTransformer(
    transformers=[
        ('minmax', MinMaxScaler(), minmax_columns),
        ('standard', StandardScaler(), standard_columns),
        ('onehot', OneHotEncoder(handle_unknown='ignore', 
                                  drop=['Female', 'Other', 'Unknown'], 
                                  sparse_output=False), categorical_columns)
    ],
    remainder='drop'
)

print("Preprocessor configured.")

In [None]:
# Prepare the data
X = df_training[all_features]
y = df_training['SEM_3_STATUS']

# Split first so the preprocessor never sees the validation rows
X_train_raw, X_val_raw, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

# Fit the preprocessor on the training split only, then apply it to both splits
X_train = preprocessor.fit_transform(X_train_raw)
X_val = preprocessor.transform(X_val_raw)

# Get feature names after preprocessing
onehot_features = list(preprocessor.named_transformers_['onehot'].get_feature_names_out(categorical_columns))
feature_names = minmax_columns + standard_columns + onehot_features

print(f"Original features: {X.shape[1]}")
print(f"After preprocessing: {X_train.shape[1]}")
print(f"\nFeature names after preprocessing:")
for i, name in enumerate(feature_names):
    print(f"  {i+1}. {name}")

In [None]:
# Report the size of the training and validation sets
print(f"Training set: {X_train.shape[0]} samples")
print(f"Validation set: {X_val.shape[0]} samples")
print(f"\nInput shape for neural network: {X_train.shape[1]} features")

In [None]:
# Store the input dimension for model building
input_dim = X_train.shape[1]
print(f"Input dimension: {input_dim}")

### 5.2 Model 1: Simple Neural Network

Let's start with a simple architecture: one hidden layer.

In [None]:
# Model 1: Simple Neural Network
# Architecture: Input -> 8 neurons (ReLU) -> Output (Sigmoid)

def create_simple_model(input_dim):
    """
    Create a simple neural network with one hidden layer.
    
    Parameters:
    -----------
    input_dim : int
        Number of input features
        
    Returns:
    --------
    model : keras.Sequential
        Uncompiled neural network model (compile it with compile_model)
    """
    model = Sequential([
        # Input layer
        Input(shape=(input_dim,), name='input_layer'),
        
        # Hidden layer 1
        Dense(8, activation='relu', name='hidden_layer_1'),
        
        # Output layer (binary classification)
        Dense(1, activation='sigmoid', name='output_layer')
    ], name='simple_nn')
    
    return model

# Create the model
model_simple = create_simple_model(input_dim)

# Display summary
print("Model 1: Simple Neural Network")
print("="*60)
model_simple.summary()

### 5.3 Model 2: Deeper Neural Network

Now let's add more hidden layers to capture more complex patterns.

In [None]:
# Model 2: Deeper Neural Network
# Architecture: Input -> 16 -> 8 -> 4 -> Output

def create_deep_model(input_dim):
    """
    Create a deeper neural network with multiple hidden layers.
    Uses a "funnel" architecture (progressively fewer neurons).
    
    Parameters:
    -----------
    input_dim : int
        Number of input features
        
    Returns:
    --------
    model : keras.Sequential
        Uncompiled neural network model (compile it with compile_model)
    """
    model = Sequential([
        # Input layer
        Input(shape=(input_dim,), name='input_layer'),
        
        # Hidden layer 1 - widest
        Dense(16, activation='relu', name='hidden_layer_1'),
        
        # Hidden layer 2 - narrower
        Dense(8, activation='relu', name='hidden_layer_2'),
        
        # Hidden layer 3 - narrowest
        Dense(4, activation='relu', name='hidden_layer_3'),
        
        # Output layer
        Dense(1, activation='sigmoid', name='output_layer')
    ], name='deep_nn')
    
    return model

# Create the model
model_deep = create_deep_model(input_dim)

# Display summary
print("Model 2: Deep Neural Network")
print("="*60)
model_deep.summary()

### 5.4 Model 3: Wide Neural Network

An alternative approach: fewer layers but more neurons per layer.

In [None]:
# Model 3: Wide Neural Network
# Architecture: Input -> 32 -> 16 -> Output

def create_wide_model(input_dim):
    """
    Create a wide neural network with more neurons per layer.
    
    Parameters:
    -----------
    input_dim : int
        Number of input features
        
    Returns:
    --------
    model : keras.Sequential
        Uncompiled neural network model (compile it with compile_model)
    """
    model = Sequential([
        # Input layer
        Input(shape=(input_dim,), name='input_layer'),
        
        # Hidden layer 1 - wide
        Dense(32, activation='relu', name='hidden_layer_1'),
        
        # Hidden layer 2
        Dense(16, activation='relu', name='hidden_layer_2'),
        
        # Output layer
        Dense(1, activation='sigmoid', name='output_layer')
    ], name='wide_nn')
    
    return model

# Create the model
model_wide = create_wide_model(input_dim)

# Display summary
print("Model 3: Wide Neural Network")
print("="*60)
model_wide.summary()

In [None]:
# Compare model architectures
models = {
    'Simple NN': model_simple,
    'Deep NN': model_deep,
    'Wide NN': model_wide
}

comparison_data = []
for name, model in models.items():
    comparison_data.append({
        'Model': name,
        'Hidden Layers': len(model.layers) - 1,  # Excluding output
        'Total Parameters': model.count_params(),
        'Architecture': ' -> '.join([str(l.units) for l in model.layers if hasattr(l, 'units')])
    })

comparison_df = pd.DataFrame(comparison_data)
print("Model Architecture Comparison:")
comparison_df

## 6. Compiling Models

Before training, we must **compile** the model by specifying:
1. **Loss function**: What to minimize
2. **Optimizer**: How to update weights
3. **Metrics**: What to track during training

### 6.1 Loss Functions

The loss function measures prediction error. For classification:

| Problem Type | Loss Function | Keras Name |
|:-------------|:--------------|:-----------|
| Binary Classification | Binary Cross-Entropy | `'binary_crossentropy'` |
| Multi-class (one-hot) | Categorical Cross-Entropy | `'categorical_crossentropy'` |
| Multi-class (integers) | Sparse Categorical Cross-Entropy | `'sparse_categorical_crossentropy'` |
| Regression | Mean Squared Error | `'mse'` |
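To see what binary cross-entropy measures, the sketch below computes it by hand with NumPy for two sets of made-up predictions. The `binary_crossentropy` helper and the clipping constant `eps` are illustrative assumptions rather than the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return losses.mean()

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good_probs = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad_probs = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong

loss_good = binary_crossentropy(y_true, good_probs)
loss_bad = binary_crossentropy(y_true, bad_probs)

print(f"Good predictions: {loss_good:.4f}")
print(f"Bad predictions:  {loss_bad:.4f}")
```

The loss grows sharply when the model assigns a high probability to the wrong class, which is why confident mistakes are penalized much more than hesitant ones.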

In [None]:
# Demonstrate loss functions
print("Loss Function for Student Departure Prediction:")
print("="*60)
print("\nProblem type: Binary classification (departed vs. retained)")
print("Output activation: Sigmoid (probability 0-1)")
print("Loss function: Binary Cross-Entropy")
print("\nKeras code: loss='binary_crossentropy'")

### 6.2 Optimizers

Optimizers control how weights are updated. Common choices:

| Optimizer | Description | When to Use |
|:----------|:------------|:------------|
| `SGD` | Stochastic Gradient Descent | Simple, requires tuning learning rate |
| `Adam` | Adaptive Moment Estimation | Default choice, works well out-of-box |
| `RMSprop` | Root Mean Square Propagation | Good for recurrent networks |
| `Adagrad` | Adaptive Gradient | Good for sparse data |

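**Adam** is our default choice: it adapts the step size for each weight using running averages of past gradients, so it usually needs less learning-rate tuning than plain SGD.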
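The difference between SGD and Adam is easiest to see in a single update step. The NumPy sketch below follows the standard Adam update rule with commonly used default hyperparameters; the function names and the toy gradient are assumptions for illustration, not the Keras optimizer code.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: the step is proportional to the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update with bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, 1.0])
grad = np.array([100.0, 0.01])  # two weights with very different gradient sizes

w_sgd = sgd_step(w, grad)
w_adam, m, v = adam_step(w, grad, m=np.zeros(2), v=np.zeros(2), t=1)

print("SGD: ", w_sgd)   # step sizes differ by a factor of 10,000
print("Adam:", w_adam)  # both weights move by roughly the learning rate
```

In this toy case SGD moves the first weight far more than the second, while Adam normalizes each step by the gradient's recent magnitude.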

In [None]:
# Demonstrate optimizer configuration
from tensorflow.keras.optimizers import Adam, SGD

# Adam with default learning rate (0.001)
optimizer_adam = Adam()
print(f"Adam optimizer:")
print(f"  Learning rate: {optimizer_adam.learning_rate.numpy()}")

# Adam with custom learning rate
optimizer_adam_custom = Adam(learning_rate=0.0005)
print(f"\nAdam with custom learning rate:")
print(f"  Learning rate: {optimizer_adam_custom.learning_rate.numpy()}")

# SGD with momentum
optimizer_sgd = SGD(learning_rate=0.01, momentum=0.9)
print(f"\nSGD with momentum:")
print(f"  Learning rate: {optimizer_sgd.learning_rate.numpy()}")
print(f"  Momentum: {optimizer_sgd.get_config()['momentum']}")

### 6.3 Metrics

Metrics are tracked during training but don't affect learning (unlike loss).

Common metrics for classification:
- `'accuracy'`: Overall correct predictions
- `'precision'`: True positives / (True positives + False positives)
- `'recall'`: True positives / (True positives + False negatives)
- `'AUC'`: Area Under ROC Curve
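The first three metrics follow directly from the confusion-matrix counts. The sketch below computes them with NumPy on a small made-up example, assuming a 0.5 decision threshold; the labels and probabilities are hypothetical.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.55])
y_pred = (y_prob >= 0.5).astype(int)  # assumed threshold of 0.5

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With imbalanced classes, accuracy alone can look good even when the model misses most positives, which is why precision and recall are tracked alongside it.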

In [None]:
# Compile all models
def compile_model(model, learning_rate=0.001):
    """
    Compile a Keras model for binary classification.
    
    Parameters:
    -----------
    model : keras.Sequential
        The model to compile
    learning_rate : float
        Learning rate for Adam optimizer
        
    Returns:
    --------
    model : keras.Sequential
        Compiled model
    """
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            tf.keras.metrics.Precision(name='precision'),
            tf.keras.metrics.Recall(name='recall'),
            tf.keras.metrics.AUC(name='auc')
        ]
    )
    return model

# Build a fresh instance of each architecture and compile it
model_simple = compile_model(create_simple_model(input_dim))
model_deep = compile_model(create_deep_model(input_dim))
model_wide = compile_model(create_wide_model(input_dim))

print("All models compiled successfully!")
print("\nCompilation settings:")
print("  Optimizer: Adam (learning_rate=0.001)")
print("  Loss: binary_crossentropy")
print("  Metrics: accuracy, precision, recall, AUC")

In [None]:
# View compiled model configuration
print("Compiled Model Configuration (Simple NN):")
print("="*60)
print(f"\nOptimizer: {model_simple.optimizer.__class__.__name__}")
print(f"Learning rate: {model_simple.optimizer.learning_rate.numpy()}")
print(f"Loss function: {model_simple.loss}")
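# Note: depending on the Keras version, this list may be empty or show wrapper names
# (e.g. 'loss', 'compile_metrics') rather than the four metrics passed to compile()
print(f"Metrics: {[m.name for m in model_simple.metrics]}")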

## 7. Visualize Model Architectures

In [None]:
def visualize_architecture(model, title='Neural Network Architecture'):
    """
    Create a visual representation of a neural network architecture.
    
    Parameters:
    -----------
    model : keras.Model
        The model to visualize
    title : str
        Title for the plot
    """
    fig = go.Figure()
    
    # Extract layer information
    layers_info = []
    for layer in model.layers:
        if hasattr(layer, 'units'):
            layers_info.append({
                'name': layer.name,
                'units': layer.units,
                'activation': layer.activation.__name__ if hasattr(layer, 'activation') else 'none'
            })
    
    # Also include input shape
    input_shape = model.input_shape[1]
    all_layers = [{'name': 'Input', 'units': input_shape, 'activation': 'none'}] + layers_info
    
    n_layers = len(all_layers)
    max_units = max([l['units'] for l in all_layers])
    
    # Colors
    colors = ['lightblue'] + ['lightgreen'] * (n_layers - 2) + ['lightyellow']
    border_colors = ['darkblue'] + ['darkgreen'] * (n_layers - 2) + ['orange']
    
    # Draw connections and neurons
    for layer_idx, layer_info in enumerate(all_layers):
        n_neurons = min(layer_info['units'], 10)  # Cap display at 10
        actual_neurons = layer_info['units']
        x = layer_idx * 2
        
        # Center neurons
        start_y = (10 - n_neurons) / 2
        
        # Draw connections to next layer
        if layer_idx < n_layers - 1:
            next_neurons = min(all_layers[layer_idx + 1]['units'], 10)
            next_start_y = (10 - next_neurons) / 2
            
            for i in range(n_neurons):
                for j in range(next_neurons):
                    fig.add_trace(go.Scatter(
                        x=[x, x + 2],
                        y=[start_y + i, next_start_y + j],
                        mode='lines',
                        line=dict(color='lightgray', width=0.3),
                        showlegend=False,
                        hoverinfo='skip'
                    ))
        
        # Draw neurons
        for neuron_idx in range(n_neurons):
            y = start_y + neuron_idx
            fig.add_trace(go.Scatter(
                x=[x], y=[y],
                mode='markers',
                marker=dict(
                    size=20,
                    color=colors[layer_idx],
                    line=dict(width=2, color=border_colors[layer_idx])
                ),
                showlegend=False,
                hoverinfo='skip'
            ))
        
        # Add "..." if more neurons exist
        if actual_neurons > 10:
            fig.add_annotation(
                x=x, y=start_y + n_neurons,
                text=f'...({actual_neurons} total)',
                showarrow=False,
                font=dict(size=10)
            )
        
        # Add layer label
        activation_text = f"({layer_info['activation']})" if layer_info['activation'] != 'none' else ''
        fig.add_annotation(
            x=x, y=-1.5,
            text=f"{layer_info['name']}<br>{actual_neurons} units {activation_text}",
            showarrow=False,
            font=dict(size=9)
        )
    
    fig.update_layout(
        title=title,
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-3, 12]),
        height=500,
        plot_bgcolor='white'
    )
    
    return fig

# Visualize the simple model (the other models follow in the next cells)
fig = visualize_architecture(model_simple, 'Simple Neural Network Architecture')
fig.show()

In [None]:
fig = visualize_architecture(model_deep, 'Deep Neural Network Architecture')
fig.show()

In [None]:
fig = visualize_architecture(model_wide, 'Wide Neural Network Architecture')
fig.show()

In [None]:
# Compare architectures visually
models_comparison = {
    'Simple NN': model_simple,
    'Deep NN': model_deep,
    'Wide NN': model_wide
}

fig = go.Figure()

# Plot parameter counts
names = list(models_comparison.keys())
params = [m.count_params() for m in models_comparison.values()]

fig.add_trace(go.Bar(
    x=names,
    y=params,
    marker_color=['steelblue', 'darkgreen', 'darkorange'],
    text=params,
    textposition='outside'
))

fig.update_layout(
    title='Model Complexity Comparison: Number of Parameters',
    xaxis_title='Model',
    yaxis_title='Total Parameters',
    height=400
)

fig.show()

## 8. Save Models for Training

In [None]:
# Create directory for module 5 models
import os
models_path = f'{module5_filepath}models/'
os.makedirs(models_path, exist_ok=True)

# Save preprocessor
with open(f'{models_path}preprocessor.pkl', 'wb') as f:
    pickle.dump(preprocessor, f)
print(f"Saved preprocessor to: {models_path}preprocessor.pkl")

# Save feature information
feature_info = {
    'minmax_columns': minmax_columns,
    'standard_columns': standard_columns,
    'categorical_columns': categorical_columns,
    'all_features': all_features,
    'feature_names_processed': feature_names,
    'input_dim': input_dim
}
with open(f'{models_path}feature_info.pkl', 'wb') as f:
    pickle.dump(feature_info, f)
print(f"Saved feature info to: {models_path}feature_info.pkl")

In [None]:
# Save model configurations (for later training)
# The models are still untrained, so there are no learned weights worth saving yet.
# For now, we store a short description of each architecture.

model_configs = {
    'simple_nn': {
        'function': 'create_simple_model',
        'layers': [8, 1],
        'description': 'Single hidden layer (8 neurons)'
    },
    'deep_nn': {
        'function': 'create_deep_model',
        'layers': [16, 8, 4, 1],
        'description': 'Three hidden layers (16, 8, 4 neurons)'
    },
    'wide_nn': {
        'function': 'create_wide_model',
        'layers': [32, 16, 1],
        'description': 'Two hidden layers (32, 16 neurons)'
    }
}

with open(f'{models_path}model_configs.pkl', 'wb') as f:
    pickle.dump(model_configs, f)
print(f"Saved model configs to: {models_path}model_configs.pkl")

In [None]:
# Save the processed data splits for training
data_splits = {
    'X_train': X_train,
    'X_val': X_val,
    'y_train': y_train.values,
    'y_val': y_val.values
}

with open(f'{models_path}data_splits.pkl', 'wb') as f:
    pickle.dump(data_splits, f)
print(f"Saved data splits to: {models_path}data_splits.pkl")
print(f"\nData shapes:")
print(f"  X_train: {X_train.shape}")
print(f"  X_val: {X_val.shape}")
print(f"  y_train: {y_train.shape}")
print(f"  y_val: {y_val.shape}")

In [None]:
# Summary of saved files
print("Files saved for training:")
print("="*60)
for file in os.listdir(models_path):
    filepath = f'{models_path}{file}'
    size = os.path.getsize(filepath)
    print(f"  {file}: {size/1024:.1f} KB")

## 9. Summary

In this notebook, we built three neural network architectures for predicting student departure using TensorFlow and Keras.

### Models Built

| Model | Architecture | Parameters (d = `input_dim`) | Description |
|:------|:-------------|:-----------|:------------|
| **Simple NN** | Input -> 8 -> 1 | 8d + 17 (e.g. 97 when d = 10) | Single hidden layer |
| **Deep NN** | Input -> 16 -> 8 -> 4 -> 1 | 16d + 193 (e.g. 353 when d = 10) | Multiple hidden layers (funnel) |
| **Wide NN** | Input -> 32 -> 16 -> 1 | 32d + 577 (e.g. 897 when d = 10) | Fewer but wider layers |

### Key Keras Concepts

| Concept | Description | Our Choice |
|:--------|:------------|:-----------|
| **Sequential API** | Linear stack of layers | Used for all models |
| **Dense Layer** | Fully connected neurons | Main building block |
| **Activation** | Non-linear transformation | ReLU (hidden), Sigmoid (output) |
| **Loss Function** | What to minimize | Binary cross-entropy |
| **Optimizer** | How to update weights | Adam (default lr=0.001) |
| **Metrics** | What to track | Accuracy, Precision, Recall, AUC |

### Model Building Steps

1. **Define architecture**: Choose layers and neurons
2. **Add layers**: Stack Dense layers with activations
3. **Compile**: Specify optimizer, loss, and metrics
4. **Ready for training!**

### Next Steps

In the next notebook, we will train these models and learn about:
- Epochs and batch sizes
- Callbacks for monitoring training
- Early stopping to prevent overfitting
- Visualizing training history (loss curves)

**Proceed to:** `5.3 Train Neural Networks`