# 6.2 Special Topics: Introduction to Neural Networks

## Course 3: Advanced Classification Models for Student Success

## Introduction

**Neural networks** are a family of algorithms inspired by biological neurons. They are the foundation of deep learning and have achieved remarkable success in areas like image recognition, natural language processing, and game playing.

For **tabular data** (like student records), neural networks are generally **not the first choice**—tree-based models typically perform as well or better with less effort. However, understanding neural networks is valuable for:

1. Completeness of your ML knowledge
2. Understanding deep learning hype and limitations
3. Scenarios with very large datasets and complex interactions
4. Transfer learning and embedding approaches

### Learning Objectives

1. Understand the basic structure of a neural network
2. Build a simple neural network using scikit-learn's MLPClassifier
3. Compare neural network performance to tree-based models
4. Know when neural networks are and aren't appropriate

## 1. Neural Network Basics

A neural network consists of:
- **Input layer**: One neuron per feature
- **Hidden layers**: Neurons that transform inputs through weighted sums and activation functions
- **Output layer**: Produces the final prediction

Each neuron computes: $\text{output} = \text{activation}\left(\sum_i w_i x_i + b\right)$, where $x_i$ are the inputs, $w_i$ the weights, and $b$ the bias.
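
The computation of a single neuron can be sketched in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements it internally; the input, weight, and bias values are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def relu(z):
    """ReLU activation: keep positive values, clip negatives to zero."""
    return np.maximum(0.0, z)

# Hypothetical values for one neuron with three inputs (illustration only)
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, 0.2])
bias = 0.1

# Step 1: weighted sum of the inputs plus the bias
z = np.dot(weights, inputs) + bias

# Step 2: pass the result through the activation function
output = relu(z)

print(f"weighted sum z = {z:.2f}")
print(f"neuron output  = {output:.2f}")
```

A hidden layer repeats this computation once per neuron, each with its own weights and bias, which is why a layer is usually written as a matrix-vector product.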

### Common Activation Functions
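- **ReLU**: $f(x) = \max(0, x)$; the most common choice for hidden layers
- **Sigmoid**: $f(x) = \frac{1}{1 + e^{-x}}$; used for binary classification output (scikit-learn calls it `logistic`)
- **Softmax**: $f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$; used for multi-class output, turning raw scores into probabilities that sum to 1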
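
The three activation functions listed above can be written directly in NumPy. This is a sketch for building intuition; the `scores` array is a hypothetical input, and the max-subtraction inside `softmax` is a common numerical-stability trick rather than part of the definition.

```python
import numpy as np

def relu(x):
    """Zero out negative values, pass positive values through unchanged."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # avoids overflow; does not change the result
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores (illustration only)
scores = np.array([-2.0, 0.0, 3.0])

print("ReLU:   ", relu(scores))
print("Sigmoid:", sigmoid(scores).round(3))
print("Softmax:", softmax(scores).round(3))
```

Note that sigmoid treats each score independently, while softmax normalizes across the whole vector, so the largest score receives the largest share of the probability.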

In [None]:
# Build a neural network using scikit-learn
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, classification_report
import warnings
warnings.filterwarnings('ignore')

# Load and prepare data
train_df = pd.read_csv('../../data/training.csv')
test_df = pd.read_csv('../../data/testing.csv')
train_df['DEPARTED'] = (train_df['SEM_3_STATUS'] != 'E').astype(int)
test_df['DEPARTED'] = (test_df['SEM_3_STATUS'] != 'E').astype(int)

numeric_features = ['HS_GPA','HS_MATH_GPA','HS_ENGL_GPA','UNITS_ATTEMPTED_1','UNITS_ATTEMPTED_2',
    'UNITS_COMPLETED_1','UNITS_COMPLETED_2','DFW_UNITS_1','DFW_UNITS_2','GPA_1','GPA_2',
    'DFW_RATE_1','DFW_RATE_2','GRADE_POINTS_1','GRADE_POINTS_2']
categorical_features = ['RACE_ETHNICITY','GENDER','FIRST_GEN_STATUS','COLLEGE']

train_enc = pd.get_dummies(train_df[numeric_features + categorical_features],
                           columns=categorical_features, drop_first=True)
test_enc = pd.get_dummies(test_df[numeric_features + categorical_features],
                          columns=categorical_features, drop_first=True)
train_enc, test_enc = train_enc.align(test_enc, join='left', axis=1, fill_value=0)
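# Impute missing values with TRAINING medians so no test-set statistics leak into preprocessing
train_medians = train_enc.median()
train_enc = train_enc.fillna(train_medians)
test_enc = test_enc.fillna(train_medians)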

X_train, y_train = train_enc, train_df['DEPARTED']
X_test, y_test = test_enc, test_df['DEPARTED']

# IMPORTANT: Scale features before training. Gradient-based neural networks are
# sensitive to feature scale (tree-based models are not).
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build neural network — still follows instantiate/fit/predict!
nn = MLPClassifier(
    hidden_layer_sizes=(64, 32, 16),  # Three hidden layers
    activation='relu',
    solver='adam',
    alpha=0.001,  # L2 regularization
    batch_size=32,
    learning_rate_init=0.001,  # Adam's initial step size (the default); learning_rate='adaptive' only applies to solver='sgd'
    max_iter=500,
    early_stopping=True,
    validation_fraction=0.1,
    random_state=42
)

nn.fit(X_train_scaled, y_train)
nn_prob = nn.predict_proba(X_test_scaled)[:, 1]

print(f"Neural Network ROC-AUC: {roc_auc_score(y_test, nn_prob):.4f}")
print("\nNote: MLPClassifier trains poorly on unscaled features, so scale first.")
print("Note: Neural networks usually take longer to train and have more hyperparameters than tree-based models.")
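
After fitting, an `MLPClassifier` exposes attributes that show what was actually built and how training went. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption for illustration; it is not the student dataset, so the validation score means nothing for real students). The names `X_demo`, `y_demo`, and `demo_nn` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 15 features (assumption: NOT the student records)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, n_informative=8, random_state=42
)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

demo_nn = MLPClassifier(
    hidden_layer_sizes=(64, 32, 16),  # same three-hidden-layer shape as above
    early_stopping=True,
    validation_fraction=0.1,
    max_iter=500,
    random_state=42,
)
demo_nn.fit(X_demo_scaled, y_demo)

# One weight matrix per connection between consecutive layers
weight_shapes = [w.shape for w in demo_nn.coefs_]
print("Weight matrix shapes:", weight_shapes)

# For binary targets, scikit-learn uses a logistic (sigmoid) output unit
print("Output activation:", demo_nn.out_activation_)

# With early_stopping=True, training halts once the validation score stops improving
print("Epochs run:", demo_nn.n_iter_)
print(f"Best validation accuracy: {demo_nn.best_validation_score_:.3f}")
```

The weight shapes make the architecture concrete: 15 inputs feed 64 neurons, then 32, then 16, then a single output unit.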

## 2. Neural Networks vs. Tree-Based Models for Tabular Data

| Aspect | Neural Networks | Tree-Based Models |
|:-------|:---------------|:-----------------|
| **Preprocessing** | Requires scaling | No scaling needed |
| **Interpretability** | Black box | Moderate to high |
| **Tuning effort** | High (many hyperparameters) | Moderate |
| **Performance on tabular data** | Good, sometimes great | Typically best |
| **Training speed** | Often slower | Moderate to fast |
| **Sample efficiency** | Needs more data | Works with less |

### The Practical Reality

For most tabular data problems (including student analytics), tree-based models are preferred because:
1. They typically perform at least as well
2. They require less preprocessing
3. They're easier to tune
4. They provide feature importances natively
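
These points can be checked side by side. The following is a minimal sketch, assuming synthetic data from `make_classification` in place of the student records and a Random Forest as the tree-based baseline; the resulting scores illustrate the workflow only and should not be read as evidence about either model on real data. The names `X_cmp`, `y_cmp`, `models`, and `auc_scores` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the student records)
X_cmp, y_cmp = make_classification(
    n_samples=1000, n_features=15, n_informative=8, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_cmp, y_cmp, test_size=0.25, stratify=y_cmp, random_state=0
)

models = {
    # The neural network needs a scaler in front of it...
    "Neural network (scaled)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                      max_iter=500, random_state=0),
    ),
    # ...while the forest is trained on the raw features
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    auc_scores[name] = roc_auc_score(y_te, prob)
    print(f"{name}: ROC-AUC = {auc_scores[name]:.4f}")

# The forest exposes feature importances directly; MLPClassifier has no such attribute
forest = models["Random forest"]
top = forest.feature_importances_.argsort()[::-1][:3]
print("Top 3 forest features (column indices):", top.tolist())
```

Wrapping the scaler and network in one pipeline keeps the scaling fitted on training data only, which is the same discipline the manual `fit_transform` / `transform` split enforces.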

Neural networks shine in other domains: images (CNNs), text (Transformers), sequences (RNNs/LSTMs).

## 3. Summary

- Neural networks follow the same scikit-learn pattern: `MLPClassifier().fit().predict()`
- They **need feature scaling** to train well (unlike tree models)
- For tabular student data, they rarely outperform Random Forest or XGBoost
- They're worth understanding for completeness and for non-tabular applications
- Deep learning (TensorFlow, PyTorch) extends beyond scikit-learn for larger-scale work

### When to Use Neural Networks in Higher Education
- Very large datasets (100K+ students)
- Unstructured data (text from student surveys, images)
- Research projects exploring cutting-edge methods
- When combined with other data types (multimodal learning)