# Appendix B -- Keras 3 Companion
## *Python for AI/ML: A Complete Learning Journey*

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/timothy-watt/python-for-ai-ml/blob/main/APP_B_Keras3_Companion.ipynb)
&nbsp;&nbsp;[![Back to TOC](https://img.shields.io/badge/Back_to-Table_of_Contents-1B3A5C?style=flat-square)](https://colab.research.google.com/github/timothy-watt/python-for-ai-ml/blob/main/Python_for_AIML_TOC.ipynb)

---

> **Prerequisites:** Chapter 7 (Deep Learning with PyTorch)

This appendix rebuilds the Chapter 7 neural networks using **Keras 3**,
the multi-backend high-level API that runs on PyTorch, TensorFlow, or JAX.

The goal is not to replace Chapter 7 -- it is to show how the same concepts
map to a higher-level API. Once you understand the PyTorch training loop,
the Keras abstractions become transparent rather than magical.

**What you will build:**

- The same salary regression MLP from Chapter 7 in 20 lines of Keras
- The same Python usage classifier from Chapter 7 in Keras
- Side-by-side comparison: PyTorch explicit vs Keras concise
- When to use each: decision guide


---

## Setup
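
Keras 3 chooses its backend once, at the first `import keras`, from the `KERAS_BACKEND` environment variable (or the `~/.keras/keras.json` config file). TensorFlow is the default when nothing is set. The sketch below shows one way to pin the PyTorch backend so this appendix runs on the same framework as Chapter 7. Picking `"torch"` is an assumption made for illustration; `"tensorflow"` and `"jax"` are the other accepted values.

```python
import os

# Assumption: we want the PyTorch backend. Set this BEFORE the first `import keras`.
os.environ["KERAS_BACKEND"] = "torch"

try:
    import keras  # reads KERAS_BACKEND once, at import time
    print("Active backend:", keras.backend.backend())
except ImportError:
    # Keras or the chosen backend is missing; the variable stays set for a later import
    print("Keras is not importable yet; KERAS_BACKEND =", os.environ["KERAS_BACKEND"])
```

In a notebook the assignment has no effect if `keras` was already imported, so restart the runtime before changing backends.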


In [None]:
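import subprocess
import sys
# Install into the interpreter running this notebook; this appendix needs Keras 3
subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', 'keras>=3'], check=False)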

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

import keras
from keras import layers, callbacks

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, accuracy_score, mean_absolute_error

print(f'Keras version: {keras.__version__}')
print(f'Keras backend: {keras.backend.backend()}')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.dpi'] = 110

DATASET_URL  = 'https://raw.githubusercontent.com/timothy-watt/python-for-ai-ml/main/data/so_survey_2025_curated.csv'
RANDOM_STATE = 42
keras.utils.set_random_seed(RANDOM_STATE)


In [None]:
# Load and prepare SO 2025 data -- same as Chapter 7
df_raw = pd.read_csv(DATASET_URL)
df = df_raw.copy()
# Coerce first, then drop: values that fail to parse become NaN and are removed too
df['ConvertedCompYearly'] = pd.to_numeric(df['ConvertedCompYearly'], errors='coerce')
df = df.dropna(subset=['ConvertedCompYearly'])
Q1, Q3 = df['ConvertedCompYearly'].quantile([0.25, 0.75])
IQR = Q3 - Q1
df = df[
    (df['ConvertedCompYearly'] >= max(Q1 - 3*IQR, 5_000)) &
    (df['ConvertedCompYearly'] <= min(Q3 + 3*IQR, 600_000))
].copy()
if 'YearsCodePro' in df.columns:
    df['YearsCodePro'] = pd.to_numeric(df['YearsCodePro'], errors='coerce')
    df['YearsCodePro'] = df['YearsCodePro'].fillna(df['YearsCodePro'].median())
df['uses_python'] = df.get('LanguageHaveWorkedWith', pd.Series(dtype=str)).str.contains('Python', na=False).astype(int)
df['uses_sql']    = df.get('LanguageHaveWorkedWith', pd.Series(dtype=str)).str.contains('SQL', na=False).astype(int)
df['uses_js']     = df.get('LanguageHaveWorkedWith', pd.Series(dtype=str)).str.contains('JavaScript', na=False).astype(int)
df['uses_ai']     = df.get('AIToolCurrently', pd.Series(dtype=str)).notna().astype(int)
df['log_salary']  = np.log(df['ConvertedCompYearly'])
df = df.reset_index(drop=True)

feature_cols = [c for c in ['YearsCodePro', 'uses_python', 'uses_sql', 'uses_js', 'uses_ai']
                if c in df.columns]
X_raw = df[feature_cols].copy()
for col in feature_cols:
    med = X_raw[col].median()
    X_raw[col] = X_raw[col].fillna(med if pd.notna(med) else 0)
y = df['log_salary'].values

X_tr, X_te, y_tr, y_te = train_test_split(X_raw.values, y, test_size=0.2, random_state=RANDOM_STATE)
scaler = StandardScaler()
X_tr_sc = scaler.fit_transform(X_tr)
X_te_sc = scaler.transform(X_te)

print(f'Dataset ready: {len(df):,} rows')
print(f'Train: {len(X_tr_sc):,}  Test: {len(X_te_sc):,}  Features: {len(feature_cols)}')


---

## B.1 -- The Same MLP in Keras: Side-by-Side Comparison

The table below maps the main PyTorch building blocks from Chapter 7 to their Keras equivalents.
Read it before the code so that each Keras line has a familiar counterpart.

| PyTorch (Chapter 7) | Keras (Appendix B) |
|---------------------|--------------------|
| `class MyNet(nn.Module)` | `keras.Sequential([...])` (or subclass `keras.Model`) |
| `nn.Linear(in, out)` | `layers.Dense(out)` (input size is inferred) |
| `nn.ReLU()` | `layers.Activation('relu')` or `activation='relu'` |
| `nn.BatchNorm1d(n)` | `layers.BatchNormalization()` |
| `nn.Dropout(p)` | `layers.Dropout(p)` |
| `optim.AdamW(...)` | `keras.optimizers.AdamW(...)` |
| `nn.MSELoss()` | `loss='mse'` in `model.compile()` |
| Manual training loop | `model.fit(X, y, ...)` |
| `model.eval()` | Automatic during `model.predict()` |
| `torch.no_grad()` | Automatic during inference |
| `torch.save(model.state_dict(), path)` | `model.save_weights(path)` (path must end in `.weights.h5`) |

The key difference: Keras hides the training loop inside `model.fit()`.
This is more concise but less transparent. Chapter 7's explicit loop
gives you full control and understanding. Keras gives you speed of development.
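
To make concrete what `model.fit()` automates, here is a minimal sketch of the same loop written by hand in plain NumPy for a linear model. It is not Keras's implementation: the synthetic data, the plain SGD update, and every variable name are assumptions chosen to keep the example short. The structure is the point: shuffle, iterate over mini-batches, run forward and backward passes, update the weights, then record training and validation loss once per epoch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data (hypothetical; not the survey dataset)
X = rng.normal(size=(1000, 5))
true_w = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = X @ true_w + 0.3 + rng.normal(scale=0.1, size=1000)
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

w, b = np.zeros(5), 0.0
lr, epochs, batch_size = 0.05, 20, 64
history = {"loss": [], "val_loss": []}   # same shape as Keras's history.history

for epoch in range(epochs):
    order = rng.permutation(len(X_train))            # reshuffle every epoch
    batch_losses = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_train[idx], y_train[idx]
        err = xb @ w + b - yb                        # forward pass
        batch_losses.append(np.mean(err ** 2))       # MSE loss on this batch
        grad_w = 2 * xb.T @ err / len(idx)           # backward pass (analytic gradient)
        grad_b = 2 * err.mean()
        w -= lr * grad_w                             # optimiser step (plain SGD)
        b -= lr * grad_b
    history["loss"].append(float(np.mean(batch_losses)))
    val_err = X_val @ w + b - y_val                  # validation: no weight updates
    history["val_loss"].append(float(np.mean(val_err ** 2)))

print(f"final val_loss: {history['val_loss'][-1]:.4f}")
```

Everything inside the two `for` loops is what a single `fit()` call takes over, which is why the Keras cells in this appendix contain no loop at all.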


In [None]:
# B.1.1 -- Salary regression MLP in Keras
# Compare this directly with the SalaryMLP class in Chapter 7

input_dim = X_tr_sc.shape[1]

# In Keras, the model is defined declaratively as a list of layers
# No explicit forward() method -- Keras infers it from the layer order
salary_model = keras.Sequential([
    # Input layer specifies the feature dimension
    layers.Input(shape=(input_dim,)),

    # Hidden block 1: Linear -> BatchNorm -> ReLU -> Dropout
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),

    # Hidden block 2
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),

    # Hidden block 3
    layers.Dense(32),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),

    # Output layer: single neuron, no activation (regression)
    layers.Dense(1),
])

# compile() sets the loss function and optimiser
# This replaces: criterion = nn.MSELoss() and optimizer = optim.AdamW(...)
salary_model.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4),
    loss='mse',
    metrics=['mae']
)

salary_model.summary()
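
The parameter counts that `summary()` prints can be checked by hand. The sketch below assumes all five feature columns were found, so `input_dim` is 5; the helper names are hypothetical. A `Dense` layer holds one weight per input-output pair plus one bias per output. A `BatchNormalization` layer holds four values per feature: a trainable scale and shift, plus a non-trainable moving mean and moving variance.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Return (trainable, non_trainable): gamma/beta vs moving mean/variance."""
    return 2 * n_features, 2 * n_features

input_dim = 5              # assumption: all five feature columns are present
hidden_widths = [128, 64, 32]

trainable, non_trainable = 0, 0
n_in = input_dim
for n_out in hidden_widths:
    trainable += dense_params(n_in, n_out)
    bn_train, bn_frozen = batchnorm_params(n_out)
    trainable += bn_train
    non_trainable += bn_frozen
    n_in = n_out
trainable += dense_params(n_in, 1)   # output layer

print(f"Trainable: {trainable:,}  Non-trainable: {non_trainable:,}  "
      f"Total: {trainable + non_trainable:,}")
```

Under that assumption the totals are 11,585 trainable and 448 non-trainable parameters. Activation and Dropout layers contribute none.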


In [None]:
# B.1.2 -- Train with model.fit()
# This replaces the entire manual training loop from Chapter 7

# Callbacks replace manual logic in the training loop:
# EarlyStopping = best-weight checkpointing + stopping
# ReduceLROnPlateau = learning rate scheduler
early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,   # equivalent to saving best_weights in Ch7
    verbose=0
)
reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=5, verbose=0
)

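# Note: the test split doubles as the validation set here, so early stopping
# sees the test data and the metrics below are somewhat optimistic. Hold out a
# separate validation split (e.g. validation_split=0.1) for an unbiased estimate.
history = salary_model.fit(
    X_tr_sc, y_tr,
    validation_data=(X_te_sc, y_te),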
    epochs=60,
    batch_size=256,
    callbacks=[early_stop, reduce_lr],
    verbose=0    # suppress per-epoch output; we plot the history below
)

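epochs_run = len(history.history['loss'])
# stopped_epoch is the 0-based epoch at which training halted; it stays 0 if all epochs ran
stop_note = 'early stopping triggered' if early_stop.stopped_epoch > 0 else 'ran all epochs'
print(f'Trained for {epochs_run} epochs ({stop_note})')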

# Evaluate
y_pred_log = salary_model.predict(X_te_sc, verbose=0).flatten()
y_pred_usd = np.exp(y_pred_log)
y_true_usd = np.exp(y_te)

r2  = r2_score(y_te, y_pred_log)
mae = mean_absolute_error(y_true_usd, y_pred_usd)
print(f'Test R^2:  {r2:.4f}')
print(f'Test MAE:  ${mae:,.0f}')

# Plot training history -- history.history is a dict of metric lists
fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(history.history['loss'],     '#E8722A', linewidth=2, label='Train loss')
ax.plot(history.history['val_loss'], '#2E75B6', linewidth=2, label='Val loss')
ax.set_xlabel('Epoch')
ax.set_ylabel('MSE Loss')
ax.set_title(f'Keras Salary MLP Training Curves  (R^2={r2:.3f}, MAE=${mae/1000:.1f}k)')
ax.legend()
plt.tight_layout()
plt.show()
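
The two callbacks used above are bookkeeping on the validation loss. The following pure-Python sketch replays a list of per-epoch validation losses and reports when a patience-based stop and a plateau-based learning-rate cut would fire. It is a simplified model, not Keras's code: it shares one "best loss" between both rules and ignores options such as `min_delta` and `cooldown`. The function name and return format are hypothetical.

```python
def simulate_callbacks(val_losses, lr=1e-3, stop_patience=10, lr_patience=5, factor=0.5):
    """Replay val losses; return the stop epoch, best epoch (both 0-based), and final lr."""
    best, best_epoch = float("inf"), -1
    stop_wait = lr_wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it (Keras would also snapshot the weights here)
            best, best_epoch = loss, epoch
            stop_wait = lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:       # plateau: cut the learning rate
            lr *= factor
            lr_wait = 0
        if stop_wait >= stop_patience:   # no improvement for too long: stop
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}
    return {"stopped_epoch": None, "best_epoch": best_epoch, "lr": lr}

# Loss improves for three epochs, then plateaus
losses = [1.0, 0.8, 0.7] + [0.75] * 12
result = simulate_callbacks(losses)
print(result)
```

With `restore_best_weights=True`, the model that is evaluated afterwards carries the weights from `best_epoch`, not from the epoch where training stopped.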


In [None]:
# B.1.3 -- Python usage classifier in Keras

clf_cols = [c for c in ['YearsCodePro', 'ConvertedCompYearly',
                         'uses_sql', 'uses_js', 'uses_ai'] if c in df.columns]
X_clf = df[clf_cols].copy()
for col in clf_cols:
    med = X_clf[col].median()
    X_clf[col] = X_clf[col].fillna(med if pd.notna(med) else 0)
y_clf = df['uses_python'].values.astype(np.float32)

X_tc, X_ec, y_tc, y_ec = train_test_split(
    X_clf.values, y_clf, test_size=0.2, random_state=RANDOM_STATE, stratify=y_clf
)
clf_sc  = StandardScaler()
X_tc_sc = clf_sc.fit_transform(X_tc)
X_ec_sc = clf_sc.transform(X_ec)

clf_model = keras.Sequential([
    layers.Input(shape=(X_tc_sc.shape[1],)),
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.2),
    layers.Dense(32),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.2),
    # sigmoid output for binary classification
    # In PyTorch we used BCEWithLogitsLoss (raw logit + sigmoid in one op)
    # In Keras we use sigmoid activation + binary_crossentropy loss
    layers.Dense(1, activation='sigmoid'),
])

clf_model.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

clf_history = clf_model.fit(
    X_tc_sc, y_tc,
    validation_data=(X_ec_sc, y_ec),
    epochs=50, batch_size=256,
    callbacks=[callbacks.EarlyStopping(monitor='val_loss', patience=8,
                                        restore_best_weights=True, verbose=0)],
    verbose=0
)

y_pred_clf = (clf_model.predict(X_ec_sc, verbose=0).flatten() >= 0.5).astype(int)
acc = accuracy_score(y_ec, y_pred_clf)
print(f'Classifier accuracy: {acc:.4f}  ({acc*100:.1f}%)')
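
The classifier above pairs a sigmoid output with `binary_crossentropy`, whereas Chapter 7 fed raw logits to `BCEWithLogitsLoss`. The NumPy sketch below checks that the two formulations give the same loss for moderate logits and shows where they part ways for extreme ones. The function names and the clipping constant `eps=1e-7` are assumptions for illustration. In Keras the logits-based variant is available as `keras.losses.BinaryCrossentropy(from_logits=True)`, used with a final `Dense(1)` that has no activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y, p, eps=1e-7):
    """Cross-entropy on probabilities; clipping avoids log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def bce_from_logits(y, z):
    """Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

# Moderate logits: both routes agree
y = np.array([1.0, 0.0, 1.0, 0.0])
z = np.array([2.0, -1.0, 0.5, 3.0])
loss_probs = bce_from_probs(y, sigmoid(z))
loss_logits = bce_from_logits(y, z)
print(f"probs: {loss_probs:.6f}  logits: {loss_logits:.6f}")

# Confidently wrong predictions: clipping caps the probability-based loss
y_wrong = np.array([0.0, 1.0])
z_wrong = np.array([40.0, -40.0])
capped = bce_from_probs(y_wrong, sigmoid(z_wrong))
exact = bce_from_logits(y_wrong, z_wrong)
print(f"probs (clipped): {capped:.3f}  logits: {exact:.3f}")
```

For a small tabular model like this one the difference rarely matters, but working from logits is the safer default when outputs can saturate.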


---

## B.2 -- When to Use PyTorch vs Keras

Both frameworks are production-grade. The choice depends on your use case:

| Situation | Recommendation |
|-----------|----------------|
| Learning how neural networks work | **PyTorch** -- explicit loop teaches the mechanics |
| Rapid prototyping, standard architectures | **Keras** -- less boilerplate |
| Research, custom training procedures | **PyTorch** -- full control |
| Multi-backend (run on TF, PyTorch, or JAX) | **Keras 3** |
| Production deployment on TensorFlow Serving | **Keras on TF backend** |
| HuggingFace ecosystem, pre-trained transformers | **PyTorch** |
| Academic papers, reproducibility | **PyTorch** (dominant in research) |

The honest answer for most practitioners: **learn PyTorch first** (as this book does),
then use Keras when speed of development matters more than transparency.

---

*End of Appendix B -- Python for AI/ML*  
