# 03 — Hyperparameter Optimization Analysis

Goals:
- Analyze outcomes of 5 key experiments from `logs/experiments.jsonl`
- Diagnose early stopping behavior (best epoch range 4-11)
- Recommend sweep ranges for `lr`, `dropout`, `batch_size`, `weight_decay`
- Assess regularization ideas for this regression + online inference setup

Artifacts are in `notebooks/artifacts/03_hp_optimization/`.

In [None]:
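from pathlib import Path
import json
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display  # explicit import so the cells also run outside a notebook kernel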

ROOT = Path('..') if Path.cwd().name == 'notebooks' else Path('.')
ART = ROOT / 'notebooks' / 'artifacts' / '03_hp_optimization'
ART.mkdir(parents=True, exist_ok=True)

## 1) Build or refresh optimization artifacts

In [None]:
# Uncomment to recompute analysis artifacts
# import subprocess, sys
# subprocess.run([sys.executable, str(ROOT / 'notebooks' / 'run_03_hp_optimization.py')], check=True)

## 2) Experiment summary (5 runs)

In [None]:
best_runs = pd.read_csv(ART / 'experiment_comparison_best_runs.csv')
all_runs = pd.read_csv(ART / 'experiment_comparison_all_runs.csv')

display(best_runs[['config', 'model', 'val_score_avg', 'val_score_t0', 'val_score_t1', 'best_epoch', 'epochs_trained', 'lr', 'dropout', 'batch_size', 'weight_decay']])

print('All runs (including repeated gru_derived_v1):')
display(all_runs[['config', 'val_score_avg', 'best_epoch', 'epochs_trained', 'timestamp']])

## 3) Stopping dynamics (curve proxy)

Per-epoch train/val history is not persisted for these runs, so this notebook uses `best_epoch`, `epochs_trained`, and score outcomes as curve proxies.
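
As a rough illustration of what these proxies can look like, the sketch below derives two numbers from a run summary: how many epochs ran after the best one, and what fraction of the run had elapsed when validation peaked. It assumes `best_epoch` is 1-indexed. The function name `stopping_proxies` and the example rows are hypothetical, not values from `logs/experiments.jsonl`.

```python
def stopping_proxies(run):
    """Derive curve proxies from one run summary (a dict with the two epoch counts)."""
    best = run["best_epoch"]          # assumed 1-indexed
    trained = run["epochs_trained"]
    return {
        # Epochs that ran after the best validation epoch (roughly the patience consumed).
        "epochs_past_best": trained - best,
        # Fraction of the run that had elapsed when validation peaked.
        "best_epoch_frac": best / trained,
    }

# Hypothetical rows for illustration only; real values come from the artifact CSVs.
example_runs = [
    {"config": "run_a", "best_epoch": 4, "epochs_trained": 9},
    {"config": "run_b", "best_epoch": 11, "epochs_trained": 16},
]
proxies = [{"config": r["config"], **stopping_proxies(r)} for r in example_runs]
```

A small `epochs_past_best` with a low `best_epoch_frac` would suggest the run stopped soon after an early peak. With pandas, the same quantities are one-line column expressions on `best_epoch` and `epochs_trained`.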

In [None]:
stopping = pd.read_csv(ART / 'stopping_dynamics_summary.csv')
display(stopping)

img = plt.imread(ART / 'stopping_dynamics.png')
plt.figure(figsize=(14, 5))
plt.imshow(img)
plt.axis('off')
plt.title('Stopping dynamics overview')
plt.show()

In [None]:
note = (ART / 'curve_data_availability.txt').read_text(encoding='utf-8')
print(note)

## 4) Recommended hyperparameter sweep ranges
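
Once ranges for `lr`, `dropout`, `batch_size`, and `weight_decay` are fixed, one common way to use them is random search. The sketch below is a minimal, dependency-free version: it samples `lr` and `weight_decay` log-uniformly, `dropout` uniformly, and `batch_size` from a discrete list. The numeric ranges and the dict layout in `EXAMPLE_RANGES` are placeholders for illustration. They are not the contents of `recommended_hp_ranges.json`, whose schema may differ.

```python
import math
import random

# Hypothetical ranges for illustration only; substitute the recommended values.
EXAMPLE_RANGES = {
    "lr": (1e-4, 3e-3),
    "weight_decay": (1e-6, 1e-3),
    "dropout": (0.1, 0.4),
    "batch_size": [64, 128, 256],
}

def log_uniform(rng, low, high):
    """Sample uniformly in log space so each decade is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_config(rng, ranges):
    """Draw one random-search trial from the given ranges."""
    return {
        "lr": log_uniform(rng, *ranges["lr"]),
        "weight_decay": log_uniform(rng, *ranges["weight_decay"]),
        "dropout": rng.uniform(*ranges["dropout"]),
        "batch_size": rng.choice(ranges["batch_size"]),
    }

rng = random.Random(0)  # fixed seed so the sampled sweep is reproducible
trials = [sample_config(rng, EXAMPLE_RANGES) for _ in range(20)]
```

Log-uniform sampling is the usual choice for scale parameters such as learning rate and weight decay, because a linear draw would spend most trials near the top of the range.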

In [None]:
# Read via pathlib so the file handle is closed promptly
ranges = json.loads((ART / 'recommended_hp_ranges.json').read_text(encoding='utf-8'))
ranges

In [None]:
sensitivity = pd.read_csv(ART / 'hyperparam_sensitivity_coarse.csv')
reg = pd.read_csv(ART / 'regularization_recommendations.csv')

display(sensitivity)
display(reg)

## 5) Practical interpretation

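- Validation score peaks early, at epochs 4-11, in every tested config, including the weaker ones.
- This is expected for this dataset size and objective and is consistent with fast fitting followed by fast overfitting. Without per-epoch curves, that reading rests on the proxies above rather than on direct evidence.
- Keep early stopping, constrain the epoch budget, and focus on regularization and tighter HP ranges rather than larger models.
- Feature dropout or modestly stronger regularization is more promising than label smoothing, which is defined for classification targets and has no direct counterpart for a regression objective.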
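
To make the "keep early stopping, constrain epoch budget" recommendation concrete, here is a minimal sketch of patience-based early stopping with a hard epoch cap, replayed over a list of per-epoch validation scores. It assumes a higher-is-better score and 1-indexed epochs. The function name, the default `patience` and `max_epochs` values, and the example curve are all hypothetical and are not taken from the project's training code.

```python
def best_epoch_with_early_stopping(val_scores, patience=3, max_epochs=15, min_delta=0.0):
    """Return (best_epoch, epochs_trained), both 1-indexed, for a higher-is-better score."""
    best_score = float("-inf")
    best_epoch = 0
    epochs_trained = 0
    # The slice enforces the hard epoch budget regardless of patience.
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        epochs_trained = epoch
        if score > best_score + min_delta:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, epochs_trained

# Made-up validation curve that peaks at epoch 4 and then degrades.
example_curve = [0.10, 0.20, 0.30, 0.35, 0.34, 0.33, 0.32, 0.31]
result = best_epoch_with_early_stopping(example_curve, patience=3)
capped = best_epoch_with_early_stopping(example_curve, patience=3, max_epochs=5)
```

In this replay, `result` reports the peak epoch and the epoch at which patience ran out, while `capped` shows the budget cutting the run short first. If best epochs stay in the 4-11 range, a cap slightly above 11 plus a short patience limits wasted epochs without hiding later peaks.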