# 📊 QEPC NBA Backtest Notebook

This notebook backtests the QEPC NBA engine: it replays the games in a chosen date range and scores the engine's picks and spreads against the actual results.

It focuses on:
- 🗓️ Defining a backtest date range
- 🧮 Running `run_season_backtest(...)` to simulate all games in that range
- 📈 Summarizing accuracy & spread error
- 🧹 Filtering to NBA-vs-NBA only (no exhibitions)
- 🧨 Inspecting the worst misses
- 📦 Scoring win probabilities with the Brier score
- 🌌 (Optional) Script / total-error analysis
- 💾 Exporting the global lambda calibration scale


------

## 🧩 1. Environment & Setup


In [1]:
# Universal QEPC header for this notebook

import os
import sys
from pathlib import Path

try:
    from notebook_context import *  # try direct import first
except ModuleNotFoundError:
    cwd = Path.cwd()
    candidate_roots = [cwd, cwd.parent, cwd.parent.parent]

    found_root = None
    for root in candidate_roots:
        if (root / "notebook_context.py").exists():
            found_root = root
            break

    if found_root is None:
        raise ModuleNotFoundError(
            f"Could not find notebook_context.py from {cwd}. "
            "Try opening this notebook from inside your qepc_project folder."
        )

    sys.path.insert(0, str(found_root))
    os.chdir(found_root)
    from notebook_context import *

# Fallback for project_root if notebook_context didn't define it
try:
    project_root
except NameError:
    project_root = Path.cwd()

print("Project root:", project_root)


[QEPC Paths] Project Root set: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
[QEPC] Autoload complete.
[QEPC] Root Shim Restored. Forwarding to qepc.autoload...
Project root: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project


-----

## 📅 2. Select Backtest Range


In [4]:
# 2. Backtest Date Range (custom start → today, no widgets)

import pandas as pd
from datetime import date

# 👇 Edit this line when you want a different start date
BACKTEST_START_DATE = date(2025, 11, 24)

# End date is always "today"
BACKTEST_END_DATE = date.today()

print("Backtest date range:",
      BACKTEST_START_DATE.isoformat(), "→", BACKTEST_END_DATE.isoformat())


Backtest date range: 2025-11-24 → 2025-11-27
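
Because the end date is pinned to today, an inverted or very short window is easy to create by accident. The following is a minimal sketch of a guard that could run before the backtest. `validate_backtest_range` is a hypothetical helper for illustration and is not part of QEPC.

```python
from datetime import date


def validate_backtest_range(start: date, end: date) -> int:
    """Return the number of calendar days in [start, end], inclusive.

    Hypothetical helper (not part of QEPC): raises if the window is inverted.
    """
    if start > end:
        raise ValueError(f"Start date {start} is after end date {end}.")
    return (end - start).days + 1


# Fixed dates are used here so the result is reproducible
n_days = validate_backtest_range(date(2025, 11, 24), date(2025, 11, 27))
print(f"Backtest window covers {n_days} calendar days")
```

A window of only a few days that ends today may contain no completed games at all, so a small day count is a hint to move the start date earlier.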


-----

## 🧮 3. Initiate Backtest


In [5]:
# 3. Run Backtest for Selected Range

from qepc.backtest.backtest_engine import run_season_backtest

# Convert date objects to ISO format strings (required by the engine)
start_date_str = BACKTEST_START_DATE.isoformat()
end_date_str = BACKTEST_END_DATE.isoformat()

print(f"🚀 Running QEPC backtest from {start_date_str} to {end_date_str}\n")

# The engine handles everything internally:
#   - Loads game results from TeamStatistics.csv
#   - Calculates team strengths for each day (time-travel)
#   - Runs simulations and scores predictions
backtest_long = run_season_backtest(start_date_str, end_date_str)

print(f"\n📊 Games simulated: {len(backtest_long)}")
display(backtest_long.head(10))

🚀 Running QEPC backtest from 2025-11-24 to 2025-11-27

🚀 STARTING LONG-RANGE BACKTEST (2025-11-24 to 2025-11-27)
Processing... (This will update in place)
⏳ Processing Day 4/4: 2025-11-27
❌ No games found in this date range.

📊 Games simulated: 0


----

## 🏁 4. Global Summary (All Games in Range)


In [None]:
# 4. Global Summary (All Games in Selected Range)

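if "backtest_long" not in globals():
    raise RuntimeError("Run Cell 3 (Run Backtest) first!")
if backtest_long.empty:
    # An empty result means the backtest ran but found no games to score
    raise RuntimeError("Backtest returned no games. Widen the date range in Cell 2 and rerun Cell 3.")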

total_games = len(backtest_long)
accuracy_pct = backtest_long["Correct_Pick"].mean() * 100
spread_mae = backtest_long["Spread_Error"].abs().mean()

print("=" * 50)
print("🏆 QEPC BACKTEST SUMMARY")
print("=" * 50)
print(f"📅 Date Range: {BACKTEST_START_DATE} → {BACKTEST_END_DATE}")
print(f"🏀 Games Simulated: {total_games}")
print(f"✅ Overall Accuracy: {accuracy_pct:.2f}%")
print(f"🎯 Avg Spread Error (MAE): {spread_mae:.2f} points")
print("=" * 50)
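
To make the two summary metrics concrete, here is a small worked example on made-up data. It assumes, based on how the cell above uses them, that `Correct_Pick` is a 0/1 flag and `Spread_Error` is a signed error in points. The toy values are invented for illustration and are not QEPC output.

```python
import pandas as pd

# Four invented games: three correct picks, one miss
toy = pd.DataFrame(
    {
        "Correct_Pick": [1, 0, 1, 1],
        "Spread_Error": [-3.0, 8.0, 1.5, -5.5],
    }
)

# Accuracy is the mean of the 0/1 flag, expressed as a percentage
toy_accuracy_pct = toy["Correct_Pick"].mean() * 100

# MAE averages the error magnitudes: (3 + 8 + 1.5 + 5.5) / 4
toy_spread_mae = toy["Spread_Error"].abs().mean()

# The signed mean shows directional bias: (-3 + 8 + 1.5 - 5.5) / 4
toy_spread_bias = toy["Spread_Error"].mean()

print(f"Accuracy: {toy_accuracy_pct:.2f}%")
print(f"Spread MAE: {toy_spread_mae:.2f} points")
print(f"Spread bias: {toy_spread_bias:+.2f} points")
```

The signed mean is worth checking alongside the MAE: large errors in opposite directions cancel in the bias but not in the MAE, so the two together separate a systematic lean from plain noise.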

----

## 🧹 5. Clean NBA-Only View (Filter Out Exhibitions)


In [None]:
# 5. Clean NBA-Only View (Filter Out Exhibitions)

NBA_TEAMS = [
    "Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlotte Hornets",
    "Chicago Bulls", "Cleveland Cavaliers", "Dallas Mavericks", "Denver Nuggets",
    "Detroit Pistons", "Golden State Warriors", "Houston Rockets", "Indiana Pacers",
    "Los Angeles Clippers", "Los Angeles Lakers", "Memphis Grizzlies", "Miami Heat",
    "Milwaukee Bucks", "Minnesota Timberwolves", "New Orleans Pelicans", "New York Knicks",
    "Oklahoma City Thunder", "Orlando Magic", "Philadelphia 76ers", "Phoenix Suns",
    "Portland Trail Blazers", "Sacramento Kings", "San Antonio Spurs", "Toronto Raptors",
    "Utah Jazz", "Washington Wizards",
]

backtest_clean = backtest_long[
    backtest_long["Away Team"].isin(NBA_TEAMS)
    & backtest_long["Home Team"].isin(NBA_TEAMS)
].copy()

print(f"Original games: {len(backtest_long)}, After NBA-only filter: {len(backtest_clean)}")

if not backtest_clean.empty:
    acc_clean = backtest_clean["Correct_Pick"].mean() * 100
    mae_clean = backtest_clean["Spread_Error"].abs().mean()

    print("=" * 50)
    print("🏆 CLEAN NBA-ONLY BACKTEST")
    print("=" * 50)
    print(f"🏀 Games Simulated: {len(backtest_clean)}")
    print(f"✅ Overall Accuracy: {acc_clean:.2f}%")
    print(f"🎯 Avg Spread Error: {mae_clean:.2f} points")
    print("=" * 50)
else:
    print("⚠️ No NBA-only games found in this range.")
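
The filter relies on exact string matches, so a team name spelled differently from the list is dropped just like an exhibition opponent. The sketch below shows this on invented rows with a shortened team list. The names `demo_teams` and `demo_games` and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Shortened list for illustration; the notebook uses all 30 teams
demo_teams = ["Boston Celtics", "Los Angeles Clippers", "Denver Nuggets"]

demo_games = pd.DataFrame(
    {
        "Away Team": ["Boston Celtics", "LA Clippers", "Exhibition All-Stars"],
        "Home Team": ["Denver Nuggets", "Boston Celtics", "Los Angeles Clippers"],
    }
)

# Keep a game only when both sides match the list exactly
both_known = demo_games["Away Team"].isin(demo_teams) & demo_games["Home Team"].isin(demo_teams)
demo_kept = demo_games[both_known]

# List the names that caused rows to be dropped, to spot spelling variants
seen_names = set(demo_games["Away Team"]) | set(demo_games["Home Team"])
unmatched_names = sorted(seen_names - set(demo_teams))

print(f"Kept {len(demo_kept)} of {len(demo_games)} games")
print("Unmatched names:", unmatched_names)
```

Printing the unmatched names after filtering makes it easy to tell genuine exhibition opponents apart from NBA teams that appear under a variant spelling.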

-----

## 🧨 6. Biggest Misses (Spread Error Analysis)


In [None]:
# 6. Biggest Misses (Spread Error Analysis)

# Use the cleaned NBA-only data if available, otherwise use all games
df_to_analyze = backtest_clean if not backtest_clean.empty else backtest_long

# Sort by absolute spread error (biggest misses first)
df_to_analyze = df_to_analyze.copy()
df_to_analyze["Abs_Spread_Error"] = df_to_analyze["Spread_Error"].abs()
biggest_misses = df_to_analyze.nlargest(10, "Abs_Spread_Error")

print("🧨 TOP 10 BIGGEST MISSES (by Spread Error)")
print("=" * 60)
display(
    biggest_misses[
        ["Date", "Away Team", "Home Team", "Expected_Spread", "Actual_Spread", "Spread_Error"]
    ].reset_index(drop=True)
)
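
Beyond the ten largest misses, it can help to see how often errors of each size occur. The following is a minimal sketch using `pd.cut` on the absolute spread error. The bucket edges (5, 10, and 20 points) and the toy values are assumptions chosen for illustration and can be tuned.

```python
import pandas as pd

# Invented signed spread errors, in points
toy_errors = pd.DataFrame({"Spread_Error": [-2.0, 4.5, -11.0, 7.0, 23.0, -0.5]})
abs_err = toy_errors["Spread_Error"].abs()

# Assumed bucket edges; each interval is closed on the right, e.g. (5, 10]
bins = [0, 5, 10, 20, float("inf")]
labels = ["0-5", "5-10", "10-20", "20+"]
toy_errors["Error_Bucket"] = pd.cut(abs_err, bins=bins, labels=labels, include_lowest=True)

# Count games per bucket and express each bucket as a share of all games
bucket_counts = toy_errors["Error_Bucket"].value_counts(sort=False)
bucket_pct = (bucket_counts / len(toy_errors) * 100).round(1)

print(pd.DataFrame({"Games": bucket_counts, "Percent": bucket_pct}))
```

`include_lowest=True` keeps a game with an error of exactly 0 in the first bucket, which would otherwise fall outside the left-open interval.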

----

## 📦 7. Brier Score Analysis (Probability Accuracy)


In [None]:
# 7. Brier Score Analysis
#
# The Brier score is the mean squared error between predicted win
# probabilities and actual 0/1 outcomes. Lower is better:
# 0.0 is perfect, and always predicting 50% scores 0.25.

df_brier = backtest_clean.copy() if not backtest_clean.empty else backtest_long.copy()

# For each game, compare predicted probability to actual outcome (0 or 1)
# Brier Score = mean((predicted_prob - actual_outcome)^2)

# Home win probability vs actual home win (1 if home won, 0 if away won)
df_brier["Actual_Home_Win"] = (df_brier["Actual_Home_Score"] > df_brier["Actual_Away_Score"]).astype(int)
df_brier["Brier_Score"] = (df_brier["Home_Win_Prob"] - df_brier["Actual_Home_Win"]) ** 2

avg_brier = df_brier["Brier_Score"].mean()

print("=" * 50)
print("📈 BRIER SCORE ANALYSIS")
print("=" * 50)
print(f"Average Brier Score: {avg_brier:.4f}")
print()
print("Interpretation:")
print("  • 0.00 = Perfect predictions")
print("  • 0.25 = Always predicting 50% (coin flip)")
print("  • Lower is better!")
print()

if avg_brier < 0.20:
    print("✅ Excellent! Well below the coin-flip baseline.")
elif avg_brier < 0.22:
    print("👍 Good. Your model clearly beats the coin-flip baseline.")
elif avg_brier < 0.25:
    print("⚠️ Okay, but there's room for improvement.")
else:
    print("❌ Model is performing at or below the coin-flip baseline.")

print("=" * 50)
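
The arithmetic behind the score is easy to verify by hand. Here is a small worked example in plain Python on four invented games. `brier_score` is a hypothetical helper, and the probabilities and outcomes are made up for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Invented home-win probabilities and results (1 = home team won)
home_win_prob = [0.70, 0.55, 0.30, 0.80]
actual_home_win = [1, 0, 0, 1]

# Squared errors: 0.09, 0.3025, 0.09, 0.04
model_brier = brier_score(home_win_prob, actual_home_win)

# Baseline: predict 50% for every game, so each squared error is 0.25
coin_flip_brier = brier_score([0.5] * len(actual_home_win), actual_home_win)

print(f"Model Brier score:     {model_brier:.4f}")
print(f"Coin-flip Brier score: {coin_flip_brier:.4f}")
```

The second game dominates the total: a 55% home pick that lost contributes 0.3025, more than the other three games combined, which shows how the score penalizes confident misses.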

----

## 🌌 8. Optional: Total Error & Script Classification (GRIND / BASE / CHAOS)


-----

In [None]:
# 8. Optional: Total Error & Script Classification (GRIND / BASE / CHAOS)

df = backtest_clean.copy()

# 8.1 Compute simulated and actual totals
df["Sim_Total"] = df["Sim_Home_Score"] + df["Sim_Away_Score"]
df["Actual_Total"] = df["Actual_Home_Score"] + df["Actual_Away_Score"]

# Drop games where we effectively didn't simulate (Sim_Total == 0)
before = len(df)
df = df[df["Sim_Total"] > 0].copy()  # copy so later column assignments don't hit a view
after = len(df)
print(f"Dropped {before - after} games with Sim_Total == 0.")
print(f"Remaining games for script analysis: {after}")

# 8.2 Total error
df["Total_Error"] = df["Actual_Total"] - df["Sim_Total"]

# Thresholds for GRIND / BASE / CHAOS (can be tuned)
grind_thresh = -15   # 15+ pts under model total → GRIND
chaos_thresh = 15    # 15+ pts over model total → CHAOS

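def classify_script(total_error):
    """Label a game by how far the actual total landed from the simulated total."""
    if total_error <= grind_thresh:
        return "GRIND"
    elif total_error >= chaos_thresh:
        return "CHAOS"
    else:
        return "BASE"

# Apply to the column (not row-wise) so an empty frame still yields a valid column
df["Script_ExPost"] = df["Total_Error"].apply(classify_script)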

print("\nSample of script labels:")
display(
    df[["Date", "Away Team", "Home Team", "Sim_Total", "Actual_Total", "Total_Error", "Script_ExPost"]]
    .head()
)

# 8.3 Script-level summary
script_summary = df.groupby("Script_ExPost").agg(
    Games=("Total_Error", "count"),
    Avg_Total_Error=("Total_Error", "mean"),
    Avg_Abs_Total_Error=("Total_Error", lambda x: x.abs().mean()),
    Avg_Spread_Error=("Spread_Error", "mean"),
    Avg_Abs_Spread_Error=("Spread_Error", lambda x: x.abs().mean()),
)

script_summary["Percent"] = (script_summary["Games"] / len(df) * 100).round(2)

print("\nEx-post script distribution (based on totals):")
display(script_summary)
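
The threshold logic can be checked on a few invented games. This sketch reuses the ±15 point thresholds from the cell above. The totals are made up, and the upper-case constant names are chosen here only to avoid clobbering the notebook's variables.

```python
import pandas as pd

GRIND_THRESH = -15  # actual total 15+ points under the simulated total
CHAOS_THRESH = 15   # actual total 15+ points over the simulated total


def label_script(total_error: float) -> str:
    """Classify a game ex post from its total error (actual minus simulated)."""
    if total_error <= GRIND_THRESH:
        return "GRIND"
    if total_error >= CHAOS_THRESH:
        return "CHAOS"
    return "BASE"


# Invented totals for four games
toy_totals = pd.DataFrame(
    {
        "Sim_Total": [225.0, 230.0, 218.0, 222.0],
        "Actual_Total": [205.0, 236.0, 240.0, 207.0],
    }
)

# Errors are -20, +6, +22, and -15 points
toy_totals["Total_Error"] = toy_totals["Actual_Total"] - toy_totals["Sim_Total"]
toy_totals["Script_ExPost"] = toy_totals["Total_Error"].apply(label_script)

print(toy_totals)
```

The last game sits exactly on the boundary at -15 and is labeled GRIND, because both thresholds are inclusive.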


## 💾 9. Global Lambda Export

In [None]:
# === QEPC Global Lambda Calibration Export ===
# Set this based on your backtest experiments.
# For now you can leave it at 1.0; later you can change it to (for example) 0.97.

scale_factor = 1.0  # 👈 tweak this number when you want to shrink/boost totals

import json
from pathlib import Path

calibration = {
    "global_lambda_scale": float(scale_factor),
}

calib_path = project_root / "data" / "qepc_calibration.json"
calib_path.parent.mkdir(parents=True, exist_ok=True)

with open(calib_path, "w") as f:
    json.dump(calibration, f, indent=2)

print("✅ Saved calibration file to:", calib_path)
print("   Contents:", calibration)
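
For context on how the exported value might be consumed, here is a minimal sketch that reads the file back and applies the scale. How QEPC actually loads and applies `global_lambda_scale` is not shown in this notebook, so `load_lambda_scale`, the multiplication step, and the sample value of 112.0 are all assumptions for illustration. The demo writes to a temporary directory rather than the project's `data` folder.

```python
import json
import tempfile
from pathlib import Path


def load_lambda_scale(path: Path, default: float = 1.0) -> float:
    """Read global_lambda_scale from a calibration file, falling back to a default."""
    if not path.exists():
        return default
    with open(path) as f:
        data = json.load(f)
    return float(data.get("global_lambda_scale", default))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "qepc_calibration.json"

    # Before the file exists, the loader returns the neutral default
    missing_scale = load_lambda_scale(demo_path)

    # Write the same structure the export cell produces, then read it back
    demo_path.write_text(json.dumps({"global_lambda_scale": 0.97}, indent=2))
    loaded_scale = load_lambda_scale(demo_path)

# Assumption: the scale multiplies a team's expected-points parameter
raw_lambda = 112.0
scaled_lambda = raw_lambda * loaded_scale

print(f"Scale without file: {missing_scale}")
print(f"Scale from file:    {loaded_scale}")
print(f"Scaled lambda:      {scaled_lambda:.2f}")
```

Falling back to 1.0 when the file is missing keeps the engine's totals unchanged until a calibration has been exported.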
