# Live Prediction Example

This notebook demonstrates the end-to-end prediction pipeline using **real production data**:

1. **Load models** - Production LightGBM models from `saved_xl/`
2. **Examine model architecture** - Understand the two-head stacked design
3. **Load real prediction** - Actual pick from production files
4. **Validate against outcome** - Check if the pick won using real game data

**All data in this notebook is REAL** - loaded from production files and the database.

In [None]:
import sys
import os
from pathlib import Path
from datetime import date, datetime

# Add project root to path
PROJECT_ROOT = Path("../").resolve()
sys.path.insert(0, str(PROJECT_ROOT))

import pickle
import json
import numpy as np
import pandas as pd
import psycopg2

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Load Production Models

Each market's model is stored in `saved_xl/` as six pickle files (regressor, classifier, calibrator, imputer, scaler, and feature list) plus a JSON metadata file.
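
Loading stops with a `FileNotFoundError` at the first absent file. A small pre-flight check, sketched below, reports every missing artifact at once. The file suffixes mirror the ones opened by `load_market_model`; the helper name `missing_model_files` is hypothetical and not part of the production code.

```python
from pathlib import Path

# Suffixes taken from the files this notebook opens for each market.
EXPECTED_SUFFIXES = (
    "regressor.pkl", "classifier.pkl", "calibrator.pkl",
    "imputer.pkl", "scaler.pkl", "features.pkl", "metadata.json",
)

def missing_model_files(models_dir, market):
    """Return the expected artifact paths that do not exist (hypothetical helper)."""
    models_dir = Path(models_dir)
    expected = [models_dir / f"{market}_market_{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_model_files(MODELS_DIR, 'points')` before loading returns an empty list when all seven files are present.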

In [None]:
MODELS_DIR = PROJECT_ROOT / "nba" / "models" / "saved_xl"

def load_market_model(market: str):
    """Load all components for a market model."""
    prefix = MODELS_DIR / f"{market}_market"
    
    components = {}
    
    # Load pickled components (only unpickle files from a trusted source)
    for name in ('regressor', 'classifier', 'calibrator', 'imputer', 'scaler', 'features'):
        with open(f"{prefix}_{name}.pkl", "rb") as f:
            components[name] = pickle.load(f)
    
    # Load JSON metadata
    with open(f"{prefix}_metadata.json", "r") as f:
        components['metadata'] = json.load(f)
    
    return components

# Load POINTS model
points_model = load_market_model('points')

print("Model loaded successfully!")
print(f"Market: {points_model['metadata']['market']}")
print(f"Features: {len(points_model['features'])}")
print(f"Trained: {points_model['metadata']['trained_date']}")
print(f"Architecture: {points_model['metadata']['architecture']}")

## 2. Model Architecture Overview

Our system uses a **two-head stacked architecture**:
- **Head 1 (Regressor)**: Predicts the actual stat value
- **Head 2 (Classifier)**: Predicts P(OVER) using regressor output as additional feature
- **Calibration**: Isotonic regression for probability calibration
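
The inference flow implied by this design can be sketched as follows. This is a minimal illustration, not the production code: it assumes the components expose scikit-learn-style methods (`transform`, `predict`, `predict_proba`), that the regressor output is appended as the last classifier feature, and that the calibrator maps a raw probability to a calibrated one through `predict`. The function name `predict_two_head` and the returned keys are hypothetical.

```python
def predict_two_head(components, feature_row):
    """Run one row through both heads (illustrative sketch, see assumptions above).

    components  : dict shaped like the output of load_market_model()
    feature_row : dict mapping feature name -> value for a single player prop
    """
    nan = float("nan")
    # Order values to match the training feature list; gaps become NaN for the imputer.
    ordered = [feature_row.get(name, nan) for name in components["features"]]

    # Shared preprocessing (assumed to apply to both heads).
    X = components["imputer"].transform([ordered])
    X = components["scaler"].transform(X)

    # Head 1: predicted stat value.
    projection = float(components["regressor"].predict(X)[0])

    # Head 2: the classifier sees the same features plus the regressor output.
    # Appending it as the last column is an assumption.
    X_clf = [list(X[0]) + [projection]]
    p_raw = float(components["classifier"].predict_proba(X_clf)[0][1])

    # Calibration: map the raw probability to a calibrated one.
    p_over = float(components["calibrator"].predict([p_raw])[0])

    return {"projection": projection, "p_over_raw": p_raw, "p_over": p_over}
```

Keeping the raw and calibrated probabilities side by side makes it easy to see how much the calibration step moves a given prediction.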

In [None]:
# Display model metrics from metadata
metadata = points_model['metadata']

print("=" * 60)
print(f"MODEL ARCHITECTURE: {metadata['market']}")
print("=" * 60)

print("\n--- Regressor Metrics ---")
reg_metrics = metadata.get('metrics', {}).get('regressor', {})
for key, val in reg_metrics.items():
    print(f"  {key}: {val}")

print("\n--- Classifier Metrics ---")
clf_metrics = metadata.get('metrics', {}).get('classifier', {})
for key, val in clf_metrics.items():
    print(f"  {key}: {val}")

print("\n--- Blend Config ---")
blend = metadata.get('blend_config', {})
for key, val in blend.items():
    print(f"  {key}: {val}")

In [None]:
# Display feature categories
features = points_model['features']

def categorize_feature(name):
    """Assign a feature to a category by name; the first matching rule wins."""
    if name.startswith(('ema_', 'fg_pct', 'ft_rate')):
        return 'Player Rolling Stats'
    if name.startswith('h2h_'):
        return 'Head-to-Head'
    if name.startswith('prop_'):
        return 'Prop History'
    if name.startswith('bp_'):
        return 'BettingPros Data'
    if name.startswith('vegas_'):
        return 'Vegas Lines'
    if 'deviation' in name or 'line' in name.lower() or 'book' in name:
        return 'Book Disagreement'
    if 'team' in name or 'opp' in name or 'pace' in name:
        return 'Team Context'
    return 'Other'

from collections import Counter

# Count features per category, largest category first
category_counts = Counter(categorize_feature(f) for f in features)

print(f"\n--- Feature Categories ({len(features)} total) ---")
for cat, count in category_counts.most_common():
    print(f"  {cat}: {count}")

## 3. Load Real Production Pick

Scan the production prediction files in reverse filename order and load the first POINTS pick found.

In [None]:
PREDICTIONS_DIR = PROJECT_ROOT / "nba" / "betting_xl" / "predictions"

# Find a recent prediction file with POINTS picks
def find_points_pick():
    """Find a real POINTS pick from production files."""
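    for f in sorted(PREDICTIONS_DIR.glob("*.json"), reverse=True):
        try:
            with open(f) as fp:
                data = json.load(fp)
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or malformed files instead of hiding every error
            continue
        if not isinstance(data, dict):
            continue
        for pick in data.get("picks", []):
            if pick.get("stat_type") == "POINTS":
                # Fall back to the date suffix in the file name
                pick['game_date'] = data.get('date', f.stem.split('_')[-1])
                pick['source_file'] = f.name
                return pick
    return None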

real_pick = find_points_pick()

if real_pick:
    print("=" * 60)
    print("REAL PRODUCTION PICK")
    print("=" * 60)
    print(f"\nSource: {real_pick['source_file']}")
    print(f"Date: {real_pick['game_date']}")
    print(f"\nPlayer: {real_pick['player_name']}")
    print(f"Market: {real_pick['stat_type']}")
    print(f"Side: {real_pick.get('side', 'OVER')}")
    print(f"Line: {real_pick.get('line', real_pick.get('best_line'))}")
    print(f"Projection: {real_pick.get('projection', 'N/A')}")
    print(f"Probability: {real_pick.get('probability', 'N/A')}")
    print(f"Confidence: {real_pick.get('confidence', 'N/A')}")
    
    if 'reasoning' in real_pick:
        print(f"\nReasoning: {real_pick['reasoning']}")
else:
    print("No POINTS picks found in prediction files.")

## 4. Validate Against Actual Outcome

Query the database to see how this pick actually performed.
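
Grading reduces to comparing the actual stat with the line on the picked side. A minimal sketch of that rule follows; it also handles a push (actual equal to the line) and converts American odds to a unit payout. The -110 default is an assumption, chosen because it reproduces the +0.91 unit win used in this notebook, and `grade_pick` is a hypothetical helper rather than production code.

```python
def grade_pick(side, line, actual, american_odds=-110):
    """Return (outcome, profit_units) for a one-unit stake (hypothetical helper)."""
    if side not in ("OVER", "UNDER"):
        raise ValueError(f"Unknown side: {side}")

    if actual == line:
        return "PUSH", 0.0  # stake is returned

    won = actual > line if side == "OVER" else actual < line
    if not won:
        return "LOSS", -1.0

    # Negative odds: risk |odds| to win 100. Positive odds: risk 100 to win odds.
    if american_odds < 0:
        payout = 100.0 / abs(american_odds)
    else:
        payout = american_odds / 100.0
    return "WIN", round(payout, 2)
```

Returning the outcome and the profit together keeps the win/loss decision and the unit accounting in one place.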

In [None]:
DB_CONFIG = {
    "host": "localhost",
    "port": 5536,
    "user": "nba_user",
    "password": os.getenv("DB_PASSWORD"),
    "database": "nba_players",
}

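def get_actual_result(player_name: str, game_date: str, stat_type: str):
    """Fetch actual result from database."""
    # Map stat type to column. The column name is interpolated into the SQL,
    # so it must come from this fixed mapping and never from user input.
    stat_column = {
        'POINTS': 'points',
        'REBOUNDS': 'rebounds',
        'ASSISTS': 'assists',
        'THREES': 'three_pointers_made'
    }.get(stat_type)
    if stat_column is None:
        # Fail loudly rather than silently validating against points
        raise ValueError(f"Unsupported stat type: {stat_type}")
    
    query = f"""
        SELECT l.{stat_column}, l.minutes, p.full_name
        FROM player_game_logs l
        JOIN player_profile p ON l.player_id = p.player_id
        WHERE p.full_name ILIKE %s
        AND l.game_date = %s
    """
    
    conn = psycopg2.connect(**DB_CONFIG)
    try:
        with conn.cursor() as cursor:
            cursor.execute(query, (f"%{player_name}%", game_date))
            # The wildcard match can hit several players; only the first row is used
            return cursor.fetchone()
    finally:
        # Close the connection even if the query raises
        conn.close()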

if real_pick:
    result = get_actual_result(
        real_pick['player_name'],
        real_pick['game_date'],
        real_pick['stat_type']
    )
    
    if result:
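        actual_value, minutes, matched_name = result
        # psycopg2 returns NUMERIC columns as Decimal, which cannot be mixed with a float line
        actual_value = float(actual_value)
        line = real_pick.get('line', real_pick.get('best_line'))
        side = real_pick.get('side', 'OVER')
        
        # Determine outcome; landing exactly on the line is a push
        if actual_value == line:
            outcome = 'PUSH'
        elif side == 'OVER':
            outcome = 'WIN' if actual_value > line else 'LOSS'
        else:
            outcome = 'WIN' if actual_value < line else 'LOSS'
        # +0.91 is the one-unit payout assuming -110 odds
        profit = {'WIN': '+0.91', 'LOSS': '-1.00', 'PUSH': '+0.00'}[outcome]
        
        print("\n" + "=" * 60)
        print("VALIDATION RESULT")
        print("=" * 60)
        print(f"\nPlayer: {matched_name}")
        print(f"Date: {real_pick['game_date']}")
        print(f"Minutes: {minutes}")
        print(f"\nPick: {side} {line}")
        print(f"Actual: {actual_value}")
        print(f"Difference: {actual_value - line:+.1f}")
        print(f"\nOutcome: {outcome}")
        print(f"Profit: {profit} units")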
    else:
        print(f"\nNo game log found for {real_pick['player_name']} on {real_pick['game_date']}")
        print("(Player may have been injured, DNP, or game not yet played)")

## 5. Visualization
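
The chart marks a 52.4% breakeven. That figure is the win rate at which a -110 bet has zero expected value: risking 110 to win 100 requires winning 110 / 210 of the time. The sketch below generalizes the calculation to any American odds; the function names are hypothetical and the -110 price is inferred from the 52.4% figure.

```python
def breakeven_probability(american_odds):
    """Win probability at which a bet at these odds has zero expected value."""
    if american_odds < 0:
        risk, win = float(abs(american_odds)), 100.0
    else:
        risk, win = 100.0, float(american_odds)
    return risk / (risk + win)

def expected_value(p_win, american_odds):
    """Expected profit in units per one-unit stake, ignoring pushes."""
    if american_odds < 0:
        payout = 100.0 / abs(american_odds)
    else:
        payout = american_odds / 100.0
    return p_win * payout - (1.0 - p_win)
```

`breakeven_probability(-110)` evaluates to 110 / 210, roughly 0.524, which matches the dashed line in the chart.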

In [None]:
if real_pick and result:
    actual_value, minutes, _ = result
    line = real_pick.get('line', real_pick.get('best_line'))
    projection = real_pick.get('projection', line)
    prob = real_pick.get('probability', 0.5)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Left: Prediction vs Actual
    ax1 = axes[0]
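    categories = ['Line', 'Projection', 'Actual']
    values = [line, projection, actual_value]
    # Green when the pick won on its own side (OVER or UNDER), red otherwise
    side = real_pick.get('side', 'OVER')
    pick_won = actual_value > line if side == 'OVER' else actual_value < line
    colors = ['#3498db', '#9b59b6', '#2ecc71' if pick_won else '#e74c3c']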
    
    bars = ax1.bar(categories, values, color=colors, edgecolor='black', width=0.6)
    ax1.axhline(y=line, color='red', linestyle='--', alpha=0.7, label=f'Line ({line})')
    ax1.set_ylabel(f'{real_pick["stat_type"]}')
    ax1.set_title(f'{real_pick["player_name"]} - {real_pick["game_date"]}')
    ax1.legend()
    
    for bar, val in zip(bars, values):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3, 
                 f'{val:.1f}', ha='center', va='bottom', fontsize=12, fontweight='bold')
    
    # Right: Probability gauge
    ax2 = axes[1]
    
    # Simple bar representation of probability
    ax2.barh(['P(OVER)'], [prob * 100], color='#3498db', height=0.4)
    ax2.barh(['P(UNDER)'], [(1-prob) * 100], color='#e74c3c', height=0.4)
    ax2.axvline(x=52.4, color='black', linestyle='--', label='Breakeven (52.4%)')
    ax2.set_xlim(0, 100)
    ax2.set_xlabel('Probability (%)')
    ax2.set_title('Model Confidence')
    ax2.legend()
    
    # Add text
    ax2.text(prob * 100 / 2, 0, f'{prob*100:.1f}%', ha='center', va='center', 
             fontsize=14, fontweight='bold', color='white')
    
    plt.tight_layout()
    plt.savefig('prediction_example.png', dpi=150, bbox_inches='tight')
    plt.show()

## 6. Examine Full Pick JSON

This is the actual JSON output from our production system.

In [None]:
if real_pick:
    print("=" * 60)
    print("FULL PICK JSON (Real Production Output)")
    print("=" * 60)
    print(json.dumps(real_pick, indent=2, default=str))

## 7. Summary

This notebook demonstrated the complete prediction pipeline with **real data**:

1. **Model Loading** - 166 features, two-head LightGBM architecture from `saved_xl/`
2. **Real Pick** - Actual production pick from `predictions/` directory
3. **Validation** - Outcome verified against PostgreSQL database

**In Production:**
- Features are extracted from 4 PostgreSQL databases in real-time
- Props are fetched from 7 sportsbooks via BettingPros API
- Picks are generated twice daily (morning refresh, evening predictions)
- Results are tracked and validated automatically

**All data shown is REAL** - no synthetic or simulated data was used.