# 🎯 QEPC Player Props Processing & Predictions

**Purpose:** Turn about 254,000 player-game records into player prop predictions

**What this does:**
1. 📊 Calculate player averages (PPG, RPG, APG with variance)
2. 🔥 Detect hot/cold streaks (recent form)
3. 🏠 Analyze home/away splits
4. 🎲 Build confidence intervals for predictions
5. 💰 Generate over/under predictions
6. 🎯 Identify betting value

**Result:** Complete player props prediction system

---

## 🔧 Setup

In [8]:
# Setup
from pathlib import Path
import sys

# Try to import notebook_context
try:
    from notebook_context import *
    print("✅ notebook_context loaded")
except ModuleNotFoundError:
    print("ℹ️  notebook_context not found, setting up manually...")
    
    current = Path.cwd()
    project_root = None
    
    for parent in [current, current.parent, current.parent.parent, current.parent.parent.parent]:
        if (parent / "qepc").is_dir() or (parent / "main.py").exists() or (parent / "data").is_dir():
            project_root = parent
            print(f"   ✅ Found project root: {project_root}")
            break
    
    if project_root is None:
        print(f"   ⚠️  Using current directory: {current}")
        project_root = current
    
    if str(project_root) not in sys.path:
        sys.path.insert(0, str(project_root))

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

print(f"📁 Project root: {project_root}")
print("✅ Imports complete")

✅ notebook_context loaded
📁 Project root: C:\Users\wdors\qepc_project
✅ Imports complete


---

## 📥 Load Player Game Logs

In [12]:
print("="*60)
print("📥 LOADING PLAYER GAME LOGS")
print("="*60)
print()

# Exact path to your file
data_path = Path(r"C:\Users\wdors\qepc_project\notebooks\02_utilities\data\comprehensive\Player_Game_Logs_All_Seasons.csv")

if not data_path.exists():
    print(f"❌ File not found: {data_path}")
    raise FileNotFoundError("Player logs not found")

# Load data
print(f"📂 Loading from: {data_path}")
df = pd.read_csv(data_path)

print(f"\n✅ Loaded {len(df):,} records")
print(f"   Players: {df['PLAYER_NAME'].nunique():,}")
print(f"   Seasons: {df['SEASON_YEAR'].nunique()}")
print(f"   Size: {data_path.stat().st_size / 1024 / 1024:.1f} MB")

# Parse game date
df['GAME_DATE'] = pd.to_datetime(df['GAME_DATE'], errors='coerce')

# Show available columns
print(f"\n📊 Available stats ({len(df.columns)} columns):")
key_cols = ['PTS', 'REB', 'AST', 'STL', 'BLK', 'FG3M', 'MIN', 'FG_PCT', 'FG3_PCT', 'FT_PCT']
available = [col for col in key_cols if col in df.columns]
print(f"   {', '.join(available)}")

# Preview
print(f"\n🔍 Sample data:")
preview_cols = ['GAME_DATE', 'PLAYER_NAME', 'TEAM_ABBREVIATION'] + available[:5]
display(df[preview_cols].head(10))

📥 LOADING PLAYER GAME LOGS

📂 Loading from: C:\Users\wdors\qepc_project\notebooks\02_utilities\data\comprehensive\Player_Game_Logs_All_Seasons.csv

✅ Loaded 254,187 records
   Players: 1,425
   Seasons: 10
   Size: 87.8 MB

📊 Available stats (71 columns):
   PTS, REB, AST, STL, BLK, FG3M, MIN, FG_PCT, FG3_PCT, FT_PCT

🔍 Sample data:


Unnamed: 0,GAME_DATE,PLAYER_NAME,TEAM_ABBREVIATION,PTS,REB,AST,STL,BLK
0,2015-04-15,Michael Beasley,MIA,34,11,8,2,2
1,2015-04-15,Russell Westbrook,OKC,37,8,7,2,0
2,2015-04-15,Anthony Davis,NOP,31,13,2,2,3
3,2015-04-15,Marc Gasol,MEM,33,13,0,2,1
4,2015-04-15,Dion Waiters,OKC,33,4,1,3,1
5,2015-04-15,Kyle Lowry,TOR,26,4,7,3,0
6,2015-04-15,Zach LaVine,MIN,19,5,13,3,1
7,2015-04-15,Nikola Vučević,ORL,26,11,5,1,1
8,2015-04-15,Reggie Jackson,DET,24,4,11,3,0
9,2015-04-15,Cole Aldrich,NYK,24,15,2,0,2
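
The later cells rely on a handful of columns (`Season`, `MATCHUP`, `GAME_ID`, and so on) that the load step never checks. A minimal sketch of an up-front check, assuming the column names used elsewhere in this notebook; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration:

```python
import pandas as pd

# Columns referenced by the aggregation, recent-form, and split cells (assumed list)
REQUIRED_COLUMNS = [
    "PLAYER_ID", "PLAYER_NAME", "Season", "GAME_DATE", "GAME_ID",
    "MATCHUP", "TEAM_ABBREVIATION", "PTS", "REB", "AST",
]

def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Toy frame that lacks several of the required columns
toy = pd.DataFrame(columns=["PLAYER_ID", "PLAYER_NAME", "GAME_DATE", "PTS"])
print(missing_columns(toy))
```

Running the same check on `df` right after `pd.read_csv` would surface a schema problem before any aggregation starts.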


---

## 1️⃣ Calculate Player Season Averages

In [13]:
print("="*60)
print("1️⃣ CALCULATING PLAYER SEASON AVERAGES")
print("="*60)
print()

# Sort by player and date
df_sorted = df.sort_values(['PLAYER_ID', 'GAME_DATE']).copy()

# Calculate season averages with variance metrics
print("🔄 Calculating statistics...")

season_stats = df_sorted.groupby(['PLAYER_ID', 'PLAYER_NAME', 'Season']).agg({
    # Points
    'PTS': ['mean', 'std', 'median', 'min', 'max', lambda x: x.quantile(0.25), lambda x: x.quantile(0.75)],
    
    # Rebounds
    'REB': ['mean', 'std', 'median', 'min', 'max', lambda x: x.quantile(0.25), lambda x: x.quantile(0.75)],
    
    # Assists
    'AST': ['mean', 'std', 'median', 'min', 'max', lambda x: x.quantile(0.25), lambda x: x.quantile(0.75)],
    
    # Other stats
    'STL': ['mean', 'std'],
    'BLK': ['mean', 'std'],
    'FG3M': ['mean', 'std'],
    'MIN': ['mean', 'std'],
    'TOV': ['mean', 'std'],
    
    # Percentages
    'FG_PCT': 'mean',
    'FG3_PCT': 'mean',
    'FT_PCT': 'mean',
    
    # Games played
    'GAME_ID': 'count',
    
    # Team
    'TEAM_ABBREVIATION': 'last'
}).reset_index()

# Flatten column names
season_stats.columns = ['_'.join(col).strip('_') if col[1] else col[0] 
                        for col in season_stats.columns.values]

# Rename for clarity
season_stats = season_stats.rename(columns={
    'GAME_ID_count': 'GP',
    'TEAM_ABBREVIATION_last': 'TEAM',
    
    # Points
    'PTS_mean': 'PPG',
    'PTS_std': 'PPG_STD',
    'PTS_median': 'PPG_MEDIAN',
    'PTS_min': 'PPG_MIN',
    'PTS_max': 'PPG_MAX',
    'PTS_<lambda_0>': 'PPG_Q25',
    'PTS_<lambda_1>': 'PPG_Q75',
    
    # Rebounds
    'REB_mean': 'RPG',
    'REB_std': 'RPG_STD',
    'REB_median': 'RPG_MEDIAN',
    'REB_min': 'RPG_MIN',
    'REB_max': 'RPG_MAX',
    'REB_<lambda_0>': 'RPG_Q25',
    'REB_<lambda_1>': 'RPG_Q75',
    
    # Assists
    'AST_mean': 'APG',
    'AST_std': 'APG_STD',
    'AST_median': 'APG_MEDIAN',
    'AST_min': 'APG_MIN',
    'AST_max': 'APG_MAX',
    'AST_<lambda_0>': 'APG_Q25',
    'AST_<lambda_1>': 'APG_Q75',
    
    # Others
    'STL_mean': 'SPG',
    'STL_std': 'SPG_STD',
    'BLK_mean': 'BPG',
    'BLK_std': 'BPG_STD',
    'FG3M_mean': '3PM',
    'FG3M_std': '3PM_STD',
    'MIN_mean': 'MPG',
    'MIN_std': 'MPG_STD',
    'TOV_mean': 'TPG',
    'TOV_std': 'TPG_STD',
})

# Calculate consistency scores (lower variance = more consistent)
season_stats['PTS_CONSISTENCY'] = 1 / (1 + season_stats['PPG_STD'])
season_stats['REB_CONSISTENCY'] = 1 / (1 + season_stats['RPG_STD'])
season_stats['AST_CONSISTENCY'] = 1 / (1 + season_stats['APG_STD'])

# Calculate coefficient of variation (relative variance)
season_stats['PTS_CV'] = season_stats['PPG_STD'] / season_stats['PPG']
season_stats['REB_CV'] = season_stats['RPG_STD'] / season_stats['RPG']
season_stats['AST_CV'] = season_stats['APG_STD'] / season_stats['APG']

print(f"✅ Calculated season averages: {len(season_stats):,} player-seasons")
print(f"   Players: {season_stats['PLAYER_NAME'].nunique():,}")
print(f"   Seasons: {season_stats['Season'].nunique()}")

# Save
output_dir = project_root / "data" / "props"
output_dir.mkdir(parents=True, exist_ok=True)

averages_path = output_dir / "Player_Season_Averages.csv"
season_stats.to_csv(averages_path, index=False)
print(f"\n💾 Saved: {averages_path}")

# Show top scorers
print(f"\n🏀 Top 10 Scorers (2023-24):")
top_scorers = season_stats[season_stats['Season'] == '2023-24'].nlargest(10, 'PPG')
display(top_scorers[['PLAYER_NAME', 'TEAM', 'GP', 'PPG', 'PPG_STD', 'PTS_CONSISTENCY', 'RPG', 'APG']])

1️⃣ CALCULATING PLAYER SEASON AVERAGES

🔄 Calculating statistics...
✅ Calculated season averages: 5,309 player-seasons
   Players: 1,425
   Seasons: 10

💾 Saved: C:\Users\wdors\qepc_project\data\props\Player_Season_Averages.csv

🏀 Top 10 Scorers (2023-24):


Unnamed: 0,PLAYER_NAME,TEAM,GP,PPG,PPG_STD,PTS_CONSISTENCY,RPG,APG
2461,Joel Embiid,PHI,39,34.692308,9.451132,0.095683,11.025641,5.615385
4054,Luka Dončić,DAL,70,33.857143,8.810872,0.101928,9.242857,9.8
2085,Giannis Antetokounmpo,MIL,73,30.438356,9.119555,0.098819,11.520548,6.520548
3871,Shai Gilgeous-Alexander,OKC,75,30.053333,6.972714,0.125428,5.533333,6.2
3830,Jalen Brunson,NYK,77,28.727273,10.118556,0.08994,3.61039,6.74026
490,Kevin Durant,PHX,75,27.093333,7.434713,0.118558,6.6,5.04
2752,Devin Booker,PHX,68,27.073529,10.223351,0.0891,4.529412,6.941176
3415,Jayson Tatum,BOS,74,26.851351,7.043276,0.124327,8.121622,4.918919
3455,Donovan Mitchell,CLE,55,26.6,8.716948,0.102913,5.090909,6.054545
3408,De'Aaron Fox,SAC,74,26.567568,8.480568,0.105479,4.594595,5.648649
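
The `<lambda_0>` / `<lambda_1>` column names in the rename step depend on the order of the lambdas in the `agg` call. A minimal sketch of the same statistics using pandas named aggregation, which names the output columns directly; the toy frame and the helpers `q25` / `q75` are assumptions for illustration, not part of the notebook:

```python
import pandas as pd

# Toy game log: two hypothetical players, three games each
toy = pd.DataFrame({
    "PLAYER_NAME": ["A", "A", "A", "B", "B", "B"],
    "Season": ["2023-24"] * 6,
    "PTS": [20, 30, 40, 10, 10, 10],
})

def q25(x):
    """25th percentile of one player's values."""
    return x.quantile(0.25)

def q75(x):
    """75th percentile of one player's values."""
    return x.quantile(0.75)

toy_stats = (
    toy.groupby(["PLAYER_NAME", "Season"])
       .agg(
           GP=("PTS", "count"),
           PPG=("PTS", "mean"),
           PPG_STD=("PTS", "std"),
           PPG_Q25=("PTS", q25),
           PPG_Q75=("PTS", q75),
       )
       .reset_index()
)

# Same derived metrics as the notebook cell
toy_stats["PTS_CONSISTENCY"] = 1 / (1 + toy_stats["PPG_STD"])
toy_stats["PTS_CV"] = toy_stats["PPG_STD"] / toy_stats["PPG"]

print(toy_stats)
```

`PTS_CONSISTENCY` depends on the scale of the stat: with `PPG_STD` = 9.45 (Joel Embiid's row above) it is 1 / 10.45 ≈ 0.096, so high-volume scorers tend to score low on it. `PTS_CV` divides by the mean and is easier to compare across players.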


---

## 2️⃣ Calculate Recent Form (Hot/Cold Streaks)

In [14]:
print("="*60)
print("2️⃣ CALCULATING RECENT FORM")
print("="*60)
print()

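def calculate_recent_form(df, window=5):
    """Compare each player's last `window` games to their overall average.

    Note: the baseline is the mean over ALL of the player's games in `df`
    (every loaded season), not only the current season, and the "last N"
    games are the player's final N games in the data, whenever they occurred.
    """
    
    recent_form = []
    
    for player_id in df['PLAYER_ID'].unique():
        player_games = df[df['PLAYER_ID'] == player_id].sort_values('GAME_DATE')
        
        if len(player_games) >= window:
            # Get last N games
            last_n = player_games.tail(window)
            
            # Baseline for comparison: mean over all of this player's games in df
            # (stored below as 'Season_PPG', although it spans every loaded season)
            season_avg = player_games['PTS'].mean()
            last_n_avg = last_n['PTS'].mean()
            
            # HOT/COLD when the last-N average is more than 2 points from the baseline
            diff = last_n_avg - season_avg
            status = 'HOT' if diff > 2 else ('COLD' if diff < -2 else 'NORMAL')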
            
            recent_form.append({
                'PLAYER_ID': player_id,
                'PLAYER_NAME': last_n['PLAYER_NAME'].iloc[0],
                'TEAM': last_n['TEAM_ABBREVIATION'].iloc[-1],
                'Season': last_n['Season'].iloc[-1],
                'Last_Game_Date': last_n['GAME_DATE'].max(),
                
                # Last N games
                f'Last_{window}_PPG': last_n['PTS'].mean(),
                f'Last_{window}_RPG': last_n['REB'].mean(),
                f'Last_{window}_APG': last_n['AST'].mean(),
                f'Last_{window}_3PM': last_n['FG3M'].mean(),
                f'Last_{window}_MPG': last_n['MIN'].mean(),
                
                # Season averages for comparison
                'Season_PPG': season_avg,
                'Season_RPG': player_games['REB'].mean(),
                'Season_APG': player_games['AST'].mean(),
                
                # Hot/Cold indicator
                'PTS_Diff': diff,
                'Status': status,
                
                # Total games
                'Total_GP': len(player_games)
            })
    
    return pd.DataFrame(recent_form)

# Calculate for different windows
print("🔄 Calculating last 5 games...")
form_5 = calculate_recent_form(df_sorted, window=5)
print(f"   ✅ {len(form_5):,} players with 5+ games")

print("🔄 Calculating last 10 games...")
form_10 = calculate_recent_form(df_sorted, window=10)
print(f"   ✅ {len(form_10):,} players with 10+ games")

print("🔄 Calculating last 15 games...")
form_15 = calculate_recent_form(df_sorted, window=15)
print(f"   ✅ {len(form_15):,} players with 15+ games")

# Save all versions
form_5.to_csv(output_dir / "Player_Recent_Form_L5.csv", index=False)
form_10.to_csv(output_dir / "Player_Recent_Form_L10.csv", index=False)
form_15.to_csv(output_dir / "Player_Recent_Form_L15.csv", index=False)

print(f"\n💾 Saved recent form files")

# Show hot players (last 5 games)
print(f"\n🔥 HOTTEST Players (Last 5 Games):")
hot_players = form_5[form_5['Status'] == 'HOT'].nlargest(10, 'PTS_Diff')
if len(hot_players) > 0:
    display(hot_players[['PLAYER_NAME', 'TEAM', 'Last_5_PPG', 'Season_PPG', 'PTS_Diff', 'Status']])
else:
    print("   No hot players found")

# Show cold players
print(f"\n🧊 COLDEST Players (Last 5 Games):")
cold_players = form_5[form_5['Status'] == 'COLD'].nsmallest(10, 'PTS_Diff')
if len(cold_players) > 0:
    display(cold_players[['PLAYER_NAME', 'TEAM', 'Last_5_PPG', 'Season_PPG', 'PTS_Diff', 'Status']])
else:
    print("   No cold players found")

2️⃣ CALCULATING RECENT FORM

🔄 Calculating last 5 games...
   ✅ 1,317 players with 5+ games
🔄 Calculating last 10 games...
   ✅ 1,233 players with 10+ games
🔄 Calculating last 15 games...
   ✅ 1,170 players with 15+ games

💾 Saved recent form files

🔥 HOTTEST Players (Last 5 Games):


Unnamed: 0,PLAYER_NAME,TEAM,Last_5_PPG,Season_PPG,PTS_Diff,Status
798,Jalen Brunson,NYK,39.4,16.898104,22.501896,HOT
1154,Dalano Banton,POR,21.2,6.067114,15.132886,HOT
1237,Jake LaRavia,MEM,22.0,6.914286,15.085714,HOT
442,Ian Clark,NOP,20.6,5.771987,14.828013,HOT
1034,Payton Pritchard,BOS,22.2,7.524345,14.675655,HOT
16,Jamal Crawford,BKN,26.0,12.089189,13.910811,HOT
1119,Cam Thomas,BKN,27.0,13.978947,13.021053,HOT
833,Khyri Thomas,HOU,16.4,4.102564,12.297436,HOT
1011,Tyrese Maxey,PHI,30.0,18.169173,11.830827,HOT
941,Jaylen Hoard,OKC,18.4,6.589744,11.810256,HOT



🧊 COLDEST Players (Last 5 Games):


Unnamed: 0,PLAYER_NAME,TEAM,Last_5_PPG,Season_PPG,PTS_Diff,Status
290,Kemba Walker,DAL,2.4,20.6862,-18.2862,COLD
321,Isaiah Thomas,PHX,1.6,18.85,-17.25,COLD
201,Blake Griffin,BOS,2.8,17.352298,-14.552298,COLD
202,James Harden,LAC,13.0,27.412268,-14.412268,COLD
119,JJ Redick,DAL,2.2,15.571726,-13.371726,COLD
246,DeMarcus Cousins,DEN,7.8,20.950413,-13.150413,COLD
155,Marc Gasol,LAL,2.0,14.597802,-12.597802,COLD
116,LaMarcus Aldridge,BKN,7.0,19.352705,-12.352705,COLD
841,Trae Young,ATL,13.6,25.501229,-11.901229,COLD
105,Lou Williams,ATL,4.6,16.272408,-11.672408,COLD
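
In `calculate_recent_form`, the baseline is the mean over every game a player has in the loaded data. This is why `Season_PPG` for Jalen Brunson reads 16.9 here but 28.7 in the 2023-24 averages, and why long-retired players appear in the COLD list. One possible variant, sketched on toy data, restricts both the window and the baseline to each player's most recent season. The function name `recent_form_latest_season`, its `threshold` parameter, and the toy frame are hypothetical:

```python
import pandas as pd

def recent_form_latest_season(games: pd.DataFrame, window: int = 5,
                              threshold: float = 2.0) -> pd.DataFrame:
    """Compare each player's last `window` games to their latest-season mean."""
    games = games.sort_values(["PLAYER_ID", "GAME_DATE"])
    # Keep only rows from each player's most recent season
    latest = games.groupby("PLAYER_ID")["Season"].transform("last")
    current = games[games["Season"] == latest]

    rows = []
    for player_id, g in current.groupby("PLAYER_ID"):
        if len(g) < window:
            continue  # not enough games in the latest season
        last_n = g.tail(window)
        diff = last_n["PTS"].mean() - g["PTS"].mean()
        status = "HOT" if diff > threshold else ("COLD" if diff < -threshold else "NORMAL")
        rows.append({
            "PLAYER_ID": player_id,
            "Season": g["Season"].iloc[-1],
            f"Last_{window}_PPG": last_n["PTS"].mean(),
            "Season_PPG": g["PTS"].mean(),
            "PTS_Diff": diff,
            "Status": status,
        })
    return pd.DataFrame(rows)

# Toy log: one player, three old-season games and six latest-season games
toy = pd.DataFrame({
    "PLAYER_ID": [1] * 9,
    "Season": ["2022-23"] * 3 + ["2023-24"] * 6,
    "GAME_DATE": pd.date_range("2023-01-01", periods=9, freq="30D"),
    "PTS": [5, 5, 5, 10, 10, 10, 10, 30, 30],
})
form = recent_form_latest_season(toy, window=2)
print(form)
```

With this baseline the old-season games no longer drag the average down, so `PTS_Diff` measures a streak within the current season only.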


---

## 3️⃣ Calculate Home/Away Splits

In [15]:
print("="*60)
print("3️⃣ CALCULATING HOME/AWAY SPLITS")
print("="*60)
print()

# Determine home/away from MATCHUP column
# 'vs.' means home, '@' means away
# regex=False matches the literal string 'vs.' (by default '.' is a regex wildcard)
df_sorted['IS_HOME'] = df_sorted['MATCHUP'].str.contains('vs.', regex=False, na=False)

print("🔄 Calculating splits...")

# Calculate home stats
home_stats = df_sorted[df_sorted['IS_HOME']].groupby(['PLAYER_ID', 'PLAYER_NAME', 'Season']).agg({
    'PTS': ['mean', 'std', 'count'],
    'REB': 'mean',
    'AST': 'mean',
    'FG3M': 'mean',
    'FG_PCT': 'mean'
}).reset_index()

home_stats.columns = ['_'.join(col).strip('_') if col[1] else col[0] 
                      for col in home_stats.columns.values]

home_stats = home_stats.rename(columns={
    'PTS_mean': 'Home_PPG',
    'PTS_std': 'Home_PPG_STD',
    'PTS_count': 'Home_GP',
    'REB_mean': 'Home_RPG',
    'AST_mean': 'Home_APG',
    'FG3M_mean': 'Home_3PM',
    'FG_PCT_mean': 'Home_FG_PCT'
})

# Calculate away stats
away_stats = df_sorted[~df_sorted['IS_HOME']].groupby(['PLAYER_ID', 'PLAYER_NAME', 'Season']).agg({
    'PTS': ['mean', 'std', 'count'],
    'REB': 'mean',
    'AST': 'mean',
    'FG3M': 'mean',
    'FG_PCT': 'mean'
}).reset_index()

away_stats.columns = ['_'.join(col).strip('_') if col[1] else col[0] 
                      for col in away_stats.columns.values]

away_stats = away_stats.rename(columns={
    'PTS_mean': 'Away_PPG',
    'PTS_std': 'Away_PPG_STD',
    'PTS_count': 'Away_GP',
    'REB_mean': 'Away_RPG',
    'AST_mean': 'Away_APG',
    'FG3M_mean': 'Away_3PM',
    'FG_PCT_mean': 'Away_FG_PCT'
})

# Merge home and away
splits = home_stats.merge(
    away_stats,
    on=['PLAYER_ID', 'PLAYER_NAME', 'Season'],
    how='outer'
)

# Calculate differences
splits['PPG_Diff_Home_vs_Away'] = splits['Home_PPG'] - splits['Away_PPG']
splits['RPG_Diff_Home_vs_Away'] = splits['Home_RPG'] - splits['Away_RPG']
splits['APG_Diff_Home_vs_Away'] = splits['Home_APG'] - splits['Away_APG']

print(f"✅ Calculated splits for {len(splits):,} player-seasons")

# Save
splits_path = output_dir / "Player_Home_Away_Splits.csv"
splits.to_csv(splits_path, index=False)
print(f"💾 Saved: {splits_path}")

# Show players with biggest home advantage
print(f"\n🏠 Biggest HOME Advantage (2023-24):")
home_advantage = splits[splits['Season'] == '2023-24'].nlargest(10, 'PPG_Diff_Home_vs_Away')
if len(home_advantage) > 0:
    display(home_advantage[['PLAYER_NAME', 'Home_PPG', 'Away_PPG', 'PPG_Diff_Home_vs_Away', 'Home_GP', 'Away_GP']])

# Show players who perform better on road
print(f"\n✈️  Best ROAD Performers (2023-24):")
road_warriors = splits[splits['Season'] == '2023-24'].nsmallest(10, 'PPG_Diff_Home_vs_Away')
if len(road_warriors) > 0:
    display(road_warriors[['PLAYER_NAME', 'Away_PPG', 'Home_PPG', 'PPG_Diff_Home_vs_Away', 'Away_GP', 'Home_GP']])

3️⃣ CALCULATING HOME/AWAY SPLITS

🔄 Calculating splits...
✅ Calculated splits for 5,309 player-seasons
💾 Saved: C:\Users\wdors\qepc_project\data\props\Player_Home_Away_Splits.csv

🏠 Biggest HOME Advantage (2023-24):


Unnamed: 0,PLAYER_NAME,Home_PPG,Away_PPG,PPG_Diff_Home_vs_Away,Home_GP,Away_GP
5306,Maozinha Pereira,11.333333,3.5,7.833333,3.0,4.0
4903,Bones Hyland,10.6,4.409091,6.190909,15.0,22.0
3455,Donovan Mitchell,29.653846,23.862069,5.791777,26.0,29.0
4848,Zavier Simpson,7.6,2.0,5.6,5.0,2.0
2321,Jerami Grant,23.571429,18.153846,5.417582,28.0,26.0
2246,Jordan Clarkson,19.482759,14.5,4.982759,29.0,26.0
4589,LaMelo Ball,26.272727,21.545455,4.727273,11.0,11.0
5114,Keegan Murray,17.5,13.0,4.5,38.0,39.0
4030,P.J. Washington,15.114286,10.789474,4.324812,35.0,38.0
5110,Bennedict Mathurin,16.483871,12.321429,4.162442,31.0,28.0



✈️  Best ROAD Performers (2023-24):


Unnamed: 0,PLAYER_NAME,Away_PPG,Home_PPG,PPG_Diff_Home_vs_Away,Away_GP,Home_GP
5304,Dexter Dennis,18.0,1.333333,-16.666667,1.0,3.0
5287,Adama Sanogo,6.8,0.5,-6.3,5.0,4.0
4436,Jaylen Nowell,8.625,2.4,-6.225,8.0,5.0
4307,Brandon Clarke,14.0,8.666667,-5.333333,3.0,3.0
2085,Giannis Antetokounmpo,32.885714,28.184211,-4.701504,35.0,38.0
5134,Mark Williams,15.25,10.909091,-4.340909,8.0,11.0
4882,Trey Murphy III,16.8,12.518519,-4.281481,30.0,27.0
4359,Charles Bassey,5.555556,1.3,-4.255556,9.0,10.0
4288,Ja Morant,27.0,22.75,-4.25,5.0,4.0
3781,Udoka Azubuike,4.833333,0.6,-4.233333,6.0,10.0
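
Several rows in these tables rest on very few games (Dexter Dennis has a single away game), so the largest gaps are likely small-sample noise rather than a real home or road effect. A rough sketch of a minimum-games filter on toy data; the helper `filter_splits` and the `min_gp` threshold of 10 are arbitrary assumptions, not values from the notebook:

```python
import pandas as pd

def filter_splits(splits: pd.DataFrame, min_gp: int = 10) -> pd.DataFrame:
    """Keep player-seasons with at least `min_gp` games both home and away."""
    # NaN game counts (player never appeared home or away) compare as False
    enough = (splits["Home_GP"] >= min_gp) & (splits["Away_GP"] >= min_gp)
    return splits[enough].copy()

# Toy splits table with one small-sample row and one full-season row
toy_splits = pd.DataFrame({
    "PLAYER_NAME": ["Small Sample", "Full Season"],
    "Home_PPG": [1.3, 29.7],
    "Away_PPG": [18.0, 23.9],
    "Home_GP": [3.0, 26.0],
    "Away_GP": [1.0, 29.0],
})
toy_splits["PPG_Diff_Home_vs_Away"] = toy_splits["Home_PPG"] - toy_splits["Away_PPG"]

reliable = filter_splits(toy_splits)
print(reliable)
```

Applying such a filter before `nlargest` / `nsmallest` would leave rows like Donovan Mitchell (26 home, 29 away games) and drop the one- to five-game samples.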


---

## 4️⃣ Build Confidence Intervals

In [16]:
print("="*60)
print("4️⃣ BUILDING CONFIDENCE INTERVALS")
print("="*60)
print()

# For each player-season, calculate a confidence interval for the season average
# Using normal distribution approximation and the standard error STD / sqrt(GP)

from scipy import stats

def calculate_confidence_intervals(row, confidence=0.90):
    """Calculate confidence intervals for the per-game means (PPG, RPG, APG)"""
    
    z_score = stats.norm.ppf((1 + confidence) / 2)
    
    # Points
    pts_margin = z_score * (row['PPG_STD'] / np.sqrt(row['GP']))
    row['PPG_CI_LOWER'] = row['PPG'] - pts_margin
    row['PPG_CI_UPPER'] = row['PPG'] + pts_margin
    
    # Rebounds
    reb_margin = z_score * (row['RPG_STD'] / np.sqrt(row['GP']))
    row['RPG_CI_LOWER'] = row['RPG'] - reb_margin
    row['RPG_CI_UPPER'] = row['RPG'] + reb_margin
    
    # Assists
    ast_margin = z_score * (row['APG_STD'] / np.sqrt(row['GP']))
    row['APG_CI_LOWER'] = row['APG'] - ast_margin
    row['APG_CI_UPPER'] = row['APG'] + ast_margin
    
    return row

print(f"🔄 Calculating 90% confidence intervals...")

# Apply to season stats
season_stats_ci = season_stats.copy()
season_stats_ci = season_stats_ci.apply(calculate_confidence_intervals, axis=1)

print(f"✅ Calculated confidence intervals")

# Save
ci_path = output_dir / "Player_Averages_With_CI.csv"
season_stats_ci.to_csv(ci_path, index=False)
print(f"💾 Saved: {ci_path}")

# Show examples
print(f"\n📊 Example Predictions with Confidence (2023-24):")
examples = season_stats_ci[
    (season_stats_ci['Season'] == '2023-24') & 
    (season_stats_ci['GP'] >= 20)
].nlargest(5, 'PPG')

if len(examples) > 0:
    display(examples[[
        'PLAYER_NAME', 'TEAM', 'GP',
        'PPG', 'PPG_CI_LOWER', 'PPG_CI_UPPER', 'PPG_STD',
        'RPG', 'RPG_CI_LOWER', 'RPG_CI_UPPER',
        'APG', 'APG_CI_LOWER', 'APG_CI_UPPER'
    ]])
    
    print(f"\n💡 Interpretation:")
    print(f"   90% confidence interval for the season average, not for a single game")
    print(f"   Tighter range = lower game-to-game variance and/or more games played")
    print(f"   Wider range = more volatile scoring and/or fewer games played")

4️⃣ BUILDING CONFIDENCE INTERVALS

🔄 Calculating 90% confidence intervals...
✅ Calculated confidence intervals
💾 Saved: C:\Users\wdors\qepc_project\data\props\Player_Averages_With_CI.csv

📊 Example Predictions with Confidence (2023-24):


Unnamed: 0,PLAYER_NAME,TEAM,GP,PPG,PPG_CI_LOWER,PPG_CI_UPPER,PPG_STD,RPG,RPG_CI_LOWER,RPG_CI_UPPER,APG,APG_CI_LOWER,APG_CI_UPPER
2461,Joel Embiid,PHI,39,34.692308,32.202999,37.181616,9.451132,11.025641,10.225179,11.826103,5.615385,4.927451,6.303318
4054,Luka Dončić,DAL,70,33.857143,32.124946,35.589339,8.810872,9.242857,8.631669,9.854045,9.8,9.12508,10.47492
2085,Giannis Antetokounmpo,MIL,73,30.438356,28.6827,32.194012,9.119555,11.520548,10.799428,12.241668,6.520548,5.926213,7.114883
3871,Shai Gilgeous-Alexander,OKC,75,30.053333,28.728996,31.37767,6.972714,5.533333,5.089184,5.977483,6.2,5.773011,6.626989
3830,Jalen Brunson,NYK,77,28.727273,26.830565,30.62398,10.118556,3.61039,3.262145,3.958634,6.74026,6.214979,7.26554



💡 Interpretation:
   90% confidence interval for the season average, not for a single game
   Tighter range = lower game-to-game variance and/or more games played
   Wider range = more volatile scoring and/or fewer games played
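
The interval in this cell is built from the standard error `PPG_STD / sqrt(GP)`, so it describes uncertainty in the season average. A range for a single game uses `PPG_STD` directly and is much wider. A small worked comparison using Joel Embiid's 2023-24 row from the table above, under the same normal assumption (the variable names are illustrative, and the standard library's `NormalDist` stands in for `scipy.stats.norm`):

```python
import math
from statistics import NormalDist

# Joel Embiid, 2023-24 (values copied from the table above)
ppg, ppg_std, gp = 34.692308, 9.451132, 39
confidence = 0.90

# Two-sided critical value, about 1.645 for 90%
z = NormalDist().inv_cdf((1 + confidence) / 2)

# Interval for the season MEAN: shrinks as games played grows
mean_margin = z * ppg_std / math.sqrt(gp)
mean_ci = (ppg - mean_margin, ppg + mean_margin)

# Approximate range for a SINGLE game under the same normal assumption
game_margin = z * ppg_std
game_range = (ppg - game_margin, ppg + game_margin)

print(f"90% CI for the mean:   {mean_ci[0]:.2f} to {mean_ci[1]:.2f}")
print(f"90% single-game range: {game_range[0]:.2f} to {game_range[1]:.2f}")
```

The first interval reproduces `PPG_CI_LOWER` / `PPG_CI_UPPER` from the table. The second is the quantity that matters when asking how often a player lands inside a range on a given night.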


---

## 5️⃣ Generate Over/Under Predictions

In [17]:
print("="*60)
print("5️⃣ GENERATING OVER/UNDER PREDICTIONS")
print("="*60)
print()

def predict_over_under(player_avg, line, stat='PTS'):
    """
    Predict probability of going over a betting line
    
    Uses normal distribution based on player's mean and std
    """
    mean = player_avg[f'{stat}_mean'] if f'{stat}_mean' in player_avg else player_avg[f'{stat[0]}PG']
    std = player_avg[f'{stat}_std'] if f'{stat}_std' in player_avg else player_avg.get(f'{stat[0]}PG_STD', 5)
    
    # Calculate z-score
    z = (line - mean) / std
    
    # Probability of UNDER
    prob_under = stats.norm.cdf(z)
    
    # Probability of OVER
    prob_over = 1 - prob_under
    
    return prob_over, prob_under

# Example: Predict for a specific player
print("🔍 Example Prediction:\n")

# Get a star player from 2023-24
recent_stars = season_stats_ci[
    (season_stats_ci['Season'] == '2023-24') & 
    (season_stats_ci['PPG'] >= 25) &
    (season_stats_ci['GP'] >= 30)
].head(3)

if len(recent_stars) > 0:
    for idx, player in recent_stars.iterrows():
        print(f"Player: {player['PLAYER_NAME']}")
        print(f"Season Average: {player['PPG']:.1f} PPG (±{player['PPG_STD']:.1f})")
        print()
        
        # Test different lines
        for line in [player['PPG'] - 5, player['PPG'], player['PPG'] + 5]:
            prob_over, prob_under = predict_over_under(player, line, 'PTS')
            
            print(f"   If line is {line:.1f} points:")
            print(f"      OVER: {prob_over*100:.1f}% | UNDER: {prob_under*100:.1f}%")
        
        print()

# Create prediction function
def create_prop_prediction(player_name, season='2023-24', stat='PTS', line=None, is_home=None, recent_form_window=5):
    """
    Comprehensive prop prediction for a player
    
    Adjusts for:
    - Home/away
    - Recent form
    - Historical variance
    """
    
    # Get player season stats
    player_stats = season_stats_ci[
        (season_stats_ci['PLAYER_NAME'] == player_name) &
        (season_stats_ci['Season'] == season)
    ]
    
    if len(player_stats) == 0:
        return None
    
    player = player_stats.iloc[0]
    
    # Base prediction
    base_mean = player[f'{stat[0]}PG']
    base_std = player[f'{stat[0]}PG_STD']
    
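    # Column suffix for the requested stat: 'PTS' -> 'PPG', 'REB' -> 'RPG', 'AST' -> 'APG'
    stat_col = f'{stat[0]}PG'
    
    # Adjust for home/away if specified (uses the split for the requested stat)
    if is_home is not None:
        split_data = splits[
            (splits['PLAYER_NAME'] == player_name) &
            (splits['Season'] == season)
        ]
        
        if len(split_data) > 0:
            split = split_data.iloc[0]
            side = 'Home' if is_home else 'Away'
            split_value = split.get(f'{side}_{stat_col}')
            if not pd.isna(split_value):
                base_mean = split_value
    
    # Adjust for recent form, honoring recent_form_window (5, 10, or 15; otherwise 5)
    form_tables = {5: form_5, 10: form_10, 15: form_15}
    window = recent_form_window if recent_form_window in form_tables else 5
    form_table = form_tables[window]
    form_data = form_table[form_table['PLAYER_NAME'] == player_name]
    recent_avg = form_data.iloc[0].get(f'Last_{window}_{stat_col}') if len(form_data) > 0 else None
    
    if not pd.isna(recent_avg):
        # Weight recent form 30%, season average 70%
        adjusted_mean = 0.7 * base_mean + 0.3 * recent_avg
    else:
        adjusted_mean = base_mean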
    
    # If line provided, calculate probability
    if line is not None:
        z = (line - adjusted_mean) / base_std
        prob_over = 1 - stats.norm.cdf(z)
        prob_under = stats.norm.cdf(z)
        
        return {
            'player': player_name,
            'stat': stat,
            'line': line,
            'prediction': adjusted_mean,
            'std': base_std,
            'prob_over': prob_over,
            'prob_under': prob_under,
            'recommendation': 'OVER' if prob_over > 0.55 else ('UNDER' if prob_under > 0.55 else 'PASS'),
            'confidence': max(prob_over, prob_under)
        }
    else:
        return {
            'player': player_name,
            'stat': stat,
            'prediction': adjusted_mean,
            'std': base_std,
            'ci_lower': adjusted_mean - 1.645 * base_std,  # approx. 90% single-game range (normal assumption)
            'ci_upper': adjusted_mean + 1.645 * base_std
        }

print("\n✅ Prediction functions created!")
print("\n💡 Use: create_prop_prediction(player_name, season, stat, line, is_home)")

5️⃣ GENERATING OVER/UNDER PREDICTIONS

🔍 Example Prediction:

Player: LeBron James
Season Average: 25.7 PPG (±6.7)

   If line is 20.7 points:
      OVER: 77.4% | UNDER: 22.6%
   If line is 25.7 points:
      OVER: 50.0% | UNDER: 50.0%
   If line is 30.7 points:
      OVER: 22.6% | UNDER: 77.4%

Player: Kevin Durant
Season Average: 27.1 PPG (±7.4)

   If line is 22.1 points:
      OVER: 74.9% | UNDER: 25.1%
   If line is 27.1 points:
      OVER: 50.0% | UNDER: 50.0%
   If line is 32.1 points:
      OVER: 25.1% | UNDER: 74.9%

Player: Stephen Curry
Season Average: 26.4 PPG (±9.6)

   If line is 21.4 points:
      OVER: 69.9% | UNDER: 30.1%
   If line is 26.4 points:
      OVER: 50.0% | UNDER: 50.0%
   If line is 31.4 points:
      OVER: 30.1% | UNDER: 69.9%


✅ Prediction functions created!

💡 Use: create_prop_prediction(player_name, season, stat, line, is_home)
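
The over/under probabilities above assume game-to-game scoring is normally distributed. One way to sanity-check that assumption is to compare it with how often a player actually cleared a line. A minimal sketch on a made-up ten-game log; the function name `over_probabilities` and the numbers are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def over_probabilities(points, line):
    """Return (normal-approximation, empirical) probability of going over `line`."""
    mu, sigma = mean(points), stdev(points)
    # Normal approximation, same idea as predict_over_under
    normal_over = 1 - NormalDist(mu, sigma).cdf(line)
    # Empirical frequency: share of games strictly above the line
    empirical_over = sum(p > line for p in points) / len(points)
    return normal_over, empirical_over

toy_points = [18, 22, 25, 31, 27, 19, 35, 24, 29, 20]  # made-up game log
normal_over, empirical_over = over_probabilities(toy_points, line=24.5)
print(f"Normal: {normal_over:.1%} | Empirical: {empirical_over:.1%}")
```

A large gap between the two numbers suggests the normal model fits that player poorly, for example because scoring is skewed or the sample is small.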


---

## 6️⃣ Example: Make Real Predictions

In [18]:
print("="*60)
print("6️⃣ EXAMPLE PREDICTIONS")
print("="*60)
print()

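# Example predictions for top stars
# NOTE: names must match PLAYER_NAME exactly. 'Luka Doncic' (no diacritics) does not
# match 'Luka Dončić' in the data, so that entry is silently skipped in the output.
star_players = [
    ('Luka Doncic', 28.5, True),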
    ('Joel Embiid', 32.5, True),
    ('Giannis Antetokounmpo', 30.5, False),
    ('Stephen Curry', 26.5, True),
    ('Kevin Durant', 27.5, False)
]

predictions = []

print("🎯 Example Props Predictions (2023-24):\n")

for player_name, line, is_home in star_players:
    pred = create_prop_prediction(
        player_name=player_name,
        season='2023-24',
        stat='PTS',
        line=line,
        is_home=is_home
    )
    
    if pred:
        predictions.append(pred)
        
        location = "HOME" if is_home else "AWAY"
        
        print(f"{player_name} ({location})")
        print(f"   Line: {line} points")
        print(f"   Prediction: {pred['prediction']:.1f} ± {pred['std']:.1f}")
        print(f"   Probability OVER: {pred['prob_over']*100:.1f}%")
        print(f"   Probability UNDER: {pred['prob_under']*100:.1f}%")
        print(f"   ➡️  RECOMMENDATION: {pred['recommendation']} (confidence: {pred['confidence']*100:.0f}%)")
        print()

# Convert to DataFrame
if predictions:
    predictions_df = pd.DataFrame(predictions)
    
    # Save
    example_path = output_dir / "Example_Predictions.csv"
    predictions_df.to_csv(example_path, index=False)
    print(f"💾 Saved example predictions: {example_path}")

6️⃣ EXAMPLE PREDICTIONS

🎯 Example Props Predictions (2023-24):

Joel Embiid (HOME)
   Line: 32.5 points
   Prediction: 34.1 ± 9.5
   Probability OVER: 56.8%
   Probability UNDER: 43.2%
   ➡️  RECOMMENDATION: OVER (confidence: 57%)

Giannis Antetokounmpo (AWAY)
   Line: 30.5 points
   Prediction: 31.1 ± 9.1
   Probability OVER: 52.7%
   Probability UNDER: 47.3%
   ➡️  RECOMMENDATION: PASS (confidence: 53%)

Stephen Curry (HOME)
   Line: 26.5 points
   Prediction: 27.2 ± 9.6
   Probability OVER: 52.8%
   Probability UNDER: 47.2%
   ➡️  RECOMMENDATION: PASS (confidence: 53%)

Kevin Durant (AWAY)
   Line: 27.5 points
   Prediction: 25.0 ± 7.4
   Probability OVER: 36.9%
   Probability UNDER: 63.1%
   ➡️  RECOMMENDATION: UNDER (confidence: 63%)

💾 Saved example predictions: C:\Users\wdors\qepc_project\data\props\Example_Predictions.csv
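
Luka Dončić is missing from this output because the lookup string `'Luka Doncic'` does not equal the accented `PLAYER_NAME` in the data, so `create_prop_prediction` returns `None` for him. One way to make lookups accent-insensitive, sketched with the standard library; `normalize_name` is a hypothetical helper, not part of the notebook:

```python
import unicodedata

def normalize_name(name: str) -> str:
    """Lowercase a name and strip diacritics for accent-insensitive matching."""
    # NFKD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()

print(normalize_name("Luka Dončić") == normalize_name("Luka Doncic"))
```

Comparing `normalize_name(...)` of both the stored name and the requested name, instead of the raw strings, would let either spelling find the same player.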


---

## 📊 Summary & Next Steps

In [19]:
print("="*60)
print("📊 PLAYER PROPS PROCESSING COMPLETE")
print("="*60)

print(f"\n📁 Files Created in: {output_dir}\n")

files_created = [
    ('Player_Season_Averages.csv', 'Season stats with variance'),
    ('Player_Averages_With_CI.csv', 'Averages with 90% confidence intervals'),
    ('Player_Recent_Form_L5.csv', 'Last 5 games hot/cold analysis'),
    ('Player_Recent_Form_L10.csv', 'Last 10 games analysis'),
    ('Player_Recent_Form_L15.csv', 'Last 15 games analysis'),
    ('Player_Home_Away_Splits.csv', 'Home vs away performance'),
    ('Example_Predictions.csv', 'Sample prop predictions')
]

for filename, description in files_created:
    filepath = output_dir / filename
    if filepath.exists():
        size = filepath.stat().st_size / 1024
        print(f"   ✅ {filename}")
        print(f"      {description} ({size:.0f} KB)")
    else:
        print(f"   ❌ {filename} - Not created")

print(f"\n🎯 What You Can Now Do:\n")
print("1. ✅ Predict player props (points, rebounds, assists)")
print("2. ✅ Calculate over/under probabilities")
print("3. ✅ Identify hot/cold players")
print("4. ✅ Adjust for home/away")
print("5. ✅ Build confidence intervals")
print("6. ✅ Find betting value")

print(f"\n🚀 Next Steps:\n")
print("1. Load Player_Averages_With_CI.csv for predictions")
print("2. Use create_prop_prediction() function for any player")
print("3. Compare your predictions to actual betting lines")
print("4. Look for value where your probability differs from odds")
print("5. Track performance and adjust model")

print(f"\n💡 Integration with QEPC:\n")
print("• Use player stats to improve team predictions")
print("• Add player props to your betting strategy")
print("• Combine team + player models for best results")

print(f"\n🎉 You now have a complete player props system!")

📊 PLAYER PROPS PROCESSING COMPLETE

📁 Files Created in: C:\Users\wdors\qepc_project\data\props

   ✅ Player_Season_Averages.csv
      Season stats with variance (2678 KB)
   ✅ Player_Averages_With_CI.csv
      Averages with 90% confidence intervals (3241 KB)
   ✅ Player_Recent_Form_L5.csv
      Last 5 games hot/cold analysis (200 KB)
   ✅ Player_Recent_Form_L10.csv
      Last 10 games analysis (190 KB)
   ✅ Player_Recent_Form_L15.csv
      Last 15 games analysis (226 KB)
   ✅ Player_Home_Away_Splits.csv
      Home vs away performance (1384 KB)
   ✅ Example_Predictions.csv
      Sample prop predictions (1 KB)

🎯 What You Can Now Do:

1. ✅ Predict player props (points, rebounds, assists)
2. ✅ Calculate over/under probabilities
3. ✅ Identify hot/cold players
4. ✅ Adjust for home/away
5. ✅ Build confidence intervals
6. ✅ Find betting value

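🚀 Next Steps:

1. Load Player_Averages_With_CI.csv for predictions
2. Use create_prop_prediction() function for any player
3. Compare your predictions to actual betting lines
4. Look for value where your probability differs from odds
5. Track performance and adjust model

💡 Integration with QEPC:

• Use player stats to improve team predictions
• Add player props to your betting strategy
• Combine team + player models for best results

🎉 You now have a complete player props system!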

---

## 🔮 How To Use This System

### **Quick Prediction:**
```python
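# Predict a specific prop
# (player_name must match PLAYER_NAME in the data exactly, including diacritics)
pred = create_prop_prediction(
    player_name='Luka Dončić',
    season='2023-24',
    stat='PTS',
    line=28.5,
    is_home=True
)

# create_prop_prediction returns None when no matching player-season is found
if pred is not None:
    print(f"Prediction: {pred['prediction']:.1f}")
    print(f"Recommendation: {pred['recommendation']}")
    print(f"Confidence: {pred['confidence']*100:.0f}%")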
```

### **Find Value Bets:**
```python
# Load averages
avgs = pd.read_csv('data/props/Player_Averages_With_CI.csv')

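# Find consistent high scorers
# PTS_CONSISTENCY = 1 / (1 + PPG_STD), so > 0.15 means PPG_STD below about 5.7.
# None of the top-10 scorers shown earlier clears 0.15, so expect few matches.
consistent = avgs[
    (avgs['Season'] == '2023-24') &
    (avgs['PPG'] >= 20) &
    (avgs['PTS_CONSISTENCY'] > 0.15)  # Low variance
]

# Low variance makes the model's probabilities more reliable for these players,
# for OVER and UNDER alike; it does not favor the OVER by itself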
```

### **Hot Streak Detection:**
```python
# Load recent form
form = pd.read_csv('data/props/Player_Recent_Form_L5.csv')

# Find hot players
hot = form[
    (form['Status'] == 'HOT') &
    (form['PTS_Diff'] > 3)  # 3+ points above average
]

# Candidates for OVER bets; note PTS_Diff is measured against the all-seasons average
```

### **Home Court Advantage:**
```python
# Load splits
splits = pd.read_csv('data/props/Player_Home_Away_Splits.csv')

# Find players who perform much better at home
home_boost = splits[
    (splits['Season'] == '2023-24') &
    (splits['PPG_Diff_Home_vs_Away'] > 2)
]

# Target these for OVER when playing at home
```
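
Identifying betting value (step 6 in the overview) comes down to comparing the model's probability with the probability implied by the sportsbook's price. A minimal sketch assuming American odds; the function names and the -110 price are illustrative assumptions, while the 63.1% figure is the Kevin Durant UNDER probability from the example output:

```python
def implied_probability(american_odds: int) -> float:
    """Convert American odds to the bookmaker's implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def edge(model_prob: float, american_odds: int) -> float:
    """Model probability minus implied probability; positive suggests possible value."""
    return model_prob - implied_probability(american_odds)

# Kevin Durant UNDER 27.5 (model: 63.1%) at a hypothetical -110 price
print(f"Implied: {implied_probability(-110):.3f}")
print(f"Edge:    {edge(0.631, -110):+.3f}")
```

At -110 the break-even probability is 110 / 210, about 52.4%, which sits close to the 55% cutoff that `create_prop_prediction` uses for its OVER/UNDER recommendation. Other prices would call for a different cutoff.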

---

**You now have a working baseline for player props predictions!** 🎯