# 📊 Injury Impact Analysis

Analyze the impact of player injuries on team performance.

**Features:**
- Calculate player impact factors
- Compare manual vs data-driven impacts
- Analyze individual players
- Generate impact reference data

---

## 🧩 Setup

In [2]:
import os
import sys
from pathlib import Path

print("🔍 Locating QEPC project root...\n")

# Try direct import first
try:
    from notebook_context import *
    print("✅ Imported notebook_context directly")

except ModuleNotFoundError:
    print("ℹ️  notebook_context not on path, searching...")

    # Search current directory and parents
    cwd = Path.cwd()
    candidates = [cwd, cwd.parent, cwd.parent.parent]

    found_root = None
    for root in candidates:
        if (root / "notebook_context.py").exists():
            found_root = root
            print(f"   Found at: {root}")
            break

    if found_root is None:
        raise FileNotFoundError(
            "❌ Could not find notebook_context.py\n"
            f"   Searched: {cwd} and its two parent directories\n"
            "   Ensure you're in the qepc_project folder"
        )

    # Add to path and re-import
    sys.path.insert(0, str(found_root))
    os.chdir(found_root)

    from notebook_context import *
    print("✅ Imported after path adjustment")

# Verify project_root is defined
try:
    project_root
except NameError:
    project_root = Path.cwd()
    print("⚠️  project_root not defined, using CWD")

print(f"\n📁 Project Root: {project_root}")
print(f"📂 Working Dir:  {os.getcwd()}")
print("\n" + "="*60)

🔍 Locating QEPC project root...

ℹ️  notebook_context not on path, searching...
   Found at: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
[QEPC Paths] Project Root set: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
[QEPC] Autoload complete.
✅ Imported after path adjustment
⚠️  project_root not defined, using CWD

📁 Project Root: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
📂 Working Dir:  /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
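The setup cell only checks the working directory and its two parents. A more general search, sketched here under the assumption that `notebook_context.py` still marks the project root, walks every ancestor via `Path.parents`. `find_project_root` is a hypothetical helper, not part of QEPC.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "notebook_context.py") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: generalizes the fixed three-level search in the setup cell.
    """
    start = Path(start).resolve()
    # (start, *start.parents) visits start itself, then every ancestor up to the root.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"Could not find {marker} in {start} or any parent directory")


# Demo on a throwaway directory tree (toy layout, not the real project).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "qepc_project"
    nested = root / "notebooks" / "analysis" / "deep"
    nested.mkdir(parents=True)
    (root / "notebook_context.py").touch()
    print(find_project_root(nested) == root.resolve())  # True
```

In the notebook, `found_root = find_project_root(Path.cwd())` could stand in for the `candidates` loop, at any nesting depth.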



In [3]:
import pandas as pd
import numpy as np
from datetime import datetime

print("✅ Imports loaded")
print(f"📁 Project root: {project_root}")

✅ Imports loaded
📁 Project root: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project


---

## 📥 Load Player Statistics

In [4]:
# Load player statistics
player_stats_path = project_root / "data" / "raw" / "PlayerStatistics.csv"

if not player_stats_path.exists():
    print("❌ PlayerStatistics.csv not found!")
    print(f"   Expected at: {player_stats_path}")
else:
    # low_memory=False parses the file in one pass, so each column gets a single
    # inferred dtype (avoids pandas' mixed-type DtypeWarning on large files).
    player_stats = pd.read_csv(player_stats_path, low_memory=False)
    print(f"✅ Loaded {len(player_stats)} player records")
    print("\n📊 Columns available:")
    print(player_stats.columns.tolist())

    print("\n🔍 Sample data:")
    display(player_stats.head())


✅ Loaded 1635462 player records

📊 Columns available:
['firstName', 'lastName', 'personId', 'gameId', 'gameDate', 'playerteamCity', 'playerteamName', 'opponentteamCity', 'opponentteamName', 'gameType', 'gameLabel', 'gameSubLabel', 'seriesGameNumber', 'win', 'home', 'numMinutes', 'points', 'assists', 'blocks', 'steals', 'fieldGoalsAttempted', 'fieldGoalsMade', 'fieldGoalsPercentage', 'threePointersAttempted', 'threePointersMade', 'threePointersPercentage', 'freeThrowsAttempted', 'freeThrowsMade', 'freeThrowsPercentage', 'reboundsDefensive', 'reboundsOffensive', 'reboundsTotal', 'foulsPersonal', 'turnovers', 'plusMinusPoints']

🔍 Sample data:


| | firstName | lastName | personId | gameId | gameDate | playerteamCity | playerteamName | opponentteamCity | opponentteamName | gameType | ... | threePointersPercentage | freeThrowsAttempted | freeThrowsMade | freeThrowsPercentage | reboundsDefensive | reboundsOffensive | reboundsTotal | foulsPersonal | turnovers | plusMinusPoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jamal | Murray | 1627750 | 22500248 | 2025-11-17T21:00:00Z | Denver | Nuggets | Chicago | Bulls | | ... | 0.455 | 5.0 | 5.0 | 1.0 | 11.0 | 0.0 | 11.0 | 3.0 | 2.0 | -1.0 |
| 1 | Bruce | Brown | 1628971 | 22500248 | 2025-11-17T21:00:00Z | Denver | Nuggets | Chicago | Bulls | | ... | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | -17.0 |
| 2 | Jevon | Carter | 1628975 | 22500248 | 2025-11-17T21:00:00Z | Chicago | Bulls | Denver | Nuggets | | ... | 0.5 | 0.0 | 0.0 | 0.0 | 3.0 | 1.0 | 4.0 | 2.0 | 1.0 | 20.0 |
| 3 | Kevin | Huerter | 1628989 | 22500248 | 2025-11-17T21:00:00Z | Chicago | Bulls | Denver | Nuggets | | ... | 0.444 | 2.0 | 2.0 | 1.0 | 2.0 | 0.0 | 2.0 | 0.0 | 1.0 | -21.0 |
| 4 | Jalen | Pickett | 1629618 | 22500248 | 2025-11-17T21:00:00Z | Denver | Nuggets | Chicago | Bulls | | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 9.0 |
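The impact cells that follow look up season-level columns (`Player`, `Tm`, `MP`, `G`, `USG%`), but this file holds one row per player per game under box-score names such as `firstName`, `playerteamName` and `numMinutes`. A minimal sketch of collapsing it into per-player totals under the expected names follows; `season_totals` and its `since` cut-off are hypothetical. Since 1,635,462 rows is far more than one season of box scores, filtering on `gameDate` first is probably wanted.

```python
from typing import Optional

import pandas as pd


def season_totals(box: pd.DataFrame, since: Optional[str] = None) -> pd.DataFrame:
    """Collapse per-game box scores into one row per player and team.

    Hypothetical helper. Output columns follow the names the impact cells expect:
    Player, Tm, G (games with minutes) and MP (total minutes).
    """
    if since is not None:
        # gameDate looks like "2025-11-17T21:00:00Z"; unparseable values become NaT and drop out.
        dates = pd.to_datetime(box["gameDate"], utc=True, errors="coerce")
        box = box[dates >= pd.Timestamp(since, tz="UTC")]
    played = box[box["numMinutes"].fillna(0) > 0]  # skip DNP rows
    played = played.assign(
        Player=played["firstName"] + " " + played["lastName"],
        Tm=played["playerteamCity"] + " " + played["playerteamName"],
    )
    return played.groupby(["personId", "Player", "Tm"], as_index=False).agg(
        G=("gameId", "nunique"),
        MP=("numMinutes", "sum"),
    )


# Toy rows with made-up players, shaped like PlayerStatistics.csv.
box = pd.DataFrame({
    "firstName": ["Ann", "Ann", "Bo"],
    "lastName": ["Example", "Example", "Sample"],
    "personId": [1, 1, 2],
    "gameId": [101, 102, 101],
    "gameDate": ["2025-11-17T21:00:00Z", "2025-11-19T21:00:00Z", "2025-11-17T21:00:00Z"],
    "playerteamCity": ["Denver", "Denver", "Denver"],
    "playerteamName": ["Nuggets", "Nuggets", "Nuggets"],
    "numMinutes": [36.0, 34.0, 0.0],
})
print(season_totals(box))
```

The result has `MP` and `G` but still no usage column, so `calculate_impact_factor` would keep its 0.20 usage default on this table.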


---

## ⚙️ Calculate Impact Factors

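The impact factor is a multiplier on team performance when a player is OUT: 1.00 means the absence costs nothing, and lower values mean a bigger loss.

**Formula (as implemented in `calculate_impact_factor`):**
- Base: minutes-per-game (MPG) tier: under 15 = 0.95 (bench), 15 to 25 = 0.85 (role player), 25 to 32 = 0.75 (starter), 32+ = 0.65 (star)
- Adjustment: subtract 0.5 × (usage rate - 0.20), where usage rate is the share of team possessions a player uses while on the floor and 0.20 (one fifth) is the average across five players
- Scale: clamped to 0.60 (largest impact when OUT) through 1.00 (no impact)
- An on/off rating differential is not used by the current code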
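The usage adjustment needs a usage rate, and the loaded file has no usage column. A sketch that estimates it from the box-score columns, assuming the standard Basketball-Reference usage formula and returning a fraction so it lines up with the 0.20 baseline; team totals are summed only over games the player appeared in. `usage_rate` is a hypothetical helper.

```python
import pandas as pd


def usage_rate(box: pd.DataFrame) -> pd.DataFrame:
    """Estimate usage (fraction of team possessions used) per personId.

    Hypothetical helper using the Basketball-Reference formula
        USG = (FGA + 0.44*FTA + TOV) * (TmMP / 5) / (MP * (TmFGA + 0.44*TmFTA + TmTOV))
    where team totals come only from games the player actually played.
    """
    stats = ["numMinutes", "fieldGoalsAttempted", "freeThrowsAttempted", "turnovers"]
    b = box[box["numMinutes"].fillna(0) > 0].copy()
    b[stats] = b[stats].fillna(0)
    # Possessions a player "used": shot attempts, trips to the line, turnovers.
    b["poss"] = b["fieldGoalsAttempted"] + 0.44 * b["freeThrowsAttempted"] + b["turnovers"]
    # Team totals for each (game, team) pair.
    team = b.groupby(["gameId", "playerteamName"], as_index=False).agg(
        tm_mp=("numMinutes", "sum"), tm_poss=("poss", "sum")
    )
    b = b.merge(team, on=["gameId", "playerteamName"], how="left")
    per = b.groupby("personId").agg(
        mp=("numMinutes", "sum"),
        poss=("poss", "sum"),
        tm_mp=("tm_mp", "sum"),
        tm_poss=("tm_poss", "sum"),
    )
    per["USG"] = per["poss"] * (per["tm_mp"] / 5) / (per["mp"] * per["tm_poss"])
    return per[["USG"]].reset_index()


# Toy game: five made-up players who each play all 48 minutes.
toy = pd.DataFrame({
    "personId": [1, 2, 3, 4, 5],
    "gameId": [1] * 5,
    "playerteamName": ["Nuggets"] * 5,
    "numMinutes": [48.0] * 5,
    "fieldGoalsAttempted": [30, 20, 15, 10, 5],
    "freeThrowsAttempted": [10, 0, 0, 0, 0],
    "turnovers": [4, 3, 2, 1, 0],
})
print(usage_rate(toy).round(3))
```

When five players share all 240 team minutes, the estimates sum to 1.0, which is a quick sanity check that the formula is wired correctly.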

In [5]:
def calculate_impact_factor(row):
    """
    Calculate injury impact factor for a player.
    
    Returns:
        float: Impact when absent (0.60-1.00)
               Lower = bigger impact when missing
    """
    # Default values if columns missing
    usage_rate = row.get('USG%', 0.20)  # Default 20% usage
    minutes = row.get('MP', 0)
    games = row.get('G', 0)
    
    # Minutes per game
    mpg = minutes / games if games > 0 else 0
    
    # Base impact on usage and minutes
    if mpg < 15:  # Bench player
        base_impact = 0.95
    elif mpg < 25:  # Role player
        base_impact = 0.85
    elif mpg < 32:  # Starter
        base_impact = 0.75
    else:  # Star player
        base_impact = 0.65
    
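    # Adjust by usage rate. Accept "25.3%", 25.3 (percent scale, e.g. a USG%
    # column that stores 25.3) or 0.253 (fraction).
    if isinstance(usage_rate, str):
        usage_rate = usage_rate.strip().rstrip('%')
    usage_rate = float(usage_rate)
    if pd.isna(usage_rate):
        usage_rate = 0.20  # missing value -> average usage
    elif usage_rate > 1:
        usage_rate /= 100  # percent -> fraction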
    
    usage_adjustment = (usage_rate - 0.20) * 0.5  # Scale usage impact
    
    impact = base_impact - usage_adjustment
    
    # Clamp between 0.60 and 1.00
    return max(0.60, min(1.00, impact))


if 'player_stats' in dir():
    # Add impact column
    player_stats['Impact_Factor'] = player_stats.apply(calculate_impact_factor, axis=1)
    
    # Sort by impact (lowest = most important)
    impact_sorted = player_stats.sort_values('Impact_Factor')
    
    print("\n📊 Top 20 Highest Impact Players (when OUT):")
    print("="*60)

    cols_to_show = ['Player', 'Tm', 'Impact_Factor', 'MP', 'G']
    # Only show columns that exist
    cols_to_show = [col for col in cols_to_show if col in impact_sorted.columns]

    display(impact_sorted[cols_to_show].head(20))

    print("\n📈 Impact Factor Statistics:")
    print(f"   Mean: {player_stats['Impact_Factor'].mean():.3f}")
    print(f"   Median: {player_stats['Impact_Factor'].median():.3f}")
    print(f"   Min (highest impact): {player_stats['Impact_Factor'].min():.3f}")
    print(f"   Max (lowest impact): {player_stats['Impact_Factor'].max():.3f}")
else:
    print("⚠️  Player stats not loaded")


📊 Top 20 Highest Impact Players (when OUT):


| | Impact_Factor |
|---|---|
| 0 | 0.95 |
| 1090313 | 0.95 |
| 1090312 | 0.95 |
| 1090311 | 0.95 |
| 1090310 | 0.95 |
| 1090309 | 0.95 |
| 1090308 | 0.95 |
| 1090307 | 0.95 |
| 1090306 | 0.95 |
| 1090305 | 0.95 |



📈 Impact Factor Statistics:
   Mean: 0.950
   Median: 0.950
   Min (highest impact): 0.950
   Max (lowest impact): 0.950
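Every row scores 0.950 because `calculate_impact_factor` looks for `MP`, `G` and `USG%`, none of which this per-game file contains: MPG falls back to 0 (bench tier, 0.95) and usage to 0.20 (no adjustment). Even with the right columns, `apply(..., axis=1)` over 1.6 million rows is slow. A vectorized sketch of the same tiers, usage adjustment and clamp follows; `impact_factor_vectorized` is hypothetical, and it assumes usage values above 1 are on a percent scale.

```python
import numpy as np


def impact_factor_vectorized(mp, g, usg=None) -> np.ndarray:
    """Array version of calculate_impact_factor (hypothetical helper).

    mp, g: total minutes and games per player. usg: usage as a fraction (0.25)
    or percent (25.0); values above 1 are treated as percent. Missing -> 0.20.
    """
    mp = np.asarray(mp, dtype=float)
    g = np.asarray(g, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mpg = np.where(g > 0, mp / g, 0.0)
    mpg = np.nan_to_num(mpg, nan=0.0)

    # Same tiers as the scalar function: bench, role, starter, star.
    base = np.select([mpg < 15, mpg < 25, mpg < 32], [0.95, 0.85, 0.75], default=0.65)

    usg = np.full(mp.shape, 0.20) if usg is None else np.asarray(usg, dtype=float)
    usg = np.where(np.isnan(usg), 0.20, usg)
    usg = np.where(usg > 1, usg / 100, usg)

    # Subtract the usage adjustment, then clamp to [0.60, 1.00].
    return np.clip(base - (usg - 0.20) * 0.5, 0.60, 1.00)


print(impact_factor_vectorized([0, 600, 1200, 2720], [0, 40, 60, 80], [np.nan, 0.20, 25.0, 30.0]))
```

On a per-player table `season` (hypothetical name) with `MP` and `G` columns, this would be `season["Impact_Factor"] = impact_factor_vectorized(season["MP"], season["G"])`.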


---

## 💾 Save Impact Reference Data

In [6]:
if 'player_stats' in dir() and 'Impact_Factor' in player_stats.columns:
    # Create reference file with key columns
    reference_cols = ['Player', 'Tm', 'Impact_Factor', 'MP', 'G']
    reference_cols = [col for col in reference_cols if col in player_stats.columns]
    
    impact_reference = player_stats[reference_cols].copy()
    
    # Rename for consistency
    impact_reference.rename(columns={
        'Player': 'PlayerName',
        'Tm': 'Team',
        'Impact_Factor': 'Impact'
    }, inplace=True)
    
    # Save to data folder.
    # Caution: this overwrites data/Injury_Overrides.csv, the same file the
    # "Compare with Manual Overrides" step reads as the manual overrides.
    output_path = project_root / "data" / "Injury_Overrides.csv"
    impact_reference.to_csv(output_path, index=False)

    print(f"✅ Saved impact reference to: {output_path}")
    print(f"   Rows: {len(impact_reference)}")

    print("\n🔍 Sample output:")
    display(impact_reference.head(10))
else:
    print("⚠️  Impact factors not calculated yet")

✅ Saved impact reference to: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project/data/Injury_Overrides.csv
   Rows: 1635462

🔍 Sample output:


| | Impact |
|---|---|
| 0 | 0.95 |
| 1 | 0.95 |
| 2 | 0.95 |
| 3 | 0.95 |
| 4 | 0.95 |
| 5 | 0.95 |
| 6 | 0.95 |
| 7 | 0.95 |
| 8 | 0.95 |
| 9 | 0.95 |
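Because this cell writes to `data/Injury_Overrides.csv`, the file the comparison section reads as the manual overrides, running it replaces any hand-maintained values. One hedged option is to copy the existing file aside before writing. `save_with_backup` and its timestamped `.backup_` naming are assumptions, not existing QEPC behavior; writing the data-driven table to a separately named file would work too.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional

import pandas as pd


def save_with_backup(df: pd.DataFrame, path) -> Optional[Path]:
    """Write df to path as CSV, first copying any existing file aside.

    Hypothetical helper. Returns the backup path, or None if nothing was replaced.
    """
    path = Path(path)
    backup = None
    if path.exists():
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        backup = path.with_name(f"{path.stem}.backup_{stamp}{path.suffix}")
        shutil.copy2(path, backup)  # keeps the previous (e.g. hand-edited) file recoverable
    df.to_csv(path, index=False)
    return backup


# Demo in a temporary folder with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "Injury_Overrides.csv"
    save_with_backup(pd.DataFrame({"PlayerName": ["Ann Example"], "Impact": [0.70]}), target)
    backup = save_with_backup(pd.DataFrame({"PlayerName": ["Ann Example"], "Impact": [0.95]}), target)
    print(pd.read_csv(backup)["Impact"].tolist())  # [0.7]
```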


---

## 🔍 Analyze Specific Players

In [7]:
# Players to analyze (customize these!)
players_to_check = [
    "LeBron James",
    "Stephen Curry",
    "Giannis Antetokounmpo",
    "Kevin Durant",
    "Jayson Tatum",
]

print("🔍 Individual Player Analysis\n")
print("="*60)

if 'player_stats' in dir():
    for player_name in players_to_check:
        # Find player in stats
        player_data = player_stats[
            player_stats['Player'].str.contains(player_name, case=False, na=False)
        ]

        if player_data.empty:
            print(f"\n❌ {player_name}: Not found in data")
        else:
            row = player_data.iloc[0]
            print(f"\n📊 {row.get('Player', player_name)}")
            print(f"   Team: {row.get('Tm', 'N/A')}")
            print(f"   Games: {row.get('G', 'N/A')}")
            print(f"   Minutes: {row.get('MP', 'N/A')}")

            mpg = row.get('MP', 0) / row.get('G', 1) if row.get('G', 0) > 0 else 0
            print(f"   MPG: {mpg:.1f}")

            if 'Impact_Factor' in row:
                print(f"   🎯 Impact Factor: {row['Impact_Factor']:.3f}")

                # Interpret impact
                if row['Impact_Factor'] < 0.70:
                    impact_level = "🔴 CRITICAL - Star player"
                elif row['Impact_Factor'] < 0.80:
                    impact_level = "🟡 HIGH - Key starter"
                elif row['Impact_Factor'] < 0.90:
                    impact_level = "🟢 MODERATE - Role player"
                else:
                    impact_level = "⚪ LOW - Bench player"

                print(f"   Impact Level: {impact_level}")
else:
    print("⚠️  Player stats not loaded")

print("\n" + "="*60)

🔍 Individual Player Analysis



KeyError: 'Player'
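The cell fails because the loaded table has `firstName` and `lastName` rather than a `Player` column (and no `Tm`, `G` or `MP` either). A minimal sketch that builds the full name on the fly and summarizes a player's games straight from the per-game rows follows; `player_summary` is hypothetical, and it uses a case-insensitive exact match instead of `str.contains`, so a search no longer also matches longer names that merely contain it.

```python
import pandas as pd


def player_summary(box: pd.DataFrame, full_name: str) -> dict:
    """Games, minutes and MPG for one player, from per-game box-score rows.

    Hypothetical helper: matches firstName + " " + lastName case-insensitively.
    """
    names = (box["firstName"].fillna("") + " " + box["lastName"].fillna("")).str.strip()
    rows = box[names.str.casefold() == full_name.strip().casefold()]
    played = rows[rows["numMinutes"].fillna(0) > 0]  # ignore DNP rows
    games = played["gameId"].nunique()
    minutes = float(played["numMinutes"].sum())
    return {
        "Player": full_name,
        "G": int(games),
        "MP": minutes,
        "MPG": minutes / games if games else 0.0,
    }


box = pd.DataFrame({  # made-up rows
    "firstName": ["Ann", "Ann", "ann", "Bo"],
    "lastName": ["Example", "Example", "example", "Sample"],
    "gameId": [1, 2, 3, 1],
    "numMinutes": [30.0, 0.0, 20.0, 12.0],
})
print(player_summary(box, "Ann Example"))
```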

---

## 📊 Compare with Manual Overrides

In [8]:
# Load manual injury overrides
manual_path = project_root / "data" / "Injury_Overrides.csv"

if not manual_path.exists():
    print("ℹ️  No manual injury overrides file found")
    print(f"   Expected at: {manual_path}")
elif 'impact_reference' not in dir():
    print("⚠️  Run previous cells first to calculate impact factors")
else:
    manual = pd.read_csv(manual_path)

    print("📊 Comparing Manual vs Data-Driven Impact Factors\n")
    print("="*60)

    # Rename manual columns for consistency
    if 'Player' in manual.columns:
        manual.rename(columns={'Player': 'PlayerName'}, inplace=True)
    if 'Tm' in manual.columns:
        manual.rename(columns={'Tm': 'Team'}, inplace=True)

    # Merge
    comparison = manual.merge(
        impact_reference[['PlayerName', 'Team', 'Impact']],
        on=['PlayerName', 'Team'],
        how='inner',
        suffixes=('_manual', '_data')
    )

    if len(comparison) == 0:
        print("⚠️  No matching players found between manual and data-driven")
    else:
        comparison['Delta'] = comparison['Impact_data'] - comparison['Impact_manual']

        print(f"\n📋 Comparison Results ({len(comparison)} players):\n")
        display(comparison[
            ['PlayerName', 'Team', 'Impact_manual', 'Impact_data', 'Delta']
        ].head(20))

        print("\n📈 Statistics:")
        print(f"   Mean difference: {comparison['Delta'].mean():.3f}")
        print(f"   Mean absolute difference: {comparison['Delta'].abs().mean():.3f}")
        print(f"   Max overestimate: {comparison['Delta'].max():.3f}")
        print(f"   Max underestimate: {comparison['Delta'].min():.3f}")

        # Biggest discrepancies in either direction, ranked by |Delta|
        print("\n🔍 Biggest Discrepancies:")
        biggest_diff = comparison.loc[comparison['Delta'].abs().nlargest(5).index]
        display(biggest_diff[['PlayerName', 'Team', 'Impact_manual', 'Impact_data', 'Delta']])

print("\n" + "="*60)

📊 Comparing Manual vs Data-Driven Impact Factors



KeyError: "['PlayerName', 'Team'] not in index"
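The merge fails because `impact_reference` ended up with only an `Impact` column: the save cell keeps just the columns that exist, and `Player`, `Tm`, `MP` and `G` are absent from this data. A sketch of a more defensive comparison follows. It checks the key columns up front, uses `indicator=True` to report manual entries with no data-driven match, and ranks discrepancies by absolute `Delta`. `compare_impacts` is hypothetical and assumes both tables already use `PlayerName`, `Team` and `Impact`.

```python
import pandas as pd


def compare_impacts(manual: pd.DataFrame, data: pd.DataFrame, top_n: int = 5):
    """Compare manual and data-driven impact factors (hypothetical helper).

    Returns (matched rows with Delta, top_n rows by |Delta|, unmatched manual rows).
    """
    keys = ["PlayerName", "Team"]
    for name, df in (("manual", manual), ("data", data)):
        missing = [c for c in keys + ["Impact"] if c not in df.columns]
        if missing:
            raise KeyError(f"{name} table is missing columns: {missing}")

    merged = manual[keys + ["Impact"]].merge(
        data[keys + ["Impact"]],
        on=keys,
        how="left",
        suffixes=("_manual", "_data"),
        indicator=True,  # adds _merge: "both" or "left_only"
    )
    unmatched = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    matched = matched.assign(Delta=matched["Impact_data"] - matched["Impact_manual"])
    biggest = matched.loc[matched["Delta"].abs().nlargest(top_n).index]
    return matched, biggest, unmatched


# Made-up tables for illustration.
manual = pd.DataFrame({"PlayerName": ["A", "B", "C"], "Team": ["X", "X", "Y"], "Impact": [0.70, 0.90, 0.80]})
data = pd.DataFrame({"PlayerName": ["A", "B"], "Team": ["X", "X"], "Impact": [0.60, 0.95]})
matched, biggest, unmatched = compare_impacts(manual, data, top_n=1)
print(biggest)
print(unmatched)
```

A left join with an indicator, rather than the original inner join, is a design choice here: manual entries that silently fail to match are often the most useful thing to see.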

---

## 💡 Impact Factor Guide

**What does the Impact Factor mean?**

The Impact Factor is the fraction of a team's offensive efficiency that remains when a player is OUT, so 0.70 means a 30% drop:

- **0.60-0.70** 🔴 CRITICAL - Superstar (32+ MPG, 25%+ usage)
- **0.70-0.80** 🟡 HIGH - Key starter (25-32 MPG, 20-25% usage)
- **0.80-0.90** 🟢 MODERATE - Role player (15-25 MPG)
- **0.90-1.00** ⚪ LOW - Bench player (<15 MPG)
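The tiers above can be applied to a whole column at once. This sketch uses `pd.cut` with left-closed bins, which reproduces the `< 0.70` / `< 0.80` / `< 0.90` chain in the player-analysis cell (so exactly 0.70 counts as HIGH). `impact_tier` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

TIERS = ["CRITICAL", "HIGH", "MODERATE", "LOW"]


def impact_tier(factors) -> pd.Series:
    """Label impact factors with the guide's tiers (hypothetical helper)."""
    return pd.cut(
        pd.Series(factors, dtype="float64"),
        bins=[-np.inf, 0.70, 0.80, 0.90, np.inf],
        right=False,  # bins are [low, high): 0.70 falls in HIGH, 0.90 in LOW
        labels=TIERS,
    )


print(impact_tier([0.62, 0.70, 0.79, 0.85, 0.95, 1.00]).tolist())
# ['CRITICAL', 'HIGH', 'HIGH', 'MODERATE', 'LOW', 'LOW']
```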

**Example:**
- Team ORtg: 115.0
- LeBron James Impact: 0.70
- Adjusted ORtg (LeBron OUT): 115.0 × 0.70 = 80.5
- **Impact: -34.5 points per 100 possessions**
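The example's arithmetic can be written as a small function. How QEPC combines several absences is not stated here; multiplying the factors of every player who is OUT is an assumption made only for illustration, and `adjusted_ortg` is a hypothetical name.

```python
def adjusted_ortg(team_ortg: float, *impact_factors: float) -> float:
    """Scale a team's offensive rating by the impact factor of each player OUT.

    Combining several players by multiplication is an illustrative assumption.
    """
    result = team_ortg
    for factor in impact_factors:
        result *= factor
    return result


ortg = adjusted_ortg(115.0, 0.70)
print(round(ortg, 1), round(115.0 - ortg, 1))  # 80.5 34.5
```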

**Usage in QEPC:**
This impact factor is used in the lambda calculations to adjust team strength when players are injured.