# Predictive Modeling: The "Cap Trap" Detector

**Goal**: Build a machine learning model to predict which players are likely to become "Dead Money" liabilities in the following season.

**Hypothesis**: We can flag "traps" by analyzing the *rate of change* in efficiency relative to cost, together with biological age and cumulative workload.

**Features**:
- `Age`: Biological age (from Spotrac).
- `Experience`: Years since first appearing in the dataset.
- `Cap_Pct`: Current % of Team Cap.
- `Efficiency_Trend`: 3-year slope of AV/Cap efficiency.
- `Cumulative_AV`: Proxy for total career workload.

**Target**:
- `Is_Liability_Next_Year`: True if next year's Efficiency < 1.0 (Replacement Level) AND next year's Cap Hit > 3.0% of team cap (Material Cost).
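
To make the target definition concrete, here is a minimal sketch on a hypothetical two-player frame. The toy values are invented for illustration; only the thresholds (1.0 and 3.0) and the column names come from this notebook.

```python
import pandas as pd

# Hypothetical toy data: two players, three seasons each.
toy = pd.DataFrame({
    "player_name": ["A", "A", "A", "B", "B", "B"],
    "year": [2021, 2022, 2023, 2021, 2022, 2023],
    "efficiency": [1.4, 1.1, 0.6, 0.9, 0.8, 0.7],
    "cap_pct": [4.0, 5.0, 6.0, 1.0, 1.5, 2.0],
})
toy = toy.sort_values(["player_name", "year"])

# Look one season ahead within each player's history.
grp = toy.groupby("player_name")
toy["efficiency_next"] = grp["efficiency"].shift(-1)
toy["cap_pct_next"] = grp["cap_pct"].shift(-1)

# Liability = below replacement level AND materially expensive next season.
toy["is_liability_next"] = (
    (toy["efficiency_next"] < 1.0) & (toy["cap_pct_next"] > 3.0)
).astype(int)

print(toy[["player_name", "year", "efficiency_next", "cap_pct_next", "is_liability_next"]])
```

Only player A's 2022 row is labeled 1. Player B is inefficient but cheap, so never qualifies. Each player's final season has no "next" row, so the comparison is False and the label defaults to 0, which is why such rows need to be dropped before training.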

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from scipy.stats import linregress

sns.set_style("whitegrid")
pd.set_option('display.max_columns', 50)

## 1. Load & Prepare Data

In [2]:
# Load Master Table (ALL years, including future for context)
df = pd.read_csv('../data/processed/nfl_master_table.csv')

# Exclude positions we cannot score with Fantasy Points (OL, K, P, LS)
exclude_agg = ['OL', 'C', 'G', 'T', 'LT', 'RT', 'LG', 'RG', 'K', 'P', 'LS']
df = df[~df['position'].isin(exclude_agg)].copy()

df = df.sort_values(['player_name', 'year'])

print(f"Loaded {len(df)} player-seasons (2015-2026)")
if 'age' in df.columns:
    print(f"Age data present. Valid entries: {df['age'].notna().sum()} / {len(df)}")

Loaded 19484 player-seasons (2015-2026)
Age data present. Valid entries: 34 / 19484


## 2. Feature Engineering (Global)

In [3]:
# 1. Experience & Age
df['first_year'] = df.groupby('player_name')['year'].transform('min')
df['experience'] = df['year'] - df['first_year']

# Handle missing Age (Fallback: 22 + Experience)
if 'age' not in df.columns:
    df['age'] = np.nan
    
df['age'] = df['age'].fillna(22 + df['experience'])

# 2. Cumulative Workload
df['cumulative_AV'] = df.groupby('player_name')['AV_Proxy'].cumsum().fillna(0)

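# 3. Trend Metrics
def get_slope(series):
    """Least-squares slope of the values against their position (0, 1, 2, ...)."""
    if len(series) < 2:
        return 0.0
    series = series.fillna(0)
    return linregress(range(len(series)), series).slope

# Rolling 3-season slope per player; seasons without a full 3-season window default to 0
df['efficiency_trend_3yr'] = (
    df.groupby('player_name')['efficiency']
      .transform(lambda x: x.rolling(3).apply(get_slope).fillna(0))
)
# Previous season's efficiency (0 for a player's first season in the data)
df['efficiency_prev'] = df.groupby('player_name')['efficiency'].shift(1).fillna(0)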

# 4. Future Context (Next Year's Cost)
df['cap_pct_next'] = df.groupby('player_name')['cap_pct'].shift(-1).fillna(0)
# NaN when the player has no following season or its efficiency is missing
df['efficiency_next'] = df.groupby('player_name')['efficiency'].shift(-1)

# Define Target (Training Only)
# Rows with NaN efficiency_next evaluate to 0 here; they are dropped before training
df['is_liability_next'] = ((df['efficiency_next'] < 1.0) & (df['cap_pct_next'] > 3.0)).astype(int)

print("Feature Engineering Complete.")

Feature Engineering Complete.
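
The `efficiency_trend_3yr` feature is a rolling three-season least-squares slope. The sketch below reproduces that idea on a hypothetical efficiency series. It swaps in `np.polyfit` for `linregress` purely to keep the example NumPy-only, and `window_slope` is an illustrative name, not part of the notebook.

```python
import numpy as np
import pandas as pd

def window_slope(values):
    """Least-squares slope of values against positions 0..n-1."""
    x = np.arange(len(values))
    return np.polyfit(x, values, 1)[0]

# Hypothetical efficiency values for one player over five seasons.
eff = pd.Series([1.5, 1.2, 0.9, 0.9, 1.2])

# rolling(3) yields NaN until three seasons are available; fill those with 0.
trend = eff.rolling(3).apply(window_slope, raw=True).fillna(0)
print(trend.round(3).tolist())
```

One consequence is visible here: a player's first two seasons get a trend of exactly 0, which the model cannot tell apart from a genuinely flat three-season trend.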


## 3. Model Training (2015-2023)

In [4]:
# Modeling Data: Only years where we know the OUTCOME (Next year exists and is not future)
model_df = df[(df['year'] < 2024) & (df['is_future'] == False)].dropna(subset=['efficiency_next']).copy()

features = ['cap_pct', 'efficiency', 'efficiency_prev', 'efficiency_trend_3yr', 'age', 'experience', 'cumulative_AV']
target = 'is_liability_next'

# Train: 2015-2021 | Test: 2022-2023
train = model_df[model_df['year'] < 2022]
test = model_df[model_df['year'] >= 2022]

X_train = train[features]
y_train = train[target]
X_test = test[features]
y_test = test[target]

# Train Random Forest
clf = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42, class_weight='balanced')
clf.fit(X_train, y_train)

# Eval
y_probs = clf.predict_proba(X_test)[:, 1]
print(f"ROC AUC Score: {roc_auc_score(y_test, y_probs):.3f}")

# Feature Importance
importances = pd.DataFrame({'feature': features, 'importance': clf.feature_importances_}).sort_values('importance', ascending=False)
print("\nFeature Importance:")
print(importances)

ROC AUC Score: 0.939

Feature Importance:
                feature  importance
0               cap_pct    0.539438
6         cumulative_AV    0.221672
1            efficiency    0.115201
3  efficiency_trend_3yr    0.043621
2       efficiency_prev    0.036626
5            experience    0.022374
4                   age    0.021067
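
ROC AUC is threshold-independent, while the risk report in this notebook applies a fixed 0.60 probability cutoff. A sketch of how precision and recall at that cutoff could be checked is shown below. The labels and probabilities are made up, and `precision_recall_at` is a hypothetical helper; in the notebook the inputs would be `y_test` and `y_probs`.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting positive for y_prob > threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) > threshold
    tp = int(np.sum(y_pred & y_true))    # flagged and actually a liability
    fp = int(np.sum(y_pred & ~y_true))   # flagged but fine
    fn = int(np.sum(~y_pred & y_true))   # missed liability
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.70, 0.80, 0.55, 0.30, 0.90]

precision, recall = precision_recall_at(y_true, y_prob, 0.60)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With a rare positive class, a high AUC can coexist with modest precision at a given cutoff, so this check complements the score above.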


## 4. Application: The 2025 Risk Report
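Predicting liability for the 2025 season from 2024 data, with `age` as a feature. Only 34 of the 19,484 loaded rows carry a recorded age, so most values come from the `22 + experience` fallback.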
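
The cell below looks up the projected 2025 cap hit with a grouped `shift(-1)`, which returns the player's next row, whatever year it belongs to. Since the master table extends to 2026, a player with no 2025 row could pick up a later season's figure. The sketch below contrasts that with an explicit year-based join on a hypothetical frame; the toy data and the `cap_hit_next` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: player B has no 2025 row, only 2024 and 2026.
toy = pd.DataFrame({
    "player_name": ["A", "A", "B", "B"],
    "year": [2024, 2025, 2024, 2026],
    "cap_hit_m": [10.0, 12.0, 3.0, 8.0],
}).sort_values(["player_name", "year"])

# Grouped shift: B's 2024 row silently receives the 2026 cap hit.
shifted = toy.groupby("player_name")["cap_hit_m"].shift(-1)

# Explicit join: relabel each row as "the season before" and match on year.
next_year = toy[["player_name", "year", "cap_hit_m"]].copy()
next_year["year"] = next_year["year"] - 1
next_year = next_year.rename(columns={"cap_hit_m": "cap_hit_next"})
merged = toy.merge(next_year, on=["player_name", "year"], how="left")

print(shifted.tolist())
print(merged[["player_name", "year", "cap_hit_next"]])
```

Whether the distinction matters depends on how often players skip a season in the table; if every 2024 player with a later row has a 2025 row, both approaches agree.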

In [5]:
# Inference Cohort: 2024 active players
inference_df = df[df['year'] == 2024].copy()

X_inf = inference_df[features].fillna(0)

inference_df['prob_decline_2025'] = clf.predict_proba(X_inf)[:, 1]
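# Each player's next row in the full table, aligned to inference_df by index
# (assumes the row after 2024 is the 2025 season)
inference_df['cap_hit_2025_projected'] = df.groupby('player_name')['cap_hit_m'].shift(-1)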

traps = inference_df[ 
    (inference_df['cap_hit_2025_projected'] > 5.0) & 
    (inference_df['prob_decline_2025'] > 0.60)
].sort_values(['prob_decline_2025', 'cap_hit_2025_projected'], ascending=False)

print("TOP 20 PROJECTED 'CAP TRAPS' FOR 2025 (With Age Data):")
display_cols = ['player_name', 'team', 'position', 'age', 'efficiency', 'cap_hit_2025_projected', 'prob_decline_2025']
print(traps[display_cols].head(20))

buyer_beware = inference_df[
    (inference_df['prob_decline_2025'] > 0.80) & 
    (inference_df['cap_hit_2025_projected'].isna() | (inference_df['cap_hit_2025_projected'] < 1.0))
].sort_values('prob_decline_2025', ascending=False)

print("\nBUYER BEWARE (Likely Free Agents / High Risk):")
print(buyer_beware[['player_name', 'position', 'age', 'efficiency', 'prob_decline_2025']].head(10))

TOP 20 PROJECTED 'CAP TRAPS' FOR 2025 (With Age Data):
              player_name team position   age  efficiency  \
19069          Kyle Pitts  ATL       TE  25.0    0.814625   
19072       Travon Walker  JAX    ED/DE  24.0    0.917107   
19084    Aidan Hutchinson  DET    ED/DE  24.0    0.508701   
19087  Patrick Surtain II  DEN       CB  25.0    0.781787   
19112   Kayvon Thibodeaux  NYG   LB/OLB  24.0    0.549825   
19124       Will Anderson  HOU    ED/DE  23.0    1.008429   
18977     Jeffery Simmons  TEN    DL/DT  27.0    0.370368   
18950         Maxx Crosby   LV    DL/DE  27.0    0.241443   
19017           Nick Bosa   SF    ED/DE  27.0    0.595370   
18952          Joe Burrow  CIN       QB  26.0    1.233731   
19052           Cole Kmet  CHI       TE  26.0    0.568045   
18986    Quinnen Williams  NYJ    DL/DT  27.0    0.275431   
19058         Rashan Gary   GB    ED/DE  27.0    0.632420   
18966      Terry McLaurin  WAS       WR  27.0    0.745217   
19029      Jaylon Johnson  CHI