# 📊 NFL Combine & Draft Analytics
### Predicting NFL Success from Combine Metrics
This notebook explores the relationship between NFL Combine performance and career success.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import requests
from bs4 import BeautifulSoup
import glob
import re

## 📂 Load Dataset
We'll load NFL Combine stats for 2000-2024, taken from [Pro Football Reference](https://www.pro-football-reference.com/draft/2024-combine.htm), along with per-season NFL and college stat files stored in the same `data/` folder.

In [2]:
# Define the path where your CSV files are stored
csv_path = 'data/*.csv'

# Get a list of all CSV files
all_csvs = glob.glob(csv_path)

# Initialize empty lists for the three types of files
combine_files = []
nfl_files = []
college_files = []

# Separate the files into three lists based on the naming pattern
for file in all_csvs:
    if re.search(r'\d{4}(NFL)\.csv', file):  # e.g. 2014NFL.csv: NFL season stats
        nfl_files.append(file)
    elif re.search(r'\d{4}_(clg)\.csv', file):  # e.g. 2014_clg.csv: college stats
        college_files.append(file)
    else:
        combine_files.append(file)

# Load the CSVs into DataFrames
raw_combine = pd.concat([pd.read_csv(file) for file in combine_files], ignore_index=True)

nfl_dfs = []
for file in nfl_files:
    year = int(re.search(r'(\d{4})NFL\.csv', file).group(1))  # read the year from the file name, not a fixed offset
    temp = pd.read_csv(file)
    temp['Year'] = year
    nfl_dfs.append(temp)

raw_nfl = pd.concat(nfl_dfs, ignore_index=True)

college_dfs = []
for file in college_files:
    year = int(re.search(r'(\d{4})_clg\.csv', file).group(1))  # read the year from the file name, not a fixed offset
    temp = pd.read_csv(file)
    temp['Year'] = year
    college_dfs.append(temp)

raw_college = pd.concat(college_dfs, ignore_index=True)
raw_college.sort_values('Year')


Unnamed: 0,player,player_id,position,team_name,player_game_count,avg_depth_of_target,avoided_tackles,caught_percent,contested_catch_rate,contested_receptions,...,targets,touchdowns,wide_rate,wide_snaps,yards,yards_after_catch,yards_after_catch_per_reception,yards_per_reception,yprr,Year
13097,Prince-Tyson Gulley,13271,HB,SYRACUSE,12,0.5,6,63.6,,0,...,33,0,0.7,2,107,132,6.3,5.1,0.61,2014
13260,Jared Cornelius,47451,WR,ARKANSAS,12,8.7,2,62.5,,0,...,24,2,6.0,9,196,110,7.3,13.1,1.37,2014
13261,Shaun Wilson,45993,HB,DUKE,12,1.5,5,75.0,,0,...,24,1,4.2,5,179,178,9.9,9.9,1.88,2014
13262,AJ Ouellette,45974,HB,OHIO,10,0.7,3,87.5,,0,...,24,3,3.2,7,133,146,7.0,6.3,1.06,2014
13263,Robert Johnson,14752,WR,MISS STATE,12,13.0,1,50.0,,0,...,24,1,87.6,212,210,86,7.2,17.5,0.91,2014
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21948,Marlin Klein,162818,TE,MICHIGAN,12,9.0,0,57.1,20.0,1,...,21,0,14.3,22,101,44,3.7,8.4,0.85,2024
21949,Joshua Manning,124519,WR,MISSOURI,12,7.6,5,61.9,100.0,2,...,21,1,73.0,119,192,122,9.4,14.8,1.23,2024
21950,Bryson Donelson,191080,HB,FRESNO ST,11,-0.6,2,81.0,100.0,2,...,21,0,15.4,20,54,74,4.4,3.2,0.53,2024
21944,Tyler Savage,123832,TE,E CAROLINA,12,8.2,1,66.7,50.0,1,...,21,1,1.1,2,193,80,5.7,13.8,1.58,2024


## 🧹 Data Cleaning & Preprocessing
Expand the "Drafted" column into team, round, pick, and year; convert the relevant columns to numeric; and handle missing values. The analysis is restricted to wide receivers (`Pos == 'WR'`).
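
As a rough illustration of these cleaning steps, the sketch below parses a draft string and a height string on two made-up rows. The `"Team / 7th / 237th pick / 2008"` layout is an assumption inferred from the cleaned output, and `sample`, `parts`, `Ht_in`, and `cleaned` are hypothetical names used only for this example.

```python
import pandas as pd

# Hypothetical sample rows; the draft-string layout is an assumption
sample = pd.DataFrame({
    "Player": ["Adrian Arrington", "Danny Amendola"],
    "Drafted (tm/rnd/yr)": ["New Orleans Saints / 7th / 237th pick / 2008", None],
    "Ht": ["6-3", "5-10"],
})

# Split the combined draft string into four columns; undrafted players stay missing
parts = sample["Drafted (tm/rnd/yr)"].str.split(" / ", expand=True)
parts.columns = ["Team", "Round", "Pick", "Year"]

# Keep only the digits ("7th" -> 7, "237th pick" -> 237) and convert to numbers
for col in ["Round", "Pick", "Year"]:
    parts[col] = pd.to_numeric(parts[col].str.extract(r"(\d+)")[0], errors="coerce")

# Convert "feet-inches" strings to total inches
ht = sample["Ht"].str.split("-", expand=True).astype(float)
sample["Ht_in"] = ht[0] * 12 + ht[1]

cleaned = pd.concat([sample[["Player", "Ht_in"]], parts], axis=1)
print(cleaned)
```

Using `pd.to_numeric(..., errors="coerce")` keeps undrafted players as `NaN` instead of raising, which is one way to handle the missing values mentioned above.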

In [3]:
def height_to_inches(height_str):
    if height_str == 'nan':  # Check if the value is NaN
        return np.nan  # Or use None or 0 depending on your preference
    feet, inches = height_str.split('-')
    return int(feet) * 12 + int(inches)  # Convert feet to inches and add the extra inches


combine = raw_combine.copy()
combine[["Team", "Round", "Pick", "Year"]] = combine["Drafted (tm/rnd/yr)"].str.split(" / ", expand=True)
combine = combine[combine['Pos'] == 'WR']
combine.drop(columns=['Drafted (tm/rnd/yr)', 'College', 'Player-additional'], inplace=True)
combine['Ht'] = combine['Ht'].astype(str).apply(height_to_inches)
combine['Round'] = combine['Round'].astype(str).apply(lambda x: np.nan if x == 'nan' else int(x[0]))
combine['Pick'] = combine['Pick'].apply(lambda x: int(re.sub(r'\D', '', x)) if pd.notna(x) else x)
combine.set_index('Player', inplace=True)

nfl = raw_nfl.copy()
nfl = nfl[nfl['Pos'] == 'WR']

# A cell displays only its last expression, so show the cleaned combine table
combine


Player,Pos,School,Ht,Wt,40yd,Vertical,Bench,Broad Jump,3Cone,Shuttle,Team,Round,Pick,Year
Danny Amendola,WR,Texas Tech,70.0,183.0,4.58,27.5,13.0,103.0,6.81,4.25,,,,
Adrian Arrington,WR,Michigan,75.0,203.0,4.55,,,,,,New Orleans Saints,7.0,237.0,2008
Donnie Avery,WR,Houston,71.0,192.0,4.43,,16.0,,,,St. Louis Rams,2.0,33.0,2008
Earl Bennett,WR,Vanderbilt,71.0,209.0,4.48,26.0,15.0,110.0,7.15,4.22,Chicago Bears,3.0,70.0,2008
Davone Bess,WR,Hawaii,70.0,194.0,4.64,31.5,12.0,118.0,6.97,4.27,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Paris Warren,WR,Utah,72.0,219.0,4.71,31.5,,,,,Tampa Bay Buccaneers,7.0,225.0,2005
Isaac West,WR,Furman,72.0,187.0,4.46,37.0,,124.0,7.21,4.22,,,,
Roddy White,WR,Ala-Birmingham,73.0,207.0,4.46,,,,,,Atlanta Falcons,1.0,27.0,2005
Mike Williams,WR,USC,77.0,229.0,4.56,36.5,,,,,Detroit Lions,1.0,10.0,2005


# Receiver Efficiency and Impact Score (REIS)

## Introduction  
The **Receiver Efficiency and Impact Score (REIS)** is a metric designed to evaluate wide receivers based on their efficiency, impact, and volume. It balances raw production with advanced efficiency measures to provide a **comprehensive** assessment of a receiver's performance.

## Components  

1. **Reception Efficiency** (How well a receiver converts targets into yards)  
   Reception Efficiency = `Ctch% * Y/Tgt`

2. **First Down Impact** (How often receptions result in a first down)  
   First Down Impact = `1D / Rec`

3. **Touchdown Rate** (Scoring efficiency per target)  
   TD Rate = `TD / Tgt`

4. **Volume Adjustment** (How involved a receiver is per game)  
   Volume Adj. = `Rec / G`

## Final Formula  
To balance efficiency, impact, and volume, we use weighted components:

REIS = `0.4 * Reception Efficiency + 0.3 * First Down Impact + 0.2 * TD Rate + 0.1 * Volume Adj.`
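
The formula above could be computed in pandas roughly as follows. This is a minimal sketch, not the notebook's scoring code: the column names (`Ctch%`, `Y/Tgt`, `1D`, `Rec`, `TD`, `Tgt`, `G`) are assumed to follow Pro Football Reference-style receiving tables, `Ctch%` is assumed to be stored as a percentage, and `compute_reis` and `demo` are hypothetical names.

```python
import numpy as np
import pandas as pd

def compute_reis(df: pd.DataFrame) -> pd.Series:
    """Sketch of the REIS formula; column names are assumptions."""
    # Assumption: Ctch% is a percentage such as 75.9, so convert it to a fraction
    catch_rate = df["Ctch%"] / 100.0
    reception_eff = catch_rate * df["Y/Tgt"]
    # Replace zero denominators with NaN so the ratios do not become inf
    first_down_impact = df["1D"] / df["Rec"].replace(0, np.nan)
    td_rate = df["TD"] / df["Tgt"].replace(0, np.nan)
    volume_adj = df["Rec"] / df["G"].replace(0, np.nan)
    return (0.4 * reception_eff
            + 0.3 * first_down_impact
            + 0.2 * td_rate
            + 0.1 * volume_adj)

# Made-up single-season line, for illustration only
demo = pd.DataFrame({
    "Ctch%": [50.0], "Y/Tgt": [10.0], "1D": [50],
    "Rec": [100], "TD": [10], "Tgt": [200], "G": [10],
})
reis_demo = compute_reis(demo)
print(reis_demo)
```

The four components sit on different scales (yards, ratios, receptions per game), so the nominal weights do not translate directly into relative influence unless the components are standardized first.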

## Why Use REIS?  
- **Efficiency-focused**: Prioritizes yards per target and catch percentage.  
- **Impact-driven**: Values first downs and touchdowns.  
- **Volume-aware**: Adjusts for players with higher workloads.  
- **Requires no team-level stats**, making it widely applicable.  

This metric provides a **balanced** view of wide receiver performance beyond just raw stats like yards and receptions.

In [4]:
from sklearn.preprocessing import StandardScaler

std_nfl = nfl.copy()
features = ['Y/G', 'R/G', 'Y/R', 'TD/G', 'G']

std_nfl['TD/G'] = std_nfl['TD'] / std_nfl['G']


scaler = StandardScaler()
std_nfl[features] = scaler.fit_transform(std_nfl[features])

# Weights for each standardized stat
weights = {
    'Y/G': 0.40,  
    'R/G': 0.20,  
    'Y/R': 0.10,  
    'TD/G': 0.20,  
    'G': 0.10    
}


std_nfl['RPI'] = (
    (weights['Y/G'] * std_nfl['Y/G']) +
    (weights['R/G'] * std_nfl['R/G']) +
    (weights['Y/R'] * std_nfl['Y/R']) +
    (weights['TD/G'] * std_nfl['TD/G']) +
    (weights['G'] * std_nfl['G'])
)

# Show the top 10 players
std_nfl.sort_values('RPI', ascending=False).head(10)


Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,Tgt,Rec,Yds,...,R/G,Y/G,Ctch%,Y/Tgt,Fmb,Awards,-9999,Year,TD/G,RPI
1727,1.0,Cooper Kupp,28.0,LAR,WR,1.117315,17.0,191.0,145.0,1947.0,...,3.465692,3.462389,75.9,10.2,0.0,PBAP-1AP MVP-3AP OPoY-1,KuppCo00,2021,3.81784,2.967175
7112,2.0,Davante Adams,28.0,GNB,WR,0.486123,14.0,149.0,115.0,1374.0,...,3.300342,2.800813,77.2,9.2,1.0,PBAP-1,AdamDa01,2020,5.519651,2.914807
2410,9.0,Randy Moss,30.0,NWE,WR,0.906918,16.0,160.0,98.0,1493.0,...,2.142894,2.607181,61.3,9.3,0.0,PBAP-1AP OPoY-2AP CPoY-2,MossRa00,2007,6.269382,2.868095
11349,9.0,Odell Beckham Jr.,22.0,NYG,WR,0.065329,11.0,130.0,91.0,1305.0,...,2.969643,3.232451,70.0,10.0,1.0,PBAP ORoY-1,BeckOd00,2014,4.108393,2.748049
5416,2.0,Randy Moss,26.0,MIN,WR,0.906918,16.0,172.0,111.0,1632.0,...,2.583826,2.958139,64.5,9.5,1.0,PBAP-1,MossRa00,2003,4.417106,2.715572
4221,2.0,Tyreek Hill,29.0,MIA,WR,0.906918,16.0,171.0,119.0,1799.0,...,2.859409,3.377675,69.6,10.5,1.0,PBAP-1AP MVP-6AP OPoY-2,HillTy00,2023,3.182256,2.700043
14183,1.0,Ja'Marr Chase,24.0,CIN,WR,1.117315,16.0,175.0,127.0,1708.0,...,2.914526,2.897629,72.6,9.8,0.0,PBAP-1AP MVP-8AP OPoY-3,ChasJa00,2024,4.108393,2.689149
4824,4.0,Calvin Johnson,26.0,DET,WR,0.906918,16.0,158.0,96.0,1681.0,...,2.087777,3.083193,60.8,10.6,1.0,PBAP-1,JohnCa00,2011,4.108393,2.664209
3605,1.0,Antonio Brown,27.0,PIT,WR,0.906918,16.0,193.0,136.0,1834.0,...,3.465692,3.466423,70.5,9.5,3.0,PBAP-1AP OPoY-2,BrowAn04,2015,2.256118,2.637532
11341,1.0,Antonio Brown,26.0,PIT,WR,0.906918,16.0,181.0,129.0,1698.0,...,3.245226,3.123533,71.3,9.4,2.0,PBAP-1AP OPoY-3,BrowAn04,2014,3.182256,2.635128


How well does this score (the `RPI` column computed above) measure a WR's seasonal performance?
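
One rough way to put a number on this question is a rank correlation between the score and raw season totals. The sketch below uses made-up values in a hypothetical `toy` frame and computes Spearman correlation as Pearson correlation on ranks; the same two lines could be applied to the real table, assuming it has `RPI`, `Yds`, and `TD` columns.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "RPI": [0.1, 0.5, 0.9, 1.5],
    "Yds": [300, 700, 650, 1200],
    "TD": [1, 3, 5, 9],
})

# Spearman rank correlation, computed as Pearson correlation on the ranks
rank_corr = toy[["RPI", "Yds", "TD"]].rank().corr()

# Correlation of the score with each raw stat
print(rank_corr.loc["RPI", ["Yds", "TD"]])
```

A value near 1 means the score orders seasons almost the same way as the raw stat does, while a lower value means the score is capturing something other than that stat.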

In [5]:
import plotly.express as px

# Scatter plot: RPI vs Yards
fig1 = px.scatter(
    std_nfl, x="Yds", y="RPI", hover_name="Player",
    title="RPI Score vs Total Yards",
    labels={"Yds": "Total Yards", "RPI": "RPI Score"},
    opacity=0.7
)
fig1.show()

# Scatter plot: RPI vs Touchdowns
fig2 = px.scatter(
    std_nfl, x="TD", y="RPI", hover_name="Player",
    title="RPI Score vs Total Touchdowns",
    labels={"TD": "Total Touchdowns", "RPI": "RPI Score"},
    opacity=0.7
)
fig2.show()


In [6]:
import pandas as pd
import plotly.express as px

# Make copies to avoid modifying the original data
combine_copy = combine.copy()
nfl_copy = std_nfl.copy()

# Compute the average RPI per player across all seasons
nfl_avg_RPI = nfl_copy.groupby("Player", as_index=False)["RPI"].mean()

# Merge with combine data on player names; reset_index() turns the Player index into a regular column first
df_merged = combine_copy.reset_index().merge(nfl_avg_RPI, on="Player", how="inner")

# Handle draft pick NaNs and convert to numeric where possible
df_merged['Pick'] = df_merged['Pick'].fillna("Undrafted")
df_merged['Pick_numeric'] = pd.to_numeric(df_merged['Pick'], errors='coerce')

# Create an interactive scatter plot
fig = px.scatter(
    df_merged, x="Pick_numeric", y="RPI", hover_name="Player",
    title="Draft Pick vs Average RPI Score",
    labels={"Pick_numeric": "Draft Pick", "RPI": "Avg RPI Score"},
    opacity=0.7
)

# Reverse the x-axis so pick numbers decrease from left to right (earliest picks on the right)
fig.update_layout(xaxis=dict(autorange="reversed"))

# Show the plot
fig.show()