A custom Expected Points Added (EPA) model for NFL play-by-play analysis.
This project builds an EPA model from scratch using machine learning to predict expected points for any game situation, then calculates EPA as the difference between expected points after and before each play.
pip install -r requirements.txt
python main.py train --seasons 2020 2021 2022 2023
python main.py calculate --season 2023 --output epa_results.csv

epa/
├── requirements.txt
├── README.md
├── src/
│ ├── __init__.py
│ ├── data_loader.py # Load and cache NFL play-by-play data
│ ├── feature_engineering.py # Create model features from raw data
│ ├── model.py # Train expected points model
│ └── epa_calculator.py # Calculate EPA for plays
├── models/ # Saved trained models
└── main.py # Entry point for training and analysis
- Expected Points (EP): For any game situation (down, distance, yard line), the model predicts the expected points that will be scored next (relative to the team with possession).
- EPA Calculation: `EPA = EP_after_play - EP_before_play`
  - A positive EPA means the play improved the team's scoring expectation
  - A negative EPA means the play hurt the team's scoring expectation
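
The EPA formula above can be sketched in a few lines. This is a minimal illustration, not the code in `src/epa_calculator.py`: the function name `play_epa` and the `possession_changed` flag are hypothetical. Because expected points are stated relative to the team with possession, the sketch assumes the post-play value has to be negated when the ball changes hands.

```python
def play_epa(ep_before: float, ep_after: float, possession_changed: bool = False) -> float:
    """Return EPA from the point of view of the team that ran the play.

    Assumption: ep_after is expressed relative to whichever team has the
    ball after the play, so it is negated on a change of possession.
    """
    if possession_changed:
        ep_after = -ep_after
    return ep_after - ep_before


# Illustrative numbers only, not model output.
gain = play_epa(ep_before=1.5, ep_after=2.75)
turnover = play_epa(ep_before=1.5, ep_after=2.0, possession_changed=True)

print(f"gain: {gain:+.2f}, turnover: {turnover:+.2f}")
```

With these made-up inputs the first play has a positive EPA and the turnover a negative one, matching the two bullets above.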
- `yardline_100`: Distance from opponent's end zone (0-100)
- `down`: Current down (1-4)
- `ydstogo`: Yards to first down
- `half_seconds_remaining`: Time remaining in half
- `score_differential`: Current score difference
- `posteam_timeouts_remaining`: Timeouts remaining
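
One way to keep these features in a consistent order before they reach the model is a small record type. The sketch below is an assumption about how that could look, not the contents of `src/feature_engineering.py`: `GameSituation`, `FEATURE_NAMES`, and `to_feature_row` are hypothetical names, and the range checks only enforce the bounds listed above.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class GameSituation:
    """One game state, using the feature names listed above."""
    yardline_100: int
    down: int
    ydstogo: int
    half_seconds_remaining: int
    score_differential: int
    posteam_timeouts_remaining: int

    def __post_init__(self):
        # Only the ranges stated in the feature list are checked here.
        if not 0 <= self.yardline_100 <= 100:
            raise ValueError("yardline_100 must be between 0 and 100")
        if self.down not in (1, 2, 3, 4):
            raise ValueError("down must be 1, 2, 3, or 4")


# Column order is taken from the field order of the dataclass.
FEATURE_NAMES = [f.name for f in fields(GameSituation)]


def to_feature_row(situation: GameSituation) -> list[float]:
    """Flatten a situation into a numeric row in FEATURE_NAMES order."""
    return [float(value) for value in astuple(situation)]


# 1st and 10 from the offense's own 25, start of a half, tie game.
row = to_feature_row(GameSituation(75, 1, 10, 1800, 0, 3))
print(dict(zip(FEATURE_NAMES, row)))
```

Deriving the column order from the dataclass means training and prediction cannot silently disagree about which column is which.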
- Player evaluation (QB EPA, receiver EPA, etc.)
- Team performance analysis
- Game predictions and win probability
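
As a rough illustration of the player-evaluation use case, per-play EPA can be averaged by player. The sketch assumes each play is a dict with an `epa` value and a `passer` name; those keys are hypothetical and may not match the columns in `epa_results.csv`. The numbers are made up.

```python
from collections import defaultdict


def mean_epa_by(plays, key):
    """Average EPA per distinct value of `key`, skipping plays where it is missing."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum of EPA, play count]
    for play in plays:
        name = play.get(key)
        if name is None:
            continue
        totals[name][0] += play["epa"]
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Made-up plays for illustration; real rows would come from the calculate step.
plays = [
    {"passer": "QB A", "epa": 0.5},
    {"passer": "QB A", "epa": -0.25},
    {"passer": "QB B", "epa": 1.0},
    {"passer": None, "epa": -0.5},  # e.g. a run play with no passer
]

qb_epa = mean_epa_by(plays, "passer")
print(qb_epa)
```

The same function works for team-level analysis by passing a different key, such as a team column.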