# 2025 NFL BIG DATA BOWL: Pre-snap to post-snap predictions
The National Football League (NFL) is back with another Big Data Bowl, where contestants use Next Gen Stats player tracking data to generate actionable, creative, and novel stats. Previous iterations have analyzed running backs, defensive backs, special teams, pass rush plays, and tackling, and have generated metrics that have been used on television and by NFL teams.

This year's competition turns to a new type of data -- what happens before the snap -- to generate creative insights and actionable predictions into what the offense or defense does after the snap.

## Description


NFL offenses have 40 seconds in which to run a play. That time begins with substitutions, as players run on and off the field until both teams' personnel are configured. It continues into the play call, where both the offensive and defensive units learn their formation and assignments. It ends with myriad strategic decisions by the 22 players on the field, including motion, shifts, and alignment changes, designed to both confuse the opponent and capitalize on any advantages.

In all that action prior to the snap, both teams likely divulge patterns in what players will do after the snap. This year's competition aims to identify just what those patterns are.

### Examples to consider

Your challenge is generating actionable, practical, and novel insights from player tracking data corresponding to pre-snap team and player tendencies. Examples include, but are not limited to:

- Play prediction (run vs. pass)
- Scheme prediction (blitzes, run fits, route combinations, etc.)
- Player prediction (pass patterns, blocking assignments, etc.)

Note that the above list is not exhaustive, and we encourage participants to be creative with their submissions.
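
As a rough illustration of the first example (run vs. pass), the sketch below computes a pass rate per offensive formation. It assumes the `isDropback` and `offenseFormation` columns that appear in the `plays.csv` listing later in this notebook, and it treats `isDropback` as the pass indicator, which is a simplifying assumption. The helper name `pass_rate_by_formation`, the formation labels, and the toy rows are hypothetical and are not real NFL data.

```python
import pandas as pd


def pass_rate_by_formation(plays: pd.DataFrame) -> pd.Series:
    """Return the share of plays that were dropbacks, per offensive formation."""
    # Assumption: a dropback is counted as a pass play, anything else as a run.
    is_pass = plays["isDropback"].astype(bool)
    # Mean of a boolean Series is the fraction of True values in each group.
    rates = is_pass.groupby(plays["offenseFormation"]).mean()
    return rates.sort_values(ascending=False)


# Made-up toy rows for illustration only (not taken from plays.csv).
toy_plays = pd.DataFrame(
    {
        "offenseFormation": [
            "SHOTGUN", "SHOTGUN", "SHOTGUN",
            "SINGLEBACK", "SINGLEBACK", "I_FORM",
        ],
        "isDropback": [True, True, False, False, True, False],
    }
)

rates = pass_rate_by_formation(toy_plays)
print(rates)
```

A per-formation rate like this is only a naive baseline. A real submission would presumably add pre-snap tracking features such as motion and alignment.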

## Evaluation

### Submission tracks

Participants will select one of three tracks in which to submit.

- Undergraduate track. Open only to groups or individuals composed entirely of undergraduate students. Verification may be required to prove eligibility.
- Metric track. Leverage pre-snap data to assess team or player performance and/or strategy to create a post-snap outcome. You may focus on offensive or defensive players, teams, or individuals. In general, the more narrow the focus, the better the paper.
- Coaching presentation track. Analyze and present data in a submission designed for coaches (e.g., a scouting report). We encourage participants interested in this track to partner with a coach (or current/former player), though this isn’t required.

All submissions must explicitly state which track they are submitting to, and participants may not submit to multiple tracks.

Note: This year, the Coaching Presentation Track requires submissions in the form of a slide presentation saved as a PDF. Please upload your PDF slides as a Kaggle Dataset, and ensure that the dataset is public before the submission deadline of January 6, 2025. In your submission form, please provide the URL of the Kaggle Dataset for review.

# IMPORT CODE LIBRARIES

In [1]:
# CONDA ENVIRONMENT
# conda env create -f nfl.yml

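%time   # CPU time (note: as a line magic with no statement this times nothing; %%time on the cell's first line would time the imports)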
# LOAD STANDARD LIBRARIES

import numpy as np
from numpy import nan
import pandas as pd


import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import matplotlib.colors as mcolors


#import chart_studio.plotly as py
import seaborn as sns


import plotly.express as px  
import plotly.io as pio
import plotly.graph_objects as go


import pickle
import os
import sys
from timeit import default_timer as timer
import datetime
import math



CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 2.86 µs


In [20]:
inputFile = './data/games.csv'
df = pd.read_csv(inputFile)
print('filename:', inputFile)
print('rows:', df.shape[0])
print('columns:', df.shape[1])
print(df.columns.tolist())

filename: ./data/games.csv
rows: 136
columns: 9
['gameId', 'season', 'week', 'gameDate', 'gameTimeEastern', 'homeTeamAbbr', 'visitorTeamAbbr', 'homeFinalScore', 'visitorFinalScore']
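
The score columns listed above can be combined into simple per-game features. The following is a minimal sketch, assuming only the `homeFinalScore` and `visitorFinalScore` columns shown in the output. The helper name `add_score_features`, the derived column names `homeMargin` and `totalPoints`, and the toy rows are hypothetical.

```python
import pandas as pd


def add_score_features(games: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `games` with home margin and total points added."""
    out = games.copy()  # leave the caller's DataFrame untouched
    out["homeMargin"] = out["homeFinalScore"] - out["visitorFinalScore"]
    out["totalPoints"] = out["homeFinalScore"] + out["visitorFinalScore"]
    return out


# Made-up toy rows for illustration only (not taken from games.csv).
toy_games = pd.DataFrame(
    {
        "gameId": [1, 2],
        "homeTeamAbbr": ["AAA", "BBB"],
        "visitorTeamAbbr": ["CCC", "DDD"],
        "homeFinalScore": [24, 10],
        "visitorFinalScore": [17, 13],
    }
)

toy_features = add_score_features(toy_games)
print(toy_features[["gameId", "homeMargin", "totalPoints"]])
```

On the real file, the same call would be `add_score_features(df)` directly after the `pd.read_csv` cell above.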


In [22]:
dataFiles = {}


# Walk ./data and print a row/column summary for every file found
for dirname, _, filenames in os.walk('./data'):
    for filename in filenames:
        # print("\n", os.path.join(dirname, filename), "\n")
        inputData = os.path.join(dirname, filename)
        df = pd.read_csv(inputData)
        print()
        print('filename:', filename)
        print('rows:', df.shape[0])
        print('columns:', df.shape[1])
        print(df.columns.tolist())
        print('\n*****\n')
        # dataFiles[filename] = df.shape[0]

# print(dataFiles)


filename: plays.csv
rows: 16124
columns: 50
['gameId', 'playId', 'playDescription', 'quarter', 'down', 'yardsToGo', 'possessionTeam', 'defensiveTeam', 'yardlineSide', 'yardlineNumber', 'gameClock', 'preSnapHomeScore', 'preSnapVisitorScore', 'playNullifiedByPenalty', 'absoluteYardlineNumber', 'preSnapHomeTeamWinProbability', 'preSnapVisitorTeamWinProbability', 'expectedPoints', 'offenseFormation', 'receiverAlignment', 'playClockAtSnap', 'passResult', 'passLength', 'targetX', 'targetY', 'playAction', 'dropbackType', 'dropbackDistance', 'passLocationType', 'timeToThrow', 'timeInTackleBox', 'timeToSack', 'passTippedAtLine', 'unblockedPressure', 'qbSpike', 'qbKneel', 'qbSneak', 'rushLocationType', 'penaltyYards', 'prePenaltyYardsGained', 'yardsGained', 'homeTeamWinProbabilityAdded', 'visitorTeamWinProbilityAdded', 'expectedPointsAdded', 'isDropback', 'pff_runConceptPrimary', 'pff_runConceptSecondary', 'pff_runPassOption', 'pff_passCoverage', 'pff_manZone']

*****


filename: tracking_week_