# ECON 0150 | Replication Notebook

**Title:** NFL Scoring and Weather

**Original Authors:** Olijar

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** How does NFL scoring relate to bad weather days?

**Data Source:** 2024 NFL game data including point totals, temperature, wind, and weather conditions

**Methods:** OLS regression: Point_Total ~ Bad_Weather (binary)

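**Main Finding:** Games played in bad weather average about 4.3 fewer total points than games played in good weather (p = 0.004). The difference is statistically significant, but a model with a single regressor describes an association, not a proven causal effect.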

**Course Concepts Used:**
- Binary independent variable
- Simple linear regression
- Box plots for group comparison
- Sports analytics
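
Because `Bad_Weather` only takes the values 0 and 1, the OLS fit reduces to a comparison of group means: the intercept is the good-weather mean, and the slope is the bad-weather mean minus the good-weather mean. The sketch below checks this with the textbook slope formula. The point totals are made up for illustration (they are not the project data), and only the standard library is used so the cell runs without the dataset.

```python
from statistics import mean

# Made-up point totals for illustration only (not the project data)
good = [51, 44, 47, 38, 55, 45]   # games with Bad_Weather = 0
bad = [41, 37, 48, 40]            # games with Bad_Weather = 1

x = [0] * len(good) + [1] * len(bad)
y = good + bad

# Textbook simple-regression formulas: slope = S_xy / S_xx
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"Intercept: {intercept:.2f}  (good-weather mean: {mean(good):.2f})")
print(f"Slope:     {slope:.2f}  (bad mean - good mean: {mean(bad) - mean(good):.2f})")
```

Each printed pair should agree, which is why the regression coefficient in Step 4 can be read directly as a difference in average point totals.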

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replication/replications/0052/data/'

data = pd.read_csv(base_url + 'Econ Data Analysis NFL Project.csv')

print(f"Number of games: {len(data)}")
data.head()

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("Columns:", data.columns.tolist())

In [None]:
# Clean and prepare data
# Create binary bad weather variable (1 = bad weather, 0 = good weather)
if 'Bad Weather' in data.columns:
    # .map gives a numeric column; unrecognized labels become NaN and are dropped below
    data['Bad_Weather'] = data['Bad Weather'].map({'No': 0, 'Yes': 1})
elif 'Bad_Weather' not in data.columns:
    # A constant placeholder would make the regression meaningless, so fail loudly
    raise KeyError("Expected a 'Bad Weather' or 'Bad_Weather' column")

# Use Point Total column
if 'Point Total' in data.columns:
    data['Point_Total'] = data['Point Total']

# Drop missing values
data = data.dropna(subset=['Point_Total', 'Bad_Weather'])

print(f"\nGames with bad weather: {int(data['Bad_Weather'].sum())}")
print(f"Games with good weather: {(data['Bad_Weather'] == 0).sum()}")
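
If a dataset had no ready-made weather flag, one could be derived from temperature and wind. The sketch below shows one possible rule. The project does not state how "bad weather" was defined, so the cutoffs (`cold_below`, `windy_above`), the function name, and the field names are all hypothetical and chosen purely for illustration.

```python
def bad_weather_flag(temp_f, wind_mph, cold_below=32.0, windy_above=15.0):
    """Return 1 if a game counts as bad weather under the ASSUMED cutoffs, else 0."""
    return int(temp_f < cold_below or wind_mph > windy_above)

# Made-up games; 'temp_f' and 'wind_mph' are hypothetical field names
games = [
    {"temp_f": 72, "wind_mph": 5},    # mild and calm
    {"temp_f": 28, "wind_mph": 8},    # below the assumed cold cutoff
    {"temp_f": 55, "wind_mph": 22},   # above the assumed wind cutoff
]

flags = [bad_weather_flag(g["temp_f"], g["wind_mph"]) for g in games]
print(flags)
```

Any such rule is a modeling choice, so it is worth re-running the regression under a few different cutoffs to see how sensitive the estimate is.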

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics by weather condition
print("Point Totals by Weather Condition:")
print(data.groupby('Bad_Weather')['Point_Total'].describe())

In [None]:
# Mean comparison
good_mean = data.loc[data['Bad_Weather'] == 0, 'Point_Total'].mean()
bad_mean = data.loc[data['Bad_Weather'] == 1, 'Point_Total'].mean()
print("\nMean Point Totals:")
print(f"Good weather: {good_mean:.1f}")
print(f"Bad weather: {bad_mean:.1f}")
print(f"Difference (good - bad): {good_mean - bad_mean:.1f} points")

---
## Step 3 | Visualization

In [None]:
# Box plot: Point totals by weather
plt.figure(figsize=(10, 6))
sns.boxplot(x='Bad_Weather', y='Point_Total', data=data)
plt.xticks([0, 1], ['Good Weather', 'Bad Weather'])
plt.title('2024 NFL Point Totals by Weather Condition')
plt.xlabel('Weather Condition')
plt.ylabel('Total Points Scored')
plt.grid(True, alpha=0.3)
plt.show()
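
To read the box plot numerically, it can help to compute the pieces a box plot draws: the quartiles, the whiskers under the usual 1.5 × IQR rule, and any points beyond them. This is a minimal sketch on made-up point totals. The helper `box_summary` is hypothetical, and the `inclusive` quantile method is used on the assumption that it matches the linear interpolation behind matplotlib's default box plot.

```python
from statistics import quantiles

def box_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Made-up point totals for illustration only (not the project data)
points = [38, 41, 44, 45, 47, 51, 55, 90]
summary = box_summary(points)
print(summary)
```

Here the 90-point game falls above the upper fence, so it would be drawn as an individual outlier rather than extending the whisker.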

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='Bad_Weather', y='Point_Total', alpha=0.5)

# Add regression line
model = smf.ols('Point_Total ~ Bad_Weather', data=data).fit()
plt.axline(xy1=(0, model.params['Intercept']), slope=model.params['Bad_Weather'], color='red')

plt.xticks([0, 1], ['Good Weather', 'Bad Weather'])
plt.title('NFL Point Totals by Weather')
plt.xlabel('Weather Condition')
plt.ylabel('Total Points')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Point_Total ~ Bad_Weather', data=data).fit()
print("OLS Regression: Point_Total ~ Bad_Weather")
print(model.summary().tables[1])

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print(f"\nNull Hypothesis: Weather does not affect NFL scoring (beta = 0)")
print(f"\nModel Results:")
print(f"  Intercept (good weather mean): {model.params['Intercept']:.2f} points")
print(f"  Bad weather effect: {model.params['Bad_Weather']:.2f} points")
print(f"  P-value: {model.pvalues['Bad_Weather']:.4f}")
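coef = model.params['Bad_Weather']
pval = model.pvalues['Bad_Weather']
direction = 'fewer' if coef < 0 else 'more'
print(f"\nInterpretation:")
print(f"  Games played in bad weather have {abs(coef):.1f} {direction} points on average")
print(f"  This difference {'is' if pval < 0.05 else 'is not'} statistically significant at the 5% level (p = {pval:.4f})")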
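
The p-value in the regression table comes from a t-statistic: the slope divided by its standard error. With a 0/1 regressor, that t-statistic is the same number as the classic pooled-variance two-sample t-statistic. The sketch below verifies the equivalence by hand on made-up point totals (not the project data), using only the standard library.

```python
from math import sqrt
from statistics import mean, variance

# Made-up point totals for illustration only (not the project data)
good = [51, 44, 47, 38, 55, 45]   # Bad_Weather = 0
bad = [41, 37, 48, 40]            # Bad_Weather = 1
n0, n1 = len(good), len(bad)
n = n0 + n1

# OLS pieces for Point_Total ~ Bad_Weather with a 0/1 regressor
intercept = mean(good)
slope = mean(bad) - mean(good)
residuals = [y - intercept for y in good] + [y - (intercept + slope) for y in bad]
resid_var = sum(e ** 2 for e in residuals) / (n - 2)   # residual variance
s_xx = n0 * n1 / n                                     # sum of squared deviations of x
se_slope = sqrt(resid_var / s_xx)
t_ols = slope / se_slope

# Pooled-variance two-sample t-statistic
pooled_var = ((n0 - 1) * variance(good) + (n1 - 1) * variance(bad)) / (n - 2)
t_pooled = slope / sqrt(pooled_var * (1 / n0 + 1 / n1))

print(f"t from OLS formulas:    {t_ols:.4f}")
print(f"t from pooled t-test:   {t_pooled:.4f}")
```

Both routes compare the t-statistic to a t distribution with n - 2 degrees of freedom, so they lead to the same p-value.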

---
## Step 5 | Results Interpretation

### Key Findings

| Variable | Coefficient (points) | P-value |
|----------|----------------------|---------|
| Intercept | ≈ 46.6 | < 0.001 |
| Bad Weather | ≈ -4.3 | 0.004 |

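1. **Significant Association:** Bad-weather games average about 4.3 fewer total points than good-weather games (p = 0.004). With no control variables, this is an association rather than a proven causal effect.

2. **Practical Significance:** `Point_Total` is the combined score of both teams, so a gap of about 4 points is larger than a field goal (3 points) and is big enough to matter for totals-based comparisons such as over/under lines.

3. **Good Weather Average:** Games played in good conditions average about 46-47 combined points (the intercept).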
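
Reading the table as a fitted equation makes the two group averages explicit. Writing $Y$ for `Point_Total` and $D$ for `Bad_Weather`, and using the rounded coefficients from the table (so the results are approximate):

$$
\begin{aligned}
\hat{Y} &= 46.6 - 4.3\,D \\
D = 0 \ (\text{good weather}): \quad \hat{Y} &= 46.6 \\
D = 1 \ (\text{bad weather}): \quad \hat{Y} &= 46.6 - 4.3 = 42.3
\end{aligned}
$$

So the model predicts roughly 42 combined points for a bad-weather game, compared with roughly 47 for a good-weather game.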

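### Why Might Weather Affect Scoring?

The regression does not test mechanisms. The following are plausible explanations, not findings from this analysis: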

- **Passing game:** Wind and rain make passing more difficult
- **Ball handling:** Wet conditions lead to more fumbles
- **Field conditions:** Slippery turf affects player performance
- **Kicking:** Wind affects field goals and punts
- **Strategy:** Teams may run more conservative offenses

### Sports Betting Implications

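- Weather is one factor that can be considered when setting over/under (total points) lines
- If lines do not fully adjust for weather, bad-weather games would go "under" more often; this analysis does not test that
- Wind may affect totals more than rain or cold, which Exercises 1 and 2 below can explore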

---
## Replication Exercises

### Exercise 1: Types of Bad Weather
Does rain affect scoring differently than cold? Break down the "bad weather" variable.
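
One possible starting point is to compare average point totals across weather categories against a "clear" baseline. In a regression on a categorical variable (for example a formula like `Point_Total ~ C(Weather_Type)`), each dummy coefficient estimates exactly such a difference from the reference group. The sketch below uses made-up rows, and the `condition` field and the `Weather_Type` column name are hypothetical; check the actual column names with `data.columns` first.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; 'condition' is a hypothetical field name
games = [
    {"condition": "Clear", "points": 48},
    {"condition": "Clear", "points": 45},
    {"condition": "Clear", "points": 51},
    {"condition": "Rain", "points": 41},
    {"condition": "Rain", "points": 39},
    {"condition": "Cold", "points": 44},
    {"condition": "Cold", "points": 40},
]

# Group point totals by weather condition
totals = defaultdict(list)
for g in games:
    totals[g["condition"]].append(g["points"])

# Differences from the baseline play the role of dummy-variable coefficients
baseline = mean(totals["Clear"])
for condition, pts in totals.items():
    diff = mean(pts) - baseline
    print(f"{condition:<6} n={len(pts)}  mean={mean(pts):.1f}  vs clear: {diff:+.1f}")
```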

### Exercise 2: Wind Analysis
Is there a linear relationship between wind speed and scoring?

### Exercise 3: Dome vs Outdoor
Compare scoring in dome stadiums vs outdoor stadiums.

### Challenge Exercise
Research sports analytics literature. What other factors affect NFL scoring?

In [None]:
# Your code for exercises

# Example: Distribution of point totals
# plt.figure(figsize=(10, 6))
# plt.hist(data['Point_Total'], bins=20, edgecolor='black')
# plt.xlabel('Total Points')
# plt.ylabel('Frequency')
# plt.title('Distribution of NFL Game Point Totals')
# plt.show()