# ECON 0150 | Replication Notebook

**Title:** QB College Years and NFL Contract

**Original Authors:** Charles

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Does the number of years a quarterback plays in college predict the value of his largest NFL contract (used here as a proxy for NFL success)?

**Data Source:** NFL quarterback contract data (2016-2024 draft classes)

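**Methods:** OLS regression of the largest NFL contract (`Largest_Contract`, converted to millions of dollars as `Contract_Millions`) on years in college: `Contract_Millions ~ Years_in_College`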
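Written out, the model is a simple linear regression in which $\text{Contract}_i$ is quarterback $i$'s largest contract in millions of dollars, $\text{Years}_i$ is his number of years in college, and $\varepsilon_i$ is an error term. The reported p-value tests whether the slope $\beta_1$ is zero:

$$
\text{Contract}_i = \beta_0 + \beta_1\,\text{Years}_i + \varepsilon_i,
\qquad H_0:\ \beta_1 = 0 \quad \text{vs.} \quad H_1:\ \beta_1 \neq 0
$$

Under this model, $\beta_1$ is the expected difference in largest contract (in millions of dollars) between two QBs whose college careers differ by one year.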

**Main Finding:** No significant relationship between college years and NFL contract value (p = 0.637, R² = 0.007).

**Course Concepts Used:**
- Simple linear regression
- Scatter plots with regression lines
- Hypothesis testing
- Sports analytics

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0060/data/'

data = pd.read_csv(base_url + 'qb_contracts.csv')

print(f"Number of quarterbacks: {len(data)}")
data.head(10)

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("Columns:", data.columns.tolist())
print(f"\nData shape: {data.shape}")

In [None]:
# Convert contract to millions for easier interpretation
data['Contract_Millions'] = data['Largest_Contract'] / 1e6

# Drop any missing values
data = data.dropna()

print(f"\nCleaned data: {len(data)} observations")
data.head()
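Dividing by 1e6 only changes the units of the outcome. The sketch below uses simulated toy data (not the project's data; the column names `years`, `y_dollars`, `y_millions` and all numbers are made up for illustration) to show that rescaling divides the slope by exactly 1e6 while leaving the p-value and R² unchanged, so the conversion affects readability, not inference:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame({'years': rng.integers(1, 5, size=40)})
toy['y_dollars'] = 50e6 + 5e6 * toy['years'] + rng.normal(0, 80e6, size=40)
toy['y_millions'] = toy['y_dollars'] / 1e6

fit_dollars = smf.ols('y_dollars ~ years', data=toy).fit()
fit_millions = smf.ols('y_millions ~ years', data=toy).fit()

# The slope ratio should be 1e6 (up to floating-point error);
# p-values and R-squared should be identical across the two fits
print(fit_dollars.params['years'] / fit_millions.params['years'])
print(fit_dollars.pvalues['years'], fit_millions.pvalues['years'])
print(fit_dollars.rsquared, fit_millions.rsquared)
```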

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Years_in_College', 'Contract_Millions']].describe())

In [None]:
# Group by years in college
print("\nAverage Contract by Years in College:")
print(data.groupby('Years_in_College')['Contract_Millions'].agg(['mean', 'count']))
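Contract values tend to be right-skewed: one very large deal can pull a group's mean far above its typical value. A sketch that also reports the median and standard deviation per group is shown below on a small made-up table that reuses the notebook's column names (the numbers are illustrative only, not from the dataset):

```python
import pandas as pd

# Made-up values for illustration only (NOT the QB dataset)
toy = pd.DataFrame({
    'Years_in_College': [2, 2, 3, 3, 3, 4, 4, 4],
    'Contract_Millions': [4, 250, 6, 9, 400, 7, 20, 90],
})

# The median is less sensitive than the mean to a single very large contract
summary = toy.groupby('Years_in_College')['Contract_Millions'].agg(
    ['mean', 'median', 'std', 'count']
)
print(summary)
```

In the 3-year group above, the single 400 entry drives the mean above 100 while the median stays at 9. On the real data, replace `toy` with `data`.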

In [None]:
# Correlation
correlation = data['Years_in_College'].corr(data['Contract_Millions'])
print(f"\nCorrelation between college years and contract: {correlation:.3f}")
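In a regression with a single predictor, the R² reported in Step 4 equals the square of this Pearson correlation, so the project's R² of 0.007 corresponds to a correlation of roughly ±0.08 (√0.007 ≈ 0.084). A quick check of that identity on simulated toy data (illustrative only, not the QB dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(1)
toy = pd.DataFrame({'x': rng.integers(1, 5, size=34).astype(float)})
toy['y'] = 100 + 10 * toy['x'] + rng.normal(0, 120, size=34)

r = toy['x'].corr(toy['y'])                      # Pearson correlation
r2 = smf.ols('y ~ x', data=toy).fit().rsquared   # R-squared from OLS

print(f"r^2 = {r**2:.6f}, R^2 = {r2:.6f}")        # the two should match
print(f"sqrt(0.007) = {np.sqrt(0.007):.3f}")
```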

---
## Step 3 | Visualization

In [None]:
# Box plot: Contract by years in college
plt.figure(figsize=(10, 6))
sns.boxplot(x='Years_in_College', y='Contract_Millions', data=data)
plt.title('NFL Contract Value by Years Played in College')
plt.xlabel('Years in College')
plt.ylabel('Largest Contract ($ Millions)')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x='Years_in_College', y='Contract_Millions', data=data,
            scatter_kws={'alpha': 0.7, 's': 80})
plt.title('Years in College vs. NFL Contract Value')
plt.xlabel('Years in College')
plt.ylabel('Largest Contract ($ Millions)')
plt.grid(True, alpha=0.3)
plt.show()
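By default, `sns.regplot` draws a straight least-squares line (with a bootstrapped confidence band), so its slope should match the coefficient estimated in Step 4. The sketch below compares a degree-1 `np.polyfit`, the kind of fit the plotted line represents, with statsmodels on simulated toy data (illustrative only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(2)
x = rng.integers(1, 5, size=34).astype(float)
y = 100 + 10 * x + rng.normal(0, 120, size=34)

# Degree-1 polynomial fit = straight OLS line; returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, deg=1)

fit = smf.ols('y ~ x', data=pd.DataFrame({'x': x, 'y': y})).fit()
print(slope_np, fit.params['x'])
print(intercept_np, fit.params['Intercept'])
```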

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Contract_Millions ~ Years_in_College', data=data).fit()
print("OLS Regression: Contract_Millions ~ Years_in_College")
print(model.summary())
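The summary table also reports a 95% confidence interval for each coefficient. Because the slope test is two-sided at the 5% level, the slope is significant exactly when its 95% interval excludes zero. A sketch of pulling that interval out with `conf_int`, using simulated toy data in place of `data` (on the real data, use `model` directly):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(3)
toy = pd.DataFrame({'Years_in_College': rng.integers(1, 5, size=34)})
toy['Contract_Millions'] = 100 + rng.normal(0, 120, size=34)

model_toy = smf.ols('Contract_Millions ~ Years_in_College', data=toy).fit()

# 95% CI for each coefficient: column 0 is the lower bound, column 1 the upper
ci = model_toy.conf_int(alpha=0.05)
lower, upper = ci.loc['Years_in_College']
p = model_toy.pvalues['Years_in_College']

print(f"95% CI for slope: [{lower:.1f}, {upper:.1f}], p = {p:.3f}")
print("CI excludes 0:", not (lower <= 0 <= upper), "| p < 0.05:", p < 0.05)
```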

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
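print("\nNull Hypothesis: College years do not predict NFL contract (beta = 0)")
print("\nModel Results:")
print(f"  Intercept: ${model.params['Intercept']:.1f} million")
print(f"  Years coefficient: ${model.params['Years_in_College']:.1f} million per year")
print(f"  P-value: {model.pvalues['Years_in_College']:.3f}")
print(f"  R-squared: {model.rsquared:.3f}")

# Base the conclusion on the estimated p-value instead of hard-coding it
alpha = 0.05
if model.pvalues['Years_in_College'] < alpha:
    print(f"\nConclusion: REJECT null hypothesis at the {alpha:.0%} level")
    print("  Years in college significantly predicts NFL contract value")
else:
    print(f"\nConclusion: FAIL TO REJECT null hypothesis at the {alpha:.0%} level")
    print("  Years in college does NOT significantly predict NFL contract value")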
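The p-value above comes from a t-test on the slope: t = coefficient / standard error, compared against a t distribution with n - 2 degrees of freedom. A sketch that reproduces statsmodels' p-value by hand, on simulated toy data and using `scipy.stats` (an extra dependency not imported in Step 0):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(4)
toy = pd.DataFrame({'Years_in_College': rng.integers(1, 5, size=34)})
toy['Contract_Millions'] = 100 + 5 * toy['Years_in_College'] + rng.normal(0, 120, size=34)

fit = smf.ols('Contract_Millions ~ Years_in_College', data=toy).fit()

b = fit.params['Years_in_College']         # estimated slope
se = fit.bse['Years_in_College']           # its standard error
t_stat = b / se
df_resid = fit.df_resid                    # n - 2 with one predictor
p_manual = 2 * stats.t.sf(abs(t_stat), df_resid)   # two-sided p-value

print(f"t = {t_stat:.3f}, df = {df_resid:.0f}")
print(f"manual p = {p_manual:.4f}, statsmodels p = {fit.pvalues['Years_in_College']:.4f}")
```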

---
## Step 5 | Results Interpretation

### Key Findings

| Statistic | Value |
|-----------|-------|
| Years_in_College coefficient | ~$10.5M per additional year |
| P-value | 0.637 |
| R-squared | 0.007 |

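1. **No Significant Relationship:** The slope is not statistically significant (p = 0.637 > 0.05), so the data provide no evidence that years in college predicts NFL contract value. Failing to reject the null is not proof that no relationship exists.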

2. **Very Low R²:** College experience explains less than 1% of contract variance

3. **High Variability:** Major contracts occur across the full 1-4 year range of college experience.

### Why Doesn't College Experience Predict NFL Success?

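- **Physical traits:** Arm strength, accuracy, and mobility may matter more than experience
- **Game intelligence:** Football IQ develops at different rates for different players
- **Team context:** The supporting cast affects how a QB's performance, and therefore his value, is perceived
- **Draft position:** Earlier picks sign larger rookie contracts and may get more chances to start, regardless of college tenure
- **Small sample:** With only ~34 QBs, the estimate is imprecise and the test has little power to detect a modest relationship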
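To make the small-sample point concrete: with n observations, a two-sided test at the 5% level rejects only when |r| exceeds r_crit = t_crit / sqrt(t_crit² + n - 2), where t_crit is the 97.5th percentile of a t distribution with n - 2 degrees of freedom (this follows from inverting t = r·sqrt(n - 2) / sqrt(1 - r²)). A sketch assuming n = 34, the approximate sample size stated above, and using `scipy.stats`:

```python
import numpy as np
from scipy import stats

n = 34                      # approximate sample size stated above
df = n - 2                  # residual degrees of freedom with one predictor
t_crit = stats.t.ppf(0.975, df)

# Invert t = r * sqrt(df) / sqrt(1 - r^2) to get the smallest significant |r|
r_crit = t_crit / np.sqrt(t_crit**2 + df)
print(f"t_crit = {t_crit:.3f}, smallest significant |r| = {r_crit:.3f}")
```

With about 34 QBs, only correlations larger than roughly 0.34 in absolute value would reach significance, far above the roughly 0.08 implied by R² = 0.007.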

### Notable Examples

- **Patrick Mahomes (3 years):** $450M contract
- **Josh Allen (2 years):** $330M contract  
- **Baker Mayfield (4 years):** $100M contract

### Policy Implication

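These results suggest NFL teams should not heavily weight college tenure when evaluating QBs; other indicators of talent and potential appear to matter more. Because the estimate comes from a small sample, this is weak evidence rather than proof that experience is irrelevant.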

---
## Replication Exercises

### Exercise 1: Draft Position
Does draft position (1st overall vs later picks) predict contract value better than college years?
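One possible starting point, assuming the data has (or you add) a column with each QB's overall draft pick: `Draft_Pick` below is a hypothetical column name, and the toy data and its built-in relationship are arbitrary and say nothing about real QBs. The sketch compares R² from each single-predictor model and from a model with both predictors:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up toy data; 'Draft_Pick' is a HYPOTHETICAL column (overall pick number)
rng = np.random.default_rng(5)
n = 34
toy = pd.DataFrame({
    'Years_in_College': rng.integers(1, 5, size=n),
    'Draft_Pick': rng.integers(1, 200, size=n),
})
toy['Contract_Millions'] = 300 - 1.2 * toy['Draft_Pick'] + rng.normal(0, 60, size=n)

m_years = smf.ols('Contract_Millions ~ Years_in_College', data=toy).fit()
m_pick = smf.ols('Contract_Millions ~ Draft_Pick', data=toy).fit()
m_both = smf.ols('Contract_Millions ~ Years_in_College + Draft_Pick', data=toy).fit()

# Compare how much variation each specification explains
for name, m in [('years', m_years), ('pick', m_pick), ('both', m_both)]:
    print(f"{name:>5}: R^2 = {m.rsquared:.3f}")
```

Note that adding a predictor can never lower R², so also compare the coefficients' p-values (or adjusted R² via `rsquared_adj`) rather than raw R² alone.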

### Exercise 2: Top Contracts Only
Filter to QBs with contracts > $100M. Does the relationship change?
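A sketch of the filter-and-refit step on simulated toy data (illustrative only; on the real data, replace `toy` with `data`). Keep in mind that filtering on the outcome itself restricts its range, which can bias the estimated slope, typically toward zero, so a change here should be interpreted cautiously:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data for illustration only (NOT the QB dataset)
rng = np.random.default_rng(6)
toy = pd.DataFrame({'Years_in_College': rng.integers(1, 5, size=34)})
toy['Contract_Millions'] = rng.lognormal(mean=3.5, sigma=1.5, size=34)

# Keep only contracts above 100 million dollars
top = toy[toy['Contract_Millions'] > 100]
print(f"{len(top)} of {len(toy)} QBs have contracts > 100M")

# Refit only if enough rows (and distinct x values) remain to estimate a slope
if len(top) >= 3 and top['Years_in_College'].nunique() > 1:
    m_top = smf.ols('Contract_Millions ~ Years_in_College', data=top).fit()
    print(m_top.params, m_top.pvalues, sep='\n')
else:
    print("Too few top contracts to fit the regression")
```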

### Exercise 3: Era Effects
Split by draft year (pre-2020 vs 2020+). Has the relationship changed over time?
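One possible approach, assuming the data has a draft-year column: `Draft_Year` below is a hypothetical column name and the toy data are made up. Fitting a single model with an interaction term, rather than two separate regressions, gives a direct test of whether the slope differs between eras. QBs drafted recently may still be on rookie contracts, which is worth keeping in mind when comparing eras:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up toy data; 'Draft_Year' is a HYPOTHETICAL column name
rng = np.random.default_rng(7)
n = 34
toy = pd.DataFrame({
    'Years_in_College': rng.integers(1, 5, size=n),
    'Draft_Year': rng.integers(2016, 2025, size=n),   # 2016-2024 inclusive
})
toy['Contract_Millions'] = 100 + rng.normal(0, 120, size=n)

# Era indicator: 1 if drafted in 2020 or later, else 0
toy['Post2020'] = (toy['Draft_Year'] >= 2020).astype(int)

# 'a * b' expands to a + b + a:b; the a:b term tests whether the slope differs by era
m_era = smf.ols('Contract_Millions ~ Years_in_College * Post2020', data=toy).fit()
print(m_era.params)
print("p-value for slope difference:", m_era.pvalues['Years_in_College:Post2020'])
```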

### Challenge Exercise
Research the economics of NFL contracts. Why do QBs command such high salaries?

In [None]:
# Your code for exercises

# Example: Distribution of years in college
# print(data['Years_in_College'].value_counts().sort_index())