# ECON 0150 | Replication Notebook

**Title:** NBA Draft Position and Player Success

**Original Authors:** Burkardt; Gerardi; Shanken

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Does draft position predict NBA player success?

**Data Source:** NBA draft picks and Box Plus/Minus (BPM) statistics (2010-2025)

**Methods:** OLS regression of BPM on draft pick number

**Main Finding:** Negative relationship: players drafted earlier (lower pick numbers) tend to have higher BPM.

**Course Concepts Used:**
- Simple linear regression
- Scatter plots with regression lines
- Box plots by category
- Residual analysis

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0043/data/'

data = pd.read_csv(base_url + 'cleaned_data.csv')

print(f"Number of observations: {len(data)}")
data.head()

---
## Step 1 | Data Exploration

In [None]:
# Check column names
print("Columns:", data.columns.tolist())

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Pick', 'BPM']].describe())

In [None]:
# Average BPM by draft position groups
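# Bins are right-inclusive: (0, 5], (5, 14], (14, 30], (30, 60].
# Picks 1-14 are all lottery picks, so the labels separate the top 5 from the rest.
data['Pick_Group'] = pd.cut(data['Pick'], bins=[0, 5, 14, 30, 60],
                            labels=['Top 5 (1-5)', 'Rest of Lottery (6-14)',
                                    'Rest of First Round (15-30)', 'Second Round (31-60)'])
print("\nAverage BPM by Draft Position Group:")
# observed=True reports only groups that appear in the data and avoids a pandas FutureWarning
print(data.groupby('Pick_Group', observed=True)['BPM'].mean().round(2))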

---
## Step 2 | Visualization

In [None]:
# Scatter plot: Pick vs BPM
plt.figure(figsize=(10, 6))
sns.regplot(x='Pick', y='BPM', data=data, scatter_kws={'alpha': 0.5})
plt.title('Correlation between Draft Pick and Average BPM')
plt.xlabel('Draft Pick Number')
plt.ylabel('Box Plus/Minus (BPM)')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Box plot for top 10 picks
top_10 = data[data['Pick'] <= 10].copy()

plt.figure(figsize=(12, 6))
sns.boxplot(data=top_10, x='Pick', y='BPM', order=range(1, 11))
plt.title('BPM Distribution for Top 10 Draft Picks')
plt.xlabel('Draft Pick')
plt.ylabel('Box Plus/Minus (BPM)')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 3 | Statistical Analysis

In [None]:
# OLS Regression
X = data['Pick']
X = sm.add_constant(X)
y = data['BPM']

model = sm.OLS(y, X).fit()
print(model.summary())
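
For a single regressor, the OLS fit above has a closed form: the slope is the sample covariance of pick and BPM divided by the sample variance of pick, and the intercept makes the line pass through the point of means. The following is a minimal sketch on synthetic stand-in data (the arrays `pick` and `bpm` are simulated here, not the course dataset), cross-checked against `np.polyfit`:

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the course dataset):
# 60 picks with a noisy, downward-sloping BPM.
rng = np.random.default_rng(0)
pick = np.arange(1, 61, dtype=float)
bpm = 1.0 - 0.05 * pick + rng.normal(0.0, 1.5, size=pick.size)

# Closed-form simple-regression estimates.
slope = np.cov(pick, bpm, ddof=1)[0, 1] / np.var(pick, ddof=1)
intercept = bpm.mean() - slope * pick.mean()

# R-squared: share of the variation in BPM accounted for by the fitted line.
fitted = intercept + slope * pick
ss_res = np.sum((bpm - fitted) ** 2)
ss_tot = np.sum((bpm - bpm.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Cross-check against NumPy's degree-1 least-squares fit.
check_slope, check_intercept = np.polyfit(pick, bpm, deg=1)

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.3f}")
```

Running the same arithmetic on the notebook's `data['Pick']` and `data['BPM']` should reproduce the `const` and `Pick` coefficients reported by `model.summary()`.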

In [None]:
# Residual plot
fitted_vals = model.fittedvalues
residuals = model.resid

plt.figure(figsize=(10, 5))
sns.scatterplot(x=fitted_vals, y=residuals, alpha=0.5)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted Values')
plt.grid(True, alpha=0.3)
plt.show()
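
A residual plot is read by eye, so a rough numeric companion can help. OLS residuals always average to zero and are uncorrelated with the fitted values by construction, so the thing to look for is a change in spread or shape, not a trend in the mean. The sketch below uses simulated data and splits the residuals at the median fitted value; the two-halves split is an illustrative choice, not a formal heteroskedasticity test.

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the course dataset).
rng = np.random.default_rng(1)
pick = rng.integers(1, 61, size=300).astype(float)
bpm = 1.0 - 0.05 * pick + rng.normal(0.0, 2.0, size=pick.size)

# Fit BPM on pick and recover fitted values and residuals.
slope, intercept = np.polyfit(pick, bpm, deg=1)
fitted = intercept + slope * pick
residuals = bpm - fitted

# Both quantities are zero (up to rounding) for any OLS fit with an intercept.
mean_resid = residuals.mean()
corr_resid_fitted = np.corrcoef(fitted, residuals)[0, 1]

# Compare residual spread in the lower and upper halves of the fitted values.
median_fit = np.median(fitted)
low_spread = residuals[fitted <= median_fit].std(ddof=1)
high_spread = residuals[fitted > median_fit].std(ddof=1)
spread_ratio = high_spread / low_spread

print(f"mean residual: {mean_resid:.2e}")
print(f"corr(residuals, fitted): {corr_resid_fitted:.2e}")
print(f"spread ratio (upper half / lower half): {spread_ratio:.2f}")
```

A ratio far from 1 would suggest that the residual variance changes along the fitted line, which is worth a closer look before trusting the default standard errors.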

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: Draft pick does not predict BPM (beta = 0)")
print(f"\nIntercept: {model.params['const']:.4f}")
print(f"Pick coefficient: {model.params['Pick']:.4f}")
print("\nInterpretation:")
print("  Each position later in the draft is associated with")
# Report the signed coefficient so the sentence stays correct whatever its sign
print(f"  a {model.params['Pick']:+.3f} change in BPM")
print(f"\nR-squared: {model.rsquared:.3f}")
# .4g keeps very small p-values readable instead of printing 0.0000
print(f"P-value: {model.pvalues['Pick']:.4g}")

---
## Step 4 | Results Interpretation

### Key Findings

1. **Negative Relationship:** Players drafted earlier (lower pick numbers) tend to have higher BPM

2. **Draft Position Matters:** Being picked earlier in the draft predicts better statistical performance

3. **Large Variance:** Even at the same draft position, there's substantial variation in outcomes

### Why Does Draft Position Predict Success?

- **Scouting accuracy:** Teams are generally good at identifying talent
- **Self-fulfilling prophecy:** High picks get more playing time and development
- **Selection effects:** Best college players get drafted earlier

### Cautions

- **Survivorship bias:** Players who don't make the NBA aren't in the data
- **Playing time effects:** High picks get more opportunities to accumulate stats
- **Year effects:** Draft strength varies by year
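
One way to probe the year-effects caution is to subtract each draft class's mean from both variables before estimating the slope, so that differences in overall class strength cannot drive the result. The sketch below is run on simulated data and assumes a hypothetical `Year` column; the course dataset may name or store draft year differently.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, NOT the course dataset).
rng = np.random.default_rng(2)
n = 360
year_idx = rng.integers(0, 6, size=n)        # which of 6 draft classes
class_shift = rng.normal(0.0, 1.0, size=6)   # hypothetical class-strength shifts

df = pd.DataFrame({
    "Year": 2010 + year_idx,                 # hypothetical column name
    "Pick": rng.integers(1, 61, size=n).astype(float),
})
df["BPM"] = class_shift[year_idx] - 0.05 * df["Pick"] + rng.normal(0.0, 2.0, size=n)

# "Within" transformation: remove each draft class's mean from Pick and BPM.
class_means = df.groupby("Year")[["Pick", "BPM"]].transform("mean")
demeaned = df[["Pick", "BPM"]] - class_means

# Slope using only within-class variation, next to the pooled slope.
within_slope = (demeaned["Pick"] * demeaned["BPM"]).sum() / (demeaned["Pick"] ** 2).sum()
pooled_slope = np.polyfit(df["Pick"], df["BPM"], deg=1)[0]

print(f"pooled slope: {pooled_slope:.4f}")
print(f"within-year slope: {within_slope:.4f}")
```

If the two slopes are close, class strength is unlikely to explain the pooled relationship; a large gap would suggest that it matters.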

---
## Replication Exercises

### Exercise 1: Lottery vs Non-Lottery
Compare average BPM for lottery picks (1-14) vs non-lottery picks.
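
A possible starting point, sketched on simulated data in place of the course file: compute the two group means and a Welch t-statistic for their difference. The `Group` column and its labels are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, NOT the course dataset).
rng = np.random.default_rng(3)
df = pd.DataFrame({"Pick": rng.integers(1, 61, size=400)})
df["BPM"] = -0.05 * df["Pick"] + rng.normal(0.0, 2.0, size=len(df))

# Picks 1-14 are lottery picks; everything later is non-lottery.
df["Group"] = np.where(df["Pick"] <= 14, "Lottery", "Non-lottery")
stats = df.groupby("Group")["BPM"].agg(["mean", "var", "count"])

# Welch t-statistic: difference in means over its standard error,
# without assuming that the two groups share a variance.
lottery, other = stats.loc["Lottery"], stats.loc["Non-lottery"]
diff = lottery["mean"] - other["mean"]
se = np.sqrt(lottery["var"] / lottery["count"] + other["var"] / other["count"])
welch_t = diff / se

print(stats.round(3))
print(f"difference in mean BPM: {diff:.3f} (Welch t = {welch_t:.2f})")
```

Regressing BPM on a lottery dummy gives the same difference in means as its slope coefficient, which ties the exercise back to Step 3.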

### Exercise 2: Time Trends
Has the predictive power of draft position changed over time?
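
One way to approach this, sketched on simulated data, is to estimate the pick slope separately for an early and a late era and compare them. The `Year` column, the `Era` labels, and the 2016 cutoff are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, NOT the course dataset).
rng = np.random.default_rng(4)
n = 600
df = pd.DataFrame({
    "Year": rng.integers(2010, 2022, size=n),   # hypothetical draft-year column
    "Pick": rng.integers(1, 61, size=n).astype(float),
})
df["BPM"] = -0.05 * df["Pick"] + rng.normal(0.0, 2.0, size=n)


def pick_slope(group: pd.DataFrame) -> float:
    """Return the OLS slope of BPM on Pick within one group."""
    return float(np.polyfit(group["Pick"], group["BPM"], deg=1)[0])


# Split the sample into two eras at an arbitrary cutoff and fit each one.
df["Era"] = np.where(df["Year"] < 2016, "2010-2015", "2016-2021")
slopes = {era: pick_slope(group) for era, group in df.groupby("Era")}

for era, slope in slopes.items():
    print(f"{era}: slope = {slope:.4f}")
```

The simulated slope is the same in both eras by construction, so any gap in this output is sampling noise. With the real data, an interaction term between pick and era in the regression would give a formal test of whether the slopes differ.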

### Exercise 3: Position Effects
Does the relationship differ by player position?

### Challenge Exercise
Research the NBA draft literature. What factors (beyond draft position) predict NBA success?

In [None]:
# Your code for exercises

# Example: Lottery vs non-lottery
# data['Lottery'] = data['Pick'] <= 14
# print(data.groupby('Lottery')['BPM'].mean())