# ECON 0150 | Replication Notebook

**Title:** ACC Football Attendance and Winning

**Original Authors:** Borger

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Is higher average attendance for an ACC football team associated with a higher winning percentage?

**Data Source:** ACC Football team attendance and winning percentage data (2024 season, 17 teams)

**Methods:** OLS regression: Winning_Pct ~ Attendance

**Main Finding:** No statistically significant relationship between attendance and winning percentage (p = 0.52, R² = 0.028).

**Course Concepts Used:**
- Simple linear regression
- Scatter plots with regression lines
- Bar charts
- Sports analytics

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replication/replications/0064/data/'

data = pd.read_csv(base_url + 'acc_football.csv')

print(f"Number of teams: {len(data)}")
data

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("Columns:", data.columns.tolist())
print(f"\nData shape: {data.shape}")

In [None]:
# Convert attendance to thousands for easier interpretation
data['Attendance_Thousands'] = data['Attendance'] / 1000

# Report the missing-value count instead of asserting there are none
print(f"\nData prepared. Missing values: {data.isna().sum().sum()}")
data.head()

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Attendance', 'Winning_Pct']].describe())

In [None]:
# Correlation
correlation = data['Attendance'].corr(data['Winning_Pct'])
print(f"\nCorrelation between attendance and winning percentage: {correlation:.4f}")
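In a one-predictor regression, the correlation printed above and the R² from the OLS fit carry the same information: R² is the square of Pearson's r, and the t-statistic on the slope can be recovered from r and the sample size. The sketch below works through that arithmetic using only the figures reported in this notebook (R² = 0.028, 17 teams); it does not reload the data, and the variable names are illustrative.

```python
import math

# Values reported in this notebook (not recomputed from the raw data)
r_squared = 0.028   # R-squared from the OLS summary
n_teams = 17        # number of ACC teams in the sample

# In a one-predictor OLS model, R-squared is the square of Pearson's r.
# The reported slope is positive, so r takes the positive root.
r = math.sqrt(r_squared)

# t-statistic for testing r = 0 (the same as the slope's t-statistic)
df = n_teams - 2
t_stat = r * math.sqrt(df / (1 - r_squared))

print(f"r = {r:.3f}, t = {t_stat:.3f} on {df} degrees of freedom")
```

A t-statistic this far below 2 is consistent with the large p-value reported for the attendance coefficient.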

---
## Step 3 | Visualization

In [None]:
# Bar chart: Attendance by team
plt.figure(figsize=(14, 6))
data_sorted = data.sort_values('Attendance', ascending=False)
plt.bar(data_sorted['Team'], data_sorted['Attendance_Thousands'])
plt.xticks(rotation=45, ha='right')
plt.ylabel('Average Attendance (Thousands)')
plt.title('ACC Football - Average Attendance Per Game (2024)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Bar chart: Winning percentage by team
plt.figure(figsize=(14, 6))
data_sorted = data.sort_values('Winning_Pct', ascending=False)
plt.bar(data_sorted['Team'], data_sorted['Winning_Pct'])
plt.axhline(0.5, linestyle='--', color='red', label='.500 (Even record)')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Winning Percentage')
plt.title('ACC Football - Winning Percentage by Team (2024)')
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x='Attendance_Thousands', y='Winning_Pct', data=data,
            scatter_kws={'s': 80, 'alpha': 0.7})
plt.title('Correlation Between Attendance and Winning Percentage (ACC 2024)')
plt.xlabel('Average Attendance (Thousands)')
plt.ylabel('Winning Percentage')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Winning_Pct ~ Attendance', data=data).fit()
print("OLS Regression: Winning_Pct ~ Attendance")
print(model.summary())

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print(f"\nNull Hypothesis: Attendance does not predict winning (beta = 0)")
print(f"\nModel Results:")
print(f"  Intercept: {model.params['Intercept']:.3f}")
print(f"  Attendance coefficient: {model.params['Attendance']:.2e}")
print(f"  P-value: {model.pvalues['Attendance']:.3f}")
print(f"  R-squared: {model.rsquared:.3f}")
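# Derive the conclusion from the p-value (5% level) instead of hard-coding it
significant = model.pvalues['Attendance'] < 0.05
print(f"\nConclusion: {'REJECT' if significant else 'FAIL TO REJECT'} null hypothesis")
print(f"  Attendance {'DOES' if significant else 'does NOT'} significantly predict winning percentage")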

---
## Step 5 | Results Interpretation

### Key Findings

| Statistic | Value |
|-----------|-------|
| Coefficient (per additional attendee) | 2.09e-06 |
| P-value | 0.52 |
| R-squared | 0.028 |

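1. **No Significant Relationship:** Attendance does not significantly predict winning percentage (p = 0.52)

2. **Very Low R²:** Attendance explains less than 3% of the variance in winning percentage

3. **Small effect size:** The coefficient is 2.09e-06 per attendee, or about 0.02 (2 percentage points) per 10,000 additional fans, and it is not statistically distinguishable from zero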
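The coefficient looks tiny mainly because attendance is measured in individual fans. A quick unit conversion makes the magnitude easier to read. This is a minimal sketch that starts from the reported coefficient rather than the fitted model, so the variable names are illustrative.

```python
# Slope reported in the Key Findings table (change in Winning_Pct per one attendee)
coef_per_attendee = 2.09e-06

# Rescaling the predictor rescales the slope by the same factor;
# the p-value and R-squared are unaffected by a change of units.
coef_per_thousand = coef_per_attendee * 1_000
coef_per_ten_thousand = coef_per_attendee * 10_000

print(f"Per 1,000 fans:  {coef_per_thousand:.5f}")
print(f"Per 10,000 fans: {coef_per_ten_thousand:.4f}")
print(f"In percentage points per 10,000 fans: {coef_per_ten_thousand * 100:.2f}")
```

Regressing on `Attendance_Thousands` instead of `Attendance` would report the per-1,000 slope directly.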

### Causality Question

Even if there were a relationship, the direction of causality would be unclear:
- Do fans attend because the team wins? (Winning → Attendance)
- Does high attendance help teams win? (Attendance → Winning)

### Why Might There Be No Relationship?

- **Stadium size:** Some schools have larger stadiums regardless of success
- **Fan base:** Traditional programs draw fans even in bad years
- **Location:** Urban schools may have smaller stadiums but good teams
- **Small sample:** Only 17 teams in one season, which leaves little statistical power to detect a modest effect
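To put the small-sample point in numbers, the sketch below computes how strong a correlation would have to be to reach significance at the 5% level with 17 teams. It assumes a two-sided test and uses the standard table value 2.131 for the t distribution with 15 degrees of freedom; the variable names are illustrative.

```python
import math

n_teams = 17          # sample size reported in this notebook
df = n_teams - 2      # degrees of freedom in a one-predictor regression
t_crit = 2.131        # two-sided 5% critical value of t with 15 df (standard table value)

# Smallest |r| that would reach significance: solve t = r * sqrt(df / (1 - r^2)) for r
r_crit = t_crit / math.sqrt(t_crit**2 + df)
r2_crit = r_crit**2

print(f"Minimum |r| for p < 0.05:       {r_crit:.3f}")
print(f"Minimum R-squared for p < 0.05: {r2_crit:.3f}")
print(f"Reported R-squared:             {0.028:.3f}")
```

With this sample size, attendance would need to explain roughly a quarter of the variance in winning percentage before the test could detect it, so a weak true relationship cannot be ruled out.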

### Counter-examples in the Data

- **SMU:** High winning pct (0.786), low attendance (32,652)
- **Florida State:** Low winning pct (0.167), high attendance (53,479)

### Note on Expanded ACC

This is the first season with 17 football teams, after Cal and Stanford joined from the Pac-12 and SMU joined from the American Athletic Conference.

---
## Replication Exercises

### Exercise 1: Historical Analysis
Collect data for previous years. Does attendance predict winning when pooling across years?

### Exercise 2: Revenue Analysis
If ticket prices were available, analyze total revenue vs. winning.

### Exercise 3: Lagged Effects
Does last year's attendance predict this year's winning (recruiting effect)?
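One possible starting point for the lagged analysis is to build a prior-season attendance column within each team. The sketch below assumes a multi-season table with a `Season` column, which the 2024 file does not contain. The team labels and numbers are made up purely for illustration, and `Attendance_Lag1` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical two-season panel; values are invented for illustration only
panel = pd.DataFrame({
    'Team': ['Team A', 'Team A', 'Team B', 'Team B'],
    'Season': [2023, 2024, 2023, 2024],
    'Attendance': [40000, 42000, 55000, 53000],
    'Winning_Pct': [0.50, 0.60, 0.40, 0.35],
})

# Sort so each team's seasons are in chronological order, then shift within team
panel = panel.sort_values(['Team', 'Season'])
panel['Attendance_Lag1'] = panel.groupby('Team')['Attendance'].shift(1)

# The first season for each team has no prior year, so drop those rows
lagged = panel.dropna(subset=['Attendance_Lag1'])
print(lagged[['Team', 'Season', 'Attendance_Lag1', 'Winning_Pct']])
```

With real multi-season data, `lagged` could then be passed to `smf.ols('Winning_Pct ~ Attendance_Lag1', data=lagged)` in the same way as the main regression.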

### Challenge Exercise
Compare ACC to other conferences. Is the relationship different in the SEC or Big Ten?

In [None]:
# Your code for exercises

# Example: Teams with best attendance
# print(data.nlargest(5, 'Attendance')[['Team', 'Attendance', 'Winning_Pct']])