# ECON 0150 | Replication Notebook

**Title:** Income and Ticket Prices

**Original Author:** Gallup

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Does median household income have an impact on ticket prices?

**Data Source:** U.S. median household income and ticket price index (1999-2024)

**Methods:** OLS regression

**Main Finding:** Higher ticket prices are associated with higher median household income. Each 1-point increase in the ticket price index is associated with $129 higher median income (p < 0.001).

**Course Concepts Used:**
- OLS regression
- Time series data
- Merging datasets
- Line plots

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0010/data/'

ticket_data = pd.read_csv(base_url + 'ticket_data_99-24.csv')
income_data = pd.read_csv(base_url + 'median_household_income_99-24.csv')

print(f"Ticket data: {len(ticket_data)} years")
print(f"Income data: {len(income_data)} years")

---
## Step 1 | Data Preparation

In [None]:
# Preview data
print("Ticket prices:")
print(ticket_data.head())
print("\nIncome data:")
print(income_data.head())

In [None]:
# Merge datasets
data = pd.merge(ticket_data, income_data, on='observation_date')

# Parse date
data['observation_date'] = pd.to_datetime(data['observation_date'])
data['year'] = data['observation_date'].dt.year

print(f"Merged data: {len(data)} observations")
data.head()
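
A merge on a date key silently drops any date that appears in only one file. The sketch below shows one way to check for that before relying on the merged frame. The two small frames and their values are made up for illustration and only mimic the column names used above; `indicator` and `validate` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical toy frames standing in for the two course CSVs (values are made up)
ticket_toy = pd.DataFrame({
    'observation_date': ['1999-01-01', '2000-01-01', '2001-01-01'],
    'Average_Ticket_Price_Index': [100.0, 105.0, 110.0],
})
income_toy = pd.DataFrame({
    'observation_date': ['2000-01-01', '2001-01-01', '2002-01-01'],
    'Median_Household_Income': [50_000, 51_000, 52_000],
})

# An outer merge with indicator=True labels each row by where its date was found;
# validate='one_to_one' raises an error if a date repeats in either frame
check = pd.merge(ticket_toy, income_toy, on='observation_date',
                 how='outer', indicator=True, validate='one_to_one')

# Rows that would be dropped by the default inner merge
unmatched = check[check['_merge'] != 'both']
print(check['_merge'].value_counts())
print(f"Dates present in only one file: {len(unmatched)}")
```

If the unmatched count is zero on the real data, the default inner merge keeps every year.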

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
data[['Average_Ticket_Price_Index', 'Median_Household_Income']].describe()

In [None]:
# Correlation
correlation = data['Average_Ticket_Price_Index'].corr(data['Median_Household_Income'])
print(f"Correlation: {correlation:.3f}")

---
## Step 3 | Visualization

In [None]:
# Line plot: Ticket prices over time
plt.figure(figsize=(10, 5))
plt.plot(data['observation_date'], data['Average_Ticket_Price_Index'], color='red', marker='o')
plt.xlabel('Date')
plt.ylabel('Average Ticket Price Index')
plt.title('Ticket Price Index Over Time (1999-2024)')
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Line plot: Median income over time
plt.figure(figsize=(10, 5))
plt.plot(data['observation_date'], data['Median_Household_Income'], color='blue', marker='o')
plt.xlabel('Date')
plt.ylabel('Median Household Income ($)')
plt.title('Median Household Income Over Time (1999-2024)')
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Scatter plot: Income vs Ticket Prices
plt.figure(figsize=(10, 6))
sns.regplot(data=data, x='Average_Ticket_Price_Index', y='Median_Household_Income', 
            line_kws={'color': 'red'})
plt.xlabel('Average Ticket Price Index')
plt.ylabel('Median Household Income ($)')
plt.title('Median Household Income vs Ticket Price Index')
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS regression: Income ~ Ticket Price Index
model = smf.ols('Median_Household_Income ~ Average_Ticket_Price_Index', data=data).fit()
print(model.summary().tables[1])
print(f"\nR-squared: {model.rsquared:.3f}")

In [None]:
# Residual plot
plt.figure(figsize=(10, 5))
sns.residplot(data=data, x='Average_Ticket_Price_Index', y='Median_Household_Income')
plt.xlabel('Average Ticket Price Index')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.axhline(0, color='red', linestyle='--')
plt.show()

---
## Step 5 | Results Interpretation

### Key Findings

**Regression Results:**
- **Intercept:** ~$53,000 (p < 0.001)
- **Ticket Price Index coefficient:** ~$129 (p < 0.001)
- **R-squared:** ~0.58, meaning the ticket price index explains about 58% of the variation in median household income

### Interpretation

Each 1-point increase in the average ticket price index is associated with about $129 higher median household income. Both variables trend upward over the sample period, which matters for the caveats below.
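
To make the coefficient concrete, the fitted line can be evaluated by hand. This is a minimal sketch using the rounded values from the Key Findings; the index values 100 and 110 are illustrative assumptions, not observations from the data, and `fitted_income` is a helper name introduced here.

```python
# Approximate coefficients reported in the Key Findings (rounded)
intercept = 53_000   # dollars
slope = 129          # dollars per index point

def fitted_income(index_value):
    """Fitted median household income at a given ticket price index value."""
    return intercept + slope * index_value

# Illustrative index values, not taken from the data
low = fitted_income(100)
high = fitted_income(110)

print(f"Fitted income at index 100: ${low:,.0f}")
print(f"Fitted income at index 110: ${high:,.0f}")
print(f"Difference for a 10-point gap: ${high - low:,.0f}")
```

A 10-point gap in the index corresponds to 10 × $129 = $1,290 in fitted income, regardless of the starting index value.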

### Caveats

1. **Spurious correlation:** Both variables trend upward over time, so the correlation may be driven by time (inflation, economic growth) rather than a direct relationship.
2. **Causation unclear:** Higher incomes could drive ticket prices up (demand), or higher ticket prices could simply reflect a stronger economy.
3. **Time series issues:** The observations are not independent: each year is related to the previous year, so the reported OLS standard errors and p-values may be unreliable.
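
The first caveat can be illustrated with synthetic data. The sketch below builds two series that share nothing except an upward time trend, then compares their correlation before and after removing a linear trend. All numbers are simulated assumptions and `detrend` is a helper introduced here; none of this uses the course data.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1999, 2025)
t = years - years[0]

# Two synthetic series whose noise is independent; only the trend is shared
series_a = 100 + 3 * t + rng.normal(0, 3, size=t.size)
series_b = 50_000 + 1_000 * t + rng.normal(0, 1_500, size=t.size)

def detrend(y, t):
    """Return y minus its fitted linear time trend."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (intercept + slope * t)

levels_corr = np.corrcoef(series_a, series_b)[0, 1]
detrended_corr = np.corrcoef(detrend(series_a, t), detrend(series_b, t))[0, 1]

print(f"Correlation in levels:     {levels_corr:.3f}")
print(f"Correlation after detrend: {detrended_corr:.3f}")
```

A high correlation in levels that shrinks after detrending suggests the trend, rather than a direct link, is doing most of the work.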

---
## Replication Exercises

### Exercise 1: Inflation Adjustment
Convert both variables to real (inflation-adjusted) values. Does the relationship still hold?
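
As a starting point, a nominal series is usually deflated by dividing by a price index and multiplying by 100. The sketch below shows only the mechanics on made-up numbers: the `cpi` column and all values are hypothetical and not part of the course data files, so a real price index would need to be merged in first.

```python
import pandas as pd

# Hypothetical values: the 'cpi' column is NOT in the course data files
cpi_toy = pd.DataFrame({
    'year': [2000, 2010, 2020],
    'Median_Household_Income': [40_000, 50_000, 60_000],
    'cpi': [80.0, 100.0, 120.0],   # price level, base year 2010 = 100
})

# Real value = nominal value / price index * 100 (dollars of the base year)
cpi_toy['real_income'] = cpi_toy['Median_Household_Income'] / cpi_toy['cpi'] * 100
print(cpi_toy)
```

In this toy example nominal income rises by half, but real income is flat because prices rose at the same rate. The same division would apply to the ticket price index.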

### Exercise 2: First Differences
Calculate year-over-year changes in both variables. Is the change in ticket prices associated with the change in income?
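
One way to compute the changes is `Series.diff()`, sketched here on a made-up four-year frame. The values are hypothetical, and `d_ticket` and `d_income` are column names introduced for this example.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
diff_toy = pd.DataFrame({
    'year': [2019, 2020, 2021, 2022],
    'Average_Ticket_Price_Index': [100.0, 90.0, 105.0, 112.0],
    'Median_Household_Income': [60_000, 61_000, 63_500, 64_000],
}).sort_values('year')

# diff() subtracts the previous row, so the first year has no change value
diff_toy['d_ticket'] = diff_toy['Average_Ticket_Price_Index'].diff()
diff_toy['d_income'] = diff_toy['Median_Household_Income'].diff()

# Drop the first year, which has nothing to difference against
changes = diff_toy.dropna(subset=['d_ticket', 'd_income'])
print(changes[['year', 'd_ticket', 'd_income']])
```

On the real data, the same two columns could go into the regression formula in place of the levels, for example `'d_income ~ d_ticket'`.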

### Exercise 3: Lag Analysis
Does last year's income predict this year's ticket prices (or vice versa)?
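
A lagged column can be built with `Series.shift(1)`, which moves each value down one row so that a year lines up with the previous year's value. This is a sketch on hypothetical numbers; `income_lag1` is a column name introduced here.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
lag_toy = pd.DataFrame({
    'year': [2019, 2020, 2021, 2022],
    'Average_Ticket_Price_Index': [100.0, 90.0, 105.0, 112.0],
    'Median_Household_Income': [60_000, 61_000, 63_500, 64_000],
}).sort_values('year')

# shift(1) pairs each year with the previous year's income
lag_toy['income_lag1'] = lag_toy['Median_Household_Income'].shift(1)

# The first year has no previous value, so it drops out
lagged = lag_toy.dropna(subset=['income_lag1'])
print(lagged[['year', 'Average_Ticket_Price_Index', 'income_lag1']])
```

Sorting by year before shifting matters: `shift` works on row order, not on the year values themselves.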

### Challenge Exercise
Research what the "ticket price index" measures. What specific events might explain unusual years in the data?

In [None]:
# Your code for exercises
