# ECON 0150 | Replication Notebook

**Title:** GDP and Voter Turnout

**Original Author:** Perkins

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

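**Note:** The full replication requires a county-level voter turnout dataset that was not included in the original submission. The code below therefore demonstrates only the data preparation, using the available GDP data; the regression results in Step 3 are reported from the original analysis rather than re-estimated here.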

## About This Replication

**Research Question:** Does GDP have an effect on voter turnout?

**Data Source:** County-level GDP, voter turnout, and unemployment data (presidential election years 2004-2020)

**Methods:** OLS regression with year fixed effects and percent change variables

**Main Finding:** GDP percent change is not significantly associated with the change in voter turnout (p = 0.603). Year effects are significant: the change in turnout was notably higher in 2020 and lower in 2016.

**Course Concepts Used:**
- Panel data analysis
- First differences (percent changes)
- Year fixed effects
- Multiple regression

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
# Note: The original analysis required a voter_turnout.txt file that is not available
# This notebook demonstrates the GDP data preparation methodology
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0013/data/'

# Load GDP data
gdp_data = pd.read_csv(base_url + 'final_gdp.csv', encoding='latin1', low_memory=False)
print(f"GDP data: {len(gdp_data)} rows")
gdp_data.head()

---
## Step 1 | Data Preparation (GDP)

In [None]:
# Filter to total industry GDP
# (strip whitespace so trailing spaces in the source file do not break the match)
gdp_total = gdp_data[gdp_data['Description'].str.strip() == 'All industry total'].copy()

# Candidate year columns and the county identifier column
year_cols = [str(y) for y in range(2000, 2024)]
id_cols = ['GeoFIPS']

# Reshape wide to long
gdp_long = gdp_total.melt(
    id_vars=id_cols,
    value_vars=[c for c in year_cols if c in gdp_total.columns],
    var_name='YEAR',
    value_name='GDP'
)

# Convert to numeric
gdp_long['YEAR'] = pd.to_numeric(gdp_long['YEAR'], errors='coerce')
gdp_long['GDP'] = pd.to_numeric(gdp_long['GDP'], errors='coerce')

# Filter to presidential election years
pres_years = [2004, 2008, 2012, 2016, 2020]
gdp_pres = gdp_long[gdp_long['YEAR'].isin(pres_years)].copy()

print(f"GDP data (presidential years): {len(gdp_pres)} county-year observations")
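
To make the wide-to-long reshape above concrete, here is a minimal sketch on a made-up two-county table. The FIPS codes, GDP values, and the names `toy_wide` and `toy_long` are hypothetical and exist only for illustration; only the `melt` pattern mirrors the cell above.

```python
import pandas as pd

# Hypothetical wide table: one row per county, one column per year (values invented)
toy_wide = pd.DataFrame({
    'GeoFIPS': ['01001', '01003'],
    '2004': [100.0, 200.0],
    '2008': [110.0, 190.0],
})

# Melt to one row per county-year, as in the GDP reshape
toy_long = toy_wide.melt(id_vars=['GeoFIPS'], var_name='YEAR', value_name='GDP')

# Year labels come out of melt as strings, so convert them to numbers
toy_long['YEAR'] = pd.to_numeric(toy_long['YEAR'])

print(toy_long.sort_values(['GeoFIPS', 'YEAR']))
```

Each county now contributes one row per year, which is the shape that the grouped calculations in the next cell expect.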

In [None]:
# Clean FIPS codes first so each county has one consistent 5-digit identifier
gdp_pres['GeoFIPS'] = gdp_pres['GeoFIPS'].astype(str).str.replace('"', '').str.strip().str.zfill(5)

# Remove national and state-level rows (FIPS codes ending in 000)
gdp_pres = gdp_pres[~gdp_pres['GeoFIPS'].str.endswith('000')].copy()

# Calculate the change in GDP relative to the previous presidential election year
# (pct_change returns a fraction, e.g. 0.10 for a 10% increase)
gdp_pres = gdp_pres.sort_values(['GeoFIPS', 'YEAR'])
gdp_pres['pct_change_gdp'] = gdp_pres.groupby('GeoFIPS')['GDP'].pct_change()

# Drop rows where the change is undefined (e.g., each county's first election year)
gdp_pres = gdp_pres.dropna(subset=['pct_change_gdp'])

print(f"Final GDP data: {len(gdp_pres)} observations")
gdp_pres.head()
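
The cell above combines three steps: a within-county percent change, FIPS cleaning, and removal of aggregate rows. The following sketch walks through them on a tiny invented table so the intermediate values can be checked by hand. The raw FIPS strings, GDP figures, and the name `toy` are assumptions made for illustration and are not taken from the course data.

```python
import pandas as pd

# Hypothetical long table with messy FIPS codes (all values invented)
toy = pd.DataFrame({
    'GeoFIPS': [' "1001"', ' "1001"', ' "1001"', ' "1000"', ' "1000"'],
    'YEAR': [2004, 2008, 2012, 2004, 2008],
    'GDP': [100.0, 110.0, 99.0, 500.0, 550.0],
})

# Strip quotes and whitespace, then pad to five digits
toy['GeoFIPS'] = toy['GeoFIPS'].astype(str).str.replace('"', '').str.strip().str.zfill(5)

# Codes ending in 000 are aggregates (state or national), not counties
toy = toy[~toy['GeoFIPS'].str.endswith('000')].copy()

# Change relative to the previous election year within each county
toy = toy.sort_values(['GeoFIPS', 'YEAR'])
toy['pct_change_gdp'] = toy.groupby('GeoFIPS')['GDP'].pct_change()

# The first year per county has no prior value, so its change is NaN
toy = toy.dropna(subset=['pct_change_gdp'])

print(toy)
```

In this toy table the 2008 change is (110 - 100) / 100 = 0.10 and the 2012 change is (99 - 110) / 110 = -0.10, which shows that the values are fractions over a four-year election cycle rather than annual growth rates.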

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics by year
gdp_pres.groupby('YEAR')['pct_change_gdp'].describe()

In [None]:
# Average GDP change over time
avg_gdp = gdp_pres.groupby('YEAR')['pct_change_gdp'].mean().reset_index()

plt.figure(figsize=(10, 5))
plt.plot(avg_gdp['YEAR'], avg_gdp['pct_change_gdp'] * 100, marker='o')  # convert fraction to percent
plt.axhline(0, color='red', linestyle='--')
plt.xticks(avg_gdp['YEAR'])  # label only the election years
plt.xlabel('Year')
plt.ylabel('Average Change in GDP Since Previous Election (%)')
plt.title('Average County GDP Change Over Presidential Election Years')
plt.grid(True)
plt.show()

---
## Step 3 | Original Analysis Summary

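The original analysis merged the GDP data with voter turnout data at the county level and estimated the following model, where `data` is the merged county-year DataFrame (not reconstructed in this notebook because the turnout file is unavailable):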

```python
model = smf.ols(
    'pct_change_turnout ~ pct_change_gdp + unemployment_rate + YEAR_2012 + YEAR_2016 + YEAR_2020',
    data=data
).fit()
```
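
The formula refers to a merged `data` frame and to indicator columns such as `YEAR_2012`, neither of which is built in this notebook. The sketch below shows one way they could be constructed. It is not the original author's code: the `turnout` table, its `pct_change_turnout` values, and the choice of `GeoFIPS` and `YEAR` as merge keys are all assumptions, and every number is invented.

```python
import pandas as pd

# Hypothetical prepared GDP table (values invented)
gdp = pd.DataFrame({
    'GeoFIPS': ['01001', '01001', '01003', '01003'],
    'YEAR': [2008, 2012, 2008, 2012],
    'pct_change_gdp': [0.10, -0.02, 0.05, 0.03],
})

# Hypothetical turnout table; one county-year is deliberately missing
turnout = pd.DataFrame({
    'GeoFIPS': ['01001', '01001', '01003'],
    'YEAR': [2008, 2012, 2008],
    'pct_change_turnout': [0.04, -0.01, 0.02],
})

# An inner merge keeps only county-years present in both tables
data = gdp.merge(turnout, on=['GeoFIPS', 'YEAR'], how='inner')
data = data.sort_values(['GeoFIPS', 'YEAR']).reset_index(drop=True)

# 0/1 indicators for every year except the earliest, which becomes the baseline
year_dummies = pd.get_dummies(data['YEAR'], prefix='YEAR', drop_first=True).astype(int)
data = pd.concat([data, year_dummies], axis=1)

print(data)
```

With explicit dummies, each year coefficient is read as a difference from the omitted baseline year. An alternative that avoids building the columns by hand is to write `C(YEAR)` in the formula, which asks the formula interface to treat `YEAR` as categorical.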

### Original Results

| Variable | Coefficient | p-value |
|----------|-------------|----------|
| Intercept | -0.055 | 0.000 |
| pct_change_gdp | 0.005 | 0.603 |
| unemployment_rate | 0.005 | 0.000 |
| YEAR_2012 | -0.007 | 0.447 |
| YEAR_2016 | -0.049 | 0.000 |
| YEAR_2020 | 0.098 | 0.000 |

**R-squared:** 0.026

### Interpretation

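- GDP change is not significantly associated with voter turnout change (p = 0.603)
- The unemployment rate has a small, positive, statistically significant coefficient (0.005)
- Year effects are measured relative to the omitted baseline year (presumably 2008, the first year for which a change can be computed): the turnout change was lower in 2016 (-0.049) and higher in 2020 (0.098), while 2012 does not differ significantly from the baseline (p = 0.447)
- The model explains only 2.6% of the variation in turnout changes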

---
## Replication Exercises

### Exercise 1: Alternative Measures
Instead of percent changes, try using GDP levels. Does the relationship change?

### Exercise 2: Lagged Effects
Does the previous election cycle's GDP affect current turnout?
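
As a possible starting point, a lagged regressor can be built by shifting each county's series by one election cycle. The sketch below uses an invented panel; the column name `pct_change_gdp_lag1` and the name `panel` are hypothetical, and the values are made up.

```python
import pandas as pd

# Hypothetical county-year panel (values invented)
panel = pd.DataFrame({
    'GeoFIPS': ['01001'] * 3 + ['01003'] * 3,
    'YEAR': [2008, 2012, 2016] * 2,
    'pct_change_gdp': [0.10, -0.02, 0.06, 0.05, 0.03, 0.01],
})

# Sort so that shifting moves values forward in time within each county
panel = panel.sort_values(['GeoFIPS', 'YEAR'])

# Previous cycle's GDP change; the first year per county has no lag and becomes NaN
panel['pct_change_gdp_lag1'] = panel.groupby('GeoFIPS')['pct_change_gdp'].shift(1)

print(panel)
```

Grouping by `GeoFIPS` before shifting keeps one county's values from leaking into the next county's first row. Rows with a missing lag would need to be dropped before estimation, which shortens the panel by one election year.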

### Exercise 3: Regional Analysis
Do certain regions show stronger GDP-turnout relationships?

### Challenge Exercise
Obtain county-level voter turnout data (e.g., from MIT Election Data + Science Lab) and replicate the full analysis.

In [None]:
# Your code for exercises
