# ECON 0150 | Replication Notebook

**Title:** Urban-Rural Polarization

**Original Authors:** Sobolewski, Teets

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** How has Democratic vote share changed across counties from 2000 to 2016, and do these trends point to widening urban-rural polarization?

**Data Source:** County presidential election results 2000-2024

**Methods:** OLS regression with an interaction term (log total votes × 2016 indicator)

**Main Finding:** The positive interaction term (0.019, p < 0.001) is consistent with widening polarization: between 2000 and 2016, larger (more urban) counties became relatively more Democratic while smaller (more rural) counties shifted more strongly Republican.

**Course Concepts Used:**
- OLS regression
- Log transformations
- Interaction terms
- Difference-in-differences logic

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
data_url = 'https://tayweid.github.io/econ-0150/projects/replications/0040/data/countypres_2000-2024.csv'
data = pd.read_csv(data_url)
data.head()

---
## Step 1 | Data Preparation

In [None]:
# Filter to Democratic votes and calculate vote share
democrat = data[data['party'] == 'DEMOCRAT'].copy()
democrat['d_share'] = democrat['candidatevotes'] / democrat['totalvotes']

# Keep only 2000 and 2016
df = democrat[democrat['year'].isin([2000, 2016])].copy()
df['now'] = (df['year'] == 2016).astype(int)
df['log_votes'] = np.log(df['totalvotes'])

print(f"Counties in 2000: {(df['now'] == 0).sum()}")
print(f"Counties in 2016: {(df['now'] == 1).sum()}")
df.head()
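
The prep cell above implicitly assumes that each county contributes one Democratic row per year and that `totalvotes` is positive, since `np.log(0)` evaluates to `-inf` and the vote share is undefined when the denominator is zero. A minimal sketch of that check on a made-up frame follows; the `county_fips` column name and the toy rows are assumptions for illustration, not values taken from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; column names mirror the notebook, plus an assumed county_fips id
toy = pd.DataFrame({
    "year": [2000, 2000, 2016, 2016],
    "county_fips": [1001, 1003, 1001, 1003],
    "party": ["DEMOCRAT"] * 4,
    "candidatevotes": [4000, 0, 5000, 9000],
    "totalvotes": [10000, 0, 12000, 30000],
})

# Drop rows where the share or the log would be undefined
valid = toy[toy["totalvotes"] > 0].copy()
valid["d_share"] = valid["candidatevotes"] / valid["totalvotes"]
valid["log_votes"] = np.log(valid["totalvotes"])

# Count rows per county-year; anything above 1 would mean double counting
rows_per_county_year = valid.groupby(["year", "county_fips"]).size()

print(len(toy) - len(valid), "rows dropped")
print("max rows per county-year:", rows_per_county_year.max())
```

Running the same two checks on `df` before fitting the model would show whether either assumption fails in the real data.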

---
## Step 2 | Visualization

In [None]:
# Democratic vote share vs county size (total votes) by year
g = sns.lmplot(data=df, x="totalvotes", y="d_share", hue="now",
               logx=True, scatter=False, ci=None, palette={0: "blue", 1: "red"})
plt.xscale("log")
# Relabel the 0/1 hue levels as years (g.legend is the public accessor for the legend)
g.legend.set_title("Year")
for t, l in zip(g.legend.texts, ["2000", "2016"]):
    t.set_text(l)
plt.xlabel("Total Votes (log scale)")
plt.ylabel("Democratic Vote Share")
plt.title("Urban-Rural Polarization: 2000 vs 2016")
plt.show()

---
## Step 3 | Statistical Analysis

In [None]:
# OLS with interaction term
model = smf.ols("d_share ~ log_votes + now + log_votes:now", data=df).fit()
print(model.summary().tables[1])
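
Written out, the formula passed to `smf.ols` corresponds to the specification below. The symbols $\beta_0$ through $\beta_3$ and the observation index $i$ are notation introduced here for illustration; they do not appear in the original project.

```latex
\begin{aligned}
\text{d\_share}_{i} &= \beta_0 + \beta_1\,\text{log\_votes}_{i} + \beta_2\,\text{now}_{i}
  + \beta_3\,(\text{log\_votes}_{i} \times \text{now}_{i}) + \varepsilon_{i} \\
\text{slope in 2000} &= \beta_1, \qquad \text{slope in 2016} = \beta_1 + \beta_3
\end{aligned}
```

Under this reading, $\beta_3$ measures how much the county-size gradient steepened between the two elections, and $\beta_2$ is the 2016 shift evaluated at `log_votes = 0`.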

In [None]:
# Residual plot
df['residuals'] = model.resid
plt.figure(figsize=(10, 6))
# Fixed random_state so the 1,000-row subsample (and the plot) is reproducible
sns.scatterplot(data=df.sample(1000, random_state=0), x="totalvotes", y="residuals", hue="now", alpha=0.3)
plt.xscale("log")
plt.axhline(0, color='r', linestyle='--')
plt.xlabel("Total Votes (log scale)")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()

---
## Step 4 | Results Interpretation

### Key Findings

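- **log_votes coefficient (2000):** 0.032 - in 2000, larger counties voted more Democratic: a one-unit increase in log votes (roughly a 2.7-fold increase in total votes) is associated with a 3.2 percentage point higher Democratic share
- **now coefficient:** -0.27 - the 2016 shift at `log_votes = 0` (a county with a single vote), so it is an extrapolation rather than the average shift; it indicates a move toward Republicans that must be read together with the interaction term
- **Interaction term:** +0.019 (p < 0.001) - the Republican shift was smaller in larger (more urban) counties, so the size gradient was steeper in 2016 (0.032 + 0.019 = 0.051)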

### Conclusion

The positive interaction confirms widening polarization: urban areas became relatively more Democratic while rural areas became more Republican between 2000 and 2016.
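
To see why "relatively" matters, the reported coefficients can be plugged into the implied 2000-to-2016 change at a given county size. This is a back-of-the-envelope sketch using the rounded values quoted above, so the outputs are approximate; the function name `implied_shift` is hypothetical.

```python
import math

# Rounded coefficients as reported in Key Findings
B_LOG_VOTES = 0.032    # size gradient in 2000
B_NOW = -0.27          # 2016 shift at log_votes = 0
B_INTERACTION = 0.019  # additional size gradient in 2016

def implied_shift(total_votes):
    """Predicted change in Democratic share from 2000 to 2016 at a fixed county size."""
    return B_NOW + B_INTERACTION * math.log(total_votes)

slope_2016 = B_LOG_VOTES + B_INTERACTION
print(f"2016 slope: {slope_2016:.3f}")

for votes in (10_000, 1_000_000):
    print(f"{votes:>9,} votes: shift = {implied_shift(votes):+.3f}")

# County size at which the implied shift would be zero
break_even = math.exp(-B_NOW / B_INTERACTION)
print(f"break-even size: about {break_even:,.0f} votes")
```

With these rounded inputs, the implied shift is about 9.5 percentage points toward Republicans at 10,000 votes and under one percentage point at 1,000,000 votes. In other words, the model predicts that nearly all counties moved Republican, with the largest counties moving least.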

---
## Replication Exercises

### Exercise 1: Different Years
Compare 2000 vs 2024 instead of 2000 vs 2016. Has polarization continued?

### Exercise 2: State Analysis
Filter to a single state and run the analysis. Does your state show the same pattern?

### Exercise 3: Swing States
Compare polarization trends in swing states vs non-swing states.

### Challenge Exercise
Calculate the predicted Democratic vote share for a county with 10,000 votes vs 1,000,000 votes in both 2000 and 2016. Quantify the divergence.
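
One possible pattern for this exercise is to build a small prediction grid and pass it to the fitted model's `predict` method. The sketch below uses synthetic stand-in data with made-up coefficients so that it runs on its own; `toy`, `toy_model`, `grid`, and `divergence` are hypothetical names, and the numbers it prints say nothing about the actual election results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the election data): assumed coefficients plus noise
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "log_votes": rng.uniform(6, 14, n),
    "now": rng.integers(0, 2, n),
})
toy["d_share"] = (0.10 + 0.03 * toy["log_votes"] - 0.25 * toy["now"]
                  + 0.02 * toy["log_votes"] * toy["now"]
                  + rng.normal(0, 0.05, n))

toy_model = smf.ols("d_share ~ log_votes + now + log_votes:now", data=toy).fit()

# Prediction grid: every combination of county size and year indicator
grid = pd.DataFrame({
    "totalvotes": [10_000, 10_000, 1_000_000, 1_000_000],
    "now": [0, 1, 0, 1],
})
grid["log_votes"] = np.log(grid["totalvotes"])
grid["predicted"] = toy_model.predict(grid)

# Large-minus-small gap within each year; the change in that gap is the divergence
wide = grid.pivot(index="totalvotes", columns="now", values="predicted")
gap = wide.loc[1_000_000] - wide.loc[10_000]
divergence = gap.loc[1] - gap.loc[0]
print(wide)
print("change in gap:", divergence)
```

Swapping `toy_model` for the notebook's `model` applies the same grid to the real fit. The change in the gap equals the interaction coefficient times `log(1,000,000 / 10,000)`, which gives a quick consistency check on the result.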

In [None]:
# Your code for exercises
