# ECON 0150 | Replication Notebook

**Title:** Teen Birth Rates

**Original Author:** Tessier

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** How do a state's economy and political leaning affect its teen birth rate?

**Data Source:** U.S. state-level data on GDP per capita, teen birth rates, and political affiliation (50 states)

**Methods:** OLS regression with multiple predictors

**Main Finding:** Both GDP per capita and political leaning predict teen birth rates. Holding GDP per capita constant, Republican states have about 4.65 more teen births per 1,000 (p = 0.001), and each $1,000 increase in GDP per capita is associated with 0.14 fewer teen births per 1,000 (p = 0.012).

**Course Concepts Used:**
- OLS regression with multiple predictors
- Categorical variables (political party)
- Testing for interaction effects
- Residual analysis

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
data_url = 'https://tayweid.github.io/econ-0150/projects/replications/0002/data/econ_final_data.csv'
data = pd.read_csv(data_url)

# Create GDP in thousands for easier interpretation
data['K_GDP'] = (data['GDP_Per_Cap'] / 1000).round(1)

print(f"Dataset has {len(data)} states")
data.head()

---
## Step 1 | Data Exploration

In [None]:
# Summary statistics by political party
print("Republican states (1):")
print(data[data['Republican'] == 1][['K_GDP', 'Birth_Rate']].describe())
print("\nDemocratic states (0):")
print(data[data['Republican'] == 0][['K_GDP', 'Birth_Rate']].describe())

In [None]:
# Boxplot of teen birth rates by political party
plt.figure(figsize=(8, 5))
sns.boxplot(data=data, x='Republican', y='Birth_Rate', 
            hue='Republican', palette={0: 'blue', 1: 'red'}, legend=False)
plt.xlabel('Republican (0=Dem, 1=Rep)')
plt.ylabel('Teen Birth Rate (per 1,000)')
plt.title('Teen Birth Rates by State Political Leaning')
plt.show()

---
## Step 2 | Visualization

In [None]:
# Scatter plot of the raw data, colored by political leaning (blue = 0, red = 1)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='K_GDP', y='Birth_Rate',
                hue='Republican', palette={0: 'blue', 1: 'red'}, legend=False, s=60)

# No regression lines yet: the fitted parallel lines are added in Step 3,
# after the model has been estimated
plt.xlabel('GDP Per Capita (Thousands of $)')
plt.ylabel('Teen Birth Rate (per 1,000)')
plt.title('Teen Birth Rates by GDP and Political Leaning')
plt.show()

---
## Step 3 | Statistical Analysis

In [None]:
# First, test for interaction effect
model_interaction = smf.ols('Birth_Rate ~ K_GDP + Republican + K_GDP:Republican', data=data).fit()
print("Model with interaction term:")
print(model_interaction.summary().tables[1])

In [None]:
# The interaction is not significant at the 5% level (p = 0.059), so use the parallel-lines model
model = smf.ols('Birth_Rate ~ K_GDP + Republican', data=data).fit()
print("\nModel without interaction (parallel lines):")
print(model.summary().tables[1])

In [None]:
# Visualization with fitted lines
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='K_GDP', y='Birth_Rate', 
                hue='Republican', palette={0: 'blue', 1: 'red'}, legend=False, s=60)

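# Add parallel regression lines using the fitted coefficients
# (approximately 19.39, -0.143, and 4.65; see Step 4)
x_range = np.array([40, 95])
b0, b_gdp, b_rep = model.params[['Intercept', 'K_GDP', 'Republican']]
# Democratic states (Republican=0)
y_dem = b0 + b_gdp * x_range
# Republican states (Republican=1): same slope, intercept shifted by b_rep
y_rep = (b0 + b_rep) + b_gdp * x_range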

plt.plot(x_range, y_dem, color='blue', linewidth=2, label='Democratic')
plt.plot(x_range, y_rep, color='red', linewidth=2, label='Republican')

plt.xlabel('GDP Per Capita (Thousands of $)')
plt.ylabel('Teen Birth Rate (per 1,000)')
plt.title('Teen Birth Rates by GDP and Political Leaning')
plt.legend()
plt.xlim(40, 95)
plt.ylim(0, 30)
plt.show()

In [None]:
# Residual plots
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.residplot(x=data['K_GDP'], y=model.resid, ax=axes[0])
axes[0].set_xlabel('GDP Per Capita (Thousands)')
axes[0].set_ylabel('Residuals')
axes[0].set_title('Residuals vs GDP')

axes[1].scatter(data['Republican'], model.resid)
axes[1].axhline(0, color='red', linestyle='--')
axes[1].set_xlabel('Republican')
axes[1].set_ylabel('Residuals')
axes[1].set_title('Residuals vs Political Party')

plt.tight_layout()
plt.show()

---
## Step 4 | Results Interpretation

### Key Findings

**Model Results (R² = 0.47):**
- **Intercept:** 19.39 (p < 0.001)
- **K_GDP:** -0.143 (p = 0.012) - Each $1,000 increase in GDP per capita is associated with 0.14 fewer teen births per 1,000, holding political leaning constant
- **Republican:** +4.65 (p = 0.001) - Republican states have 4.65 more teen births per 1,000 on average, holding GDP per capita constant
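
To make the coefficients concrete, here is a minimal sketch that plugs the rounded estimates listed above into the parallel-lines equation. The function name and the $60,000 example state are illustrative assumptions, not part of the original project.

```python
# Rounded coefficients as reported in the model results above
INTERCEPT = 19.39
B_K_GDP = -0.143      # change in teen births per 1,000 for each extra $1,000 of GDP per capita
B_REPUBLICAN = 4.65   # intercept shift for Republican states (Republican = 1)

def predicted_birth_rate(k_gdp, republican):
    """Fitted teen birth rate (per 1,000) from the parallel-lines model."""
    return INTERCEPT + B_K_GDP * k_gdp + B_REPUBLICAN * republican

# Hypothetical state with GDP per capita of $60,000 (K_GDP = 60)
dem = predicted_birth_rate(60, 0)
rep = predicted_birth_rate(60, 1)
print(f"Democratic: {dem:.2f}, Republican: {rep:.2f}, gap: {rep - dem:.2f}")
```

At the same GDP per capita the two predictions differ by exactly the Republican coefficient, which is what "parallel lines" means here.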

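**Interaction test:** The interaction term (GDP × Republican) was not significant at the 5% level (p = 0.059), so we cannot reject the hypothesis that both types of states share the same slope. The parallel-lines model is therefore used, although the evidence is borderline.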
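
For reference, the interaction model implies one GDP slope per group: the `K_GDP` coefficient for Democratic states, and `K_GDP` plus the interaction coefficient for Republican states. The sketch below shows that arithmetic. The helper name is hypothetical, the numbers are placeholders rather than the notebook's estimates, and the term name `K_GDP:Republican` assumes `Republican` is stored as a numeric 0/1 column.

```python
def group_slopes(params):
    """Return the GDP slope implied for each group by the interaction model."""
    base = params['K_GDP']               # slope when Republican = 0
    shift = params['K_GDP:Republican']   # additional slope when Republican = 1
    return {'Democratic': base, 'Republican': base + shift}

# Placeholder values for illustration only, NOT the estimates from this notebook
example_params = {'K_GDP': -0.10, 'K_GDP:Republican': -0.15}
slopes = group_slopes(example_params)
print(slopes)
```

In the notebook, the same helper should accept the fitted coefficients directly, for example `group_slopes(model_interaction.params)`.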

### Interpretation

After controlling for economic conditions (GDP per capita), Republican states still have significantly higher teen birth rates. This could reflect differences in:
- Sex education policies
- Access to contraception
- Cultural/religious factors
- Urbanization patterns

---
## Replication Exercises

### Exercise 1: Additional Controls
What other state-level variables might confound this relationship? (e.g., education levels, urbanization, religious affiliation)
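
One possible way to start is to build the regression formula from a list of predictors so that controls are easy to add or drop. This is only a sketch: `Pct_Bachelors` and `Pct_Urban` are hypothetical column names that do not exist in the course dataset and would have to be merged in from another source.

```python
def build_formula(outcome, predictors):
    """Assemble a formula string such as 'y ~ x1 + x2'."""
    return f"{outcome} ~ " + " + ".join(predictors)

base_predictors = ['K_GDP', 'Republican']
extra_controls = ['Pct_Bachelors', 'Pct_Urban']  # hypothetical columns, not in the dataset

formula = build_formula('Birth_Rate', base_predictors + extra_controls)
print(formula)
```

The resulting string can be passed to `smf.ols(formula, data=data)` once the extra columns are available.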

### Exercise 2: Identify Outliers
Which states have unusually high or low teen birth rates given their GDP and political leaning? Examine the residuals.
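
A rough starting point is to flag observations whose residual is more than about two standard deviations from zero. The sketch below uses made-up residuals and placeholder labels. In the notebook you would pass `model.resid` together with a state-name column, if the dataset has one (that is an assumption). The two-standard-deviation cutoff is a rule of thumb, not a formal test.

```python
import statistics

def flag_outliers(labels, residuals, threshold=2.0):
    """Return (label, standardized residual) pairs with |residual| > threshold * SD."""
    sd = statistics.stdev(residuals)
    flagged = [(label, r / sd) for label, r in zip(labels, residuals)
               if abs(r) > threshold * sd]
    # Largest deviations first
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up residuals for illustration only
labels = list('ABCDEFGHIJKL')
residuals = [0.5, -0.5, 1.0, -1.0, 0.8, -0.8, 0.3, -0.3, 0.6, -0.6, 8.0, -8.0]

outliers = flag_outliers(labels, residuals)
for label, z in outliers:
    print(f"{label}: standardized residual {z:.2f}")
```

A positive standardized residual means the teen birth rate is higher than the model predicts for that state's GDP and political leaning. A negative one means it is lower.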

### Exercise 3: Marginal Significance
The interaction term had p = 0.059. What would you conclude about whether the slopes differ? How does sample size affect this?
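
To build intuition for the sample-size question, the sketch below asks what the p-value would have been if the estimate and the noise level were identical but the sample were larger or smaller. It rests on two simplifying assumptions: a normal approximation to the t-distribution, and a standard error that shrinks in proportion to 1/sqrt(n). It is a back-of-the-envelope illustration, not a power analysis.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_from_p(p):
    """Invert two_sided_p by bisection (the p-value decreases as |z| grows)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if two_sided_p(mid) > p:
            lo = mid   # z too small, move up
        else:
            hi = mid   # z too large, move down
    return (lo + hi) / 2

def p_if_sample_scaled(p_observed, scale):
    """Approximate p-value if the sample size were multiplied by `scale`."""
    z = z_from_p(p_observed)
    return two_sided_p(z * math.sqrt(scale))

for scale in (0.5, 1.0, 2.0):
    print(f"n x {scale}: p is roughly {p_if_sample_scaled(0.059, scale):.3f}")
```

Under these assumptions, the same estimated difference in slopes would cross the 5% threshold with twice as many observations. With only 50 states, a p-value of 0.059 is therefore weak evidence that the slopes are equal.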

### Challenge Exercise
This is observational data. What are the threats to causal interpretation? Could the relationship be reversed (teen birth rates affecting state GDP)?

In [None]:
# Your code for exercises
