# ECON 0150 | Replication Notebook

**Title:** Unemployment and Opiate Overdose

**Original Authors:** Burdge

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** What is the relationship between unemployment rates and opioid overdose death rates in the US?

**Data Source:** FRED unemployment rate (UNRATE) and CDC opioid overdose death rates (1999-2020)

**Methods:** OLS regression: Opioid_Deaths_per_100k ~ Unemployment_Rate

**Main Finding:** No significant relationship between unemployment and opioid deaths (p = 0.842, R² = 0.002).

**Course Concepts Used:**
- Time series data
- Data merging
- Simple linear regression
- Dual-axis time series plots
- Null result interpretation

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0065/data/'

data = pd.read_csv(base_url + 'cleaned_data.csv')

print(f"Number of years: {len(data)}")
data.head()

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("Columns:", data.columns.tolist())
print(f"\nYears covered: {data['Year'].min()} - {data['Year'].max()}")

In [None]:
# Rename columns for clarity if needed
if 'Unemployment_Rate' not in data.columns:
    data = data.rename(columns={'UNRATE': 'Unemployment_Rate'})
if 'Opioid_Deaths_per_100k' not in data.columns:
    for col in data.columns:
        if 'opioid' in col.lower() or 'death' in col.lower():
            data = data.rename(columns={col: 'Opioid_Deaths_per_100k'})
            break

# Drop any missing values
data = data.dropna(subset=['Unemployment_Rate', 'Opioid_Deaths_per_100k'])

print(f"\nCleaned data: {len(data)} observations")
data.head()
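
The notebook loads an already-merged `cleaned_data.csv`, so the data-merging step listed under Course Concepts is not shown. The following is a minimal sketch of how such a file could be assembled, assuming the unemployment series arrives monthly and is averaged to one value per year before joining on `Year`. The toy values, the `DATE` column name, and the intermediate names (`monthly`, `deaths`, `annual`, `merged`) are illustrative assumptions, not the original author's pipeline.

```python
import pandas as pd

# Toy monthly unemployment values (illustrative only, not real FRED data)
monthly = pd.DataFrame({
    "DATE": pd.date_range("2001-01-01", periods=24, freq="MS"),
    "UNRATE": [4.0] * 12 + [6.0] * 12,
})

# Toy annual death rates (illustrative only, not real CDC data)
deaths = pd.DataFrame({
    "Year": [2001, 2002],
    "Opioid_Deaths_per_100k": [3.0, 4.0],
})

# Collapse monthly rates to annual averages
annual = (
    monthly.assign(Year=monthly["DATE"].dt.year)
    .groupby("Year", as_index=False)["UNRATE"].mean()
    .rename(columns={"UNRATE": "Unemployment_Rate"})
)

# Inner join keeps only years present in both sources;
# validate raises if either side has duplicate years
merged = annual.merge(deaths, on="Year", how="inner", validate="one_to_one")
print(merged)
```

Using `validate="one_to_one"` is a cheap guard against accidentally duplicated years, which would otherwise inflate the number of observations.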

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Unemployment_Rate', 'Opioid_Deaths_per_100k']].describe())

In [None]:
# Correlation
correlation = data['Unemployment_Rate'].corr(data['Opioid_Deaths_per_100k'])
print(f"\nCorrelation between unemployment and opioid deaths: {correlation:.4f}")
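
In a regression with a single predictor, the squared Pearson correlation equals the model's R², so the correlation printed here and the R² reported in Step 4 carry the same information (an R² of 0.002 corresponds to |r| of about 0.045). The sketch below checks that identity on synthetic data. The random series and variable names are assumptions for illustration only, and `np.polyfit` stands in for `smf.ols` to keep the example dependency-light.

```python
import numpy as np

# Synthetic data, illustrative only (22 points to mirror 22 annual observations)
rng = np.random.default_rng(0)
x = rng.normal(size=22)
y = rng.normal(size=22)

# Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]

# Fit y = intercept + slope * x by least squares (polyfit returns highest degree first)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# R-squared = 1 - SSR / SST; OLS residuals have mean zero, so a variance ratio works
r_squared = 1 - residuals.var() / y.var()

print(f"r squared: {r**2:.6f}")
print(f"R-squared: {r_squared:.6f}")
```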

---
## Step 3 | Visualization

In [None]:
# Time series plot: Both variables
fig, ax1 = plt.subplots(figsize=(12, 6))

ax1.plot(data['Year'], data['Unemployment_Rate'], 'b-', linewidth=2, label='Unemployment Rate')
ax1.set_xlabel('Year')
ax1.set_ylabel('Unemployment Rate (%)', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

ax2 = ax1.twinx()
ax2.plot(data['Year'], data['Opioid_Deaths_per_100k'], 'r-', linewidth=2, label='Opioid Deaths')
ax2.set_ylabel('Opioid Overdose Deaths per 100k', color='red')
ax2.tick_params(axis='y', labelcolor='red')

plt.title('U.S. Unemployment Rate vs. Opioid Overdose Deaths (1999-2020)')
fig.tight_layout()
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x='Unemployment_Rate', y='Opioid_Deaths_per_100k', data=data,
            scatter_kws={'s': 80, 'alpha': 0.7})

# Label key years
for year in [2008, 2009, 2020]:
    row = data[data['Year'] == year]
    if len(row) > 0:
        plt.annotate(str(year), 
                    (row['Unemployment_Rate'].iloc[0], row['Opioid_Deaths_per_100k'].iloc[0]),
                    fontsize=10, fontweight='bold')

plt.title('Unemployment Rate vs. Opioid Overdose Deaths')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Opioid Deaths per 100,000')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Opioid_Deaths_per_100k ~ Unemployment_Rate', data=data).fit()
print("OLS Regression: Opioid_Deaths_per_100k ~ Unemployment_Rate")
print(model.summary())

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print(f"\nResearch Hypothesis: Higher unemployment leads to more opioid deaths")
print(f"Null Hypothesis: No relationship (beta = 0)")
print(f"\nModel Results:")
print(f"  Intercept: {model.params['Intercept']:.2f} deaths per 100k")
print(f"  Unemployment coefficient: {model.params['Unemployment_Rate']:.3f}")
print(f"  P-value: {model.pvalues['Unemployment_Rate']:.3f}")
print(f"  R-squared: {model.rsquared:.3f}")
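# Decide from the estimated p-value rather than hardcoding the conclusion
p_value = model.pvalues['Unemployment_Rate']
if p_value < 0.05:
    print("\nConclusion: REJECT null hypothesis at the 5% level")
    print("  Unemployment is significantly associated with opioid deaths")
else:
    print("\nConclusion: FAIL TO REJECT null hypothesis at the 5% level")
    print("  Unemployment is NOT significantly associated with opioid deaths")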

---
## Step 5 | Results Interpretation

### Key Findings

| Statistic | Value |
|-----------|-------|
| Coefficient | 0.12 |
| P-value | 0.842 |
| R-squared | 0.002 |

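1. **No Significant Relationship:** The regression finds no statistically significant linear association between unemployment and opioid deaths (p = 0.842)

2. **Very Low R²:** Unemployment explains about 0.2% of the variation in opioid death rates (R² = 0.002)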

3. **Visual Mismatch:** The time series shows opioid deaths rising steadily while unemployment fluctuates
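
The statistics in the table above can be cross-checked against each other. In a simple regression, the absolute t-statistic on the slope is determined by R² and the degrees of freedom, |t| = sqrt(R² · (n − 2) / (1 − R²)), and the two-sided p-value follows from the t distribution. The sketch below does this with the standard library only, assuming n = 22 (one observation per year, 1999-2020); the helper functions `t_pdf` and `integrate` are written for this illustration.

```python
import math

# Values reported in the table; n = 22 is an assumption (one row per year, 1999-2020)
r_squared = 0.002
n = 22
df = n - 2

# In simple regression, |t| for the slope is fully determined by R-squared and df
t_abs = math.sqrt(r_squared * df / (1 - r_squared))

def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)

def integrate(f, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Two-sided p-value: 1 minus the probability mass within [-|t|, |t|]
p_two_sided = 1 - 2 * integrate(lambda x: t_pdf(x, df), 0.0, t_abs)
print(f"|t| = {t_abs:.3f}, two-sided p = {p_two_sided:.3f}")
```

Because R² is only reported to three decimals, the recomputed p-value should land near, but not necessarily exactly on, the reported 0.842.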

### The Opioid Epidemic Story

The near-zero slope is consistent with the view that the opioid crisis was driven primarily by **supply-side factors**, not economic despair, although this single regression cannot establish that on its own:

1. **1990s-2010:** OxyContin marketing and overprescription
2. **2010s:** Shift to heroin as prescription opioids became harder to get
3. **2013+:** Fentanyl flooding into the drug supply

### Counter-Examples in the Data

- **2008-2010 (Great Recession):** Unemployment spiked but opioid deaths only rose moderately
- **2014-2019:** Unemployment fell to its lowest level in roughly 50 years while opioid deaths surged
- **2020 (COVID):** Both spiked, but plausibly for largely separate reasons

### What Would Be Needed?

- Lag analysis (does unemployment predict future deaths?)
- Regional analysis (areas with more job loss)
- Controls for prescription rates, drug supply factors
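
As a starting point for the lag analysis, one option is to compare the correlation at several lags rather than at a single one. The sketch below uses synthetic annual series with the notebook's column names; the generated values and the choice of lags 0 to 3 are assumptions for illustration, and the output says nothing about the real data.

```python
import numpy as np
import pandas as pd

# Synthetic annual series, illustrative only (not the project's data)
rng = np.random.default_rng(1)
years = np.arange(1999, 2021)
toy = pd.DataFrame({
    "Year": years,
    "Unemployment_Rate": 6 + rng.normal(0, 1.5, len(years)),
    "Opioid_Deaths_per_100k": 3 + 0.5 * (years - 1999) + rng.normal(0, 0.5, len(years)),
})

lag_corr = {}
for k in range(4):
    # shift(k) aligns each year's deaths with unemployment from k years earlier
    lagged = toy["Unemployment_Rate"].shift(k)
    # Series.corr ignores pairs where either value is missing
    lag_corr[k] = lagged.corr(toy["Opioid_Deaths_per_100k"])

for k, r in lag_corr.items():
    print(f"lag {k}: r = {r:.3f}")
```

Each additional lag costs one observation, which matters with only about two decades of annual data.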

---
## Replication Exercises

### Exercise 1: Lagged Analysis
Does this year's unemployment predict next year's opioid deaths? Try adding lags.

### Exercise 2: First Differences
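Regress year-over-year changes in deaths on year-over-year changes in unemployment. Differencing removes the long-run trend in each series; does the estimated relationship change?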
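
A minimal sketch of the differencing step, using synthetic series with the notebook's column names. The generated values and the names `toy`, `changes`, `d_Unemployment`, and `d_Deaths` are assumptions for illustration, and `np.polyfit` stands in for `smf.ols` so the example runs without statsmodels.

```python
import numpy as np
import pandas as pd

# Synthetic annual series, illustrative only (not the project's data)
years = np.arange(1999, 2021)
toy = pd.DataFrame({
    "Year": years,
    "Unemployment_Rate": 6 + 2 * np.sin((years - 1999) / 2),
    "Opioid_Deaths_per_100k": 3 + 0.5 * (years - 1999) + 0.3 * np.cos((years - 1999) / 3),
})

# Year-over-year changes; the first row has no previous year and is dropped
changes = toy[["Unemployment_Rate", "Opioid_Deaths_per_100k"]].diff().dropna()
changes.columns = ["d_Unemployment", "d_Deaths"]

# Least-squares fit of d_Deaths on d_Unemployment (polyfit returns highest degree first)
slope, intercept = np.polyfit(changes["d_Unemployment"], changes["d_Deaths"], 1)

print(f"Observations after differencing: {len(changes)}")
print(f"Slope on change in unemployment: {slope:.3f}")
```

With the real data, the same `changes` frame could be passed to `smf.ols('d_Deaths ~ d_Unemployment', data=changes)` to obtain standard errors and a p-value.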

### Exercise 3: Synthetic Opioids
If fentanyl data is available, analyze deaths from synthetic vs. prescription opioids separately.

### Challenge Exercise
Research the "deaths of despair" hypothesis (Case & Deaton). How does your analysis relate to their findings?

In [None]:
# Your code for exercises

# Example: Create lagged variable
# data['Unemployment_Lag1'] = data['Unemployment_Rate'].shift(1)
# model_lag = smf.ols('Opioid_Deaths_per_100k ~ Unemployment_Lag1', data=data.dropna()).fit()
# print(model_lag.summary().tables[1])