# ECON 0150 | Replication Notebook

**Title:** GDP, Dentist Density, and Healthcare

**Original Authors:** Ghobrial, Ghobrial, Merlos

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis. You can run this notebook yourself to explore the data, reproduce the findings, and try the extension exercises at the end.

## About This Replication

**Research Question:** Does the relationship between log GDP per capita and density of dentists differ between countries with universal healthcare and countries without universal healthcare?

**Data Source:** WHO dentist density data, GDP per capita data, and universal healthcare legislation data (2020)

**Methods:** OLS regression with interaction term (log GDP x UHC)

**Main Finding:** Higher log GDP per capita is associated with more dentists per 10,000 people (coef = 1.74 in the interaction model, p < 0.001). The interaction between log GDP and universal healthcare is not statistically significant, so the data provide no clear evidence that the relationship differs by healthcare system.

**Course Concepts Used:**
- OLS regression
- Log transformations
- Interaction terms
- Categorical control variables
- Robust standard errors

---
## Step 0 | Setup

First, we import the necessary libraries and load the data.

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load merged data from course website
# This dataset combines dentist density, GDP per capita, and universal healthcare data for 2020
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0015/data/'
data = pd.read_csv(base_url + 'merged_data_with_uhc.csv')

# Preview the data
data.head()

In [None]:
# Check the shape and columns
print(f"Dataset has {len(data)} rows and {len(data.columns)} columns")
print(f"\nColumns: {list(data.columns)}")

---
## Step 1 | Data Exploration

We explore the distributions of our key variables.

In [None]:
# Summary statistics
data.describe()

In [None]:
# Distribution of dentist density
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].hist(data['RATE_PER_10000_N'], bins=15, edgecolor='black')
axes[0].set_xlabel('Dentists per 10,000')
axes[0].set_ylabel('Count')
axes[0].set_title('Distribution of Dentist Density')

axes[1].hist(data['log_GDP_Per_Cap'], bins=15, edgecolor='black')
axes[1].set_xlabel('Log GDP Per Capita')
axes[1].set_ylabel('Count')
axes[1].set_title('Distribution of Log GDP Per Capita')

plt.tight_layout()
plt.show()

In [None]:
# Universal healthcare distribution
sns.countplot(data=data, x='uhc_indicator')
plt.title('Countries by Universal Healthcare Status')
plt.xlabel('Universal Healthcare (0=No, 1=Yes)')
plt.ylabel('Count')
plt.show()

---
## Step 2 | Visualization

We visualize the relationship between GDP and dentist density, separated by healthcare status.

In [None]:
# Scatter plot with separate regression lines by UHC status
# Assumes uhc_indicator is coded 0/1; an explicit palette keeps point and line colors in sync
palette = dict(zip([0, 1], sns.color_palette(n_colors=2)))

plt.figure(figsize=(10, 6))
ax = sns.scatterplot(data=data, x='log_GDP_Per_Cap', y='RATE_PER_10000_N',
                     hue='uhc_indicator', palette=palette)

for uhc_val, color in palette.items():
    sns.regplot(data=data[data['uhc_indicator'] == uhc_val],
                x='log_GDP_Per_Cap', y='RATE_PER_10000_N', scatter=False, ci=None, truncate=False,
                line_kws={'color': color}, ax=ax)

plt.title('Dentist Density by Log GDP Per Capita (by Universal Healthcare)')
plt.xlabel('Log GDP Per Capita')
plt.ylabel('Dentist Density per 10,000')

# Relabel the legend seaborn created (entries are ordered 0, then 1)
legend = ax.get_legend()
legend.set_title('UHC')
for text, label in zip(legend.get_texts(), ['No', 'Yes']):
    text.set_text(label)
plt.show()

---
## Step 3 | Statistical Analysis

We run OLS regressions to test whether the GDP-dentist relationship differs by healthcare system.

### Model 1: Simple regression (GDP only)

In [None]:
# Model 1: Dentist Density ~ Log GDP
model_simple = smf.ols('RATE_PER_10000_N ~ log_GDP_Per_Cap', data=data).fit()
print(model_simple.summary().tables[1])

### Model 2: Interaction model (GDP x UHC)

In [None]:
# Model 2: Dentist Density ~ Log GDP + UHC + (Log GDP x UHC) interaction
# Uses HC3 heteroskedasticity-robust standard errors
model_interaction = smf.ols('RATE_PER_10000_N ~ log_GDP_Per_Cap + uhc_indicator + log_GDP_Per_Cap:uhc_indicator', 
                            data=data).fit(cov_type='HC3')
print(model_interaction.summary().tables[1])
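
For readers new to the formula syntax, the sketch below uses synthetic data (not the course dataset; the column names `y`, `x`, and `d` are made up for illustration) to show two things: `x * d` is formula shorthand for `x + d + x:d`, and `cov_type='HC3'` changes the standard errors but not the coefficient estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (not the course dataset)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'x': rng.normal(9, 1, n),      # stand-in for a log-income variable
    'd': rng.integers(0, 2, n),    # stand-in for a 0/1 indicator
})
# Noise spread grows with x, so classical and robust standard errors differ
noise = rng.normal(0, 0.2 * toy['x'].to_numpy(), n)
toy['y'] = 1.0 + 2.0 * toy['x'] + 0.5 * toy['d'] + noise

long_form = smf.ols('y ~ x + d + x:d', data=toy).fit()
short_form = smf.ols('y ~ x * d', data=toy).fit()
robust = smf.ols('y ~ x * d', data=toy).fit(cov_type='HC3')

# Both formulas describe the same model
print(np.allclose(long_form.params, short_form.params))
# HC3 leaves the coefficients untouched and only changes the standard errors
print(np.allclose(short_form.params, robust.params))
print(pd.DataFrame({'classical_se': short_form.bse, 'hc3_se': robust.bse}))
```

The same comparison can be run on the notebook's data by refitting Model 2 without `cov_type` and placing the two `bse` columns side by side.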

In [None]:
# Residual plot by UHC status
data['residuals'] = model_interaction.resid

plt.figure(figsize=(10, 6))
ax = sns.scatterplot(data=data, x='log_GDP_Per_Cap', y='residuals', hue='uhc_indicator')
ax.axhline(0, color='red', linestyle='--')
plt.title('Residual Plot by Universal Healthcare Status')
plt.xlabel('Log GDP Per Capita')
plt.ylabel('Residuals')

# Relabel the legend seaborn created (assumes uhc_indicator is coded 0/1)
legend = ax.get_legend()
legend.set_title('UHC')
for text, label in zip(legend.get_texts(), ['No', 'Yes']):
    text.set_text(label)
plt.show()

---
## Step 4 | Results Interpretation

### Key Findings

**Model 1 (GDP only):**
- **Log GDP coefficient:** 1.89 (p < 0.001)
- **Interpretation:** A one-unit increase in log GDP per capita (roughly 2.7 times higher GDP per capita, assuming natural logs) is associated with 1.89 more dentists per 10,000 people
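
Because the regressor is in logs, the coefficient can be rescaled to any proportional change in GDP. A minimal sketch, assuming `log_GDP_Per_Cap` is the natural log of GDP per capita (which the roughly 2.7x figure in the interpretation above implies) and using the Model 1 coefficient reported above; the function name is illustrative:

```python
import math

# Model 1 slope reported above: dentists per 10,000 per one-unit increase in log GDP
BETA_LOG_GDP = 1.89

def dentist_change(gdp_ratio, beta=BETA_LOG_GDP):
    """Predicted change in dentists per 10,000 when GDP per capita is
    multiplied by gdp_ratio, assuming a natural-log regressor."""
    return beta * math.log(gdp_ratio)

for label, ratio in [('10% higher GDP', 1.10), ('double GDP', 2.0), ('e times GDP', math.e)]:
    print(f"{label}: {dentist_change(ratio):+.2f} dentists per 10,000")
```

Under these assumptions, doubling GDP per capita corresponds to roughly 1.3 more dentists per 10,000, and a 10% increase to roughly 0.18.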

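**Model 2 (with UHC interaction):**
- **Log GDP coefficient:** 1.74 (p < 0.001) - still significant; in this model it is the slope for countries without universal healthcare
- **UHC indicator:** 7.95 (p = 0.175) - not significant; this is the estimated UHC difference at log GDP = 0, which is an extrapolation (log GDP = 0 means a GDP per capita of 1)
- **Interaction term:** -0.67 (p = 0.267) - not significant; this is the estimated difference in the log GDP slope for countries with universal healthcare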
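
The Model 2 coefficients combine into group-specific slopes, and the predicted gap between UHC and non-UHC countries at a given income level does not depend on the intercept, which cancels out. The sketch below uses only the point estimates reported above; because the UHC and interaction terms are not statistically significant, the results are illustrative rather than precise. The function and variable names are hypothetical.

```python
# Point estimates reported above for Model 2
B_LOG_GDP = 1.74       # slope when uhc_indicator = 0
B_UHC = 7.95           # intercept shift when uhc_indicator = 1 (at log GDP = 0)
B_INTERACTION = -0.67  # change in slope when uhc_indicator = 1

slope_no_uhc = B_LOG_GDP
slope_uhc = B_LOG_GDP + B_INTERACTION

def uhc_gap(log_gdp):
    """Predicted dentist density with UHC minus without UHC at the same log GDP."""
    return B_UHC + B_INTERACTION * log_gdp

print(f"Slope without UHC: {slope_no_uhc:.2f}")
print(f"Slope with UHC:    {slope_uhc:.2f}")
for x in (7, 10):
    print(f"Predicted UHC gap at log GDP = {x}: {uhc_gap(x):+.2f}")
```

The positive indicator and negative interaction pull in opposite directions, so the estimated gap shrinks as log GDP rises.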

### Conclusion

Higher GDP per capita is strongly associated with more dentists per 10,000 people. The interaction term is not statistically significant, so we cannot reject the hypothesis that the GDP-dentist relationship is the same in countries with and without universal healthcare legislation. This is an absence of evidence for a difference rather than proof that the slopes are equal. The estimates describe cross-country associations in 2020, not causal effects of wealth or healthcare policy on dental care availability.

---
## Replication Exercises

Try extending this analysis with the following exercises:

### Exercise 1: Other Healthcare Indicators
The original data contains other potential predictors. Try adding another variable (like percentage expenditure on health) and see how it affects the model.

### Exercise 2: Regional Analysis
Group countries by region (using the ParentLocation field from the UHC data). Does the GDP-dentist relationship vary by world region?
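
One possible way to start is to fit the simple regression separately within each region and compare slopes. The sketch below runs on synthetic data with made-up region labels. It assumes a `ParentLocation` column has already been merged into the analysis table, which the provided merged file may or may not include.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for a table that already contains a region column
rng = np.random.default_rng(2)
regions = ['Africa', 'Americas', 'Europe']  # made-up subset of region labels
toy = pd.DataFrame({
    'ParentLocation': rng.choice(regions, 150),
    'log_GDP_Per_Cap': rng.uniform(6, 11, 150),
})
toy['RATE_PER_10000_N'] = 0.8 * toy['log_GDP_Per_Cap'] + rng.normal(0, 1, 150)

# Fit the simple model within each region and collect the log GDP slope
slopes = {
    region: smf.ols('RATE_PER_10000_N ~ log_GDP_Per_Cap', data=group)
               .fit().params['log_GDP_Per_Cap']
    for region, group in toy.groupby('ParentLocation')
}
print(pd.Series(slopes, name='log_GDP_slope'))
```

Separate regressions are easy to read but give no direct test of whether slopes differ; a pooled model with a `log_GDP_Per_Cap * C(ParentLocation)` term is one alternative.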

### Exercise 3: Interpret the Interaction
Even though the interaction is not significant, calculate the predicted dentist density for:
- A low-GDP country (log_GDP = 7) with UHC
- A low-GDP country (log_GDP = 7) without UHC
- A high-GDP country (log_GDP = 10) with UHC
- A high-GDP country (log_GDP = 10) without UHC
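
One possible starting point is to build a small table of the four scenarios and pass it to `predict`. The sketch below fits a model to synthetic data (made-up numbers; the notebook's column names are reused only so the formula matches). In the notebook, the equivalent call would be `model_interaction.predict(grid)`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the course data (illustration only)
rng = np.random.default_rng(1)
n = 120
fake = pd.DataFrame({
    'log_GDP_Per_Cap': rng.uniform(6, 11, n),
    'uhc_indicator': rng.integers(0, 2, n),
})
fake['RATE_PER_10000_N'] = -10 + 1.5 * fake['log_GDP_Per_Cap'] + rng.normal(0, 1, n)

model = smf.ols('RATE_PER_10000_N ~ log_GDP_Per_Cap * uhc_indicator',
                data=fake).fit(cov_type='HC3')

# Four scenarios: low/high log GDP crossed with UHC status
grid = pd.DataFrame({
    'log_GDP_Per_Cap': [7, 7, 10, 10],
    'uhc_indicator': [1, 0, 1, 0],
})
grid['predicted_density'] = model.predict(grid)
print(grid)
```

Each prediction is the intercept plus the coefficients multiplied by the row's values, so the table can be checked by hand against the regression output.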

### Challenge Exercise
The original analysis used 2020 data. Load the raw dentist_density.csv file and analyze how the GDP-dentist relationship has changed over time. Create a visualization showing the trend.

In [None]:
# Your code for Exercise 1: Other Healthcare Indicators


In [None]:
# Your code for Exercise 2: Regional Analysis


In [None]:
# Your code for Exercise 3: Interpret the Interaction


In [None]:
# Your code for Challenge Exercise
