# ECON 0150 | Replication Notebook

**Title:** GINI and Life Expectancy

**Original Authors:** Mooney; Bochkoris; Ketels

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Does income inequality (GINI) predict life expectancy across countries?

**Data Source:** Our World in Data - GINI coefficient and life expectancy

**Methods:** OLS regression of life expectancy on the GINI coefficient

**Main Finding:** Significant negative relationship: higher inequality is associated with lower life expectancy (coef = -33.35, p < 0.001, R² = 0.138).

**Course Concepts Used:**
- Simple linear regression
- Cross-country comparison
- Panel data handling
- Scatter plots with regression lines

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0046/data/'

# Load cleaned data files
gini = pd.read_csv(base_url + 'cleaned_gini.csv')
lfex = pd.read_csv(base_url + 'cleaned_lfex.csv')

print(f"GINI data: {len(gini)} countries")
print(f"Life expectancy data: {len(lfex)} countries")

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("GINI columns:", gini.columns.tolist())
print("Life expectancy columns:", lfex.columns.tolist())

In [None]:
# Merge datasets on country

def find_col(frame, keywords, fallback_idx=None):
    """Return the first column whose lowercased name contains any keyword.

    If nothing matches, return the column at position `fallback_idx` when given;
    otherwise raise KeyError so a missing column fails loudly.
    """
    matches = [c for c in frame.columns if any(k in c.lower() for k in keywords)]
    if matches:
        return matches[0]
    if fallback_idx is not None:
        return frame.columns[fallback_idx]
    raise KeyError(f"No column name contains any of {keywords}")

# Find the country column (falling back to the second column) and rename it for merging
gini_country_col = find_col(gini, ['country', 'entity'], fallback_idx=1)
lfex_country_col = find_col(lfex, ['country', 'entity'], fallback_idx=1)
gini = gini.rename(columns={gini_country_col: 'country'})
lfex = lfex.rename(columns={lfex_country_col: 'country'})

# Find the GINI and life expectancy columns, then standardize their names
gini_val_col = find_col(gini, ['gini'])
lfex_val_col = find_col(lfex, ['life', 'expectancy'])
gini = gini.rename(columns={gini_val_col: 'gini'})
lfex = lfex.rename(columns={lfex_val_col: 'life_expectancy'})

# Inner merge keeps only countries present in both files; then drop rows with missing values
df = pd.merge(gini[['country', 'gini']], lfex[['country', 'life_expectancy']], on='country', how='inner')
df = df.dropna(subset=['gini', 'life_expectancy'])

print(f"Merged data: {len(df)} countries")
df.head()
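
The merge above assumes each cleaned file has one row per country. If a file instead holds one row per country-year, merging on `country` alone pairs every GINI year with every life-expectancy year and silently inflates the sample. A quick check on the real data is `gini['country'].duplicated().any()`. The sketch below uses small made-up frames (the `year` column is an assumption, not necessarily present in the course files) to show the problem and two possible fixes.

```python
import pandas as pd

# Hypothetical toy panels (NOT the project data); 'year' is an assumed column.
gini_panel = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year':    [2018, 2020, 2018, 2020],
    'gini':    [0.40, 0.38, 0.30, 0.31],
})
lfex_panel = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year':    [2018, 2020, 2018, 2020],
    'life_expectancy': [70.0, 71.0, 78.0, 78.5],
})

# Merging on country alone: 2 GINI rows x 2 life-expectancy rows per country.
naive = pd.merge(gini_panel, lfex_panel, on='country', how='inner')
print(len(naive))

# Fix 1: merge on country AND year; validate= raises MergeError if keys repeat.
by_year = pd.merge(gini_panel, lfex_panel, on=['country', 'year'],
                   how='inner', validate='one_to_one')

# Fix 2: keep each country's most recent row in each file, then merge one-to-one.
latest_gini = gini_panel.sort_values('year').groupby('country').tail(1)
latest_lfex = lfex_panel.sort_values('year').groupby('country').tail(1)
cross_section = pd.merge(latest_gini[['country', 'gini']],
                         latest_lfex[['country', 'life_expectancy']],
                         on='country', how='inner', validate='one_to_one')
print(cross_section)
```

Note that Fix 2 can pair different years across the two files if their latest observations differ, so Fix 1 is usually the safer choice when both files share a year column.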

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(df[['gini', 'life_expectancy']].describe())

In [None]:
# Correlation
correlation = df['gini'].corr(df['life_expectancy'])
print(f"Correlation between GINI and life expectancy: {correlation:.3f}")
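
With a single regressor, the correlation and the regression are two views of the same relationship: OLS R² equals the squared Pearson correlation, and the OLS slope equals r × sd(y) / sd(x). Given the reported R² of 0.138, the correlation printed above should therefore be about −0.37 (since √0.138 ≈ 0.37). The sketch below checks both identities on synthetic data (made-up numbers, not the project data).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (NOT the project data): made-up GINI values and life expectancies.
rng = np.random.default_rng(0)
x = rng.uniform(0.25, 0.60, size=150)
y = 85 - 33 * x + rng.normal(0, 6, size=150)
sim = pd.DataFrame({'gini': x, 'life_expectancy': y})

r = sim['gini'].corr(sim['life_expectancy'])
fit = smf.ols('life_expectancy ~ gini', data=sim).fit()

# Identity 1: in one-regressor OLS, R-squared equals the squared correlation.
print(f"r^2 = {r**2:.4f}, R^2 = {fit.rsquared:.4f}")

# Identity 2: the OLS slope equals r * sd(y) / sd(x).
slope_from_r = r * sim['life_expectancy'].std() / sim['gini'].std()
print(f"slope via r = {slope_from_r:.3f}, OLS slope = {fit.params['gini']:.3f}")
```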

---
## Step 3 | Visualization

In [None]:
# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='gini', y='life_expectancy', alpha=0.7)
plt.title('Income Inequality (GINI) vs Life Expectancy')
plt.xlabel('GINI Coefficient')
plt.ylabel('Life Expectancy (years)')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(data=df, x='gini', y='life_expectancy', scatter_kws={'alpha': 0.5})
plt.title('GINI vs Life Expectancy with Regression Line')
plt.xlabel('GINI Coefficient')
plt.ylabel('Life Expectancy (years)')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('life_expectancy ~ gini', data=df).fit()
print(model.summary())
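
The default `summary()` uses classical standard errors, which assume the error variance is the same for every country. Cross-country data may violate that assumption, so one option is heteroskedasticity-robust (HC1) standard errors: in this notebook that would be `smf.ols('life_expectancy ~ gini', data=df).fit(cov_type='HC1')`. The sketch below uses synthetic data whose noise deliberately grows with GINI; it shows that the slope is unchanged and only the standard error differs. Whether the real p-value changes is something to check, not something assumed here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (NOT the project data): error spread rises with GINI (heteroskedastic).
rng = np.random.default_rng(1)
g = rng.uniform(0.25, 0.60, size=200)
le = 85 - 33 * g + rng.normal(0, 10 * g, size=200)
sim = pd.DataFrame({'gini': g, 'life_expectancy': le})

classic = smf.ols('life_expectancy ~ gini', data=sim).fit()
robust = smf.ols('life_expectancy ~ gini', data=sim).fit(cov_type='HC1')

# Point estimates are identical; only standard errors (and p-values) change.
print(f"slope: {classic.params['gini']:.3f} vs {robust.params['gini']:.3f}")
print(f"SE:    {classic.bse['gini']:.3f} (classic) vs {robust.bse['gini']:.3f} (HC1)")
```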

In [None]:
# Key results
coef = model.params['gini']
direction = 'fewer' if coef < 0 else 'more'

print("\n" + "=" * 50)
print("KEY RESULTS")
print("=" * 50)
print("\nNull hypothesis: GINI does not predict life expectancy (beta = 0)")
# The intercept is predicted life expectancy at GINI = 0, an out-of-sample extrapolation
print(f"\nIntercept: {model.params['Intercept']:.2f} years")
print(f"GINI coefficient: {coef:.2f} years per 1.0 increase in GINI")
print("\nInterpretation:")
print("  Each 0.1 increase in GINI (e.g., from 0.30 to 0.40 on the 0-1 scale)")
print(f"  is associated with {abs(coef) * 0.1:.1f} {direction} years of life expectancy")
print(f"\nR-squared: {model.rsquared:.3f}")
print(f"P-value: {model.pvalues['gini']:.2e}")
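
As a worked check on the interpretation printed above, using the coefficient reported in the Main Finding and treating GINI on its 0 to 1 scale:

$$
\Delta \widehat{\text{LifeExp}} = \hat{\beta}_{\text{GINI}} \times \Delta \text{GINI} = -33.35 \times 0.1 \approx -3.3 \text{ years}
$$

By the same arithmetic, countries with GINI values of 0.30 and 0.50 differ in predicted life expectancy by about $-33.35 \times 0.2 \approx -6.7$ years. This is a descriptive difference from a simple regression, not a causal effect.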

---
## Step 5 | Results Interpretation

### Key Findings

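| Variable | Coefficient | P-value |
|----------|-------------|---------|
| Intercept | ~85 years (predicted at GINI = 0, an extrapolation since no country is near 0) | < 0.001 |
| GINI | ~-33 years per 1.0 increase in GINI (about -3.3 years per 0.1) | < 0.001 |

**R-squared:** ~0.14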

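1. **Negative Relationship:** On average, more unequal countries have lower life expectancy, about 3.3 fewer years per 0.1 increase in GINI

2. **Statistically Significant:** With p < 0.001 we reject the null hypothesis that the GINI slope is zero; significance alone does not establish that inequality causes lower life expectancy

3. **Modest R²:** GINI accounts for about 14% of the cross-country variation in life expectancy, leaving roughly 86% to other factors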

### Why Might Inequality Reduce Life Expectancy?

- **Healthcare access:** In unequal societies, poorer households may have less access to healthcare
- **Stress and social cohesion:** Inequality may raise psychosocial stress and erode social trust, with possible effects on mental and physical health
- **Political economy:** Unequal societies may invest less in public health
- **Poverty effects:** At a given average income, higher inequality typically means more people living in poverty

### Caution: Confounders

The relationship could be driven by:
- **GDP per capita:** Poorer countries may have both high inequality and low life expectancy
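- **Historical factors:** Colonial history and institutions may shape both inequality and health outcomes
- **Region effects:** Latin America and Sub-Saharan Africa both have high average GINI, and Sub-Saharan Africa also has the lowest life expectancy of any region, so regional clustering alone could produce a negative pooled slope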
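
To see how a confounder can manufacture a slope, the sketch below simulates data in which a hypothetical `log_gdp` variable lowers GINI and raises life expectancy, while GINI itself has no direct effect by construction. The short regression still finds a strongly negative GINI coefficient; adding the confounder pulls it toward zero. All numbers are invented; this only illustrates the mechanism behind Exercise 1 and says nothing about what the real data will show.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (NOT the project data). 'log_gdp' is a hypothetical confounder.
rng = np.random.default_rng(2)
n = 300
log_gdp = rng.normal(9.0, 1.0, size=n)
gini_sim = 0.80 - 0.05 * log_gdp + rng.normal(0, 0.04, size=n)  # richer -> less unequal
life_sim = 30 + 5.0 * log_gdp + rng.normal(0, 3, size=n)        # no direct GINI effect
sim = pd.DataFrame({'gini': gini_sim, 'life_expectancy': life_sim, 'log_gdp': log_gdp})

short_fit = smf.ols('life_expectancy ~ gini', data=sim).fit()
long_fit = smf.ols('life_expectancy ~ gini + log_gdp', data=sim).fit()

print(f"GINI coef without control: {short_fit.params['gini']:.2f}")
print(f"GINI coef with log_gdp:    {long_fit.params['gini']:.2f}")
```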

---
## Replication Exercises

### Exercise 1: Control for GDP
Add GDP per capita as a control. Does the GINI coefficient change?

### Exercise 2: Regional Analysis
Does the relationship differ by continent?
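
One way to approach this is an interaction model, `life_expectancy ~ gini * C(region)`, which gives each region its own intercept and slope. The sketch below assumes a hypothetical `region` column (the cleaned course files may not include one, so it would have to be merged in) and uses synthetic data with made-up regional slopes. It also shows that the fully interacted model reproduces the slopes from separate per-region regressions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (NOT the project data); 'region' and the slopes are made up.
rng = np.random.default_rng(3)
regions = np.repeat(['Africa', 'Americas', 'Europe'], 60)
slopes = {'Africa': -10.0, 'Americas': -30.0, 'Europe': -50.0}
g = rng.uniform(0.25, 0.60, size=regions.size)
le = 80 + np.array([slopes[r] for r in regions]) * g + rng.normal(0, 3, size=regions.size)
sim = pd.DataFrame({'gini': g, 'life_expectancy': le, 'region': regions})

# gini * C(region) expands to main effects plus gini-by-region interaction terms;
# the 'gini' coefficient is the slope for the baseline region (Africa, first alphabetically).
inter = smf.ols('life_expectancy ~ gini * C(region)', data=sim).fit()
print(inter.params.round(2))

# Equivalent point estimates: fit the simple regression separately within each region.
by_region = {name: smf.ols('life_expectancy ~ gini', data=grp).fit().params['gini']
             for name, grp in sim.groupby('region')}
print(by_region)
```

The interaction form is convenient for formally testing whether slopes differ; the per-region loop is easier to read.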

### Exercise 3: Time Trends
Using the full panel data (one row per country-year), examine how the relationship has changed over time.
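
A simple starting point is to run the cross-sectional regression separately for each year and track the GINI slope. The sketch below builds a synthetic country-year panel (the `year` column and the drifting slopes are invented for illustration) and collects each year's slope and standard error.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic country-year panel (NOT the project data); 'year' is an assumed column.
rng = np.random.default_rng(4)
years = [1990, 2000, 2010, 2020]
frames = []
for i, year in enumerate(years):
    g = rng.uniform(0.25, 0.60, size=100)
    slope = -45 + 10 * i  # made-up: the relationship weakens over time
    le = 80 + slope * g + rng.normal(0, 3, size=100)
    frames.append(pd.DataFrame({'year': year, 'gini': g, 'life_expectancy': le}))
panel = pd.concat(frames, ignore_index=True)

# One cross-sectional OLS per year; keep the GINI slope and its standard error.
records = []
for yr, grp in panel.groupby('year'):
    fit = smf.ols('life_expectancy ~ gini', data=grp).fit()
    records.append({'year': yr, 'slope': fit.params['gini'], 'se': fit.bse['gini']})
trend = pd.DataFrame(records)
print(trend.round(2))
```

Plotting `trend['slope']` against `trend['year']` with ±2 standard-error bands would show whether any change is larger than sampling noise.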

### Challenge Exercise
Research the Wilkinson hypothesis. What does the epidemiological literature say about inequality and health?

In [None]:
# Your code for exercises

# Example: Identify outliers
# high_gini_long_life = df[(df['gini'] > 0.5) & (df['life_expectancy'] > 75)]
# print(high_gini_long_life)