# ECON 0150 | Replication Notebook

**Title:** GDP and Happiness

**Original Author:** Arrlington

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Is there a relationship between GDP per capita and happiness?

**Data Source:** World Bank GDP data and World Happiness Report 2024

**Methods:** OLS regression with log transformation

**Main Finding:** Log GDP per capita is a strong predictor of national happiness. A one-unit increase in log₁₀ GDP per capita (a tenfold increase in GDP per capita) is associated with a 1.83-point higher happiness score (p < 0.001).

**Course Concepts Used:**
- OLS regression
- Log transformations
- Merging datasets
- Residual analysis

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0004/data/'

gdp_data = pd.read_csv(base_url + 'gdp-per-capita-worldbank.csv')
happiness_data = pd.read_csv(base_url + 'happiest-countries-in-the-world-2025.csv')

print(f"GDP data: {len(gdp_data)} rows")
print(f"Happiness data: {len(happiness_data)} rows")

---
## Step 1 | Data Preparation

In [None]:
# Extract 2024 GDP per capita
gdp_col = 'GDP per capita, PPP (constant 2021 international $)'
gdp_2024 = gdp_data.loc[gdp_data['Year'] == 2024, ['Entity', gdp_col]].copy()
gdp_2024 = gdp_2024.rename(columns={'Entity': 'country', gdp_col: 'gdp_per_capita'})

# Add log GDP (base 10, so a one-unit increase means 10x higher GDP per capita)
gdp_2024['log_gdp'] = np.log10(gdp_2024['gdp_per_capita'])

# Prepare happiness data
happiness = happiness_data[['country', 'WorldHappinessScore_2024']].copy()
happiness = happiness.rename(columns={'WorldHappinessScore_2024': 'happiness_score'})

print(f"GDP 2024: {len(gdp_2024)} countries")
print(f"Happiness: {len(happiness)} countries")

In [None]:
# Merge datasets
data = pd.merge(gdp_2024, happiness, on='country', how='inner')
print(f"Merged data: {len(data)} countries")
data.head()
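
An inner merge on country names silently drops any country whose name is spelled differently in the two sources. The sketch below shows one way to surface those rows with the `indicator` option of `pd.merge`. The two toy frames and their country names are hypothetical stand-ins, not rows from the project files; in the notebook, `gdp_2024` and `happiness` would take their place.

```python
import pandas as pd

# Hypothetical toy frames (invented names and placeholder values, not real data)
gdp_toy = pd.DataFrame({
    "country": ["Alphaland", "Beta Republic", "Gamma"],
    "gdp_per_capita": [1.0, 2.0, 3.0],
})
happy_toy = pd.DataFrame({
    "country": ["Alphaland", "Beta Rep.", "Gamma"],
    "happiness_score": [1.0, 2.0, 3.0],
})

# An outer merge keeps every row; the _merge column records where each came from
check = pd.merge(gdp_toy, happy_toy, on="country", how="outer", indicator=True)

only_gdp = sorted(check.loc[check["_merge"] == "left_only", "country"])
only_happy = sorted(check.loc[check["_merge"] == "right_only", "country"])
n_matched = int((check["_merge"] == "both").sum())

print(f"Matched: {n_matched}")
print(f"Only in GDP data: {only_gdp}")
print(f"Only in happiness data: {only_happy}")
```

Names that appear in only one list are candidates for manual renaming before rerunning the inner merge.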

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
data[['gdp_per_capita', 'log_gdp', 'happiness_score']].describe()

In [None]:
# Top 10 happiest and least happy countries
print("Top 10 Happiest:")
print(data.nlargest(10, 'happiness_score')[['country', 'happiness_score', 'gdp_per_capita']])
print("\nBottom 10:")
print(data.nsmallest(10, 'happiness_score')[['country', 'happiness_score', 'gdp_per_capita']])

---
## Step 3 | Visualization

In [None]:
# Scatter plot (no fitted line yet)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='log_gdp', y='happiness_score', alpha=0.7)
plt.xlabel('Log₁₀ GDP per Capita')
plt.ylabel('World Happiness Score (2024)')
plt.title('Happiness vs. Log GDP per Capita')
plt.show()

In [None]:
# Same scatter with the fitted regression line (no confidence band)
plt.figure(figsize=(10, 6))
sns.regplot(data=data, x='log_gdp', y='happiness_score', line_kws={'color': 'red'}, ci=None)
plt.xlabel('Log₁₀ GDP per Capita')
plt.ylabel('World Happiness Score (2024)')
plt.title('Happiness vs. Log GDP per Capita')
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS regression
model = smf.ols('happiness_score ~ log_gdp', data=data).fit()
print(model.summary().tables[1])
print(f"\nR-squared: {model.rsquared:.3f}")
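
To make the regression output less of a black box, here is a minimal sketch of what a one-variable OLS fit computes: the slope is the covariance of x and y divided by the variance of x, the intercept makes the line pass through the means, and R-squared is the share of variance in y that the line accounts for. The data are synthetic (generated with made-up parameters) and only stand in for `log_gdp` and `happiness_score`; the numbers say nothing about the real dataset.

```python
import numpy as np

# Synthetic stand-in data; the parameters 1.0, 0.5, and 0.3 are arbitrary assumptions
rng = np.random.default_rng(0)
x = rng.uniform(3.0, 5.0, size=100)               # plays the role of log_gdp
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, 100)     # plays the role of happiness_score

# Closed-form OLS estimates for a single regressor
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Residuals and R-squared (1 minus residual sum of squares over total sum of squares)
fitted = intercept + slope * x
resid = y - fitted
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R-squared = {r_squared:.3f}")
```

With a single regressor and an intercept, R-squared equals the squared correlation between x and y, and the residuals average to zero.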

In [None]:
# Residual plot
plt.figure(figsize=(10, 5))
sns.residplot(data=data, x='log_gdp', y='happiness_score')
plt.xlabel('Log₁₀ GDP per Capita')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.axhline(0, color='red', linestyle='--')
plt.show()

---
## Step 5 | Results Interpretation

### Key Findings

**Regression Results:**
- **Intercept:** -2.17 (p < 0.001)
- **Log GDP coefficient:** 1.83 (p < 0.001)
- **R-squared:** ~0.65, meaning log₁₀ GDP per capita explains about 65% of the cross-country variation in happiness scores

### Interpretation

A one-unit increase in log₁₀ GDP per capita (i.e., a tenfold increase in GDP per capita) is associated with a 1.83-point higher happiness score.

**Example:**
- A country with GDP per capita of $10,000 (log₁₀ = 4) has a predicted happiness score of -2.17 + 1.83(4) = 5.15
- A country with GDP per capita of $100,000 (log₁₀ = 5) has a predicted happiness score of -2.17 + 1.83(5) = 6.98
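
The arithmetic above can be wrapped in a small helper. This sketch hard-codes the reported coefficients (-2.17 and 1.83) rather than reading them from `model.params`, and the function names are illustrative. It also shows what the coefficient implies for changes smaller than tenfold: multiplying GDP per capita by any factor shifts the prediction by 1.83 × log₁₀(factor).

```python
import math

INTERCEPT = -2.17   # reported intercept
SLOPE = 1.83        # reported coefficient on log10 GDP per capita

def predicted_happiness(gdp_per_capita):
    """Fitted happiness score for a given GDP per capita."""
    return INTERCEPT + SLOPE * math.log10(gdp_per_capita)

def effect_of_scaling(factor):
    """Change in predicted happiness when GDP per capita is multiplied by factor."""
    return SLOPE * math.log10(factor)

for gdp in (10_000, 100_000):
    print(f"GDP per capita {gdp:>7,}: predicted happiness {predicted_happiness(gdp):.2f}")

print(f"Doubling GDP per capita: {effect_of_scaling(2):+.2f} points")
```

Under these coefficients, doubling GDP per capita corresponds to roughly half a point of predicted happiness, regardless of the starting level.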

### Caveats

- Correlation does not imply causation
- Other factors (social support, freedom, corruption) also influence happiness
- The World Happiness Report explicitly models these other factors

---
## Replication Exercises

### Exercise 1: Outliers
Which countries are unusually happy or unhappy given their GDP? Calculate residuals and identify outliers.

### Exercise 2: Non-linearity
Add a quadratic term for log_gdp. Is there evidence of diminishing returns to wealth for happiness?

### Exercise 3: Rich vs Poor
Split the data into high-GDP and low-GDP countries. Is the relationship stronger in one group?

### Challenge Exercise
Research the Easterlin Paradox. How does this cross-sectional finding relate to the paradox?

In [None]:
# Your code for exercises
