# ECON 0150 | Replication Notebook

**Title:** BMI and GDP on Life Expectancy

**Original Author:** Rod

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis. You can run this notebook yourself to explore the data, reproduce the findings, and try the extension exercises at the end.

## About This Replication

**Research Question:** What are the effects of BMI and GDP on Life Expectancy?

**Data Source:** WHO Life Expectancy dataset (2938 observations across countries and years)

**Methods:** OLS regression with a base-10 log transformation of GDP

**Main Finding:** In separate bivariate regressions, both log GDP and BMI are significantly and positively associated with life expectancy. A 10x increase in GDP is associated with approximately 10 years higher life expectancy.

**Course Concepts Used:**
- OLS regression
- Log transformations
- Multiple regression
- Residual analysis

---
## Step 0 | Setup

First, we import the necessary libraries and load the data.

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
data_url = 'https://tayweid.github.io/econ-0150/projects/replications/0009/data/life_expectancy_data.csv'
data = pd.read_csv(data_url)

# Preview the data
data.head()

In [None]:
# Check the shape and columns
print(f"Dataset has {len(data)} rows and {len(data.columns)} columns")
print(f"\nColumns: {list(data.columns)}")

---
## Step 1 | Data Preparation

We filter to year 2015 and select the key variables for analysis.

In [None]:
# Filter to 2015 data
df_2015 = data[data['Year'] == 2015].copy()

# Clean column names (some have extra spaces)
df_2015 = df_2015.rename(columns={
    'Life expectancy ': 'life_expectancy',
    ' BMI ': 'bmi'
})

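# Select key columns and drop rows with missing values
df = df_2015[['Country', 'life_expectancy', 'GDP', 'bmi']].dropna().copy()

# Keep only positive GDP values so the logarithm is defined
df = df[df['GDP'] > 0].copy()

# Create base-10 log of GDP, so a 1-unit change corresponds to a 10x change in GDP
df['log_gdp'] = np.log10(df['GDP'])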

print(f"Working with {len(df)} countries from 2015")
df.head()
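
The renaming step above hard-codes the exact padded column names. A minimal sketch of a more general cleanup, assuming the only problem is leading or trailing whitespace in the headers. The toy frame and the helper name `clean_columns` are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd

def clean_columns(frame):
    """Return a copy with stripped, lower-case, underscore-separated column names."""
    out = frame.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(' ', '_', regex=False)
    )
    return out

# Toy rows that mimic the padded headers in the WHO file (values are made up)
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Life expectancy ': [60.0, 70.0, 80.0],
    ' BMI ': [20.0, 40.0, 60.0],
    'GDP': [100.0, 0.0, 10000.0],
})

cleaned = clean_columns(toy)

# log10 is undefined for zero or negative GDP, so keep positive values only
cleaned = cleaned[cleaned['gdp'] > 0].copy()
cleaned['log_gdp'] = np.log10(cleaned['gdp'])

print(list(cleaned.columns))
print(cleaned['log_gdp'].tolist())
```

Note that this helper also lower-cases `Country` and `GDP`, so later cells would need to refer to `country` and `gdp` if it were used in place of the explicit rename.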

---
## Step 2 | Data Exploration

We start with summary statistics for the key variables.

In [None]:
# Summary statistics
df.describe()
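
Summary statistics describe each variable on its own. To look at relationships before plotting, one option is a pairwise correlation table. The sketch below uses a small made-up frame with the same column names as `df`; in the notebook you would call `.corr()` on the same columns of `df`.

```python
import pandas as pd

# Made-up values for illustration only; column names match the notebook's df
toy = pd.DataFrame({
    'life_expectancy': [55.0, 62.0, 70.0, 76.0, 82.0],
    'log_gdp': [2.5, 3.0, 3.6, 4.1, 4.7],
    'bmi': [18.0, 25.0, 40.0, 52.0, 60.0],
})

# Pearson correlations between every pair of numeric columns
corr = toy[['life_expectancy', 'log_gdp', 'bmi']].corr()
print(corr.round(2))
```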

---
## Step 3 | Visualization

We visualize the relationships between our key variables.

In [None]:
# Scatter plots: GDP (log) and BMI vs Life Expectancy
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].scatter(df['log_gdp'], df['life_expectancy'], alpha=0.6)
axes[0].set_xlabel('Log10 GDP', fontsize=12)
axes[0].set_ylabel('Life Expectancy', fontsize=12)
axes[0].set_title('GDP (log) vs Life Expectancy')
axes[0].grid(True, alpha=0.3)

axes[1].scatter(df['bmi'], df['life_expectancy'], alpha=0.6)
axes[1].set_xlabel('BMI', fontsize=12)
axes[1].set_ylabel('Life Expectancy', fontsize=12)
axes[1].set_title('BMI vs Life Expectancy')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---
## Step 4 | Statistical Analysis

We run two separate bivariate OLS regressions to examine how log GDP and BMI each relate to life expectancy.

### Model 1: Effect of GDP on Life Expectancy

In [None]:
# Model 1: Life Expectancy ~ Log GDP
model_gdp = smf.ols('life_expectancy ~ log_gdp', data=df).fit()
print(model_gdp.summary().tables[1])

# Visualization with regression line
sns.regplot(x='log_gdp', y='life_expectancy', data=df, ci=None)
plt.xlabel('Log10 GDP')
plt.ylabel('Life Expectancy')
plt.title('Effect of GDP on Life Expectancy')
plt.show()

### Model 2: Effect of BMI on Life Expectancy

In [None]:
# Model 2: Life Expectancy ~ BMI
model_bmi = smf.ols('life_expectancy ~ bmi', data=df).fit()
print(model_bmi.summary().tables[1])

# Visualization with regression line
sns.regplot(x='bmi', y='life_expectancy', data=df, ci=None)
plt.xlabel('BMI')
plt.ylabel('Life Expectancy')
plt.title('Effect of BMI on Life Expectancy')
plt.show()
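
The coefficient tables above do not show R-squared, and residual analysis is listed among the course concepts. A minimal sketch of how both could be read off a fitted statsmodels result, run here on synthetic data (the true slope of 10, the intercept, and the noise level are assumptions for illustration). In the notebook, the same attributes are available on `model_gdp` and `model_bmi`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: life expectancy rises 10 years per unit of log10 GDP, plus noise
rng = np.random.default_rng(0)
toy = pd.DataFrame({'log_gdp': rng.uniform(2, 5, size=200)})
toy['life_expectancy'] = 35 + 10 * toy['log_gdp'] + rng.normal(0, 4, size=200)

model = smf.ols('life_expectancy ~ log_gdp', data=toy).fit()

# Share of variance in the outcome explained by the model
print(f"R-squared: {model.rsquared:.3f}")

# Residuals = observed minus fitted values; with an intercept they average to zero
residuals = model.resid
print(f"Mean residual: {residuals.mean():.6f}")

# Summarizing (or plotting) residuals against fitted values helps reveal
# curvature or uneven spread that a straight line misses
check = pd.DataFrame({'fitted': model.fittedvalues, 'residual': residuals})
print(check.describe().loc[['mean', 'std', 'min', 'max']])
```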

---
## Step 5 | Results Interpretation

### Key Findings

**Model 1: GDP Effect**
- **Log GDP coefficient:** ~10 (p < 0.001)
- **Interpretation:** A 10x increase in GDP is associated with approximately 10 years higher life expectancy
- Log GDP explains a substantial portion of the cross-country variation in life expectancy (see the R-squared in `model_gdp.summary()`)
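
To make the base-10 interpretation concrete: when life expectancy is regressed on log10(GDP), multiplying GDP by a factor k shifts the predicted value by the coefficient times log10(k). A small sketch using the rounded coefficient of 10 quoted above (the helper name `predicted_change` is illustrative):

```python
import numpy as np

def predicted_change(coef, gdp_ratio):
    """Change in predicted life expectancy when GDP is multiplied by gdp_ratio."""
    return coef * np.log10(gdp_ratio)

coef_log_gdp = 10.0  # rounded value from the findings above, not a fresh estimate

for ratio in (2, 10, 100):
    years = predicted_change(coef_log_gdp, ratio)
    print(f"GDP x{ratio}: {years:.1f} years")
```

Under this rounded coefficient, doubling GDP corresponds to roughly 3 additional years, and a 100x increase to roughly 20.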

**Model 2: BMI Effect**
- **BMI coefficient:** ~0.6 (p < 0.001)
- **Interpretation:** A 1-unit increase in a country's average BMI is associated with approximately 0.6 years higher life expectancy
- This positive relationship may reflect that higher BMI correlates with food security and economic development at the country level

### Conclusion

Both GDP and BMI are significantly associated with life expectancy. Wealthier countries tend to have longer life expectancies, and countries with higher average BMI (often reflecting greater food availability) also tend to have longer life expectancies. However, these are bivariate associations that may be confounded by other factors, so they should not be read as causal effects.

---
## Replication Exercises

Try extending this analysis with the following exercises:

### Exercise 1: Multiple Regression
Include both `log_gdp` and `bmi` in the same regression model. How do the coefficients change, compared with the bivariate models, once each variable is controlled for?

### Exercise 2: Different Years
Repeat the analysis for 2010 instead of 2015. Are the relationships similar across years?

### Exercise 3: Developed vs Developing Countries
The data includes a "Status" column (Developed/Developing). Split the data by status and run separate regressions. Does the GDP-life expectancy relationship differ between developed and developing countries?

### Challenge Exercise
The dataset includes many other health-related variables (Adult Mortality, infant deaths, HIV/AIDS, etc.). Build a multiple regression model with several predictors. Which variables are the strongest predictors of life expectancy?

In [None]:
# Your code for Exercise 1: Multiple Regression


In [None]:
# Your code for Exercise 2: Different Years


In [None]:
# Your code for Exercise 3: Developed vs Developing Countries


In [None]:
# Your code for Challenge Exercise
