# ECON 0150 | Replication Notebook

**Title:** Olympics and GDP

**Original Authors:** Randiga; Richards; Reed

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Is there a relationship between a country's Olympic medal count and GDP per capita?

**Data Source:** Olympic medal counts and GDP per capita changes (87 countries)

**Methods:** OLS regression using first differences: Change_in_Medals ~ Change_in_GDP_per_capita

**Main Finding:** No significant relationship between changes in GDP and changes in medal counts (p = 0.394, R² = 0.009).

**Course Concepts Used:**
- First differences approach
- Simple linear regression
- Scatter plots with regression lines
- Null result interpretation

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0063/data/'

data = pd.read_csv(base_url + 'Merged_GDP_Medals.csv')

print(f"Number of countries: {len(data)}")
data.head()

---
## Step 1 | Data Preparation

In [None]:
# Check columns
print("Columns:", data.columns.tolist())

In [None]:
# Rename columns for clarity
data = data.rename(columns={
    'Differences_medal': 'Medal_Change',
    'Differences_cap': 'GDP_Change'
})

# Drop missing values
data = data.dropna(subset=['Medal_Change', 'GDP_Change'])

print(f"\nCleaned data: {len(data)} countries")
data.head()

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Medal_Change', 'GDP_Change']].describe())

In [None]:
# Correlation
correlation = data['GDP_Change'].corr(data['Medal_Change'])
print(f"\nCorrelation between GDP change and medal change: {correlation:.4f}")

---
## Step 3 | Visualization

In [None]:
# Distribution of medal changes
plt.figure(figsize=(10, 6))
sns.histplot(data['Medal_Change'], kde=True, bins=20)
plt.title('Distribution of Changes in Medal Count')
plt.xlabel('Change in Medal Count')
plt.ylabel('Frequency')
plt.axvline(x=0, color='red', linestyle='--', label='No Change')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x='GDP_Change', y='Medal_Change', data=data,
            scatter_kws={'alpha': 0.6})
plt.title('Change in Medal Count vs. Change in GDP per Capita')
plt.xlabel('Change in GDP per Capita ($)')
plt.ylabel('Change in Medal Count')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Medal_Change ~ GDP_Change', data=data).fit()
print("OLS Regression: Medal_Change ~ GDP_Change")
print(model.summary())

In [None]:
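# Key results
p_value = model.pvalues['GDP_Change']
alpha = 0.05  # conventional significance level

print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: GDP change does not predict medal change (beta = 0)")
print("\nModel Results:")
print(f"  Intercept: {model.params['Intercept']:.3f} medals")
print(f"  GDP coefficient: {model.params['GDP_Change']:.2e}")
print(f"  P-value: {p_value:.3f}")
print(f"  R-squared: {model.rsquared:.3f}")

# Report the decision from the fitted p-value instead of hard-coding it
if p_value < alpha:
    print(f"\nConclusion: REJECT null hypothesis at the {alpha:.0%} level")
    print("  Changes in GDP per capita are significantly related to changes in medal counts")
else:
    print(f"\nConclusion: FAIL TO REJECT null hypothesis at the {alpha:.0%} level")
    print("  No statistically significant relationship between changes in GDP per capita and medal counts")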

---
## Step 5 | Results Interpretation

### Key Findings

| Statistic | Value |
|-----------|-------|
| Coefficient | -7.76e-05 |
| P-value | 0.394 |
| R-squared | 0.009 |

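1. **No Significant Relationship:** With p = 0.394, we cannot reject the null hypothesis that changes in GDP per capita are unrelated to changes in medal counts. This is an absence of evidence for a relationship, not proof that none exists.

2. **Very Low R²:** Changes in GDP per capita explain less than 1% of the variation in medal-count changes.

3. **Near-zero coefficient:** The estimated slope is tiny per dollar of GDP change and statistically indistinguishable from zero. Its raw magnitude depends on the units in which GDP is measured.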
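
Because the slope is measured in medals per unit of GDP change, its raw size says little on its own. The sketch below uses only the two values reported in the table above. It rescales the coefficient to a $10,000 change and recovers the correlation implied by R², since in a regression with one predictor R² equals the squared correlation. It assumes GDP_Change is measured in dollars, as the scatter plot label suggests; the variable names are illustrative and not part of the original notebook.

```python
import math

# Values reported in the Key Findings table
coefficient = -7.76e-05   # medals per $1 change in GDP per capita (dollar units assumed)
r_squared = 0.009         # rounded, so the implied correlation is approximate

# Rescale the slope to a more readable unit: medals per $10,000 change
medals_per_10k = coefficient * 10_000

# With one predictor, R-squared is the squared correlation;
# the correlation takes the sign of the slope
implied_corr = math.copysign(math.sqrt(r_squared), coefficient)

print(f"Medals per $10,000 change in GDP per capita: {medals_per_10k:.3f}")
print(f"Implied correlation: {implied_corr:.3f}")
```

Under these reported values, a $10,000 rise in GDP per capita corresponds to roughly 0.78 fewer medals, an estimate that the p-value says cannot be distinguished from zero.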

### Why First Differences?

Using first differences (changes rather than levels) helps control for:
- Fixed country characteristics
- Long-term trends
- Omitted variables that are constant over time
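
To see why differencing removes fixed country characteristics, consider a toy example with made-up numbers (not the project data). Each hypothetical country has a fixed effect that rises with its GDP level, and the true slope is set to 0.5. The helper `ols_slope` is written here for illustration using only the standard library; it is a sketch of the idea, not the method used in the original project.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

TRUE_BETA = 0.5

# Hypothetical data: four countries observed in two periods
fixed_effect = [0, 10, 20, 30]     # time-invariant country traits
x_period1 = [1.0, 2.0, 3.0, 4.0]   # regressor in period 1
x_period2 = [2.0, 2.5, 5.0, 4.5]   # regressor in period 2

# Outcome = fixed effect + TRUE_BETA * regressor (no noise, to keep the arithmetic visible)
y_period1 = [a + TRUE_BETA * x for a, x in zip(fixed_effect, x_period1)]
y_period2 = [a + TRUE_BETA * x for a, x in zip(fixed_effect, x_period2)]

# Levels regression in period 1: the fixed effect contaminates the slope
levels_slope = ols_slope(x_period1, y_period1)

# First differences: the fixed effect cancels when period 1 is subtracted from period 2
dx = [b - a for a, b in zip(x_period1, x_period2)]
dy = [b - a for a, b in zip(y_period1, y_period2)]
fd_slope = ols_slope(dx, dy)

print(f"Levels slope:            {levels_slope:.2f}")
print(f"First-differences slope: {fd_slope:.2f}")
```

In this construction the levels regression returns a slope of 10.5 because countries with higher x also have larger fixed effects, while the differenced regression recovers the true 0.5.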

### Why Might GDP NOT Predict Medals?

- **Lagged effects:** Investment in sports takes years to produce results
- **Institutions matter:** Sports federations, training infrastructure, culture
- **Population:** Larger countries have more potential athletes
- **Focus:** Countries may prioritize different sports
- **Randomness:** Olympic performance has significant variance

### What Would Be Needed?

- Longer time series to capture lagged effects
- Sports-specific analysis
- Controls for population, sports investment, training facilities
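
With a longer panel, a lagged specification could pair each medal change with the GDP change from the previous Olympic cycle. The sketch below shows the alignment step for a single hypothetical country. The numbers are invented for illustration, and the project dataset, which holds one difference per country, does not support this directly.

```python
def first_diff(series):
    """Change between consecutive observations."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical values for one country at four consecutive Games
gdp_per_capita = [10_000, 11_000, 13_000, 12_500]
medals = [5, 5, 7, 9]

gdp_change = first_diff(gdp_per_capita)   # change over each cycle
medal_change = first_diff(medals)

# Pair the medal change in cycle t with the GDP change in cycle t-1:
# drop the last GDP change and the first medal change so the lists line up
lagged_pairs = list(zip(gdp_change[:-1], medal_change[1:]))

for lagged_gdp, d_medals in lagged_pairs:
    print(f"Lagged GDP change: {lagged_gdp:>6}, medal change: {d_medals}")
```

Each lag costs one observation per country, which is why a longer time series is needed. With a pandas panel, the same alignment is usually done with a per-country `groupby(...).shift(1)` on the GDP change column.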

---
## Replication Exercises

### Exercise 1: Level Analysis
Instead of differences, analyze levels. Does GDP per capita predict total medal count?

### Exercise 2: Population Control
Add population as a control variable. Does this improve the model?

### Exercise 3: Medal Categories
Separate gold, silver, bronze medals. Are patterns different?

### Challenge Exercise
Research the economics of Olympic success. What do studies find about determinants of medal counts?

In [None]:
# Your code for exercises

# Example: Countries with biggest medal gains
# print(data.nlargest(5, 'Medal_Change')[['Entity', 'Medal_Change', 'GDP_Change']])