# ECON 0150 | Replication Notebook

**Title:** The Olympics Effect

**Original Authors:** Sainiak, Corella

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Does hosting the Olympics change international tourist arrivals in the host country?

**Data Source:** International tourist arrivals data from Our World in Data (1995-present)

**Methods:** Time series visualization, linear trend regressions, and a pre/post comparison of arrivals around each host's Olympic year

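**Main Finding:** Tourist arrivals trend upward in most host countries, but the plots and pre/post averages cannot separate an "Olympics effect" from that underlying trend. Establishing a causal effect requires more rigorous methodology, starting with a comparison group of non-host countries.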

**Course Concepts Used:**
- Time series visualization
- Event study approach
- Comparative analysis across countries
- Trend analysis

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0018/data/'

df = pd.read_csv(base_url + 'international-tourist-trips.csv')

print(f"Tourism data: {len(df)} country-year observations")
print(f"Countries: {df['Entity'].nunique()}")
print(f"Years: {df['Year'].min()} - {df['Year'].max()}")
df.head()

---
## Step 1 | Data Preparation

In [None]:
# Olympic host countries and years (Summer Games unless marked Winter)
olympic_hosts = {
    'United States': [1996],  # Atlanta (the 2002 Salt Lake City Winter Games are not included)
    'China': [2008],          # Beijing
    'United Kingdom': [2012], # London
    'Brazil': [2016],         # Rio
    'Canada': [2010]          # Vancouver (Winter)
}

# Extract data for each host country
host_data = {}
for country in olympic_hosts.keys():
    country_df = df[df['Entity'] == country][['Year', 'Inbound arrivals of tourists']].dropna()
    host_data[country] = country_df.reset_index(drop=True)
    print(f"{country}: {len(host_data[country])} years of data")
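
The data source is described as starting in 1995, so an early host such as the United States (1996) may have fewer pre-Olympic years than the 3-year window used in Step 4. The sketch below counts how many years fall inside that window. `window_coverage` is a hypothetical helper and the toy numbers are made up for illustration; neither comes from the original project.

```python
import pandas as pd

def window_coverage(country_df, olympic_year, window=3):
    """Count observed years within `window` years before and after the Games."""
    years = set(country_df['Year'].astype(int).tolist())
    pre = sum((olympic_year - k) in years for k in range(1, window + 1))
    post = sum((olympic_year + k) in years for k in range(1, window + 1))
    return {
        'pre_years': pre,
        'post_years': post,
        'has_olympic_year': olympic_year in years,
    }

# Made-up series that starts in 1995 (illustrative values, not real data)
toy = pd.DataFrame({
    'Year': range(1995, 2001),
    'Inbound arrivals of tourists': [40e6, 41e6, 42e6, 43e6, 44e6, 45e6],
})

coverage = window_coverage(toy, olympic_year=1996)
print(coverage)
```

With the notebook's own objects, one possible use is `window_coverage(host_data[country], olympic_hosts[country][0])` inside the loop above, which would flag any host whose pre-Olympic average rests on fewer than three years.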

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics by country
summary = []
for country, data in host_data.items():
    summary.append({
        'Country': country,
        'Mean Arrivals': data['Inbound arrivals of tourists'].mean() / 1e6,
        'Max Arrivals': data['Inbound arrivals of tourists'].max() / 1e6,
        'Olympic Year': olympic_hosts[country][0]
    })

pd.DataFrame(summary)

---
## Step 3 | Visualization

In [None]:
# Individual country plots
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Avoid red for the points: the Olympic-year marker below is drawn in red
colors = ['blue', 'green', 'purple', 'brown', 'orange']

for i, (country, data) in enumerate(host_data.items()):
    ax = axes[i]
    ax.scatter(data['Year'], data['Inbound arrivals of tourists'] / 1e6, color=colors[i], alpha=0.7)
    
    # Mark Olympic year
    for year in olympic_hosts[country]:
        ax.axvline(x=year, color='red', linestyle='--', label=f'Olympics ({year})')
    
    ax.set_xlabel('Year')
    ax.set_ylabel('Tourist Arrivals (millions)')
    ax.set_title(f'{country}')
    ax.legend()
    ax.grid(True, alpha=0.3)

# Hide last subplot
axes[-1].set_visible(False)

plt.suptitle('International Tourist Arrivals by Olympic Host Country', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# Combined plot
plt.figure(figsize=(14, 8))

for country, data in host_data.items():
    points = plt.scatter(data['Year'], data['Inbound arrivals of tourists'] / 1e6,
                         label=f'{country}', alpha=0.7, s=50)

    # Mark this country's Olympic year(s) in the same color as its points
    color = points.get_facecolor()[0]
    for year in olympic_hosts[country]:
        plt.axvline(x=year, color=color, linestyle='--', alpha=0.5)

plt.xlabel('Year')
plt.ylabel('Inbound Tourist Arrivals (millions)')
plt.title('International Tourist Arrivals with Olympic Years Marked')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
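
The hosts differ a great deal in size, so raw arrivals on one axis are hard to compare. One option in the spirit of the event study approach is to rescale each series so the Olympic year equals 100 and to measure time relative to the Games. This is a minimal sketch: `index_to_olympic_year` is a hypothetical helper, and the toy values are made up rather than taken from the dataset.

```python
import pandas as pd

COL = 'Inbound arrivals of tourists'

def index_to_olympic_year(data, olympic_year, col=COL):
    """Rescale arrivals so the Olympic year equals 100 and add event time."""
    base = data.loc[data['Year'] == olympic_year, col]
    if base.empty:
        return None  # no observation in the Olympic year, nothing to index against
    return pd.DataFrame({
        'event_time': data['Year'] - olympic_year,        # 0 = Olympic year
        'arrivals_index': data[col] / base.iloc[0] * 100,  # Olympic year = 100
    })

# Made-up series (illustrative values, not real data)
toy = pd.DataFrame({
    'Year': [2006, 2007, 2008, 2009, 2010],
    COL: [50e6, 55e6, 60e6, 66e6, 72e6],
})

indexed = index_to_olympic_year(toy, olympic_year=2008)
print(indexed)
```

Plotting `arrivals_index` against `event_time` for every host would put all countries on a common scale with the Games at zero. It still would not remove the underlying trend.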

---
## Step 4 | Statistical Analysis

In [None]:
# Simple trend analysis for each country
print("Trend Analysis (Arrivals ~ Year):\n")

for country, data in host_data.items():
    X = sm.add_constant(data['Year'])
    y = data['Inbound arrivals of tourists']
    
    model = sm.OLS(y, X).fit()
    
    print(f"{country}:")
    print(f"  Linear trend: {model.params['Year'] / 1e6:.2f} million tourists/year")
    print(f"  p-value: {model.pvalues['Year']:.4f}")
    print()
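
The slopes above are in tourists per year, so a large destination will show a larger slope than a small one even if both grow at the same percentage rate. A log-linear trend gives a percentage growth rate that is comparable across countries. The sketch below uses `numpy.polyfit` rather than the `statsmodels` call in the cell above. `annual_growth_rate` is a hypothetical helper, and the toy series is constructed to grow at exactly 5% per year.

```python
import numpy as np

def annual_growth_rate(years, arrivals):
    """Average annual growth rate implied by a linear fit of log(arrivals) on year."""
    years = np.asarray(years, dtype=float)
    arrivals = np.asarray(arrivals, dtype=float)
    # Center the years so the fit is numerically well behaved
    slope, _intercept = np.polyfit(years - years.mean(), np.log(arrivals), deg=1)
    return np.exp(slope) - 1

# Made-up series growing 5% per year (illustrative values, not real data)
toy_years = np.arange(2000, 2010)
toy_arrivals = 10e6 * 1.05 ** (toy_years - 2000)

growth = annual_growth_rate(toy_years, toy_arrivals)
print(f"Estimated growth: {growth:.1%}")
```

Zero or missing arrivals would need to be dropped before taking logs.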

In [None]:
# Pre/Post Olympics comparison
print("Pre/Post Olympics Comparison:\n")

for country, data in host_data.items():
    olympic_year = olympic_hosts[country][0]
    
    # Average over up to 3 years on each side (fewer if the data has gaps or starts late)
    pre = data[(data['Year'] >= olympic_year - 3) & (data['Year'] < olympic_year)]['Inbound arrivals of tourists'].mean()
    post = data[(data['Year'] > olympic_year) & (data['Year'] <= olympic_year + 3)]['Inbound arrivals of tourists'].mean()
    olympic = data[data['Year'] == olympic_year]['Inbound arrivals of tourists'].values
    
    if len(olympic) > 0:
        print(f"{country} ({olympic_year}):")
        print(f"  Pre-Olympic avg: {pre/1e6:.2f}M")
        print(f"  Olympic year: {olympic[0]/1e6:.2f}M")
        print(f"  Post-Olympic avg: {post/1e6:.2f}M")
        if pre > 0:
            print(f"  Change: {(post - pre) / pre * 100:.1f}%")
        print()
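
In a series that is already trending upward, the post-Olympic average will usually exceed the pre-Olympic average whether or not the Games mattered. One rough check is to fit a linear trend on the pre-Olympic years only, project it forward, and compare post-Olympic arrivals with that projection. This is a sketch under a strong assumption (that the pre-Olympic trend would have continued unchanged), not the original authors' method. `post_gap_vs_pre_trend` is a hypothetical helper and the toy data is made up.

```python
import numpy as np
import pandas as pd

COL = 'Inbound arrivals of tourists'

def post_gap_vs_pre_trend(data, olympic_year, window=3, col=COL):
    """Mean % gap between post-Olympic arrivals and the projected pre-Olympic trend."""
    pre = data[data['Year'] < olympic_year]  # all pre-Olympic years, not just the window
    post = data[(data['Year'] > olympic_year) & (data['Year'] <= olympic_year + window)]
    if len(pre) < 2 or post.empty:
        return float('nan')  # not enough data to fit or evaluate a trend
    # Fit arrivals on years measured relative to the Olympic year
    slope, intercept = np.polyfit(pre['Year'] - olympic_year, pre[col], deg=1)
    projected = intercept + slope * (post['Year'] - olympic_year)
    return float(((post[col] - projected) / projected).mean() * 100)

# Made-up series on an exact linear trend (illustrative values, not real data)
years = np.arange(2005, 2016)
toy = pd.DataFrame({'Year': years, COL: 20e6 + 1e6 * (years - 2005)})

# Same series with a 10% jump in every year after the Games
bumped = toy.copy()
bumped.loc[bumped['Year'] > 2010, COL] *= 1.10

gap_plain = post_gap_vs_pre_trend(toy, olympic_year=2010)
gap_bumped = post_gap_vs_pre_trend(bumped, olympic_year=2010)
print(f"No effect: {gap_plain:.1f}%   10% jump: {gap_bumped:.1f}%")
```

On the pure-trend series the simple pre/post average would report a gain, while the gap against the projected trend is zero.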

---
## Step 5 | Results Interpretation

### Key Findings

**Visual Observations:**
- Most countries show upward trends in tourism over time
- The "Olympics effect" is difficult to isolate from general trends
- COVID-19 (2020-2021) caused dramatic drops in all countries

**Trend Analysis:**
- All countries show positive trends in tourist arrivals
- Growth rates vary by country

### Limitations

1. **No control group:** We can't isolate the Olympics effect without comparing to non-host countries
2. **Confounding factors:** Economic growth, exchange rates, and other events affect tourism
3. **Short-term vs long-term:** The 3-year windows capture only immediate changes; any lasting effect of hosting would need a longer horizon
4. **COVID disruption:** The 2020-2021 collapse in arrivals is included in the Step 4 trend fits and pulls the estimated slopes down

### What Would Improve This Analysis

- Difference-in-differences design with non-host countries
- Pre-registration of hypotheses
- Controls for economic conditions and exchange rates
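
The difference-in-differences idea can be illustrated with a simple 2x2 comparison of means: the change in arrivals for the host minus the change for a non-host over the same years. This is a minimal sketch on made-up numbers. The column names (`group`, `arrivals_m`), the `did_estimate` helper, and the choice to count the Olympic year itself as "post" are all assumptions for illustration.

```python
import pandas as pd

def did_estimate(panel, event_year):
    """2x2 difference-in-differences on group means (host vs control, pre vs post)."""
    period = panel['Year'].ge(event_year).map({True: 'post', False: 'pre'})
    table = (panel.assign(period=period)
                  .groupby(['group', 'period'])['arrivals_m']
                  .mean()
                  .unstack('period'))
    change = table['post'] - table['pre']          # change within each group
    return float(change['host'] - change['control'])

# Made-up panel: both groups rise by 1M per year, the host gains 3M extra from 2009
toy_panel = pd.DataFrame({
    'Year': [2006, 2007, 2008, 2009, 2010, 2011] * 2,
    'group': ['control'] * 6 + ['host'] * 6,
    'arrivals_m': [10, 11, 12, 13, 14, 15,
                   20, 21, 22, 26, 27, 28],
})

effect = did_estimate(toy_panel, event_year=2009)
print(f"DiD estimate: {effect:.1f} million arrivals")
```

The estimate is only interpretable as an effect if the host and control would have followed parallel trends without the Games, which is why the choice of control countries matters.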

---
## Replication Exercises

### Exercise 1: Control Countries
Select similar countries that did NOT host the Olympics during the sample period (e.g., Germany, or France if your data ends before the Paris 2024 Games) and compare trends.

### Exercise 2: Difference-in-Differences
Implement a proper diff-in-diff analysis with treated and control groups.

### Exercise 3: Winter Olympics
Focus on Winter Olympics hosts (Canada 2010, Russia 2014, South Korea 2018). Are effects different?

### Challenge Exercise
Research the academic literature on "mega-events" (Olympics, World Cup). What do economists find about their effects?

In [None]:
# Your code for exercises
