# ECON 0150 | Replication Notebook

**Title:** Gas Prices and Consumer Spending

**Original Authors:** Brubaker

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** What is the effect of gas prices on consumer spending in the US?

**Data Source:** FRED: weekly gas prices (GASREGW), quarterly Real Personal Consumption Expenditures (PCECC96), and monthly unemployment rate (UNRATE), 2000-2024

**Methods:** Multiple regression: Real_PCE ~ Gas_Price + Unemployment_Rate

**Main Finding:** Positive relationship between gas prices and consumer spending (coef = 1386, p < 0.001), controlling for unemployment. R² = 0.42.

**Course Concepts Used:**
- Time series data
- Multiple regression with controls
- Data merging and resampling
- Stacked time series plots (shared x-axis)

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0056/data/'

# The file name contains a space; it is percent-encoded (%20) so the URL is valid
gas = pd.read_csv(base_url + 'GASREGW%20(1).csv')
pce = pd.read_csv(base_url + 'PCECC96.csv')
unemp = pd.read_csv(base_url + 'UNRATE.csv')

print(f"Gas prices: {len(gas)} observations")
print(f"Real PCE: {len(pce)} observations")
print(f"Unemployment: {len(unemp)} observations")

---
## Step 1 | Data Preparation

In [None]:
# Convert dates and set index
for df in [gas, pce, unemp]:
    df['observation_date'] = pd.to_datetime(df['observation_date'])
    df.set_index('observation_date', inplace=True)

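# Put all three series on a common month-end index.
# Gas prices are weekly, so the mean is a monthly average; unemployment is already monthly.
# PCECC96 is quarterly, so months without a release become NaN and are dropped below.
# Note: the 'ME' alias requires pandas >= 2.2; use 'M' on older versions.
gas_monthly = gas.resample('ME').mean()
pce_monthly = pce.resample('ME').mean()
unemp_monthly = unemp.resample('ME').mean()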

# Merge datasets
data = gas_monthly.join([pce_monthly, unemp_monthly], how='inner')

# Rename columns
data = data.rename(columns={'GASREGW': 'gas_price', 'PCECC96': 'real_pce', 'UNRATE': 'unemp_rate'})

# Filter to 2000-2024 (the index is month-end, so this slice stops before the January 2024 row)
data = data.loc['2000-01-01':'2024-01-01']

# Drop missing
data = data.dropna()

print(f"\nMerged data: {len(data)} observations")
data.head()
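
The merge above mixes three frequencies: weekly gas prices, monthly unemployment, and quarterly PCECC96. The sketch below uses made-up numbers (not FRED data) to show what resampling a quarterly series to monthly does: months without a release become NaN, so the inner join followed by `dropna()` keeps only one row per quarter. The names `toy_gas`, `toy_pce`, and `toy_merged` are hypothetical, and `'MS'` is used instead of the notebook's `'ME'` only for portability across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for one year of data (values are made up, not FRED data).
weekly_idx = pd.date_range('2020-01-06', '2020-12-28', freq='W-MON')
quarterly_idx = pd.date_range('2020-01-01', '2020-10-01', freq='QS')

toy_gas = pd.DataFrame({'gas_price': np.linspace(2.0, 3.0, len(weekly_idx))},
                       index=weekly_idx)
toy_pce = pd.DataFrame({'real_pce': [13000.0, 12000.0, 13100.0, 13200.0]},
                       index=quarterly_idx)

# 'MS' (month start) is an assumption made for version portability;
# the notebook's 'ME' forms the same monthly bins but labels them by month end.
toy_gas_m = toy_gas.resample('MS').mean()   # one average per month
toy_pce_m = toy_pce.resample('MS').mean()   # NaN in months with no quarterly value

# Same join-then-drop pattern as the notebook
toy_merged = toy_gas_m.join(toy_pce_m, how='inner').dropna()

print(toy_pce_m['real_pce'].isna().sum(), "monthly PCE rows are NaN")
print(len(toy_merged), "rows survive the join and dropna")
print(toy_merged)
```

If the same pattern holds for the real data, each surviving row pairs a quarterly PCE value with the gas price average for the first month of that quarter only, which is worth keeping in mind when reading the observation count printed above.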

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['gas_price', 'real_pce', 'unemp_rate']].describe())

In [None]:
# Correlations
print("\nCorrelation Matrix:")
print(data[['gas_price', 'real_pce', 'unemp_rate']].corr().round(3))

---
## Step 3 | Visualization

In [None]:
# Time series plot: Gas price and Real PCE
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

axes[0].plot(data.index, data['gas_price'])
axes[0].set_title('U.S. Gasoline Price (Monthly Average)')
axes[0].set_ylabel('Dollars per Gallon')
axes[0].grid(True, alpha=0.3)

axes[1].plot(data.index, data['real_pce'])
axes[1].set_title('Real Personal Consumption Expenditures (PCE)')
axes[1].set_ylabel('Billions of Chained 2017 Dollars')
axes[1].set_xlabel('Date')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Scatter plot: Gas price vs Real PCE
plt.figure(figsize=(10, 6))
plt.scatter(data['gas_price'], data['real_pce'], alpha=0.6)
plt.xlabel('Gas Price (Dollars per Gallon)')
plt.ylabel('Real PCE (Billions)')
plt.title('Gas Price vs Real Consumer Spending')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# Multiple regression: Real PCE ~ Gas Price + Unemployment Rate
y = data['real_pce']
X = data[['gas_price', 'unemp_rate']]
X = sm.add_constant(X)

model = sm.OLS(y, X).fit()
print("Multiple Regression: real_pce ~ gas_price + unemp_rate")
print(model.summary())

In [None]:
# Residual plot
fitted_vals = model.fittedvalues
residuals = model.resid

plt.figure(figsize=(10, 5))
plt.scatter(fitted_vals, residuals, alpha=0.6)
plt.axhline(0, linestyle='--', color='red')
plt.xlabel('Fitted Values (Predicted Real PCE)')
plt.ylabel('Residuals')
plt.title('Residual Plot: Gas Price Model')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Fitted line visualization
gas_grid = np.linspace(data['gas_price'].min(), data['gas_price'].max(), 100)
unemp_mean = data['unemp_rate'].mean()

X_line = pd.DataFrame({
    'const': 1.0,
    'gas_price': gas_grid,
    'unemp_rate': unemp_mean
})

y_line = model.predict(X_line)

plt.figure(figsize=(10, 6))
plt.scatter(data['gas_price'], data['real_pce'], alpha=0.4, label='Observed')
plt.plot(gas_grid, y_line, color='red', linewidth=2, label='Fitted line')
plt.xlabel('Gas Price (Dollars per Gallon)')
plt.ylabel('Real PCE (Billions)')
plt.title('Gas Price vs Real PCE with Fitted Line\n(Unemployment held at mean)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: Gas prices have no effect on consumer spending (beta = 0)")
print("\nModel Results:")
print(f"  Intercept: ${model.params['const']:.0f} billion")
print(f"  Gas price coefficient: {model.params['gas_price']:.1f}")
print(f"  Gas price p-value: {model.pvalues['gas_price']:.4f}")
print(f"  Unemployment coefficient: {model.params['unemp_rate']:.1f}")
print(f"  Unemployment p-value: {model.pvalues['unemp_rate']:.4f}")
print(f"\nR-squared: {model.rsquared:.3f}")
print("\nInterpretation:")
print("  Each $1 increase in gas price is associated with")
print(f"  ${model.params['gas_price']:.0f} billion higher real consumer spending")
print("  (controlling for unemployment)")

---
## Step 5 | Results Interpretation

### Key Findings

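| Variable | Coefficient (billions of chained 2017 dollars) | P-value |
|----------|------------------------------------------------|---------|
| Intercept | ~9,728 | < 0.001 |
| Gas Price (per $1 per gallon) | ~+1,386 | < 0.001 |
| Unemployment (per percentage point) | ~-229 | 0.001 |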

**R-squared:** ~0.42

### Wait - Positive Coefficient?

The positive relationship is **counterintuitive** but explainable:

1. **Time trend confound:** Both gas prices and consumer spending rose over the 2000-2024 period, so a regression in levels picks up their shared upward trend

2. **This is correlation, not causation:** Higher economic activity drives both more gas consumption (higher prices) and more spending

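3. **Low Durbin-Watson:** A Durbin-Watson statistic of 0.17, far below the no-autocorrelation benchmark of 2, signals strong positive autocorrelation in the residuals. The model misses important time dynamics, so its standard errors and p-values should not be taken at face value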

### What Would We Need?

- **First differences:** Look at changes, not levels
- **Time trend or year fixed effects:** Control for the overall upward trend
- **Lead-lag analysis:** Does gas price *predict* future spending changes?
- **Instrumental variables:** Find exogenous shocks to gas prices (e.g., OPEC decisions)
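
As one illustration of controlling for the upward trend, the sketch below builds synthetic data in which gas prices have no effect on spending by construction, yet both series trend upward. All numbers and the `toy` frame are made up. Adding a linear `trend` term to the regression is one simple way, among several, to absorb the shared trend.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 240  # hypothetical number of monthly observations

toy = pd.DataFrame({'trend': np.arange(n)})
# Both series trend upward; by construction gas_price has no effect on real_pce
toy['gas_price'] = 2.0 + 0.01 * toy['trend'] + rng.normal(0, 0.3, n)
toy['real_pce'] = 10000 + 20 * toy['trend'] + rng.normal(0, 300, n)

# Levels regression without and with a linear time trend
levels = smf.ols('real_pce ~ gas_price', data=toy).fit()
detrended = smf.ols('real_pce ~ gas_price + trend', data=toy).fit()

print(f"Without trend: gas_price coef = {levels.params['gas_price']:.0f}")
print(f"With trend:    gas_price coef = {detrended.params['gas_price']:.0f}")
```

In this toy setup the large positive coefficient from the levels regression shrinks toward zero once the trend is included. Whether the same happens with the FRED data is an empirical question the exercises below can answer.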

---
## Replication Exercises

### Exercise 1: First Differences
Regress changes in spending on changes in gas prices. Does the relationship reverse?

### Exercise 2: Recession Periods
Add a recession dummy. How does the relationship change during economic downturns?
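
One possible starting point, offered as a sketch rather than a solution: flag recession months by date range and interact the dummy with gas prices. The date ranges are the NBER peak and trough months for 2001, 2007-2009, and 2020; treating both endpoints as recession months is an assumed convention. The regression data here are random placeholders, so only the mechanics carry over.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
idx = pd.date_range('2000-01-01', '2023-12-01', freq='MS')

# Placeholder data (made up); substitute the notebook's `data` frame in practice
toy = pd.DataFrame({
    'gas_price': rng.uniform(1.5, 4.5, len(idx)),
    'unemp_rate': rng.uniform(3.5, 10.0, len(idx)),
}, index=idx)
toy['real_pce'] = rng.normal(12000, 500, len(idx))

# NBER peak-to-trough months; including both endpoints is an assumed convention
recessions = [('2001-03', '2001-11'), ('2007-12', '2009-06'), ('2020-02', '2020-04')]
toy['recession'] = 0
for start, end in recessions:
    toy.loc[start:end, 'recession'] = 1

# gas_price * recession expands to both main effects plus their interaction
m = smf.ols('real_pce ~ gas_price * recession + unemp_rate', data=toy).fit()

print(toy['recession'].sum(), "recession months flagged")
print(m.params)
```

The `gas_price:recession` coefficient measures how the gas price slope differs during recessions relative to other months.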

### Exercise 3: Lag Structure
Do gas price changes affect spending with a delay? Try different lag lengths.
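
A sketch of the mechanics, using made-up data in which spending changes respond only to the gas price change from two periods earlier: `shift(k)` creates the k-period lag, and each lag enters the regression as its own column. The column names and the maximum lag of 3 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 300  # hypothetical number of periods

toy = pd.DataFrame({'gas_change': rng.normal(0, 0.2, n)})
# Made-up data: spending responds only to the gas price change two periods earlier
toy['pce_change'] = 5 - 30 * toy['gas_change'].shift(2) + rng.normal(0, 2, n)

# Build lag columns 0..max_lag with shift()
max_lag = 3
lag_cols = []
for k in range(max_lag + 1):
    col = f'gas_change_lag{k}'
    toy[col] = toy['gas_change'].shift(k)
    lag_cols.append(col)

# Rows lost to shifting contain NaN and are dropped before fitting
formula = 'pce_change ~ ' + ' + '.join(lag_cols)
m = smf.ols(formula, data=toy.dropna()).fit()

print(m.params.round(2))
```

With real data the true lag is unknown, so comparing the estimated coefficients across several lag lengths is the point of the exercise.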

### Challenge Exercise
Research the macroeconomic literature on oil shocks. What do economists find about gas prices and consumption?

In [None]:
# Your code for exercises

# Example: First differences
# data['gas_change'] = data['gas_price'].diff()
# data['pce_change'] = data['real_pce'].diff()
# model_diff = smf.ols('pce_change ~ gas_change', data=data.dropna()).fit()
# print(model_diff.summary().tables[1])