# ECON 0150 | Replication Notebook

**Title:** Bus Boardings Weekday vs Weekend

**Original Authors:** Shroff; Torri

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Do average bus boardings per stop differ significantly between weekdays and weekends?

**Data Source:** WPRDC Pittsburgh bus stop ridership data

**Methods:** OLS regression with a log-transformed outcome: log(avg_ons + 1) ~ serviceday

**Main Finding:** Weekend boardings are significantly lower (coef = -0.19 on log scale, p < 0.001), representing about 17% fewer boardings per stop.

**Course Concepts Used:**
- Log transformation of dependent variable
- Binary categorical predictor
- Box plots for comparison
- Interpretation of log-linear models

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0049/data/'

data = pd.read_csv(base_url + 'wprdc_stop_data.csv')

# Create weekend indicator
data['weekend'] = (data['serviceday'] == 'Weekend').astype(int)

# Log-transform average boardings; the +1 keeps any stop with zero boardings defined
data['logons'] = np.log(data['avg_ons'] + 1)

print(f"Number of observations: {len(data):,}")
print("\nObservations by service day:")
print(data['serviceday'].value_counts())
data.head()
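
The `+ 1` inside the log matters for any stop that records zero average boardings, since `log(0)` is undefined. The following is a minimal sketch on made-up values (not the WPRDC data) showing that `np.log1p(x)` computes the same quantity as `np.log(x + 1)` and stays finite at zero.

```python
import numpy as np

# Hypothetical average-boardings values, including a zero (not taken from the WPRDC file)
avg_ons_example = np.array([0.0, 0.5, 2.0, 10.0, 150.0])

# Same transform as in the setup cell
log_plus_one = np.log(avg_ons_example + 1)

# np.log1p(x) is NumPy's built-in for log(1 + x)
log1p_values = np.log1p(avg_ons_example)

print(log_plus_one.round(3))
print(np.allclose(log_plus_one, log1p_values))
```

Either form works here; `np.log1p` is mainly a readability choice and is more accurate for values very close to zero.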

---
## Step 1 | Data Exploration

In [None]:
# Summary statistics
print("Average Boardings by Service Day:")
print(data.groupby('serviceday')['avg_ons'].describe())

In [None]:
# Mean comparison
print("\nMean Average Boardings:")
print(data.groupby('serviceday')['avg_ons'].mean())
print("\nMean Log Boardings:")
print(data.groupby('serviceday')['logons'].mean())
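
The two sets of means answer slightly different questions. Averaging on the log scale and then undoing the transform gives a geometric-style mean, which is pulled less by a few very busy stops than the arithmetic mean is. A small sketch with made-up values (illustrative only, not the course data):

```python
import numpy as np

# Hypothetical right-skewed boardings for a handful of stops (illustrative only)
avg_ons_example = np.array([0.2, 0.5, 1.0, 3.0, 40.0])

arithmetic_mean = avg_ons_example.mean()

# Mean on the log(x + 1) scale, then undo the transform
mean_log = np.log(avg_ons_example + 1).mean()
back_transformed = np.exp(mean_log) - 1

print(f"Arithmetic mean: {arithmetic_mean:.2f}")
print(f"Back-transformed mean of logs: {back_transformed:.2f}")
```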

---
## Step 2 | Visualization

In [None]:
# Bar plot: mean boardings by service day (error bars are seaborn's default 95% CI)
plt.figure(figsize=(8, 6))
sns.barplot(data=data, x='serviceday', y='avg_ons')
plt.title('Average Bus Boardings by Service Day')
plt.xlabel('Service Day')
plt.ylabel('Average Boardings per Stop')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Histogram of log boardings
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Weekday
axes[0].hist(data[data['weekend'] == 0]['logons'], bins=30, alpha=0.7, color='steelblue')
axes[0].set_xlabel('Log Average Boardings')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Distribution of Log Boardings (Weekday)')

# Weekend
axes[1].hist(data[data['weekend'] == 1]['logons'], bins=30, alpha=0.7, color='coral')
axes[1].set_xlabel('Log Average Boardings')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Distribution of Log Boardings (Weekend)')

plt.tight_layout()
plt.show()

In [None]:
# Box plot comparison
plt.figure(figsize=(8, 6))
sns.boxplot(data=data, x='serviceday', y='logons', whis=(0, 100))
plt.xlabel('Service Day')
plt.ylabel('Log Average Boardings')
plt.title('Log Boardings Distribution: Weekday vs Weekend')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 3 | Statistical Analysis

In [None]:
# OLS Regression: log(boardings) ~ service day
model = smf.ols('logons ~ serviceday', data=data).fit()
print("OLS Regression: log(boardings) ~ serviceday")
print(model.summary().tables[1])
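
With a single binary predictor, OLS has a simple reading: the intercept is the mean of the reference group and the coefficient is the difference in group means. The sketch below checks this on made-up log-boardings values, using `np.linalg.lstsq` in place of `statsmodels` so it runs on its own; the numbers are hypothetical.

```python
import numpy as np

# Hypothetical log-boardings values for illustration (not the course data)
logons_weekday = np.array([1.4, 0.9, 1.6, 1.1])
logons_weekend = np.array([1.0, 0.8, 1.3])

y = np.concatenate([logons_weekday, logons_weekend])
weekend = np.concatenate([np.zeros(len(logons_weekday)), np.ones(len(logons_weekend))])

# Design matrix: a column of ones (intercept) and the weekend dummy
X = np.column_stack([np.ones_like(weekend), weekend])
(intercept, weekend_coef), *_ = np.linalg.lstsq(X, y, rcond=None)

mean_difference = logons_weekend.mean() - logons_weekday.mean()
print(f"Intercept: {intercept:.4f}  vs weekday mean: {logons_weekday.mean():.4f}")
print(f"Weekend coef: {weekend_coef:.4f}  vs mean difference: {mean_difference:.4f}")
```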

In [None]:
# Residual plot
plt.figure(figsize=(10, 5))
sns.boxplot(x=model.predict(), y=model.resid, whis=(0, 100))
plt.axhline(y=0, color='red', linestyle='-')
# Boxes are ordered by fitted value; Weekend comes first because its fitted mean is lower
plt.xticks([0, 1], labels=['Weekend', 'Weekday'])
plt.ylabel('Residuals')
plt.xlabel('Predicted Group')
plt.title('Residuals by Service Day')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: No difference in boardings between weekdays and weekends")
print("\nModel Results:")
print(f"  Intercept (Weekday mean of log boardings): {model.params['Intercept']:.4f}")
print(f"  Weekend effect (log scale): {model.params['serviceday[T.Weekend]']:.4f}")
print(f"  P-value: {model.pvalues['serviceday[T.Weekend]']:.4e}")

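# Convert the log-scale coefficient to a percentage change: (exp(b) - 1) * 100
weekend_coef = model.params['serviceday[T.Weekend]']
weekend_p = model.pvalues['serviceday[T.Weekend]']
pct_change = (np.exp(weekend_coef) - 1) * 100
direction = "lower" if pct_change < 0 else "higher"
print("\nInterpretation:")
print(f"  Weekend boardings are about {abs(pct_change):.1f}% {direction} than weekday boardings")
print(f"  Statistically significant at the 0.001 level: {weekend_p < 0.001}")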
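
The same `exp(b) - 1` conversion can be applied to the endpoints of a confidence interval, which gives a range for the percentage change and not only a point estimate. In the sketch below the coefficient is the -0.19 reported in this notebook, but the standard error is a placeholder assumed purely for illustration.

```python
import math

coef = -0.19            # Weekend coefficient reported in this notebook (log scale)
se_hypothetical = 0.02  # ASSUMED standard error, for illustration only

# Approximate 95% interval on the log scale (normal critical value 1.96)
ci_low = coef - 1.96 * se_hypothetical
ci_high = coef + 1.96 * se_hypothetical

def to_percent(b):
    """Convert a log-scale coefficient to a percentage change."""
    return (math.exp(b) - 1) * 100

print(f"Point estimate: {to_percent(coef):.1f}%")
print(f"Approx. 95% CI: {to_percent(ci_low):.1f}% to {to_percent(ci_high):.1f}%")
```

With the fitted model in hand, the actual interval endpoints should come from `model.conf_int().loc['serviceday[T.Weekend]']`, not from an assumed standard error.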

---
## Step 4 | Results Interpretation

### Key Findings

| Variable | Coefficient | P-value |
|----------|-------------|--------|
| Intercept (Weekday) | ~1.19 | < 0.001 |
| Weekend | ~-0.19 | < 0.001 |

1. **Significant Difference:** Weekend boardings are significantly lower than weekday boardings

2. **Practical Magnitude:** About 17% fewer boardings per stop on weekends (exp(-0.19) - 1 ≈ -0.17)

3. **Log Transformation:** Using log(avg_ons + 1) reduces the skew in the boardings distribution
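
The 17% figure follows from the form of the model. A short derivation, writing $y$ for `avg_ons` and $W$ for the weekend indicator (notation introduced here for illustration):

$$
\log(y + 1) = \beta_0 + \beta_1 W + \varepsilon
$$

The fitted value is $\beta_0$ for weekdays ($W = 0$) and $\beta_0 + \beta_1$ for weekends ($W = 1$). Exponentiating both and taking the ratio:

$$
\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} \approx e^{-0.19} \approx 0.827
$$

So the typical weekend value is about 82.7% of the weekday value, roughly 17.3% lower. Because the outcome is $\log(y + 1)$ and not $\log(y)$, this ratio applies strictly to boardings plus one, so it is best read as an approximation for boardings themselves.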

### Why Are Weekend Boardings Lower?

- **Commuter patterns:** Work and school commutes likely account for much of weekday ridership
- **Service frequency:** Transit agencies often reduce weekend service
- **Trip purposes:** Weekend trips are more discretionary and spread throughout the day
- **Car availability:** More people may have access to cars on weekends

### Policy Implications

- Resource allocation should account for weekday/weekend demand differences
- Weekend service optimization may differ from weekday strategies
- Encouraging weekend ridership could improve system efficiency

---
## Replication Exercises

### Exercise 1: Route-Level Analysis
Do some routes show smaller weekday-weekend gaps? Which routes serve weekend destinations?
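
One possible starting point is to compute the weekend minus weekday gap separately for each route. The sketch below uses a made-up table; the `route` column name is an assumption, so check `data.columns` for the actual route identifier in the WPRDC file.

```python
import pandas as pd

# Hypothetical stop-level rows; `route` is an assumed column name
example = pd.DataFrame({
    'route': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'serviceday': ['Weekday', 'Weekday', 'Weekend', 'Weekend'] * 2,
    'logons': [1.5, 1.3, 1.0, 0.8, 1.2, 1.0, 1.1, 1.1],
})

# Mean log boardings per route and service day, one column per service day
route_means = example.groupby(['route', 'serviceday'])['logons'].mean().unstack()

# Weekend minus weekday gap on the log scale; routes with the smallest weekend drop first
route_means['gap'] = route_means['Weekend'] - route_means['Weekday']
print(route_means.sort_values('gap', ascending=False))
```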

### Exercise 2: Time of Day
If time-of-day data is available, how does the pattern differ by hour?

### Exercise 3: Seasonal Effects
Does the weekday-weekend gap change by season or month?

### Challenge Exercise
Research transit demand modeling. What factors besides day-of-week predict ridership?

In [None]:
# Your code for exercises

# Example: High-ridership stops
# high_ridership = data[data['avg_ons'] > data['avg_ons'].quantile(0.9)]
# print(high_ridership.groupby('serviceday')['avg_ons'].mean())