# ECON 0150 | Replication Notebook

**Title:** School Spending and Test Scores

**Original Authors:** Liebsch; Nguyen

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Do schools with higher spending see higher test scores?

**Data Source:** NAEP Math Scores (2022) and School Expenditure Data by State

**Methods:** OLS regression of test scores on expenditures per pupil

**Main Finding:** No statistically significant relationship between spending and test scores (8th grade p = 0.632, 4th grade p = 0.335).

**Course Concepts Used:**
- Simple linear regression
- Cross-state comparison
- Scatter plots with regression lines
- Hypothesis testing
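
The regression and hypothesis test listed above can be written out as follows, using the same $\beta_0$ and $\beta_1$ that appear as `B0` and `B1` in the Step 3 code comments. The state index $s$, the error term $\varepsilon_s$, and the two-sided alternative are notation added here for clarity. The two-sided form matches the p-values that `statsmodels` reports by default.

$$
\text{score}_s = \beta_0 + \beta_1 \, \text{expenditure}_s + \varepsilon_s, \qquad H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0
$$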

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0038/data/'

data = pd.read_csv(base_url + 'econ0150_project.csv')

print(f"Number of states: {len(data)}")
data.head()

---
## Step 1 | Data Exploration

In [None]:
# Check column names
print("Columns:", data.columns.tolist())

In [None]:
# Create variables for analysis
eighthgrade = data['eighth_grade_scores']
fourthgrade = data['fourth_grade_scores']
expenditure = data['expenditures_per_pupil']

# Summary statistics
print("Summary Statistics:")
print(data[['eighth_grade_scores', 'fourth_grade_scores', 'expenditures_per_pupil']].describe())

In [None]:
# Correlation
print(f"\nCorrelation (8th grade scores vs expenditure): {eighthgrade.corr(expenditure):.3f}")
print(f"Correlation (4th grade scores vs expenditure): {fourthgrade.corr(expenditure):.3f}")

---
## Step 2 | Visualization

In [None]:
# Scatter plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 8th grade
sns.scatterplot(x=expenditure, y=eighthgrade, ax=axes[0])
axes[0].set_title('Expenditure vs. Eighth Grade Scores')
axes[0].set_xlabel('Expenditures per Pupil ($)')
axes[0].set_ylabel('Eighth Grade Math Scores')
axes[0].grid(True, alpha=0.3)

# 4th grade
sns.scatterplot(x=expenditure, y=fourthgrade, ax=axes[1])
axes[1].set_title('Expenditure vs. Fourth Grade Scores')
axes[1].set_xlabel('Expenditures per Pupil ($)')
axes[1].set_ylabel('Fourth Grade Math Scores')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---
## Step 3 | Statistical Analysis

In [None]:
# Null hypothesis: B1 = 0 (No relationship between NAEP scores and school spending)
# Statistical model: score = B0 + B1 * expenditures

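# Regression for 8th grade
# Note: 'eighthgrade' and 'expenditure' are the Series created in Step 1, not columns of
# `data`; the formula interface falls back to the notebook namespace to find them.
model_8th = smf.ols('eighthgrade ~ expenditure', data).fit()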
print("Eighth Grade Regression:")
print(model_8th.summary().tables[1])

In [None]:
# Regression for 4th grade
model_4th = smf.ols('fourthgrade ~ expenditure', data).fit()
print("\nFourth Grade Regression:")
print(model_4th.summary().tables[1])

In [None]:
# Scatter plots with regression lines
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 8th grade with regression line
sns.regplot(x=expenditure, y=eighthgrade, ci=None,
            line_kws={'color': 'red'}, ax=axes[0])
axes[0].set_title('Eighth Grade Scores vs. Expenditure')
axes[0].set_xlabel('Expenditures per Pupil ($)')
axes[0].set_ylabel('Eighth Grade Math Scores')
axes[0].grid(True, alpha=0.3)

# 4th grade with regression line
sns.regplot(x=expenditure, y=fourthgrade, ci=None,
            line_kws={'color': 'red'}, ax=axes[1])
axes[1].set_title('Fourth Grade Scores vs. Expenditure')
axes[1].set_xlabel('Expenditures per Pupil ($)')
axes[1].set_ylabel('Fourth Grade Math Scores')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: No relationship between spending and scores (B1 = 0)")

print(f"\nEighth Grade Model:")
print(f"  Expenditure coefficient: {model_8th.params['expenditure']:.6f}")
print(f"  P-value: {model_8th.pvalues['expenditure']:.3f}")
print(f"  Significant at 0.05? {'Yes' if model_8th.pvalues['expenditure'] < 0.05 else 'No'}")

print(f"\nFourth Grade Model:")
print(f"  Expenditure coefficient: {model_4th.params['expenditure']:.6f}")
print(f"  P-value: {model_4th.pvalues['expenditure']:.3f}")
print(f"  Significant at 0.05? {'Yes' if model_4th.pvalues['expenditure'] < 0.05 else 'No'}")

---
## Step 4 | Results Interpretation

### Key Findings

| Grade | Coefficient (points per dollar) | P-value | Significant at 0.05? |
|-------|---------------------------------|---------|----------------------|
| 8th | 0.000089 | 0.632 | No |
| 4th | -0.0002 | 0.335 | No |

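1. **No Significant Relationship:** Neither model shows a statistically significant relationship between spending and test scores at the 0.05 level.

2. **Near-Zero Coefficients:** The coefficients measure score points per additional dollar of spending per pupil, so they look tiny partly because of the units. Per 1,000 dollars they are about 0.09 points (8th grade) and -0.2 points (4th grade), and neither is statistically distinguishable from zero.

3. **Cannot Reject Null:** We fail to reject the null hypothesis of no relationship. The data do not support the conclusion that higher-spending states have higher scores, but failing to reject is not proof that the true effect is zero.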
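
Because the coefficients in the table are measured per dollar of spending, their size depends on the units chosen. The sketch below uses made-up data, not the project dataset, to illustrate two points: rescaling spending to thousands of dollars multiplies the slope by 1,000 without changing the p-value, and a confidence interval shows which effect sizes the data cannot rule out. The names `demo`, `spend`, `spend_k`, and `score` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 50 simulated "states" with made-up spending and scores.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"spend": rng.uniform(8_000, 25_000, size=50)})
demo["score"] = 270 + rng.normal(0, 8, size=50)  # no true relationship built in
demo["spend_k"] = demo["spend"] / 1_000          # spending in thousands of dollars

per_dollar = smf.ols("score ~ spend", data=demo).fit()
per_thousand = smf.ols("score ~ spend_k", data=demo).fit()

# Rescaling multiplies the slope by 1,000 but leaves the p-value unchanged.
print(f"Slope per $1:     {per_dollar.params['spend']:.6f}")
print(f"Slope per $1,000: {per_thousand.params['spend_k']:.3f}")
print(f"P-values: {per_dollar.pvalues['spend']:.3f} vs {per_thousand.pvalues['spend_k']:.3f}")

# 95% confidence interval for the slope per $1,000.
ci_low, ci_high = per_thousand.conf_int().loc["spend_k"]
print(f"95% CI per $1,000: [{ci_low:.3f}, {ci_high:.3f}]")
```

Applied to the real data, the same idea would mean dividing `expenditures_per_pupil` by 1,000 before fitting and reporting the interval alongside the p-value.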

### Why No Relationship?

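Several explanations discussed in education research could account for a null result like this one:

- **How money is spent matters more:** Spending totals say nothing about teacher quality, class size, or curriculum design
- **Diminishing returns:** Above some baseline level of funding, additional money may add little
- **Confounders:** States serving higher-need student populations may spend more yet score lower, which can mask any positive effect of spending
- **Cross-sectional limitation:** A single-year comparison across states cannot show how scores change within a state when its spending changes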
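
The confounder explanation above can be illustrated with a small simulation. This is a sketch on invented data, not an analysis of the project dataset: the `need` index, the assumed true effect of 1 point per 1,000 dollars, and every other number are hypothetical choices made so that the mechanism is visible.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500  # many simulated "states" so the estimates are stable

# Hypothetical confounder: a need index that raises spending and lowers scores.
need = rng.normal(0, 1, size=n)
spend_k = 15 + 3 * need + rng.normal(0, 1, size=n)  # spending in $1,000s
true_effect = 1.0                                    # assumed points per $1,000
score = 260 + true_effect * spend_k - 6 * need + rng.normal(0, 3, size=n)

sim = pd.DataFrame({"score": score, "spend_k": spend_k, "need": need})

# Simple regression omits the confounder; multiple regression includes it.
naive = smf.ols("score ~ spend_k", data=sim).fit()
controlled = smf.ols("score ~ spend_k + need", data=sim).fit()

print(f"True effect built into the simulation: {true_effect:.2f}")
print(f"Slope without the control: {naive.params['spend_k']:.2f}")
print(f"Slope with the control:    {controlled.params['spend_k']:.2f}")
```

In this setup the simple regression understates the built-in effect because high-need states both spend more and score lower. Whether anything similar happens in the real data is an empirical question that Exercise 1 takes up.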

### Policy Implications

This does NOT mean "money doesn't matter" for education. The results are consistent with the following possibilities, none of which this analysis tests directly:
- Simple spending increases may not be sufficient
- Targeted investments (teacher training, early intervention) may be more effective
- Equity in funding distribution may matter more than total spending

---
## Replication Exercises

### Exercise 1: Poverty Controls
Add state poverty rate as a control. Does the spending coefficient change?

### Exercise 2: Regional Differences
Does the relationship differ by region (Northeast, South, Midwest, West)?
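
One possible way to set this up, shown here on invented data, is an interaction between spending and a categorical region variable, followed by an F-test against the pooled model. The `region` column, the `demo` frame, and all values are hypothetical, and nothing in this notebook confirms that the project dataset has a region column, so it may need to be added by hand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
regions = ["Northeast", "South", "Midwest", "West"]

# Hypothetical data: the 'region' column and all values are made up.
demo = pd.DataFrame({
    "region": rng.choice(regions, size=200),
    "spend_k": rng.uniform(8, 25, size=200),  # spending in $1,000s
})
demo["score"] = 270 + rng.normal(0, 8, size=200)

# Pooled model: one line for all regions.
pooled = smf.ols("score ~ spend_k", data=demo).fit()
# Interaction model: each region gets its own intercept and its own slope.
by_region = smf.ols("score ~ spend_k * C(region)", data=demo).fit()

# F-test of the pooled model (restricted) against the interaction model.
f_stat, p_value, df_diff = by_region.compare_f_test(pooled)
print(by_region.params.filter(like="spend_k"))
print(f"F-test that all regions share one line: F = {f_stat:.2f}, p = {p_value:.3f}")
```

With only about 50 states, each region contributes few observations, so region-specific slopes estimated from the real data would likely be imprecise.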

### Exercise 3: Spending Growth
The data includes spending changes. Is spending *growth* related to score improvements?

### Challenge Exercise
Research the Coleman Report and subsequent debates about school resources. What methodological approaches do economists use to estimate causal effects of school spending?

In [None]:
# Your code for exercises

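# Example: regress 8th grade scores on revenue per pupil instead of expenditure
# (uncomment to run; requires a 'revenues_per_pupil' column in `data`)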
# model_revenue = smf.ols('eighthgrade ~ revenues_per_pupil', data).fit()
# print(model_revenue.summary().tables[1])