# ECON 0150 | Replication Notebook

**Title:** Maple Trees - Which Maple Is Best?

**Original Authors:** Sensibar, Sharma

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis. You can run this notebook yourself to explore the data, reproduce the findings, and try the extension exercises at the end.

## About This Replication

**Research Question:** Which maple species provides greater dollar benefits: Norway Maple or Red Maple?

**Data Source:** Pittsburgh Trees dataset (45,709 trees with species, dimensions, and ecosystem benefits)

**Methods:** OLS regression with an interaction term, comparing how tree volume affects dollar benefits for each maple species

**Main Finding:** Controlling for tree volume, Norway Maples provide about $10.58 more in benefits on average (Model 1), and the benefit per unit of volume is slightly higher for Norway Maples (interaction term = 0.019, p < 0.001).

**Course Concepts Used:**
- OLS regression
- Dummy variables
- Interaction terms
- Residual analysis

---
## Step 0 | Setup

First, we import the necessary libraries and load the data.

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
data_url = 'https://tayweid.github.io/econ-0150/projects/replications/0039/data/pittsburgh_trees.csv'
data = pd.read_csv(data_url, low_memory=False)

# Preview the data
data.head()

---
## Step 1 | Data Preparation

We select the key columns, drop rows with missing values or non-positive height, construct a tree-volume size proxy, and filter to the two maple species of interest.

In [ ]:
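# Keep species, size, and benefit columns
df = data[['common_name', 'height', 'width', 'overall_benefits_dollar_value']].copy()
# Drop rows with missing values and trees with non-positive height
df = df.dropna()
df = df[df['height'] > 0]

# Tree "volume" proxy for size: height x width (dimensionally an area, not a true volume)
df['tree_volume'] = df['height'] * df['width']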

# Check most common tree species
df['common_name'].value_counts().head(5)

In [None]:
# Filter to just the two maple species
subset = df[df['common_name'].isin(['Maple: Norway', 'Maple: Red'])].copy()

# Create dummy variable: 1 = Norway Maple, 0 = Red Maple
subset['is_norway'] = (subset['common_name'] == 'Maple: Norway').astype(int)
subset['tree_type'] = subset['is_norway'].map({0: 'Red Maple', 1: 'Norway Maple'})

print(f"Red Maples: {(subset['is_norway'] == 0).sum()}")
print(f"Norway Maples: {(subset['is_norway'] == 1).sum()}")
subset.head()
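To see what each Step 1 operation does without downloading the full file, here is a minimal sketch that runs the same cleaning, volume, and dummy-coding steps on a tiny hand-made DataFrame. The rows, values, and the `'Oak: Pin'` label are invented for illustration; only the column names and the two maple labels come from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: only the column names and maple labels match the real data
toy = pd.DataFrame({
    'common_name': ['Maple: Norway', 'Maple: Red', 'Maple: Red', 'Oak: Pin', 'Maple: Norway'],
    'height': [30.0, 25.0, 0.0, 40.0, np.nan],
    'width': [10.0, 8.0, 5.0, 12.0, 9.0],
    'overall_benefits_dollar_value': [120.0, 90.0, 20.0, 150.0, 80.0],
})

# Same cleaning as Step 1: drop missing values, then drop non-positive heights
clean = toy.dropna()
clean = clean[clean['height'] > 0].copy()
clean['tree_volume'] = clean['height'] * clean['width']

# Keep the two maples and code Norway = 1, Red = 0
maples = clean[clean['common_name'].isin(['Maple: Norway', 'Maple: Red'])].copy()
maples['is_norway'] = (maples['common_name'] == 'Maple: Norway').astype(int)

print(maples[['common_name', 'tree_volume', 'is_norway']])
```

The missing-height and zero-height rows are removed before the species filter, so the maple counts printed above include only trees with usable measurements.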

---
## Step 2 | Visualization

We visualize the relationship between tree volume and dollar benefits for each maple type.

In [None]:
# Scatter plot with regression lines by maple type
sns.lmplot(data=subset, x='tree_volume', y='overall_benefits_dollar_value', 
           hue='tree_type', scatter_kws={'alpha': 0.3}, ci=None)
plt.title('Dollar Benefits of Red and Norway Maples by Tree Volume')
plt.xlabel('Tree Volume')
plt.ylabel('Dollar Benefits')
plt.show()

---
## Step 3 | Statistical Analysis

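We fit two OLS models. Model 1 adds a Norway Maple dummy to tree volume, so both species share one volume slope; Model 2 adds the interaction `tree_volume:is_norway`, which lets the volume slope differ by species.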
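Written out, the two specifications are as follows, with $D_i = 1$ for a Norway Maple and $0$ for a Red Maple (the `is_norway` dummy) and $V_i$ the `tree_volume` proxy. The $\beta$ labels are ours, not from the original project.

$$
\begin{aligned}
\text{Model 1:}\quad & \text{Benefits}_i = \beta_0 + \beta_1 D_i + \beta_2 V_i + \varepsilon_i \\
\text{Model 2:}\quad & \text{Benefits}_i = \beta_0 + \beta_1 D_i + \beta_2 V_i + \beta_3 (D_i \times V_i) + \varepsilon_i
\end{aligned}
$$

In Model 2 the Red Maple line has intercept $\beta_0$ and slope $\beta_2$, while the Norway Maple line has intercept $\beta_0 + \beta_1$ and slope $\beta_2 + \beta_3$.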

### Model 1: Without interaction

In [None]:
# Model 1: Dollar Benefits ~ Maple Type + Tree Volume
model1 = smf.ols('overall_benefits_dollar_value ~ is_norway + tree_volume', data=subset).fit()
print(model1.summary().tables[1])

### Model 2: With interaction

In [None]:
# Model 2: With interaction term
model2 = smf.ols('overall_benefits_dollar_value ~ is_norway + tree_volume + tree_volume:is_norway', data=subset).fit()
print(model2.summary().tables[1])
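One way to see what the interaction term buys: Model 2 reproduces the two separate lines that `sns.lmplot` draws in Step 2. The sketch below checks this on simulated data (all numbers are made up, chosen only so the groups have different slopes), building the `[1, d, v, d*v]` design matrix by hand with NumPy instead of a formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: group dummy d (1 = "Norway", 0 = "Red") and a size variable v
n = 200
d = rng.integers(0, 2, size=n)
v = rng.uniform(0, 500, size=n)
y = 70 + 6 * d + 0.13 * v + 0.02 * d * v + rng.normal(0, 10, size=n)

# Pooled interaction model with columns [1, d, v, d*v], mirroring Model 2
X = np.column_stack([np.ones(n), d, v, d * v])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Separate simple regressions per group (np.polyfit returns [slope, intercept])
slope_red, int_red = np.polyfit(v[d == 0], y[d == 0], 1)
slope_nor, int_nor = np.polyfit(v[d == 1], y[d == 1], 1)

print("Red:    intercept", round(b0, 3), "slope", round(b2, 4))
print("Norway: intercept", round(b0 + b1, 3), "slope", round(b2 + b3, 4))
```

Only the point estimates coincide. The standard errors differ, because the pooled model estimates a single error variance shared by both groups.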

In [ ]:
# Residual plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x=model2.fittedvalues, y=model2.resid, alpha=0.5)
plt.axhline(0, color='red', linestyle='--')
plt.title('Residual Plot')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.show()
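The residual plot is read by eye. A rough numeric companion is to sort observations by fitted value, split them into equal-count bins, and average the residuals in each bin. Positive means in the top bin are what under-prediction of large trees looks like. The helper `binned_mean_residuals` is a hypothetical name, and the sketch uses simulated, deliberately curved data rather than the tree data; with the real model you would pass `model2.fittedvalues` and `model2.resid`.

```python
import numpy as np

def binned_mean_residuals(fitted, resid, n_bins=5):
    """Hypothetical helper: mean residual within equal-count bins of fitted values."""
    order = np.argsort(np.asarray(fitted))
    bins = np.array_split(np.asarray(resid)[order], n_bins)
    return [float(b.mean()) for b in bins]

# Simulated convex relationship, fit with a straight line
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=500)
y = 5 + 0.5 * x**2 + rng.normal(0, 1, size=500)
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

means = binned_mean_residuals(fitted, resid)
print(means)
```

For a convex relationship the pattern is positive at both ends and negative in the middle, the U shape that motivates a non-linear term.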

---
## Step 4 | Results Interpretation

### Key Findings

**Model 1 (no interaction):**
- **Norway Maple coefficient:** +$10.58 (p < 0.001)
- Norway Maples provide about $10.58 more in benefits on average, controlling for tree volume

**Model 2 (with interaction):**
- **Intercept:** $69.03 - predicted benefits for a Red Maple with zero tree volume (the baseline group)
- **is_norway:** +$5.99 (p < 0.001) - at zero volume, Norway Maples are predicted to be $5.99 higher
- **tree_volume:** +$0.13 per unit of volume - the slope for Red Maples
- **Interaction:** +$0.019 (p < 0.001) - Norway Maples gain an additional $0.019 per unit of volume, for a slope of about $0.149
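The Model 2 coefficients combine into one line per species. Using the rounded values reported above (so results are approximate), the implied Norway Maple slope is 0.13 + 0.019, and the predicted Norway-minus-Red gap at volume V is 5.99 + 0.019 V. The function name `predicted_benefit` and the volumes looped over are illustrative choices, not values from the data.

```python
# Rounded Model 2 coefficients as reported in the list above
intercept = 69.03   # Red Maple predicted benefits at zero volume
b_norway = 5.99     # intercept shift for Norway Maples
b_volume = 0.13     # Red Maple slope per unit of tree volume
b_interact = 0.019  # extra slope for Norway Maples

def predicted_benefit(volume, is_norway):
    """Model 2 prediction from the rounded coefficients (illustrative helper)."""
    return (intercept + b_norway * is_norway
            + b_volume * volume + b_interact * volume * is_norway)

norway_slope = b_volume + b_interact  # implied Norway Maple slope

for v in [0, 100, 500]:  # arbitrary illustration volumes
    gap = predicted_benefit(v, 1) - predicted_benefit(v, 0)
    print(f"volume={v}: Norway - Red = ${gap:.2f}")
```

At zero volume the gap is the `is_norway` coefficient alone; it then widens by the interaction coefficient for each additional unit of volume.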

### Conclusion

Norway Maples provide slightly more ecosystem dollar benefits than Red Maples of the same size, and this advantage grows with tree size (the interaction term is positive and significant). The residual plot shows the model underpredicting benefits for large trees, which suggests the relationship between volume and benefits is non-linear.

---
## Replication Exercises

Try extending this analysis with the following exercises:

### Exercise 1: Other Species
Compare two other tree species from the dataset. Do the results differ from the maple comparison?

### Exercise 2: Non-linear Effects
Add a squared term for tree_volume to capture non-linear effects. Does this improve the model?
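As a starting point, a quadratic term just adds $V^2$ as another regressor. The sketch below does this on simulated curved data with NumPy least squares and compares $R^2$ for the linear and quadratic fits; the data-generating numbers and the helper name `r_squared` are invented for illustration. In the notebook's statsmodels formula interface the same term is written `I(tree_volume**2)`.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X (X must include a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
v = rng.uniform(0, 100, size=300)                        # simulated "volume"
y = 60 + 0.1 * v + 0.004 * v**2 + rng.normal(0, 5, 300)  # simulated curved benefits

X_lin = np.column_stack([np.ones_like(v), v])
X_quad = np.column_stack([np.ones_like(v), v, v**2])

print("linear R^2:   ", round(r_squared(X_lin, y), 4))
print("quadratic R^2:", round(r_squared(X_quad, y), 4))
```

Adding a regressor can never lower in-sample $R^2$, so a higher value alone does not show the model is better; check the squared term's significance, adjusted $R^2$, and whether the residual pattern flattens.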

### Exercise 3: Location Analysis
The full dataset includes neighborhood information. Do the benefits vary by neighborhood?

### Challenge Exercise
The residual plot shows under-prediction for large trees. Propose and test a transformation (e.g., log or square root) that might better capture this relationship.

In [ ]:
# Your code for Exercise 1: Other Species


In [None]:
# Your code for Exercise 2: Non-linear Effects


In [None]:
# Your code for Exercise 3: Location Analysis


In [None]:
# Your code for Challenge Exercise
