# ECON 0150 | Replication Notebook

**Title:** Housing Prices and Rent Affordability

**Original Author:** Li

**Original Date:** Fall 2025

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** How Do Housing Price Changes Affect Rent Affordability Across U.S. Cities?

**Data Source:** Zillow ZHVI (home values) and ZORI (rent) data for U.S. metro areas (2015-2024)

**Methods:** Correlation analysis and OLS regression

**Main Finding:** Housing price growth is positively correlated with rent increases across U.S. metros.

**Course Concepts Used:**
- Time series data
- Correlation analysis
- OLS regression
- Data reshaping (wide to long)

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0007/data/'

zhvi = pd.read_csv(base_url + 'Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv')
zori = pd.read_csv(base_url + 'Metro_zori_uc_sfrcondomfr_sm_month.csv')

print(f"ZHVI data: {len(zhvi)} regions")
print(f"ZORI data: {len(zori)} regions")

---
## Step 1 | Data Preparation

In [None]:
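# Keep only the region name and the monthly date columns
def filter_date_columns(df):
    """Return RegionName plus every column whose name starts with a four-digit year."""
    date_cols = [c for c in df.columns if c[:4].isdigit()]
    keep_cols = ['RegionName'] + date_cols
    return df[keep_cols]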

# Select a sample of major cities for comparison
cities = ['San Francisco, CA', 'Austin, TX', 'Pittsburgh, PA', 'Atlanta, GA', 'New York, NY']

# Filter ZHVI data
zhvi_filtered = filter_date_columns(zhvi)
zhvi_cities = zhvi_filtered[zhvi_filtered['RegionName'].isin(cities)].copy()

# Filter ZORI data (starts from 2015)
zori_filtered = filter_date_columns(zori)
zori_cities = zori_filtered[zori_filtered['RegionName'].isin(cities)].copy()

print(f"ZHVI cities: {len(zhvi_cities)}")
print(f"ZORI cities: {len(zori_cities)}")
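
Because `isin` matches names exactly, a misspelled or differently formatted metro name drops out without any error. A minimal sketch of a check, using a small made-up table in place of the Zillow file (the `toy` frame, the `wanted` list, and all values are illustrative assumptions, not real data):

```python
import pandas as pd

# Toy stand-in for the wide Zillow table; values are made up for illustration.
toy = pd.DataFrame({
    'RegionName': ['Austin, TX', 'Pittsburgh, PA', 'Atlanta, GA'],
    '2015-01-31': [250000.0, 130000.0, 170000.0],
})

# The last name is deliberately wrong to show what a mismatch looks like
wanted = ['Austin, TX', 'Pittsburgh, PA', 'New York City, NY']

# isin() keeps only exact matches and silently drops everything else
matched = toy[toy['RegionName'].isin(wanted)]

# Report requested names that were not found
missing = sorted(set(wanted) - set(matched['RegionName']))
print(f"Matched {len(matched)} of {len(wanted)}; missing: {missing}")
```

If the printed city counts above are smaller than the length of `cities`, a check like this identifies which names need correcting.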

In [None]:
# Reshape to long format for analysis
zhvi_long = zhvi_cities.melt(id_vars='RegionName', var_name='date', value_name='home_value')
zhvi_long['date'] = pd.to_datetime(zhvi_long['date'])

zori_long = zori_cities.melt(id_vars='RegionName', var_name='date', value_name='rent')
zori_long['date'] = pd.to_datetime(zori_long['date'])

# Merge home values and rent
data = pd.merge(zhvi_long, zori_long, on=['RegionName', 'date'], how='inner')
data = data.dropna()

print(f"Merged observations: {len(data)}")
data.head()
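
To see what the reshape and merge do, here is a small sketch on two made-up wide tables (the frame names and every number are hypothetical). Each wide table has one row per metro and one column per month; `melt` turns that into one row per metro-month, and the inner merge keeps only the metro-months present in both tables:

```python
import pandas as pd

# Toy wide tables (one row per metro, one column per month); values are made up.
home_wide = pd.DataFrame({
    'RegionName': ['Austin, TX', 'Pittsburgh, PA'],
    '2015-01-31': [250000.0, 130000.0],
    '2015-02-28': [252000.0, 130500.0],
})
rent_wide = pd.DataFrame({
    'RegionName': ['Austin, TX', 'Pittsburgh, PA'],
    '2015-02-28': [1200.0, 950.0],  # the toy rent series starts one month later
})

# Wide to long: one row per (metro, month)
home_long = home_wide.melt(id_vars='RegionName', var_name='date', value_name='home_value')
rent_long = rent_wide.melt(id_vars='RegionName', var_name='date', value_name='rent')
home_long['date'] = pd.to_datetime(home_long['date'])
rent_long['date'] = pd.to_datetime(rent_long['date'])

# Inner merge keeps only metro-months that appear in both tables
merged = pd.merge(home_long, rent_long, on=['RegionName', 'date'], how='inner')
print(merged)
```

The January rows disappear from `merged` because the toy rent table has no January column, which is the same reason the merged notebook data only covers months where both series exist.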

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics by city
data.groupby('RegionName')[['home_value', 'rent']].describe()

In [None]:
# Create affordability ratio (annual rent / home value)
data['rent_to_price_ratio'] = (data['rent'] * 12) / data['home_value']
data['rent_to_price_ratio'].describe()
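
As a worked example of the ratio, the sketch below applies the same formula to two made-up metro-months (the metro names and values are hypothetical). A higher ratio means a year of rent is large relative to the purchase price; it does not measure affordability relative to income.

```python
import pandas as pd

# Made-up numbers for two hypothetical metro-months
toy = pd.DataFrame({
    'RegionName': ['Metro A', 'Metro B'],
    'home_value': [400000.0, 150000.0],
    'rent': [2000.0, 1000.0],
})

# Annualized rent as a share of home value
toy['rent_to_price_ratio'] = (toy['rent'] * 12) / toy['home_value']
print(toy)
# Metro A: 24000 / 400000 = 0.06; Metro B: 12000 / 150000 = 0.08
```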

---
## Step 3 | Visualization

In [None]:
# Home values over time
plt.figure(figsize=(12, 6))
for city in cities:
    city_data = data[data['RegionName'] == city]
    plt.plot(city_data['date'], city_data['home_value'], label=city)

plt.title('Typical Home Values (ZHVI) Over Time')
plt.xlabel('Year')
plt.ylabel('Home Value ($)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Rent over time
plt.figure(figsize=(12, 6))
for city in cities:
    city_data = data[data['RegionName'] == city]
    plt.plot(city_data['date'], city_data['rent'], label=city)

plt.title('Typical Rent (ZORI) Over Time')
plt.xlabel('Year')
plt.ylabel('Monthly Rent ($)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Scatter: Home value vs Rent
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='home_value', y='rent', hue='RegionName', alpha=0.7)
plt.xlabel('Home Value ($)')
plt.ylabel('Monthly Rent ($)')
plt.title('Rent vs Home Values Across Cities')
plt.legend(bbox_to_anchor=(1.05, 1))
plt.tight_layout()
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS regression: rent ~ home_value
model = smf.ols('rent ~ home_value', data=data).fit()
print(model.summary().tables[1])
print(f"\nR-squared: {model.rsquared:.3f}")
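
The Methods line lists correlation analysis alongside OLS. With a single predictor the two are directly linked: R-squared equals the square of the Pearson correlation. A minimal sketch on made-up observations (the `toy` values are illustrative only, and `np.polyfit` stands in for the `smf.ols` fit):

```python
import numpy as np
import pandas as pd

# Made-up observations for illustration only
toy = pd.DataFrame({
    'home_value': [150000.0, 200000.0, 300000.0, 450000.0, 600000.0],
    'rent': [1000.0, 1150.0, 1500.0, 2100.0, 2400.0],
})

# Pearson correlation coefficient
r = toy['home_value'].corr(toy['rent'])

# Least-squares line rent = a + b * home_value (polyfit returns slope first)
b, a = np.polyfit(toy['home_value'], toy['rent'], deg=1)
fitted = a + b * toy['home_value']

# R-squared from the residual and total sums of squares
ss_res = ((toy['rent'] - fitted) ** 2).sum()
ss_tot = ((toy['rent'] - toy['rent'].mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R-squared = {r_squared:.3f}")
```

On the notebook's data, `data['home_value'].corr(data['rent'])` gives the corresponding correlation for the pooled sample.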

In [None]:
# Calculate percent changes for each city
def calc_growth(group):
    group = group.sort_values('date')
    start_val = group['home_value'].iloc[0]
    end_val = group['home_value'].iloc[-1]
    start_rent = group['rent'].iloc[0]
    end_rent = group['rent'].iloc[-1]
    return pd.Series({
        'home_value_growth': (end_val - start_val) / start_val * 100,
        'rent_growth': (end_rent - start_rent) / start_rent * 100
    })

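# Select the needed columns explicitly so apply() does not operate on the grouping column
growth = data.groupby('RegionName')[['date', 'home_value', 'rent']].apply(calc_growth)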
print("Growth rates (%) from first to last observation:")
growth
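
Once growth rates are in a table with one row per city, the cross-city relationship between the two columns can be summarized with a correlation. The sketch below uses made-up growth rates for five hypothetical metros (`growth_toy` and its values are assumptions, not results from the Zillow data):

```python
import pandas as pd

# Made-up growth rates (%) for five hypothetical metros
growth_toy = pd.DataFrame(
    {
        'home_value_growth': [40.0, 55.0, 70.0, 85.0, 100.0],
        'rent_growth': [25.0, 30.0, 45.0, 50.0, 60.0],
    },
    index=['Metro A', 'Metro B', 'Metro C', 'Metro D', 'Metro E'],
)

# Cross-city correlation between home value growth and rent growth
r_growth = growth_toy['home_value_growth'].corr(growth_toy['rent_growth'])
print(f"Correlation of growth rates across {len(growth_toy)} metros: {r_growth:.2f}")
```

With only five cities, a single unusual metro can change this correlation a great deal, so it is best read as descriptive rather than conclusive.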

---
## Step 5 | Results Interpretation

### Key Findings

**Relationship between home values and rent:**
- Home values and rent are strongly and positively related in the pooled sample of city-months
- Higher-cost cities like San Francisco and New York have both higher home values and higher rents

**Growth patterns:**
- All cities show substantial appreciation in both home values and rents since 2015
- The rate of growth varies substantially by city

### Interpretation

The positive relationship between home values and rent is consistent with basic housing market dynamics, although the regression alone does not establish causation:
- Landlords may set rent with property values and expected returns in mind
- High-demand areas tend to see increases in both metrics
- The rent-to-price ratio shows how expensive renting is relative to buying in each city

---
## Replication Exercises

### Exercise 1: More Cities
Add more cities to the analysis. How does the relationship vary across different housing markets?

### Exercise 2: Rent-to-Price Ratio
Analyze the rent-to-price ratio over time. Which cities have become more/less affordable?

### Exercise 3: COVID Impact
Focus on the 2020-2021 period. How did the pandemic affect the relationship between home values and rent?
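
One way to isolate a period is to filter on the `date` column. A minimal sketch on a made-up series for one hypothetical metro (the dates and values are illustrative assumptions):

```python
import pandas as pd

# Made-up yearly snapshots for one hypothetical metro
toy = pd.DataFrame({
    'date': pd.to_datetime(['2019-06-30', '2020-06-30', '2021-06-30', '2022-06-30']),
    'home_value': [300000.0, 310000.0, 360000.0, 400000.0],
    'rent': [1500.0, 1480.0, 1600.0, 1800.0],
})

# Keep only observations dated within 2020-2021 (both bounds inclusive)
covid = toy[toy['date'].between('2020-01-01', '2021-12-31')]
print(covid)
```

The same filter applied to `data` gives a subsample that can be passed to the regression or plots from the earlier steps.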

### Challenge Exercise
Calculate the growth rate in home values and rent for each city. Is there evidence that faster home value appreciation leads to faster rent increases?

In [None]:
# Your code for exercises
