### SPATIAL FEATURES AND SEPARATE REGRESSIONS

### Workflow:
1. Load the final tract-by-day dataset (summer 2025).
2. Create additional interaction terms.
3. Split into high-heat vs. normal-heat categories.
4. OLS on log QoL rate and VIF checks for multicollinearity.
5. Moran's I on OLS residuals for spatial dependence.
6. Negative Binomial GLM count models for each category.
7. Spatial regression.

For my own reference, since this is a long notebook.

Variables:

**ACS SES**
- poverty_rate_c
- medhhinc_c
- no_vehicle_rate

**311 metrics**
- total_calls
- qol_calls

**Raster variables**
- ndvi_mean
- ndwi_mean
- ndbi_mean
- albedo_mean
- tree_canopy_fraction
- impervious_fraction
- building_coverage

**Spatially engineered**
- landcover_green_pct
- landcover_developed_pct

**Interactions**
- extreme_x_poverty
- extreme_x_no_vehicle

**POTENTIAL IDEAS**

**Time fixed effects**
- C(dow)

**Spatial fixed effects**
- C(GEOID_TRACT) (optional, but expensive)

In [None]:
# Libraries.
import pandas as pd
import numpy as np
import patsy
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import libpysal
from esda import Moran
from spreg import ML_Lag, ML_Error
import geopandas as gpd

In [None]:
# Load data.
df_hot = pd.read_csv("data/model/model_hot_days_2025.csv", dtype = {"GEOID": str})
df_normal = pd.read_csv("data/model/model_normal_days_2025.csv", dtype = {"GEOID": str})
gdf_tracts = gpd.read_file("data/nyc_tracts_2020/nyc_tracts_2020.shp")

In [None]:
# Merge.
df_hot = df_hot.merge(gdf_tracts[["GEOID20", "geometry"]], 
                      left_on = "GEOID", right_on = "GEOID20")

df_normal = df_normal.merge(gdf_tracts[["GEOID20", "geometry"]], 
                            left_on = "GEOID", right_on = "GEOID20")
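
The merges above use the default inner join, so any rows whose GEOID has no match in the shapefile are dropped without warning. One way to check for that is a left join with `indicator = True`. A small sketch on made-up data (the GEOIDs and the string stand-ins for geometries are illustrative only):

```python
import pandas as pd

# Toy model table and toy tract table.
model_df = pd.DataFrame({
    "GEOID": ["36061000100", "36061000201", "36999999999"],
    "qol_calls": [3, 5, 1],
})
tracts = pd.DataFrame({
    "GEOID20": ["36061000100", "36061000201"],
    "geometry": ["poly_a", "poly_b"],  # placeholders for real polygons
})

# Left join keeps every model row; _merge flags rows with no tract match.
# validate raises if a GEOID20 appears more than once in the tract table.
checked = model_df.merge(
    tracts, left_on="GEOID", right_on="GEOID20",
    how="left", indicator=True, validate="many_to_one",
)
unmatched = checked.loc[checked["_merge"] == "left_only", "GEOID"].tolist()
print("Unmatched GEOIDs:", unmatched)
```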

### High Heat

In [None]:
# OLS baseline.
formula = """
qol_calls ~
    temp_max_f +
    poverty_rate_c + medhhinc_c + no_vehicle_rate +
    ndvi_mean + ndwi_mean + ndbi_mean + albedo_mean +
    tree_canopy_fraction + impervious_fraction + building_coverage +
    landcover_green_pct +
    extreme_x_poverty + extreme_x_no_vehicle +
    C(dow)
"""
y, X = patsy.dmatrices(formula, data = df_hot, return_type = "dataframe")

ols_model = sm.OLS(y, X).fit()

print(ols_model.summary())

In [None]:
# VIF.
vif_df = pd.DataFrame()

vif_df["variable"] = X.columns

vif_df["VIF"] = [variance_inflation_factor(X.values, i)
                 for i in range(X.shape[1])]

vif_df
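
For reference, the VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing column j on the other predictors. A hand-rolled NumPy sketch of that definition follows. Unlike the statsmodels call above, this helper adds its own intercept, so it assumes the input matrix has no constant column. The `vif` function name and toy matrices are illustrative.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress it on the remaining columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)

# Orthogonal columns: no shared variance, so VIF should be 1.
X_orth = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

# Nearly identical columns: strong collinearity, so VIF should be large.
X_coll = np.array([[1, 1], [2, 2], [3, 3], [4, 5]], dtype=float)

vif_orth = vif(X_orth, 0)
vif_coll = vif(X_coll, 0)
print(vif_orth, vif_coll)
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity. The value reported for the intercept row is usually not meaningful.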

In [None]:
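# Moran's I on OLS residuals.
# df_hot is a plain DataFrame after the pandas merge (no set_geometry method),
# so build a GeoDataFrame explicitly and carry over the tract CRS.
gdf = gpd.GeoDataFrame(df_hot.copy(), geometry = "geometry", crs = gdf_tracts.crs)

# Queen contiguity weights, row-standardized.
w = libpysal.weights.Queen.from_dataframe(gdf)
w.transform = "r"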

moran = Moran(ols_model.resid, w)

print("Moran I:", moran.I)
print("p-value:", moran.p_norm)
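
To make the statistic concrete: with weights W and mean-centered values z, Moran's I is (n / S0) · (z'Wz) / (z'z), where S0 is the sum of all weights. A tiny NumPy sketch on a four-unit chain (toy data; `morans_i` is a hypothetical helper, not part of esda):

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for a dense weights matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = len(z)
    s0 = W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# Four units in a row (0-1-2-3); each unit neighbors the adjacent ones.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = adj / adj.sum(axis=1, keepdims=True)  # row-standardize, like w.transform = "r"

i_gradient = morans_i([1, 2, 3, 4], W)  # smooth gradient: similar neighbors
i_checker = morans_i([1, 4, 1, 4], W)   # alternating values: dissimilar neighbors
print(i_gradient, i_checker)
```

Positive values mean neighboring residuals tend to be similar (clustering), while negative values mean neighbors tend to differ.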

In [None]:
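# Spatial lag.
x_cols = [
    "temp_max_f",
    "poverty_rate_c", "medhhinc_c", "no_vehicle_rate",
    "ndvi_mean", "ndwi_mean", "ndbi_mean", "albedo_mean",
    "tree_canopy_fraction", "impervious_fraction", "building_coverage"
]

# spreg expects y as an (n, 1) column vector and X without a constant column.
y = df_hot[["qol_calls"]].values
X = df_hot[x_cols].values

# name_x labels the coefficients in the summary output.
slag_model = ML_Lag(y, X, w = w, name_y = "qol_calls", name_x = x_cols)

print(slag_model.summary)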

In [None]:
# Spatial error.
serr_model = ML_Error(y, X, w = w, name_y = "qol_calls")

print(serr_model.summary)

### Normal Heat

In [None]:
# OLS baseline.
y, X = patsy.dmatrices(formula, data = df_normal, return_type = "dataframe")

ols_model = sm.OLS(y, X).fit()

print(ols_model.summary())

In [None]:
# VIF.
vif_df = pd.DataFrame()

vif_df["variable"] = X.columns

vif_df["VIF"] = [variance_inflation_factor(X.values, i)
                 for i in range(X.shape[1])]

vif_df

In [None]:
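# Moran's I on OLS residuals.
# df_normal is a plain DataFrame after the pandas merge (no set_geometry method),
# so build a GeoDataFrame explicitly and carry over the tract CRS.
gdf = gpd.GeoDataFrame(df_normal.copy(), geometry = "geometry", crs = gdf_tracts.crs)

# Queen contiguity weights, row-standardized.
w = libpysal.weights.Queen.from_dataframe(gdf)
w.transform = "r"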

moran = Moran(ols_model.resid, w)

print("Moran I:", moran.I)
print("p-value:", moran.p_norm)
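
One caveat worth checking: if the tract-by-day table has several rows per tract, the same polygon appears many times in `gdf`, and the residual vector has one entry per tract-day rather than per tract. A possible workaround, sketched here on toy data, is to average residuals within each tract and build the weights from one row per tract. The column names below are illustrative.

```python
import pandas as pd

# Toy tract-day residuals: tracts A and B each have two days, C has one.
panel = pd.DataFrame({
    "GEOID": ["A", "A", "B", "B", "C"],
    "resid": [1.0, 3.0, -2.0, 0.0, 0.5],
})

# Collapse to one mean residual per tract, sorted by GEOID so the order
# can be matched against a tract-level GeoDataFrame sorted the same way.
tract_resid = panel.groupby("GEOID", sort=True)["resid"].mean()
print(tract_resid)
```

The tract-level weights would then come from a GeoDataFrame with unique GEOIDs in the same order as `tract_resid`.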

In [None]:
# Spatial lag.
x_cols = [
    "temp_max_f",
    "poverty_rate_c", "medhhinc_c", "no_vehicle_rate",
    "ndvi_mean", "ndwi_mean", "ndbi_mean", "albedo_mean",
    "tree_canopy_fraction", "impervious_fraction", "building_coverage"
]

# spreg expects y as an (n, 1) column vector and X without a constant column.
y = df_normal[["qol_calls"]].values
X = df_normal[x_cols].values

# name_x labels the coefficients in the summary output.
slag_model = ML_Lag(y, X, w = w, name_y = "qol_calls", name_x = x_cols)

print(slag_model.summary)

In [None]:
# Spatial error.
serr_model = ML_Error(y, X, w = w, name_y = "qol_calls")

print(serr_model.summary)