# 01 Prepare Spatial Backbone

**Project:** NORI  
**Author:** Yuseof J  
**Date:** 09/12/25

### **Purpose**
Load the raw New York State census tract shapefile, filter it to the five NYC counties, and reproject it to EPSG:2263 (NAD83 / New York Long Island, US survey feet).

### **Inputs**
- `data/raw/tiger_tracts_ny/tl_2025_36_tract.shp`

### **Outputs**
- `data/processed/nyc_tracts.gpkg`
  
--------------------------------------------------------------------------

### 0. Imports and Setup

In [2]:
# package imports
import os
import pandas as pd
import geopandas as gpd
from pathlib import Path

# specify filepaths
path_tracts_shapefile = 'data/raw/tiger_tracts_ny/tl_2025_36_tract.shp'
path_output_processed_geodata = 'data/processed/nyc_tracts.gpkg'

# county fips codes for nyc (Bronx, Kings, New York, Queens, Richmond) - used for filtering the statewide dataset
nyc_fips = ["005", "047", "061", "081", "085"]

# EPSG:2263 (NAD83 / New York Long Island, US survey feet) - State Plane crs commonly used for mapping the nyc boroughs
nyc_crs = 'EPSG:2263'

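# ensure cwd is the project root (the directory containing "data") so relative paths resolve
project_root = Path.cwd()                         # start from the current directory
while not (project_root / "data").exists():       # keep moving up until "data" is found
    if project_root == project_root.parent:       # reached filesystem root: stop instead of looping forever
        raise FileNotFoundError("no 'data' directory found at or above the cwd")
    project_root = project_root.parent
os.chdir(project_root)                            # switch to the project root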

### 1. Load Data

In [39]:
# load data
gdf_tracts = gpd.read_file(path_tracts_shapefile)
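
A shapefile is a set of sidecar files rather than a single file: `.shp` (geometry), `.shx` (index), and `.dbf` (attributes) are required, and `.prj` carries the CRS. A minimal pre-flight check before loading might look like the sketch below. `missing_sidecars` and `SIDECAR_EXTENSIONS` are hypothetical names introduced here, not part of geopandas.

```python
from pathlib import Path

# Extensions checked by this sketch: .shp/.shx/.dbf are mandatory parts of a
# shapefile, and .prj stores the coordinate reference system.
SIDECAR_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_sidecars(shp_path):
    """Return the sidecar extensions absent next to shp_path (hypothetical helper)."""
    base = Path(shp_path)
    return [ext for ext in SIDECAR_EXTENSIONS if not base.with_suffix(ext).exists()]


missing = missing_sidecars("data/raw/tiger_tracts_ny/tl_2025_36_tract.shp")
if missing:
    print("missing shapefile components:", missing)
else:
    print("all shapefile components present")
```

If `.prj` is the only missing piece, the file will likely still load but without a CRS, which would make the later reprojection step fail.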

### 2. EDA 

In [47]:
gdf_tracts.columns.tolist()

['STATEFP',
 'COUNTYFP',
 'TRACTCE',
 'GEOID',
 'GEOIDFQ',
 'NAME',
 'NAMELSAD',
 'MTFCC',
 'FUNCSTAT',
 'ALAND',
 'AWATER',
 'INTPTLAT',
 'INTPTLON',
 'geometry']
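
For tracts, `GEOID` is the concatenation of `STATEFP` (2 digits), `COUNTYFP` (3 digits), and `TRACTCE` (6 digits), each a zero-padded string. The sketch below splits a GEOID back into those parts; `split_tract_geoid` is a hypothetical helper written for illustration.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state, county, tract) strings."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


state, county, tract = split_tract_geoid("36029008400")
print(state, county, tract)  # 36 029 008400
```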

In [41]:
gdf_tracts.head(10)

| | STATEFP | COUNTYFP | TRACTCE | GEOID | GEOIDFQ | NAME | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 29 | 8400 | 36029008400 | 1400000US36029008400 | 84.0 | Census Tract 84 | G5020 | S | 10966624 | 3505091 | 42.9713848 | -78.9194986 | POLYGON ((-78.94456 42.98506, -78.94216 42.992... |
| 1 | 36 | 103 | 123600 | 36103123600 | 1400000US36103123600 | 1236.0 | Census Tract 1236 | G5020 | S | 2302367 | 1082191 | 40.6608399 | -73.4145754 | POLYGON ((-73.42559 40.65629, -73.42529 40.656... |
| 2 | 36 | 103 | 146001 | 36103146001 | 1400000US36103146001 | 1460.01 | Census Tract 1460.01 | G5020 | S | 2225464 | 0 | 40.7703277 | -73.2532537 | POLYGON ((-73.26159 40.76307, -73.2615 40.7636... |
| 3 | 36 | 103 | 190402 | 36103190402 | 1400000US36103190402 | 1904.02 | Census Tract 1904.02 | G5020 | S | 44073411 | 23956 | 40.8468673 | -72.6336641 | POLYGON ((-72.72668 40.8339, -72.72515 40.8387... |
| 4 | 36 | 103 | 158709 | 36103158709 | 1400000US36103158709 | 1587.09 | Census Tract 1587.09 | G5020 | S | 13099359 | 110761 | 40.8517499 | -72.9216255 | POLYGON ((-72.94716 40.8556, -72.94649 40.8576... |


### 3. Data Processing and Filtering

In [42]:
# filter for nyc boroughs
gdf_tracts_nyc = gdf_tracts[gdf_tracts.COUNTYFP.isin(nyc_fips)]

print("Total tracts: ", len(gdf_tracts_nyc))

Total tracts:  2327
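
The filter relies on `COUNTYFP` holding zero-padded strings. A toy example with made-up rows (not real tract data) shows the match, and how the padding could be restored with `str.zfill` if the codes were ever read as integers:

```python
import pandas as pd

nyc_fips = ["005", "047", "061", "081", "085"]

# made-up rows for illustration only
df = pd.DataFrame({"COUNTYFP": ["005", "029", "081", "103"]})
mask = df["COUNTYFP"].isin(nyc_fips)
print(df[mask]["COUNTYFP"].tolist())  # ['005', '081']

# if the codes lost their leading zeros (e.g. after a CSV round trip), re-pad first
as_int = pd.Series([5, 29, 81, 103])
repadded = as_int.astype(str).str.zfill(3)
print(repadded.isin(nyc_fips).tolist())  # [True, False, True, False]
```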


In [43]:
# reproject to the nyc coordinate reference system (to_crs transforms the coordinates, it does not just relabel them)
gdf_tracts_nyc = gdf_tracts_nyc.to_crs(nyc_crs)
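
TIGER/Line shapefiles are normally distributed in geographic NAD83 coordinates (EPSG:4269, degrees), so `to_crs` here performs an actual coordinate transformation into State Plane feet. A small sketch with a single made-up point near Midtown Manhattan (an assumption for illustration, not a row from the dataset) shows the effect:

```python
import geopandas as gpd
from shapely.geometry import Point

# one illustrative point in lon/lat degrees (NAD83), not taken from the tract data
gdf = gpd.GeoDataFrame(geometry=[Point(-73.98, 40.75)], crs="EPSG:4269")

# reproject to NAD83 / New York Long Island (ftUS)
gdf_2263 = gdf.to_crs("EPSG:2263")

print(gdf_2263.crs.to_epsg())               # 2263
print(gdf_2263.crs.axis_info[0].unit_name)  # linear unit of the projected axes
print(gdf_2263.geometry.iloc[0])            # coordinates are now in feet, not degrees
```

Any areas or distances computed after this step are therefore in US survey feet (or square feet), not meters.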

### 4. Save Data

In [44]:
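# export processed tract data as a GeoPackage layer named "tracts"
Path(path_output_processed_geodata).parent.mkdir(parents=True, exist_ok=True)  # create data/processed if missing
gdf_tracts_nyc.to_file(path_output_processed_geodata, layer="tracts", driver="GPKG")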