# 09 Calculate Tree Canopy

**Project:** NORI  
**Author:** Yuseof J  
**Date:** 14/12/25

### **Purpose**
Using the land cover raster from NLCD, calculate the proportion of pixels per NYC census tract that are of type tree, allowing us to determine how much of the tract has tree canopy. 

### **Inputs**
- `data/processed/nyc_tracts.gpkg`
- `data/raw/land_cover/Annual_NLCD_LndCov_2024_CU_C1V1.tif`

### **Outputs**
- `data/processed/model_features_canopy.csv`
- `data/processed/nyc_tracts.gpkg (layer = tract_canopy)`
--------------------------------------------------------------------------

### 0. Imports and Setup

In [93]:
# package imports
import os
import rasterio
from rasterio.mask import mask
from rasterio.features import geometry_mask, geometry_window
import numpy as np
import pandas as pd
import geopandas as gpd
from pathlib import Path

# specify filepaths
path_nyc_tracts = 'data/processed/nyc_tracts.gpkg'
path_land_cover_raster = 'data/raw/land_cover/land_cover_nyc/NYC_2017_LiDAR_LandCover.img'
path_raster_vat_table = 'data/raw/land_cover/land_cover_nyc/NYC_2017_LiDAR_LandCover.img.vat.dbf'
path_output_model_features_csv = 'data/processed/model_features_canopy.csv'
output_gpkg_layer = 'tract_canopy'

# codes for tree pixels in raster
vat = gpd.read_file(path_raster_vat_table)
tree_classes = vat[vat['Class'] == 'Tree Canopy']['Value'].values.tolist()

# ensure cwd is project root for file paths to function properly
project_root = Path(os.getcwd())            # get current directory
while not (project_root / "data").exists(): # keep moving up until in parent
    project_root = project_root.parent
os.chdir(project_root)                      # switch to parent directory

### 1. Load Data

In [85]:
# nyc tracts
gdf_tracts_nyc = gpd.read_file(path_nyc_tracts, layer="tracts")

# land cover raster
src_land_cover = rasterio.open(path_land_cover_raster)

Land cover raster metadata

In [86]:
src_land_cover.crs

CRS.from_wkt('PROJCS["NAD83 New York State Planes, Long Island, US Foot",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",40.1666666666667],PARAMETER["central_meridian",-74],PARAMETER["standard_parallel_1",41.0333333333333],PARAMETER["standard_parallel_2",40.6666666666667],PARAMETER["false_easting",984250],PARAMETER["false_northing",0],UNIT["us_survey_feet",0.304800609601219],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')

In [87]:
src_land_cover.transform

Affine(0.5, 0.0, 912286.93,
       0.0, -0.5, 273618.3)

In [88]:
src_land_cover.res

(0.5, 0.5)

### 2. Feature Engineering

In [89]:
# reproject nyc tracts to the raster crs (essential for accurate calculations below)
gdf_tracts_nyc = gdf_tracts_nyc.to_crs(src_land_cover.crs)

Calculate % Tree Canopy per Tract

> **Algorithm breakdown:**
> 
> General idea: For each tract, calculate the percentage of raster pixels within its boundary that are tree canopy pixels.
>
> Handling MemoryErrors: In order to avoid memory issues (caused by loading the ENTIRE nyc raster for each calculation), we instead take a section of the raster called a window (i.e. a rectangular bounding box). This window is the portion of the raster that contains the tract. Since tracts are not always rectangular, we now mask the window (i.e. determine which pixels in the window are actually INSIDE of the tract). Using these pixels, we calculate the proportion of them that are tree canopy, giving us percent_tree_canopy for the tract.

In [94]:
# this will hold one dictionary per tract, each containing GEOID and % tree canopy (later converted in df for csv export)
results = []

assert(gdf_tracts_nyc.crs == src_land_cover.crs)

# loop over each census tract to calculate % tree canopy
for idx, row in gdf_tracts_nyc.iterrows():

    see what the max size is before there's a memory allocation issue, then create a variable, max_mem. divide up the raster window accordingly
    for each window, keep a running count of tree pixels and normal pixels, perform ratio calculation at the end. To avoid refactoring code below,
    any tract is less than max_mem can still be treated as multiple windows, but just a list with one i.e. [window1] so the iteration
    logice still applies wether its one windowed_window or 7. 

    # each row's geometry is the tract polygon. we cast to a list because that is the input expected for the mask function below
    geom = [row.geometry]

    # here we are taking a rectangular window out of the total raster, specifically one that contains the tract (but not the exact tract shape)
    # this allows us to avoid memory issues by using only relevant section of the raster for the masking function below
    try:
        raster_window = geometry_window(src_land_cover, geom)

    # catch instance where tract is outside of raster
    except ValueError:
        continue

    # read only raster pixels within the window
    window_pixels = src_land_cover.read(1, window=raster_window)

    # get correct spatial transform for this window
    window_transform = src_land_cover.window_transform(raster_window)

    # mask: determine which pixels from the window are inside of the tract bounds
    tract_mask = geometry_mask(
        geom,
        transform=window_transform,
        invert=True,                   # pixels inside of the tract are True 
        out_shape=window_pixels.shape  # ensure alignment with raster window
    )

    # determine total number of pixels for this tract
    tract_pixels = window_pixels[tract_mask]

    # remove any pixels with no data (e.g. outside raster coverage)
    tract_pixels = tract_pixels[tract_pixels != src_land_cover.nodata] 

    # edge case where no useable pixels (e.g. raster has coverage gaps or tract is too small)
    if tract_pixels.size == 0:
        percent_tree_canopy = np.nan

    else:
        # count tree pixels (how many pixels have value == tree canopy class(es), defined in the raster vat table)
        tree_canopy_pixels = np.isin(tract_pixels, tree_classes).sum()

        # calculate percent of pixels that are of class 'Tree Canopy' in this tract
        percent_tree_canopy = tree_canopy_pixels / tract_pixels.size

    # store result
    results.append({
        'GEOID': row['GEOID'],
        'percent_tree_canopy': percent_tree_canopy
    })  

MemoryError: Unable to allocate 2.43 GiB for an array with shape (40624, 64260) and data type bool

In [52]:
# convert results from above into df for quick inspection and export to csv
df_tree_canopy = pd.DataFrame(results)

df_tree_canopy.describe()

Unnamed: 0,percent_tree_canopy
count,2327.0
mean,0.002674
std,0.02506
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,0.423186


### 4. Save Data

In [55]:
# join tree canopy to gdf tracts for visual inspection
gdf_tracts_nyc = gdf_tracts_nyc.merge(df_tree_canopy, how='left', on='GEOID')

gdf_tracts_nyc.head()

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,GEOID,GEOIDFQ,NAME,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,INTPTLAT,INTPTLON,geometry,percent_tree_canopy
0,36,85,24402,36085024402,1400000US36085024402,244.02,Census Tract 244.02,G5020,S,1823028,2065530,40.4997874,-74.2384712,"MULTIPOLYGON (((1811088.615 2151079.323, 18110...",0.031706
1,36,85,27705,36085027705,1400000US36085027705,277.05,Census Tract 277.05,G5020,S,531529,0,40.5882479,-74.156982,"MULTIPOLYGON (((1816726.263 2162309.589, 18168...",0.0
2,36,85,12806,36085012806,1400000US36085012806,128.06,Census Tract 128.06,G5020,S,1319470,580167,40.557671,-74.1076715,"MULTIPOLYGON (((1820897.393 2160306.067, 18209...",0.0
3,36,47,24400,36047024400,1400000US36047024400,244.0,Census Tract 244,G5020,S,155278,0,40.6217475,-73.9862364,"MULTIPOLYGON (((1829949.661 2169680.562, 18299...",0.0
4,36,47,23000,36047023000,1400000US36047023000,230.0,Census Tract 230,G5020,S,150941,0,40.637816,-73.9842809,"MULTIPOLYGON (((1829552.952 2171725.836, 18297...",0.0


In [56]:
# save tree canopy percents to nyc tracts gpkg as new layer (mostly for visual inspection of calculation)
gdf_tracts_nyc.to_file(path_nyc_tracts, layer=output_gpkg_layer)

# save model feature to csv
df_tree_canopy.to_csv(path_output_model_features_csv, index=False)