# 09 Calculate Tree Canopy

**Project:** NORI  
**Author:** Yuseof J  
**Date:** 14/12/25

### **Purpose**
Using the NLCD land cover raster, calculate the proportion of pixels in each NYC census tract that belong to a tree (forest) class, as an estimate of how much of the tract is covered by tree canopy.

### **Inputs**
- `data/processed/nyc_tracts.gpkg`
- `data/raw/land_cover/Annual_NLCD_LndCov_2024_CU_C1V1.tif`

### **Outputs**
- `data/processed/model_features_canopy.csv`
- `data/processed/nyc_tracts.gpkg` (layer = `tract_canopy`)
--------------------------------------------------------------------------

### 0. Imports and Setup

In [45]:
# package imports
import os
import rasterio
from rasterio.mask import mask
import numpy as np
import pandas as pd
import geopandas as gpd
from pathlib import Path

# specify filepaths
path_nyc_tracts = 'data/processed/nyc_tracts.gpkg'
path_land_cover_raster = 'data/raw/land_cover/Annual_NLCD_LndCov_2024_CU_C1V1.tif'
path_output_model_features_csv = 'data/processed/model_features_canopy.csv'
output_gpkg_layer = 'tract_canopy'


# NLCD codes treated as tree pixels: 41 = deciduous forest, 42 = evergreen forest, 43 = mixed forest
tree_classes = [41, 42, 43]

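# ensure cwd is project root for file paths to function properly
project_root = Path.cwd()                   # get current directory
while not (project_root / "data").exists(): # keep moving up until a "data" folder is found
    if project_root == project_root.parent: # reached the filesystem root without finding it
        raise FileNotFoundError('could not find a "data" directory in any parent of the cwd')
    project_root = project_root.parent
os.chdir(project_root)                      # switch to project root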

### 1. Load Data

In [46]:
# nyc tracts
gdf_tracts_nyc = gpd.read_file(path_nyc_tracts, layer="tracts")

# land cover raster
src_land_cover = rasterio.open(path_land_cover_raster)

Land cover raster metadata

In [47]:
src_land_cover.crs

CRS.from_wkt('PROJCS["AEA        WGS84",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",23],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",29.5],PARAMETER["standard_parallel_2",45.5],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')

In [48]:
src_land_cover.transform

Affine(30.0, 0.0, -2415585.0,
       0.0, -30.0, 3314805.0)

In [49]:
src_land_cover.res

(30.0, 30.0)
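
The transform and resolution above fix the raster's geometry: the CRS unit is metres, so each pixel is 30 m on a side and covers 900 m². As a minimal sketch in plain Python, the affine coefficients can be applied by hand to locate a pixel's upper-left corner and to estimate how many pixels a tract of a given area would hold. The helper name `pixel_upper_left` and the example tract area are illustrative assumptions, not part of the notebook.

```python
# Affine coefficients copied from src_land_cover.transform above:
#   x = a * col + b * row + c
#   y = d * col + e * row + f
a, b, c = 30.0, 0.0, -2415585.0
d, e, f = 0.0, -30.0, 3314805.0


def pixel_upper_left(row, col):
    """Return the (x, y) map coordinates of a pixel's upper-left corner."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# pixel (0, 0) sits at the raster origin
origin = pixel_upper_left(0, 0)

# one row down and one column right shifts by one pixel along each axis
next_diagonal = pixel_upper_left(1, 1)

# area of one pixel in square metres (valid because the rotation terms b and d are zero)
pixel_area_m2 = abs(a * e)

# hypothetical tract area in square metres, used only to illustrate the pixel budget
example_tract_area_m2 = 150_000
approx_pixels = example_tract_area_m2 / pixel_area_m2
```

A tract of that size holds only about 170 pixels, so a single pixel changes its proportion by roughly 0.6 percentage points. This is worth keeping in mind when reading values for small tracts.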

### 2. Feature Engineering

In [50]:
# reproject nyc tracts to the raster crs (essential for accurate calculations below)
gdf_tracts_nyc = gdf_tracts_nyc.to_crs(src_land_cover.crs)

Calculate % Tree Canopy per Tract

> The code below takes the categorical land cover raster (each pixel holds a class code, e.g. a forest class) and calculates, for each census tract, the proportion of pixels that are tree pixels. NOTE: this is not the most efficient method, but for this sprint it is easy to follow and sufficient given the relatively small number of tracts.

In [51]:
# this will hold one dictionary per tract, each containing the GEOID and the tree canopy proportion (0-1)
results = []

assert gdf_tracts_nyc.crs == src_land_cover.crs

# loop over each census tract to calculate % tree canopy
for idx, row in gdf_tracts_nyc.iterrows():

    # each row's geometry is the tract polygon. we wrap it in a list because mask() expects an iterable of geometries
    geom = [row.geometry]

    # here we take only the portion of the raster that falls within the given tract
    try:
        out_image, out_transform = mask(
            src_land_cover,
            geom,
            crop=True, # crop the pixels to the tract's bounds
            nodata=src_land_cover.nodata # pixels outside of the tract polygon set to nodata
        )

    # catch instance where there is no intersection between the raster and a given tract polygon
    except ValueError:
        continue

    raster_tract_intersect = out_image[0]

    # determine total number of pixels for this tract
    valid_pixels = raster_tract_intersect[raster_tract_intersect != src_land_cover.nodata] # only pixels within tract polygon are counted
    total_pixels = valid_pixels.size

    # edge case: the tract is too small to contain any pixel centers at this resolution
    if total_pixels == 0:
        percent_tree_canopy = np.nan

    else:
        # count tree pixels (how many pixels have value 41, 42, or 43 - tree classes)
        tree_pixels = np.isin(valid_pixels, tree_classes).sum()

        # calculate the proportion (0-1) of pixels that are trees in this tract
        percent_tree_canopy = tree_pixels / total_pixels

    # store result
    results.append({
        'GEOID': row['GEOID'],
        'percent_tree_canopy': percent_tree_canopy
    })  
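
To make the counting step concrete, here is a small self-contained sketch of what the loop does for a single tract. The 3-by-4 array stands in for `out_image[0]`, and the nodata value of 250 is an assumption chosen for illustration (the notebook reads the real value from `src_land_cover.nodata`).

```python
import numpy as np

NODATA = 250                  # assumed nodata value, for illustration only
tree_classes = [41, 42, 43]   # NLCD forest classes used in this notebook

# toy stand-in for out_image[0]: pixels outside the tract polygon are NODATA
toy_tract = np.array([
    [250,  41,  24, 250],
    [ 23,  42,  24,  11],
    [250,  43,  21, 250],
], dtype=np.uint8)

# keep only the pixels that fall inside the polygon
valid_pixels = toy_tract[toy_tract != NODATA]
total_pixels = valid_pixels.size

# count pixels whose class code is one of the tree classes
tree_pixels = int(np.isin(valid_pixels, tree_classes).sum())

# proportion in the range 0-1 (not a percentage)
proportion_tree = tree_pixels / total_pixels
print(total_pixels, tree_pixels, proportion_tree)
```

In this toy case 8 of the 12 pixels are inside the polygon and 3 of those are forest, giving a proportion of 0.375.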

In [52]:
# convert results from above into df for quick inspection and export to csv
df_tree_canopy = pd.DataFrame(results)

df_tree_canopy.describe()

| | percent_tree_canopy |
|---|---|
| count | 2327.0 |
| mean | 0.002674 |
| std | 0.02506 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 0.0 |
| 75% | 0.0 |
| max | 0.423186 |
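
The note earlier in this section mentions that the per-tract loop is not the most efficient approach. One possible alternative, sketched here under assumptions, is to burn a tract index into a raster aligned with the land cover grid (for instance with `rasterio.features.rasterize`) and then count pixels for all tracts at once with `np.bincount`. The arrays below are toy stand-ins: `zone_ids`, `land_cover`, and the convention that 0 means "outside every tract" are hypothetical and not taken from the notebook.

```python
import numpy as np

tree_classes = [41, 42, 43]

# toy zone raster: each pixel holds a tract index (0 = outside every tract)
zone_ids = np.array([
    [0, 1, 1, 2],
    [1, 1, 2, 2],
    [0, 3, 3, 2],
])

# toy land cover raster on the same grid
land_cover = np.array([
    [11, 41, 24, 23],
    [42, 24, 23, 23],
    [11, 43, 41, 22],
])

n_zones = int(zone_ids.max()) + 1
is_tree = np.isin(land_cover, tree_classes)

# total pixels per zone, and tree pixels per zone, each in a single pass
total_per_zone = np.bincount(zone_ids.ravel(), minlength=n_zones)
tree_per_zone = np.bincount(
    zone_ids.ravel(),
    weights=is_tree.ravel().astype(float),
    minlength=n_zones,
)

# proportion per zone; zones without pixels stay NaN instead of dividing by zero
proportion = np.full(n_zones, np.nan)
np.divide(tree_per_zone, total_per_zone, out=proportion, where=total_per_zone > 0)

# drop index 0, the "outside every tract" background
proportion_by_tract = proportion[1:]
```

Because each pixel is assigned to exactly one tract in this approach, results near shared tract boundaries may differ slightly from the per-tract masking loop.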


### 3. Save Data

In [55]:
# join tree canopy to gdf tracts for visual inspection
gdf_tracts_nyc = gdf_tracts_nyc.merge(df_tree_canopy, how='left', on='GEOID')

gdf_tracts_nyc.head()

| | STATEFP | COUNTYFP | TRACTCE | GEOID | GEOIDFQ | NAME | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry | percent_tree_canopy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 85 | 24402 | 36085024402 | 1400000US36085024402 | 244.02 | Census Tract 244.02 | G5020 | S | 1823028 | 2065530 | 40.4997874 | -74.2384712 | MULTIPOLYGON (((1811088.615 2151079.323, 18110... | 0.031706 |
| 1 | 36 | 85 | 27705 | 36085027705 | 1400000US36085027705 | 277.05 | Census Tract 277.05 | G5020 | S | 531529 | 0 | 40.5882479 | -74.156982 | MULTIPOLYGON (((1816726.263 2162309.589, 18168... | 0.0 |
| 2 | 36 | 85 | 12806 | 36085012806 | 1400000US36085012806 | 128.06 | Census Tract 128.06 | G5020 | S | 1319470 | 580167 | 40.557671 | -74.1076715 | MULTIPOLYGON (((1820897.393 2160306.067, 18209... | 0.0 |
| 3 | 36 | 47 | 24400 | 36047024400 | 1400000US36047024400 | 244.0 | Census Tract 244 | G5020 | S | 155278 | 0 | 40.6217475 | -73.9862364 | MULTIPOLYGON (((1829949.661 2169680.562, 18299... | 0.0 |
| 4 | 36 | 47 | 23000 | 36047023000 | 1400000US36047023000 | 230.0 | Census Tract 230 | G5020 | S | 150941 | 0 | 40.637816 | -73.9842809 | MULTIPOLYGON (((1829552.952 2171725.836, 18297... | 0.0 |


In [56]:
# save tree canopy percents to nyc tracts gpkg as new layer (mostly for visual inspection of calculation)
gdf_tracts_nyc.to_file(path_nyc_tracts, layer=output_gpkg_layer)

# save model feature to csv
df_tree_canopy.to_csv(path_output_model_features_csv, index=False)