# LiDAR Products Generation Notebook

## Overview
This notebook processes USGS 3DEP (3D Elevation Program) LiDAR data to generate high-resolution elevation products for meadow assessment. The workflow downloads, filters, and processes point cloud data to create Digital Terrain Models (DTM), Digital Surface Models (DSM), and Canopy Height Models (CHM) at 0.6-meter resolution.

## Methodology
The processing pipeline includes:
1. **Data Discovery**: Query USGS 3DEP catalog for intersecting LiDAR datasets
2. **Point Cloud Processing**: Download and filter LiDAR points using PDAL
3. **Surface Generation**: Create DTM (bare earth) and DSM (first surface) models
4. **Gap Filling**: Use Inverse Distance Weighting (IDW) to fill data gaps
5. **Clipping**: Extract products within the meadow boundary
6. **Canopy Analysis**: Generate CHM for vegetation height assessment

## Data Products
- **DTM (Digital Terrain Model)**: Bare earth elevation (ground surface)
- **DSM (Digital Surface Model)**: First surface elevation (includes vegetation/structures)
- **CHM (Canopy Height Model)**: Vegetation height (DSM - DTM)
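
The CHM definition above amounts to a per-pixel subtraction with negative differences clamped to zero. A minimal NumPy sketch, assuming both rasters are already aligned arrays that share a NoData value of `-9999` (the function name and the NoData value are illustrative and are not taken from `src.lidar_products`):

```python
import numpy as np


def canopy_height(dsm, dtm, nodata=-9999.0):
    """Compute CHM = DSM - DTM, clamping negatives to 0 and preserving NoData."""
    dsm = np.asarray(dsm, dtype="float64")
    dtm = np.asarray(dtm, dtype="float64")

    # Only subtract where both surfaces hold real elevations
    valid = (dsm != nodata) & (dtm != nodata)

    chm = np.full(dsm.shape, nodata)
    # Negative heights are below-ground artifacts, so clamp them to zero
    chm[valid] = np.clip(dsm[valid] - dtm[valid], 0, None)
    return chm


dsm = [[105.0, 100.0], [-9999.0, 99.0]]
dtm = [[100.0, 100.0], [100.0, 100.0]]
chm = canopy_height(dsm, dtm)
```

Here the pixel with a missing DSM value stays NoData, and the pixel where the DSM falls below the DTM is reported as zero height.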

## Requirements
- Meadow boundary polygon (`meadow_extent.geojson`)
- Internet connection for USGS 3DEP data access
- PDAL library for point cloud processing

## Processing Parameters
- **Resolution**: 0.6 meters
- **Coordinate System**: NAD83(2011) / UTM zone 10N (EPSG:6339)
- **Point Classification**: Filters noise and retains ground/vegetation points
- **Gap Filling**: IDW interpolation within meadow boundaries only

---

## Environment Setup
Configure Python environment and import required libraries for LiDAR processing.

### Library Imports
Import all required libraries for geospatial processing, point cloud analysis, and data visualization.

## Study Site Configuration
Set the study area name and configure file paths for processing.

## USGS 3DEP Dataset Discovery
Query the USGS 3D Elevation Program catalog to find LiDAR datasets that intersect with the study area.
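
The discovery step projects the boundary to Web Mercator, buffers it by 10 m, and looks for overlapping dataset footprints. The sketch below illustrates that idea with the standard library only. It is a simplification, not the logic of `lp.usgs_3dep_datasets`: it compares bounding boxes rather than true polygons, and every function name in it is hypothetical.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857, in meters


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator projection (degrees in, meters out)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def buffered_bbox(points_lonlat, buffer_m=10.0):
    """Bounding box of the projected points, grown by buffer_m on every side."""
    xs, ys = zip(*(lonlat_to_web_mercator(lon, lat) for lon, lat in points_lonlat))
    return (min(xs) - buffer_m, min(ys) - buffer_m,
            max(xs) + buffer_m, max(ys) + buffer_m)


def bboxes_intersect(a, b):
    """True when two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Illustrative coordinates only: a small AOI and a dataset footprint around it
aoi = buffered_bbox([(-120.15, 38.90), (-120.14, 38.91)])
footprint = buffered_bbox([(-120.174, 38.833), (-120.116, 38.972)], buffer_m=0.0)
overlaps = bboxes_intersect(aoi, footprint)
```

One caveat worth keeping in mind: Web Mercator stretches distances by about 1/cos(latitude), so a 10 m offset in EPSG:3857 units covers roughly 7.8 m on the ground at 38.8°N.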

## PDAL Pipeline Construction
Create a processing pipeline for downloading and filtering LiDAR point cloud data.
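
For orientation, a pipeline of this kind might be assembled as a plain dictionary like the one below. This is a sketch under assumptions, not the contents of `lp.pdal_pipeline`: the stage order, the outlier settings, and the `max`/`min` raster output types are guesses chosen for illustration, and `build_pipeline` is a hypothetical name. The stage types themselves (`readers.ept`, `filters.merge`, `filters.outlier`, `filters.range`, `filters.reprojection`, `filters.pmf`, `writers.gdal`) are standard PDAL stages.

```python
import json

# URL pattern matching the EPT endpoint reported by the discovery step
EPT_URL = "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/{name}/ept.json"


def build_pipeline(aoi_wkt, dataset_names, resolution=0.6, out_srs="EPSG:6339"):
    """Return a PDAL pipeline dict (hypothetical stand-in for lp.pdal_pipeline)."""
    # One EPT reader per dataset, each clipped to the AOI polygon (EPSG:3857 WKT)
    stages = [
        {"type": "readers.ept", "filename": EPT_URL.format(name=name), "polygon": aoi_wkt}
        for name in dataset_names
    ]
    stages += [
        {"type": "filters.merge"},
        # Flag statistical outliers as noise (class 7), then drop class 7
        {"type": "filters.outlier", "method": "statistical", "mean_k": 12, "multiplier": 2.2},
        {"type": "filters.range", "limits": "Classification![7:7]"},
        {"type": "filters.reprojection", "out_srs": out_srs},
        # DSM: highest return per cell (assumed output type)
        {"type": "writers.gdal", "filename": "dsm_temp.tif", "gdaldriver": "GTiff",
         "resolution": resolution, "output_type": "max"},
        # Classify ground with the Progressive Morphological Filter, keep class 2
        {"type": "filters.pmf"},
        {"type": "filters.range", "limits": "Classification[2:2]"},
        # DTM: lowest ground return per cell (assumed output type)
        {"type": "writers.gdal", "filename": "dtm_temp.tif", "gdaldriver": "GTiff",
         "resolution": resolution, "output_type": "min"},
    ]
    return {"pipeline": stages}


example = build_pipeline(
    "POLYGON((0 0, 1 0, 1 1, 0 0))",  # placeholder WKT, not the real AOI
    ["USGS_LPC_CA_NoCAL_Wildfires_B1_2018"],
)
pipeline_json = json.dumps(example)
```

The resulting JSON string is the form that `pdal.Pipeline(...)` accepts. Writing the DSM before the ground filter lets a single pass over the points produce both rasters.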

## Point Cloud Processing Execution
Execute the PDAL pipeline to download, filter, and process LiDAR data. This may take several minutes depending on the dataset size.

In [1]:
# Add parent directory to Python path to access custom modules
import sys
sys.path.append("..")

In [2]:
# Raster processing and coordinate transformation
from rasterio.warp import calculate_default_transform, reproject, Resampling
import rasterio as rio

# Geospatial libraries
from shapely import BufferCapStyle, BufferJoinStyle, buffer
from shapely.geometry import shape, Point, Polygon
from shapely.ops import transform
from osgeo import gdal
import geopandas as gpd

# Point cloud processing
import pdal

# Data manipulation and analysis
import pandas as pd
import numpy as np

# Visualization and UI
import matplotlib.pyplot as plt
import ipywidgets as widgets

# File system and web requests (sys is already imported in the first cell)
from pathlib import Path
import requests
import json
import os

# Custom LiDAR processing functions
import src.lidar_products as lp

In [4]:
# Study area identifier - change this to match your meadow site
# Available options: "Lacey", "Humbug", "subb"
name = "Lacey"
#name = "Humbug" 
#name = "subb"

print(f"Processing lidar data for: {name} meadow")

Processing lidar data for: Lacey meadow


In [5]:
# Create output directory for the selected study site
# All LiDAR products will be saved to ../data/{name}/
# Resolve to an absolute path first so it stays valid after the directory change
OUTPUT_DIR = Path(f"../data/{name}").resolve()
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Change working directory to the output folder
os.chdir(OUTPUT_DIR)
print(f"Working directory set to: {os.getcwd()}")
print(f"Output files will be saved to: {OUTPUT_DIR}")

Working directory set to: /media/grendel/7db216a7-836f-4e8d-b439-e4f999cedb23/USGS/meadow_assessment/data/Lacey
Output files will be saved to: /media/grendel/7db216a7-836f-4e8d-b439-e4f999cedb23/USGS/meadow_assessment/data/Lacey


In [6]:
# Define the meadow boundary file path
# This GeoJSON file must exist in the meadow folder and defines the area of interest
shapefile_path = "meadow_extent.geojson"

# Verify the boundary file exists
if os.path.exists(shapefile_path):
    print(f"✓ Found meadow boundary file: {shapefile_path}")
else:
    print(f"✗ ERROR: Meadow boundary file not found at {shapefile_path}")
    print("Please ensure meadow_extent.geojson exists in the data folder.")

✓ Found meadow boundary file: meadow_extent.geojson


In [None]:
# Discover USGS 3DEP datasets that intersect with the meadow boundary
# This function:
# 1. Downloads the 3DEP catalog if not present locally
# 2. Projects the meadow boundary to Web Mercator (EPSG:3857) 
# 3. Buffers the boundary by 10m to ensure complete coverage
# 4. Finds all intersecting LiDAR datasets
print("Searching for intersecting USGS 3DEP LiDAR datasets...")
intersecting_polys, AOI_EPSG3857_wkt = lp.usgs_3dep_datasets(shapefile_path)
#print(f"Found {len(intersecting_polys)} intersecting dataset(s)")

Searching for intersecting USGS 3DEP LiDAR datasets...
Vector file loaded.
resources.geojson exists in the data folder.
3DEP polygons loaded and projected to Web Mercator (EPSG:3857)
AOI buffered by 10 meters
[('USGS_LPC_CA_NoCAL_Wildfires_B1_2018', <MULTIPOLYGON (((-120.174 38.833, -120.116 38.863, -120.116 38.972, -120.134...>, <MULTIPOLYGON (((-13377749 4697773.7, -13371249 4702103.8, -13371249 4717692...>, 'https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_NoCAL_Wildfires_B1_2018/ept.json', np.int64(86376910091))]
Found 1 intersecting dataset(s)

In [8]:
# Extract dataset names for PDAL pipeline construction
usgs_3dep_datasets = []

print("Available LiDAR datasets:")
for i, poly in enumerate(intersecting_polys):
    dataset_name = poly[0]
    point_count = poly[4]  # point count for the entire dataset, not just the meadow AOI
    usgs_3dep_datasets.append(dataset_name)
    print(f"  {i+1}. {dataset_name} ({point_count:,} points estimated)")

print(f"\nTotal datasets to process: {len(usgs_3dep_datasets)}")

Available LiDAR datasets:
  1. USGS_LPC_CA_NoCAL_Wildfires_B1_2018 (86,376,910,091 points estimated)

Total datasets to process: 1


In [9]:
# Build PDAL pipeline for point cloud processing
# The pipeline includes:
# - Data readers for each USGS 3DEP dataset
# - Outlier and noise filtering
# - Point classification filtering  
# - Coordinate system reprojection to EPSG:6339
# - Ground point classification using Progressive Morphological Filter
# - Raster generation (DSM and DTM) at 0.6m resolution
print("Constructing PDAL processing pipeline...")
p_pipeline = lp.pdal_pipeline(AOI_EPSG3857_wkt, usgs_3dep_datasets)
print("✓ Pipeline configured with filtering and raster generation stages")

Constructing PDAL processing pipeline...
✓ Pipeline configured with filtering and raster generation stages


In [10]:
# Initialize PDAL pipeline object
# Convert the pipeline dictionary to JSON format for PDAL execution
pl = pdal.Pipeline(json.dumps(p_pipeline))
print("✓ PDAL pipeline initialized and ready for execution")

✓ PDAL pipeline initialized and ready for execution


In [12]:
%%time
# Execute the PDAL pipeline
# This will:
# 1. Download LiDAR points from AWS S3 (USGS 3DEP)
# 2. Apply filters to remove noise and outliers
# 3. Classify ground points using Progressive Morphological Filter
# 4. Generate temporary DSM and DTM rasters
# 
# Processing time varies by dataset size:
# - Small areas (< 10M points): 2-5 minutes
# - Medium areas (10-30M points): 5-15 minutes  
# - Large areas (> 30M points): 15+ minutes

print("Starting LiDAR processing...")
print("This may take several minutes depending on dataset size...")

pl.execute()

print("✓ PDAL pipeline execution completed")
print("Generated temporary files: dtm_temp.tif, dsm_temp.tif")

Starting LiDAR processing...
This may take several minutes depending on dataset size...
✓ PDAL pipeline execution completed
Generated temporary files: dtm_temp.tif, dsm_temp.tif
CPU times: user 4min 25s, sys: 7.42 s, total: 4min 33s
Wall time: 4min 11s


**Processing Time Reference:**
- Lacey: 2m47s, 7,088,795 points  
- Humbug: 15m43s, 25,865,590 points

*Processing time scales roughly with point count and internet connection speed.*
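
To make "roughly" concrete, the two reference runs above can be converted to throughput. This is simple arithmetic on the figures listed, with no new measurements:

```python
# (points, seconds) taken from the reference timings above
runs = {
    "Lacey": (7_088_795, 2 * 60 + 47),     # 2m47s
    "Humbug": (25_865_590, 15 * 60 + 43),  # 15m43s
}


def throughput(points, seconds):
    """Average points processed per second of wall time."""
    return points / seconds


for site, (points, seconds) in runs.items():
    print(f"{site}: {throughput(points, seconds):,.0f} points/s")
```

The two sites come out at about 42,448 and 27,429 points per second, so the larger run was noticeably slower per point. Treat the scaling as a rough guide rather than a linear rule.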

## Gap Filling with Inverse Distance Weighting
Fill data gaps (NoData pixels) within the meadow boundary using spatial interpolation.
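
As an illustration of the technique, the sketch below fills each NoData cell with an inverse-distance-weighted average of the valid cells in a small surrounding window. It is a simplified stand-in, not the implementation of `lp.fill_nodata`: the function name, the NoData value, the window radius, and the power of 2 are all assumptions. The optional `mask` argument mimics restricting the fill to cells inside the meadow boundary.

```python
import numpy as np


def idw_fill(grid, nodata=-9999.0, power=2.0, radius=3, mask=None):
    """Fill NoData cells with an IDW average of valid neighbors within `radius` cells."""
    grid = np.asarray(grid, dtype="float64")
    filled = grid.copy()
    rows, cols = grid.shape

    # Cells to fill: NoData, optionally limited to a boolean mask (e.g. inside the meadow)
    targets = grid == nodata
    if mask is not None:
        targets &= np.asarray(mask, dtype=bool)

    for r, c in zip(*np.where(targets)):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        window = grid[r0:r1, c0:c1]
        rr, cc = np.mgrid[r0:r1, c0:c1]

        valid = window != nodata
        if not valid.any():
            continue  # no data nearby, leave the gap unfilled

        # The target cell itself is NoData, so every distance here is > 0
        dist = np.hypot(rr[valid] - r, cc[valid] - c)
        weights = 1.0 / dist ** power
        filled[r, c] = np.sum(weights * window[valid]) / np.sum(weights)
    return filled


gap = [[5.0, 5.0, 5.0], [5.0, -9999.0, 5.0], [5.0, 5.0, 5.0]]
result = idw_fill(gap)
```

Interpolation reads only from the original values, so a filled cell never influences its neighbors' estimates.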

## Final Product Generation
Clip rasters to the exact meadow boundary and generate the Canopy Height Model.

In [None]:
# Fill NoData gaps in the Digital Terrain Model (DTM)
# Uses Inverse Distance Weighting (IDW) interpolation
# Only fills gaps within the meadow boundary to preserve data integrity
print("Filling gaps in DTM using IDW interpolation...")
result_dtm = lp.fill_nodata('dtm_temp.tif', shapefile_path)
#print(f"✓ {result_dtm}")

Filling gaps in DTM using IDW interpolation...
✓ Filled dtm_temp.tif and saved to dtm_temp_filled.tif


In [None]:
# Fill NoData gaps in the Digital Surface Model (DSM)
# Uses the same IDW interpolation method as DTM
print("Filling gaps in DSM using IDW interpolation...")
result_dsm = lp.fill_nodata('dsm_temp.tif', shapefile_path)
print(f"✓ {result_dsm}")

Filling gaps in DSM using IDW interpolation...


In [15]:
# Load meadow boundary and reproject to match raster coordinate system
# EPSG:6339 = NAD83(2011) / UTM zone 10N (covers this Northern California study area)
print("Loading meadow boundary and reprojecting to EPSG:6339...")
gdf = gpd.read_file(shapefile_path)
gdf = gdf.to_crs(6339)
print(f"✓ Boundary loaded with {len(gdf)} feature(s)")

Loading meadow boundary and reprojecting to EPSG:6339...
✓ Boundary loaded with 1 feature(s)


In [16]:
# Clip the filled DTM to the exact meadow boundary
# This creates the final DTM product: dtm_clipped.tif
print("Clipping DTM to meadow boundary...")
result_dtm_clip = lp.clip_and_rename_raster('dtm_temp_filled.tif', gdf)
print(f"✓ {result_dtm_clip}")

Clipping DTM to meadow boundary...
✓ Saved as: dtm_clipped.tif


In [17]:
# Clip the filled DSM to the exact meadow boundary  
# This creates the final DSM product: dsm_clipped.tif
print("Clipping DSM to meadow boundary...")
result_dsm_clip = lp.clip_and_rename_raster('dsm_temp_filled.tif', gdf)
print(f"✓ {result_dsm_clip}")

Clipping DSM to meadow boundary...
✓ Saved as: dsm_clipped.tif


In [18]:
# Generate Canopy Height Model (CHM) by subtracting DTM from DSM
# CHM = DSM - DTM (vegetation height above ground)
# Negative values are set to 0 (below-ground artifacts)
print("Generating Canopy Height Model (CHM)...")
result_chm = lp.chm('dsm_clipped.tif', 'dtm_clipped.tif', gdf)
print(f"✓ {result_chm}")
print("CHM represents vegetation height above ground surface")

Generating Canopy Height Model (CHM)...
✓ Saved as: chm_clipped.tif
CHM represents vegetation height above ground surface


## File Cleanup and Naming
Remove temporary files and rename final products with the meadow name for easy identification.

## Processing Summary
Complete the workflow and return to the parent directory.

In [19]:
# Get list of all TIFF files in the working directory
import glob

print("Scanning for TIFF files to clean up...")
tif_files = glob.glob(os.path.join(os.getcwd(), "*.tif"))
print(f"Found {len(tif_files)} TIFF files")

Scanning for TIFF files to clean up...
Found 7 TIFF files


In [20]:
# Clean up temporary files and rename final products
print("Cleaning up temporary files and renaming final products...")

files_deleted = 0
files_renamed = 0

for file_path in tif_files:
    filename = os.path.basename(file_path)

    # Delete temporary/intermediate files (keep only _clipped.tif files)
    if "_clipped.tif" not in filename:
        os.remove(file_path)
        print(f"  Deleted: {filename}")
        files_deleted += 1
        continue

    # Rename final products to include meadow name
    # e.g., "dtm_clipped.tif" becomes "dtm_Lacey.tif"
    if filename.endswith("_clipped.tif"):
        new_filename = filename.replace("_clipped.tif", f"_{name}.tif")
        new_path = os.path.join(os.getcwd(), new_filename)
        os.rename(file_path, new_path)
        print(f"  Renamed: {filename} → {new_filename}")
        files_renamed += 1

print(f"\n✓ Cleanup complete: {files_deleted} files deleted, {files_renamed} files renamed")

Cleaning up temporary files and renaming final products...
  Deleted: dsm_temp.tif
  Renamed: dtm_clipped.tif → dtm_Lacey.tif
  Renamed: dsm_clipped.tif → dsm_Lacey.tif
  Deleted: dtm_temp_filled.tif
  Deleted: dsm_temp_filled.tif
  Renamed: chm_clipped.tif → chm_Lacey.tif
  Deleted: dtm_temp.tif

✓ Cleanup complete: 4 files deleted, 3 files renamed


In [22]:
# Return to the project root (two levels up from data/{name})
os.chdir("../..")
current_dir = os.getcwd()
print(f"✓ Returned to: {current_dir}")

# Summarize the final products created
output_dir = Path(f"data/{name}")
final_products = [
    f"dtm_{name}.tif",      # Digital Terrain Model (bare earth)
    f"dsm_{name}.tif",      # Digital Surface Model (first surface)
    f"chm_{name}.tif"       # Canopy Height Model (vegetation height)
]

print("\n🎉 LiDAR processing completed successfully!")
print(f"Final products saved to: {output_dir.absolute()}")
print("\nGenerated files:")
for product in final_products:
    # Build a Path so that .exists() is available (a plain str has no such method)
    product_path = output_dir / product
    if product_path.exists():
        print(f"  ✓ {product}")
    else:
        print(f"  ✗ {product} (not found)")

print("\nThese files can now be used for meadow assessment and stream analysis.")

✓ Returned to: /media/grendel/7db216a7-836f-4e8d-b439-e4f999cedb23/USGS/meadow_assessment

🎉 LiDAR processing completed successfully!
Final products saved to: /media/grendel/7db216a7-836f-4e8d-b439-e4f999cedb23/USGS/meadow_assessment/data/Lacey

Generated files: