# Generate DSM from USGS 3D Elevation Program (3DEP) lidar data corresponding to Worldview Mount Baker imagery to train DeepDEM

This notebook is adapted from a notebook in the OpenTopography GitHub repository, available [here](https://github.com/OpenTopography/OT_3DEP_Workflows/blob/main/notebooks/01_3DEP_Generate_DEM_User_AOI.ipynb).

## Purpose

This notebook enables users to download USGS lidar data from the public Amazon Web Services (AWS) S3 bucket and process it for use as training data for DeepDEM.

#### Specific features of this notebook

1. Read the bounds of the Worldview imagery over which we would like to generate a DSM.

2. Send an API request to the <a href="https://registry.opendata.aws/usgs-lidar/"> Amazon Web Services (AWS) EPT (Entwine Point Tile) S3 bucket</a>, which returns 3DEP point cloud data within the user-defined AOI.

3. Process the returned point cloud using PDAL.

4. Create a Digital Surface Model (DSM) with user-specified resolution, gridding method, and file type.

In [None]:
# Data parsing imports
import json

# GIS imports
import geopandas as gpd
from shapely.geometry import Polygon
import rasterio

# PDAL imports
import pdal

# Misc imports
from pathlib import Path

In [None]:
def build_pdal_pipeline(extent_polygon, usgs_3dep_dataset_names, pc_resolution, filterNoise = False,
                        reclassify = False, savePointCloud = True, outCRS = 3857, pc_outName = 'filter_test', 
                        pc_outType = 'laz'):

    """
    Build a PDAL pipeline for requesting, processing, and saving point cloud data. Each processing step is a 'stage'
    in the final PDAL pipeline. Each stage is appended to the 'pointcloud_pipeline' object to produce the final pipeline.

    Parameters:
        extent_polygon (shapely polygon or WKT str): Polygon for the user-defined AOI in Web Mercator projection (EPSG:3857).
        usgs_3dep_dataset_names (list of str): Names of the 3DEP dataset(s) from which the data will be obtained.
        pc_resolution (float): The desired resolution of the point cloud.
        filterNoise (bool): Option to remove points from USGS Class 7 (Low Noise) and Class 18 (High Noise).
        reclassify (bool): Option to remove USGS classes and run SMRF to classify ground points only. Default is False.
        savePointCloud (bool): Option to save (or not) the point cloud data. If savePointCloud == False,
                               the pc_outName and pc_outType parameters are not used and can be any value.
        outCRS (int): Output coordinate reference system (CRS), specified by EPSG code (e.g., 3857 - Web Mercator).
        pc_outName (str): Desired name of the file on the user's local file system. If savePointCloud == False,
                          pc_outName can be any value.
        pc_outType (str): Desired file extension. Input must be either 'las' or 'laz'. If savePointCloud == False,
                          pc_outType can be any value.

    Returns:
        pointcloud_pipeline (dict): Dictionary of processing stages in sequential order that define the PDAL pipeline.

    Raises:
        Exception: If savePointCloud is True and pc_outType is not 'las' or 'laz'.
    """
    
    #this is the basic pipeline which only accesses the 3DEP data
    readers = []
    for name in usgs_3dep_dataset_names:
        url = "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/{}/ept.json".format(name)
        reader = {
            "type": "readers.ept",
            "filename": str(url),
            "polygon": str(extent_polygon),
            "requests": 3,
            "resolution": pc_resolution
        }
        readers.append(reader)
        
    pointcloud_pipeline = {
            "pipeline":
                readers
    }
    
    if filterNoise == True:
        
        # Class 7 and 18 correspond to "Low Noise" and "High Noise". See USGS document "Lidar Base Specification, Techniques and Methods"
        # These data will be filtered out when processing the point cloud

        filter_stage = {
            "type":"filters.range",
            "limits":"Classification![7:7], Classification![18:18]" 
        }
        
        pointcloud_pipeline['pipeline'].append(filter_stage)
    
    if reclassify == True:
        
        remove_classes_stage = {
            "type":"filters.assign",
            "value":"Classification = 0"
        }
        
        classify_ground_stage = {
            "type":"filters.smrf"
        }
        
        reclass_stage = {
            "type":"filters.range",
            "limits":"Classification[2:2]"
        }

        pointcloud_pipeline['pipeline'].append(remove_classes_stage)
        pointcloud_pipeline['pipeline'].append(classify_ground_stage)
        pointcloud_pipeline['pipeline'].append(reclass_stage)
        
    reprojection_stage = {
        "type":"filters.reprojection",
        "out_srs":"EPSG:{}".format(outCRS)
    }
    
    pointcloud_pipeline['pipeline'].append(reprojection_stage)
    
    if savePointCloud == True:
        
        if pc_outType == 'las':
            savePC_stage = {
                "type": "writers.las",
                "filename": str(pc_outName)+'.'+ str(pc_outType),
            }
        elif pc_outType == 'laz':    
            savePC_stage = {
                "type": "writers.las",
                "compression": "laszip",
                "filename": str(pc_outName)+'.'+ str(pc_outType),
            }
        else:
            raise Exception("pc_outType must be 'las' or 'laz'.")

        pointcloud_pipeline['pipeline'].append(savePC_stage)
        
    return pointcloud_pipeline

def make_DEM_pipeline(extent_polygon, usgs_3dep_dataset_name, pc_resolution, dem_resolution,
                      filterNoise = True, reclassify = False, savePointCloud = False, outCRS = 3857,
                      pc_outName = 'filter_test', pc_outType = 'laz', demType = 'dtm', gridMethod = 'idw', 
                      dem_outName = 'dem_test', driver = "GTiff"):
    
    """
    Build a PDAL pipeline for creating a digital elevation model (DEM) product from the requested point cloud data. The
    user must specify whether a digital terrain (bare earth) model (DTM) or digital surface model (DSM) will be created,
    the output DTM/DSM resolution, and the gridding method desired.

    The `build_pdal_pipeline()` function is used to request the data from the Amazon Web Services EPT bucket, and the
    user may define any processing steps (filtering, reclassifying, reprojecting). The user must also specify whether
    the point cloud should be saved or not. Saving the point cloud is not necessary for the generation of the DEM.

    Parameters:
        extent_polygon (shapely polygon or WKT str): User-defined AOI in Web Mercator projection (EPSG:3857). In this
                                           notebook the AOI is derived from the bounds of the Worldview ortho images.
        usgs_3dep_dataset_name (list of str): Names of the 3DEP dataset(s) from which the data will be obtained. This
                                        list is determined by intersecting the 3DEP boundary polygons with the AOI.
        pc_resolution (float): The desired resolution of the point cloud based on the following definition:

                        Source: https://pdal.io/stages/readers.ept.html#readers-ept
                            A point resolution limit to select, expressed as a grid cell edge length.
                            Units correspond to resource coordinate system units. For example,
                            for a coordinate system expressed in meters, a resolution value of 0.1
                            will select points up to a ground resolution of 100 points per square meter.
                            The resulting resolution may not be exactly this value: the minimum possible
                            resolution that is at least as precise as the requested resolution will be selected.
                            Therefore the result may be a bit more precise than requested.

        dem_resolution (float): Desired grid size (in meters) for the output raster DEM.
        filterNoise (bool): Option to remove points from USGS Class 7 (Low Noise) and Class 18 (High Noise).
        reclassify (bool): Option to remove USGS classes and run SMRF to classify ground points only. Default is False.
        savePointCloud (bool): Option to save (or not) the point cloud data. If savePointCloud == False, the pc_outName
                               and pc_outType parameters are not used and can be any value.
        outCRS (int): Output coordinate reference system (CRS), specified by EPSG code (e.g., 3857 - Web Mercator).
        pc_outName (str): Desired name of the point cloud file on the user's local file system. If savePointCloud == False,
                          pc_outName can be any value.
        pc_outType (str): Desired file extension. Input must be either 'las' or 'laz'. If savePointCloud == False,
                          pc_outType can be any value.
        demType (str): Type of DEM produced. Input must be 'dtm' (digital terrain model) or 'dsm' (digital surface model).
        gridMethod (str): Gridding method used. Options are 'min', 'mean', 'max', 'idw'.
        dem_outName (str): Desired name of the DEM file on the user's local file system.
        driver (str): GDAL driver that sets the output file format. Default is 'GTiff'.

    Returns:
        dem_pipeline (dict): Dictionary of processing stages in sequential order that define the PDAL pipeline.

    Raises:
        Exception: If savePointCloud is True and pc_outType is not 'las' or 'laz'.
        Exception: If demType is not 'dtm' or 'dsm'.

    """
    if demType not in ["dtm", "dsm"]:
        raise Exception("demType must be either 'dsm' or 'dtm'")

    dem_pipeline = build_pdal_pipeline(extent_polygon, usgs_3dep_dataset_name, pc_resolution,
                                              filterNoise, reclassify, savePointCloud, outCRS, pc_outName, pc_outType)

    dem_stage = {
            "type":"writers.gdal",
            "filename":str(dem_outName),
            "gdaldriver":driver,
            "nodata":-9999,
            "output_type":gridMethod,
            "radius":1,
            "resolution":float(dem_resolution),
            "gdalopts":"COMPRESS=LZW,TILED=YES,blockxsize=256,blockysize=256,COPY_SRC_OVERVIEWS=YES"
    }
    
    if demType == 'dtm':
        groundfilter_stage = {
                "type":"filters.range",
                "limits":"Classification[2:2]"
        }

        dem_pipeline['pipeline'].append(groundfilter_stage)
        
    dem_pipeline['pipeline'].append(dem_stage)
    
    return dem_pipeline
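
For orientation, the dictionary that `make_DEM_pipeline()` returns for a DSM with `filterNoise = True`, `reclassify = False`, and `savePointCloud = False` has roughly the shape sketched below. The stages are written out by hand rather than produced by calling the function, and the dataset name `EXAMPLE_DATASET`, the polygon, and the output file name are hypothetical placeholders, not real 3DEP entries.

```python
import json

# Hand-written sketch of the stage list for a single dataset.
# 'EXAMPLE_DATASET', the WKT polygon and the file name are placeholders (assumptions).
example_pipeline = {
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/EXAMPLE_DATASET/ept.json",
            "polygon": "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))",
            "requests": 3,
            "resolution": 1.0,
        },
        # Drop points classified as low noise (7) or high noise (18)
        {"type": "filters.range", "limits": "Classification![7:7], Classification![18:18]"},
        # Reproject to the requested output CRS
        {"type": "filters.reprojection", "out_srs": "EPSG:32610"},
        # Grid the remaining points into a raster
        {
            "type": "writers.gdal",
            "filename": "example_dsm.tif",
            "gdaldriver": "GTiff",
            "nodata": -9999,
            "output_type": "idw",
            "radius": 1,
            "resolution": 1.0,
        },
    ]
}

# The order of the stages is the order in which PDAL applies them
stage_types = [stage["type"] for stage in example_pipeline["pipeline"]]
print(stage_types)

# pdal.Pipeline() expects this dictionary serialized as a JSON string
pipeline_json = json.dumps(example_pipeline)
```

Printing the stage types this way is a cheap sanity check before sending a request for a large AOI.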

<a name="Data-Access-and-Processing"></a>
## Data Access and Processing
Now that the required modules are imported and the functions are defined, we can define our area of interest (AOI) and then access and process the 3DEP data from the Amazon Web Services EPT bucket.

### Get 3DEP Dataset Boundary Polygons  
Boundaries of the 3DEP datasets are stored as a GeoJSON file in the USGS lidar GitHub repository. This repository includes a copy of the file from 2024. For a more up-to-date version, visit https://github.com/hobuinc/usgs-lidar/.

For this notebook, we load the GeoJSON file into a geopandas dataframe to find the datasets that intersect our area of interest.

In [None]:
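# Load the 3DEP dataset boundary polygons from the local GeoJSON copy
df = gpd.read_file('../data/shapefiles/resources.geojson')
names = df['name']
urls = df['url']
num_points = df['count']

geometries_GCS = df['geometry']
# set_crs() returns a new GeoDataFrame, so the result must be assigned;
# it is only needed when the file does not already carry a CRS
if df.crs is None:
    df = df.set_crs(4326)
df_3857 = df.to_crs(3857)

print('Done. 3DEP polygons loaded and projected to Web Mercator (EPSG:3857)')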

Next, we derive the AOI polygon from the bounds of the two Worldview ortho images.

In [None]:
# specify path to location of ortho images
datapath = Path('../data/baker_csm_stack/original_rasters')
left_ortho = datapath / 'final_ortho_left_1.0m_holes_filled.tif'
right_ortho = datapath / 'final_ortho_right_1.0m_holes_filled.tif'

# read the bounds and CRS of each image and create the corresponding shapes
with rasterio.open(left_ortho) as ds:
    left_bounds = ds.bounds
    left_crs = ds.crs

with rasterio.open(right_ortho) as ds:
    right_bounds = ds.bounds
    right_crs = ds.crs

assert left_crs == right_crs, "CRS of input files must be the same!"

polygon1 = Polygon.from_bounds(*left_bounds)
polygon2 = Polygon.from_bounds(*right_bounds)

user_aoi = polygon1.intersection(polygon2)
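
Because both polygons are axis-aligned rectangles built from raster bounds, their intersection is itself a rectangle (or empty). The sketch below reproduces that arithmetic with plain tuples in `(left, bottom, right, top)` order, the order rasterio uses for `bounds`. The helper name `intersect_bounds` and the coordinates are made up for illustration.

```python
def intersect_bounds(a, b):
    """Return the overlap of two (left, bottom, right, top) boxes, or None if they share no area."""
    left = max(a[0], b[0])
    bottom = max(a[1], b[1])
    right = min(a[2], b[2])
    top = min(a[3], b[3])
    if left >= right or bottom >= top:
        return None  # boxes are disjoint or only touch along an edge or corner
    return (left, bottom, right, top)


# Hypothetical bounds standing in for the left and right ortho images
left_box = (0.0, 0.0, 10.0, 8.0)
right_box = (4.0, 2.0, 14.0, 12.0)

overlap = intersect_bounds(left_box, right_box)
print(overlap)  # (4.0, 2.0, 10.0, 8.0)
```

With shapely, the no-overlap case shows up as an empty geometry rather than `None`, so checking `user_aoi.is_empty` before requesting any lidar data is a reasonable guard.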

In [None]:
# The EPT reader needs the AOI polygon in EPSG:3857. Load the geometry into a GeoDataFrame and reproject it.
user_aoi_gdf = gpd.GeoDataFrame({'geometry':[user_aoi]}, crs=right_crs).to_crs(3857)
user_aoi = user_aoi_gdf.geometry.union_all() # dissolves geometry column into a single polygon
user_aoi_wkt = user_aoi.wkt
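
To make the target projection less of a black box, the forward spherical Mercator formulas behind EPSG:3857 can be written out directly. This is a sketch for intuition only: it assumes longitude and latitude in degrees as input, whereas the AOI above is in the CRS of the ortho images, so `to_crs` remains the right tool in the notebook. The function name `lonlat_to_web_mercator` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius in EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate location of Mount Baker, used only as an illustrative input
x, y = lonlat_to_web_mercator(-121.81, 48.78)
print(round(x), round(y))
```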

### Find 3DEP Polygon(s) Intersecting AOI
Let us now find which 3DEP data intersect the AOI given by the Worldview images.

In [None]:
intersecting_polys = []

for i, geom in enumerate(df_3857.geometry):
    if geom.intersects(user_aoi):
        row = df.iloc[i]
        # Use row['count'], not row.count: attribute access returns the pandas count() method, not the column value
        intersecting_polys.append((row['name'], row.geometry, geom, row['url'], row['count']))
        
usgs_dataset_names = [x[0] for x in intersecting_polys]
print("Names of intersecting datasets in the 3DEP database: ", usgs_dataset_names)

In [None]:
# Specify the point cloud output file name (only used if the point cloud is saved)
# and the point cloud resolution needed

pointcloud_file = datapath / 'ot_mtbaker_dsm_pointcloud'
pointcloud_resolution = 1.0 # in meters
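
The `readers.ept` definition quoted in the `make_DEM_pipeline()` docstring relates resolution to point density: a cell edge length of r allows up to about 1/r² points per square unit of the source coordinate system. A quick check of that arithmetic, using a hypothetical helper named `max_points_per_square_unit`:

```python
def max_points_per_square_unit(resolution):
    """Approximate upper bound on point density for a given cell edge length."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return 1.0 / (resolution ** 2)


# Compare a few candidate resolutions before choosing one
for res in (0.1, 0.5, 1.0):
    print(res, max_points_per_square_unit(res))
```

With `pointcloud_resolution = 1.0`, the request is limited to roughly one point per square unit, which keeps the download small relative to the full-density point cloud.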


<a name="Make-Digital-Surface-Model-(DSM)"></a>
### Make Digital Surface Model (DSM)
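The following cells will produce a Digital Surface Model (DSM) from the lidar returns in the point cloud. Because `filterNoise = True`, points classified as low or high noise are excluded; all other returns are used.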

In [None]:
dsm_resolution = 1.0
dsm_pipeline = make_DEM_pipeline(user_aoi_wkt, usgs_dataset_names, pointcloud_resolution, dsm_resolution,
                                 filterNoise = True, reclassify = False,  savePointCloud = False, outCRS = 32610,
                                 pc_outName = pointcloud_file, pc_outType = 'laz', demType = 'dsm', 
                                 gridMethod='idw', dem_outName = datapath / 'ot_mtbaker_dsm.tif', driver = "GTiff")
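
`gridMethod = 'idw'` asks the raster writer for inverse-distance-weighted cell values. The general idea, for a single cell center, is sketched below in plain Python. It illustrates the technique only and is not PDAL's implementation; the function name `idw_value`, the `power` default, and the sample points are assumptions.

```python
import math


def idw_value(cell_x, cell_y, points, radius=1.0, power=1.0):
    """Inverse-distance-weighted elevation at (cell_x, cell_y).

    points: iterable of (x, y, z) tuples. Only points within `radius` contribute.
    Returns None when no point falls inside the radius (a nodata cell).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in points:
        dist = math.hypot(x - cell_x, y - cell_y)
        if dist > radius:
            continue  # too far away to influence this cell
        if dist == 0.0:
            return z  # a point exactly at the cell center determines the value
        weight = 1.0 / dist ** power
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Two points equally far from the cell center, plus one outside the radius
samples = [(0.5, 0.0, 10.0), (-0.5, 0.0, 20.0), (3.0, 3.0, 99.0)]
print(idw_value(0.0, 0.0, samples))  # 15.0
```

Closer returns pull the cell value toward their elevation, which is why 'idw' tends to give a smoother surface than 'min' or 'max'.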

The PDAL pipeline for making the DSM is now constructed. Calling ```pdal.Pipeline()``` from the PDAL Python bindings creates a pdal.Pipeline object from a JSON-serialized version of the pipeline dictionary we created.

In [None]:
dsm_pipeline = pdal.Pipeline(json.dumps(dsm_pipeline))

In [None]:
%%time
# Run the pipeline in streaming mode, processing points in chunks to limit memory use
dsm_pipeline.execute_streaming(chunk_size=1000000)