# GEDI Data Access

This notebook contains a workflow for accessing GEDI elevation data. The resulting elevations can be co-located with snow-on and snow-off DEMs to derive and validate snow depth.

This notebook requires the earthaccess Python package and is adapted from work by Tasha Snow and Zachary Fair at the 2024 NASA Earth Sciences Hackweek.

We start up `earthaccess` first, so that we are properly authenticated.

In [None]:
import earthaccess

# Authenticate for accessing NASA data
auth = earthaccess.login(strategy="netrc")

# If we are not authenticated
if not auth.authenticated:
    # Ask for credentials and persist them in a .netrc file
    auth.login(strategy="interactive", persist=True)
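The `netrc` strategy reads Earthdata Login credentials from a `.netrc` file, and `persist=True` writes them there. As a minimal sketch using only Python's standard-library `netrc` module, the hypothetical helper `has_earthdata_credentials` below checks whether such a file has an entry for `urs.earthdata.nasa.gov`, the Earthdata Login host. It only inspects the file; it does not contact NASA or validate the password.

```python
import netrc
import os

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_credentials(path=None):
    """Return True if the .netrc file at `path` (default: ~/.netrc)
    holds a login and password for the Earthdata Login host."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return False
    try:
        entry = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    except netrc.NetrcParseError:
        # A malformed file counts as "no usable credentials"
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)
```

A matching `.netrc` entry is a single line of the form `machine urs.earthdata.nasa.gov login <username> password <password>`.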

The `earthaccess.search_data` function takes several user inputs:
* `concept_id`: The concept ID of the dataset, as registered in NASA's Common Metadata Repository (CMR).
* `bounding_box`: A tuple containing the lon/lat bounds (lon_min, lat_min, lon_max, lat_max).
* `temporal`: A tuple containing the start and end dates, as strings (YYYY-MM-DD).
* `cloud_hosted`: If `True`, only search for cloud-hosted data.
* `count`: The maximum number of granules (files) for earthaccess to return.
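Because the argument order and date format are easy to mix up, a small validation step can help. The sketch below uses a hypothetical helper, `build_search_kwargs` (not part of earthaccess), that checks the bounding-box order and the dates, then returns a dictionary whose keys match the `search_data` keywords used in this notebook.

```python
from datetime import date


def build_search_kwargs(concept_id, bbox, start, end, cloud_hosted=True, count=10):
    """Validate inputs and return keyword arguments for earthaccess.search_data.

    Hypothetical helper: bbox is (lon_min, lat_min, lon_max, lat_max) and
    start/end are YYYY-MM-DD strings.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    # Longitudes must come first and be increasing
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError(f"bad longitude bounds: {lon_min}, {lon_max}")
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError(f"bad latitude bounds: {lat_min}, {lat_max}")
    # fromisoformat raises ValueError for unparseable date strings
    if date.fromisoformat(start) > date.fromisoformat(end):
        raise ValueError("start date is after end date")
    return {
        "concept_id": concept_id,
        "bounding_box": (lon_min, lat_min, lon_max, lat_max),
        "temporal": (start, end),
        "cloud_hosted": cloud_hosted,
        "count": count,
    }
```

The result could then be unpacked into the query, e.g. `earthaccess.search_data(**build_search_kwargs(...))`.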

Most of these inputs are self-explanatory, but finding the concept ID can be non-trivial for `earthaccess` users. As part of SnowPit, a catalog (a CSV file) was made that matches NASA datasets with their corresponding concept IDs.

In [None]:
import pandas as pd

# Read earthaccess concept-id catalog
df = pd.read_csv("/home/jovyan/shared-public/SnowPit/cloud_data_access_list.csv")
df = df.set_index('Dataset')
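The lookup `df.loc["GEDI"]["concept-id"]` used below simply finds the row whose `Dataset` value is `GEDI` and reads its `concept-id` column. The standard-library sketch below mirrors that lookup on a small in-memory excerpt. The column names follow the SnowPit CSV, but the concept ID is a placeholder, not a real one, and the real file may contain additional columns. For datasets missing from the catalog, `earthaccess.search_datasets` can also be used to look up a collection.

```python
import csv
import io

# Hypothetical catalog excerpt: same column names as the SnowPit CSV,
# but the concept ID below is a placeholder.
CATALOG_CSV = """Dataset,concept-id
GEDI,C0000000000-PLACEHOLDER
"""


def lookup_concept_id(csv_text, dataset):
    """Return the concept ID for `dataset`, or raise KeyError if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Dataset"] == dataset:
            return row["concept-id"]
    raise KeyError(dataset)
```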

Note that the catalog's GEDI concept ID points to the Level-2A (Elevation and Height Metrics) product, arguably the best GEDI product for deriving snow depth.

We can then search for GEDI granules using the cell below. This step only queries NASA's metadata catalog; in the next step, the data are read directly through the cloud rather than downloaded.

In [None]:
# User input
bbox = (-10, 20, 10, 50)                   # (lon_min, lat_min, lon_max, lat_max)
date_range = ("2019-04-01", "2024-10-31")  # GEDI science data begin in April 2019
cloud = True
numfiles = 10

# Create earthaccess query
results = earthaccess.search_data(
    concept_id=df.loc["GEDI"]["concept-id"],
    bounding_box=bbox,
    temporal=date_range,
    cloud_hosted=cloud,
    count=numfiles,
)

We can now read the GEDI data from the results reasonably quickly through the cloud. Note that this routine loads a single group (one beam) from a single file into Xarray; for loading multiple files or variables, either downloading the data (further below) or appending to a Pandas DataFrame will be more memory-efficient.

In [None]:
import xarray as xr

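# Open the search results as file-like objects streamed from the cloud
files = earthaccess.open(results)

# Load the BEAM0001 group from the second file in the results
# (phony_dims is an h5netcdf option, so that engine is set explicitly)
ds = xr.open_dataset(files[1], engine='h5netcdf', group='/BEAM0001/', phony_dims='sort')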

In [None]:
ds

In [None]:
# Plot along-track lowest-mode elevation from beam BEAM0001
ds['elev_lowestmode'].plot()

If the data looks promising, then we can save the files from our query for continued analysis.

In [None]:
# Save GEDI files to specified directory
downloads = earthaccess.download(results, '/home/jovyan/SnowPit/tmp/')

## Using H5Coro to Access the Data

The above workflow is useful for a first-time user, though it may be slow if one wishes to look at multiple files quickly. H5Coro is a newer, cloud-optimized HDF5 reader that retrieves only the requested variables directly from S3, which makes loading HDF5 files (otherwise somewhat clumsy in Python) much faster.

The H5Coro Python package is needed for this next workflow.

In [None]:
# The most up-to-date version of H5Coro is needed to accept Earthdata S3 credentials
%pip install -U h5coro

In [None]:
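from h5coro import h5coro, s3driver

# Set up temporary S3 credentials from LP DAAC, which hosts GEDI L2A
# (direct S3 access only works from AWS us-west-2, e.g. on CryoCloud)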
s3_creds = auth.get_s3_credentials(daac="LPDAAC")
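The S3 credentials returned here are temporary (typically valid for about an hour), so a long loop over many files can outlive them. The sketch below is a hypothetical helper, `credentials_expired`, that assumes the credentials dictionary has an `expiration` entry formatted like `2024-10-01 12:00:00+00:00`; both the key name and the format are assumptions to check against the actual `s3_creds` contents.

```python
from datetime import datetime, timedelta, timezone


def credentials_expired(creds, now=None, margin=timedelta(minutes=5)):
    """Return True if temporary S3 credentials expire within `margin` of `now`.

    Assumes creds["expiration"] is an ISO-8601 timestamp with a UTC offset,
    e.g. '2024-10-01 12:00:00+00:00' (an assumption, not a documented API).
    """
    expires = datetime.fromisoformat(creds["expiration"])
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires - margin
```

If it returns `True`, calling `auth.get_s3_credentials(daac="LPDAAC")` again would fetch a fresh set.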

We are going to read all of the queried GEDI files, specifically loading `lat_lowestmode` (latitude), `lon_lowestmode` (longitude), and `elev_lowestmode` (elevation) from beam BEAM0001.

In [None]:
# Variables of interest from a single GEDI beam
beam = '/BEAM0001'
variables = [f'{beam}/elev_lowestmode',
             f'{beam}/lat_lowestmode',
             f'{beam}/lon_lowestmode']

frames = []
# Loop through earthaccess files
for f in files:
    # Define h5coro object; full_name[5:] drops the leading 's3://'
    h5obj = h5coro.H5Coro(f.full_name[5:],
                          s3driver.S3Driver,
                          credentials=s3_creds)

    # Read the variables, blocking until all of them are returned
    promise = h5obj.readDatasets(variables, block=True, enableAttributes=False)

    # Look up each variable by name so the columns cannot be mismatched
    frames.append(pd.DataFrame(data={'lat': promise[f'{beam}/lat_lowestmode'],
                                     'lon': promise[f'{beam}/lon_lowestmode'],
                                     'height': promise[f'{beam}/elev_lowestmode']}))

# Combine all files into one DataFrame in a single step
df = pd.concat(frames, ignore_index=True)

display(df)
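The loop above reads only BEAM0001, but GEDI L2A files contain eight beam groups: four lower-energy coverage beams and four full-power beams. As a sketch, the hypothetical helper below builds the same three variable paths for any set of beams; the flattened lists could be passed to `readDatasets` in place of `variables`, with the beam name kept as an extra column.

```python
# GEDI beam group names: four coverage beams and four full-power beams
COVERAGE_BEAMS = ["BEAM0000", "BEAM0001", "BEAM0010", "BEAM0011"]
FULL_POWER_BEAMS = ["BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"]


def beam_variable_paths(beams, names=("elev_lowestmode", "lat_lowestmode", "lon_lowestmode")):
    """Map each beam name to the HDF5 paths of the requested variables."""
    return {beam: [f"/{beam}/{name}" for name in names] for beam in beams}
```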

Work is currently underway by the icepyx development team to allow full reading of HDF5 data, notably ICESat-2 and GEDI, with a combination of icepyx and H5Coro. Stay tuned!

## Using SlideRule Earth to Access GEDI Data

The final example with GEDI uses SlideRule Earth. Although SlideRule does not offer as many features for GEDI as it does for ICESat-2, it remains a useful way to obtain pre-processed GEDI data. Its ability to sample an underlying DEM at each GEDI footprint makes it well suited to quick snow depth estimates.

Note that the version of SlideRule currently installed on CryoCloud returns an error for the GEDI plug-in, so run the following cell to upgrade it first.

In [None]:
%pip install sliderule -U

From there, the process is very similar to the SlideRule workflow for ICESat-2. Because GEDI only samples the mid-latitudes (roughly 51.6°S to 51.6°N), this example uses a region of the Colorado Front Range, west of Boulder, in March 2020.

In [None]:
import numpy as np
from sliderule import gedi

In [None]:
# Initiate SlideRule
gedi.init()

In [None]:
# Define a region in the Colorado Front Range (closed lon/lat polygon)
region = [ {"lon":-105.82971551223244, "lat": 39.81983728534918},
           {"lon":-105.30742121965137, "lat": 39.81983728534918},
           {"lon":-105.30742121965137, "lat": 40.164048017973755},
           {"lon":-105.82971551223244, "lat": 40.164048017973755},
           {"lon":-105.82971551223244, "lat": 39.81983728534918} ]

# Define the date range
date_range = ['2020-03-01', '2020-03-31']
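The region above is a closed ring (the first point is repeated at the end) listed counter-clockwise: SW, SE, NE, NW, and back to SW, which is the ordering we understand SlideRule's documentation to ask for. As a sketch, the hypothetical helpers below build such a ring from a bounding box and use the shoelace formula to confirm the orientation (a positive signed area means counter-clockwise).

```python
def bbox_to_region(lon_min, lat_min, lon_max, lat_max):
    """Closed, counter-clockwise ring (SW, SE, NE, NW, SW) of lon/lat dicts."""
    corners = [(lon_min, lat_min), (lon_max, lat_min),
               (lon_max, lat_max), (lon_min, lat_max)]
    ring = [{"lon": lon, "lat": lat} for lon, lat in corners]
    return ring + [ring[0]]


def signed_area(ring):
    """Shoelace formula over a closed ring; positive if counter-clockwise."""
    return 0.5 * sum(p["lon"] * q["lat"] - q["lon"] * p["lat"]
                     for p, q in zip(ring, ring[1:]))
```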

One of SlideRule's features is sampling a DEM co-located with the ICESat-2/GEDI data. Here, we use mosaics from the USGS 3D Elevation Program (3DEP) as the snow-off comparison.

The 3DEP data are requested under `samples` as a `mosaic` that samples 3DEP elevations within a 10 m radius of each GEDI footprint. Zonal statistics (including mean, median, and standard deviation) are computed at each footprint.

In [None]:
# Query parameters; t0/t1 cover the whole date range in UTC
parms = {
    "poly": region,
    "t0": date_range[0] + "T00:00:00Z",
    "t1": date_range[1] + "T23:59:59Z",
    "samples": {"mosaic": {"asset": "usgs3dep-1meter-dem", "radius": 10.0, "zonal_stats": True}},
}

In [None]:
# Run SlideRule's GEDI L4A (footprint biomass) processor; its output includes an 'elevation' column
rsps = gedi.gedi04ap(parms)

# Grab only the elevation from the mosaic median column
rsps['3dep_median'] = rsps['mosaic.median'].str[0]
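The `mosaic.median` column summarizes the DEM cells around each footprint. To illustrate what that summary means, the sketch below computes count, mean, median, and sample standard deviation of grid cells whose centers fall within a radius of a point. It is not SlideRule's implementation: the cell-inclusion rule and the choice of sample (rather than population) standard deviation are assumptions, and the tiny grid is synthetic.

```python
import math
import statistics


def zonal_stats(grid, res, x, y, radius):
    """Summarize grid cells whose centers lie within `radius` of (x, y).

    grid[r][c] is the cell centered at (c * res, r * res).
    """
    values = [z for r, row in enumerate(grid) for c, z in enumerate(row)
              if math.hypot(c * res - x, r * res - y) <= radius]
    if not values:
        raise ValueError("no cells within radius")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

With a 1 m DEM and a 10 m radius, roughly 300 cells contribute to each footprint's statistics.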

Before computing the residual (a rough approximation of snow depth), we apply a correction factor to the data. This was determined by finding the median bias between 3DEP and ICESat-2 during snow-off conditions (not shown).

In [None]:
# Elevation correction factor
correction_factor = -0.62

# Derive residual between GEDI and 3DEP (inferred as snow depth during snow-on season)
rsps['residual'] = rsps['elevation'] - rsps['3dep_median'] - correction_factor
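The correction factor and residual can be sketched in a few lines. The sketch below assumes the bias is defined as the median of (lidar elevation minus DEM elevation) over snow-off points, which is consistent with subtracting `correction_factor` in the cell above: a negative snow-off bias raises every residual. The helper names and the numbers are illustrative only.

```python
import statistics


def median_bias(lidar_elev, dem_elev):
    """Median of (lidar - DEM) over snow-off points: the correction factor."""
    return statistics.median(a - b for a, b in zip(lidar_elev, dem_elev))


def snow_depth_residuals(lidar_elev, dem_elev, correction):
    """Residual = lidar - DEM - correction, read as snow depth when snow-on."""
    return [a - b - correction for a, b in zip(lidar_elev, dem_elev)]
```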

In [None]:
# Filter out very large residuals
rsps.loc[rsps['residual'].abs()>20, 'residual'] = np.nan

# Additional filtering by percentiles
lower = rsps['residual'].quantile(0.1)
upper = rsps['residual'].quantile(0.9)
rsps.loc[(rsps['residual']<lower)|(rsps['residual']>upper), 'residual'] = np.nan
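The percentile step is meant to discard residuals below the 10th percentile *or* above the 90th percentile, computed from the values that survive the first threshold. The pure-Python sketch below mirrors the two-stage filter on a plain list, using `statistics.quantiles` with `method="inclusive"` (linear interpolation, like pandas' default `quantile`). The helper name is hypothetical.

```python
import math
import statistics


def filter_residuals(residuals, max_abs=20.0):
    """Two-stage filter: absolute threshold, then 10th-90th percentile trim."""
    # Stage 1: drop implausibly large residuals
    stage1 = [r if abs(r) <= max_abs else math.nan for r in residuals]
    valid = [r for r in stage1 if not math.isnan(r)]
    # Stage 2: decile cut points of the surviving values
    cuts = statistics.quantiles(valid, n=10, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    # NaN compares False, so values dropped in stage 1 stay NaN
    return [r if lower <= r <= upper else math.nan for r in stage1]
```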

Now that we've applied filtering and corrections to the data, let's take a look at the residuals.

In [None]:
# View a plot of the data
rsps.explore(column='residual', tiles='Esri.WorldImagery', vmin=0, vmax=1.5)

Overall, the residuals seem rather high. This may be due to high snow depths in the region, or because other corrections are needed.