# Contextual data

Once you've identified areas of interest where multiple datasets intersect, you can pull additional data to provide further context. For example:

1. Land cover
2. Global elevation data

In [None]:
import coincident
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import rasterio

%matplotlib inline

## Identify a primary dataset

Start by loading a full-resolution polygon of a 3DEP LiDAR workunit, which has a known start_datetime and end_datetime:

In [None]:
workunit = "CO_WestCentral_2019"
df_wesm = coincident.search.wesm.read_wesm_csv()
gf_lidar = coincident.search.wesm.load_by_fid(
    df_wesm[df_wesm.workunit == workunit].index
)

gf_lidar

In [None]:
search_aoi = gf_lidar.simplify(0.01)

## Search for coincident contextual data

Coincident provides two convenience functions to load datasets resulting from a given search. 

In [None]:
gf_wc = coincident.search.search(
    dataset="worldcover",
    intersects=search_aoi,
    # worldcover is just 2020 and 2021, so pick one
    datetime=["2020"],
)  # Asset of interest = 'map'

In [None]:
# STAC metadata always has a "stac_version" column
gf_wc.iloc[0].stac_version

In [None]:
gf_cop30 = coincident.search.search(
    dataset="cop30",
    intersects=search_aoi,
)  # Asset of interest = 'data'
gf_cop30.iloc[0].stac_version

### STAC search results to datacube

If the results have [STAC-formatted metadata](https://stacspec.org/en), we can take advantage of the excellent [odc.stac](https://odc-stac.readthedocs.io/) tool to load datacubes. Please refer to the odc.stac documentation for all the configuration options (e.g. setting the resolution or output CRS).

By default this uses the [dask](https://www.dask.org/) parallel computing library to intelligently load only metadata initially and defer reading values until computations or visualizations are performed.

#### Copernicus DEM

In [None]:
ds = coincident.io.xarray.to_dataset(
    gf_cop30,
    aoi=search_aoi,
    # chunks=dict(x=2048, y=2048), # manual chunks
    resolution=0.00081,  # ~90m
    mask=True,
)
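
The `resolution` value above appears to be expressed in degrees, with the `~90m` comment giving the approximate ground distance. A rough sketch of that conversion follows, assuming a spherical Earth with a 6371 km radius and a latitude of 38.7°N for this area. The helper `degrees_to_meters` is hypothetical and not part of `coincident`:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean spherical radius, an approximation


def degrees_to_meters(res_deg, lat_deg):
    """Return the approximate (north-south, east-west) ground size in meters."""
    m_per_deg = math.pi * EARTH_RADIUS_M / 180  # ~111 km per degree of latitude
    ns = res_deg * m_per_deg
    # Meridians converge toward the poles, so a degree of longitude is shorter
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


ns, ew = degrees_to_meters(0.00081, 38.7)
print(f"north-south: {ns:.0f} m, east-west: {ew:.0f} m")
```

Under these assumptions a pixel is about 90 m tall but only about 70 m wide on the ground, so "~90m" is best read as a nominal figure.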

In [None]:
# By default, these are dask arrays
ds

In [None]:
# You might want to rename the data variable
ds = ds.rename(data="elevation")

In [None]:
# The total size of this dataset is only 14MB, so for faster computations, load it into memory
print(ds.nbytes / 1e6)
ds = ds.compute()
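
The size reported by `ds.nbytes / 1e6` is the number of array elements multiplied by the bytes per element. As a small sketch of that arithmetic, the grid shape and dtype below are made up for illustration and are not read from `ds`:

```python
import numpy as np

# Hypothetical grid shape for illustration; not the actual shape of `ds`
ny, nx = 1500, 2300
dtype = np.dtype("float32")  # assumed; check ds.elevation.dtype for the real value

estimated_mb = ny * nx * dtype.itemsize / 1e6

# An in-memory array of the same shape and dtype reports the same size
actual_mb = np.zeros((ny, nx), dtype=dtype).nbytes / 1e6
print(estimated_mb, actual_mb)
```

Estimating the size this way before calling `compute()` is a quick check that the result will fit comfortably in memory.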

In [None]:
ds.elevation.isel(time=0).plot.imshow();

#### ESA Worldcover

In [None]:
# Same with LandCover
dswc = coincident.io.xarray.to_dataset(
    gf_wc,
    bands=["map"],
    aoi=search_aoi,
    mask=True,
    # resolution=0.00027, #~30m
    resolution=0.00081,  # ~90m
)
dswc

In [None]:
dswc = dswc.rename(map="landcover")

In [None]:
dswc = dswc.compute()

In [None]:
# For landcover there is a convenience function for a nice categorical colormap
ax = coincident.plot.plot_esa_worldcover(dswc)
ax.set_title("ESA WorldCover");

## Load gridded elevation in a consistent CRS

`coincident` also has a convenience function for loading gridded elevation datasets in a consistent CRS. In order to facilitate comparison with modern altimetry datasets (ICESat-2, GEDI), we convert elevation data on-the-fly to [EPSG:7912](https://spatialreference.org/ref/epsg/7912/). 3D Coordinate Reference Frames are a complex topic, so please see this resource for more detail on how these conversions are done: https://uw-cryo.github.io/3D_CRS_Transformation_Resources/ 

```{note}
Currently gridded DEMs are retrieved from [OpenTopography](https://opentopography.org/) hosted in AWS us-west-2.
```

```{warning}
The output of `load_dem_7912()` uses the native resolution and grid of the input dataset, and data is immediately read into memory, so this method is better suited to small AOIs.
```

In [None]:
# Start with a small AOI:
from shapely.geometry import box
import geopandas as gpd

aoi = gpd.GeoDataFrame(
    geometry=[box(-106.812163, 38.40825, -106.396812, 39.049796)], crs="EPSG:4326"
)

In [None]:
da_cop = coincident.io.xarray.load_dem_7912("cop30", aoi=aoi)
da_cop

In [None]:
# NASADEM and COP30 DEM are both 30m but will not necessarily have the same coordinates!
da_nasa = coincident.io.xarray.load_dem_7912("nasadem", aoi=aoi)
da_nasa

In [None]:
# We check that coordinates are sufficiently close before force-alignment.
# If sufficiently different, it would be better to use rioxarray.reproject_match!
xr.testing.assert_allclose(da_cop.x, da_nasa.x)  # rtol=1e-05, atol=1e-08 defaults
xr.testing.assert_allclose(da_cop.y, da_nasa.y)
da_nasa = da_nasa.assign_coords(x=da_cop.x, y=da_cop.y)
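
How strict is a check with `rtol=1e-05, atol=1e-08`? To our understanding, `xr.testing.assert_allclose` applies the same rule as `numpy.isclose`, where values pass when `|a - b| <= atol + rtol * |b|`. The sketch below uses made-up longitudes and an assumed 30 m pixel size of about 0.0003 degrees to show what that tolerance means for coordinates near -106.8:

```python
import numpy as np

lon_a = np.array([-106.8000, -106.7997, -106.7994])  # hypothetical pixel centers
pixel = 0.0003  # roughly 30 m in degrees (assumed)
lon_b = lon_a + pixel  # the same grid shifted by one full pixel

# Absolute tolerance implied by the defaults at this longitude
default_tol = 1e-8 + 1e-5 * np.abs(lon_b).max()
print(f"default tolerance near this longitude: {default_tol:.5f} degrees")

passes_default = np.allclose(lon_a, lon_b)  # rtol=1e-05, atol=1e-08
passes_strict = np.allclose(lon_a, lon_b, rtol=0, atol=pixel / 10)
print(passes_default, passes_strict)
```

Because the relative term scales with the coordinate magnitude, the default tolerance here is larger than one pixel. Passing an explicit `atol` tied to the pixel size, with `rtol=0`, may be a safer check before force-aligning coordinates.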

In [None]:
diff = da_cop - da_nasa
median = diff.median()
mean = diff.mean()
std = diff.std()

fig = plt.figure(layout="constrained", figsize=(4, 8))  # figsize=(8.5, 11)
ax = fig.add_gridspec(top=0.75).subplots()
ax_histx = ax.inset_axes([0, -0.35, 1, 0.25])
axes = [ax, ax_histx]
diff.plot.imshow(
    ax=axes[0], robust=True, add_colorbar=False
)  # , cbar_kwargs={'label': ""})
n, bins, _ = diff.plot.hist(ax=axes[1], bins=100, range=(-20, 20), color="gray")
approx_mode = bins[np.argmax(n)]
axes[0].set_title("COP30 - NASADEM")
axes[0].set_aspect(aspect=1 / np.cos(np.deg2rad(38.7)))
axes[1].axvline(0, color="k")
axes[1].axvline(median, label=f"median={median:.2f}", color="cyan", lw=1)
axes[1].axvline(mean, label=f"mean={mean:.2f}", color="magenta", lw=1)
axes[1].axvline(approx_mode, label=f"mode={approx_mode:.2f}", color="yellow", lw=1)
axes[1].set_xlabel("Elevation difference (m)")
axes[1].legend()
axes[1].set_title("");
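
The approximate mode in the cell above comes from `bins[np.argmax(n)]`, which is the left edge of the fullest histogram bin. The sketch below uses synthetic differences (an assumed 1 m bias plus noise and a few large outliers, not real DEM data) to show how the mean, median and histogram mode respond differently, and how a bin-center estimate compares with the left edge:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic elevation differences in meters: a 1 m bias, noise, and some outliers
diff = np.concatenate([rng.normal(1.0, 2.0, 100_000), rng.uniform(50, 100, 1_000)])

# Same binning as the plot; values outside the range are ignored by the histogram
n, bins = np.histogram(diff, bins=100, range=(-20, 20))
i = np.argmax(n)
mode_left_edge = bins[i]  # what bins[np.argmax(n)] returns
mode_center = 0.5 * (bins[i] + bins[i + 1])  # center of the fullest bin

print(f"mean={diff.mean():.2f} median={np.median(diff):.2f} mode~{mode_center:.2f}")
```

The outliers pull the mean upward while the median and mode stay near the bias, which is one reason to report all three. With 0.4 m bins, the left-edge and bin-center estimates differ by 0.2 m.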

```{note}
We see that COP30 elevation values are approximately 1 m greater than NASADEM values for this area. Such a difference is within the stated accuracies of each gridded dataset. The two products also differ in sensor and acquisition time: COP30 is derived from X-band TanDEM-X observations with a nominal time of 2021-04-22, whereas NASADEM is primarily based on C-band SRTM data collected 2000-02-20.
```

In [None]:
# Load 3DEP 10m data
da_3dep = coincident.io.xarray.load_dem_7912("3dep", aoi=aoi)

In [None]:
# To compare to 3dep, we refine cop30 from 30m to 10m using bilinear resampling
da_cop_r = da_cop.rio.reproject_match(
    da_3dep, resampling=rasterio.enums.Resampling.bilinear
)
da_cop_r
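
Bilinear resampling estimates each value on the finer grid as a distance-weighted average of the four nearest coarse cells. The following is a minimal NumPy sketch of that idea on a tiny made-up array. It is an illustration only, not how `rasterio` implements resampling, which also handles georeferencing, nodata and edge effects. The function `bilinear_sample` is hypothetical:

```python
import numpy as np


def bilinear_sample(grid, y, x):
    """Interpolate `grid` at the fractional index position (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    # Clamp the lower-right neighbor so positions on the last row/column work
    y1 = min(y0 + 1, grid.shape[0] - 1)
    x1 = min(x0 + 1, grid.shape[1] - 1)
    wy, wx = y - y0, x - x0  # fractional distances from the upper-left neighbor
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bottom = (1 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1 - wy) * top + wy * bottom


coarse = np.array([[100.0, 130.0],
                   [160.0, 190.0]])  # hypothetical 30 m elevations

# Sample a 4x4 grid of points spanning the four coarse cell centers
fine = np.array([[bilinear_sample(coarse, y, x) for x in np.linspace(0, 1, 4)]
                 for y in np.linspace(0, 1, 4)])
print(fine)
```

Interpolated values vary smoothly between the coarse samples but add no new terrain detail, so differences against 3DEP still reflect the coarser resolution of COP30.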

In [None]:
diff = da_3dep - da_cop_r
median = diff.median()
mean = diff.mean()
std = diff.std()

fig = plt.figure(layout="constrained", figsize=(4, 8))  # figsize=(8.5, 11)
ax = fig.add_gridspec(top=0.75).subplots()
ax_histx = ax.inset_axes([0, -0.35, 1, 0.25])
axes = [ax, ax_histx]
diff.plot.imshow(
    ax=axes[0], robust=True, add_colorbar=False
)  # , cbar_kwargs={'label': ""})
n, bins, _ = diff.plot.hist(ax=axes[1], bins=100, range=(-20, 20), color="gray")
approx_mode = bins[np.argmax(n)]
axes[0].set_title("3DEP - COP30")
axes[0].set_aspect(aspect=1 / np.cos(np.deg2rad(38.7)))
axes[1].axvline(0, color="k")
axes[1].axvline(median, label=f"median={median:.2f}", color="cyan", lw=1)
axes[1].axvline(mean, label=f"mean={mean:.2f}", color="magenta", lw=1)
axes[1].axvline(approx_mode, label=f"mode={approx_mode:.2f}", color="yellow", lw=1)
axes[1].set_xlabel("Elevation difference (m)")
axes[1].legend()
axes[1].set_title("");

```{note}
There is a clear spatial and terrain dependence for the residuals of 10m 3DEP LiDAR compared to COP30 elevation values for this area! 
```
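
One way to put numbers on a terrain dependence like this is to group residuals by slope. The sketch below uses entirely synthetic terrain and residuals: the grid, the slope bins, and the assumption that residual spread grows with slope are all made up for illustration. With real data, `dem` and `resid` would be replaced by the 3DEP elevations and the difference array:

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, pixel = 200, 200, 10.0  # hypothetical 10 m grid

# Synthetic terrain: flat on the left, a steep ramp on the right
x = np.arange(nx) * pixel
dem = np.tile(np.where(x < 1000, 0.0, (x - 1000) * 0.5), (ny, 1))

# Slope in degrees from finite differences
dz_dy, dz_dx = np.gradient(dem, pixel)
slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic residuals whose spread grows with slope (an assumption)
resid = rng.normal(0.0, 0.5 + 0.2 * slope)

# Summarize residual spread within each slope bin
edges = np.array([0, 5, 15, 30, 90])
idx = np.digitize(slope.ravel(), edges) - 1
for k in range(len(edges) - 1):
    sel = resid.ravel()[idx == k]
    if sel.size:
        print(f"{edges[k]:>2}-{edges[k + 1]:<2} deg: n={sel.size:6d} std={sel.std():.2f} m")
```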