# Data Interoperability Example

Recent innovations have made accessing data simpler. These include the SpatioTemporal
Asset Catalog (STAC) specification and STAC APIs, which share metadata about data so that
it can be loaded without using the Open Data Cube's database as an index to that data.

This notebook demonstrates masking a cloudy Sentinel-2 scene differently depending on
elevation. To do this, we'll load data from the Copernicus DEM and use it to identify
high, cloudy peaks. We then split the Sentinel-2 cloud mask in two, clean up each part
differently, and merge the parts back into a single mask for the scene.

## Environment set up

First, we import the libraries we need.

In [None]:
import folium
from datacube import Datacube
from odc.algo import mask_cleanup
from odc.geo.geom import point
from odc.stac import load
from pystac.client import Client

## Location setup

Next, we pick a location. We buffer a lon/lat point to create a bounding box,
which we then use to load data.

In [None]:
# These coords are in the order Y then X, or Latitude then Longitude
coords = -6.11, 105.41  # Krakatoa
aoi_point = point(coords[1], coords[0], crs="EPSG:4326")
bbox = aoi_point.buffer(0.06).boundingbox

# Preview the area
bbox.explore(zoom=8)
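
As a rough, dependency-free sketch of what the buffering step produces, the bounding box can be worked out by hand. The helper `buffered_bbox` is hypothetical (it is not part of `odc.geo`), and it assumes the buffered point's extent is simply the point plus or minus the buffer distance in degrees.

```python
def buffered_bbox(lat, lon, radius_deg):
    """Return (left, bottom, right, top) in degrees around a lat/lon point.

    Hypothetical helper for illustration only; not part of odc.geo.
    """
    return (lon - radius_deg, lat - radius_deg, lon + radius_deg, lat + radius_deg)


coords = (-6.11, 105.41)  # (latitude, longitude), as in the cell above
left, bottom, right, top = buffered_bbox(*coords, radius_deg=0.06)
print(left, bottom, right, top)
```

Note the axis order: the point is written as latitude then longitude, but the bounding box is ordered X (longitude) first. A buffer of 0.06 degrees is roughly 6 to 7 km this close to the equator.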

## Find and load Sentinel-2 data

Since we want to create a temporal, cloud-free mosaic from Sentinel-2 data, we first need to
search for and load some scenes. We load only the visual bands (`red`, `green` and `blue`)
and the scene classification band, `scl`.

Passing the `dask_chunks` argument makes this a lazy load using Dask, so no pixel data is read until it is needed.
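
To get a feel for what the chunk sizes mean, the sketch below counts how many chunks a single band would be split into. The array shape is a made-up assumption for illustration, and `chunk_grid` is a hypothetical helper. The real shape depends on the datasets found.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension for a given array shape."""
    return {dim: math.ceil(shape[dim] / chunks[dim]) for dim in chunks}


# Hypothetical array shape, for illustration only
shape = {"time": 12, "y": 1340, "x": 1340}
chunks = {"time": 1, "x": 1024, "y": 1024}

grid = chunk_grid(shape, chunks)
total_chunks = math.prod(grid.values())
print(grid, total_chunks)
```

With these assumed numbers, each band is split into 12 x 2 x 2 = 48 chunks, and Dask can load and process each one independently.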

In [None]:
# Connect to the Datacube database
dc = Datacube()

# Find and load Sentinel-2 datasets
sentinel2_datasets = dc.find_datasets(
    product=["s2_l2a"],
    time=("2025-01", "2025-03"),
    longitude=(bbox.left, bbox.right),
    latitude=(bbox.bottom, bbox.top),
)

print(f"Found {len(sentinel2_datasets)} Sentinel-2 datasets")

data = dc.load(
    datasets=sentinel2_datasets,
    longitude=(bbox.left, bbox.right),
    latitude=(bbox.bottom, bbox.top),
    output_crs=sentinel2_datasets[0].crs,
    resolution=10,
    measurements=["red", "green", "blue", "scl"],
    group_by="solar_day",
    dask_chunks={"time": 1, "x": 1024, "y": 1024},
    resampling={
        "*": "cubic",
        "scl": "nearest",
    },
    driver="rio",
)

data

## Preview data

The next two cells preview the data: first the RGB image, then the original cloud mask.

In [None]:
TIME = 3  # Change this to pick a different time
one_time = data.isel(time=TIME).compute()  # Compute() here means "load into memory"
one_time.odc.explore(vmin=0, vmax=3000)

In [None]:
# For Sentinel-2, 3 is cloud shadow, 8 is medium probability cloud, and 9 is high probability cloud
# We could add in 10 too, to cover thin cirrus clouds, but we'll leave that out for now
cloud_mask = data.scl.isin([3, 8, 9])
one_cloud_mask = cloud_mask.isel(
    time=TIME
).compute()  # Compute() here means "load into memory"
one_cloud_mask.where(one_cloud_mask != 0).odc.explore(
    name="Cloud Mask", vmin=0, vmax=1, cmap="Greys_r"
)
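
The class selection above is a simple membership test. Here is a minimal sketch of the same logic on a small, made-up patch of SCL values, using plain Python lists rather than xarray:

```python
# SCL classes treated as cloud: shadow, medium and high probability cloud
CLOUD_CLASSES = {3, 8, 9}

# Hypothetical 3x4 patch of SCL values, for illustration only
# (4 = vegetation, 5 = not vegetated, 6 = water, 10 = thin cirrus)
scl = [
    [4, 4, 8, 9],
    [4, 3, 8, 6],
    [5, 4, 10, 6],
]

# True where the pixel belongs to one of the selected classes
cloud = [[value in CLOUD_CLASSES for value in row] for row in scl]

cloudy_pixels = sum(sum(row) for row in cloud)
cloud_fraction = cloudy_pixels / (len(scl) * len(scl[0]))
print(cloud, cloud_fraction)
```

The thin cirrus pixel (class 10) stays unmasked here because, as in the notebook, it is left out of the selected classes.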

## Load elevation data over the same area

In this step, we use a different method to find and load data. This time we query a STAC API
and then use the function `odc.stac.load` to create an xarray `Dataset` from the STAC Items
returned by the API.

We pass in the argument `like=data.odc.geobox`, which asks `odc.stac.load` to create an
xarray `Dataset` on exactly the same grid (CRS, resolution and extent) as our Sentinel-2 `Dataset`.

In [None]:
# Load elevation data using STAC
client = Client.open("https://earth-search.aws.element84.com/v1")
collection = "cop-dem-glo-30"

items = client.search(collections=[collection], bbox=list(bbox)).item_collection()

print(f"Found {len(items)} DEM items")

dem = load(
    items,
    like=data.odc.geobox,
    # chunks={"x": 1024, "y": 1024}  # No need to use Dask
).squeeze()  # Remove the time dimension

dem
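
For context, a STAC API item search is a request to the API's search endpoint with a small set of parameters. The sketch below builds a JSON body using the `collections` and `bbox` fields defined by the STAC API specification. It is an approximation of what is sent for the search above, not a record of the exact request, and `pystac-client` may add further parameters such as paging limits. The bounding box values are rounded by hand.

```python
import json

# Approximate bounding box for the area of interest, in degrees.
# STAC expects the order: west, south, east, north.
bbox = [105.35, -6.17, 105.47, -6.05]

search_body = {
    "collections": ["cop-dem-glo-30"],
    "bbox": bbox,
}

payload = json.dumps(search_body)
print(payload)
```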

## Preview the elevation model

Now we preview the DEM, and then create a "high" terrain mask for anything above 50 m.
This effectively separates elevated terrain from land that is close to sea level.


In [None]:
dem.data.odc.explore(cmap="terrain", vmin=0, vmax=2000, name="DEM")

In [None]:
high = dem.data > 50
high.where(high).odc.explore(cmap="Greens", vmin=0, vmax=1, name="High Elevation")

## Cloud mask creation and exploration

The next few cells run through the steps for creating two modified cloud masks,
one for areas of high terrain, and one for everywhere else.

This process uses the function `odc.algo.mask_cleanup` to do some morphological cleaning
of masks. `dilation` makes the mask bigger, while `erosion` makes it smaller. Applied in
sequence, these operations are used to fill holes, remove slivers and generally clean
up the mask.
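
To make the effect of the two operations concrete, here is a minimal pure-Python sketch of dilation and erosion on a small boolean grid. It is not the `odc.algo` implementation. It assumes a square neighbourhood for simplicity (the library may use a different footprint), and `dilate`, `erode` and `cleanup` are hypothetical helper names.

```python
def _window(index, radius, size):
    """Indices within `radius` of `index`, clipped to the grid."""
    return range(max(0, index - radius), min(size, index + radius + 1))


def _neighbourhood_op(mask, radius, combine):
    """Apply `combine` (any or all) over each pixel's square neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            combine(
                mask[rr][cc]
                for rr in _window(r, radius, rows)
                for cc in _window(c, radius, cols)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def dilate(mask, radius):
    """Grow the mask: True if any neighbour is True."""
    return _neighbourhood_op(mask, radius, any)


def erode(mask, radius):
    """Shrink the mask: True only if every neighbour is True."""
    return _neighbourhood_op(mask, radius, all)


OPERATIONS = {"dilation": dilate, "erosion": erode}


def cleanup(mask, steps):
    """Apply (operation, radius) steps in order."""
    for name, radius in steps:
        mask = OPERATIONS[name](mask, radius)
    return mask


def make_mask(size, true_cells):
    """Build a size x size boolean grid with the given cells set to True."""
    return [[(r, c) in true_cells for c in range(size)] for r in range(size)]


# Dilation then erosion fills a one-pixel hole in a cloud
block = {(r, c) for r in range(2, 5) for c in range(2, 5)}
holed = make_mask(7, block - {(3, 3)})
closed = cleanup(holed, (("dilation", 1), ("erosion", 1)))

# Erosion then dilation removes an isolated one-pixel "cloud"
cloud = {(r, c) for r in range(1, 4) for c in range(1, 4)}
noisy = make_mask(7, cloud | {(5, 5)})
opened = cleanup(noisy, (("erosion", 1), ("dilation", 1)))
```

The order of the steps matters. Dilating first fills gaps and is the more aggressive choice, while eroding first discards small features before the mask is grown back.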

The goal is to clean up the Sentinel-2 cloud mask, which can sometimes misclassify beaches
as clouds, but also handle cloudy mountain peaks, like the one at Krakatoa.

In [None]:
# Now we create two cloud masks, one for high and one for everywhere else
high_mask = one_cloud_mask.where(high)
non_high_mask = one_cloud_mask.where(~high)

high_mask = high_mask.where(high_mask != 0)
non_high_mask = non_high_mask.where(non_high_mask != 0)

m = folium.Map(location=coords, zoom_start=12)

one_time.odc.to_rgba(vmin=0, vmax=3000).odc.add_to(m, name="Sentinel-2 Data")
high_mask.odc.add_to(m, name="High Elevation Cloud Mask", vmin=0, vmax=1, cmap="Reds")
non_high_mask.odc.add_to(
    m, name="Elsewhere Elevation Cloud Mask", vmin=0, vmax=1, cmap="Blues"
)

# Add a layer control to toggle between the two cloud masks
folium.LayerControl().add_to(m)

m
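
The split relies on `where`, which keeps values where the condition is true and fills the rest with NaN. The sketch below mimics that behaviour with plain lists to show why the later cells compare against `1` before cleaning up: NaN is never equal to `1`, so positions outside each zone count as "not cloud". The grids are made up, and the `where` helper is a hypothetical stand-in for the xarray method.

```python
import math

THRESHOLD_M = 50  # same elevation threshold as the notebook

# Hypothetical 2x3 grids, for illustration only
dem = [[0, 12, 80], [5, 60, 400]]
cloud = [[True, False, True], [False, True, False]]


def where(values, condition):
    """Keep values where condition is True, otherwise fill with NaN."""
    return [
        [v if keep else math.nan for v, keep in zip(value_row, cond_row)]
        for value_row, cond_row in zip(values, condition)
    ]


high = [[height > THRESHOLD_M for height in row] for row in dem]
not_high = [[not is_high for is_high in row] for row in high]

high_mask = where(cloud, high)
non_high_mask = where(cloud, not_high)

# Comparing with 1 turns NaN (outside the zone) into False
high_cloud = [[v == 1 for v in row] for row in high_mask]
non_high_cloud = [[v == 1 for v in row] for row in non_high_mask]

# Together, the two parts reproduce the original cloud mask
recombined = [
    [a or b for a, b in zip(row_a, row_b)]
    for row_a, row_b in zip(high_cloud, non_high_cloud)
]
print(recombined == cloud)
```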

In [None]:
# Now we want to treat the two cloud masks differently.
# For high areas, we want to use a more aggressive cloud mask
# For other areas, we want to try to not mask out beaches, which are often misclassified as clouds

# For this one, we dilate by 10, which expands the cloud mask, then we erode by 5 to clean up the edges
high_cleanup_config = (("dilation", 10), ("erosion", 5))
high_mask_cleaned = mask_cleanup(high_mask == 1, high_cleanup_config)
high_mask_cleaned = high_mask_cleaned.where(high_mask_cleaned != 0)

m = folium.Map(location=coords, zoom_start=12)

one_time.odc.to_rgba(vmin=0, vmax=3000).odc.add_to(m, name="Sentinel-2 Data")
high_mask_cleaned.odc.add_to(
    m, name="High Elevation Cloud Mask Cleaned", vmin=0, vmax=1, cmap="Reds"
)
high_mask.odc.add_to(m, name="High Elevation Cloud Mask", vmin=0, vmax=1, cmap="Blues")
folium.LayerControl().add_to(m)

m

In [None]:
# For this one, we erode first, to remove slivers and small clouds, then dilate to expand the mask
# This helps to avoid masking out small features like beaches
non_high_cleanup_config = (("erosion", 5), ("dilation", 6))
non_high_mask_cleaned = mask_cleanup(non_high_mask == 1, non_high_cleanup_config)
non_high_mask_cleaned = non_high_mask_cleaned.where(non_high_mask_cleaned != 0)

m = folium.Map(location=coords, zoom_start=12)

one_time.odc.to_rgba(vmin=0, vmax=3000).odc.add_to(m, name="Sentinel-2 Data")
non_high_mask_cleaned.odc.add_to(
    m, name="Elsewhere Elevation Cloud Mask Cleaned", vmin=0, vmax=1, cmap="Reds"
)
non_high_mask.odc.add_to(
    m, name="Elsewhere Elevation Cloud Mask", vmin=0, vmax=1, cmap="Blues"
)
folium.LayerControl().add_to(m)

m

In [None]:
# Preview the combined masks first
combined = (high_mask_cleaned == 1) | (non_high_mask_cleaned == 1)
combined.where(combined != 0).odc.explore(
    cmap="Purples", vmin=0, vmax=1, name="Combined Cloud Mask"
)

## Apply the masks and create a mosaic

Finally, we can apply the two different cloud masks. One is the original Sentinel-2 SCL
mask, with no modifications, and the other is a combination of the high and elsewhere
modified masks, which we hope will lead to a better result.

In [None]:
# Now we'll apply the original mask to the data
# And separately apply the combined mask to the data
# Compute a median of them both, and explore the result

# Apply the original cloud mask to the data
data_simple_mask = data.where(~cloud_mask)

# Rebuild the high and non-high masks for every time step, not just the previewed one
high_mask = cloud_mask.where(dem.data > 50)
non_high_mask = cloud_mask.where(~(dem.data > 50))

high_mask_cleaned = mask_cleanup(high_mask == 1, high_cleanup_config)
non_high_mask_cleaned = mask_cleanup(non_high_mask == 1, non_high_cleanup_config)
combined_mask = high_mask_cleaned | non_high_mask_cleaned

data_complex_mask = data.where(~combined_mask)

# Compute the median of the two masked datasets
data_median = data_simple_mask.median(dim="time").compute()
data_complex_median = data_complex_mask.median(dim="time").compute()
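
To illustrate why the choice of mask changes the median, here is a small sketch for a single pixel through time. The reflectance values and both masks are made up, and `masked_median` is a hypothetical helper. It mirrors the idea that masked observations are skipped, and that a pixel with no clear observations at all ends up as NaN.

```python
import math
from statistics import median


def masked_median(values, cloudy):
    """Median of the observations not flagged as cloudy; NaN if none remain."""
    clear = [value for value, is_cloud in zip(values, cloudy) if not is_cloud]
    return median(clear) if clear else math.nan


# Hypothetical red-band values for one pixel over five dates
series = [1200, 9000, 1100, 8500, 1300]

# The simple mask misses the second bright (cloudy) observation
simple_mask = [False, True, False, False, False]
improved_mask = [False, True, False, True, False]

simple_result = masked_median(series, simple_mask)
improved_result = masked_median(series, improved_mask)
no_data_result = masked_median(series, [True] * len(series))
print(simple_result, improved_result, no_data_result)
```

A missed cloud pulls the median upwards, while a mask that is too aggressive leaves fewer clear observations and, in the extreme case, a hole in the mosaic.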

In [None]:
# Explore the results
m = folium.Map(location=coords, zoom_start=12)
data_median.odc.to_rgba(vmin=0, vmax=3000).odc.add_to(m, name="Simple Mask Median")
data_complex_median.odc.to_rgba(vmin=0, vmax=3000).odc.add_to(
    m, name="Complex Mask Median"
)

folium.LayerControl().add_to(m)

m


## Consider the results

This is a complicated example, and the differences are subtle. If you look at the island
to the south-east (the lower right of the image) and zoom in to its southern coast, you can see a better
result, with less noise. At the mountain peak, there is more clear data and less of a
hole (i.e., fewer areas with no clear observations).

### What we learnt

We have successfully used two datasets, Sentinel-2 and the Copernicus Digital Elevation Model, to
undertake a complicated analysis that uses cloud masking to create a temporal mosaic.

The two datasets were loaded in two different ways, one using the Open Data Cube and the other using
ODC STAC.