# Load Sentinel-2 data from the CDSE STAC catalog

This notebook shows how to load Sentinel-2 data from the [CDSE STAC API](https://documentation.dataspace.copernicus.eu/APIs/STAC.html).

### Setup
To run this notebook, you need to install [`xcube_stac`](https://github.com/xcube-dev/xcube-stac). You can install it directly from the git repository by cloning the repository, changing into the `xcube-stac` directory, and following the steps below:

```bash
conda env create -f environment.yml
conda activate xcube-stac
pip install .
```

Note that [`xcube_stac`](https://github.com/xcube-dev/xcube-stac) is a plugin of [`xcube`](https://xcube.readthedocs.io/en/latest/), which is included in the `environment.yml`.

The data is accessed via S3. The required key and secret can be obtained by following the [CDSE access documentation to EO data via S3](https://documentation.dataspace.copernicus.eu/APIs/S3.html#generate-secrets). The store object receives the key and secret upon initialization, as demonstrated below.

First, we import everything we need:

In [1]:
%%time
from xcube.core.store import new_data_store, get_data_store_params_schema
import itertools
import rasterio

CPU times: user 3.52 s, sys: 297 ms, total: 3.82 s
Wall time: 1.95 s


Next, we store the credentials in a dictionary.

In [2]:
credentials = {
    "key": "<your-cdse-s3-key>",
    "secret": "<your-cdse-s3-secret>",
}
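
Writing credentials directly into a notebook cell makes them easy to leak when the notebook is shared. A minimal sketch of one alternative is to read them from environment variables. The variable names `CDSE_S3_KEY` and `CDSE_S3_SECRET` and the helper `load_credentials` are assumptions chosen for illustration; they are not part of `xcube_stac`.

```python
import os


def load_credentials(env=None):
    """Build the credentials dictionary from environment variables.

    CDSE_S3_KEY and CDSE_S3_SECRET are hypothetical variable names.
    """
    env = os.environ if env is None else env
    names = ("CDSE_S3_KEY", "CDSE_S3_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {"key": env["CDSE_S3_KEY"], "secret": env["CDSE_S3_SECRET"]}


# Demonstration with a stand-in mapping instead of the real environment.
demo = load_credentials({"CDSE_S3_KEY": "my-key", "CDSE_S3_SECRET": "my-secret"})
print(sorted(demo))
```

In the notebook, `credentials = load_credentials()` could then take the place of the literal dictionary.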

Next, we get the store parameters needed to initialize a STAC [data store](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework). Note that the key and secret for S3 access are required.

In [3]:
%%time
store_params = get_data_store_params_schema("stac-cdse")
store_params

CPU times: user 784 ms, sys: 111 ms, total: 895 ms
Wall time: 896 ms


<xcube.util.jsonschema.JsonObjectSchema at 0x78499bbb7680>
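
The default representation of the schema object is not very informative. Assuming the schema can be converted to a plain JSON-schema dictionary (xcube's `JsonObjectSchema` offers `to_dict()` for this), a small helper can list the parameter names with their types and whether they are required. The helper `summarize_schema` and the example dictionary are hypothetical stand-ins for illustration, not the actual `stac-cdse` schema.

```python
def summarize_schema(schema):
    """Return (name, type, is_required) tuples for a JSON object schema dict."""
    required = set(schema.get("required", []))
    rows = []
    for name, prop in schema.get("properties", {}).items():
        rows.append((name, prop.get("type", "any"), name in required))
    return rows


# Made-up schema dictionary; the real one may contain different entries.
example_schema = {
    "type": "object",
    "properties": {
        "key": {"type": "string"},
        "secret": {"type": "string"},
        "stack_mode": {"type": "boolean"},
    },
    "required": ["key", "secret"],
}

for name, type_, is_required in summarize_schema(example_schema):
    suffix = " (required)" if is_required else ""
    print(f"{name}: {type_}{suffix}")
```

With xcube installed, `summarize_schema(store_params.to_dict())` should give the corresponding overview for the real store parameters. The same helper would apply to the schemas returned by `get_open_data_params_schema()` and `get_search_params_schema()`.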

Note that the user does not need to provide the URL for the [CDSE STAC API](https://documentation.dataspace.copernicus.eu/APIs/STAC.html). Only the key and secret for S3 access are required when initializing a `stac-cdse` [data store](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework). First, we will initialize a store supporting the stacking mode. Then, for completeness, we will initialize a store in single-tile mode.

In [4]:
%%time
store = new_data_store("stac-cdse", stack_mode=True, **credentials)

CPU times: user 38.7 ms, sys: 9.29 ms, total: 48 ms
Wall time: 284 ms


In stacking mode, the data IDs are the IDs of the collections in the catalog. They can be listed with the following code.

In [5]:
%%time
data_ids = store.list_data_ids()
data_ids

CPU times: user 10.2 ms, sys: 3.01 ms, total: 13.2 ms
Wall time: 88.8 ms


['COP-DEM',
 'S2GLC',
 'TERRAAQUA',
 'SENTINEL-3',
 'SENTINEL-5P',
 'SENTINEL-1-RTC',
 'SENTINEL-1',
 'SMOS',
 'LANDSAT-7',
 'CCM',
 'LANDSAT-5',
 'ENVISAT',
 'LANDSAT-8-ESA',
 'SENTINEL-6',
 'GLOBAL-MOSAICS',
 'SENTINEL-2']

Below, the parameters for the `open_data` method can be viewed. Note that `processing_level` and [`processing_baseline`](https://sentiwiki.copernicus.eu/web/s2-processing#S2Processing-ProcessingBaseline) can be selected.

In [6]:
%%time
open_params = store.get_open_data_params_schema()
open_params

CPU times: user 26 μs, sys: 3 μs, total: 29 μs
Wall time: 31.5 μs


<xcube.util.jsonschema.JsonObjectSchema at 0x784993423650>

So far, only data from the collection `SENTINEL-2` can be accessed. We therefore set `data_id` to `"SENTINEL-2"`. We set the bounding box to cover the greater Hamburg area and the time range to the first half of July 2020. Additionally, we set the processing level to `"L2A"` and the processing baseline to `"5.00"`.

We can apply a reprojection by setting `crs` to the desired output projection. The odc-stac library uses [`rasterio.warp.reproject`](https://rasterio.readthedocs.io/en/stable/topics/reproject.html) to handle the reprojection. All arguments of [`odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html) can be passed into the `open_data(...)` method, which forwards them to the `odc.stac.load` function.
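
The `resolution` argument of `odc.stac.load` is given in units of the output CRS, so with `crs="EPSG:4326"` it is expressed in degrees. The `open_data` call in this notebook uses `resolution=0.00018`. As a rough sanity check, the following sketch converts such a value to meters on a spherical Earth; the helper `degrees_to_meters` is a hypothetical name, and the spherical approximation is an assumption.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def degrees_to_meters(resolution_deg, latitude_deg):
    """Return the (north-south, east-west) pixel size in meters."""
    meters_per_degree = math.pi / 180.0 * EARTH_RADIUS_M
    north_south = resolution_deg * meters_per_degree
    # Meridians converge towards the poles, so east-west distances shrink.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# 53.5 is the central latitude of the bounding box used in this notebook.
ns, ew = degrees_to_meters(0.00018, 53.5)
print(f"north-south: {ns:.1f} m, east-west: {ew:.1f} m")
```

Under these assumptions, a pixel of 0.00018° corresponds to about 20 m north-south and about 12 m east-west near Hamburg, so pixels on a geographic grid are not square on the ground.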

In [7]:
%%time
bbox = [9.5, 53., 10.5, 54.]
time_range = ["2020-07-01", "2020-07-15"]

CPU times: user 6 μs, sys: 1 μs, total: 7 μs
Wall time: 11.4 μs


In [8]:
%%time
ds = store.open_data(
    data_id="SENTINEL-2",
    bbox=bbox,
    time_range=time_range,
    processing_level="L2A",
    processing_baseline="5.00",
    resolution=0.00018,
    crs="EPSG:4326",
)
ds

/home/konstantin/micromamba/envs/xcube-stac/lib/python3.12/site-packages/pystac_client/item_search.py:447: DoesNotConformTo: Server does not conform to SORT
Exception ignored in: <bound method IPythonKernel._clean_thread_parent_frames of <ipykernel.ipkernel.IPythonKernel object at 0x784a66d9d220>>
Traceback (most recent call last):
  File "/home/konstantin/micromamba/envs/xcube-stac/lib/python3.12/site-packages/ipykernel/ipkernel.py", line 775, in _clean_thread_parent_frames
    def _clean_thread_parent_frames(

KeyboardInterrupt: 


KeyboardInterrupt: 

Note that the search function of the CDSE STAC API is very slow. Further investigation and a comparison with other STAC APIs are needed.

Since the Sentinel-2 L2A tiles in this STAC catalog are stored as `.jp2` files, we can also open the data as an [xcube multi-resolution dataset](https://xcube.readthedocs.io/en/latest/mldatasets.html#xcube-multi-resolution-datasets), where each level is linked to an overview level of the data pyramid. The number of levels can be extracted as shown below.

In [None]:
%%time
mlds = store.open_data(
    data_id="SENTINEL-2",
    data_type="mldataset",
    bbox=bbox,
    time_range=time_range,
    processing_level="L2A",
    processing_baseline="5.00",
    crs="EPSG:4326",
)
mlds.num_levels

To get a specific level, we use the `get_dataset(idx_level)` method.

In [None]:
%%time
ds = mlds.get_dataset(2)
ds
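
To choose a level, it helps to know which pixel size it corresponds to. The sketch below assumes that each level halves the resolution of the previous one (a factor of 2 per level, the usual pyramid convention) and uses 10 m as the level-0 pixel size, which is the native resolution of the Sentinel-2 B04 band. The helpers `level_pixel_size` and `level_index` are hypothetical names.

```python
def level_pixel_size(base_pixel_size, level):
    """Pixel size at a pyramid level, assuming a factor of 2 per level."""
    return base_pixel_size * 2 ** level


def level_index(index_at_level_0, level):
    """Map a pixel index at level 0 to the matching index at a coarser level."""
    return index_at_level_0 // 2 ** level


for level in range(5):
    size = level_pixel_size(10.0, level)
    index = level_index(5000, level)
    print(f"level {level}: pixel size {size} m, index 5000 maps to {index}")
```

Under this assumption, a window of 5000 pixels at level 0 shrinks to 312 pixels at level 4.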

We can plot a tile of the B04 (red) band for a given timestamp as an example. Note that the bounding box encompasses all tiles within the dataset, so without mosaicking, many of the pixels will be NaN values.

In [None]:
%%time
session = rasterio.session.AWSSession(
    aws_unsigned=False,
    endpoint_url="eodata.dataspace.copernicus.eu",
    aws_access_key_id=credentials["key"],
    aws_secret_access_key=credentials["secret"],
)

In [None]:
%%time
with rasterio.Env(session, AWS_VIRTUAL_HOSTING=False):
    ds.B04.isel(time=-1).plot()

To apply mosaicking, we need to assign `groupby="solar_day"`, as shown in the [documentation of `odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html).

In [None]:
%%time
mlds = store.open_data(
    data_id="SENTINEL-2",
    data_type="mldataset",
    bbox=bbox,
    time_range=time_range,
    processing_level="L2A",
    processing_baseline="5.00",
    crs="EPSG:4326",
    groupby="solar_day",
)
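
Grouping by solar day merges acquisitions that fall on the same local day at the observed location, so that adjacent tiles from one overpass end up in a single time step. The sketch below illustrates the idea with a simple longitude-based time offset (one hour per 15° of longitude). It is an approximation for illustration and not the `odc-stac` implementation; the tile names, acquisition times, and longitudes are made-up examples.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def solar_day(utc_time, longitude_deg):
    """Approximate local solar date: shift UTC by one hour per 15 degrees."""
    return (utc_time + timedelta(hours=longitude_deg / 15.0)).date()


def group_by_solar_day(acquisitions):
    """Group (name, utc_time, longitude) tuples by approximate solar day."""
    groups = defaultdict(list)
    for name, utc_time, longitude_deg in acquisitions:
        groups[solar_day(utc_time, longitude_deg)].append(name)
    return dict(groups)


# Hypothetical acquisitions: two tiles from one overpass, one from a later day.
acquisitions = [
    ("tile_a", datetime(2020, 7, 2, 10, 36, 0, tzinfo=timezone.utc), 9.6),
    ("tile_b", datetime(2020, 7, 2, 10, 36, 14, tzinfo=timezone.utc), 10.4),
    ("tile_c", datetime(2020, 7, 5, 10, 46, 0, tzinfo=timezone.utc), 10.0),
]

groups = group_by_solar_day(acquisitions)
for day, names in sorted(groups.items()):
    print(day, names)
```

Tiles that share a solar day are mosaicked into one time step, which is why far fewer pixels remain NaN than when every tile gets its own timestamp.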

We again select the desired level and plot the same example as before. 

In [None]:
%%time
ds = mlds.get_dataset(4)
ds

In [None]:
%%time
with rasterio.Env(session, AWS_VIRTUAL_HOSTING=False):
    ds.B04.isel(time=1).plot()

----
## Data store in single-tile mode
For completeness, we initialize the data store in single-tile mode and open the data of one tile.

In [None]:
%%time
store = new_data_store("stac-cdse", stack_mode=False, **credentials)

In single-tile mode, the data IDs point to a [STAC item's JSON](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md) and are specified by the segment of the URL that follows the catalog's URL. The data IDs can be streamed; the following code shows the first 10 data IDs as an example.

In [None]:
%%time
data_ids = store.get_data_ids()
list(itertools.islice(data_ids, 10))
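
The relation between an item URL and its data ID can be sketched as stripping the catalog URL from the front of the item URL. The helper `data_id_from_url` and both URLs below are hypothetical and serve only to illustrate the description above; the IDs returned by the CDSE catalog may look different.

```python
def data_id_from_url(catalog_url, item_url):
    """Return the segment of item_url that follows catalog_url."""
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError("item URL does not belong to the given catalog")
    return item_url[len(prefix):]


# Made-up URLs for illustration only.
catalog_url = "https://stac.example.com/v1"
item_url = "https://stac.example.com/v1/collections/SENTINEL-2/items/example-item"

data_id = data_id_from_url(catalog_url, item_url)
print(data_id)
```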

In the next step, we can search for items using search parameters. The following code shows which search parameters are available.

In [None]:
%%time
search_params = store.get_search_params_schema()
search_params

Since only access to Sentinel-2 data is supported so far, we search for Sentinel-2 tiles.

In [None]:
%%time
descriptors = list(
    store.search_data(
        collections=["SENTINEL-2"],
        bbox=[9, 47, 10, 48],
        time_range=["2020-07-01", "2020-07-05"],
        processing_level="L2A",
        processing_baseline="2.14",
    )
)
[d.to_dict() for d in descriptors]

In the next step, we can open the data for each data ID. The following code shows which parameters are available for opening the data.

In [None]:
%%time
open_params = store.get_open_data_params_schema()
open_params

We select the bands B04 (red), B03 (green), and B02 (blue) and lazily load the corresponding data.

In [None]:
%%time
ds = store.open_data(
    descriptors[3].data_id,
    bands=["B04", "B03", "B02"]
)
ds

We plot the loaded data as an example below.

In [None]:
%%time
ds.B04[:5000, :5000].plot()

We can also open a `.jp2` file as an [xcube multi-resolution dataset](https://xcube.readthedocs.io/en/latest/mldatasets.html#xcube-multi-resolution-datasets), where we can select the resolution level, as shown below.

In [None]:
%%time
mlds = store.open_data(
    descriptors[3].data_id,
    bands=["B04", "B03", "B02"],
    data_type="mldataset"
)
mlds.num_levels

In [None]:
ds = mlds.get_dataset(4)
ds

In [None]:
%%time
ds.B04[:312, :312].plot(vmin=0.0, vmax=0.2)