## Tutorial notebook on how to use the CLMS Data Store and its `preload_data` method for `EEA pre-packaged` datasets from the `CLMS API`

This notebook shows how to preload pre-packaged EEA datasets from the CLMS API, which are then converted to `.zarr` for efficient processing.

### Setup
To run this notebook, you need to install [`xcube_clms`](https://github.com/xcube-dev/xcube-clms). You can install it in either of the following ways:

1. via `conda-forge`
```bash
conda env create -f environment.yml
conda activate xcube-clms
conda install xcube-clms
```

2. via Development mode

```bash
git clone https://github.com/xcube-dev/xcube-clms.git
cd xcube-clms
conda env create -f environment.yml
conda activate xcube-clms
pip install .
```


Note that [`xcube_clms`](https://github.com/xcube-dev/xcube-clms) is a plugin for [`xcube`](https://xcube.readthedocs.io/en/latest/), which is included in `environment.yml`.

You also need credentials for the Land Monitoring service. Please follow the steps outlined [here](https://eea.github.io/clms-api-docs/authentication.html) to download your credentials and place them in the same directory as this notebook.

In [1]:
%%time
import json

from xcube.core.store import new_data_store

CPU times: user 4.01 s, sys: 262 ms, total: 4.27 s
Wall time: 1.46 s


To obtain `credentials.json`, please follow the steps outlined [here](https://eea.github.io/clms-api-docs/authentication.html).

In [2]:
%%time
json_file_path = "credentials.json"
with open(json_file_path, "r") as j:
    # Parse the JSON credentials directly from the open file handle
    credentials = json.load(j)

CPU times: user 1.32 ms, sys: 27 μs, total: 1.35 ms
Wall time: 10 ms
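
If the credentials file is missing, the cell above fails with a bare `FileNotFoundError`. A minimal sketch of a more explicit loader follows. The helper name `load_credentials` is hypothetical, and the sketch assumes that `credentials.json` holds a single JSON object.

```python
import json
from pathlib import Path


def load_credentials(path="credentials.json"):
    """Load the credentials file, failing with a clear message if it is missing.

    Hypothetical helper for illustration; not part of xcube or xcube_clms.
    """
    credentials_path = Path(path)
    if not credentials_path.is_file():
        raise FileNotFoundError(
            f"{credentials_path} not found. Download your credentials and "
            "place the file in the same directory as this notebook."
        )
    with credentials_path.open("r", encoding="utf-8") as f:
        credentials = json.load(f)
    # Assumption: the credentials file contains one JSON object.
    if not isinstance(credentials, dict):
        raise ValueError(f"{credentials_path} should contain a JSON object.")
    return credentials
```

The returned dictionary can be passed as `credentials=` when creating the data store.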


When a new CLMS data store is created, it sends requests to the CLMS API to fetch the catalog information, which takes around 15-20 seconds. If no path is provided for the cache location, the store creates a `clms_cache/` directory in the current directory to hold all the downloaded data and initializes a local file store there.

In [3]:
%%time
clms_data_store = new_data_store("clms", credentials=credentials)

CPU times: user 53.7 ms, sys: 11.9 ms, total: 65.6 ms
Wall time: 65.2 ms


The cache store within the CLMS data store is itself a data store. It can be any user-defined data store and is used by the preload handle. It defaults to the `file` store. Use `cache_id` and `cache_params` to specify the data store you would like to use for caching the preloaded data.
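
As a rough sketch of how `cache_id` and `cache_params` might be assembled, the snippet below builds the keyword arguments for `new_data_store("clms", ...)` without calling the store. The helper `build_clms_store_params` is hypothetical, the empty credentials dictionary is a placeholder, and the use of `root` as the parameter of the `file` store is an assumption.

```python
def build_clms_store_params(credentials, cache_root=None):
    """Assemble keyword arguments for new_data_store("clms", ...).

    Hypothetical helper for illustration only.
    """
    params = {"credentials": credentials}
    if cache_root is not None:
        # Assumption: the "file" store takes its directory via "root".
        params["cache_id"] = "file"
        params["cache_params"] = {"root": cache_root}
    return params


# Placeholder credentials; in the notebook these come from credentials.json.
store_params = build_clms_store_params({}, cache_root="my_clms_cache")
print(store_params)

# With xcube and xcube_clms installed, the store would then be created with:
# clms_data_store = new_data_store("clms", **store_params)
```

Leaving `cache_root` unset keeps the default behaviour described above.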

In [4]:
%%time
clms_data_store.list_data_ids()[:5]

CPU times: user 732 ms, sys: 93.7 ms, total: 826 ms
Wall time: 1min 30s


['clc-backbone-2021|CLMS_CLCplus_RASTER_2021',
 'clc-backbone-2018|CLMS_CLCplus_RASTER_2018_010m_eu_03035_V1_1',
 'forest-type-2015|FTY_2015_100m_eu_03035_d02_Full',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_Full',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E00N20']

In [5]:
%%time
clms_data_store.list_data_ids(include_attrs=True)[:5]

CPU times: user 248 ms, sys: 5.07 ms, total: 253 ms
Wall time: 252 ms


[('clc-backbone-2021|CLMS_CLCplus_RASTER_2021',
  {'@id': 'b813d203-d09b-4663-95f7-65dc6d53789e',
   'area': 'Europe',
   'file': 'CLMS_CLCplus_RASTER_2021',
   'format': 'Geotiff',
   'path': 'H:\\Corine_Land_Cover_Backbone\\Corine_Land_Cover_Backbone_CLCBB_2021\\CLC_BB_2021\\Data\\data-details\\raster\\CLMS_CLCplus_RASTER_2021.zip',
   'resolution': '10 m',
   'size': '7 GB',
   'source': 'EEA',
   'title': '',
   'type': 'Raster',
   'version': 'V1_1',
   'year': ''}),
 ('clc-backbone-2018|CLMS_CLCplus_RASTER_2018_010m_eu_03035_V1_1',
  {'@id': 'c3290e93-1463-4bed-93c9-6315b0059048',
   'area': 'Europe',
   'file': 'CLMS_CLCplus_RASTER_2018_010m_eu_03035_V1_1',
   'format': 'Geotiff',
   'path': 'H:\\Corine_Land_Cover_Backbone\\Corine_Land_Cover_Backbone_CLCBB_2018\\CLC_BB_2018\\Data\\data-details\\raster\\CLMS_CLCplus_RASTER_2018_010m_eu_03035_V1_1.zip',
   'resolution': '10 m',
   'size': '9 GB',
   'source': 'EEA',
   'title': '',
   'type': 'Raster',
   'version': 'V1_1',
   'ye

In [6]:
%%time
clms_data_store.cache_store.root

CPU times: user 6 μs, sys: 1 μs, total: 7 μs
Wall time: 8.34 μs


'/home/yogesh/Projects/BC/xcube-clms/examples/notebooks/clms_cache'

In [7]:
%%time
clms_data_store.get_data_store_params_schema()

CPU times: user 41 μs, sys: 4 μs, total: 45 μs
Wall time: 48.6 μs


<xcube.util.jsonschema.JsonObjectSchema at 0x7cdc67de1c50>

The following command shows what the data IDs look like for the CLMS store. Because some products in the API contain several datasets, we compute the `data_id` for those by combining the `product_id` and `item_id` with the `|` separator. As a user, you don't need to worry about how the `data_id` is created; you only need to pass the complete `data_id` to any of the methods of this store.

In [8]:
%%time
clms_data_store.list_data_ids()[:20]

CPU times: user 290 ms, sys: 2.8 ms, total: 293 ms
Wall time: 292 ms


['clc-backbone-2021|CLMS_CLCplus_RASTER_2021',
 'clc-backbone-2018|CLMS_CLCplus_RASTER_2018_010m_eu_03035_V1_1',
 'forest-type-2015|FTY_2015_100m_eu_03035_d02_Full',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_Full',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E00N20',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E10N00',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E10N10',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E10N20',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E20N10',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E20N20',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E20N30',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E20N40',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E20N50',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E30N10',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E30N20',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E30N30',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E30N40',
 'forest-type-2015|FTY_2015_020m_eu_03035_d04_E30N50',
 'forest-ty
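
For exploring the catalog, it can help to group the listed IDs by product. The sketch below splits each `data_id` at the `|` separator; the helper names are hypothetical, and the sample IDs are copied from the listing above. It is meant for inspection only, since the store methods still expect the complete `data_id`.

```python
from collections import defaultdict


def split_data_id(data_id):
    """Split a data ID into (product_id, item_id); item_id is None without '|'."""
    product_id, separator, item_id = data_id.partition("|")
    if not separator:
        return data_id, None
    return product_id, item_id


def group_by_product(data_ids):
    """Map each product_id to the list of its item_ids, preserving order."""
    groups = defaultdict(list)
    for data_id in data_ids:
        product_id, item_id = split_data_id(data_id)
        groups[product_id].append(item_id)
    return dict(groups)


sample_ids = [
    "clc-backbone-2021|CLMS_CLCplus_RASTER_2021",
    "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full",
    "forest-type-2015|FTY_2015_020m_eu_03035_d04_Full",
]
for product_id, item_ids in group_by_product(sample_ids).items():
    print(product_id, len(item_ids))
```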

In [9]:
%%time
clms_data_store.get_data_opener_ids()

CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 8.11 μs


('dataset:zarr:file', 'dataset:netcdf:https')

In [10]:
%%time
clms_data_store.get_data_types()

CPU times: user 7 μs, sys: 1e+03 ns, total: 8 μs
Wall time: 10.5 μs


('dataset',)

In [11]:
%%time
clms_data_store.get_open_data_params_schema()

CPU times: user 7.13 ms, sys: 917 μs, total: 8.05 ms
Wall time: 7.16 ms


<xcube.util.jsonschema.JsonObjectSchema at 0x7cdc66e97150>

In [12]:
%%time
clms_data_store.get_preload_data_params_schema()

CPU times: user 0 ns, sys: 58 μs, total: 58 μs
Wall time: 65.1 μs


<xcube.util.jsonschema.JsonObjectSchema at 0x7cdc66e96c10>
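
The schema objects above print only as an opaque object reference. A minimal sketch of summarizing such a schema follows, assuming it is available as a plain JSON Schema dictionary (for xcube schema objects this should be obtainable with `to_dict()`, which is an assumption here). The `example_schema` dictionary is hand-written for illustration from the parameters and defaults used in this notebook's `preload_data` call; the real schema may differ.

```python
def summarize_schema(schema_dict):
    """Return (name, type, default) for each property of a JSON Schema dict."""
    rows = []
    for name, prop in schema_dict.get("properties", {}).items():
        rows.append((name, prop.get("type", "any"), prop.get("default")))
    return rows


# Hand-written illustration, NOT the actual schema returned by the store.
example_schema = {
    "type": "object",
    "properties": {
        "blocking": {"type": "boolean", "default": True},
        "cleanup": {"type": "boolean", "default": True},
        "silent": {"type": "boolean", "default": False},
        "tile_size": {"default": 2000},
    },
}

for name, type_name, default in summarize_schema(example_schema):
    print(f"{name:<10} {type_name:<8} default={default!r}")
```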

In [5]:
%%time
dataset_to_preload = (
    # "tree-cover-density-2015|TCD_2015_100m_eu_03035_d04_Full",
    # "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full",
    "clc-backbone-2021|CLMS_CLCplus_RASTER_2021",
)

CPU times: user 2 μs, sys: 0 ns, total: 2 μs
Wall time: 5.48 μs


The `preload_data` method returns a new cache data store. The table provided by its `.preload_handle` shows the status of each download request along with its progress, messages, and exceptions, if any. Preloading can run in either blocking or non-blocking mode. The user can silence the progress output with the `silent` flag and can choose whether to clean up the downloads with the `cleanup` flag.

In [6]:
%%time
cache_data_store = clms_data_store.preload_data(
    *dataset_to_preload,
    blocking=True,  # Defaults to True
    cleanup=False,  # Defaults to True
    silent=False,  # Defaults to False
    tile_size=2000,  # Defaults to 2000. You can pass a single integer, or a tuple of (int, int)
)

Data ID,Status,Progress,Message,Exception
clc-backbone-2021|CLMS_CLCplus_RASTER_2021,FAILED,80%,Task ID 47889426261: Extraction complete. Processing now...,This naming format is not supported. Currently only filenames with Eastings and Northings are supported.


CPU times: user 482 ms, sys: 13.2 ms, total: 495 ms
Wall time: 5.43 s
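
The `tile_size` argument accepts either a single integer or a tuple of `(int, int)`. The sketch below illustrates the two forms by normalizing them to a pair. It is not the store's own implementation: the helper `normalize_tile_size` is hypothetical, and treating a single integer as the same size in both dimensions is an assumption.

```python
def normalize_tile_size(tile_size=2000):
    """Return tile_size as a pair of positive integers.

    Hypothetical helper; assumes a single int applies to both dimensions.
    """
    if isinstance(tile_size, bool):
        raise TypeError("tile_size must be an int or a pair of ints")
    if isinstance(tile_size, int):
        sizes = (tile_size, tile_size)
    else:
        sizes = tuple(tile_size)
    valid = len(sizes) == 2 and all(
        isinstance(s, int) and not isinstance(s, bool) and s > 0 for s in sizes
    )
    if not valid:
        raise ValueError("tile_size must be a positive int or a pair of them")
    return sizes


print(normalize_tile_size(2000))
print(normalize_tile_size((1000, 500)))
```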


In [19]:
%%time
handle = cache_data_store.preload_handle
handle

CPU times: user 21 μs, sys: 0 ns, total: 21 μs
Wall time: 26.2 μs


Data ID,Status,Progress,Message,Exception
clc-backbone-2021|CLMS_CLCplus_RASTER_2021,FAILED,40%,Task ID 47889426261: Download link created. Downloading and extracting now...,Response ended prematurely


Once the preload has completed, the user can open the data with `open_data` as usual; underneath, this uses the file data store at the cache location.

In [22]:
%%time
cache_data_store.list_data_ids()

CPU times: user 1.58 ms, sys: 549 μs, total: 2.13 ms
Wall time: 1.25 ms


['forest-type-2015|FTY_2015_100m_eu_03035_d02_Full.zarr',
 'tree-cover-density-2015|TCD_2015_100m_eu_03035_d04_Full.zarr']
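
The listing above suggests that cached datasets are stored under the original data ID with a `.zarr` suffix. Assuming that convention holds, the following sketch checks which requested IDs are ready to open and which are still missing, for example after a failed preload. The helper names are hypothetical; the IDs are copied from this notebook.

```python
def cached_name(data_id):
    """Assumed naming convention: the cache entry is the data ID plus '.zarr'."""
    return f"{data_id}.zarr"


def split_by_cache_state(requested, cached):
    """Return (ready, missing) lists of requested IDs, preserving order."""
    cached_set = set(cached)
    ready = [d for d in requested if cached_name(d) in cached_set]
    missing = [d for d in requested if cached_name(d) not in cached_set]
    return ready, missing


requested = [
    "tree-cover-density-2015|TCD_2015_100m_eu_03035_d04_Full",
    "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full",
    "clc-backbone-2021|CLMS_CLCplus_RASTER_2021",
]
# In the notebook this list would come from cache_data_store.list_data_ids().
cached = [
    "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full.zarr",
    "tree-cover-density-2015|TCD_2015_100m_eu_03035_d04_Full.zarr",
]

ready, missing = split_by_cache_state(requested, cached)
print("ready:", ready)
print("missing:", missing)
```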

In [None]:
cache_data_store.get_open_data_params_schema()

In [None]:
cache_data_store.get_open_data_params_schema(
    "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full.zarr"
)

In [None]:
%%time
fty = cache_data_store.open_data(
    "forest-type-2015|FTY_2015_100m_eu_03035_d02_Full.zarr"
)
fty

In [None]:
%%time
tcd = cache_data_store.open_data(
    "tree-cover-density-2015|TCD_2015_100m_eu_03035_d04_Full.zarr"
)
tcd

In [None]:
%%time
downsampled_cube = fty.isel(
    x=slice(None, None, 100),
    y=slice(None, None, 100),
)
downsampled_cube.FTY_2015_100m_eu_03035_d02_Full.plot(vmin=0, vmax=100)

In [None]:
%%time
downsampled_cube = tcd.isel(
    x=slice(None, None, 100),
    y=slice(None, None, 100),
)
downsampled_cube.TCD_2015_100m_eu_03035_d04_Full.plot(vmin=0, vmax=100)
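
The `isel` calls above use `slice(None, None, 100)`, which keeps every 100th pixel along `x` and `y` so that the plot stays cheap to render. The pure-Python sketch below shows how such a stride shrinks an axis; the dimension sizes are hypothetical and not taken from the actual datasets.

```python
def strided_length(size, step):
    """Number of elements left after applying slice(None, None, step)."""
    return len(range(size)[slice(None, None, step)])


# Hypothetical raster dimensions, for illustration only.
width, height = 65_000, 46_000
new_width = strided_length(width, 100)
new_height = strided_length(height, 100)

print(new_width, new_height)
print("reduction factor:", (width * height) // (new_width * new_height))
```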

Execute the following command to terminate the preload job and trigger the cleanup process to clean the `downloads` directory. This step is only necessary if the preload job was originally run with `cleanup=False`, as downloads will not be removed automatically in that case. If `cleanup=True` was used (which is the default), the cleanup is performed automatically upon completion of the preload job.

In [None]:
handle.close()
handle