## ESA CCI Open Data Portal access using xcube

The aim of this notebook is to show how to query the CCI Store for all the datasets it provides, and how to get the information needed to create a cube configuration for a dataset.

To run this notebook, make sure the ESA CCI ODP / xcube integration is set up correctly; see [Ex0-DCFS-Setup](./Ex0-DCFS-Setup.ipynb).

In [1]:
# xcube_cci imports
from xcube_cci.cciodp import CciOdp
from xcube_cci.config import CubeConfig
from xcube_cci.cube import open_cube

In [2]:
%matplotlib inline
import nest_asyncio
nest_asyncio.apply()

---
If you access the CCI Open Data Portal directly through a `CciOdp` instance, you can get a list of all available datasets.

In [3]:
cci_odp = CciOdp()
dataset_names = cci_odp.dataset_names
num_sets = len(dataset_names)
dataset_names

['esacci.OZONE.month.L3.NP.multi-sensor.multi-platform.MERGED.fv0002.r1',
 'esacci.CLOUD.month.L3C.CLD_PRODUCTS.MODIS.Aqua.MODIS_AQUA.2-0.r1',
 'esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.2-1.anomaly',
 'esacci.CLOUD.month.L3C.CLD_PRODUCTS.multi-sensor.multi-platform.AVHRR-AM.2-0.r1',
 'esacci.OC.day.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
 'esacci.OC.5-days.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
 'esacci.OC.8-days.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
 'esacci.OC.month.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.NOAA-15.AVHRR15_G.2-1.r1',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.Metop-A.AVHRRMTA_G.2-1.r1',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.NOAA-19.AVHRR19_G.2-1.r1',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.NOAA-17.AVHRR17_G.2-1.r1',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.NOAA-18.AVHRR18_G.2-1.r1',
 'esacci.SST.day.L3C.SSTskin.AVHRR-3.NOAA-16.AV
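The dataset names above appear to follow a dot-separated pattern: `esacci`, then the ECV (e.g. `OZONE`, `CLOUD`, `SST`, `OC`), then the time frequency (e.g. `month`, `day`, `5-days`). Assuming that pattern holds for every name, the list can be narrowed down with plain Python. This is a minimal sketch; `filter_datasets` is a hypothetical helper, not part of xcube_cci.

```python
def filter_datasets(names, ecv=None, frequency=None):
    """Return the dataset names matching the given ECV and/or time frequency.

    Assumes names look like 'esacci.<ECV>.<frequency>.<level>....', as in the
    list returned by CciOdp().dataset_names above.
    """
    selected = []
    for name in names:
        parts = name.split('.')
        if len(parts) < 3 or parts[0] != 'esacci':
            continue  # skip names that do not follow the assumed pattern
        if ecv is not None and parts[1] != ecv:
            continue
        if frequency is not None and parts[2] != frequency:
            continue
        selected.append(name)
    return selected


# A few names copied from the output above; with a live connection you would
# pass cci_odp.dataset_names instead.
sample_names = [
    'esacci.OZONE.month.L3.NP.multi-sensor.multi-platform.MERGED.fv0002.r1',
    'esacci.CLOUD.month.L3C.CLD_PRODUCTS.MODIS.Aqua.MODIS_AQUA.2-0.r1',
    'esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.2-1.anomaly',
    'esacci.OC.day.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
    'esacci.OC.month.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic',
]

print(filter_datasets(sample_names, ecv='OC'))
print(filter_datasets(sample_names, frequency='month'))
```

Filtering on the name avoids one request per dataset, at the cost of relying on a naming convention that is only inferred from the listing.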

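You can also get the list of variables of any dataset. The following cell prints the first and last variable of every dataset; executing it takes some time, since it has to look up each dataset in turn.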

In [4]:
for i, dataset_name in enumerate(dataset_names):
    var_names = cci_odp.var_names(dataset_name)
    print(f'First variable of dataset #{i + 1} of {num_sets}: {dataset_name} is "{var_names[0]}", last one is "{var_names[-1]}"')

First variable of dataset #1 of 64: esacci.OZONE.month.L3.NP.multi-sensor.multi-platform.MERGED.fv0002.r1 is "surface_pressure", last one is "O3e_ndens"
First variable of dataset #2 of 64: esacci.CLOUD.month.L3C.CLD_PRODUCTS.MODIS.Aqua.MODIS_AQUA.2-0.r1 is "nobs", last one is "hist1d_cla_vis008"
First variable of dataset #3 of 64: esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.2-1.anomaly is "analysed_sst", last one is "analysed_sst_uncertainty"
First variable of dataset #4 of 64: esacci.CLOUD.month.L3C.CLD_PRODUCTS.multi-sensor.multi-platform.AVHRR-AM.2-0.r1 is "nobs", last one is "hist1d_cla_vis008"
First variable of dataset #5 of 64: esacci.OC.day.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic is "MERIS_nobs_sum", last one is "total_nobs_sum"
First variable of dataset #6 of 64: esacci.OC.5-days.L3S.CHLOR_A.multi-sensor.multi-platform.MERGED.4-0.geographic is "MERIS_nobs_sum", last one is "total_nobs_sum"
First variable of dataset #7 of 64: esacci.OC.8-da

You can also request more detailed information about a dataset, including everything you need to configure a cube.

In [5]:
data_info = cci_odp.get_dataset_info(dataset_names[0])
data_info

{'lat_res': 1.0,
 'lon_res': 1.0,
 'bbox': (-180.0, -90.0, 180.0, 90.0),
 'temporal_coverage_start': '2000-02-01T00:00:00',
 'temporal_coverage_end': '2014-12-31T23:59:59',
 'var_names': ['surface_pressure',
  'O3_du',
  'O3e_du',
  'O3_du_tot',
  'O3e_du_tot',
  'O3_vmr',
  'O3e_vmr',
  'O3_ndens',
  'O3e_ndens']}

`lat_res` and `lon_res` are the spatial resolution in latitude and longitude (in degrees), respectively; they are fixed by the CCI data store. `bbox` gives the spatial extent of the dataset (for most datasets it is global, i.e. `(-180.0, -90.0, 180.0, 90.0)`), `temporal_coverage_start` and `temporal_coverage_end` give the period for which data is available, and `var_names` lists the variables your cube can contain.
When configuring a cube, make sure that your `geometry` lies within the dataset's bounding box and that your `time_range` lies within its temporal coverage.
You currently cannot choose the spatial or temporal resolution of the cube; the dataset's native resolution is used.
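The checks recommended above can be done in plain Python before calling `open_cube`. A minimal sketch, assuming `geometry` uses the same (lon_min, lat_min, lon_max, lat_max) order as `bbox` and that the time strings are ISO 8601 as in the output above; `check_cube_request` is a hypothetical helper, not part of xcube_cci.

```python
from datetime import datetime


def check_cube_request(data_info, geometry, time_range):
    """Return a list of problems with a requested cube (an empty list means it looks OK).

    Assumes geometry is (lon_min, lat_min, lon_max, lat_max) in degrees, matching
    the order of data_info['bbox'] as returned by CciOdp.get_dataset_info.
    """
    problems = []
    x1, y1, x2, y2 = geometry
    bx1, by1, bx2, by2 = data_info['bbox']
    if not (bx1 <= x1 < x2 <= bx2 and by1 <= y1 < y2 <= by2):
        problems.append(f'geometry {geometry} is not inside bbox {data_info["bbox"]}')

    # fromisoformat accepts both '2007-04-12' and '2000-02-01T00:00:00'
    cov_start = datetime.fromisoformat(data_info['temporal_coverage_start'])
    cov_end = datetime.fromisoformat(data_info['temporal_coverage_end'])
    start = datetime.fromisoformat(time_range[0])
    end = datetime.fromisoformat(time_range[1])
    if not (cov_start <= start <= end <= cov_end):
        problems.append(f'time_range {time_range} is not inside '
                        f'{data_info["temporal_coverage_start"]} .. '
                        f'{data_info["temporal_coverage_end"]}')
    return problems


# Values copied from the get_dataset_info output above (var_names omitted).
ozone_info = {
    'lat_res': 1.0,
    'lon_res': 1.0,
    'bbox': (-180.0, -90.0, 180.0, 90.0),
    'temporal_coverage_start': '2000-02-01T00:00:00',
    'temporal_coverage_end': '2014-12-31T23:59:59',
}

print(check_cube_request(ozone_info, (5, 35, 25, 55), ['2007-04-12', '2007-07-20']))  # []
print(check_cube_request(ozone_info, (5, 35, 25, 55), ['1999-01-01', '2007-07-20']))
```

The second call reports one problem, because 1999 lies before the start of the ozone dataset's coverage.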

In [6]:
cube_config = CubeConfig(dataset_name=dataset_names[0],
                         variable_names=['surface_pressure'],
                         tile_size=(512, 512),
                         geometry=(5, 35, 25, 55),  # bounding box: lon_min, lat_min, lon_max, lat_max
                         time_range=[
                             '2007-04-12',
                             '2007-07-20'
                         ])
cube = open_cube(cube_config)
cube

| | Array | Chunk |
|---|---|---|
| Bytes | 64 B | 64 B |
| Shape | (4, 2) | (4, 2) |
| Count | 2 Tasks | 1 Chunks |
| Type | datetime64[ns] | numpy.ndarray |

| | Array | Chunk |
|---|---|---|
| Bytes | 6.40 kB | 1.60 kB |
| Shape | (4, 20, 20) | (1, 20, 20) |
| Count | 5 Tasks | 4 Chunks |
| Type | float32 | numpy.ndarray |
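The (4, 20, 20) shape of the data variable above is consistent with the request: the geometry spans 20° of longitude (5 to 25) and 20° of latitude (35 to 55), which at the dataset's 1.0° resolution gives 20 × 20 pixels, and the `time_range` 2007-04-12 to 2007-07-20 touches four calendar months (April to July) of this monthly dataset. A small sketch of that arithmetic, assuming one time step per calendar month overlapping the range; `expected_shape` is a hypothetical helper, not part of xcube_cci.

```python
from datetime import date


def expected_shape(geometry, lon_res, lat_res, start, end):
    """Estimate the (time, lat, lon) shape of a cube from a monthly dataset.

    Assumes geometry is (lon_min, lat_min, lon_max, lat_max) in degrees and that
    there is one time step per calendar month overlapping [start, end].
    """
    x1, y1, x2, y2 = geometry
    width = round((x2 - x1) / lon_res)
    height = round((y2 - y1) / lat_res)
    months = (end.year - start.year) * 12 + (end.month - start.month) + 1
    return months, height, width


print(expected_shape((5, 35, 25, 55), 1.0, 1.0, date(2007, 4, 12), date(2007, 7, 20)))
# (4, 20, 20)
```

A quick estimate like this is handy for judging cube size before requesting data at finer resolutions.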


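The loop below chooses its time window with simple date arithmetic: it takes the year halfway between the start and end of the temporal coverage, requests May 1 to July 15 of that year, and plots the time step nearest to June 7. A minimal sketch of that logic as a standalone function, which makes it easy to check without opening any cubes; `pick_time_window` is a hypothetical helper, not part of xcube_cci.

```python
from datetime import datetime


def pick_time_window(coverage_start, coverage_end):
    """Return (start, center, end) datetimes in the middle year of the coverage.

    Mirrors the date logic of the loop below; the coverage strings are assumed
    to use the '%Y-%m-%dT%H:%M:%S' format returned by get_dataset_info.
    """
    starting_time = datetime.strptime(coverage_start, "%Y-%m-%dT%H:%M:%S")
    ending_time = datetime.strptime(coverage_end, "%Y-%m-%dT%H:%M:%S")
    year = starting_time.year + (ending_time.year - starting_time.year) // 2
    return datetime(year, 5, 1), datetime(year, 6, 7), datetime(year, 7, 15)


start, center, end = pick_time_window('2000-02-01T00:00:00', '2014-12-31T23:59:59')
print(start.date(), center.date(), end.date())
```

With the coverage of the first dataset (2000-02-01 to 2014-12-31) this yields 2007-05-01 to 2007-07-15. Note that the window is not guaranteed to lie inside the coverage: for a hypothetical dataset covering 2002-08-01 to 2003-03-31, the middle year is 2002, and May to July 2002 precedes the start of the data.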
The following cell creates a small cube (same area as above, short time window) for every dataset in the CCI Store and plots the first variable of each. Note that executing it takes a long time.

In [None]:
from datetime import datetime

for i, dataset_name in enumerate(dataset_names):
    data_info = cci_odp.get_dataset_info(dataset_name)
    print(
        f'First variable of dataset #{i + 1} of {num_sets}: {dataset_name} is {data_info["var_names"][0]}, '
        f'last one is {data_info["var_names"][-1]}, lat_res is {data_info["lat_res"]}, '
        f'lon_res is {data_info["lon_res"]}, bbox is {data_info["bbox"]}, '
        f'starting at {data_info["temporal_coverage_start"]}, ending at {data_info["temporal_coverage_end"]}')
    # Pick a short window (May 1 to July 15) in the middle year of the temporal coverage
    starting_time = datetime.strptime(data_info["temporal_coverage_start"], "%Y-%m-%dT%H:%M:%S")
    ending_time = datetime.strptime(data_info["temporal_coverage_end"], "%Y-%m-%dT%H:%M:%S")
    year = starting_time.year + (ending_time.year - starting_time.year) // 2
    start_time = datetime(year, 5, 1)
    center_time = datetime(year, 6, 7)
    end_time = datetime(year, 7, 15)
    # Build a small cube over the same area for the dataset's first variable
    var_name = data_info["var_names"][0]
    cube_config = CubeConfig(dataset_name=dataset_name,
                             variable_names=[var_name],
                             tile_size=(512, 512),
                             geometry=(5, 35, 25, 55),
                             time_range=[
                                 start_time.strftime("%Y-%m-%d"),
                                 end_time.strftime("%Y-%m-%d")
                             ])
    cube = open_cube(cube_config)
    # Plot the time step closest to the center of the window
    snapshot = cube[var_name].sel(time=center_time.strftime("%Y-%m-%d %H:%M:%S"), method='nearest')
    snapshot.plot.imshow(cmap='Greys_r', figsize=(16, 10))
    print('----------------------------------')

First variable of dataset #1 of 64: esacci.OZONE.month.L3.NP.multi-sensor.multi-platform.MERGED.fv0002.r1 is surface_pressure, last one is O3e_ndens, lat_res is 1.0, lon_res is 1.0, bbox is (-180.0, -90.0, 180.0, 90.0), starting at 2000-02-01T00:00:00, ending at 2014-12-31T23:59:59
----------------------------------
First variable of dataset #2 of 64: esacci.CLOUD.month.L3C.CLD_PRODUCTS.MODIS.Aqua.MODIS_AQUA.2-0.r1 is nobs, last one is hist1d_cla_vis008, lat_res is 0.5, lon_res is 0.5, bbox is (-180.0, -90.0, 180.0, 90.0), starting at 2000-02-01T00:00:00, ending at 2014-12-31T23:59:59
----------------------------------
First variable of dataset #3 of 64: esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.2-1.anomaly is analysed_sst, last one is analysed_sst_uncertainty, lat_res is 0.05, lon_res is 0.05, bbox is (-180.0, -90.0, 180.0, 90.0), starting at 2000-02-01T00:00:00, ending at 2014-12-31T23:59:59
----------------------------------
First variable of dataset #4 of 64: esa