# Dataset exploration

This notebook loads one file from the analysis dataset. The dataset is available at https://esg1.umr-cnrm.fr/thredds/catalog/esgcet/27/CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.AERhr.tas.gr.v20181206.html#CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.AERhr.tas.gr.v20181206.

In this first case, we access only one of the NetCDF files that make up the dataset. The file is read over the DAP protocol through the netCDF4-python library, which is selected with the `engine` parameter of `open_dataset`.

In [17]:
import xarray
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, ProgressBar

In [18]:
with ProgressBar():
    name = 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_201001010030-201412312330.nc'
    ds = xarray.open_dataset(name, engine='netcdf4', decode_cf=True)

In [16]:
ds

<xarray.Dataset>
Dimensions:      (axis_nbounds: 2, lat: 128, lon: 256, time: 43824)
Coordinates:
  * lat          (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * lon          (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
    height       float64 ...
  * time         (time) datetime64[ns] 2010-01-01T00:30:00 ... 2014-12-31T23:30:00
Dimensions without coordinates: axis_nbounds
Data variables:
    time_bounds  (time, axis_nbounds) datetime64[ns] ...
    tas          (time, lat, lon) float32 ...
Attributes:
    Conventions:                     CF-1.7 CMIP-6.2
    creation_date:                   2018-10-27T01:03:09Z
    description:                     CMIP6 historical
    title:                           CNRM-ESM2-1 model output prepared for CM...
    activity_id:                     CMIP
    contact:                         contact.cmip@meteo.fr
    data_specs_version:              01.00.21
    dr2xml_version:                  1.13
    experiment_id
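The dimensions in the summary above give a rough idea of how much data the `tas` variable of this single file holds. The following is a back-of-the-envelope sketch, assuming 4 bytes per `float32` value and ignoring any compression or protocol overhead; the variable names are illustrative only.

```python
from datetime import datetime
from math import prod

# Dimension sizes copied from the dataset summary above.
dims = {"time": 43824, "lat": 128, "lon": 256}
FLOAT32_BYTES = 4  # tas is stored as float32

# Total number of values and uncompressed size of tas.
n_values = prod(dims.values())
tas_bytes = n_values * FLOAT32_BYTES
tas_gib = tas_bytes / 2**30

# Sanity check: hourly steps between 2010-01-01 and 2015-01-01.
expected_steps = int(
    (datetime(2015, 1, 1) - datetime(2010, 1, 1)).total_seconds() // 3600
)

print(f"values in tas: {n_values:,}")
print(f"uncompressed size: {tas_gib:.2f} GiB")
print(f"hourly steps in 2010-2014: {expected_steps}")
```

The number of hourly steps matches the length of the `time` dimension, which is consistent with the file covering five full years at hourly resolution.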

Xarray lets us open all the files that make up the dataset and view them as a single dataset, using the `open_mfdataset` function.

In [4]:
base = 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_'
ranges = ["185001010030-185412312330","185501010030-185912312330","186001010030-186412312330","186501010030-186912312330","187001010030-187412312330","187501010030-187912312330","188001010030-188412312330","188501010030-188912312330","189001010030-189412312330","189501010030-189912312330","190001010030-190412312330","190501010030-190912312330","191001010030-191412312330","191501010030-191912312330","192001010030-192412312330","192501010030-192912312330","193001010030-193412312330","193501010030-193912312330","194001010030-194412312330","194501010030-194912312330","195001010030-195412312330","195501010030-195912312330","196001010030-196412312330","196501010030-196912312330","197001010030-197412312330","197501010030-197912312330","198001010030-198412312330","198501010030-198912312330","199001010030-199412312330","199501010030-199912312330","200001010030-200412312330","200501010030-200912312330","201001010030-201412312330"]
urls = []

for period in ranges:
    urls.append(base + period + '.nc')
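The hard-coded list of periods follows a regular pattern: five-year blocks from 1850 to 2014, each starting at 00:30 on 1 January and ending at 23:30 on 31 December. As a minimal sketch, assuming that pattern holds for every file, the list could be generated instead of typed out. The helper name `five_year_periods` is hypothetical and not part of the original notebook.

```python
base = (
    "https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/"
    "CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/"
    "tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_"
)


def five_year_periods(first_year=1850, last_year=2010, step=5):
    """Return period strings such as '185001010030-185412312330'."""
    periods = []
    for start in range(first_year, last_year + 1, step):
        end = start + step - 1
        # Each file starts at 00:30 on 1 January and ends at 23:30 on 31 December.
        periods.append(f"{start}01010030-{end}12312330")
    return periods


generated_ranges = five_year_periods()
generated_urls = [base + period + ".nc" for period in generated_ranges]
print(len(generated_urls), "URLs, first period:", generated_ranges[0])
```

Generating the periods avoids typos in a long literal list, at the cost of assuming that no file deviates from the five-year pattern.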

In [10]:
urls

['https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_201001010030-201412312330.nc',
 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_200501010030-200912312330.nc',
 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_200001010030-200412312330.nc',
 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_199501010030-199912312330.nc',
 'https://esg1.umr-cnrm.fr/thredds/dodsC/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/AERhr/tas/gr/v20181206/tas_AERhr_CNRM-ESM2-1_historical_r1i1p1f2_gr_199001010030-199412312330.

In [13]:
import dask  # every period will be treated as a dask chunk

# This takes forever; it would be good to show how fast zarr is (and hsds?)
# Start adding timers everywhere
# After 10 minutes it fails with an error...
with ProgressBar():
    ds = xarray.open_mfdataset(urls[0:2], parallel=True)
ds

RuntimeError: NetCDF: Access failure
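The note in the cell above about adding timers could be addressed with a small context manager. This is a minimal sketch using only the standard library; the name `timer` and the `results` dictionary are hypothetical helpers, not part of xarray or dask. Because the elapsed time is recorded in a `finally` clause, it is reported even when the wrapped call fails, as `open_mfdataset` does here.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results=None):
    """Measure the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs on success and on failure, so failed calls are timed too.
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}

# Stand-in workload; in the notebook this would wrap the open_mfdataset call.
with timer("sleep", timings):
    time.sleep(0.05)

# The timer still reports when the block raises.
try:
    with timer("failing step", timings):
        raise RuntimeError("simulated access failure")
except RuntimeError:
    pass
```

Collecting the measurements in a dictionary makes it straightforward to compare access methods later, for example the same read against a different storage format.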