# Read Replay Ocean (MOM6) Data, Store to Zarr

Example using UFS2ARCO.MOM6Dataset to read data from a single cycle of the replay dataset.

In [1]:
from os.path import join

import xarray as xr
from datetime import datetime

In [2]:
import sys
sys.path.append("../src")
from UFS2ARCO import MOM6Dataset

## Setup path to read from

In this case, read replay data from the s3 bucket.
Right now, `MOM6Dataset` requires a `path_in` callable that takes three arguments, so that it can build file paths from a given date (denoting a DA cycle), the forecast hours to grab, and the file prefixes.

In [3]:
def ocean_path(date: datetime, forecast_hours: list, file_prefixes: list):

    from datetime import timedelta
    upper = "s3://noaa-ufs-gefsv13replay-pds/1deg"
    this_dir = f"{date.year:04d}/{date.month:02d}/{date.year:04d}{date.month:02d}{date.day:02d}{date.hour:02d}"
    files = []
    for fp in file_prefixes:
        for fhr in forecast_hours:
            this_date = date + timedelta(hours=fhr)
            files.append(
                    f"{fp}{this_date.year:04d}_{this_date.month:02d}_{this_date.day:02d}_{this_date.hour:02d}.nc")
    return [join(upper, this_dir, this_file) for this_file in files]

For instance, to grab the `ocn_` files at forecast hour `fhr00`
from the DA cycle at 00:00 on Jan 1, 1994, this would be used as follows:

In [4]:
cycle = datetime(1994,1,1,0)
ocean_path(cycle, [0], ["ocn_"])

['s3://noaa-ufs-gefsv13replay-pds/1deg/1994/01/1994010100/ocn_1994_01_01_00.nc']
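To make the naming convention easier to see, here is a minimal sketch of the same path logic written with `strftime`-style formatting. The function name `build_ocean_paths` and the constant `BUCKET_ROOT` are hypothetical and not part of UFS2ARCO. The sketch uses `posixpath.join` so the separators stay as `/` on any platform. It only builds file names and does not check that those forecast hours exist in the bucket.

```python
from datetime import datetime, timedelta
import posixpath

# Same bucket root as in ocean_path above
BUCKET_ROOT = "s3://noaa-ufs-gefsv13replay-pds/1deg"


def build_ocean_paths(cycle, forecast_hours, file_prefixes):
    """Hypothetical strftime-based equivalent of ocean_path."""
    # The directory is named after the DA cycle: YYYY/MM/YYYYMMDDHH
    cycle_dir = cycle.strftime("%Y/%m/%Y%m%d%H")
    paths = []
    for prefix in file_prefixes:
        for fhr in forecast_hours:
            # The file name carries the valid time: cycle + forecast hour
            valid = cycle + timedelta(hours=fhr)
            fname = f"{prefix}{valid:%Y_%m_%d_%H}.nc"
            paths.append(posixpath.join(BUCKET_ROOT, cycle_dir, fname))
    return paths


# An 18:00 cycle with forecast hours 0, 3, and 6 crosses into the next day
paths = build_ocean_paths(datetime(1994, 1, 1, 18), [0, 3, 6], ["ocn_"])
for p in paths:
    print(p)
```

Note that the directory keeps the cycle date (`1994010118`) while the last file name rolls over to `ocn_1994_01_02_00.nc`.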

However, reading from s3 with `xarray.open_dataset` is very slow.
Luckily, we can tell fsspec to cache the files locally before opening them. This is done by
prepending `simplecache::` to each path, as follows.

See discussion [here](https://discourse.pangeo.io/t/reading-goes-r-s3-netcdfs-from-an-aws-ec2-instance-is-it-possible-to-get-faster-speeds-than-from-my-local-machine/2440/13)
for more info.

In [5]:
def cached_path(date: datetime, forecast_hours: list, file_prefixes: list):
    return [f"simplecache::{u}" for u in ocean_path(date, forecast_hours, file_prefixes)]

In [6]:
cycle = datetime(1994,1,1,0)
cached_path(cycle, [0], ["ocn_"])

['simplecache::s3://noaa-ufs-gefsv13replay-pds/1deg/1994/01/1994010100/ocn_1994_01_01_00.nc']
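In a chained URL like this, each `::`-separated segment names a protocol, and fsspec accepts per-protocol options keyed by the protocol name (for example `s3={"anon": True}`). The sketch below is plain string handling meant only to illustrate that layout. It is not fsspec's own parser, and `split_chain` is a hypothetical helper.

```python
def split_chain(url):
    """Split a chained fsspec-style URL into (protocols, path).

    Illustration only: this is simple string handling, not fsspec's parser.
    """
    *wrappers, target = url.split("::")
    protocol, _, path = target.partition("://")
    return wrappers + [protocol], path


url = (
    "simplecache::s3://noaa-ufs-gefsv13replay-pds/1deg/"
    "1994/01/1994010100/ocn_1994_01_01_00.nc"
)
protocols, path = split_chain(url)
print(protocols)

# Options are grouped by protocol name; protocols without options get none
options = {"s3": {"anon": True}}
for proto in protocols:
    print(proto, options.get(proto, {}))
```

Here only the `s3` layer receives options, and the `simplecache` layer runs with its defaults.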

## Use MOM6Dataset

Some of that is wrapped up under the hood in `MOM6Dataset`. We just need to give it two inputs:
1. The filename mapping as defined above
2. A configuration yaml file

In [7]:
reader = MOM6Dataset(path_in=cached_path, config_filename="../scripts/config-replay.yaml")

We'll use this reader to open a dataset. Note that `fsspec_kwargs` is not necessary for local file reads, but it is necessary for reading from buckets like s3.

In [8]:
%%time
ds = reader.open_dataset(cycle, fsspec_kwargs={"s3":{"anon":True}}, engine="h5netcdf")
ds

CPU times: user 2.01 s, sys: 699 ms, total: 2.71 s
Wall time: 14.7 s




Output: `xarray.Dataset` (HTML repr not reproduced). Its dask-backed arrays fall into the following groups:

| Shape | Chunk shape | Bytes (array / chunk) | Dask chunks | Data type |
|---|---|---|---|---|
| (1, 75, 320, 360) | (1, 5, 320, 360) | 32.96 MiB / 2.20 MiB | 15 | float32 |
| (1, 75, 320, 360) | (1, 9, 320, 360) | 32.96 MiB / 3.96 MiB | 9 | float32 |
| (1, 320, 360) | (1, 320, 360) | 450.00 kiB / 450.00 kiB | 1 | float32 |
| (1,) | (1,) | 8 B / 8 B | 1 | datetime64[ns] |
| (1,) | (1,) | 8 B / 8 B | 1 | timedelta64[ns] |
| (1, 2) | (1, 2) | 16 B / 16 B | 1 | object |

xarray gives a nice view of the data. We can look at a single variable to see how the data are laid out.

In [9]:
ds["temp"]

| | Array | Chunk |
|---|---|---|
| Bytes | 32.96 MiB | 2.20 MiB |
| Shape | (1, 75, 320, 360) | (1, 5, 320, 360) |
| Dask graph | 15 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |

## Storing the dataset

The dataset has more variables than we wanted, but these are trimmed when we call `reader.store_dataset`.

Any arguments beyond the dataset are passed to `xarray.Dataset.to_zarr`.
This is useful for appending multiple datasets in time (see, e.g., `../scripts/read_from_s3.py`).

By default, dask (which we're accessing through xarray under the hood) writes to zarr using its current chunk sizes (i.e., how the data are currently chunked in memory).
However, it's more efficient to make zarr stores with smaller chunks, which we can then inflate later.
These smaller chunks are specified in our configuration yaml via `chunks_out`, and are used in the zarr store.

In [10]:
reader.chunks_out

{'time': 1, 'z_l': 5, 'z_i': 5, 'yh': 30, 'xh': 30, 'yq': 30, 'xq': 30}

In [11]:
reader.store_dataset(ds, mode="w")

Stored dataset at replay-1deg/forecast/mom6.zarr


In [14]:
xds = xr.open_zarr("replay-1deg/forecast/mom6.zarr/")

In [15]:
xds

Output: `xarray.Dataset` (HTML repr not reproduced). Its dask-backed arrays fall into the following groups:

| Shape | Chunk shape | Bytes (array / chunk) | Dask chunks | Data type |
|---|---|---|---|---|
| (1,) | (1,) | 8 B / 8 B | 1 | object |
| (1,) | (1,) | 8 B / 8 B | 1 | timedelta64[ns] |
| (1, 320, 360) | (1, 30, 30) | 450.00 kiB / 3.52 kiB | 132 | float32 |
| (1, 75, 320, 360) | (1, 5, 30, 30) | 32.96 MiB / 17.58 kiB | 1980 | float32 |
| (1, 75, 360, 320) | (1, 5, 30, 30) | 32.96 MiB / 17.58 kiB | 1980 | float32 |

We can see that this only has the data variables we asked for in the config yaml file.

In [16]:
xds.temp

| | Array | Chunk |
|---|---|---|
| Bytes | 32.96 MiB | 17.58 kiB |
| Shape | (1, 75, 320, 360) | (1, 5, 30, 30) |
| Dask graph | 1980 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |

The repr also lists two single-element dask-backed arrays (8 B each, one chunk each), of data type object and timedelta64[ns].


And we can see this has a different chunking scheme than the original dataset: it has what we asked for in the config yaml.

In [17]:
reader.chunks_out

{'time': 1, 'z_l': 5, 'z_i': 5, 'yh': 30, 'xh': 30, 'yq': 30, 'xq': 30}
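The chunk counts and chunk sizes reported for `temp` can be checked by hand from the array shape and the chunk shape. The following is a small sketch of that arithmetic. The helper `chunk_stats` is hypothetical, and the 4-byte item size assumes `float32` data, as shown in the outputs above.

```python
from math import ceil, prod


def chunk_stats(shape, chunks, itemsize=4):
    """Return (number of chunks, bytes in one full chunk).

    itemsize=4 assumes float32 data.
    """
    # Each dimension needs ceil(size / chunk) chunks; the last may be partial
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (1, 75, 320, 360)  # shape of temp

# Chunking as read from the netCDF file
n_in, bytes_in = chunk_stats(shape, (1, 5, 320, 360))
# Chunking from chunks_out: time=1, z_l=5, yh=30, xh=30
n_out, bytes_out = chunk_stats(shape, (1, 5, 30, 30))

print(n_in, f"{bytes_in / 2**20:.2f} MiB")
print(n_out, f"{bytes_out / 2**10:.2f} kiB")
```

The 320-point dimension does not divide evenly by 30, so it takes 11 chunks with a smaller final chunk, which gives 15 × 11 × 12 = 1980 chunks in total.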