# What is behind auto-kerchunk?

Auto-kerchunk was developed in 2021, following [this blog post on applying kerchunk to NetCDF4 data stored in S3](https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685). It adapts that approach to an HPC environment and automates the creation of a kerchunk catalog for the many existing datasets in an HPC data lake.

This notebook explains what is behind auto-kerchunk by creating a kerchunk catalog manually (method 1). It also shows the more recent kerchunk capability `auto_dask` (methods 2 and 3).

auto-kerchunk was created to:
- convert multiple NetCDF files to a kerchunk catalogue,
- make an intake catalogue to be able to submit a job to the PBS scheduler,
- automate this with a bash script and submit it to the PBS scheduler.

Please refer to [this documentation](https://pangeo-data.github.io/clivar-2022/pangeo101/chunking_introduction.html) to understand what kerchunk is (as well as what zarr and 'chunk' mean).
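
To make the idea concrete, here is a minimal sketch of what a kerchunk reference set looks like once a file has been translated: a JSON-like dictionary in which zarr metadata keys hold inline JSON strings and chunk keys point to a byte range `[url, offset, length]` in the original file. The variable name `TEMP`, the path, the offset and the helper `chunk_keys` are made-up values for illustration, not taken from the MARC files.

```python
import json

# Hypothetical reference set in the kerchunk "version 1" layout.
# All names, offsets and lengths are invented for illustration.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "TEMP/.zarray": json.dumps(
            {
                "shape": [1, 183, 413],
                "chunks": [1, 183, 413],
                "dtype": "<f4",
                "compressor": None,
                "fill_value": None,
                "filters": None,
                "order": "C",
                "zarr_format": 2,
            }
        ),
        # chunk key -> [url of the original file, byte offset, byte length]
        "TEMP/0.0.0": ["file:///path/to/example.nc", 4096, 302316],
    },
}


def chunk_keys(reference_set):
    """Return the keys that point at data chunks rather than zarr metadata."""
    return sorted(
        key
        for key in reference_set["refs"]
        if not key.rsplit("/", 1)[-1].startswith(".z")
    )


print(chunk_keys(refs))
```

Only the small metadata entries and the byte-range pointers are stored; the array data itself stays in the original NetCDF file.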


## We will use [MARC](https://marc.ifremer.fr) NetCDF4 datasets as an example.

First, we will list all the files we will use as 'file_pattern'.


In [2]:
import glob

# dir_url = 'https://data-dataref.ifremer.fr/marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/'
# file_pattern = 'MARC_F1-MARS3D-SEINE_*Z.nc'
dir_url = "/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/"
file_pattern = "MARC_F1-MARS3D-SEINE_20150101T*Z.nc"
file_paths = glob.glob(dir_url + file_pattern)
file_paths

['/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T1900Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T0400Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T1600Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T0100Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T2000Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T1100Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T2300Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_20150101T2200Z.nc',
 '/home/ref-marc/f1_e2500_agrif/MARC_F1-MARS3D-SEINE/best_estimate/2015/MARC_F1-MARS3D-SEINE_201
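
`glob.glob` returns paths in arbitrary order, as the listing above shows. Below is a minimal sketch for sorting the files chronologically, assuming the timestamp is always the last underscore-separated token of the file name, in the form `YYYYMMDDTHHMMZ`. The helper name `timestamp_of` is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def timestamp_of(path):
    """Parse the trailing 'YYYYMMDDTHHMMZ' token of a MARC-style file name."""
    token = Path(path).stem.rsplit("_", 1)[-1]  # e.g. '20150101T1900Z'
    return datetime.strptime(token, "%Y%m%dT%H%MZ")


example_paths = [
    "MARC_F1-MARS3D-SEINE_20150101T1900Z.nc",
    "MARC_F1-MARS3D-SEINE_20150101T0400Z.nc",
    "MARC_F1-MARS3D-SEINE_20150101T1600Z.nc",
]
sorted_paths = sorted(example_paths, key=timestamp_of)
print(sorted_paths)
```

Sorting mainly makes the listing easier to read and reproducible between runs.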

## Starting Dask cluster on HPC

Please refer to the [Dask-hpcconfig datarmor example Jupyter notebook](https://github.com/umr-lops/dask-hpcconfig/tree/main/docs/tutorials) to understand what the next cells do.

In [None]:
import dask_hpcconfig
from distributed import Client

In [None]:
overrides = {}  # ,"cluster.c": n_worker_per_node }

cluster = dask_hpcconfig.cluster("datarmor-local", **overrides)
client = Client(cluster)
client

## Converting Multiple NetCDF Files to a Kerchunk Catalogue

In this notebook, we will show three ways to convert multiple NetCDF files to one kerchunk catalogue.


### Method 1

We use `kerchunk.hdf.SingleHdf5ToZarr` with `dask.bag` to convert each NetCDF file to its own kerchunk catalog, and then concatenate these with `kerchunk.combine.MultiZarrToZarr` to create a single kerchunk catalog. This workflow was used in the first version of auto-kerchunk.


In [None]:
%%time
import dask.bag as db
import fsspec
from kerchunk.hdf import SingleHdf5ToZarr


def translate_dask(file):
    url = "file://" + file
    print("working on ", file)
    with fsspec.open(url) as inf:
        h5chunks = SingleHdf5ToZarr(inf, url, inline_threshold=100)
        return h5chunks.translate()


b = db.from_sequence(file_paths)
result_indask = b.map(translate_dask)
result = result_indask.compute()

from kerchunk.combine import MultiZarrToZarr

mzz = MultiZarrToZarr(
    result,
    concat_dims=["time"],
)
a = mzz.translate()
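
Conceptually, combining along `time` means that the chunk keys of each per-file reference set are renumbered along the concatenated dimension, and the array shape grows accordingly. The toy function below illustrates that idea for a single variable whose first dimension is `time`, with one time step per file. It is a simplified sketch written for this explanation (the names `toy_concat` and `fake_refs` and the sample references are hypothetical), not how `MultiZarrToZarr` is actually implemented, and it ignores coordinates, attributes and consistency checks.

```python
import json


def toy_concat(ref_sets, var):
    """Concatenate single-time-step reference sets along the first dimension."""
    combined = {}
    for i, refs in enumerate(ref_sets):
        for key, value in refs.items():
            prefix, _, name = key.rpartition("/")
            if prefix == var and not name.startswith("."):
                # Chunk key such as 'TEMP/0.0.0': replace the time index by i.
                index = name.split(".")
                index[0] = str(i)
                combined[f"{var}/{'.'.join(index)}"] = value
            elif i == 0:
                # Metadata is copied from the first file only.
                combined[key] = value
    # Grow the time dimension in the array metadata.
    meta = json.loads(combined[f"{var}/.zarray"])
    meta["shape"][0] = len(ref_sets)
    combined[f"{var}/.zarray"] = json.dumps(meta)
    return combined


def fake_refs(path):
    """Build a made-up reference set for one file (values are illustrative)."""
    return {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "TEMP/.zarray": json.dumps({"shape": [1, 183, 413], "chunks": [1, 183, 413]}),
        "TEMP/0.0.0": [path, 4096, 302316],
    }


combined = toy_concat([fake_refs("a.nc"), fake_refs("b.nc")], "TEMP")
print(sorted(combined))
```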

### Method 2

We use `kerchunk.combine.auto_dask` instead of `kerchunk.combine.MultiZarrToZarr` to combine the per-file catalogs, which are created as described above.


In [None]:
%%time
import dask.bag as db
import fsspec
from kerchunk.combine import auto_dask
from kerchunk.hdf import SingleHdf5ToZarr


class PassThrough:
    def __init__(self, refs):
        self.refs = refs

    def translate(self):
        return self.refs


def translate_dask(file):
    url = "file://" + file
    print("working on ", file)
    with fsspec.open(url) as inf:
        h5chunks = SingleHdf5ToZarr(inf, url, inline_threshold=100)
        return h5chunks.translate()


b = db.from_sequence(file_paths)
result_indask = b.map(translate_dask)
result = result_indask.compute()

b = auto_dask(
    result,
    single_driver=PassThrough,
    # single_driver=JustLoad,
    single_kwargs={},  # "storage_options": {"anon": False}},
    mzz_kwargs={"concat_dims": ["time"]},
    n_batches=3,
)
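
The `n_batches` argument controls how the inputs are grouped: the per-file references are combined within each batch first, and the batch results are then combined once more. The sketch below only illustrates that two-stage idea on plain Python lists. The helper names `split_batches`, `tree_combine` and `flatten` are hypothetical, and the exact partitioning used inside `auto_dask` may differ.

```python
def split_batches(items, n_batches):
    """Split items into n_batches contiguous groups of near-equal size."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        stop = start + size + (1 if i < extra else 0)
        batches.append(items[start:stop])
        start = stop
    # Drop empty groups when there are fewer items than batches.
    return [batch for batch in batches if batch]


def tree_combine(items, n_batches, combine):
    """Combine each batch, then combine the per-batch results."""
    partial = [combine(batch) for batch in split_batches(items, n_batches)]
    return combine(partial)


def flatten(groups):
    """Stand-in for a real combine step: concatenate lists in order."""
    return [x for group in groups for x in group]


# Stand-in for the 24 hourly files: one single-element list per file.
hours = [[h] for h in range(24)]
batches = split_batches(hours, 3)
result = tree_combine(hours, 3, flatten)
print([len(b) for b in batches], result[:3])
```

With 24 inputs and `n_batches=3`, each first-stage combine handles 8 inputs, and the second stage merges the 3 intermediate results.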

### Method 3

We use `kerchunk.combine.auto_dask` with `kerchunk.hdf.SingleHdf5ToZarr`.

`kerchunk.combine.auto_dask` converts each NetCDF file into a kerchunk catalogue and concatenates them into one kerchunk catalogue in a single call, in parallel using dask.

In [9]:
%%time
import warnings

warnings.filterwarnings("ignore")

from kerchunk.combine import auto_dask
from kerchunk.hdf import SingleHdf5ToZarr

c = auto_dask(
    file_paths,
    single_driver=SingleHdf5ToZarr,
    # single_driver=JustLoad,
    single_kwargs={},  # "storage_options": {"anon": False}},
    mzz_kwargs={"concat_dims": ["time"]},
    n_batches=3,
)

CPU times: user 2.17 s, sys: 88 ms, total: 2.26 s
Wall time: 2.25 s


## Loading data to Xarray using kerchunk



In [8]:
%%time
import xarray as xr

data = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "storage_options": {
            "fo": c,
        },
        "consolidated": False,
    },
    chunks={},
)
data

CPU times: user 48 ms, sys: 4 ms, total: 52 ms
Wall time: 58 ms


Output (xarray `Dataset` repr, condensed). All variables are dask arrays built in 2 graph layers:

| Shape | Data type | Array size | Chunk shape | Chunk size | Chunks |
|---|---|---|---|---|---|
| (24, 183, 413) | float64 | 13.84 MiB | (1, 183, 413) | 590.46 kiB | 24 |
| (24, 183, 413) | float32 | 6.92 MiB | (1, 183, 413) | 295.23 kiB | 24 |
| (24, 40, 183, 413) | float32 | 276.78 MiB | (1, 28, 129, 290) | 4.00 MiB | 192 |
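
What happens when a chunk is requested through `reference://` can be pictured as a byte-range read: the chunk key is looked up in the catalog, and the corresponding `[url, offset, length]` entry says which bytes to fetch from the original file. The sketch below mimics that with a temporary local file and plain file I/O. It illustrates the principle only (the helper `read_chunk` and the sample data are hypothetical) and skips decompression and filters.

```python
import os
import struct
import tempfile


def read_chunk(refs, key):
    """Fetch the raw bytes a reference entry [path, offset, length] points to."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Build a fake 'data file': a 16-byte header followed by four float32 values.
values = (1.0, 2.0, 3.0, 4.0)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00" * 16 + struct.pack("<4f", *values))
    data_path = tmp.name

# One uncompressed chunk starting after the header, 4 values * 4 bytes long.
refs = {"TEMP/0": [data_path, 16, 16]}
decoded = struct.unpack("<4f", read_chunk(refs, "TEMP/0"))
os.remove(data_path)
print(decoded)
```

Because only the requested byte ranges are read, opening the dataset does not require scanning every original file again.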


**Opening the kerchunk catalog with `xarray.open_dataset` accomplishes the same thing as calling `xarray.open_mfdataset` on the original files, but the timings show that the kerchunk route is much faster: 58 ms above versus 8.59 s below.**


In [14]:
%%time
import xarray as xr

xr.open_mfdataset(file_paths, engine="h5netcdf", concat_dim=["time"], combine="nested", parallel=True)

CPU times: user 8.43 s, sys: 820 ms, total: 9.25 s
Wall time: 8.59 s


Output (xarray `Dataset` repr, condensed). All variables are dask arrays:

| Shape | Data type | Array size | Chunk shape | Chunk size | Chunks | Graph layers |
|---|---|---|---|---|---|---|
| (183, 413) | float64 | 590.46 kiB | (183, 413) | 590.46 kiB | 1 | 115 |
| (24, 40) | float32 | 3.75 kiB | (1, 40) | 160 B | 24 | 73 |
| (24, 183, 413) | float32 | 6.92 MiB | (1, 183, 413) | 295.23 kiB | 24 | 73 or 49 |
| (24, 40, 183, 413) | float32 | 276.78 MiB | (1, 40, 183, 413) | 11.53 MiB | 24 | 49 |

## Intake
The `xarray.open_dataset` call above is very long.  Th


In [None]:
%%time
data = data.TEMP.isel(level=10).mean(dim="nj").mean(dim="ni").compute()

In [None]:
data.plot()

## Saving and reloading the kerchunk catalogue as compressed JSON

In [None]:
import fsspec
import ujson

jsonfile = "test.json.zstd"
storage_options = {"compression": "zstd"}

# Write the combined references from Method 1 (`a`) as zstd-compressed JSON.
with fsspec.open(jsonfile, mode="wt", **storage_options) as f:
    ujson.dump(a, f)
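
A quick way to check that a compressed catalog was written correctly is to read it back and compare it with the in-memory references. The sketch below shows such a round trip using only the standard library, with `gzip` and `json` standing in for the `zstd` compression and `ujson` used in this notebook. The small `refs` dictionary and the file name are made up for illustration.

```python
import gzip
import json
import os
import tempfile

# Made-up references standing in for the combined catalog.
refs = {
    "version": 1,
    "refs": {"TEMP/0.0.0": ["file:///path/to/example.nc", 4096, 302316]},
}

path = os.path.join(tempfile.mkdtemp(), "test.json.gz")

# Write the references as gzip-compressed JSON text.
with gzip.open(path, mode="wt", encoding="utf-8") as f:
    json.dump(refs, f)

# Read them back and confirm nothing was lost.
with gzip.open(path, mode="rt", encoding="utf-8") as f:
    restored = json.load(f)

print(restored == refs)
```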

In [None]:
%%time
import xarray as xr

test = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "storage_options": {
            "fo": "file:///home1/datawork/todaka/git/auto-kerchunk/notebooks/test.json.zstd",
            "target_options": {"compression": "zstd"},
        },
        "consolidated": False,
    },
    chunks={"time": 1, "level": "auto", "ni": "auto", "nj": "auto"},
)
test

In [None]:
%%time
data = test.TEMP.isel(level=10).mean(dim="ni").mean(dim="nj").compute()

In [None]:
data.plot()