# Kvikio demo

Requires
- [ ] https://github.com/pydata/xarray/pull/10078
- [ ] https://github.com/rapidsai/kvikio/pull/646

In [1]:
%load_ext watermark
%xmode minimal

import cupy_xarray  # registers cupy accessor
import kvikio.zarr

import numpy as np
import xarray as xr
import zarr

%watermark -iv

Exception reporting mode: Minimal
numpy      : 2.2.3
zarr       : 3.0.5
cupy_xarray: 0.1.4+36.ge26ed24.dirty
kvikio     : 25.4.0
xarray     : 2025.1.3.dev22+g0184702f



In [2]:
xr.backends.list_engines()

{'netcdf4': <NetCDF4BackendEntrypoint>
   Open netCDF (.nc, .nc4 and .cdf) and most HDF5 files using netCDF4 in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.NetCDF4BackendEntrypoint.html,
 'kvikio': <KvikioBackendEntrypoint>
   Open zarr files (.zarr) using Kvikio
   Learn more at https://docs.rapids.ai/api/kvikio/stable/api/#zarr,
 'store': <StoreBackendEntrypoint>
   Open AbstractDataStore instances in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.StoreBackendEntrypoint.html,
 'zarr': <ZarrBackendEntrypoint>
   Open zarr files (.zarr) using zarr in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.ZarrBackendEntrypoint.html}

## Create example dataset

- the data must be written uncompressed, so `compressors` is set to `None` for every variable

In [3]:
store = "/tmp/air-temperature.zarr"
airt = xr.tutorial.open_dataset("air_temperature", engine="netcdf4")
for var in airt.variables:
    airt[var].encoding["compressors"] = None
airt["scalar"] = 12.0
airt.to_zarr(store, mode="w", zarr_format=3, consolidated=False)

<xarray.backends.zarr.ZarrStore at 0x7f44a21e57e0>

## Test opening

### Standard usage

In [4]:
ds_cpu = xr.open_dataset(store, engine="zarr")
print(ds_cpu.air.data.__class__)
ds_cpu.air

<class 'numpy.ndarray'>


1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  ds_cpu = xr.open_dataset(store, engine="zarr")


### Now with kvikio!

 - must read with `consolidated=False` (https://github.com/rapidsai/kvikio/issues/119)
 - still to try: `dask.array.from_zarr` with `GDSStore`, and `open_mfdataset`

In [5]:
# Consolidated must be False
ds = xr.open_dataset(store, engine="kvikio", consolidated=False)
print(ds.air._variable._data)
ds

MemoryCachedArray(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f449f3ed980>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _scale_offset_decoding at 0x7f44a35896c0>, scale_factor=0.01, add_offset=None, dtype=<class 'numpy.float64'>), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None))))))


In [6]:
ds.scalar

## Lazy reading

In [7]:
ds.air

## Data load for repr

In [8]:
ds["air"].isel(time=0, lat=10).load()

In [9]:
ds.scalar

In [10]:
ds.air

## CuPy array on load

Configure Zarr to use GPU memory by calling `zarr.config.enable_gpu()`, either as a context manager or once for the whole session.

See https://zarr.readthedocs.io/en/stable/user-guide/gpu.html#using-gpus-with-zarr

In [11]:
ds["air"].isel(time=0, lat=10).variable._data

MemoryCachedArray(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f449f3ed980>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _scale_offset_decoding at 0x7f44a35896c0>, scale_factor=0.01, add_offset=None, dtype=<class 'numpy.float64'>), dtype=dtype('float64')), key=BasicIndexer((0, 10, slice(None, None, None))))))

In [12]:
with zarr.config.enable_gpu():
    print(type(ds["air"].isel(time=0, lat=10).load().data))

<class 'cupy.ndarray'>


## Load to host

In [13]:
zarr.config.enable_gpu()

<donfig.config_obj.ConfigSet at 0x7f449e250d50>

In [14]:
ds.air

In [15]:
print(type(ds["air"].data))

<class 'cupy.ndarray'>


In [16]:
type(ds.air.as_numpy().data)

numpy.ndarray

In [17]:
type(ds.air.mean("time").load().data)

cupy.ndarray

## Doesn't work: Chunk with dask

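`meta` is wrong: the dask array reports `numpy.ndarray` chunks even though the data loads as `cupy.ndarray`, and computing it fails.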

In [18]:
ds.chunk(time=10).air

|  | Array | Chunk |
|---|---|---|
| Bytes | 29.52 MiB | 103.52 kiB |
| Shape | (2920, 25, 53) | (10, 25, 53) |
| Dask graph | 292 chunks in 2 graph layers | |
| Data type | float64 numpy.ndarray | |


`dask.array.core.getter` calls `np.asarray` on each chunk.

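That invokes `ImplicitToExplicitIndexingAdapter.__array__`, which in turn calls `np.asarray` on the wrapped `cupy` array, and CuPy raises on implicit conversion.

Xarray uses `.get_duck_array` internally to remove these adapters. We might need to add something like this to `dask.array.core.getter`: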
```python
# handle xarray internal classes that might wrap cupy
if hasattr(c, "get_duck_array"):
    c = c.get_duck_array()
else:
    c = np.asarray(c)
```
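
The fallback above can be exercised without a GPU. The sketch below uses two hypothetical stand-ins, `FakeDeviceArray` (for `cupy.ndarray`) and `FakeAdapter` (for the xarray indexing adapter); neither exists in xarray, dask, or cupy, and `unwrap_chunk` is an illustrative name rather than a dask function. It only shows that checking for `get_duck_array` first avoids the implicit NumPy conversion.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for cupy.ndarray: refuses implicit NumPy conversion."""

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")


class FakeAdapter:
    """Hypothetical stand-in for an xarray indexing adapter wrapping a device array."""

    def __init__(self, array):
        self.array = array

    def __array__(self, dtype=None, copy=None):
        # Mirrors the failing path: converting the wrapper converts the wrapped array.
        return np.asarray(self.array)

    def get_duck_array(self):
        # Hands back the wrapped array without copying it to host memory.
        return self.array


def unwrap_chunk(c):
    """Proposed fallback: prefer get_duck_array, otherwise coerce with NumPy."""
    if hasattr(c, "get_duck_array"):
        return c.get_duck_array()
    return np.asarray(c)


device = FakeDeviceArray([1.0, 2.0, 3.0])
wrapped = FakeAdapter(device)

# Plain np.asarray on the wrapper reproduces the TypeError.
try:
    np.asarray(wrapped)
    implicit_conversion_failed = False
except TypeError:
    implicit_conversion_failed = True

unwrapped = unwrap_chunk(wrapped)        # the device array, returned as-is
coerced = unwrap_chunk([1.0, 2.0, 3.0])  # plain inputs still become NumPy arrays
```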

In [19]:
from dask.utils import is_arraylike

data = ds.air.variable._data
is_arraylike(data)

False

In [20]:
from xarray.core.indexing import ImplicitToExplicitIndexingAdapter

In [21]:
ImplicitToExplicitIndexingAdapter(data).get_duck_array()

array([[[241.2 , 242.5 , 243.5 , ..., 232.8 , 235.5 , 238.6 ],
        [243.8 , 244.5 , 244.7 , ..., 232.8 , 235.3 , 239.3 ],
        [250.  , 249.8 , 248.89, ..., 233.2 , 236.39, 241.7 ],
        ...,
        [296.6 , 296.2 , 296.4 , ..., 295.4 , 295.1 , 294.7 ],
        [295.9 , 296.2 , 296.79, ..., 295.9 , 295.9 , 295.2 ],
        [296.29, 296.79, 297.1 , ..., 296.9 , 296.79, 296.6 ]],

       [[242.1 , 242.7 , 243.1 , ..., 232.  , 233.6 , 235.8 ],
        [243.6 , 244.1 , 244.2 , ..., 231.  , 232.5 , 235.7 ],
        [253.2 , 252.89, 252.1 , ..., 230.8 , 233.39, 238.5 ],
        ...,
        [296.4 , 295.9 , 296.2 , ..., 295.4 , 295.1 , 294.79],
        [296.2 , 296.7 , 296.79, ..., 295.6 , 295.5 , 295.1 ],
        [296.29, 297.2 , 297.4 , ..., 296.4 , 296.4 , 296.6 ]],

       [[242.3 , 242.2 , 242.3 , ..., 234.3 , 236.1 , 238.7 ],
        [244.6 , 244.39, 244.  , ..., 230.3 , 232.  , 235.7 ],
        [256.2 , 255.5 , 254.2 , ..., 231.2 , 233.2 , 238.2 ],
        ...,
        [295

In [22]:
ds.chunk(time=10).air.compute()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.

### Explicit meta

In [23]:
import cupy as cp

chunked = ds.chunk(time=10, from_array_kwargs={"meta": cp.array([])})
chunked.air

|  | Array | Chunk |
|---|---|---|
| Bytes | 29.52 MiB | 103.52 kiB |
| Shape | (2920, 25, 53) | (10, 25, 53) |
| Dask graph | 292 chunks in 2 graph layers | |
| Data type | float64 numpy.ndarray | |


In [24]:
chunked.compute()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.