Add an asynchronous load method? #10326

Open
@TomNicholas

Is your feature request related to a problem?

Currently all xarray .load() calls are blocking, so the only way to concurrently load data for a bunch of different xarray objects is to use dask. This comes up when loading data from high-latency backends such as Zarr on remote object storage.

Describe the solution you'd like

But now that zarr v3 has async get methods, it should be possible to add an async version of the .load() method that could be used like this:

import asyncio

async def load_many_dataarrays_concurrently(dataarrays):
    # async_load() is the proposed method; it does not exist in xarray yet
    tasks = [da.async_load() for da in dataarrays]
    results = await asyncio.gather(*tasks)
    return results

For N zarr stores pointing to remote object storage, each with a latency of ~1s, this code could in theory take only ~1s, whereas the blocking equivalent (i.e. return [da.load() for da in dataarrays]) would take at least ~N seconds.
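To make the latency argument concrete, here is a minimal, self-contained simulation. It does not use xarray or zarr: FakeRemoteArray is a hypothetical stand-in, the latency is shortened from ~1s to 0.05s so the demo runs quickly, and async_load() only mimics the proposed method.

```python
import asyncio
import time

LATENCY = 0.05  # stand-in for ~1s of object-store latency


class FakeRemoteArray:
    """Hypothetical stand-in for a lazily loaded DataArray; not part of xarray."""

    def __init__(self, name):
        self.name = name

    def load(self):
        # Blocking load: the calling thread waits out the full latency.
        time.sleep(LATENCY)
        return self.name

    async def async_load(self):
        # Mimics the proposed method: yields to the event loop while waiting.
        await asyncio.sleep(LATENCY)
        return self.name


def load_sequentially(arrays):
    return [a.load() for a in arrays]


async def load_concurrently(arrays):
    return await asyncio.gather(*(a.async_load() for a in arrays))


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


arrays = [FakeRemoteArray(f"da{i}") for i in range(5)]
seq_result, seq_time = timed(lambda: load_sequentially(arrays))
con_result, con_time = timed(lambda: asyncio.run(load_concurrently(arrays)))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With N = 5 the sequential path waits roughly N times the latency, while the concurrent path waits roughly one latency plus event-loop overhead. Real stores would add transfer time and decoding cost on top of this.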

(Note this suggestion is not the same as #8965, which is about concurrently loading multiple variables behind the scenes, rather than exposing an async interface to the user.)

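The new method could be da.async_load(), or live in an accessor namespace. Note that da.async.load() as literally written would not parse, because async has been a reserved keyword since Python 3.7, so the namespace would need a different name.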

To make this work we would need to add an async version of BackendArray.get_duck_array

def get_duck_array(self, dtype: np.typing.DTypeLike = None):

and plumb that down through to zarr's AsyncArray methods somehow.
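One possible shape for that plumbing is sketched below. Everything here is an assumption for illustration: FakeAsyncStoreArray stands in for an async-capable array such as zarr's AsyncArray, SketchBackendArray is not xarray's BackendArray, async_get_duck_array is a hypothetical name, and plain Python lists stand in for array data so the example needs only the standard library.

```python
import asyncio


class FakeAsyncStoreArray:
    """Hypothetical stand-in for an async-capable array (e.g. zarr's AsyncArray)."""

    def __init__(self, data):
        self._data = list(data)

    async def getitem(self, selection):
        await asyncio.sleep(0.01)  # simulated network round trip
        if selection is Ellipsis:
            return list(self._data)
        return self._data[selection]


class SketchBackendArray:
    """Hypothetical backend wrapper; not xarray's real BackendArray."""

    def __init__(self, async_array):
        self._array = async_array

    async def async_get_duck_array(self, dtype=None):
        # Async counterpart of get_duck_array: await the store, then convert.
        data = await self._array.getitem(...)
        # Here dtype is simplified to a Python callable such as float.
        return data if dtype is None else [dtype(x) for x in data]

    def get_duck_array(self, dtype=None):
        # Blocking variant built on the async one. asyncio.run() raises if an
        # event loop is already running in this thread, so a real
        # implementation would need a more careful bridge.
        return asyncio.run(self.async_get_duck_array(dtype))


backend = SketchBackendArray(FakeAsyncStoreArray([1, 2, 3]))
loaded = asyncio.run(backend.async_get_duck_array(dtype=float))
blocking = backend.get_duck_array()
```

Keeping the blocking method as a thin wrapper around the async one is only one option; the two could equally be implemented independently.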

Describe alternatives you've considered

Using dask adds massive overhead and additional complexity. There may be some other way to do this that I'm not aware of.
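One other candidate, sketched here purely as an assumption, is to push the existing blocking calls onto worker threads with asyncio.to_thread (Python 3.9+). blocking_load is a hypothetical stand-in for da.load(); whether xarray's real load is safe and efficient when called from several threads at once is not something this issue establishes.

```python
import asyncio
import time


def blocking_load(name, latency=0.1):
    """Hypothetical stand-in for a blocking da.load() call."""
    time.sleep(latency)
    return name


async def load_many_in_threads(names):
    # Each blocking call runs in the default thread pool, so the waits overlap.
    tasks = [asyncio.to_thread(blocking_load, n) for n in names]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(load_many_in_threads(["a", "b", "c", "d"]))
elapsed = time.perf_counter() - start
print(f"loaded {results} in {elapsed:.2f}s")
```

This avoids dask but costs one thread per in-flight load, which is part of why a native async path is attractive for high-latency stores.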

Additional context

This is a desired-enough feature that other people have done it before in 3rd-party libraries, e.g. https://github.com/jeliashi/xarray-async. That particular implementation also targeted zarr, but predates the async get methods now available in zarr v3.

cc @dcherian @rabernat @jhamman @ianhi
