Group Backend Keyword Arguments #10422

kmuehlbauer · 2025-06-14T16:07:18Z

Closes Unconstrained forwarding of backend keyword arguments #10377
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

This is a first attempt and base for discussion.

This PR does the following:

split open_dataset kwargs into four groups:
Here I followed @shoyer's suggestion in #4490 (Group together decoding options into a single argument) to use dataclasses.

coder_opts: options for the CF coders (e.g. mask_and_scale, decode_times)
open_opts: options for the backend file opener (e.g. driver, clobber, diskless, format)
backend_opts: options handled by xarray itself (e.g. chunk, cache, inline_array)
store_opts: options for the backend store (e.g. group, lock, autoclose)
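As a rough illustration of the four groups, here is a minimal sketch assuming plain frozen dataclasses. The class names, fields, defaults and the `route` helper are hypothetical and are not the classes implemented in this PR. The point is that each group is only turned into a plain kwargs dict when it is handed to the layer it belongs to.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CoderOptions:
    """Hypothetical group: options for the CF coders."""
    mask_and_scale: bool = True
    decode_times: bool = True


@dataclass(frozen=True)
class OpenOptions:
    """Hypothetical group: options for the backend file opener."""
    format: str = "NETCDF4"
    diskless: bool = False


@dataclass(frozen=True)
class BackendOptions:
    """Hypothetical group: options handled by xarray itself."""
    cache: Optional[bool] = None
    inline_array: bool = False


@dataclass(frozen=True)
class StoreOptions:
    """Hypothetical group: options for the backend store."""
    group: Optional[str] = None
    autoclose: bool = False


def route(
    coder_opts: CoderOptions = CoderOptions(),
    open_opts: OpenOptions = OpenOptions(),
    backend_opts: BackendOptions = BackendOptions(),
    store_opts: StoreOptions = StoreOptions(),
) -> Dict[str, Dict[str, Any]]:
    """Turn each group into a plain kwargs dict for its own consumer only."""
    return {
        "coders": asdict(coder_opts),
        "opener": asdict(open_opts),
        "xarray": asdict(backend_opts),
        "store": asdict(store_opts),
    }
```

Using frozen (immutable) instances also makes them safe to share as defaults and to reuse across many calls.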

define these classes in BackendEntrypoint and override them in the subclasses.
for now, only for the netcdf4/h5netcdf backends
implement the logic in open_dataset
implement the logic in to_netcdf
for backwards compatibility, re-initialize the above options from any plain kwargs that are given
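A minimal sketch of the last two points, assuming the default option classes live as class attributes on the entrypoint. `Entrypoint`, `NetCDF4Entrypoint`, the option classes and `resolve` are hypothetical stand-ins, not xarray's `BackendEntrypoint` API. A subclass swaps in its own options class, and legacy keyword arguments are folded into the matching group with `dataclasses.replace`.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


@dataclass(frozen=True)
class StoreOptions:
    group: Optional[str] = None


@dataclass(frozen=True)
class NetCDF4StoreOptions(StoreOptions):
    # hypothetical backend-specific extension of the generic store options
    autoclose: bool = False


class Entrypoint:
    """Hypothetical stand-in for BackendEntrypoint holding default option classes."""
    coder_options = CoderOptions
    store_options = StoreOptions


class NetCDF4Entrypoint(Entrypoint):
    # the subclass overrides only the group it extends
    store_options = NetCDF4StoreOptions


def resolve(opts_cls: Type[T], opts: Optional[T], kwargs: Dict[str, Any]) -> T:
    """Fold legacy keyword arguments into an options dataclass.

    Keys that are fields of ``opts_cls`` are popped from ``kwargs`` and
    override the values in ``opts`` (or the defaults if ``opts`` is None).
    """
    base = opts if opts is not None else opts_cls()
    names = {f.name for f in fields(opts_cls)}
    legacy = {key: kwargs.pop(key) for key in list(kwargs) if key in names}
    return replace(base, **legacy)
```

Keywords that no group claims stay in `kwargs`, so the caller could raise a `TypeError` for them instead of forwarding them blindly.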

Example usage:

import xarray as xr

# simple call, use backend default options
ds = xr.open_dataset("test.nc", engine="netcdf4")
# define once, use many times; these classes should be imported from the backend
open_opts = NetCDF4OpenOptions(auto_complex=True)
coder_opts = NetCDF4CoderOptions(decode_times=False, mask_and_scale=False)
backend_opts = XarrayBackendOptions(chunk={"time": 10})
store_opts = NetCDF4StoreOptions(group="test")
# engine could also be the `BackendEntrypoint`
ds = xr.open_dataset(
    "test.nc",
    engine="netcdf4",
    open_opts=open_opts,
    coder_opts=coder_opts,
    backend_opts=backend_opts,
    store_opts=store_opts,
)

CONS:

Most users might not need these options at all and could simply fall back to the current behaviour
Users might complain about the additional complexity of setting up the dataclasses
tbc.

PROS:

strict separation of kwargs/options
easy forwarding
per-backend kwargs/options
easy addition of new kwargs/options
tbc.
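To illustrate the "strict separation" point, a small sketch contrasting a hypothetical opener that forwards `**kwargs` unchecked with a dataclass, which rejects a misspelled option as soon as it is constructed. Both `open_with_kwargs` and `CoderOptions` are illustrative only and do not mirror xarray's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


def open_with_kwargs(path: str, **kwargs: Any) -> Dict[str, Any]:
    # hypothetical opener that passes every keyword straight on,
    # so a typo only surfaces (if at all) somewhere deep inside a backend
    return dict(kwargs)


@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


def misspelled_option_error() -> Optional[TypeError]:
    """Return the TypeError raised for a misspelled option, or None."""
    try:
        CoderOptions(decode_time=False)  # typo: should be "decode_times"
    except TypeError as err:
        return err
    return None
```

Static type checkers and IDE completion also know the fields of a dataclass, which is not the case for an open-ended `**kwargs`.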

What this PR still needs to do:

implement everything above for the other built-in backends (zarr, scipy, pydap, etc.)

I have follow-up ideas:

implement save_dataset in BackendEntrypoint to write to the engine's native format, i.e. to_netcdf for scipy/netcdf4/h5netcdf and to_zarr for zarr. That would give us a unified API for writing, something like:

ds = xr.open_dataset("test.nc", engine="netcdf4")
# Dataset API
ds.save_dataset("test.zarr", engine="zarr")
ds.save_dataset("test2.nc", engine="netcdf4")
# general API
xr.save_dataset(ds, "test2.nc", engine="netcdf4")
ds.save_dataset("test.grib", engine="grib")  # hypothetical engine
ds.save_dataset("test.hdf5", engine="hdf5")  # hypothetical engine
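One way such a unified writing API could dispatch is a registry keyed by engine name. This is a hypothetical sketch: `register_writer`, `_WRITERS` and the writer functions are made up for illustration, and the writers only return a description instead of writing files. In the proposal itself, each `BackendEntrypoint` would provide its own `save_dataset` rather than a module-level registry.

```python
from typing import Callable, Dict

# hypothetical registry mapping an engine name to a writer function
_WRITERS: Dict[str, Callable[[object, str], str]] = {}


def register_writer(engine: str):
    """Decorator that registers ``func`` as the writer for ``engine``."""
    def decorator(func: Callable[[object, str], str]) -> Callable[[object, str], str]:
        _WRITERS[engine] = func
        return func
    return decorator


@register_writer("netcdf4")
def _write_netcdf(ds: object, path: str) -> str:
    # a real writer would call something like ds.to_netcdf(path, engine="netcdf4")
    return f"netcdf4 -> {path}"


@register_writer("zarr")
def _write_zarr(ds: object, path: str) -> str:
    # a real writer would call something like ds.to_zarr(path)
    return f"zarr -> {path}"


def save_dataset(ds: object, path: str, engine: str) -> str:
    """Dispatch to the writer registered for ``engine``."""
    try:
        writer = _WRITERS[engine]
    except KeyError:
        raise ValueError(f"no writer registered for engine {engine!r}") from None
    return writer(ds, path)
```

Third-party backends could register their own engines the same way, so a `grib` or `hdf5` writer would only need to plug in.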

further disentangle the current built-in backends from xarray so that each could live in its own module

I'm sure I have not taken into account all the possible pitfalls/problems which might arise here. I'd appreciate any comments and suggestions.


kmuehlbauer · 2025-06-20T06:33:37Z

Please have a look at #10429, where I've split out the cf coder related kwargs grouping.

keewis · 2025-07-02T20:57:12Z

To summarize what I argued for after the end of the meeting today, I think we should slowly transition to an API where we pass the entire decoding chain as a sequence of functions / callable objects into xr.open_dataset, which would be executed in the order they were passed. Additionally, backends should have the option to disable certain builtin coders (this is especially important when encoding).
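A toy sketch of an ordered decoding chain, assuming variables are plain dicts and coders are ordinary functions (this is not xarray's coder API). It shows that the result depends on the order of the chain, and how a backend could switch off individual coders by name.

```python
from typing import Callable, Iterable, Sequence

Coder = Callable[[dict], dict]


def apply_scale_factor(var: dict) -> dict:
    # toy stand-in for a CF coder: multiply by scale_factor and drop the attribute
    attrs = dict(var["attrs"])
    factor = attrs.pop("scale_factor", 1)
    return {"data": [x * factor for x in var["data"]], "attrs": attrs}


def apply_add_offset(var: dict) -> dict:
    # toy stand-in for a CF coder: add add_offset and drop the attribute
    attrs = dict(var["attrs"])
    offset = attrs.pop("add_offset", 0)
    return {"data": [x + offset for x in var["data"]], "attrs": attrs}


def decode(var: dict, chain: Sequence[Coder], disabled: Iterable[str] = ()) -> dict:
    """Apply each coder in ``chain`` in the given order, skipping disabled ones."""
    skip = set(disabled)
    for coder in chain:
        if coder.__name__ in skip:
            continue
        var = coder(var)
    return var
```

For CF data the scale factor has to be applied before the offset (decoded = raw * scale_factor + add_offset), which is why the order would be part of the API rather than an implementation detail.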

This would require a lot of thought to figure out a good API, and even more to find a good way to transition towards that. I think this would make extending the coders a lot easier, and possibly pave the way towards dataset coders (or rather, multi-variable coders).

I think it might be possible to change the dataclass added in this PR to act as a bridge towards the idea in #4490 (comment) (which should probably be extended to allow other libraries / backends to modify that chain).

kmuehlbauer and others added 4 commits June 14, 2025 15:04

WIP: use dataclasses for combining keyword arguments to reduce signature footprint of open_dataset

a7e096e

WIP

fa55f2e

WIP: to_netcdf

7e45379

clean up

edcc10c

github-actions bot added topic-backends topic-documentation topic-CF conventions io labels Jun 14, 2025

kmuehlbauer mentioned this pull request Jun 18, 2025

Group decoding options into single argument #10429

Open

4 tasks

kmuehlbauer mentioned this pull request Jul 2, 2025

Shaping the future of Backends #8548

Open
