write behavior for empty chunks #2015

@d-v-b

In v2, it is possible to set at array access time whether empty chunks (defined as chunks that consist entirely of fill_value) should be written to storage or skipped. This is an extremely useful feature for high-latency storage backends, or in any context where having too many objects in storage is burdensome.
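
To make the definition concrete, here is a minimal sketch of the "skip empty chunks" check. It is a toy, not zarr code: the store is a plain dict, chunks are flat Python lists, and the names `is_empty_chunk` and `write_chunk` are hypothetical. Deleting a stale object when an empty chunk is skipped is an assumption of this sketch, made so that reads can fall back to fill_value.

```python
from typing import Any, MutableMapping, Sequence


def is_empty_chunk(chunk: Sequence[Any], fill_value: Any) -> bool:
    """Return True when every element of the chunk equals fill_value.

    Note: a NaN fill_value would need special handling, since NaN != NaN.
    """
    return all(v == fill_value for v in chunk)


def write_chunk(
    store: MutableMapping[str, list],
    key: str,
    chunk: Sequence[Any],
    fill_value: Any,
    write_empty_chunks: bool = True,
) -> None:
    """Write a chunk to the store, optionally skipping empty chunks."""
    if not write_empty_chunks and is_empty_chunk(chunk, fill_value):
        # Assumption of this sketch: drop any stale object so that a
        # reader sees a missing key and falls back to fill_value.
        store.pop(key, None)
        return
    store[key] = list(chunk)


store: dict = {}
write_chunk(store, "c/0", [0, 0, 0], fill_value=0, write_empty_chunks=False)
write_chunk(store, "c/1", [0, 7, 0], fill_value=0, write_empty_chunks=False)
write_chunk(store, "c/2", [0, 0, 0], fill_value=0, write_empty_chunks=True)
```

With these calls, only "c/1" and "c/2" end up in the store; "c/0" is skipped because it is entirely fill_value and write_empty_chunks is off.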

We don't support this in v3 yet, but we should. How should we do it? I will throw out a few options in order of practicality:

  • emulate v2: provide a keyword argument like write_empty_chunks when accessing an array. All chunk writes from that array will be affected.
  • put the write_empty_chunks setting in a global config. All chunk writes from all arrays in a session will be affected by the config parameter.
  • design an API for array IO wherein IO is wrapped in a context that can be parametrized, e.g. with a context manager, and one of those parameters is the write_empty_chunks-ness of the write transaction. Highly speculative.
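
To compare the three options side by side, here is a rough sketch of how each could look from the caller's side. Everything in it is hypothetical and for illustration only: `ToyArray`, `GLOBAL_CONFIG`, and `io_options` are made-up names, the store is a plain dict, and none of this reflects an actual or planned zarr API.

```python
from contextlib import contextmanager

# Option 2: a process-wide setting (hypothetical name, mutable global state).
GLOBAL_CONFIG = {"write_empty_chunks": True}


class ToyArray:
    """Toy stand-in for an array handle; chunks are flat lists in a dict."""

    def __init__(self, store, fill_value=0, write_empty_chunks=None):
        self.store = store
        self.fill_value = fill_value
        # Option 1: per-array keyword argument set at access time.
        # None means "defer to the global config" (an assumption of this sketch).
        self.write_empty_chunks = write_empty_chunks

    def _should_write_empty(self):
        if self.write_empty_chunks is not None:
            return self.write_empty_chunks
        return GLOBAL_CONFIG["write_empty_chunks"]

    def write_chunk(self, key, chunk):
        empty = all(v == self.fill_value for v in chunk)
        if empty and not self._should_write_empty():
            self.store.pop(key, None)  # skip the write, drop any stale object
            return
        self.store[key] = list(chunk)

    # Option 3: IO wrapped in a parametrized context.
    @contextmanager
    def io_options(self, write_empty_chunks):
        previous = self.write_empty_chunks
        self.write_empty_chunks = write_empty_chunks
        try:
            yield self
        finally:
            # Restore the previous setting even if the write raised.
            self.write_empty_chunks = previous


store = {}

# Option 1: the setting is fixed when the array is accessed.
arr = ToyArray(store, fill_value=0, write_empty_chunks=False)
arr.write_chunk("0", [0, 0])  # skipped

# Option 3: the same array writes empty chunks inside one transaction only.
with arr.io_options(write_empty_chunks=True):
    arr.write_chunk("1", [0, 0])  # written
arr.write_chunk("2", [0, 0])  # skipped again

# Option 2: arrays without an explicit setting follow the global config.
GLOBAL_CONFIG["write_empty_chunks"] = False
other = ToyArray(store, fill_value=0)
other.write_chunk("3", [0, 0])  # skipped
```

In this toy, the context manager mutates per-array state, so it is not safe for concurrent writers; a real design would need to scope the option to the write itself.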

The first option seems pretty expedient, and I don't think we had many problems with this approach in v2. The only drawback is that if people want the same array to exhibit conditional write_empty_chunks behavior, then they might need something like the second approach, which has its own drawbacks IMO (I'm not a big fan of mutable global state).

I would propose that we emulate v2 for now (i.e., make write_empty_chunks a keyword argument to array access) and note any friction this causes, and consider ways to alleviate that in a subsequent design refresh if the friction is severe.

cc @constantinpape
