feat: subchunk write order by ilan-gold · Pull Request #160 · zarrs/zarrs-python

ilan-gold · 2026-03-22T15:32:47Z

I could not find a clean way to do Literal -> Enum in PyO3 despite extensive googling
Maybe @d-v-b we standardize where this option goes since it applies to everyone? Happy to contribute this upstream to zarr-python (including C + random write order)
Major release after this?

ilan-gold · 2026-03-22T15:37:19Z

src/lib.rs

    #[pyo3(signature = (
        array_metadata,
        store_config,
        *,
        validate_checksums=false,
        chunk_concurrent_minimum=None,
        chunk_concurrent_maximum=None,
        num_threads=None,
        direct_io=false,
+        subchunk_write_order=SubchunkWriteOrderWrapper(SubchunkWriteOrder::Random),


I basically just got this working with the compiler but is it used since we have stubgen? I found https://pyo3.rs/v0.28.2/function/signature#type-annotations-in-the-signature which might be interesting

Neat, yeah that’s new! We could try migrating to it in a different PR.

ilan-gold · 2026-03-23T00:14:36Z

A complication: this only applies to the outer sharding codec.

Also maybe this isn't a pipeline setting in zarr-python but a codec setting (even though I think a lot of people just use shards= for sharding). Not sure here....

LDeakin · 2026-03-23T00:27:39Z

src/utils.rs

+
+impl pyo3_stub_gen::PyStubType for SubchunkWriteOrderWrapper {
+    fn type_output() -> pyo3_stub_gen::TypeInfo {
+        pyo3_stub_gen::TypeInfo::builtin("str")


Perhaps a bit cleaner for type hints: what about pyo3_stub_gen::TypeInfo::with_module("zarrs.SubchunkWriteOrder", "zarrs".into()) with

    class SubchunkWriteOrder(StrEnum):
        C = "C"
        random = "random"

write_order: SubchunkWriteOrder would only allow passing an enum member and not a regular string then, right?

So users couldn’t do write_order="C" according to Python’s type system, they’d have to do write_order=SubchunkWriteOrder.C

nit, but I find "lexicographic" a bit nicer and less NumPy-specific than "C"

> So users couldn’t do write_order="C" according to Python’s type system, they’d have to do write_order=SubchunkWriteOrder.C

I didn't even try to check if the zarr.config accepts arbitrary objects but I think in theory one could set these via environment variables as well?

> nit, but I find "lexicographic" a bit nicer and less NumPy-specific than "C"

Wow I didn't realize C was numpy-specific!
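To make the enum-versus-string question in this thread concrete, here is a minimal Python sketch. `SubchunkWriteOrder`, `WriteOrderLike`, and `normalize_write_order` are hypothetical names used only for illustration and are not part of zarrs-python. The enum is written as `(str, Enum)` so it also runs on Python versions without `StrEnum`; for the checks shown, `StrEnum` should behave the same way.

```python
from enum import Enum
from typing import Literal, Union


# Hypothetical Python-side enum mirroring the suggestion in this thread.
# Written as (str, Enum) so it also runs before Python 3.11.
class SubchunkWriteOrder(str, Enum):
    C = "C"
    random = "random"


# Hypothetical annotation that lets a type checker accept both spellings:
# an enum member or one of the plain string literals.
WriteOrderLike = Union[SubchunkWriteOrder, Literal["C", "random"]]


def normalize_write_order(value: WriteOrderLike) -> SubchunkWriteOrder:
    """Coerce a plain string or an enum member to an enum member."""
    # Enum lookup by value; raises ValueError for any other input.
    return SubchunkWriteOrder(value)


# A plain string and an enum member both end up as the same member.
print(normalize_write_order("C") is SubchunkWriteOrder.C)
# Members still compare equal to their string value at runtime.
print(normalize_write_order(SubchunkWriteOrder.random) == "random")
```

Annotating a parameter with the enum alone is what would make a type checker reject `write_order="C"`; a union like the one above is one possible way to keep both spellings valid.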

d-v-b · 2026-03-23T07:53:58Z

> Also maybe this isn't a pipeline setting in zarr-python but a codec setting (even though I think a lot of people just use shards= for sharding). Not sure here....

My feeling is that in zarr-python today the sharding codec owns the subchunk order, and that this setting should be passed to the sharding codec constructor. But why does this only apply to the outer sharding codec? The subchunk order is a degree of freedom for inner sharding codecs too, no?

ilan-gold · 2026-03-23T09:21:15Z

> But why does this only apply to the outer sharding codec?

Yes, in zarrs this is no problem: the option gets put in the codec, which can be nested.

However, since we have just been making everything a runtime config setting here, I then thought "oh, this could be done in zarr-python as well in zarr.config." But then I (re)realized last night that this is a codec setting, as I had literally just implemented in Rust. And therefore it applies per (potentially nested) codec, which I forgot in the course of making this PR.

So I think that since we have a way of grabbing the codecs from the ArrayMetadata, a codec setting is the way to go, you're right. And then zarrs-python is responsible for handling that setting.
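As a rough illustration of "grabbing the codecs from the ArrayMetadata", the sketch below walks a Zarr v3 style `codecs` list and yields every `sharding_indexed` codec together with its nesting depth. The helper name `iter_sharding_codecs` and the example metadata are assumptions made for this sketch; it does not assume any particular metadata key for the write order, it only shows where a per-codec setting would have to be looked up.

```python
from typing import Any, Dict, Iterator, List, Tuple

SHARDING = "sharding_indexed"  # codec name used by the Zarr v3 sharding codec


def iter_sharding_codecs(
    codecs: List[Dict[str, Any]], depth: int = 0
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, configuration) for each sharding codec, outermost first."""
    for codec in codecs:
        if codec.get("name") == SHARDING:
            config = codec["configuration"]
            yield depth, config
            # A sharding codec carries its own inner codec chain, which may
            # itself contain another sharding codec.
            yield from iter_sharding_codecs(config.get("codecs", []), depth + 1)


# Illustrative metadata only: a shard whose inner chunks are shards again.
bytes_codec = {"name": "bytes", "configuration": {"endian": "little"}}
codecs = [
    {
        "name": SHARDING,
        "configuration": {
            "chunk_shape": [64, 64],
            "codecs": [
                {
                    "name": SHARDING,
                    "configuration": {
                        "chunk_shape": [8, 8],
                        "codecs": [bytes_codec],
                        "index_codecs": [bytes_codec],
                    },
                }
            ],
            "index_codecs": [bytes_codec],
        },
    }
]

found = [(depth, cfg["chunk_shape"]) for depth, cfg in iter_sharding_codecs(codecs)]
print(found)
```

A single pipeline-wide option has nowhere to express a different choice for depth 0 and depth 1, which is the complication described above.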

d-v-b · 2026-03-23T09:28:41Z

what does "random" mean? is there a randomness guarantee, or is it just that there is no order guarantee?

ilan-gold · 2026-03-23T09:40:40Z

> what does "random" mean? is there a randomness guarantee, or is it just that there is no order guarantee?

That's a good point, "no order guarantee" I would say. This is effectively random in zarrs due to the threading, so I guess that was the naming convention, but I am not sure what happens with only one thread.

d-v-b · 2026-03-23T09:44:13Z

If the order is sensitive to execution time, then subchunks that take more time to process (e.g., they are harder to compress, or just larger if we get a rectilinear chunk grid inside shards) will tend to get stored "toward the back" of the chunk. In this case, "random" might mislead people into thinking the order is really random.
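The effect described here can be sketched with a small deterministic simulation. This is a simplified model and not how zarrs schedules work: it assumes each worker takes the next subchunk in index order as soon as it is free, and that subchunks are stored in the order they finish. `completion_order` and the cost values are made up for illustration.

```python
import heapq
from typing import List


def completion_order(costs: List[float], num_workers: int) -> List[int]:
    """Return subchunk indices in the order they finish processing."""
    # Heap of (time the worker becomes free, worker id).
    workers = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(workers)
    finished = []  # (finish time, subchunk index)
    for index, cost in enumerate(costs):
        # The earliest-free worker picks up the next subchunk.
        free_at, worker = heapq.heappop(workers)
        done_at = free_at + cost
        finished.append((done_at, index))
        heapq.heappush(workers, (done_at, worker))
    finished.sort()
    return [index for _, index in finished]


# Subchunks 1 and 4 are "hard to compress" and take five times longer.
costs = [1.0, 5.0, 1.0, 1.0, 5.0, 1.0]
print(completion_order(costs, num_workers=2))  # slow subchunks drift backwards
print(completion_order(costs, num_workers=1))  # one worker keeps index order
```

In this model the order is fully determined by the costs, so it is better described as "no order guarantee" than as random, and with a single worker it degenerates to index order.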

ilan-gold · 2026-03-23T09:54:00Z

Great point @d-v-b - random is definitely a misnomer then.

flying-sheep · 2026-03-23T10:56:04Z

src/utils.rs

 }
+
+#[derive(Debug, Clone)]
+pub struct SubchunkWriteOrderWrapper(pub SubchunkWriteOrder);


So this is required because of the orphan rule, right? SubchunkWriteOrder is not our type and IntoPyObject isn’t either. Did you try implementing SubchunkWriteOrderExt instead for which you then implement IntoPyObject? The orphan rule probably prevents that too, right?

> So this is required because of the orphan rule, right? SubchunkWriteOrder is not our type and IntoPyObject isn’t either.

So went my thinking, although I think IntoPyObject not being our trait has nothing to do with this - it's just that we don't create SubchunkWriteOrder here

> Did you try implementing SubchunkWriteOrderExt instead for which you then implement IntoPyObject?

Is that different than what we have here?

yeah, it’d be a trait instead of a type, which would let us use the unwrapped type. But as said, probably doesn’t work.

flying-sheep · 2026-03-23T10:58:50Z

README.md

  - Defaults to `False`.
 - `codec_pipeline.strict`: raise exceptions for unsupported operations instead of falling back to the default codec pipeline of `zarr-python`.
  - Defaults to `False`.
+- `codec_pipeline.subchunk_write_order`: Tells `zarrs` in what order to write subchunks within a shard. One of "C" or "random."


maybe add a bit of reasoning why one would be chosen over the other.

Co-authored-by: Philipp A. <flying-sheep@web.de>

flying-sheep · 2026-03-23T11:11:27Z

Regarding "random": maybe "arbitrary" would work instead? Or just None if we don’t want to reserve that to mean “the default” or “auto-derive” or so?

We don’t have to keep (and therefore explain in the readme) the same name as zarrs if we agree it’s a misnomer.

ilan-gold · 2026-03-23T11:16:34Z

> We don’t have to keep (and therefore explain in the readme) the same name as zarrs if we agree it’s a misnomer.

Agreed, but I'd like to figure out what zarr-python is going to do. There is not much rush on this feature so I'm happy to take some time to get it right.

ilan-gold · 2026-03-23T15:43:09Z

src/utils.rs

 impl pyo3_stub_gen::PyStubType for SubchunkWriteOrderWrapper {
    fn type_output() -> pyo3_stub_gen::TypeInfo {
-        pyo3_stub_gen::TypeInfo::builtin("str")
+        pyo3_stub_gen::TypeInfo::with_module("typing.Literal['C', 'random']", "typing".into())


the parameter is called name, so it’s not clear that it can accept any code.
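For reference, this is roughly what the `Literal` annotation in the generated stub means on the Python side. The alias `WriteOrder` and the helper `check_write_order` are hypothetical and only illustrate that the allowed strings can be read back from the type with `typing.get_args`; nothing here is taken from the zarrs-python code.

```python
from typing import Literal, get_args

# Hypothetical alias matching the type string emitted in the stub above.
WriteOrder = Literal["C", "random"]


def check_write_order(value: str) -> str:
    """Return value unchanged if it is one of the allowed literals."""
    allowed = get_args(WriteOrder)  # the tuple of literal values
    if value not in allowed:
        raise ValueError(
            f"subchunk_write_order must be one of {allowed}, got {value!r}"
        )
    return value


print(check_write_order("random"))
```

A type checker rejects other strings statically, while a runtime check like this one would still be needed for values that arrive as untyped strings.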

ilan-gold added 4 commits March 21, 2026 11:22

feat: subchunk write order

5ef4b1b

chore: test

746f39c

chore: deprecated calls

2a59fc5

fix: not unreachable

28cce28

ilan-gold requested review from LDeakin and flying-sheep March 22, 2026 15:34

ilan-gold commented Mar 22, 2026

LDeakin reviewed Mar 23, 2026

flying-sheep reviewed Mar 23, 2026

flying-sheep approved these changes Mar 23, 2026

ilan-gold and others added 4 commits March 23, 2026 12:00

Update README.md

037db6e

Co-authored-by: Philipp A. <flying-sheep@web.de>

fmt

1338e49

chore: explanation in readme

46506a5

clippy

0ff7867

fix type

4dad7bb

ilan-gold commented Mar 23, 2026

ilan-gold mentioned this pull request Mar 24, 2026

feat: subchunk write order zarr-developers/zarr-python#3826

Open

6 tasks


4 participants
