```rust
#[pyo3(signature = (
    array_metadata,
    store_config,
    *,
    validate_checksums=false,
    chunk_concurrent_minimum=None,
    chunk_concurrent_maximum=None,
    num_threads=None,
    direct_io=false,
    subchunk_write_order=SubchunkWriteOrderWrapper(SubchunkWriteOrder::Random),
```
I basically just got this working with the compiler, but is it used since we have stubgen? I found https://pyo3.rs/v0.28.2/function/signature#type-annotations-in-the-signature, which might be interesting.
Neat, yeah that’s new! We could try migrating to it in a different PR.
A complication: this only applies to the outer sharding codec. Also maybe this isn't a pipeline setting in
src/utils.rs
```rust
impl pyo3_stub_gen::PyStubType for SubchunkWriteOrderWrapper {
    fn type_output() -> pyo3_stub_gen::TypeInfo {
        pyo3_stub_gen::TypeInfo::builtin("str")
```
Perhaps a bit cleaner for type hints: what about `pyo3_stub_gen::TypeInfo::with_module("zarrs.SubchunkWriteOrder", "zarrs".into())` with

```python
class SubchunkWriteOrder(StrEnum):
    C = "C"
    random = "random"
```
`write_order: SubchunkWriteOrder` would only allow passing an enum member and not a regular string then, right?
So users couldn't do `write_order="C"` according to Python's type system; they'd have to do `write_order=SubchunkWriteOrder.C`.
Nit, but I find "lexicographic" a bit nicer and less NumPy-specific than "C".
> So users couldn't do `write_order="C"` according to Python's type system; they'd have to do `write_order=SubchunkWriteOrder.C`

I didn't even try to check whether `zarr.config` accepts arbitrary objects, but I think in theory one could set these via environment variables as well?
> nit but i find "lexicographic" a bit nicer and less NumPy-specific than "C"

Wow, I didn't realize "C" was NumPy-specific!
My feeling is that in `zarr-python` today the sharding codec owns the subchunk order, and that this setting should be passed to the sharding codec constructor. But why does this only apply to the outer sharding codec? The subchunk order is a degree of freedom for inner sharding codecs too, no?
Yes in However, since we have just been making everything a runtime config setting here, I then thought "oh, this could be done in So I think that since we have a way of grabbing the codecs from the
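On the outer/inner question: in Zarr v3 metadata, the `codecs` list of a `sharding_indexed` codec can itself contain another `sharding_indexed` codec, so each nesting level has its own subchunk order. The sketch below is plain Python over the JSON metadata and lists every sharding level. `find_sharding_codecs` is a hypothetical helper, not part of `zarrs` or `zarr-python`:

```python
from typing import Any


def find_sharding_codecs(
    codecs: list[dict[str, Any]], depth: int = 0
) -> list[tuple[int, tuple[int, ...]]]:
    """Return (nesting depth, subchunk shape) for every sharding codec, outermost first."""
    found: list[tuple[int, tuple[int, ...]]] = []
    for codec in codecs:
        if codec.get("name") == "sharding_indexed":
            config = codec["configuration"]
            found.append((depth, tuple(config["chunk_shape"])))
            # The inner codec chain may itself contain another sharding codec.
            found.extend(find_sharding_codecs(config.get("codecs", []), depth + 1))
    return found


BYTES = {"name": "bytes", "configuration": {"endian": "little"}}

# A shard split into 16x16 subchunks, each of which is itself sharded into 4x4 subchunks.
codecs = [
    {
        "name": "sharding_indexed",
        "configuration": {
            "chunk_shape": [16, 16],
            "codecs": [
                {
                    "name": "sharding_indexed",
                    "configuration": {
                        "chunk_shape": [4, 4],
                        "codecs": [BYTES],
                        "index_codecs": [BYTES],
                        "index_location": "end",
                    },
                }
            ],
            "index_codecs": [BYTES],
            "index_location": "end",
        },
    }
]

levels = find_sharding_codecs(codecs)
```

As noted earlier in the thread, a single pipeline-level setting would only reach the depth-0 (outermost) codec here. The depth-1 codec would keep whatever order it defaults to.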
What does "random" mean? Is there a randomness guarantee, or is it just that there is no order guarantee?

That's a good point; "no order guarantee", I would say. This is effectively random in
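The write order can vary because readers find each subchunk through the shard index, which stores an (offset, nbytes) pair per subchunk, rather than by physical position. This sketch assumes a 1-D shard and uses a plain dict instead of the encoded index defined by the sharding spec, with no index codecs or checksums. `write_shard` and `read_chunk` are hypothetical helpers:

```python
def write_shard(chunks: dict[int, bytes], order: list[int]) -> tuple[bytes, dict[int, tuple[int, int]]]:
    """Concatenate subchunks in `order`, recording (offset, nbytes) for each subchunk key."""
    body = bytearray()
    index: dict[int, tuple[int, int]] = {}
    for key in order:
        data = chunks[key]
        index[key] = (len(body), len(data))
        body += data
    return bytes(body), index


def read_chunk(shard: bytes, index: dict[int, tuple[int, int]], key: int) -> bytes:
    """Look up a subchunk by key; its physical position in the shard is irrelevant."""
    offset, nbytes = index[key]
    return shard[offset:offset + nbytes]


chunks = {0: b"aa", 1: b"bbbb", 2: b"c", 3: b"ddd"}
shard_lex, index_lex = write_shard(chunks, [0, 1, 2, 3])    # lexicographic
shard_other, index_other = write_shard(chunks, [3, 1, 0, 2])  # any other order
```

The two shards differ byte for byte, but every subchunk reads back identically through its own index.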
If the order is sensitive to execution time, then subchunks that take more time to process (e.g., they are harder to compress, or just larger if we get a rectilinear chunk grid inside shards) will tend to get stored "toward the back" of the chunk. In this case, "random" might mislead people into thinking the order is really random.

Great point @d-v-b: "random" is definitely a misnomer, then.
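Here is a toy model of the effect described above. It assumes subchunks are submitted to a worker pool in lexicographic order and written as soon as each one finishes. `completion_order` is a hypothetical helper, and the integer durations stand in for encode time. Real runtimes are nondeterministic, so this only shows the tendency:

```python
import heapq


def completion_order(durations: list[int], workers: int) -> list[int]:
    """Toy scheduler: each subchunk starts, in index order, on the first idle worker.
    Return subchunk indices sorted by the time they finish."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    finishes = []
    for index, duration in enumerate(durations):
        start = heapq.heappop(free_at)  # earliest idle worker
        end = start + duration
        heapq.heappush(free_at, end)
        finishes.append((end, index))
    # Ties are broken by index so the result is deterministic.
    return [index for _, index in sorted(finishes)]


# Subchunk 0 is "hard to compress" (5 time units); the others take 1 unit each.
write_order = completion_order([5, 1, 1, 1], workers=2)
```

Subchunk 0 is submitted first but lands last (`[1, 2, 3, 0]`). The layout is therefore not random, just unspecified.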
```rust
}

#[derive(Debug, Clone)]
pub struct SubchunkWriteOrderWrapper(pub SubchunkWriteOrder);
```
So this is required because of the orphan rule, right? `SubchunkWriteOrder` is not our type and `IntoPyObject` isn't either. Did you try implementing a `SubchunkWriteOrderExt` trait instead, for which you then implement `IntoPyObject`? The orphan rule probably prevents that too, right?
> So this is required because of the orphan rule right? `SubchunkWriteOrder` is not our type and `IntoPyObject` isn't either.

That was my thinking, although I think `IntoPyObject` not being our trait has nothing to do with this; it's just that we don't define `SubchunkWriteOrder` here.

> Did you try implementing `SubchunkWriteOrderExt` instead for which you then implement `IntoPyObject`?

Is that different from what we have here?
Yeah, it'd be a trait instead of a type, which would let us use the unwrapped type. But as I said, it probably doesn't work.
README.md
```markdown
- Defaults to `False`.
- `codec_pipeline.strict`: raise exceptions for unsupported operations instead of falling back to the default codec pipeline of `zarr-python`.
- Defaults to `False`.
- `codec_pipeline.subchunk_write_order`: Tells `zarrs` in what order to write subchunks within a shard. One of "C" or "random."
```
Maybe add a bit of reasoning about why one would be chosen over the other.
Co-authored-by: Philipp A. <flying-sheep@web.de>
Regarding We don't have to keep (and therefore explain in the README) the same name as

Agreed, but I'd like to figure out what
```diff
 impl pyo3_stub_gen::PyStubType for SubchunkWriteOrderWrapper {
     fn type_output() -> pyo3_stub_gen::TypeInfo {
-        pyo3_stub_gen::TypeInfo::builtin("str")
+        pyo3_stub_gen::TypeInfo::with_module("typing.Literal['C', 'random']", "typing".into())
```
The parameter is called `name`, so it's not clear that it can accept any code.
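If the stub advertises `typing.Literal['C', 'random']`, the Python side can use the same set of values for runtime validation. In this minimal sketch, `SubchunkWriteOrderName` and `validate_write_order` are hypothetical names and not part of `zarrs`:

```python
from typing import Literal, cast, get_args

# Hypothetical alias mirroring the generated stub type.
SubchunkWriteOrderName = Literal["C", "random"]


def validate_write_order(value: str) -> SubchunkWriteOrderName:
    """Reject anything outside the Literal's values, e.g. a typo in a config string."""
    allowed = get_args(SubchunkWriteOrderName)  # ("C", "random")
    if value not in allowed:
        raise ValueError(f"subchunk_write_order must be one of {allowed!r}, got {value!r}")
    return cast(SubchunkWriteOrderName, value)
```

Unlike the `StrEnum` option, this keeps plain strings as the only public type. If the values are renamed (e.g. "C" to "lexicographic"), only the `Literal` needs to change.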
`Literal` -> `Enum` in PyO3 despite extensive googling
`zarr-python` (including `C` + `random` write order)