Experimental Support for Subarray DTypes #3587
Open
+717
−33
@d-v-b
This PR adds experimental support for subarray dtypes (https://numpy.org/doc/stable/glossary.html#term-subarray-data-type, https://numpy.org/doc/stable/user/basics.rec.html#structured-datatype-creation) and closes #3582 and #3583.
It also fixes support for nested (and subarray-containing) `Structured` dtypes for Zarr v2, which worked in 2.18.* but no longer works in 3.1.*. In particular, the buggy implementation forgot that a nested structured dtype is again a list of lists and not just a single flat list.

Note 1: Subarray dtypes are in a very weird spot. They are a proper `np.dtype`, in particular an `np.VoidDType` with an unset `fields` attribute but a set `subdtype` attribute. Hence, it makes sense to map them one-to-one to a `ZDType`. This also makes sense from an implementation standpoint with respect to serialization.

On the other hand, they do not have a proper scalar value, i.e. one cannot create an `np.void` scalar for a subarray dtype (it throws). Conceptually, a scalar value of a subarray dtype would be an `np.ndarray`. This, however, is not a subtype of `np.generic`, despite sharing a lot of the interface. When one creates an `np.ndarray` with a subarray dtype directly, the result is a "flat" `np.ndarray` with shape `array_shape + subarray_shape`.

I've decided to still implement them as a separate `Subarray`-`ZDType` and not conflate them within the `Structured` class. While this works flawlessly when used within a structured dtype, which is the intended use case, using them directly is not fully supported. Specifically, there is no specification for standalone subarray dtypes in Zarr V2, making a lot of test cases fail. Apart from that, some tests in test_array.py do not expect an array as a scalar and hence fail. I want to stress, though, that I was able to successfully create and read a Subarray zarr array with V3.

Solving this conundrum adequately is beyond my possibilities and might require significant conceptual changes in Zarr. I did not add the dtype directly to `test_dtype/conftest.py` but instead added a new test case for `Structured` that uses a Subarray inside, which passes.

Note 2: I've also added a test case for an invalid float value string, which fails due to #3584. Since that test case highlights an existing bug, I've decided to leave it there.
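For reviewers who want to reproduce the NumPy behaviour that Note 1 relies on, here is a minimal sketch using only NumPy. It does not touch the Zarr code in this PR. The variable names (`sub`, `rec`, `nested`) are made up for illustration.

```python
import numpy as np

# A standalone subarray dtype: base type int32, subarray shape (2, 3).
sub = np.dtype((np.int32, (2, 3)))
print(sub.fields)    # no named fields on a subarray dtype
print(sub.subdtype)  # (base dtype, subarray shape)

# Creating an array directly with a subarray dtype "flattens" it:
# the subarray shape is appended to the array shape and the dtype
# collapses to the base dtype.
arr = np.zeros((4,), dtype=sub)
print(arr.shape, arr.dtype)

# Inside a structured dtype the subarray survives as a field.
rec = np.dtype([("id", np.int64), ("vec", np.float32, (3,))])
table = np.zeros((4,), dtype=rec)
print(table.shape)           # the array shape is unchanged
print(table["vec"].shape)    # array_shape + subarray_shape
print(type(table[0]["vec"])) # the per-element value is an ndarray

# A nested structured dtype is described by a list that itself
# contains a list, not by one flat list of fields.
nested = np.dtype([("a", np.int32),
                   ("b", [("x", np.float64), ("y", np.float64)])])
print(nested.descr)
```

The last `print` illustrates why a serializer that treats the field description as a single flat list cannot round-trip nested structured dtypes.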
TODO:
- `docs/user-guide/*.md`
- `changes/`