@sehoffmann
@d-v-b

This PR adds experimental support for subarray dtypes (https://numpy.org/doc/stable/glossary.html#term-subarray-data-type, https://numpy.org/doc/stable/user/basics.rec.html#structured-datatype-creation) and closes #3582 and #3583.

It also fixes support for nested (and subarray-containing) structured dtypes in Zarr v2, which worked in 2.18.* but no longer works in 3.1.*. In particular, the buggy implementation overlooked that a nested structured dtype is itself encoded as a list of lists, not as a single flat list.
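To illustrate the "list of lists" point, here is a minimal sketch in plain NumPy, not the code from this PR. The helper names `descr_to_json` and `json_to_descr` are hypothetical, and the sketch assumes packed dtypes without field titles or padding entries:

```python
import json

import numpy as np


def descr_to_json(descr):
    """Recursively turn np.dtype.descr into JSON-friendly nested lists."""
    out = []
    for name, fmt, *rest in descr:
        # A nested structured field has a list as its format: recurse into it.
        entry = [name, descr_to_json(fmt) if isinstance(fmt, list) else fmt]
        if rest:
            # A subarray field carries its shape as a third element.
            entry.append(list(rest[0]))
        out.append(entry)
    return out


def json_to_descr(obj):
    """Inverse of descr_to_json: rebuild a descr that np.dtype accepts."""
    out = []
    for name, fmt, *rest in obj:
        fmt = json_to_descr(fmt) if isinstance(fmt, list) else fmt
        out.append((name, fmt, tuple(rest[0])) if rest else (name, fmt))
    return out


inner = np.dtype([("x", "<f4"), ("y", "<f4")])
outer = np.dtype([("pos", inner), ("grid", "<i2", (2, 2))])

encoded = descr_to_json(outer.descr)
print(json.dumps(encoded))

# Round trip: the nested lists are enough to rebuild the original dtype.
assert np.dtype(json_to_descr(encoded)) == outer
```

A decoder that treats the second element of every field as a plain type string would break on the `"pos"` entry, which is the kind of failure described above.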

Note 1:
Subarray dtypes are in a very weird spot. They are a proper np.dtype, in particular a np.dtypes.VoidDType whose fields attribute is unset but whose subdtype attribute is set. Hence, it makes sense to map them one-to-one to a ZDType. This also makes sense from an implementation standpoint with respect to serialization.
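For readers unfamiliar with these attributes, a small NumPy-only sketch (the variable name `sub` is just for illustration):

```python
import numpy as np

# A standalone subarray dtype: 2x3 blocks of int32.
sub = np.dtype((np.int32, (2, 3)))

print(sub.kind)      # void-kind dtype
print(sub.fields)    # no named fields
print(sub.subdtype)  # (item dtype, subarray shape)
print(sub.shape)     # subarray shape
print(sub.itemsize)  # 6 int32 items
```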

On the other hand, they do not have a proper scalar value, i.e. one cannot create a np.void scalar for a subarray dtype (it throws). Conceptually, a scalar value of a subarray dtype would be a np.ndarray. This, however, is not a subtype of np.generic, despite sharing a lot of the interface. When one creates a np.ndarray with a subarray dtype directly, the result is a "flat" np.ndarray with shape array_shape + subarray_shape.
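The flattening behaviour can be seen with a few lines of NumPy (a sketch; `sub`, `arr`, and `elem` are illustrative names):

```python
import numpy as np

sub = np.dtype((np.int32, (2, 3)))

# Creating an array directly with a subarray dtype appends the subarray
# shape to the array shape and drops the subarray dtype itself.
arr = np.zeros((4,), dtype=sub)
print(arr.shape)  # array_shape + subarray_shape
print(arr.dtype)  # plain item dtype, no subdtype left

# What would conceptually be the "scalar" is an ndarray, not an np.generic.
elem = arr[0]
print(type(elem), elem.shape)
```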

I've decided to still implement them as a separate Subarray ZDType and not conflate them with the Structured class. While this works flawlessly when they are used within a structured dtype (the intended use case), using them directly is not fully supported. Specifically, there is no specification for standalone subarray dtypes in Zarr V2, which makes a lot of test cases fail. Apart from that, some tests in test_array.py do not expect an array as a scalar and hence fail. I want to stress, though, that I was able to successfully create and read a Subarray zarr array with V3.
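On the NumPy side, the intended use case looks roughly like this (a sketch with illustrative field names; it does not touch the Zarr API):

```python
import numpy as np

# A structured dtype whose field "m" is a 2x3 float32 subarray.
rec = np.dtype([("id", np.int64), ("m", np.float32, (2, 3))])
a = np.zeros(4, dtype=rec)

# Inside a structured dtype the subarray dtype is preserved on the field,
# and the outer array keeps its own shape.
print(a.shape)
print(rec["m"].subdtype)
print(a["m"].shape)

# Here a proper scalar exists: np.void for the record, ndarray for the field.
scalar = a[0]
print(type(scalar), scalar["m"].shape)
```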

Solving this conundrum adequately is beyond what I can do here and might require significant conceptual changes in Zarr. I did not add the dtype directly to test_dtype/conftest.py; instead, I added a new test case for Structured that uses a Subarray inside, and that test passes.

Note 2: I've also added a test case for an invalid float value string which fails due to #3584. Since that test case highlights an existing bug, I've decided to leave it there.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Nov 20, 2025

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 63.15789% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.97%. Comparing base (edd47db) to head (c21fcc6).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/core/dtype/npy/subarray.py | 58.55% | 46 Missing ⚠️ |
| src/zarr/core/dtype/npy/structured.py | 80.00% | 5 Missing ⚠️ |
| src/zarr/core/dtype/common.py | 71.42% | 4 Missing ⚠️ |
| src/zarr/core/dtype/__init__.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3587      +/-   ##
==========================================
+ Coverage   61.95%   61.97%   +0.01%     
==========================================
  Files          86       87       +1     
  Lines       10170    10311     +141     
==========================================
+ Hits         6301     6390      +89     
- Misses       3869     3921      +52     

| Files with missing lines | Coverage | Δ |
|---|---|---|
| src/zarr/core/dtype/npy/bytes.py | 53.00% <100.00%> | (ø) |
| src/zarr/core/dtype/__init__.py | 29.50% <0.00%> | (-0.50%) ⬇️ |
| src/zarr/core/dtype/common.py | 33.33% <71.42%> | (+5.62%) ⬆️ |
| src/zarr/core/dtype/npy/structured.py | 60.34% <80.00%> | (+3.96%) ⬆️ |
| src/zarr/core/dtype/npy/subarray.py | 58.55% <58.55%> | (ø) |