Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support all indexing variants #1917

Merged
merged 16 commits into from
Jun 3, 2024
Merged

Support all indexing variants #1917

merged 16 commits into from
Jun 3, 2024

Conversation

normanrz
Copy link
Contributor

In this PR I ported over all indexing variants from the v2 codebase into v3. The Array class exposes the sames methods as in the v2 code base (e.g. get_basic_selection, set_orthogonal_selection, oindex[...]) whereas the AsyncArray only has _get_selection(indexer, ...) and _set_selection(indexer, ...).

The current status of this PR is: the tests are green but typing still causes some headaches.

There are a few breaking changes of the v3 code that we should discuss:

  • 0-dimensional arrays (i.e. single values) are not supported
  • selecting fields (object dtype) is shaky
  • The out kwarg for getitem doesn't work and can be complicated to implement with the new NDBuffer abstraction

Please let me know your thoughts on these limitations @jhamman @d-v-b.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@normanrz normanrz added the V3 Related to compatibility with V3 spec label May 27, 2024
@normanrz normanrz self-assigned this May 27, 2024
@d-v-b
Copy link
Contributor

d-v-b commented May 27, 2024

Thanks for this effort! I will give it a closer look later today.

regarding these two concerns:

  • 0-dimensional arrays (i.e. single values) are not supported

  • selecting fields (object dtype) is shaky

I don't have a lot of experience with either of these features from v2, so I am probably not the best judge of how we want this to look. Are there any big time 0-dimensional array or object dtype users we can ping to have them look it over?

The out kwarg for getitem doesn't work and can be complicated to implement with the new NDBuffer abstraction

Similarly, I never use out, but maybe @madsbk has some thoughts here?

@madsbk
Copy link
Contributor

madsbk commented May 28, 2024

Regarding out, I think it should be all or nothing. That is, either we implement an output-by-caller policy where all components takes an output argument, or we don't use output arguments at all.

As @akshaysubr point out in #1751 (comment), it might be diffecult to support an output-by-caller policy so I am leaning to remove the out argument and make it clear that the Zarr stack will copy data in most cases.

@normanrz
Copy link
Contributor Author

While I agree that providing an out kwarg is pointless if zarr-python is doing copies anyways, we have it in the existing API. I wonder if we should continue to provide the out arg but issue a (deprecation) warning?

@madsbk
Copy link
Contributor

madsbk commented May 29, 2024

Good point, a deprecation warning is a good idea :)

@rabernat
Copy link
Contributor

Are there any big time 0-dimensional array or object dtype users we can ping to have them look it over?

#1874 recently revealed that there are definitely users of 0-dimensional arrays. (Also python objects dtypes.)

@normanrz normanrz added this to the 3.0.0.alpha milestone May 31, 2024
src/zarr/codecs/pipeline.py Outdated Show resolved Hide resolved
BlockSelection = BlockSelector | tuple[BlockSelector, ...]
BlockSelectionNormalized = tuple[BlockSelector, ...]
MaskSelection = npt.NDArray[np.bool_]
OrthogonalSelector = int | slice | npt.NDArray[np.intp | np.bool_]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should look for a way to simplify this as much as we can.OrthogonalSelector vs OrthogonalSelection, where the latter contains the former, is a recipe for confusion.

CoordinateSelection = npt.NDArray[np.intp]
BlockSelector = int | slice
BlockSelection = BlockSelector | tuple[BlockSelector, ...]
BlockSelectionNormalized = tuple[BlockSelector, ...]
Copy link
Contributor

@d-v-b d-v-b Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we use a type alias like BlockSelectionNormalized for tuple[BlockSelector,...] we lose some readability. To know that BlockSelectionNormalized is tuple[int | slice, ...] I have to do 2 lookups, and we aren't even saving lines of code because the type aliases are longer than the types. Was there a particular problem caused by the plain type annotations?

MaskSelection = npt.NDArray[np.bool_]
OrthogonalSelector = int | slice | npt.NDArray[np.intp | np.bool_]
OrthogonalSelection = OrthogonalSelector | tuple[OrthogonalSelector, ...]
OrthogonalSelectionNormalized = tuple[OrthogonalSelector, ...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the name OrthogonalSelectionNormalized I would expect that this type takes OrthogonalSelection, but it takes OrthogonalSelecTOR, which is a bit surprising / confusing

@normanrz
Copy link
Contributor Author

normanrz commented Jun 1, 2024

I got the types to a state where mypy doesn't complain anymore. I know that the types are far from ideal. However, it will take more work to iron that out given the variety of indexing methods that zarr-python supports and I don't want to hold this off from the alpha release.

@normanrz normanrz marked this pull request as ready for review June 1, 2024 20:27
@d-v-b
Copy link
Contributor

d-v-b commented Jun 1, 2024

This is great, thanks @normanrz! Agreed that we can sort out the types in a later effort.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great @normanrz -- a few comments to help tidy this up.

tests/v3/util.py Outdated Show resolved Hide resolved
tests/v3/util.py Outdated Show resolved Hide resolved
tests/v3/util.py Outdated Show resolved Hide resolved
@normanrz normanrz merged commit 24e855c into v3 Jun 3, 2024
21 checks passed
@normanrz normanrz deleted the v3-indexing branch June 3, 2024 11:57
d-v-b pushed a commit to d-v-b/zarr-python that referenced this pull request Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V3 Related to compatibility with V3 spec
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants