Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexing with list not supported #1029

Closed
brentp opened this issue May 17, 2022 · 14 comments
Closed

indexing with list not supported #1029

brentp opened this issue May 17, 2022 · 14 comments

Comments

@brentp
Copy link

brentp commented May 17, 2022

Minimal, reproducible code sample, a copy-pastable example if possible

import numpy as np
import zarr

rows = [3, 4]
a = np.arange(20).reshape((10, 2))

print(a[rows]) # OK

z = zarr.array(a)
print(z[rows]) # indexerror

Problem description

Based on this I expected this to work and to return the same rows as numpy. But it raises in index error.

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.11.3
@joshmoore
Copy link
Member

Hi @brentp. Which cells are you expecting to receive? e.g.

In [1]: import zarr, numpy as np

In [2]: z = zarr.array(np.arange(20).reshape(10,2))

In [3]: z[:]
Out[3]:
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [4]: z[[0, 1], [1, 0]]
Out[4]: array([1, 2])

@brentp
Copy link
Author

brentp commented May 17, 2022

Hi, I expect to get the same result as numpy, but index it's an indexerror. I have updated the example to be more complete with this code:

import numpy as np
import zarr

rows = [3, 4]
a = np.arange(20).reshape((10, 2))

print(a[rows])

z = zarr.array(a)
print(z[rows])

@joshmoore
Copy link
Member

Ah, I see. I remember there being limitations but I don't remember this one. @jni?

@jakirkham
Copy link
Member

Zarr also supports oindex and vindex, which may be another way to get at the functionality needed here

@joshmoore
Copy link
Member

z.oindex[rows] certainly returns the same values, but vindex is being used by z[rows].

@jakirkham
Copy link
Member

There's some overlap between these which can be a kind of confusing aspect of advanced indexing. Hence why vindex & oindex were added before advanced indexing was

@jni
Copy link
Contributor

jni commented May 18, 2022

Yes, basically as I remember it @shoyer raised the point that mixing integer indexing with slices (which this use case implicitly uses) can be tricky to get right, so we left that out of the fancy indexing implementation in #725 for a braver soul to tackle. 😜 The discussion starts here:

#725 (comment)

As I see it, the next "easy" step is to allow leading integer indexing, which won't reorder axes in either NumPy or zarr.Array.vindex (I think!). What do you think about this @joshmoore @jakirkham @shoyer?

@joshmoore
Copy link
Member

Sorry, I failed to note that no one else has responded. I'm all for incremental steps, but maybe there's a question of how to document the rest of the steps needed for the likes of @brentp.

@joshmoore
Copy link
Member

Having apparently killed the conversation, I'm inclined to say, "Go for it, @jni! 👏🏽"

@jni
Copy link
Contributor

jni commented Jul 13, 2022

I'm currently on leave, but a PR would be pretty easy, hopefully @brentp or someone else can handle it. This is the bit of code that needs modifying:

zarr-python/zarr/core.py

Lines 784 to 789 in 9ce2f32

fields, pure_selection = pop_fields(selection)
if is_pure_fancy_indexing(pure_selection, self.ndim):
result = self.vindex[selection]
else:
result = self.get_basic_selection(pure_selection, fields=fields)
return result

Before the else:, add an elif clause:

elif is_just_a_list_of_integers(pure_selection):
    return self.oindex[pure_selection]

For an appropriate implementation of "is just a list of integers." 😂

You probably want to also do this on __setitem__, here.

@AndreasAlbertQC
Copy link
Contributor

I'd be interested in implementing a solution to this issue. Beyond what was discussed above, I'm also interested in multi-dimensional indexing with slices. Something like:

# Example from above
rows = [3, 4]
a = np.arange(20).reshape((10, 2))
print(a[rows])

# What I'd like to have in addition
a[rows, :]
a[rows,1:3]

columns = [1,5,4]
a[:,columns]
a[3:4, colums]

The solution proposed by @jni would have to be expanded a little bit. I think this should work:

def is_pure_orthogonal_indexing(selection, ndim):
    # Case 1: Selection is a single iterable of integers
    if is_integer_list(selection) or is_integer_array(selection, ndim=1):
        return True

    # Case two: selection contains either zero or one integer iterables. 
    # All other selection elements are slices.
    return (
        isinstance(selection, tuple) and len(selection) == ndim and 
        sum(is_integer_list(elem) or is_integer_array(elem) for elem in selection) <=1 and
        all(is_integer_list(elem) or is_integer_array(elem) or isinstance(elem, slice) for elem in selection)
    )

And then the elif proposed by @jni would be elif is_pure_orthogonal_indexing(pure_selection, self.ndim).

If there are no objections to this approach, I'll be happy to make a PR.

@shoyer
Copy link
Contributor

shoyer commented Jan 24, 2023

NumPy's combined advanced and basic indexing is a little complex, but not terrible: https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing

As long as you follow those rules exactly (and test for compatibility with NumPy's rules), we can definitely extend the cases Zarr supports.

@MSanKeys963
Copy link
Member

Safe to close this in favour of #1333?

@jni
Copy link
Contributor

jni commented Mar 28, 2023

Yes this was fixed by #1333, thanks for the reminder @MSanKeys963!

@jni jni closed this as completed Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants