Accessing zarr group inside jupyter yields zero values and len(group) = 0 #714

Open
AdemFr opened this issue Mar 30, 2021 · 4 comments

@AdemFr
AdemFr commented Mar 30, 2021

Minimal, reproducible code sample, a copy-pastable example if possible

import numpy as np
import zarr

store = zarr.open_group("gs://my_bucket/remote_group.zarr", mode="r")
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))

# Output:
# Len:  0 ; Unique Values:  [0]
# Len:  0 ; Unique Values:  [0]
# Len:  0 ; Unique Values:  [0]

After copying the whole zarr store with gsutil cp -r gs://my_bucket/remote_group.zarr . and running the same code with only the path changed, I get the expected output with the correct values:

import numpy as np
import zarr

store = zarr.open_group("./remote_group.zarr", mode="r")  # Only change
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))

# Output:
# Len:  1 ; Unique Values:  [0 7]
# Len:  1 ; Unique Values:  [0]
# Len:  1 ; Unique Values:  [ 0  1  2  3  6  7  8  9 10]
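
One way to narrow down where the two runs diverge is to compare the raw keys each store exposes, rather than the decoded arrays. Zarr v2 stores behave like mappings from key strings (for example group_1/nested/group/.zgroup) to bytes, so the sketch below accepts any two mappings. The helper name diff_store_keys and the sample keys are hypothetical, and plain dicts stand in for the real local and remote stores.

```python
from typing import Dict, Mapping, Set


def diff_store_keys(reference: Mapping[str, bytes],
                    candidate: Mapping[str, bytes],
                    prefix: str = "") -> Dict[str, Set[str]]:
    """Compare the keys found under `prefix` in two mapping-style stores."""
    prefix = prefix.strip("/")
    if prefix:
        prefix += "/"
    ref_keys = {k for k in reference if k.startswith(prefix)}
    cand_keys = {k for k in candidate if k.startswith(prefix)}
    return {
        "missing_in_candidate": ref_keys - cand_keys,
        "extra_in_candidate": cand_keys - ref_keys,
    }


# Hypothetical stand-ins: the local copy has a chunk that the remote view lacks.
local_copy = {
    "group_1/nested/group/.zgroup": b"{}",
    "group_1/nested/group/array_in_group/.zarray": b"{}",
    "group_1/nested/group/array_in_group/0.0": b"\x00",
}
remote_view = {
    "group_1/nested/group/.zgroup": b"{}",
    "group_1/nested/group/array_in_group/.zarray": b"{}",
}

report = diff_store_keys(local_copy, remote_view, prefix="group_1")
print(sorted(report["missing_in_candidate"]))
```

With real data, the two arguments would presumably be the underlying stores of the locally and remotely opened groups (in zarr 2.x a group exposes this as its store attribute). Keys that appear only in the local copy would point at a listing or caching problem in the remote access path rather than at the data itself.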

Problem description

Accessing the same zarr group in a remote Google Cloud Storage bucket returns only zeros when the code runs in jupyter lab or jupyter notebook. I tested several cases, covering local access to the remote storage as well as access from a remote kubernetes pod in the same GCP project that has access to the bucket.

Cases tested:

  1. Access remote zarr from local jupyter / jupyter lab --> Doesn’t work
  2. Access remote zarr from remote jupyter / jupyter lab (kubernetes pod) --> Doesn’t work
  3. Access local zarr (copied with gsutil) from local jupyter --> Works
  4. Access remote zarr from a remote run (kubernetes pod) using python my_zarr_access_file.py --> Works
  5. Access remote zarr from a local run (same venv as jupyter) using python my_zarr_access_file.py --> Works
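
A possible explanation for the all-zero arrays, offered only as a hypothesis: in zarr v2, a chunk whose key is absent from the store is read back as the array's fill value instead of raising an error. The sketch below checks which chunk keys a mapping-style store actually contains for one array. The helper name missing_chunk_keys and the sample store are hypothetical, and the code assumes the array has at least one dimension and uses the v2 key layout with "." as the default separator.

```python
import itertools
import json
import math
from typing import List, Mapping


def missing_chunk_keys(store: Mapping[str, bytes], array_path: str) -> List[str]:
    """List the chunk keys of one zarr v2 array that are absent from `store`."""
    array_path = array_path.strip("/")
    meta = json.loads(store[f"{array_path}/.zarray"])
    # Number of chunks along each dimension.
    grid = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
    # "." is the v2 default; newer metadata may record a different separator.
    sep = meta.get("dimension_separator", ".")
    missing = []
    for index in itertools.product(*(range(n) for n in grid)):
        key = f"{array_path}/" + sep.join(str(i) for i in index)
        if key not in store:
            missing.append(key)
    return missing


# Hypothetical store: a 4x4 array in 2x2 chunks with only two chunks present.
fake_store = {
    "arr/.zarray": json.dumps({"shape": [4, 4], "chunks": [2, 2]}).encode(),
    "arr/0.0": b"...",
    "arr/1.1": b"...",
}

print(missing_chunk_keys(fake_store, "arr"))
```

If this reported every chunk as missing for the remote store inside jupyter but none in the plain python run, that would suggest the store cannot see the chunk objects in the notebook environment, which would match both symptoms (zeros and len(group) == 0).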

Version and installation information

Please provide the following:

  • Value of zarr.__version__: Tested '2.7.0' (kubernetes pod) and '2.6.1' (local venv)
  • Value of numcodecs.__version__: Tested '0.7.3' (kubernetes pod and local venv)
  • Version of Python interpreter: Tested 3.8.3 (kubernetes pod) and 3.7.6 (local venv)
  • Operating system (Linux/Windows/Mac): Linux (kubernetes pod) and Mac (local venv)
  • How Zarr was installed: "pip into venv" (local) / "pip inside conda env (kubernetes pod)"

@joshmoore
Member

@AdemFr : you may need to configure CORS on your bucket. See alimanfoo/zarrita#32 (comment)

@AdemFr
Author

AdemFr commented Mar 30, 2021

@joshmoore Thanks for the quick reply! This unfortunately did not solve the issue.

@joshmoore
Member

@AdemFr : did you ever have success here?

@AdemFr
Author

AdemFr commented Sep 22, 2021

@joshmoore Unfortunately not, sorry. We have since built a workaround, so these len calls are not directly relevant to me right now, but we could not find the root cause.
