Can't use = to reassign into object arrays in a group #502
Comments
Thanks @hyanwong, I get what's happening here now. TL;DR, here's how to get what (I think) you want to work:
import zarr
import numcodecs
import numpy as np
store = zarr.DirectoryStore('example.zarr')
g = zarr.group(store=store, overwrite=True)
g.create_dataset('bar', shape=0, chunks=10, dtype=object, object_codec=numcodecs.JSON())
g['bar'].append(["a", "b", "c", "d"])
b = np.array([1,0,0,1], dtype=bool)
new_bar = g['bar'][:][b]
g.create_dataset('bar', data=new_bar, chunks=10, dtype=object, object_codec=numcodecs.JSON(), overwrite=True)
Long explanation: if you have a group g, then g['bar'] = x is actually shorthand for creating a new zarr array called "bar" as a member of the group. For an object array that shorthand has no way to pass an object codec, so the array has to be created explicitly:
g.create_dataset("bar", data=x, dtype=object, chunks=10, object_codec=numcodecs.JSON())
Ah, I thought it might be something like that (ability to assign via Either way, it would be useful to include something about this in the documentation (or maybe I missed it).
There is a more general question of whether there is an efficient way to reassign a boolean indexed version of the same array back into the original zarr data store, without (necessarily) having to read the entire array into memory. That's a slightly different question (although I'm not sure if it's something peculiar that I want to do, but which wouldn't be of general use).
However, going down this route, I can imagine it being a useful addition to be able to copy a zarr mask selection into a new zarr array via the
Yeah, I think this might be asking too much of group item assignment (
Good idea.
For that I would generally suggest to use dask. E.g., something like:
For all computations on zarr arrays, including copying data from one array to another, I would generally recommend using dask. We're trying to avoid putting any logic in zarr for things that could be done with dask, and that will generally be done better with dask given its ability to parallelize work.
Note that if you need to create a new array with some or all of the same parameters as an existing array, there are convenience functions, e.g.,
Hth.
What about just reusing the same
Minimal, reproducible code sample, a copy-pastable example if possible
Problem description
Can't use = to reassign into object arrays in a group. Error is ValueError: missing object_codec for object array, see https://stackoverflow.com/questions/58745967/how-to-cut-down-delete-a-zarr-array and below:
Version and installation information
Please provide the following:
zarr.__version__: '2.3.2'
numcodecs.__version__: '0.6.3'