adding existing group to a newly created group #476

Open
sofroniewn opened this issue Sep 5, 2019 · 7 comments

Comments

@sofroniewn

Hi all,

I have what might be a pretty basic question, but I couldn't figure it out from the tutorials.

I have some existing group, say foo, and would like to create a new group called root that I can then put foo inside, i.e. something like the following:

root = zarr.group()
root.create_group('foo')
root['foo'] = foo

but I get the following error message

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-84-ece5fe0103d1> in <module>
      1 root = zarr.group()
      2 root.create_group('foo')
----> 3 root['foo'] = foo

/anaconda3/lib/python3.7/site-packages/zarr/hierarchy.py in __setitem__(self, item, value)
    334 
    335     def __setitem__(self, item, value):
--> 336         self.array(item, value, overwrite=True)
    337 
    338     def __delitem__(self, item):

/anaconda3/lib/python3.7/site-packages/zarr/hierarchy.py in array(self, name, data, **kwargs)
    907         """Create an array. Keyword arguments as per
    908         :func:`zarr.creation.array`."""
--> 909         return self._write_op(self._array_nosync, name, data, **kwargs)
    910 
    911     def _array_nosync(self, name, data, **kwargs):

/anaconda3/lib/python3.7/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs)
    627 
    628         with lock:
--> 629             return f(*args, **kwargs)
    630 
    631     def create_group(self, name, overwrite=False):

/anaconda3/lib/python3.7/site-packages/zarr/hierarchy.py in _array_nosync(self, name, data, **kwargs)
    914         kwargs.setdefault('cache_attrs', self.attrs.cache)
    915         return array(data, store=self._store, path=path, chunk_store=self._chunk_store,
--> 916                      **kwargs)
    917 
    918     def empty_like(self, name, data, **kwargs):

/anaconda3/lib/python3.7/site-packages/zarr/creation.py in array(data, **kwargs)
    317     # ensure data is array-like
    318     if not hasattr(data, 'shape') or not hasattr(data, 'dtype'):
--> 319         data = np.asanyarray(data)
    320 
    321     # setup dtype

/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in asanyarray(a, dtype, order)
    589 
    590     """
--> 591     return array(a, dtype, copy=False, order=order, subok=True)
    592 
    593 

/anaconda3/lib/python3.7/site-packages/zarr/hierarchy.py in __getitem__(self, item)
    331                          synchronizer=self._synchronizer)
    332         else:
--> 333             raise KeyError(item)
    334 
    335     def __setitem__(self, item, value):

KeyError: 0

Is this behaviour supported / is there another way to achieve something like this?

The thing I ultimately want to do is put many groups (foo, bar, etc.) inside root, but I want to write self-contained functions that generate the individual foo groups and then put them together at the end.

Thanks!

@sofroniewn
Author

@jakirkham do you have any thoughts on how to do the above?

@alimanfoo
Member

Hi @sofroniewn, it sounds like you are asking if you can move a group to a different location in a hierarchy?

@sofroniewn
Author

Yes, or maybe more like attaching an existing group to a new point in the hierarchy of another group.

@alimanfoo
Member

alimanfoo commented Sep 20, 2019 via email

@joshmoore
Member

Found myself independently discussing this today. My hope was to basically do it implicitly:

/tmp $ cat z.py
#!/usr/bin/env python
import zarr
store = zarr.DirectoryStore("ztop")
ztop = zarr.group(store=store, overwrite=True)
print(ztop.info)

arr = zarr.open("ztop/arr1", mode="w", shape=10, dtype="i4")
arr[0] = 1

store = zarr.DirectoryStore("ztop/inner")
inner = zarr.group(store=store, overwrite=True)
print(inner.info)

ztop = zarr.open("ztop")
print(ztop.info)

and

/tmp $ ./z.py
Name        : /
Type        : zarr.hierarchy.Group
Read-only   : False
Store type  : zarr.storage.DirectoryStore
No. members : 0
No. arrays  : 0
No. groups  : 0

Name        : /
Type        : zarr.hierarchy.Group
Read-only   : False
Store type  : zarr.storage.DirectoryStore
No. members : 0
No. arrays  : 0
No. groups  : 0

Name        : /
Type        : zarr.hierarchy.Group
Read-only   : False
Store type  : zarr.storage.DirectoryStore
No. members : 2
No. arrays  : 1
No. groups  : 1
Arrays      : arr1
Groups      : inner

i.e. let the independent processes write to the appropriate subdirectory, but by having created the top-level directory as a group in its own right, on re-opening the top group, things "just work".

@sofroniewn
Author

sofroniewn commented Sep 20, 2019

Interesting, @joshmoore; an approach like that might be more sensible for me too.

I'm starting to think that what I'm trying to do might be a bit of an antipattern and that the solution I currently have might be the right one.

@alimanfoo I didn't really want two locations in the hierarchy - it was more that I wanted to move everything over from where it was and put it in the new group.

Here's my exact use case: I am ultimately trying to create one single zarr file containing all the information about a complex object I am working with. My object is broken up into a list of instances of different classes. I'd like to write a method on each of my classes that knows how to generate a zarr group for that class. I'd also like to write a method on my top-level object that creates a root zarr group (at the top of the hierarchy), then iterates through my instances, generates the smaller zarr groups, and attaches them to the root. In pseudocode:

root = zarr.group(store=my_disk_path)
for obj in my_objects:
    root[str(obj)] = obj.to_zarr()

This didn't work, and what I realize now is that obj.to_zarr() needs to know where to put its data (if it were enormous, say, it would need its own store), so what I ended up doing instead was more like the following:

root = zarr.group(store=my_disk_path)
for obj in my_objects:
    root = obj.to_zarr(root)

where my to_zarr methods now take in a zarr group and are responsible for putting their data in the right place in the hierarchy. This works fine, but at the time I thought it made my methods slightly less well-contained.

If you have any advice on this @alimanfoo or @joshmoore that would be great, but I think at this point I'm probably good to go, so no worries if that all made no sense!!

@jakirkham
Member

There's rename. IDK if that's what you want though.
