Discussion with Darr developers #8
Darr developer here. Thanks for your interest and for reaching out. I am very much interested in Zarr, although I must confess to my shame that I haven't actually tried it out yet. I will certainly do so very soon. I have used PyTables for much of my work in behavioral neuroscience for over 10 years, and still use it. It is brilliant. Zarr seems to aim at similar use cases, so I should definitely have a closer look.
Darr really is much more modest than Zarr or PyTables. In essence it is not much more than a way to save flat binary data with a separate description of how to read it, for an audience that is as wide as possible. I wrote it because I need it for my own work, in cases where I want to quickly share data with colleagues or students who do not use Python but R, Matlab, or something else. An npy file would already be too complex. Darr does not have hierarchies, compression, etc.
I am also interested in long-term archiving of data. My data is likely to outlive the tools I use. In my field, especially in electrophysiology, at least some people seem to be backpedaling from complex formats, at least for simple binary data such as signal recordings.
I would be happy to work together in whatever way is useful. An obvious start could be to make it possible to save a Darr array as a Zarr array and the other way around. For the array data this should be very simple, but attributes/metadata need attention. Since Darr is very new and no one except our lab is using it yet, I am not sure how useful it is for Zarr to save as a Darr array. Potentially Zarr could use Darr, or code from Darr, for writing the descriptive readme file on how to read the binary data without Zarr. But only if arrays are not compressed, of course.
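To make the "flat binary data with a separate description" idea concrete, here is a minimal sketch using only the Python standard library. It is not Darr's actual on-disk layout or API: the file names (`values.bin`, `description.json`), the description keys, and the helper functions `save_flat` and `load_flat` are all assumptions made up for illustration.

```python
import json
import struct
import tempfile
from pathlib import Path


def save_flat(directory, values):
    """Write floats as raw little-endian float64 plus a JSON description.

    File names and description keys are hypothetical, not Darr's format.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Raw values only: no header, no compression, 8 bytes per number.
    raw = struct.pack(f"<{len(values)}d", *values)
    (directory / "values.bin").write_bytes(raw)
    # Everything a reader needs to know lives in a separate text file.
    description = {
        "numtype": "float64",
        "byteorder": "little",
        "shape": [len(values)],
    }
    (directory / "description.json").write_text(json.dumps(description, indent=2))


def load_flat(directory):
    """Read the values back using only the information in the description."""
    directory = Path(directory)
    description = json.loads((directory / "description.json").read_text())
    if description["numtype"] != "float64" or description["byteorder"] != "little":
        raise ValueError("this sketch only handles little-endian float64")
    count = description["shape"][0]
    raw = (directory / "values.bin").read_bytes()
    return list(struct.unpack(f"<{count}d", raw))


# Small round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    save_flat(Path(tmp) / "recording", [0.0, 1.5, -2.25])
    print(load_flat(Path(tmp) / "recording"))
```

Because the data file holds nothing but the numbers, the same description would be enough to read it from R, Matlab, or any other environment that can read raw binary.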
Thank you Gabriel, it's great to have this background and your perspective
on what's important for you and your colleagues. The idea of generating
some documentation on how to read the data in multiple languages is very
neat. We haven't given much if any thought yet to long term preservation,
but it's an important consideration.
FWIW in designing zarr we have made some efforts to keep everything as
simple as possible, so the barrier to implementation stays low. We wrote a
spec of the storage format -
https://zarr.readthedocs.io/en/stable/spec/v2.html - and there are fairly
complete implementations in several languages now. Even if some or all of
those implementations rot, it should still be relatively straightforward to
implement enough of the spec to get data out in future. But we wanted to
have some features of HDF5, particularly compression and hierarchies, so the
data are not as simple or portable as darr.
In any case, it is great to make a connection between our projects, and it
would be cool to explore ways of moving data between formats and improving
portability and self-describing properties of zarr.
(In reply to Gabriel Beckers's comment above, dated Wed, 31 Oct 2018.)
OK, thanks. I had a closer look at Zarr. Very impressed. I am going to try this out in some actual analyses. In the meantime I started writing some code to read and write Zarr arrays in Darr, which indeed turned out to be very simple. It is easy to go from one to the other, and even very large arrays pose no memory problems because Zarr reads in chunks. Nice!
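The point about chunked reading keeping memory use low can be sketched without either library. The example below is an illustration under stated assumptions, not the conversion code mentioned above and not the Zarr or Darr API: it uses a hypothetical chunked layout (one uncompressed file per chunk, named by chunk index, plus a made-up `meta.json`), and the helpers `write_chunks` and `chunks_to_flat` are invented for this sketch. Only one chunk is held in memory at a time while copying to a flat binary file.

```python
import json
import struct
import tempfile
from pathlib import Path

ITEM_SIZE = struct.calcsize("<d")  # 8 bytes per little-endian float64


def write_chunks(store, values, chunk_len):
    """Create demo input: equal-sized chunk files plus a small metadata file."""
    store = Path(store)
    store.mkdir(parents=True, exist_ok=True)
    meta = {"shape": [len(values)], "chunks": [chunk_len], "dtype": "<f8"}
    (store / "meta.json").write_text(json.dumps(meta))
    for start in range(0, len(values), chunk_len):
        chunk = list(values[start:start + chunk_len])
        # Pad the last chunk so every chunk file has the same size.
        chunk += [0.0] * (chunk_len - len(chunk))
        name = str(start // chunk_len)
        (store / name).write_bytes(struct.pack(f"<{chunk_len}d", *chunk))


def chunks_to_flat(store, flat_file):
    """Append chunks to one flat file, holding a single chunk in memory."""
    store = Path(store)
    meta = json.loads((store / "meta.json").read_text())
    remaining = meta["shape"][0]
    chunk_len = meta["chunks"][0]
    index = 0
    with open(flat_file, "wb") as out:
        while remaining > 0:
            data = (store / str(index)).read_bytes()
            keep = min(chunk_len, remaining)
            # Drop the padding at the end of the final chunk.
            out.write(data[:keep * ITEM_SIZE])
            remaining -= keep
            index += 1
    return meta["shape"][0]


# Copy a ten-element array stored as three chunks of four elements.
with tempfile.TemporaryDirectory() as tmp:
    write_chunks(Path(tmp) / "chunked", [float(i) for i in range(10)], 4)
    count = chunks_to_flat(Path(tmp) / "chunked", Path(tmp) / "flat.bin")
    print(count, (Path(tmp) / "flat.bin").stat().st_size)
```

Peak memory here depends on the chunk size rather than the array size, which is why the direction from a chunked store to a flat file scales to very large arrays.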
Sorry, I linked to the wrong ticket; it should have been 325 (fixed now). Could someone please remove the above?
Hi @eddienko. I don't know of a way to remove a reference, but thanks for letting us know.
Ran across @gbeckers's Darr library recently, which seems very similar in intent and behavior to Zarr. Would be great to compare and contrast Darr and Zarr to see where the two implementations can learn from each other and possibly work together.