JVM Zarr implementation? #15
In #285 there was mention of n5, which has Java and Rust implementations, maybe more. |
N5 is basically this. The specs differ in minor ways. Convergence would be good to have. Some relevant discussion is in https://github.com/zarr-developers/zarr/issues/231. |
JVM implementation of Zarr would be very cool, particularly if it had the same flexibility as the Python implementation to plug in different storage back-ends including cloud object stores. |
Thanks for all the pointers! I've looked a bit at n5 and z5; a couple questions:
|
gcsfs's FUSE module does allow this, and there are other FUSE solutions out there too. The implementation is not at all performant compared to zarr. In addition, https://github.com/ContinuumIO/intake-xarray will shortly allow streaming of any xarray dataset, including hdf, from a server; again, there are other solutions that do something similar. |
z5 acts as a C++ and Python implementation for both zarr and N5
No, it's purely targeted at the file system format for both zarr and n5 as far as I know. @ryan-williams there is already a bit of an ecosystem (albeit one tightly constrained to one institute...) rapidly evolving around the java N5 implementation, including a high-performance 3D data viewer, some image registration tools, and a volumetric image annotation suite. The java N5 already supports a number of backends, including the N5 filesystem format, HDF5, google cloud, and AWS (take a look here). It might make sense for a JVM implementation of the zarr file system format to take the form of an N5 backend (initially, at least) - that would potentially give all of those other tools access to zarr datasets for free, as well as saving you writing some of the higher-level boilerplate. That's if you're happy with the API, of course. My feeling is that zarr has more momentum behind it and will have more impact in the future. Convergence would be great, but if the N5 tool ecosystem could get access to zarr file system arrays for free, that could also solve the problem. |
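To make the "same API across backends" point concrete, here is a minimal read-only sketch against the saalfeldlab N5 Java API; the container path and dataset name are placeholders, and constructor details can vary between N5 releases.

```java
import java.util.Arrays;

import org.janelia.saalfeldlab.n5.DatasetAttributes;
import org.janelia.saalfeldlab.n5.N5FSReader;
import org.janelia.saalfeldlab.n5.N5Reader;

public class N5BackendSketch {
    public static void main(String[] args) throws Exception {
        // Open a container on the local filesystem; other N5Reader
        // implementations (HDF5, AWS S3, Google Cloud) are meant to be
        // drop-in replacements behind the same interface.
        N5Reader n5 = new N5FSReader("/data/example.n5");

        // Dataset metadata: dimensions, block (chunk) size, data type.
        DatasetAttributes attrs = n5.getDatasetAttributes("/volume/raw");
        System.out.println("dims:      " + Arrays.toString(attrs.getDimensions()));
        System.out.println("blockSize: " + Arrays.toString(attrs.getBlockSize()));
        System.out.println("dataType:  " + attrs.getDataType());
    }
}
```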
I'd be really happy if Zarr and N5 converged on the same spec. It would make it much easier for people in this problem domain to collaborate more effectively on many other common challenges. |
checking in here after a long gap! I'm far along with a Zarr implementation in Scala, which will address the "JVM implementation" request here. Some notes:
- it's in a branch that I am aggressively cleaning up atm; I'll send a link by Monday, but wanted to mention it now since other relevant discussions are ongoing.
- as one concrete use: I can directly convert HDF5 files to Zarr in "the cloud"
  - currently: S3 or GCS, via Java NIO APIs (ABS doesn't have an NIO impl yet: https://github.com/Azure/azure-storage-java/issues/305#issuecomment-391806835); see the NIO sketch after this comment
  - AFAIK that's not otherwise possible today:
    - h5py can't do direct cloud IO (h5py/h5py#925)
    - various FUSE-based workarounds are brittle (fsspec/gcsfs#107) or missing features (GoogleCloudPlatform/gcsfuse#286)
  - @tomwhite added an NIO read-path (https://github.com/tomwhite/hdf5-java-cloud/blob/master/src/main/java/com/tom_e_white/hdf5_java_cloud/NioReadOnlyRandomAccessFile.java) to the netCDF Java lib (https://github.com/Unidata/thredds), and that's what I use, along with my JVM Zarr impl, to do the conversion
- incidentally, this Scala implementation will also provide a javascript implementation "for free", via scala.js (https://www.scala-js.org/)
- I'm hoping to also compile it to native, via scala-native (https://github.com/scala-native/scala-native), but that's at least another 6mos out (other libraries need to support scala-native first: typelevel/cats#1549)

Looking forward to sharing more info on this shortly! |
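The "direct cloud IO via Java NIO" point above works because NIO FileSystem providers can make an object-store location look like an ordinary java.nio.file.Path. A minimal sketch using the google-cloud-nio provider; the bucket and object names are placeholders, and the same idea applies to S3 NIO providers.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import com.google.cloud.storage.contrib.nio.CloudStorageFileSystem;

public class GcsNioSketch {
    public static void main(String[] args) throws Exception {
        // With google-cloud-nio on the classpath, a GCS object is just a Path,
        // so any library written against java.nio (such as an NIO read-path
        // for the netCDF Java lib) can consume it without knowing about GCS.
        CloudStorageFileSystem fs = CloudStorageFileSystem.forBucket("my-bucket");
        Path object = fs.getPath("datasets/example.h5");
        System.out.println("size in bytes: " + Files.size(object));
    }
}
```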
Very exciting!
|
Excellent! I would love to build off of this work on the netCDF-Java side to provide an IOSP to Zarr (read Zarr into the Common Data Model). At that point, we could enable the THREDDS Data Server to serve data stored in Zarr :-) Would you be open to that idea, and does the license permit such usage? |
@lesserwhirls yea, it will be Apache-2.0 licensed, happy to have it feed into netCDF things! |
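For reference, a bare-bones skeleton of what a Zarr IOSP might look like. The method names follow the netCDF-Java 4.x IOServiceProvider interface; treat the signatures as approximate, since the API has shifted between netCDF-Java releases, and all Zarr-specific logic is elided.

```java
import java.io.IOException;

import ucar.ma2.Array;
import ucar.ma2.InvalidRangeException;
import ucar.ma2.Section;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;
import ucar.nc2.iosp.AbstractIOServiceProvider;
import ucar.nc2.util.CancelTask;
import ucar.unidata.io.RandomAccessFile;

// Skeleton only: metadata parsing and chunk decoding are left out.
public class ZarrIosp extends AbstractIOServiceProvider {

    @Override
    public boolean isValidFile(RandomAccessFile raf) throws IOException {
        // e.g. check for .zgroup/.zarray metadata next to the opened location
        return false;
    }

    @Override
    public void open(RandomAccessFile raf, NetcdfFile ncfile, CancelTask cancelTask) throws IOException {
        // parse .zarray/.zattrs and populate ncfile with dimensions, variables, attributes
    }

    @Override
    public Array readData(Variable v2, Section section) throws IOException, InvalidRangeException {
        // locate the chunks covered by 'section', decompress them, assemble an Array
        throw new UnsupportedOperationException("not implemented in this sketch");
    }

    @Override
    public String getFileTypeId() {
        return "Zarr";
    }

    @Override
    public String getFileTypeDescription() {
        return "Zarr v2 directory store (sketch)";
    }
}
```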
It might be helpful/less painful for everyone if we get the changes made to |
@lesserwhirls, yes I'd be happy to. I'll open an issue/PR to discuss. |
Hi @ryan-williams, how is it going?
|
hello! I've been side-tracked, but what I have is here: lasersonlab/ndarray.scala. It's pretty "alpha" still, and the issues reasonably capture the things I'm focused on next. I'll be checking back in on this in the coming weeks, and will give some more updates here. |
Just ran across https://github.com/bcdev/jzarr/blob/master/docs/tutorial.rst cc: @SabineEmbacher |
see https://jzarr.readthedocs.io/en/latest/
hugs
Sabine
|
If you need array objects which behave almost like NumPy arrays, you can also wrap the data using ND4J's INDArray from deeplearning4j.org. You can find examples in the writing-and-reading-data section of the tutorial (https://jzarr.readthedocs.io/en/latest/tutorial.html#writing-and-reading-data) or directly in the code examples. |
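A rough sketch of that wrapping, based on the calls shown in the JZarr tutorial; the shapes are made up, and the exact ArrayParams/read/write signatures may differ between JZarr and ND4J versions.

```java
import com.bc.zarr.ArrayParams;
import com.bc.zarr.DataType;
import com.bc.zarr.ZarrArray;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class JZarrNd4jSketch {
    public static void main(String[] args) throws Exception {
        // In-memory JZarr array; ZarrArray.create can also be pointed at a store path.
        ZarrArray zarr = ZarrArray.create(new ArrayParams()
                .shape(4, 6)
                .chunks(2, 3)
                .dataType(DataType.i4)
                .fillValue(0));

        // Write a flat int buffer covering the whole 4x6 array.
        int[] data = new int[4 * 6];
        for (int i = 0; i < data.length; i++) data[i] = i;
        zarr.write(data, new int[]{4, 6}, new int[]{0, 0});

        // Read it back and wrap the flat buffer as an ND4J INDArray.
        int[] read = (int[]) zarr.read(new int[]{4, 6}, new int[]{0, 0});
        INDArray nd = Nd4j.createFromArray(read).reshape(4, 6);
        System.out.println(nd);
    }
}
```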
Can any of you tell me how to publish the jzarr Java library to the Maven Central repository? I've never done this before. Best regards |
Hi @SabineEmbacher. I don't remember what HOWTO we followed originally for our jars (cc: @sbesson) but https://stackoverflow.com/questions/28846802/how-to-manually-publish-jar-to-maven-central looks reasonable enough. The biggest hurdles I remember are (1) proving that you own your groupId ( |
Following-up on #15 (comment), the process used by OME for releasing some of its Java components to Sonatype is documented here with the relevant links to OSSRH in case it's useful. If possible, big 👍 for having |
Did you see the example of how to read and write to Amazon AWS S3 cloud storage using JZarr? |
Completely missed this thread but wanted to mention that https://github.com/saalfeldlab/n5-zarr has implemented https://zarr.readthedocs.io/en/stable/spec/v2.html as an N5 backend since September 2019. This way it is available for array processing with ImgLib2 (https://github.com/saalfeldlab/n5-imglib2), which has no size limits and built-in memory caching, and is also the native data library for BigDataViewer and a bunch of processing tools that we use and build. n5-zarr includes blosc compression and locking, and is included in the standard distribution of https://fiji.sc/. With the N5 API, talking to Zarr, N5, or HDF5 is all the same. There is currently no official cloud backend (other than through FS wrappers) for n5-zarr because we haven't yet separated the interfaces for the store and translation layers, i.e. writing a backend for HDF5 or Zarr is entangled with writing a backend for another store (like the AWS and GoogleCloud stores for N5). I remember that there was a fork that copied the n5-aws-s3 logic into n5-zarr as a temporary solution; @joshmoore, wasn't that you who did this? |
Yup, see saalfeldlab/n5-aws-s3#10 and saalfeldlab/n5-zarr#5 |
Yup. It then got copied into the bdv/mobie code base for @tischi's I2K work. Having a way to unblock all of that would be great. (Note: I only copied-n-pasted the reader side of things. Writing still needs work as far as I know.) |
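To illustrate the n5-zarr route concretely, a minimal read sketch via the N5 API and n5-imglib2; the container path, dataset name, and uint8 element type are placeholders, and constructor details vary across n5-zarr versions.

```java
import net.imglib2.RandomAccessibleInterval;
import net.imglib2.type.numeric.integer.UnsignedByteType;
import org.janelia.saalfeldlab.n5.imglib2.N5Utils;
import org.janelia.saalfeldlab.n5.zarr.N5ZarrReader;

public class N5ZarrImgLib2Sketch {
    public static void main(String[] args) throws Exception {
        // A Zarr container on the filesystem, opened through the N5 API;
        // swapping in an N5 or HDF5 reader leaves the N5Utils call unchanged.
        N5ZarrReader zarr = new N5ZarrReader("/data/example.zarr");
        RandomAccessibleInterval<UnsignedByteType> img = N5Utils.open(zarr, "/volume/raw");
        System.out.println("dimensions: " + img.numDimensions());
    }
}
```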
As with the Rust focus during the Feb. 10th meeting, the upcoming call this Wednesday may have a Java leaning, if anyone is interested in joining to chat. |
Thanks @joshmoore! I'll be there. Looking forward to seeing you all. |
There isn't one, is there?
I've started making one, will post updates here.