Write manifests to zarr store #45
Codecov Report
Attention: Patch coverage is …

Coverage Diff   main     #45      +/-
Coverage        90.18%   90.72%   +0.54%
Files           14       16       +2
Lines           998      1067     +69
Hits            900      968      +68
Misses          98       99       +1
Very cool @TomNicholas! This will be very helpful as a target for the upcoming zarr-python work.
I realized that I could also add a reader for this type of store, which would create a ManifestArray-backed dataset from a chunk-manifest-ZEP-compliant store. You could then use VirtualiZarr to combine the chunks from multiple such stores.
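To illustrate why combining such stores is cheap: once each array's manifest.json has been loaded into a dict, concatenating arrays from several stores along one axis only means renumbering chunk keys, because every entry is a byte-range reference and no chunk data is copied. This is a minimal sketch under stated assumptions: each manifest maps a "."-separated chunk key to a {"path", "offset", "length"} entry, all arrays share the same chunk shape, and manifests are non-empty. `concat_manifests` is a hypothetical helper, not VirtualiZarr's API.

```python
from typing import Dict, List

# Assumed in-memory form of one array's manifest.json (hypothetical layout):
# chunk key such as "1.0" -> {"path": ..., "offset": ..., "length": ...}
ChunkEntry = Dict[str, object]
Manifest = Dict[str, ChunkEntry]


def concat_manifests(manifests: List[Manifest], axis: int) -> Manifest:
    """Hypothetical helper: concatenate chunk manifests along `axis`.

    Chunk indices of each later manifest are shifted by the number of
    chunks already placed along `axis`. Assumes non-empty manifests with
    the same dimensionality and identical chunk shapes.
    """
    combined: Manifest = {}
    offset = 0  # chunks already placed along `axis`
    for manifest in manifests:
        indices = [tuple(int(i) for i in key.split(".")) for key in manifest]
        n_chunks_along_axis = max(idx[axis] for idx in indices) + 1
        for key, entry in manifest.items():
            idx = [int(i) for i in key.split(".")]
            idx[axis] += offset  # renumber the chunk; the byte range is unchanged
            combined[".".join(str(i) for i in idx)] = dict(entry)
        offset += n_chunks_along_axis
    return combined
```

For example, concatenating a two-chunk manifest for a.nc with a one-chunk manifest for b.nc along axis 0 yields keys "0.0", "1.0" and "2.0", where "2.0" still points at the original byte range in b.nc.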
This PR now also adds the ability to open a zarr v3 store with all arrays as …
        return json.JSONEncoder.default(self, o)


def json_dumps(o: Any) -> bytes:
I chose to vendor this because I didn't want to import internals of the zarr-python library while it's in flux, and this also makes it clear exactly which parts of this package need zarr-python at all.
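The review snippet shows only the tail of a JSON encoder's `default` method and the signature `json_dumps(o: Any) -> bytes`. As a sketch of what such a small, dependency-free helper might look like, here is one in the style of the JSON helper in zarr-python 2.x: a custom encoder that coerces any `numbers.Integral` / `numbers.Real` (which NumPy scalars are registered as) to builtin `int` / `float`, plus deterministic, sorted, ASCII output. The class name `NumberEncoder` and the exact `json.dumps` options are assumptions, not necessarily what this PR vendored.

```python
import json
import numbers
from typing import Any


class NumberEncoder(json.JSONEncoder):
    """Illustrative encoder (name assumed): coerce abstract numbers to builtins."""

    def default(self, o: Any) -> Any:
        # NumPy integer/float scalars register as numbers.Integral/numbers.Real
        if isinstance(o, numbers.Integral):
            return int(o)
        if isinstance(o, numbers.Real):
            return float(o)
        # Defer to the base class, which raises TypeError for unsupported types
        return json.JSONEncoder.default(self, o)


def json_dumps(o: Any) -> bytes:
    """Serialize metadata as deterministic, human-readable ASCII JSON bytes."""
    return json.dumps(
        o,
        indent=4,
        sort_keys=True,
        ensure_ascii=True,
        separators=(",", ": "),
        cls=NumberEncoder,
    ).encode("ascii")
```

With this sketch, `json_dumps({"b": Fraction(1, 2), "a": 1})` returns `b'{\n    "a": 1,\n    "b": 0.5\n}'`: keys are sorted and the `Fraction` is converted to a float by the encoder.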
if filetype == "zarr_v3":
    # TODO is there a neat way of auto-detecting this?
This is a bit ugly: I want to automatically distinguish between non-zarr, zarr v2 (both to be read using kerchunk), and zarr v3 (to be read using this code). I guess I will just have to search for .zgroup/zarr.json files explicitly?
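One way to do the explicit search described above: zarr v3 keeps its metadata in a `zarr.json` file, while zarr v2 uses `.zgroup` for groups and `.zarray` for arrays, so checking for those files at the root of the store is enough to route it. This is a minimal sketch that only handles local directory stores (remote or fsspec-backed stores are not covered). `detect_store_format` is a hypothetical helper; the "zarr_v3" label mirrors the snippet's `filetype == "zarr_v3"`, while "zarr_v2" and "other" are illustrative.

```python
from pathlib import Path
from typing import Union


def detect_store_format(path: Union[str, Path]) -> str:
    """Hypothetical helper: classify a local path as zarr v3, zarr v2 or other.

    Only the top level of a local directory store is inspected.
    """
    p = Path(path)
    if not p.is_dir():
        return "other"  # e.g. a netCDF/HDF5 file, to be handled by kerchunk
    if (p / "zarr.json").is_file():
        return "zarr_v3"  # v3 metadata document
    if (p / ".zgroup").is_file() or (p / ".zarray").is_file():
        return "zarr_v2"  # v2 group or array metadata
    return "other"
```

Checking for v3 first means a directory that somehow holds both kinds of metadata is treated as v3, which is the format this code reads natively.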
@norlandrhagen do you have any thoughts on a neat way to handle this?
virtualizarr/xarray.py
# TODO recursive glob to create a datatree
vars = {}
for array_dir in _storepath.glob("*/"):
Somehow this is going awry in CI, but works as intended locally.
It seems that when run locally (on macOS), a pathlib.Path.glob("*/") call only returns directories (as the pathlib docs say it will), but for some reason when run in this CI the glob includes files too. I've hacked around this by excluding any paths for which .is_file() is True.
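A likely explanation, not confirmed in this thread, is a difference in Python versions: the pathlib documentation notes that only from Python 3.11 does `Path.glob()` return just directories when the pattern ends with a path separator, whereas older versions also match files. A version-independent alternative is to list the store's entries and filter on `is_dir()` explicitly. This is a minimal sketch; `list_array_dirs` is a hypothetical helper and `store_path` stands in for the snippet's `_storepath`.

```python
from pathlib import Path
from typing import List


def list_array_dirs(store_path: Path) -> List[Path]:
    """Hypothetical helper: immediate subdirectories of a store, sorted by name.

    Filtering on is_dir() avoids relying on glob("*/") returning only
    directories, which pathlib guarantees only from Python 3.11 onwards.
    """
    return sorted(p for p in store_path.iterdir() if p.is_dir())
```

Filtering on `is_dir()` is slightly stricter than excluding paths where `is_file()` is True: anything that is not a directory (for example a broken symlink) is also skipped.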
I'm just going to merge this, as we can always change it later.
Closes #6
This shows how we could write an xarray Dataset containing ManifestArray objects to disk as a new zarr store, where each array's chunk data is written as byte-range references in the form of a manifest.json file. This therefore creates an example of the type of zarr store described in zarr-developers/zarr-specs#287 by @jhamman.
(Currently this writes using the V2 spec, which I guess is not correct, but you get the idea.)
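The on-disk layout described above can be sketched as one directory per array holding the array's metadata next to a manifest.json of byte-range references. This is a minimal sketch under stated assumptions: each manifest maps chunk keys to {"path", "offset", "length"} entries, the metadata file uses the zarr v3 name zarr.json (whereas this PR currently writes v2-style metadata), and group-level metadata is omitted. `write_manifest_store` is a hypothetical helper, not VirtualiZarr's writer API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def write_manifest_store(
    store_path: Union[str, Path],
    arrays: Dict[str, Dict[str, Any]],
) -> None:
    """Hypothetical helper: write one directory per array with metadata + manifest.

    `arrays` maps array name -> {"metadata": {...}, "manifest": {...}}, where
    the manifest maps chunk keys to {"path", "offset", "length"} references.
    No chunk data is copied; the store only points at byte ranges elsewhere.
    """
    root = Path(store_path)
    root.mkdir(parents=True, exist_ok=True)
    for name, contents in arrays.items():
        array_dir = root / name
        array_dir.mkdir(exist_ok=True)
        # Array metadata (file name assumed to follow zarr v3's zarr.json)
        (array_dir / "zarr.json").write_text(json.dumps(contents["metadata"], indent=4))
        # Byte-range references for every chunk of this array
        (array_dir / "manifest.json").write_text(json.dumps(contents["manifest"], indent=4))
```

Because the store contains only JSON, writing it is fast regardless of how large the referenced data is; a reader can reconstruct each array's chunk references by loading the two files back.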