Write manifests to zarr store #45

TomNicholas · 2024-03-22T15:50:32Z

Closes #6

This shows how we could write an xarray Dataset containing ManifestArray objects to disk as a new zarr store, where each array's chunk data is written as byte range references in the form of a manifest.json file.

This therefore creates an example of the type of zarr store described in zarr-developers/zarr-specs#287 by @jhamman.

(Currently this writes using the V2 spec, which I guess is not correct, but you get the idea.)
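
For illustration, a minimal sketch of what writing one array's chunk references might look like. The entry layout (`path`, `offset`, `length`, keyed by chunk index) is assumed from the chunk manifest proposal in zarr-developers/zarr-specs#287, and the `write_manifest` name and the example paths are hypothetical rather than the code in this PR.

```python
import json
from pathlib import Path


def write_manifest(array_dir: Path, entries: dict[str, dict]) -> Path:
    """Write chunk byte-range references for one array as manifest.json."""
    array_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = array_dir / "manifest.json"
    # Indent the JSON so the file stays human-readable
    manifest_path.write_text(json.dumps(entries, indent=4))
    return manifest_path


# Hypothetical references: chunk key -> location of the bytes in an existing file
example_entries = {
    "0.0": {"path": "s3://bucket/file1.nc", "offset": 100, "length": 100},
    "0.1": {"path": "s3://bucket/file1.nc", "offset": 200, "length": 100},
}
```

No chunk data is copied here; the store only records where the bytes already live.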

codecov · 2024-03-22T15:53:12Z

Codecov Report

Attention: Patch coverage is 91.33858%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 90.72%. Comparing base (f226093) to head (98a259e).

❗ Current head 98a259e differs from pull request most recent head 6ba41de. Consider uploading reports for the commit 6ba41de to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| virtualizarr/vendor/zarr/utils.py | 58.33% | 5 Missing ⚠️ |
| virtualizarr/xarray.py | 90.90% | 3 Missing ⚠️ |
| virtualizarr/zarr.py | 95.16% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
+ Coverage   90.18%   90.72%   +0.54%     
==========================================
  Files          14       16       +2     
  Lines         998     1067      +69     
==========================================
+ Hits          900      968      +68     
- Misses         98       99       +1

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `90.72% <91.33%> (+0.54%)` | ⬆️ |

jhamman

Very cool @TomNicholas -- This will be very helpful in getting a target for the zarr-python work to come.

xref: zarr-developers/zarr-python#1718

TomNicholas · 2024-03-25T16:32:50Z

I realized that I could also add a reader for this type of store, which would create a ManifestArray-backed dataset from a chunk-manifest-ZEP-compliant store. You could then use VirtualiZarr to combine the chunks from multiple such stores.

TomNicholas · 2024-03-31T02:45:34Z

This PR now also adds the ability to open a zarr v3 store in which every array is stored as a manifest.json file, via open_virtual_dataset(storepath, filetype="zarr_v3"). I did this partly so that I can test via roundtripping.
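
As a library-independent illustration of the reading side, here is a rough sketch that collects the manifests back out of a store on the local filesystem. It assumes one sub-directory per array, each holding a `manifest.json`; the `read_manifests` name is hypothetical and this is not the implementation behind `open_virtual_dataset`.

```python
import json
from pathlib import Path


def read_manifests(storepath: str) -> dict[str, dict]:
    """Collect the manifest.json contents of every array directory in a store."""
    manifests = {}
    for array_dir in sorted(Path(storepath).iterdir()):
        manifest_path = array_dir / "manifest.json"
        # Skip plain files (e.g. a group-level zarr.json) and directories without a manifest
        if array_dir.is_dir() and manifest_path.is_file():
            manifests[array_dir.name] = json.loads(manifest_path.read_text())
    return manifests
```

A roundtrip check then amounts to writing a set of manifests, reading them back this way, and comparing the two dictionaries.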

TomNicholas · 2024-03-31T02:46:39Z

virtualizarr/vendor/zarr/utils.py

+        return json.JSONEncoder.default(self, o)
+
+
+def json_dumps(o: Any) -> bytes:


I chose to vendor this because I didn't want to import internals of the zarr-python library while it's in flux, and also this helps make it clear exactly which parts of this package even need zarr-python at all.
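
A sketch of roughly what such a vendored helper could look like, using only the standard library. Only the two lines shown in the diff above come from the PR; the `NumberEncoder` class name and the formatting options passed to `json.dumps` are assumptions.

```python
import json
import numbers
from typing import Any


class NumberEncoder(json.JSONEncoder):
    def default(self, o: Any) -> Any:
        # Convert number-like objects (such as numpy integer scalars)
        # that the json module cannot encode natively
        if isinstance(o, numbers.Integral):
            return int(o)
        if isinstance(o, numbers.Real):
            return float(o)
        # Anything else falls through to the base class, which raises TypeError
        return json.JSONEncoder.default(self, o)


def json_dumps(o: Any) -> bytes:
    """Serialize to indented, key-sorted JSON and return it as ASCII bytes."""
    return json.dumps(
        o, indent=4, sort_keys=True, ensure_ascii=True, cls=NumberEncoder
    ).encode("ascii")
```

Keeping this small function in the package means metadata can be written without importing zarr-python at all.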

TomNicholas · 2024-03-31T02:48:17Z

virtualizarr/xarray.py

+    if filetype == "zarr_v3":
+        # TODO is there a neat way of auto-detecting this?


This is a bit ugly - I want to automatically distinguish between non-zarr, zarr v2 (both to be read using kerchunk) and zarr v3 (to be read using this code). I guess I will just have to search for .zgroup/zarr.json files explicitly?

@norlandrhagen do you have any thoughts on a neat way to handle this?
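
One way the explicit search could look, as a minimal sketch for stores on the local filesystem. Zarr v3 keeps its metadata in `zarr.json`, while v2 uses `.zgroup` and `.zarray`; the `detect_store_type` name and the `"zarr_v2"` / `"non-zarr"` return labels are hypothetical.

```python
from pathlib import Path


def detect_store_type(path: str) -> str:
    """Guess whether a path is a zarr v3 store, a zarr v2 store, or something else."""
    root = Path(path)
    if root.is_dir():
        # Zarr v3 keeps group and array metadata in zarr.json
        if (root / "zarr.json").is_file():
            return "zarr_v3"
        # Zarr v2 marks groups with .zgroup and arrays with .zarray
        if (root / ".zgroup").is_file() or (root / ".zarray").is_file():
            return "zarr_v2"
    # Anything else (e.g. a netCDF file) would go down the kerchunk path
    return "non-zarr"
```

This only inspects the top level of the path, so it would not cover remote stores or nested groups.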

TomNicholas · 2024-04-02T12:59:22Z

virtualizarr/xarray.py

+
+    # TODO recursive glob to create a datatree
+    vars = {}
+    for array_dir in _storepath.glob("*/"):


Somehow this is going awry in the CI, but working as intended locally.

It seems that when run locally (on macOS), a pathlib.Path.glob("*/") call only returns directories (as the pathlib docs say it will), but for some reason when run in this CI the glob will include files too. I've hacked around this by excluding any paths for which .is_file() is True.
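
The workaround can be sketched as below; the `list_array_dirs` name is hypothetical. One possible cause, not confirmed in this thread, is a Python version difference between the two environments: the pathlib docs record that a pattern ending in a path separator only started matching directories exclusively in Python 3.11.

```python
from pathlib import Path


def list_array_dirs(storepath: Path) -> list[Path]:
    """Return the sub-directories of a store, skipping any files the glob yields."""
    # Filtering explicitly gives the same result whether or not glob("*/")
    # already restricts itself to directories
    return sorted(p for p in storepath.glob("*/") if not p.is_file())
```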

TomNicholas · 2024-05-01T21:04:06Z

I'm just going to merge this as we can always change it later.

TomNicholas added 3 commits March 22, 2024 11:08

basic structure for writing a zarr store containing manifests

386844a

write in nicely indented form

02e457e

use pathlib for everything

43872ab

TomNicholas added the zarr-python (Relevant to zarr-python upstream) and zarr-specs (Requires adoption of a new ZEP) labels Mar 22, 2024

jhamman reviewed Mar 22, 2024

TomNicholas mentioned this pull request Mar 25, 2024

Write zarr stores with manifest.json files to non-local storage #46

Open

TomNicholas mentioned this pull request Mar 25, 2024

Rename open_dataset_via_kerchunk to open_virtual_dataset #47

Merged

TomNicholas added 2 commits March 27, 2024 14:18

Merge branch 'main' into to_zarr

f16911b

documentation

c319d30

TomNicholas mentioned this pull request Mar 28, 2024

Virtual datasets from Zarr stores #63

Open

TomNicholas added 8 commits March 28, 2024 22:07

docstrings

8f0ee51

Merge branch 'main' into to_zarr

fc4cb84

vendor zarr.utils.json_dumps

c8add61

remove consolidated metadata, as v3 doesn't have this yet

42e17d1

license for vendoring part of zarr-python

23772b9

change to write v3

4f2655f

implement reading from v3-compliant stores

b4c38fe

roundtripping test

79f39e1

TomNicholas requested a review from jhamman March 31, 2024 02:45

TomNicholas commented Mar 31, 2024

forgot to add the file with the test

98a259e

TomNicholas commented Apr 2, 2024

TomNicholas added 3 commits April 9, 2024 17:46

Merge branch 'main' into to_zarr

c3d88d5

test dataset-level attributes

973eccd

debugging print

0151652

try explicitly separating files from directories

6ba41de

TomNicholas mentioned this pull request Apr 10, 2024

Generating references without kerchunk #78

Open

TomNicholas merged commit e014bb7 into main May 1, 2024
5 checks passed

TomNicholas deleted the to_zarr branch May 1, 2024 21:04

TomNicholas mentioned this pull request May 1, 2024

"Inlining" data when writing references to disk #62

Open
