Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N5FSStore #793

Merged
merged 45 commits into from Sep 19, 2021
Merged

N5FSStore #793

merged 45 commits into from Sep 19, 2021

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jul 6, 2021

An fsstore-backed implementation of n5

There's a lot of ugly duplicated logic in n5.py. It might be worth some time cleaning that up.

cc @joshmoore, in case you want to test dimension_separator stuff against this branch

TODO:

  • [] Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

joshmoore and others added 12 commits June 14, 2021 13:33
Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.
@pep8speaks
Copy link

pep8speaks commented Jul 6, 2021

Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-19 14:01:04 UTC

@codecov
Copy link

codecov bot commented Jul 6, 2021

Codecov Report

Merging #793 (aa4a723) into master (7ec8c64) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master     #793    +/-   ##
========================================
  Coverage   99.94%   99.94%            
========================================
  Files          31       31            
  Lines       10681    10936   +255     
========================================
+ Hits        10675    10930   +255     
  Misses          6        6            
Impacted Files Coverage Δ
zarr/__init__.py 100.00% <100.00%> (ø)
zarr/n5.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <100.00%> (ø)
zarr/tests/test_core.py 100.00% <100.00%> (ø)
zarr/tests/test_storage.py 100.00% <100.00%> (ø)
zarr/creation.py 100.00% <0.00%> (ø)
zarr/tests/test_convenience.py 100.00% <0.00%> (ø)

@joshmoore
Copy link
Member

joshmoore commented Aug 17, 2021

Note that #773 is still passing against this branch. Any thoughts on moving forward here, @d-v-b, re: #769 (comment)?

(Personally, I'd be in favor of prioritizing a fix for the state of dimension_separator over code coverage if need be)

@d-v-b
Copy link
Contributor Author

d-v-b commented Aug 17, 2021

@joshmoore please try again, I discovered that this branch was missing N5FSStore tests in test_core, which were failing until I made some changes. I'm pretty sure anything that changes the chunk keys on the array side is going to break what I have here.

@joshmoore
Copy link
Member

Good morning, @d-v-b. The merged branches passed for me locally. I've re-activated #794 for fuller testing.

@@ -1207,7 +1207,7 @@ def test_chunk_nesting(self):
assert 'foo/10.20.30' in store
assert b'yyy' == store['foo/10.20.30']
# N5 reverses axis order
# assert b'yyy' == store['foo/30/20/10']
assert b'yyy' == store['foo/30/20/10']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #773 (review), I have a heard time understanding how it's possible to support the requirement that multiple keys be supported for the same item.

This may be part and parcel of the conversation we're having in gitter: https://gitter.im/zarr-developers/community?at=611d562e933ee46b7564044c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: ah, now I see that they are re-ordered not just .// swapped, but I think my same concern holds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the concern -- it looks like this behavior is expected from the NestedDirectoryStore: https://github.com/zarr-developers/zarr-python/blob/master/zarr/tests/test_storage.py#L1148

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my PR I resurrected this. I at least understand know why it's supported, though I think it's a hackdoor of sorts.

@joshmoore
Copy link
Member

Sounds good. If you want to test a reversion of the hexdigests, you can pin numcodecs to <0.9. I'm unsure of whether or not to try fixing the NestedDirectoryStore issue. Nothing is failing at the moment so I'm tempted to leave it as in #773.

@d-v-b d-v-b changed the title [WIP] N5FSStore N5FSStore Sep 16, 2021
@joshmoore joshmoore force-pushed the n5fsstore branch 2 times, most recently from b8f12a8 to 5fcf68d Compare September 17, 2021 07:16
 * Adds a test for the dimension_separator warning
 * uses the parent test_complex for listdir
 * "nocover" the import error since fsspec is ever present
@joshmoore
Copy link
Member

@d-v-b: I pushed a commit trying to get codecov to 100%. I'm still having difficulty getting the likes of:

elif is_chunk_key(key):
    key = self._swap_separator(key)
``
to trigger if you have any ideas.

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 17, 2021

@joshmoore I copied a TestN5Store method over into TestN5FSStore, which seems to have done the trick. An alternative to copying code would be to make TestN5FSStore inherit from TestFSStore and TestN5Store, but I'm nervous about using multiple inheritance (and when I ran tests with multiple inheritance, I didn't uncover any surprising failures).

@joshmoore
Copy link
Member

Sounds good. Three Friday evening points:

  • @zarr-developers/core-devs : any comments?
  • I'm still a tiny bit nervous about having two versions but I guess deprecation might eventually be our friend.
  • What version do you propose? patch or minor?

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 17, 2021

I'm still a tiny bit nervous about having two versions but I guess deprecation might eventually be our friend.

By "two versions" do you mean the N5Store and N5FSStore? Given that a) n5 is relatively niche, and this was flagged as experimental, and b) fsspec isn't a super heavy dependency, I agree that a quick deprecation path looks nice.

Version-wise I could think this is minor? since we are adding to the public API.

zarr/storage.py Outdated
@@ -1065,22 +1065,28 @@ class FSStore(MutableMapping):
Separator placed between the dimensions of a chunk.
storage_options : passed to the fsspec implementation
"""
array_meta_key = array_meta_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed these. Would it make sense to have these prefixed with _. Do you really mean these to be public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! I'm fine with making these private.

d-v-b and others added 3 commits September 17, 2021 19:06
FSStore only throws ModuleNotFoundError on initialization
rather than on import. Therefore N5FSStore does the same.
If this *weren't* the case, then the import in zarr/init
would need to test the import as well, which isn't the case.
@@ -9,7 +9,7 @@
zeros_like)
from zarr.errors import CopyError, MetadataError
from zarr.hierarchy import Group, group, open_group
from zarr.n5 import N5Store
from zarr.n5 import N5Store, N5FSStore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concerned me that it would need protecting by try/except block. In testing it, I realized FSStore only throws on __init__ and therefore N5FSStore could be less conservative. I've pushed:

aa4a723

protocol, _ = fsspec.core.split_protocol(url)
# set auto_mkdir to True for local file system
if protocol in (None, "file") and not storage_options.get("auto_mkdir"):
storage_options["auto_mkdir"] = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, @d-v-b! Pushing forward with a minor release.

@joshmoore joshmoore merged commit dca87fc into zarr-developers:master Sep 19, 2021
@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 19, 2021

Thanks for your help @joshmoore!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants