Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle fsspec.FSMap using FSStore store #1304

Merged
merged 2 commits into from Dec 22, 2022

Conversation

ravwojdyla
Copy link
Contributor

@ravwojdyla ravwojdyla commented Dec 20, 2022

Re: #1296

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Dec 20, 2022
@codecov
Copy link

codecov bot commented Dec 20, 2022

Codecov Report

Merging #1304 (3a9fab9) into main (482dfe5) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1304   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           35        35           
  Lines        14130     14148   +18     
=========================================
+ Hits         14130     14148   +18     
Impacted Files Coverage Δ
zarr/_storage/v3.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <100.00%> (ø)
zarr/tests/test_storage.py 100.00% <100.00%> (ø)
zarr/tests/test_storage_v3.py 100.00% <100.00%> (ø)

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle this can be OK, but please see my major comment.

I wonder why this is necessary? Currently the FSMap will be wrapped by KVStore and everything works fine.

check=store.check,
create=store.create,
missing_exceptions=store.missing_exceptions,
**(storage_options or {}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we switch from the already configured Map and internal FS instance we were given to making new instances with these storage_options, which will in general be different. This is missed in the test, which uses the local filesystem and needs no kwargs.

(same for the other copy of this call below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we switch from the already configured Map and internal FS instance we were given to making new instances with these storage_options, which will in general be different. This is missed in the test, which uses the local filesystem and needs no kwargs.

@martindurant what's your recommendation?

@ravwojdyla
Copy link
Contributor Author

I wonder why this is necessary? Currently the FSMap will be wrapped by KVStore and everything works fine.

@martindurant see #1296. afaiu an alternative to this could be to add set/get items to KVStore.

@martindurant
Copy link
Member

Ah I see, so previous store.getitems existed, but not when wrapped. So doing what this PR does is worthwhile, but my comment still needs to be solved. Can we have FSStore accept an FSMap instance as an alternative to creating one?

@ravwojdyla
Copy link
Contributor Author

Can we have FSStore accept an FSMap instance as an alternative to creating one?

@martindurant I don't have a strong opinion about this, but want to point out that right now FSStore has a single required argument url of type str. There's optional fs argument. If we add optional fs_map argument, that fs_map will already include an "url" (root), so it can be somewhat confusing API (why would the user have an option to pass both url and fs_map, and what happens when there's discrepancies between these two). wdyt?

@martindurant
Copy link
Member

OK, so we should pass (mapper.root, fs=mapper.fs), right?

@martindurant
Copy link
Member

I mean the point is, not to pass the storage_options. Maybe it makes no difference, so sorry if I am just confusing things!

@martindurant
Copy link
Member

@ravwojdyla
Copy link
Contributor Author

ravwojdyla commented Dec 20, 2022

@martindurant sorry, that link above doesn't resolve for me, you probably mean this:

zarr-python/zarr/storage.py

Lines 1329 to 1331 in 1af77b6

else:
if storage_options:
raise ValueError("Cannot specify both fs and storage_options")

So it's not possible to pass both fs and storage_options ^

@martindurant
Copy link
Member

Yes that's the one +1

@ravwojdyla
Copy link
Contributor Author

Actually just noticed that need to update this:

image

Should be FSStoreV3 ^

@rabernat
Copy link
Contributor

The use case here is that many users (e.g. @ravwojdyla, others in the Pangeosphere) are used to constructing stores via fsspec.get_mapper() -> FSMap and have existing production code that does this; but the FSMap feature set has diverged from FSStore and is therefore less performant.

I support this PR in general because it seems like a lightweight way of upgrading an FSSMap to an FSStore.

An alternative would be to bring FSMap into feature parity with FSStore.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this @ravwojdyla! I see no major issues, just one small suggestion regarding coverage.

check=store.check,
create=store.create,
missing_exceptions=store.missing_exceptions,
**(storage_options or {}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's actually fine to pass storage_options through like this. If it is not empty, it will trigger an error via

zarr-python/zarr/storage.py

Lines 1329 to 1331 in 1af77b6

else:
if storage_options:
raise ValueError("Cannot specify both fs and storage_options")

This is the desired behavior. Storage options will already be configured on the FSMap, so specifying them twice would not make sense.

zarr/storage.py Outdated Show resolved Hide resolved
zarr/storage.py Show resolved Hide resolved
@joshmoore
Copy link
Member

@martindurant, are all your concerns addressed?

From my side, I do start to wonder just looking at the changes needed if fsspec isn't starting to become more than an optional dependency.

@martindurant
Copy link
Member

To be sure, fsspec is a very small dep and one that has none of its own. I cannot make that decision, however.

Yes, I am happy otherwise.

@rabernat
Copy link
Contributor

👍 to making fsspec a required dependency. However, let's address that in a separate PR. This is @ravwojdyla's first contribution to Zarr, so let's keep it simple.

@ravwojdyla - I invite you to give yourself appropriate credit for this bug fix in docs/release.rst. Once that is done we can merge.

@rabernat rabernat mentioned this pull request Dec 21, 2022
5 tasks
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Dec 21, 2022
@ravwojdyla
Copy link
Contributor Author

@rabernat added the release note in 3a9fab9

@rabernat rabernat merged commit 4e633ad into zarr-developers:main Dec 22, 2022
@joshmoore
Copy link
Member

🎉 Thanks, @ravwojdyla 🙏🏽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants