Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thinning down zarr-python #1274

Open
MSanKeys963 opened this issue Nov 22, 2022 · 12 comments
Open

Thinning down zarr-python #1274

MSanKeys963 opened this issue Nov 22, 2022 · 12 comments
Labels
help wanted Issue could use help from someone with familiarity on the topic
Milestone

Comments

@MSanKeys963
Copy link
Member

MSanKeys963 commented Nov 22, 2022

This came up during one of the conversations with @joshmoore. The idea is to extract functionalities not in V2 Spec and their respective tests from zarr-python. E.g. Zarr Specification V2 has the following stores:

  • Directory Store
  • Nested Directory Store
  • Zip file Store
  • Object Store
  • HTTP Store

But in zarr-python, we have additional stores like SQLiteStore, MongoDBStore etc. If we could remove these additional stores from the zarr-python codebase and host them in a separate repo under zarr-developers, it'll help us:

  • Making the code run faster and more efficiently
  • Run fewer tests and faster completion of them
  • Reduce the noise from dependabot
  • Add maintainers for specific stores
  • Focus on a more concise codebase, thus helping us to add additional features when migrating from V2 to V3

If a user/developer wishes to use the removed functionality, they can shoot pip install zarr-sqlitestore or similar.

FYI: A similar repo is in the works for Numcodecs, which is taken care of by @joshmoore and @jakirkham.

Please let me know your thoughts on this. Thanks!

CC: @zarr-developers/python-core-devs

@MSanKeys963 MSanKeys963 added the help wanted Issue could use help from someone with familiarity on the topic label Nov 22, 2022
@jakirkham
Copy link
Member

Yeah we have had similar discussions before

@MSanKeys963
Copy link
Member Author

Thanks for linking the previous discussions, @jakirkham. Some interesting points there.

Also, we have PyData Global 2022 Sprints coming up next week so we're thinking to kick this off over there.

@jakirkham
Copy link
Member

The main thing to consider is currently we are testing these Stores pretty regularly (particularly with the general code here and dependency updates). Would be good to handle this somehow when refactoring (cron jobs on CI, running with all Store extensions installed here, etc.)

@joshmoore
Copy link
Member

Updates to the description:

  • s/eliminate/extract/
  • dropped FSStore from the list for the moment
  • Added benefits:
    • Reduce the noise from dependabot
    • Add maintainers for specific stores

To #1274 (comment), @jakirkham, I'd say with all the work on easing releases, we should really push for faster iterations, and then have dependabot test each of the refactor stores (daily?)

@rabernat
Copy link
Contributor

The main thing to consider is currently we are testing these Stores pretty regularly (particularly with the general code here and dependency updates)

If the interface between the Zarr core and the store layer is clearly and cleanly defined, it should in theory be sufficient to test each store thoroughly. If there are edge cases that only arise when testing a store against the full Zarr test suite, that might indicate leaky abstractions. (Partial chunk decompression comes to mind as a sketchy area.)

In any case, we should probably be setting up more integration testing of the entire stack, including testing against real cloud storage. I would be happy to work on that.

@martindurant
Copy link
Member

we should probably be setting up more integration testing of the entire stack

I think this is far more important than trimming the zarr-python codebase!

Would it be simpler to move the not-so-used storage classes into a contrib/ directory, as a mark that they are not maintained by the core team?

I definitely think we need to keep in all the stores that may be instantiated directly in the main open_* functions (DirStore, FSStore, ConsolidatedStore, ...?).

@jakirkham
Copy link
Member

Yeah we had a similar discussion about refactor contents out within Zarr-Python as different modules and adding code owners ( #764 )

Honestly agree this may be an easier first step to address the issue identified. Once complete this change may be useful for further library refactoring. It may also be easier for a new contributor to try than setting up a new repo

On a different note perhaps we should also think about an entry point mechanism like what was done in Numcodecs ( zarr-developers/numcodecs#300 ). This may make it easier for developers wanting to add their own Store without needing changes in Zarr-Python

@d-v-b
Copy link
Contributor

d-v-b commented Nov 29, 2022

I'd be happy seeing the n5 functionality pulled out into its own package.

@MSanKeys963
Copy link
Member Author

@MSanKeys963
Copy link
Member Author

@jhamman - do you think this should be a part of V3 design doc (#1583) goals?

@d-v-b
Copy link
Contributor

d-v-b commented Mar 4, 2024

see https://github.com/zarr-developers/n5py (private atm) for a start

@jhamman
Copy link
Member

jhamman commented Apr 19, 2024

I've started taking steps to deprecate many of the more exotic stores. See #1756.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue could use help from someone with familiarity on the topic
Projects
Status: In Progress
Development

No branches or pull requests

7 participants