
Refactoring stores into a separate library #414

Open
jakirkham opened this issue Mar 1, 2019 · 8 comments


@jakirkham
Member

Over time, the Zarr library accumulated more compressors, and even some Cython code related to that effort, until it made sense for these to move out and live in their own library, Numcodecs. Similarly, we seem to be accumulating many stores; perhaps they would be better suited to a separate library that Zarr depends on.

The stores themselves don't need to know about compression, Groups, or Arrays (with the exception of some layers on top of the stores). Really, stores are just key-value stores that associate a blob with a key. As such, this library could have few requirements (if any) and be accessible to a larger group of people who are just interested in something that provides the MutableMapping interface to their storage format of choice.
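As a rough illustration of how little such a store needs, here is a minimal in-memory store built only on the standard library. The class name `InMemoryStore` is hypothetical, and coercing values to `bytes` is an assumption of this sketch rather than a documented zarr requirement.

```python
from collections.abc import MutableMapping


class InMemoryStore(MutableMapping):
    """Hypothetical key-value store mapping string keys to bytes blobs.

    It knows nothing about compression, groups, or arrays; it only
    associates a blob with a key.
    """

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Keep an immutable bytes copy of whatever buffer was passed in.
        self._data[key] = bytes(value)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = InMemoryStore()
store["foo/.zarray"] = b'{"zarr_format": 2}'
store["foo/0.0"] = b"\x00\x01"
print(sorted(store))  # ['foo/.zarray', 'foo/0.0']
```

Nothing here imports zarr, which is the point: a standalone store package could ship classes like this with no dependencies beyond Python itself.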

By breaking out the stores as a separate library, we gain another benefit. Namely, we can release the stores on a different schedule from Zarr itself. So if we get a new store or a bug fix for a store, we can do a new release even while Zarr itself has not changed much.

There are probably more reasons to consider this move. I would be curious to hear what others think.

@jakirkham
Member Author

Thoughts @zarr-developers/core-devs?

@martindurant
Member

https://github.com/martindurant/filesystem_spec/blob/master/fsspec/mapping.py ? :) Yes, I know, not all stores are file-systems, but I really have been trying to make the implementation more general.

@rabernat
Contributor

rabernat commented Mar 4, 2019

I really like this idea.

@jakirkham
Member Author

jakirkham commented Mar 23, 2019

Maybe @mbr (author of simplekv) would be interested in collaborating? 😉

@mbr

mbr commented Mar 28, 2019

@jakirkham Well, hopefully simplekv does what you want already? =)

@alimanfoo
Member

Hi @mbr, simplekv does look nice, and there's clearly a lot of duplication of functionality between that and what we've implemented in the zarr storage module.

There are two technical differences between simplekv's API and the zarr store API that I can see.

The first is that zarr uses the MutableMapping interface for getting and setting key/value pairs (i.e., __getitem__ and __setitem__) whereas simplekv uses get() and put() methods. That's a very small difference in naming only.
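To show how small that naming gap is, here is a sketch of an adapter that exposes a get()/put()-style backend through `MutableMapping`. `GetPutStore` and `MappingAdapter` are hypothetical classes written for this example, not simplekv's actual API, and the backend's method signatures are assumptions.

```python
from collections.abc import MutableMapping


class GetPutStore:
    """Hypothetical backend with a get()/put()-style API."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is missing

    def put(self, key, data):
        self._blobs[key] = data
        return key

    def delete(self, key):
        self._blobs.pop(key, None)

    def keys(self, prefix=""):
        return [k for k in self._blobs if k.startswith(prefix)]


class MappingAdapter(MutableMapping):
    """Expose a get()/put() backend through the MutableMapping interface."""

    def __init__(self, backend):
        self.backend = backend

    def __getitem__(self, key):
        return self.backend.get(key)

    def __setitem__(self, key, value):
        self.backend.put(key, value)

    def __delitem__(self, key):
        # MutableMapping semantics: deleting a missing key is an error,
        # even if the backend's delete() silently ignores it.
        if key not in self:
            raise KeyError(key)
        self.backend.delete(key)

    def __iter__(self):
        return iter(self.backend.keys())

    def __len__(self):
        return len(self.backend.keys())


store = MappingAdapter(GetPutStore())
store["a"] = b"1"
print(store["a"])  # b'1'
```

The translation is almost one-to-one; the only real work is reconciling error semantics, such as what happens on deleting a missing key.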

The second is that zarr stores support some optional methods, including some which are "hierarchy-aware", i.e., when using keys that include a '/' character and wanting to list keys at a given level, akin to listing the contents of a directory if using a file system. Here's what we say about these optional methods in the docs:

In addition to the MutableMapping interface, store classes may also implement optional methods listdir (list members of a “directory”) and rmdir (remove all members of a “directory”). These methods should be implemented if the store class is aware of the hierarchical organisation of resources within the store and can provide efficient implementations. If these methods are not available, Zarr will fall back to slower implementations that work via the MutableMapping interface. Store classes may also optionally implement a rename method (rename all members under a given path) and a getsize method (return the size in bytes of a given value).
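The fallback behaviour described in that passage can be sketched roughly as follows. `list_dir` and `remove_dir` are hypothetical helper names, not zarr's actual code. They call a store's `listdir`/`rmdir` when present, and otherwise scan every key through the MutableMapping interface, which is the slow path the docs mention.

```python
def _normalize(path):
    # Treat "" as the root; otherwise ensure exactly one trailing "/".
    path = path.strip("/")
    return path + "/" if path else ""


def list_dir(store, path=""):
    """List immediate children of `path`, preferring store.listdir()."""
    if hasattr(store, "listdir"):
        return store.listdir(path)
    # Slow fallback: scan every key through the MutableMapping interface.
    prefix = _normalize(path)
    children = set()
    for key in store:
        if key.startswith(prefix):
            children.add(key[len(prefix):].split("/", 1)[0])
    return sorted(children)


def remove_dir(store, path=""):
    """Delete everything under `path`, preferring store.rmdir()."""
    if hasattr(store, "rmdir"):
        store.rmdir(path)
        return
    prefix = _normalize(path)
    # Materialize the key list first: never mutate while iterating.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]


# A plain dict is a MutableMapping with no listdir/rmdir, so it hits the fallbacks.
store = {"a/.zgroup": b"{}", "a/b/.zarray": b"{}", "a/b/0": b"x", "c": b"y"}
print(list_dir(store, "a"))  # ['.zgroup', 'b']
print(list_dir(store))       # ['a', 'c']
remove_dir(store, "a/b")
print(sorted(store))         # ['a/.zgroup', 'c']
```

A store backed by a real directory tree or an object store with delimiter-aware listing could implement `listdir` without touching every key, which is why the methods are optional but encouraged.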

@jakirkham
Member Author

> The first is that zarr uses the MutableMapping interface for getting and setting key/value pairs (i.e., __getitem__ and __setitem__) whereas simplekv uses get() and put() methods. That's a very small difference in naming only.

It's worth pointing out that this really helps, as we can leverage other methods of MutableMapping like update or clear, and they just work off of the handful of methods we define. These are pretty handy when copying or removing data.
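To make this concrete, the sketch below defines only the five abstract methods of `collections.abc.MutableMapping` and then uses the inherited `update` and `clear`. `LoggingStore` is a hypothetical class that records which of its own methods get called, so you can see the inherited helpers are built entirely on `__setitem__`, `__getitem__`, and `__delitem__`.

```python
from collections.abc import MutableMapping


class LoggingStore(MutableMapping):
    """Hypothetical store defining only the five abstract methods
    and recording which of them are called."""

    def __init__(self):
        self._data = {}
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("getitem")
        return self._data[key]

    def __setitem__(self, key, value):
        self.calls.append("setitem")
        self._data[key] = value

    def __delitem__(self, key):
        self.calls.append("delitem")
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


src = {"x/.zarray": b"{}", "x/0": b"\x00", "x/1": b"\x01"}
dst = LoggingStore()
dst.update(src)                    # inherited: one __setitem__ per key
print(dst.calls.count("setitem"))  # 3
dst.clear()                        # inherited: repeated popitem() -> __delitem__
print(len(dst))                    # 0
```

`get`, `pop`, `setdefault`, `keys`, `items`, and `__contains__` come along for free in the same way.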

@mbr

mbr commented Apr 1, 2019

> The second is that zarr stores support some optional methods, including some which are "hierarchy-aware", i.e., when using keys that include a '/' character and wanting to list keys at a given level, akin to listing the contents of a directory if using a file system.

So far, I have deliberately omitted these kinds of hierarchies; / characters are also not allowed in keys. This is to keep things as portable and simple as possible. For this reason, the keys method takes a prefix argument, which lets you roll your own hierarchy to some extent.
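Here is a sketch of how a one-level "directory listing" can be emulated on a flat store that only offers prefix-filtered key listing. `FlatStore` and `list_level` are hypothetical, and the `~` separator is an arbitrary stand-in chosen for this example, not a character any particular library is known to allow.

```python
class FlatStore:
    """Hypothetical flat store: '/' is rejected in keys, but keys()
    accepts a prefix filter."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        if "/" in key:
            raise ValueError("'/' is not allowed in keys")
        self._blobs[key] = data

    def keys(self, prefix=""):
        return [k for k in self._blobs if k.startswith(prefix)]


def list_level(store, path, sep="~"):
    """Emulate listing one level of a hierarchy via a prefix query.

    `sep` is an arbitrary separator chosen for this sketch.
    """
    prefix = path + sep if path else ""
    # The backend narrows the candidates; we keep only the first segment.
    children = {k[len(prefix):].split(sep, 1)[0] for k in store.keys(prefix)}
    return sorted(children)


store = FlatStore()
for key in ["a~b~0", "a~b~1", "a~c", "d"]:
    store.put(key, b"")
print(list_level(store, "a"))  # ['b', 'c']
print(list_level(store, ""))   # ['a', 'd']
```

The prefix query lets a backend avoid returning unrelated keys, but it still returns every key below the prefix, so deep trees cost more than a delimiter-aware `listdir` would.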

> It's worth pointing out this really helps as we can leverage other methods of MutableMapping like update or clear and they just work off of the handful of methods we define. These are pretty handy when copying or removing data.

I believe that MutableMapping simply predates the development of simplekv, I am not against adding them. Some, like len, might be hard to implement for all backends though and I am not sure if it's okay to just raise something like NotImplemented there.
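A note on the `len` concern. In Python, `NotImplemented` is a sentinel constant, and raising it fails with a `TypeError`. The exception intended for unsupported operations is `NotImplementedError`. Also, `__len__` is one of the abstract methods of `collections.abc.Mapping`, so a `MutableMapping` subclass must define it to be instantiable at all. One possible compromise, sketched here with a hypothetical class, is to count keys by iterating, which works for any backend that can list its keys at the cost of a full listing.

```python
from collections.abc import MutableMapping


class ListingOnlyStore(MutableMapping):
    """Hypothetical store whose backend can list keys but not count them."""

    def __init__(self):
        self._data = {}  # stands in for a remote backend in this sketch

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        # Imagine this streaming keys from a paginated listing API;
        # iterating over a snapshot avoids errors if the store changes.
        yield from list(self._data)

    def __len__(self):
        # Required (abstract in Mapping). Counting by iteration is correct
        # but costs O(number of keys) per call.
        return sum(1 for _ in self)


store = ListingOnlyStore()
store.update({"a": b"1", "b": b"2"})
print(len(store))  # 2
```

Because `bool(store)` also falls back to `__len__`, a store that raised from `__len__` would break truthiness checks such as `if store:`, which is one argument for counting instead.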
