
Questions regarding the DirectoryStore design and expected functionality. #66

Carreau opened this issue May 11, 2020 · 4 comments

Carreau commented May 11, 2020

Hi there,

I'm trying to understand the reasoning, choices, and trade-offs made when the DirectoryStore on-disk layout was created.

As far as I can tell it has been done to be:

  1. relatively simple
  2. close to the default zarr protocol.
  3. internals meant to be inspected by humans, with subgroups in the hierarchy accessible independently, without opening the root via the zarr library.

Are these assumptions of mine correct? To what extent could they be changed for the internal implementation of DirectoryStore in the v3 spec, assuming the end-user API does not change?

For multi-language implementations of the DirectoryStore in v3, I'm supposing we also care about a few other things, mainly:

  1. On-disk layout should be relatively friendly to machines and many languages.
  2. Robust to key casing (since the zarr protocol may allow Unicode keys and may require stores to be case-sensitive?).
  3. Efficient when possible.

There are a few other questions that I have not seen mentioned in discussions/spec of the DirectoryStore, mainly whether soft/hard links are allowed, how permissions are handled, and whether writing over a chunk should keep the inode and seek-write into it or replace the file.

I believe some of the current constraints on casing and efficiency of the current DirectoryStore can be overcome with minimal loss of readability for a human exploring the internals of such a datastore.

For example, we could change the encoding of keys as follows:

  • In the Zarr protocol, allow arbitrary Unicode for keys, or at least relax casing.
  • For the DirectoryStore, encode the key as follows.

A key in the DirectoryStore would be HUMAN_PART-MACHINE_PART:

  • The HUMAN_PART would be an ASCII-restricted, non-empty version of the key, mostly informative for the user exploring the filesystem.
  • The MACHINE_PART would be a base32-encoded version of the key, stripped of trailing =. This would ensure the ability to store complex Unicode keys without any issues with casing or reserved names (like COM on Windows, names starting with dashes, dots, etc.).
  • The MACHINE_PART could also end with d or g depending on whether a key is a group or a dataset, which should limit the number of stats/reads when listing a store with a large number of groups/datasets.
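For concreteness, here is a rough sketch in Python of what such an encoding could look like. The function names, the exact ASCII subset kept in the HUMAN_PART, and the placement of the d/g flag are all illustrative assumptions, not a settled proposal:

```python
import base64

def encode_key(key: str, kind: str = "d") -> str:
    """Encode a zarr key into HUMAN_PART-MACHINE_PART followed by a d/g flag.

    `kind` is "d" for a dataset key or "g" for a group key (the lowercase flag
    sits outside the uppercase base32 alphabet, so it cannot be confused with
    the encoded payload). All names here are illustrative, not from the spec.
    """
    # HUMAN_PART: an ASCII-restricted, non-empty, purely informative rendering of the key.
    human = "".join(c for c in key if c.isascii() and (c.isalnum() or c == "_")) or "_"
    # MACHINE_PART: the exact key, base32-encoded (safe on case-insensitive file
    # systems), with the trailing '=' padding stripped.
    machine = base64.b32encode(key.encode("utf-8")).decode("ascii").rstrip("=")
    return f"{human}-{machine}{kind}"

def decode_key(encoded: str) -> str:
    """Recover the original key from the MACHINE_PART alone."""
    machine = encoded.rsplit("-", 1)[1][:-1]   # drop HUMAN_PART and the trailing d/g flag
    padding = "=" * (-len(machine) % 8)        # restore the stripped base32 padding
    return base64.b32decode(machine + padding).decode("utf-8")

assert decode_key(encode_key("Température/0.0")) == "Température/0.0"
```

Because the MACHINE_PART alone is reversible, the HUMAN_PART never needs to round-trip and can be mangled freely for readability.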

I'm happy to come up with a more detailed description, but I don't want to engage in this if I don't properly understand the trade-offs that need to be achieved.

@alimanfoo

Hi @Carreau,

I'm trying to understand the reasoning, choices, and trade-offs made when the DirectoryStore on-disk layout was created.

As far as I can tell it has been done to be:

  1. relatively simple
  2. close to the default zarr protocol.
  3. internals meant to be inspected by humans, with subgroups in the hierarchy accessible independently, without opening the root via the zarr library.

Yes, it makes the data very "hackable" in the sense that they're easy to manipulate using generic tools, like file browsers, cp, rsync, text editors, etc.

There is another consideration which is that the cloud stores (GCS, S3, ABS) use the same mapping of storage keys to object paths. What this means is that data can be created on a local file system using DirectoryStore, then copied to cloud via a utility like gsutil.

Are these assumptions of mine correct? To what extent could they be changed for the internal implementation of DirectoryStore in the v3 spec, assuming the end-user API does not change?

That is a good question. I'm not sure; it needs some careful consideration.

For multi-language implementations of the DirectoryStore in v3, I'm supposing we also care about a few other things, mainly:

  1. On-disk layout should be relatively friendly to machines and many languages.

Ideally, yes.

  2. Robust to key casing (since the zarr protocol may allow Unicode keys and may require stores to be case-sensitive?).

Ideally, yes.

  3. Efficient when possible.

Again, ideally, yes. Although efficiency for cloud object stores is likely a more important consideration than efficiency on a local file system. Much of the current v3 design of storage keys is there to accommodate performance and functionality limitations of cloud storage.

There are a few other questions that I have not seen mentioned in discussions/spec of the DirectoryStore, mainly whether soft/hard links are allowed, how permissions are handled, and whether writing over a chunk should keep the inode and seek-write into it or replace the file.

I think these are all implementation choices. E.g., the current Python implementation replaces files to avoid files being left in a partially-written state. But that is not mandatory.
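For illustration, the replace-the-file behaviour usually amounts to a write-to-temporary-file-then-rename pattern; the sketch below shows the general idea and is not the exact code in zarr-python:

```python
import os
import tempfile

def write_chunk(path: str, data: bytes) -> None:
    """Minimal sketch of "replace the file" semantics: write to a temporary file
    in the same directory, then atomically rename it over the target, so readers
    never observe a partially-written chunk. Illustrative only."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on both POSIX and Windows
    except BaseException:
        os.remove(tmp)         # clean up the temporary file on any failure
        raise
```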

I believe some of the current constraints on casing and efficiency of the current DirectoryStore can be overcome with minimal loss of readability for a human exploring the internals of such a datastore.

For example, we could change the encoding of keys as follows:

  • In the Zarr protocol, allow arbitrary Unicode for keys, or at least relax casing.
  • For the DirectoryStore, encode the key as follows.

A key in the DirectoryStore would be HUMAN_PART-MACHINE_PART:

  • The HUMAN_PART would be an ASCII-restricted, non-empty version of the key, mostly informative for the user exploring the filesystem.
  • The MACHINE_PART would be a base32-encoded version of the key, stripped of trailing =. This would ensure the ability to store complex Unicode keys without any issues with casing or reserved names (like COM on Windows, names starting with dashes, dots, etc.).

I've also been wondering about approaches like this, following your previous suggestions.

This is tricky, and I think this will need some time to work through. But FWIW, my current feeling is that we'll need to offer a choice.

I.e., I think it should be possible to use a natural approach to forming storage keys and translating these into file paths, similar to the current zarr v2 implementations. This is best for hackability and for moving data between file systems and cloud stores. Lots of people are happy with this approach in v2, and I have only ever heard of one situation where a problem arose because of use on case-insensitive file systems.

And I think it should also be possible to define and use different approaches to forming storage keys, such as the approach you suggest, which is less hackable but robust to case-insensitive file systems.

Bottom line, this is one of those situations where it's very hard to see all possibilities, and so probably the best we can do is design some flexibility into the protocol to allow people to make choices and explore new options should problems arise or better ideas come along.

Thinking out loud, but I wonder if it would be better to deal with this within the v3 core protocol above the layer of the store interface. In other words, the v3 core protocol could offer a choice of different possible approaches regarding how storage keys are formed. The default would be to form storage keys as currently specified. That section of the spec defines rules for forming storage keys for metadata documents and data chunks, given a node at some hierarchy path P. But different approaches could be defined, either within the core protocol, or via protocol extensions.

@alimanfoo

Thinking out loud, but I wonder if it would be better to deal with this within the v3 core protocol above the layer of the store interface. In other words, the v3 core protocol could offer a choice of different possible approaches regarding how storage keys are formed. The default would be to form storage keys as currently specified. That section of the spec defines rules for forming storage keys for metadata documents and data chunks, given a node at some hierarchy path P. But different approaches could be defined, either within the core protocol, or via protocol extensions.

Note that whether you want to use keys like "data/foo/baz/0.0.0" or "data/foo/baz/0/0/0" (a.k.a. "nested" directory store) would be an example of this kind of configuration choice. I.e., these are two different approaches regarding how to form storage keys for data chunks, and which approach is in use could be declared within the entry point metadata document. I.e., it is a choice above the level of the store interface.
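To make that concrete, here is a minimal sketch showing the two chunk-key conventions as a single parameterised rule; the function name and the separator parameter are illustrative, not part of the spec:

```python
def chunk_key(array_path: str, chunk_coords: tuple, separator: str = ".") -> str:
    """Form the storage key for one chunk of the array at `array_path`.

    The flat and nested layouts differ only in this one rule, which could be
    declared in the entry point metadata document.
    """
    return f"data/{array_path}/" + separator.join(str(c) for c in chunk_coords)

assert chunk_key("foo/baz", (0, 0, 0), separator=".") == "data/foo/baz/0.0.0"  # "flat" layout
assert chunk_key("foo/baz", (0, 0, 0), separator="/") == "data/foo/baz/0/0/0"  # "nested" layout
```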


Carreau commented May 13, 2020

Thanks, I think one of the key points I was missing was the transfer from the filesystem to a cloud environment, and that explains a lot.

I'm reluctant to have too much flexibility in the store's internal data structure; from experience with the Jupyter Notebook format, that leads to issues across implementations. At least there should be a "canonical" version, so that things can be normalized. "Be conservative in what you send, be liberal in what you accept", with "what you send" being the on-disk version. (Though I've heard the opposite is good as well, to detect misbehaving implementations, but that's mostly for network protocols.)

Note that whether you want to use keys like "data/foo/baz/0.0.0" or "data/foo/baz/0/0/0" (a.k.a. "nested" directory store)

I'm wondering about the efficiency of those; with /0/0/0 it's hard to detect whether you are accessing a group or a chunk.

Thanks for the helpful insight, I'll try to send some updates to the v3 spec draft that highlight this reasoning.


jstriebel commented Nov 16, 2022

@Carreau Do the current extension points of storage transformers in the v3 spec provide the flexibility needed to adapt to different key formats? If so, I'd close this issue for now.
