
Sharding array chunks across hashed sub-directories #115

Open
shoyer opened this issue May 9, 2021 · 4 comments
Labels
protocol-extension Protocol extension related issue

Comments

shoyer commented May 9, 2021

Consider the case where we want to concurrently store and read many array chunks (e.g., millions). This is inherently quite feasible with many distributed storage systems, but not with Zarr's default keys for chunks of the form {array_name}/{i}.{j}.{k}: object stores generally advise spreading load across many key prefixes and warn against writing large numbers of objects with sequential names under a single prefix.

If we store a 10 TB array as a million 10 MB files in a single directory with sequential names, it violates exactly those guidelines!

It seems like a better strategy would be to store array chunks (and possibly other Zarr keys) across multiple sub-directories, where the sub-directory name is an apparently random but deterministic function of the key, e.g., of the form {array_name}/{hash_value}/{i}.{j}.{k} or {hash_value}/{array_name}/{i}.{j}.{k}, where hash_value is produced by applying any reasonable hash function to the original key {array_name}/{i}.{j}.{k}.
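For concreteness, here is a minimal sketch of what such a scheme might look like (the helper name, the choice of md5, and the bucket count are all illustrative assumptions, not a spec proposal):

```python
import hashlib

def hashed_chunk_key(array_name, chunk_coords, num_buckets=256):
    """Map a chunk to {array_name}/{hash_value}/{i}.{j}.{k} (illustrative only)."""
    coords = ".".join(map(str, chunk_coords))
    original_key = f"{array_name}/{coords}"
    # Any reasonable hash function would do; md5 is used here only because it
    # is fast and ubiquitous, not for cryptographic strength.
    digest = hashlib.md5(original_key.encode()).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return f"{array_name}/{bucket:02x}/{coords}"

# Adjacent chunks land in (pseudo-randomly) different buckets:
print(hashed_chunk_key("temperature", (0, 0, 0)))
print(hashed_chunk_key("temperature", (0, 0, 1)))
```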

The right number of hash buckets would depend on the performance characteristics of the underlying storage system. But the key feature is that the pseudo-random prefixes/directory names make it easier to shard load, avoiding the typical performance bottleneck of reading/writing many nearby keys at the same time.

Ideally the specific hashing function / naming scheme (including the number of buckets) would be stored as part of the Zarr metadata in a standard way, so as to facilitate reading/writing data with different implementations.
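Purely as a hypothetical illustration of what that metadata might need to carry (none of these field names exist in any spec), something like:

```python
# Hypothetical extension fields an implementation would need in order to
# reconstruct chunk keys; names and values are illustrative assumptions.
chunk_key_sharding = {
    "hash_function": "md5",   # which hash to apply to the original key
    "num_buckets": 256,       # number of hash sub-directories
    "layout": "{array_name}/{hash_value}/{i}.{j}.{k}",
}
```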

Any thoughts? Has anyone considered this sort of solution, or encountered these scaling challenges in the wild? I don't have a strong need for this feature quite yet, but I imagine I may soon.

shoyer changed the title from "Sharding array chunks across many directories" to "Sharding array chunks across hashed sub-directories" on May 9, 2021
joshmoore (Member) commented

See the extended conversation starting at https://gitter.im/zarr-developers/community?at=601d471fc83ec358be27944f

Quick summary: @d-v-b points out that, at least for S3, the nested storage strategy that has been put in place suffices to achieve the "5500 GET requests per second per prefix in a bucket" described under https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html

i.e., my hope with dimension_separator (zarr-developers/zarr-python#715) was certainly to achieve just that (on S3), but I would not be surprised to learn that it doesn't hold for all storage backends.
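For reference, a minimal sketch of that keyword against the zarr-python 2.x API (array shape, dtype, and store path here are arbitrary choices for illustration):

```python
import numpy as np
import zarr

store = zarr.DirectoryStore("example.zarr")
z = zarr.create(shape=(100, 100, 100), chunks=(10, 10, 10), dtype="f4",
                store=store, dimension_separator="/")
z[:] = np.random.rand(100, 100, 100)

# With dimension_separator="/", chunk keys become nested paths such as
# 0/0/0, 0/0/1, ... instead of the flat default 0.0.0, 0.0.1, ...
print(sorted(store.keys())[:3])
```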

shoyer (Author) commented May 13, 2021

@joshmoore thanks for the pointers!

I agree, nested storage solves many of these issues. In fact, we have already been using it in some cases to solve exactly this problem.

My main concern is that it only works well if your arrays have many chunks along multiple dimensions. But it's not uncommon to have most or all of your chunks along a single dimension, e.g., a bunch of images stacked only along the "time" dimension. In these cases, you would still end up with either a very large number of sequential sub-directories or sequential filenames, depending on the dimension order.

Hashing seems like a more comprehensive fix.
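As a sketch of that failure mode (reusing the hypothetical `hashed_chunk_key` helper from the opening comment): an image stack chunked only along "time" yields purely sequential nested keys, while a hash prefix scatters them across buckets.

```python
# Chunked only along "time": nested keys march through sequential
# directories, whereas hashed keys are spread pseudo-randomly.
for t in range(4):
    nested = f"images/{t}/0/0"
    hashed = hashed_chunk_key("images", (t, 0, 0))
    print(nested, "->", hashed)
```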

shoyer (Author) commented Jun 2, 2021

As a point of reference, it looks like Neuroglancer & TensorStore derive chunk file names via a "compressed Morton code":
https://github.com/google/neuroglancer/blob/v2.22/src/neuroglancer/datasource/precomputed/volume.md#sharded-chunk-storage
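The "compressed" variant is specified in the linked document; for intuition, here is a sketch of the plain (uncompressed) Morton / Z-order bit interleave it builds on:

```python
def morton_encode_3d(x, y, z, bits=21):
    """Interleave the bits of (x, y, z) into a single Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x occupies bit positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)  # y occupies 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)  # z occupies 2, 5, 8, ...
    return code

print(bin(morton_encode_3d(3, 5, 9)))  # interleaved bits of x, y, z
```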

jakirkham (Member) commented

This has some similarities with the proposal in #82.

jstriebel added the protocol-extension label on Nov 16, 2022