
The Amazon S3 limit on the length of keys #77

Closed
DennisHeimbigner opened this issue Jun 7, 2020 · 2 comments · Fixed by #175
Labels
core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec

Comments

@DennisHeimbigner

I noticed that Amazon S3 (and apparently also Google) define a
limit of 1024 bytes for object keys. This limit apparently
applies to the whole key and not, say, to individual segments of
the key, where a segment is the name between '/' occurrences.
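
For illustration, here is a minimal Python sketch (the function name is hypothetical) showing that the limit applies to the byte length of the entire key, not to any one segment:

```python
S3_MAX_KEY_BYTES = 1024  # Amazon S3's documented limit on object key length

def check_key_length(key: str) -> None:
    """Raise if a key would exceed the S3 key-length limit.

    The limit applies to the UTF-8 encoded length of the whole key,
    not to the individual '/'-separated segments.
    """
    n = len(key.encode("utf-8"))
    if n > S3_MAX_KEY_BYTES:
        raise ValueError(
            f"key is {n} bytes; S3 allows at most {S3_MAX_KEY_BYTES}"
        )

# Each segment here is short, but the full key is what counts:
check_key_length("meta/root/2020-06-07/boulder_colorado/surface/temperature_daily_mean")
```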

I know that for atmospheric-sciences netcdf-4 datasets, variable
names are used to encode a variety of properties such as dates
and locations. This often results in long variable
names. Additionally, deeply nested groups are also used to
classify sets of variables. Bottom line: it is probable that
such datasets will run up against the 1024-byte limit in the
near future.

So my question to the community is: how do we deal with the
1024-byte limit? Or do we ignore it?

One might hope that Amazon will up that limit Real-Soon-Now. My
guess is that a limit of 4096 bytes would be adequate to push
the problem off to a more distant future.

If such a length increase does not happen, then we may need to
rethink the Zarr layout so that this limit is circumvented.
Below are some initial thoughts about this. I hope I am not
overthinking this and that there is some simpler approach that I
have not considered.

One possible proposal is to use a structure where
the long key is replaced with a hash of the long key.
This leads to an inode-like system with a flat space of hash
keys, where the objects stored under those hash keys contain the
metadata and chunk data. In order to represent the group
structure, one would need to extend this so that some "inodes"
are directory-like objects that map each key segment to the hash
key of the inode "contained" under that segment.

I am sure there are other ways to do this. It may also be worth
asking about the purpose of the groups. Right now they serve
as a namespace and as a primitive indexing mechanism for the leaf
content-bearing objects. Perhaps they are superfluous.

In any case, the 1024-byte key-length limit is likely
to be a problem for Zarr in the near future.
The community needs to decide if it wants to ignore this
limitation or address it in some general way.

=Dennis Heimbigner
Unidata

@Carreau
Contributor

Carreau commented Jun 9, 2020

Thanks, I'll try to see if I can add some of that into the spec.

I think that the length-limitation workaround might need to be on a per-store basis. At least in spec v3 there are the data/ and meta/ prefixes, so it would be easy to have the equivalent of "mount points"/references.
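
For illustration, such a "mount point" could act as a key-rewriting layer in front of the store. A rough sketch (hypothetical Python; the v3 spec defines the meta/ and data/ prefixes but no such mechanism):

```python
# alias -> long logical prefix; the short alias is what actually hits the store
mounts: dict[str, str] = {
    "m0": "data/root/atmosphere/northern_hemisphere/daily/surface",
}

def to_store_key(logical_key: str) -> str:
    """Rewrite a logical key using the longest matching mounted prefix."""
    for alias, prefix in sorted(mounts.items(), key=lambda kv: len(kv[1]), reverse=True):
        if logical_key == prefix or logical_key.startswith(prefix + "/"):
            return alias + logical_key[len(prefix):]
    return logical_key

print(to_store_key("data/root/atmosphere/northern_hemisphere/daily/surface/t2m/c0/0"))
# -> "m0/t2m/c0/0"
```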

I'm not a huge fan of the hashing/inode-like approach, as this will likely mean a single place where we store the mapping, which would require locking, and would make listing more difficult.

Note that some Windows path APIs are also [limited to 260 characters in length](https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#:~:text=In%20the%20Windows%20API%20(with,and%20a%20terminating%20null%20character.), and this has been a problem in the JS ecosystem with node_modules.

@jstriebel added the core-protocol-v3.0 label Nov 16, 2022
@jstriebel
Member

I added notes about this in #175.
