
New bucket serialization format #18

Closed
zkat opened this issue Oct 20, 2019 · 2 comments

zkat (Owner) commented Oct 20, 2019

The current bucket format is copied directly from what the JavaScript version of cacache does.

I no longer think it's worth trying to preserve compatibility, and the performance of index-related operations is kind of horrendous right now, so I think it's time to explore a new on-disk format for the index buckets.

My current thinking is to use serde more directly, and come up with a better strategy for the generic metadata field, as well.

And of course, if there's no actual perf difference, this issue should just be closed, but this is worth exploring anyway.
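For context, the inherited JS-style index stores each entry as an appended line of `<checksum>\t<json>`, so readers can detect truncated or corrupt lines. A minimal, dependency-free sketch of that shape (the checksum and field names here are illustrative stand-ins, not cacache's actual internals):

```rust
/// Illustrative sketch of the line-appended bucket format inherited from the
/// JS cacache: each write appends `<checksum-of-json>\t<json>`.
fn bucket_line(key: &str, integrity: &str, time_ms: u64) -> String {
    // JSON is hand-rolled here to keep the sketch dependency-free;
    // the real code would go through serde_json.
    let json = format!(
        r#"{{"key":"{}","integrity":"{}","time":{}}}"#,
        key, integrity, time_ms
    );
    format!("{}\t{}", toy_checksum(&json), json)
}

fn toy_checksum(s: &str) -> u64 {
    // Stand-in for the SHA-based digest the real format uses.
    s.bytes()
        .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}
```

Replaying the file and skipping lines whose checksum doesn't match their JSON payload is what makes the format append-only and crash-tolerant, at the cost of parsing every line on read.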

zkat added the enhancement, help wanted, and semver-major labels on Oct 20, 2019
zkat added this to the 4.0.0 milestone on Oct 20, 2019
isaacs commented Oct 28, 2019

I'm curious if you plan to keep the whole "nested shasum parts" thing. I noticed that the JS cacache spends a fair bit of FS ops on that, and it seems like most file systems in use today can handle bazillions of files in a single dir.

On my machine, I'm seeing about a 2-5% performance boost in all the benchmarks that are sensitive to bucket loading efficiency when I make it just use a single layer of files in one big index folder. Not revolutionary, but not nothing. I haven't yet run it through a benchmark that creates millions of buckets, so it's possible that it's A Bad Idea, of course :)
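The two layouts being compared can be sketched as path builders (the `index-v5` root and the 2-char/2-char split mirror what the JS cacache does, but the names here are illustrative, not either crate's actual internals):

```rust
use std::path::PathBuf;

/// Nested layout (JS cacache style): the hex digest of the key is split into
/// two 2-character directory levels, e.g. `index-v5/ab/cd/ef01...`.
/// Assumes `hash_hex` is at least 5 characters long.
fn nested_bucket_path(root: &str, hash_hex: &str) -> PathBuf {
    [root, &hash_hex[0..2], &hash_hex[2..4], &hash_hex[4..]]
        .iter()
        .collect()
}

/// Flat layout: every bucket lives directly in one big index folder,
/// trading directory fan-out for fewer path components to stat and open.
fn flat_bucket_path(root: &str, hash_hex: &str) -> PathBuf {
    [root, hash_hex].iter().collect()
}
```

The flat variant saves the intermediate directory lookups (and `mkdir` calls on write), which is where the small benchmark win would come from; the nested fan-out only pays off on filesystems that degrade with very large directories.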

zkat (Owner, Author) commented Nov 7, 2019

I did a spike towards this and realized it's way more trouble than it's worth. Generally, if you want performance, `cacache::read_hash(_sync)` is the way to go, as it's just as fast as a regular filesystem read. Considering the general slowness is pretty much I/O-bound, with only a bit of JSON parsing delay (and serde_json is very fast), I think I'm just gonna go ahead and say this wouldn't be worth the trouble just to squeeze in another couple of percentage points.

zkat closed this as completed on Nov 7, 2019