
Compress serialized indexers #2346


Merged: 9 commits, merged Jun 16, 2022


@dominiklohmann (Member) commented Jun 15, 2022

This works around a limitation of FlatBuffers: the maximum size of a FlatBuffers table is 2 GiB. We could avoid hitting this limit by moving indexers to separate files, but simply compressing the individual indexers already gets us quite far.
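A minimal sketch of the idea, assuming each indexer has already been serialized to a byte string: compress every indexer independently before embedding it in the partition, so the combined payload stays under the FlatBuffers size limit. The PR does not say which codec VAST uses; Python's standard-library `zlib` stands in purely for illustration, and `pack_indexers`, `fits_in_flatbuffer`, and `FLATBUFFERS_MAX_SIZE` are hypothetical names.

```python
import zlib
from typing import Dict

# Just under 2 GiB: the FlatBuffers size limit the PR refers to.
# Hypothetical constant for this sketch.
FLATBUFFERS_MAX_SIZE = 2**31 - 1


def pack_indexers(indexers: Dict[str, bytes]) -> Dict[str, bytes]:
    """Compress each serialized indexer independently (hypothetical helper).

    zlib is an illustrative stand-in; the real codec is not stated in the PR.
    """
    return {name: zlib.compress(data) for name, data in indexers.items()}


def fits_in_flatbuffer(indexers: Dict[str, bytes]) -> bool:
    """Rough check that the summed payload sizes stay below the limit.

    Ignores FlatBuffers table and vtable overhead, so it is only an estimate.
    """
    return sum(len(data) for data in indexers.values()) < FLATBUFFERS_MAX_SIZE


if __name__ == "__main__":
    # Hypothetical payloads; repetitive string data tends to compress well.
    indexers = {
        "src_ip": bytes(range(256)) * 4096,
        "hostname": b"example.internal\x00" * 100_000,
    }
    packed = pack_indexers(indexers)
    for name in indexers:
        print(name, len(indexers[name]), "->", len(packed[name]))
    print("fits:", fits_in_flatbuffer(packed))
```

Compressing per indexer rather than the whole partition keeps each indexer individually decodable, which matches the PR's framing of "compressing the individual indexers."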

In my initial measurements, the partition FlatBuffers tables shrank by a factor of 2 to 10, with partitions containing mostly string indexes compressing best.

In limited testing, I was able to raise the max-partition-size from 4 Mi events to 64 Mi events without hitting any size limits. With this change we should be fine for a while, and could even consider raising it again.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

Test with old and newly created databases. Rebuild an older one. Review the code file-by-file.
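Testing with both old and newly created databases exercises the read path for indexers written before and after this change. A minimal sketch of such a backward-compatible reader, assuming a hypothetical record where legacy indexers carry no decompressed size and new ones do; `StoredIndexer`, `write_indexer`, and `read_indexer` are illustrative names, not VAST's actual schema or API, and `zlib` again stands in for whatever codec is used.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredIndexer:
    """Hypothetical on-disk record; not VAST's actual FlatBuffers schema."""

    data: bytes
    # None for legacy (pre-compression) records; set for compressed ones.
    decompressed_size: Optional[int] = None


def write_indexer(raw: bytes) -> StoredIndexer:
    """New databases always store compressed indexers."""
    return StoredIndexer(data=zlib.compress(raw), decompressed_size=len(raw))


def read_indexer(record: StoredIndexer) -> bytes:
    """Read both legacy (uncompressed) and new (compressed) records."""
    if record.decompressed_size is None:
        # Written before compression existed: the bytes are used as-is.
        return record.data
    raw = zlib.decompress(record.data)
    if len(raw) != record.decompressed_size:
        raise ValueError("corrupt indexer: decompressed size mismatch")
    return raw


if __name__ == "__main__":
    legacy = StoredIndexer(data=b"old-indexer-bytes")
    fresh = write_indexer(b"new-indexer-bytes" * 1000)
    print(len(read_indexer(legacy)), len(read_indexer(fresh)))
```

Storing the decompressed size alongside the payload lets the reader both detect the record version and sanity-check the decoded result.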

@dominiklohmann dominiklohmann added performance Improvements or regressions of performance bug Incorrect behavior labels Jun 15, 2022
@dominiklohmann dominiklohmann requested a review from tobim June 15, 2022 10:58
@dominiklohmann dominiklohmann force-pushed the story/sc-34616/compress-indexers branch from 37ff36b to 00345e6 June 15, 2022 15:04

@tobim (Member) left a comment


This looks great!

Co-authored-by: tobim <tobim@fastmail.fm>
@dominiklohmann dominiklohmann merged commit fcf5322 into master Jun 16, 2022
@dominiklohmann dominiklohmann deleted the story/sc-34616/compress-indexers branch June 16, 2022 07:16
Labels: bug (Incorrect behavior), performance (Improvements or regressions of performance)
2 participants