Implement a flatbuffer container class to hold excess table slices in segments #2449
Just some minor comments on the code. I just started another test round with a database that would be corrupt without this change and will let that run overnight.
This implements a new `SegmentedFile` flatbuffer as well as a new `flatbuffer_container` utility class that uses this flatbuffer to provide access to multiple flatbuffers stored in the same file. This is similar in spirit to the size-prefixed flatbuffers provided by upstream, but allows random access into the contained flatbuffers, which will become more relevant when we use the same approach for storing dense indices in partitions. The segment is updated to optionally allow an implementation based on flatbuffer containers, which should allow us to store segments that hold more than 2GiB worth of table slice data.
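To illustrate the random-access idea described above, here is a minimal sketch of a container that stores several opaque buffers in one file behind an offset table. It is not the `SegmentedFile` schema or the `flatbuffer_container` API from this PR: the names `toy_container_builder` and `toy_container`, the header layout, and the use of native-endian 64-bit integers are all assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical layout (an assumption, not the real SegmentedFile schema):
//   [u64 count][count x {u64 offset, u64 size}][blob 0][blob 1]...
// The 64-bit offsets let the file as a whole grow past 2 GiB even if each
// contained buffer is limited by 32-bit internal offsets.
struct toy_container_builder {
  std::vector<std::string> blobs;

  void add(std::string_view blob) { blobs.emplace_back(blob); }

  // Serializes the offset table followed by all blobs (native endianness).
  std::string finish() const {
    const std::uint64_t count = blobs.size();
    std::uint64_t offset = sizeof(std::uint64_t) * (1 + 2 * count);
    std::string out;
    auto put = [&out](std::uint64_t x) {
      char bytes[sizeof x];
      std::memcpy(bytes, &x, sizeof x);
      out.append(bytes, sizeof x);
    };
    put(count);
    for (const auto& b : blobs) {
      put(offset);
      put(b.size());
      offset += b.size();
    }
    for (const auto& b : blobs)
      out += b;
    return out;
  }
};

// Read-only view over a serialized container. The underlying bytes must
// outlive this object because only a view is stored.
class toy_container {
public:
  explicit toy_container(std::string_view file) : file_{file} {
    if (file_.size() < sizeof(std::uint64_t))
      throw std::invalid_argument("file too small for header");
    count_ = read_u64(0);
    // Each table entry is 16 bytes; reject tables that cannot fit.
    if (count_ > (file_.size() - sizeof(std::uint64_t)) / 16)
      throw std::invalid_argument("offset table exceeds file size");
  }

  std::size_t size() const { return static_cast<std::size_t>(count_); }

  // Random access: one table lookup, no scan over preceding blobs.
  std::string_view get(std::size_t i) const {
    if (i >= count_)
      throw std::out_of_range("blob index out of range");
    const std::size_t entry = sizeof(std::uint64_t) * (1 + 2 * i);
    const std::uint64_t offset = read_u64(entry);
    const std::uint64_t size = read_u64(entry + sizeof(std::uint64_t));
    if (offset > file_.size() || size > file_.size() - offset)
      throw std::out_of_range("blob lies outside the file");
    return file_.substr(static_cast<std::size_t>(offset),
                        static_cast<std::size_t>(size));
  }

private:
  std::uint64_t read_u64(std::size_t pos) const {
    std::uint64_t x;
    std::memcpy(&x, file_.data() + pos, sizeof x);
    return x;
  }

  std::string_view file_;
  std::uint64_t count_ = 0;
};
```

With a plain sequence of size-prefixed buffers, reaching the n-th buffer means walking over the n-1 size prefixes before it. The offset table in this sketch trades a small header for a direct lookup, which is the property the description refers to as random access.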
Approving after an extensive pair-review session. We want to roll this out as VAST v2.3.0-rc2 for further testing.
Use the new flatbuffer_container to store data for large indices outside the main partition flatbuffer, in order to avoid running into the 2GiB limit in the case of string indices for very large columns.
* Remove special handling for files >= 2GiB, which is now actively harmful
* Fix off-by-one error in indexer indices
* Add more documentation to several places
* Don't throw exceptions during index startup
* Fix unit tests

Co-Authored-By: Dominik Lohmann <mail@dominiklohmann.de>
* Update a few places in the code that assumed they could cast a chunk to a `fbs::Partition` or `fbs::Segment`.
* Externally stored table slices were accessed as a `FlatTableSlice`, but that is not a root type and thus cannot work. Fixing this causes some follow-up work because we cannot represent the internal list of table slices as a `vector<FlatTableSlice*>` anymore.
* Fix `lsvast` and introduce a new inner identifier to make it possible to deduce the type of content stored in a `SegmentedFileHeader`.
* Use fmt for printing partitions in `lsvast`.
Running this overnight seemed to be stable, and I could query the ingested data, so merging this now.
🎯 Review Instructions
There are three natural parts building on each other: the segmented file, the flatbuffer container, and the segment. Review bottom-up or top-down as you wish.