Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a flatbuffer container class to hold excess table slices in segments #2449

Merged
merged 12 commits into from
Aug 23, 2022

Conversation

lava
Copy link
Member

@lava lava commented Jul 20, 2022

This implements a new SegmentedFile flatbuffer as well as
a new flatbuffer_container utility class that uses this
flatbuffer to provide access to multiple flatbuffers stored
in the same file.

This is similar in spirit to the size-prefixed flatbuffers
provided by upstream, but allows random access into the
contained flatbuffers which will become more relevant when
we use the same approach for storing dense indices in partitions.

The segment is updated to optionally allow an implementation based
on flatbuffer containers, which should allow us to store segments
that hold more than 2GiB worth of table slice data.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on vast.io, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

There are three natural parts building on each other, the segmented file, the flatbuffer container and the segment. Review bottom-up or top-down as you wish.

@lava lava force-pushed the story/sc-35662/flatbuffer-container branch from f8593b8 to 21d1291 Compare August 17, 2022 16:46
@lava lava changed the title PoC: Implement a flatbuffer container class to hold excess segments Implement a flatbuffer container class to hold excess table slices in segments Aug 17, 2022
@lava lava marked this pull request as ready for review August 17, 2022 16:50
@lava lava force-pushed the story/sc-35662/flatbuffer-container branch 8 times, most recently from 0839b8f to 1b70732 Compare August 20, 2022 10:58
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments on the code. I just started another test round with a database that would be corrupt without this change and will let that run over night.

@lava lava force-pushed the story/sc-35662/flatbuffer-container branch 4 times, most recently from d971b49 to 31becd8 Compare August 22, 2022 16:08
This implements a new `SegmentedFile` flatbuffer as well as
a new `flatbuffer_container` utility class that uses this
flatbuffer to provide access to multiple flatbuffers stored
in the same file.

This is similar in spirit to the size-prefixed flatbuffers
provided by upstream, but allows random access into the
contained flatbuffers which will become more relevant when
we use the same approach for storing dense indices in partitions.

The segment is updated to optionally allow an implementation based
on flatbuffer containers, which should allow us to store segments
that hold more than 2GiB worth of table slice data.
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving after an extensive pair-review session. We want to roll this out as VAST v2.3.0-rc2 for further testing.

lava and others added 11 commits August 22, 2022 18:43
Use the new flatbuffer_container to store data for large indices
outside the main partition flatbuffer, in order to avoid running
into the 2GiB limit in the case of string indices for very large
columns.
 * Remove special handling for files >= 2GiB, which is now actively
   harmful
 * Fix off-by-one error in indexer indices
 * Add more documentation to several places
 * Don't throw exceptions during index startup
 * Fix unit tests

Co-Authored-By: Dominik Lohmann <mail@dominiklohmann.de>
 * Update a few places in the code that were assuming that
   assumed they can cast a chunk to a `fbs::Partition` or
   `fbs::Segment`.

 * Externally stored table slices were accessed as
   a `FlatTableSlice`, but that is not a root type and thus
   cannot work. Fixing this causes some follow-up work
   because we cannot represent the internal list of
   table slices as a `vector<FlatTableSlice*>` anymore.

 * Fix 'lsvast' and introduce a new inner identifier to
   make it possible to deduce the type of content stored
   in a SegmentedFileHeader

 * Use fmt for printing partitions in lsvast
@lava lava force-pushed the story/sc-35662/flatbuffer-container branch from 31becd8 to 0c67179 Compare August 22, 2022 16:43
@lava
Copy link
Member Author

lava commented Aug 23, 2022

Running this overnight seemed to be stable, and I could query the ingested data, so merging this now.

@lava lava merged commit e63611e into master Aug 23, 2022
@lava lava deleted the story/sc-35662/flatbuffer-container branch August 23, 2022 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants