Fix missing dense indexes and blocking import hangs #2848
What bug does this fix and why is this the correct solution? Edit: Thanks for updating the description.
Force-pushed from 196cc76 to a38e8dc.
Adds additional checks to make sure skipped indexes don't interfere with subsequent dense index creation. The last check is failing as of this commit, because the stream stage in the active partition does not iterate over slice columns correctly.
Dropping the `table_slice_column` wrapping for streaming data to active indexers simplifies the streaming logic and removes the need to filter outputs on the qualified record field. The indexers are now spawned with the index of the slice column instead and retrieve the correct data themselves.
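To make the new routing concrete, here is a minimal, self-contained sketch using hypothetical stand-in types (not VAST's actual `table_slice` or indexer interfaces): the indexer is handed its column index at spawn time and extracts its own values from each full slice it receives, so the stream stage no longer needs to filter per indexer.

```cpp
// Minimal sketch with hypothetical stand-in types (not VAST's actual
// table_slice or indexer interfaces): the indexer is handed the column
// index at spawn time and extracts its own values from each full slice.
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a table slice: rows of string cells.
struct slice {
  std::vector<std::vector<std::string>> rows;
};

class indexer {
public:
  explicit indexer(std::size_t column) : column_{column} {
  }

  // The stream stage forwards whole slices; the indexer picks out only the
  // column it is responsible for, so no per-indexer filtering is needed.
  void handle(const slice& s) {
    for (const auto& row : s.rows)
      values_.push_back(row.at(column_));
  }

private:
  std::size_t column_;              // index of the slice column to index
  std::vector<std::string> values_; // stand-in for the dense index contents
};
```

Because every indexer sees the whole slice and selects its own column, filtering outputs on the qualified record field is no longer necessary in the stream stage.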
Force-pushed from a38e8dc to 247a942.
This fixes a bug that was only surfaced by #2807, so it's not really worth a separate changelog entry.
This is fantastic. As we discussed earlier, I pushed a fix for the broken unit test and the merge conflicts with master directly to the branch. There's only minor feedback on the PR itself.
I ran the integration tests that were failing frequently 100 times each, both on master and with this PR, in a release build:
sh build/vast/integration-Release.sh -t "(Arrow Full Data Model|Heterogeneous JSONL Import)"
On master, the tests failed 4 out of 100 runs.
With this change, they did not fail a single time.
Table columns for which no dense indexes are created caused the last columns in the same table to not be indexed either. For example, for a schema with the fields `<foo, bar #skip, baz, buz>`, only `foo` and `baz` would have a dense index. If that resulted in no dense index creation at all, the `import --blocking` mechanism also failed, causing the import process to hang forever.

The solution is twofold:
1. For the index creation, we now avoid the stage filter and `table_slice_column` altogether and let the indexer retrieve the correct column from the slice. This should make the code easier to understand. Along with that, we no longer bypass the column counter increment for skipped fields, so indexers are spawned with the correct column index.
2. For the blocking import, we now wait for batches to be handled by the store instead of by any of the connected indexers.
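As an illustration of the first part of the fix, here is a minimal sketch with hypothetical `field` and `spawned_indexer` types (not VAST's actual code) of the corrected counter logic: the column counter advances for every field, including skipped ones, so each indexer is spawned with the column index that matches its position in the table slice.

```cpp
// Minimal sketch with hypothetical field/spawned_indexer types (not VAST's
// actual code) of the corrected counter logic: the column counter advances
// for every field, including skipped ones, so each indexer is spawned with
// the column index that matches its position in the table slice.
#include <cstddef>
#include <string>
#include <vector>

struct field {
  std::string name;
  bool skip = false; // true for fields tagged #skip
};

struct spawned_indexer {
  std::string field_name;
  std::size_t column; // slice column this indexer reads from
};

std::vector<spawned_indexer> spawn_indexers(const std::vector<field>& fields) {
  std::vector<spawned_indexer> result;
  std::size_t column = 0;
  for (const auto& f : fields) {
    if (!f.skip)
      result.push_back({f.name, column});
    // Previously the increment was bypassed for skipped fields, shifting
    // every subsequent indexer to the wrong column and dropping the last
    // ones entirely.
    ++column;
  }
  return result;
}
```

With the schema from the example above, this yields `foo` → 0, `baz` → 2, and `buz` → 3, so `buz` keeps its dense index even though `bar` is skipped.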