MoveTables: don't create unnecessary streams on the target for non-intersecting sources and targets #8090
Nice!
I think materialization intent is not needed. I feel like there should be a way to infer that the MoveTables
is not changing the sharding key, and therefore we don't need to create those extra streams. We should do a quick brainstorm.
Sugu has suggested looking at a more generic solution for this based on participating shards and not using the new flag.
To make immediate forward progress to close a pending issue, I am proposing we merge this PR as is and implement the more generic approach suggested by Sugu later. The generic approach will also work for applicable Materialize streams, instead of just MoveTables. For this we need to parse the source expressions of Materialize settings to get the source table names and compare the sharding keys of the two tables to check that they match. This will also require more tests and possibly updating a whole lot of existing tests. Hence the plan to go with this approach for now.
@rohit-nayak-ps can you resolve the merge conflicts? Let's merge this after that.
5a7ea5d to bd15146
…tersection between the target and source shards Signed-off-by: Rohit Nayak <rohit@planetscale.com>
76d7690 to 74c83bb
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…r non-intersecting sources and targets vitessio#8090 Signed-off-by: Vilius Okockis <vilius.okockis@vinted.com>
Description
Currently, when we create a MoveTables workflow, one stream is created on the target per source shard. However, if a source shard and a target shard don't intersect, i.e. the keyspace_ids in the source shard will not map to the target shard, that source will never provide data to be inserted into the target.
These extra streams are currently hugely wasteful: they end up pulling data from the source every hour during the copy phase. After an hour, since no data was selected, the LastPK in the copy_state does not get updated, and in the next copy phase the whole streaming starts again. The workflow never shows any progress. The mitigation today is to stop or delete these streams manually.
This PR identifies such streams and does not insert them in the first place.
Note that such cases can also occur in custom Materialization flows, depending on how the target's vindexes map to the source's. However, the logic to determine this is non-trivial, so for now we only do this for MoveTables.
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Checklist