Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MoveTables: don't create unnecessary streams on the target for non-intersecting sources and targets #8090

Merged

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented May 10, 2021

Description

Currently when we create a MoveTables workflow one stream is created on the target per source shard. However if the source shards and target shards don't intersect, i.e. the keyspace_ids in a source shard will not map to a target shard, the source will never provide data to be inserted in the target.

These extra streams are currently hugely wasteful: they end up pulling data from the source every hour during the copy phase: after an hour since no data was selected the LastPK in the copy_state does not get updated and in the next copy phase the whole streaming starts again. The workflow never shows any progress. The mitigation today is to explicitly stop or delete these streams manually.

This PR identifies such streams and does not insert them in the first place.

Note that it is possible such cases can occur even in custom Materialization flows depending on how the target's vindexes map to the source's. However the logic to determine this is non-trivial. So for now we just do this for MoveTables.

Signed-off-by: Rohit Nayak rohit@planetscale.com

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review May 10, 2021 17:51
@rohit-nayak-ps rohit-nayak-ps requested review from a team and sougou May 10, 2021 17:51
@rohit-nayak-ps rohit-nayak-ps changed the title MoveTables: don't create unnecessary streams on the target for non-intersecting sources MoveTables: don't create unnecessary streams on the target for non-intersecting sources and targets May 10, 2021
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think materialization intent is not needed. I feel like there should be a way to infer that the MoveTables is not changing the sharding key, and therefore we don't need to create those extra streams. We should do a quick brainstorm.

@rohit-nayak-ps rohit-nayak-ps marked this pull request as draft June 21, 2021 14:55
@rohit-nayak-ps
Copy link
Contributor Author

I think materialization intent is not needed. I feel like there should be a way to infer that the MoveTables is not changing the sharding key, and therefore we don't need to create those extra streams. We should do a quick brainstorm.

Sugu has suggested looking at a more generic solution for this based on participating shards and not using the new flag.

@rohit-nayak-ps
Copy link
Contributor Author

I think materialization intent is not needed. I feel like there should be a way to infer that the MoveTables is not changing the sharding key, and therefore we don't need to create those extra streams. We should do a quick brainstorm.

Sugu has suggested looking at a more generic solution for this based on participating shards and not using the new flag.

To make immediate forward progress to close a pending issue, I am proposing we merge this PR as is and implement the more generic approach suggested by Sugu later.

The generic approach will also work for applicable Materialize streams, instead of just MoveTables. For this we need to parse the source expressions of Materialize settings to get the source table names and compare the sharding keys of the two tables to check that they match. This will also require more tests and possibly updating a whole lot of existing tests. Hence the plan to go with this approach for now.

@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review July 6, 2021 08:57
@deepthi
Copy link
Member

deepthi commented Jul 26, 2021

@rohit-nayak-ps can you resolve the merge conflicts? Let's merge this after that.

@rohit-nayak-ps rohit-nayak-ps force-pushed the rn-materialize-unreacheable-streams branch from 5a7ea5d to bd15146 Compare August 2, 2021 16:28
…tersection between the target and source shards

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps force-pushed the rn-materialize-unreacheable-streams branch from 76d7690 to 74c83bb Compare August 4, 2021 13:25
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps merged commit ec131ed into vitessio:main Aug 5, 2021
@rohit-nayak-ps rohit-nayak-ps deleted the rn-materialize-unreacheable-streams branch August 5, 2021 09:58
DeathBorn added a commit to vinted/vitess that referenced this pull request Apr 22, 2024
…r non-intersecting sources and targets vitessio#8090

Signed-off-by: Vilius Okockis <vilius.okockis@vinted.com>
DeathBorn added a commit to vinted/vitess that referenced this pull request Apr 23, 2024
…r non-intersecting sources and targets vitessio#8090

Signed-off-by: Vilius Okockis <vilius.okockis@vinted.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants