Fix the race between PromoteReplica and replication manager tick #9859

Merged


GuptaManan100
Member

Description

Before this PR, the order of operations for PromoteReplica was:

  1. Fix MySQL by resetting replication and fixing the semi-sync settings
  2. Change the tablet type in the topo server (tablet record)
  3. Update the local state of the tablet manager to Primary
  4. Stop the replication manager
  5. Start a shard sync loop, which updates the topo server (shard record)
  6. Start replication

Consequently, any replication manager tick that ran after step 1 and before step 4 caused the issue seen in #9819.

This PR fixes the race between PromoteReplica stopping the replication manager and a replication manager tick running after step 1: the replication manager is now stopped before any change is made in MySQL.
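The race window can be reproduced deterministically in a toy model. The sketch below is a hypothetical simplification, not Vitess code: `toyTablet`, `tick`, `promoteReplicaOld`, and `newReplica` are invented names, and it assumes that a running replication manager tick "repairs" replication whenever it finds it stopped. It models only steps 1-4 of the old order and injects a single tick right after a chosen step, standing in for one possible interleaving.

```go
package main

import "fmt"

// toyTablet is a hypothetical, heavily simplified model of a tablet being
// promoted. It is not a Vitess type; only the ordering of steps matters here.
type toyTablet struct {
	replicating    bool   // true while MySQL replication is configured and running
	localType      string // tablet manager's in-memory tablet type
	replMgrRunning bool   // true until the replication manager is stopped
}

// tick models one replication manager tick. Assumption: while the manager is
// running, it re-enables replication whenever replication is not running.
func (t *toyTablet) tick() {
	if t.replMgrRunning && !t.replicating {
		t.replicating = true
	}
}

// promoteReplicaOld applies steps 1-4 of the pre-fix order and injects one
// tick right after step number tickAfter (1-4) to simulate the race.
func (t *toyTablet) promoteReplicaOld(tickAfter int) {
	steps := []func(){
		func() { t.replicating = false },    // 1: reset replication in MySQL
		func() {},                           // 2: write PRIMARY to the tablet record (not modeled)
		func() { t.localType = "PRIMARY" },  // 3: update local tablet manager state
		func() { t.replMgrRunning = false }, // 4: stop the replication manager
	}
	for i, step := range steps {
		step()
		if i+1 == tickAfter {
			t.tick()
		}
	}
}

// newReplica returns a healthy replica with a running replication manager.
func newReplica() *toyTablet {
	return &toyTablet{replicating: true, localType: "REPLICA", replMgrRunning: true}
}

func main() {
	for tickAfter := 1; tickAfter <= 4; tickAfter++ {
		tab := newReplica()
		tab.promoteReplicaOld(tickAfter)
		fmt.Printf("tick after step %d: promoted tablet still replicating = %v\n",
			tickAfter, tab.replicating)
	}
}
```

Under these assumptions, a tick injected after step 1, 2, or 3 leaves the promoted tablet replicating again, while a tick after step 4 is a no-op, which matches the window described above.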

After this PR, the order of operations for PromoteReplica is:

  1. Stop the replication manager
  2. Fix MySQL by resetting replication and fixing the semi-sync settings
  3. Change the tablet type in the topo server (tablet record)
  4. Update the local state of the tablet manager to Primary
  5. Start a shard sync loop, which updates the topo server (shard record)
  6. Start replication
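One way to guard the new ordering is to check the invariant named in this PR's test commit: the replication manager must be stopped before any change is made in MySQL. The Go sketch below checks that invariant with a fake MySQL that records a violation if it is modified while the replication manager is still running. `replManager`, `fakeMysql`, `tablet`, `newTablet`, and both `promoteReplica*` functions are hypothetical stand-ins, not the actual Vitess types or test, and only steps 1-4 of each order are modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// replManager is a hypothetical stand-in for the replication manager.
type replManager struct{ running bool }

func (rm *replManager) stop() { rm.running = false }

// fakeMysql records the first violation of the invariant "the replication
// manager is stopped before MySQL is changed".
type fakeMysql struct {
	rm        *replManager
	violation error
}

func (m *fakeMysql) resetReplication() {
	if m.rm.running && m.violation == nil {
		m.violation = errors.New("MySQL changed while replication manager was running")
	}
}

type tablet struct {
	rm        *replManager
	mysql     *fakeMysql
	localType string
}

// promoteReplicaNew follows steps 1-4 of the post-fix order.
func (t *tablet) promoteReplicaNew() {
	t.rm.stop()                // 1: stop the replication manager first
	t.mysql.resetReplication() // 2: reset replication and semi-sync in MySQL
	// 3: write PRIMARY to the tablet record in topo (not modeled)
	t.localType = "PRIMARY" // 4: update local tablet manager state
}

// promoteReplicaOld follows steps 1-4 of the pre-fix order, for comparison.
func (t *tablet) promoteReplicaOld() {
	t.mysql.resetReplication() // 1: MySQL is changed first
	// 2: write PRIMARY to the tablet record in topo (not modeled)
	t.localType = "PRIMARY" // 3: update local tablet manager state
	t.rm.stop()             // 4: replication manager stopped last
}

// newTablet returns a replica whose fake MySQL observes its replication manager.
func newTablet() *tablet {
	rm := &replManager{running: true}
	return &tablet{rm: rm, mysql: &fakeMysql{rm: rm}, localType: "REPLICA"}
}

func main() {
	oldTab, newTab := newTablet(), newTablet()
	oldTab.promoteReplicaOld()
	newTab.promoteReplicaNew()
	fmt.Println("old order:", oldTab.mysql.violation)
	fmt.Println("new order:", newTab.mysql.violation)
}
```

Recording the violation inside the fake dependency, rather than asserting on call order afterwards, flags the problem at the exact moment MySQL is touched, which is the property the reordering is meant to guarantee.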

Related Issue(s)

Fixes #9819

Checklist

  • Should this PR be backported? No
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

GuptaManan100 force-pushed the replication-manager-promote-replica branch from 4209136 to 9c46153 on March 17, 2022 07:38
GuptaManan100 merged commit 30f03c8 into vitessio:main on Mar 20, 2022
GuptaManan100 deleted the replication-manager-promote-replica branch on March 20, 2022 03:43
GuptaManan100 added a commit to planetscale/vitess that referenced this pull request Apr 12, 2022
…essio#9859)

* test: improve test to check that replication manager is stopped before we make any changes in MySQL

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: stop the replication manager first in PromoteReplica call

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to replication manager setTabletType

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comment for SetReplicationSource in wrangler

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to SetReplicationSource in reparentutil

Signed-off-by: Manan Gupta <manan@planetscale.com>
GuptaManan100 added a commit that referenced this pull request Apr 12, 2022
…) (#10075)

* test: improve test to check that replication manager is stopped before we make any changes in MySQL

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: stop the replication manager first in PromoteReplica call

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to replication manager setTabletType

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comment for SetReplicationSource in wrangler

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to SetReplicationSource in reparentutil

Signed-off-by: Manan Gupta <manan@planetscale.com>
ameetkotian pushed a commit to tinyspeck/vitess that referenced this pull request Apr 12, 2022
…essio#9859)

* test: improve test to check that replication manager is stopped before we make any changes in MySQL

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: stop the replication manager first in PromoteReplica call

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to replication manager setTabletType

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comment for SetReplicationSource in wrangler

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to SetReplicationSource in reparentutil

Signed-off-by: Manan Gupta <manan@planetscale.com>
GuptaManan100 added a commit that referenced this pull request Apr 13, 2022
…) - release 11 (#10077)

* Fix the race between PromoteReplica and replication manager tick (#9859)

* feat: change PRIMARY to MASTER to fix syntax error

Signed-off-by: Manan Gupta <manan@planetscale.com>
GuptaManan100 added a commit that referenced this pull request Apr 14, 2022
…) (#10076)

* test: improve test to check that replication manager is stopped before we make any changes in MySQL

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: stop the replication manager first in PromoteReplica call

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to replication manager setTabletType

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comment for SetReplicationSource in wrangler

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to SetReplicationSource in reparentutil

Signed-off-by: Manan Gupta <manan@planetscale.com>
notfelineit pushed a commit to planetscale/vitess that referenced this pull request May 3, 2022
…essio#9859) (vitessio#552)

* test: improve test to check that replication manager is stopped before we make any changes in MySQL

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: stop the replication manager first in PromoteReplica call

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to replication manager setTabletType

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comment for SetReplicationSource in wrangler

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments to SetReplicationSource in reparentutil

Signed-off-by: Manan Gupta <manan@planetscale.com>
Development

Successfully merging this pull request may close these issues.

Bug Report: race condition with PlannedReparentShard can leave a shard with no PRIMARY