Wait for all replicas to reparent when running ERS from vtctl #9541

GuptaManan100 · 2022-01-20T07:44:26Z

Description

When running EmergencyReparentShard from the vtctl binary, we used to finish ERS as soon as even one replica had successfully reparented itself to the new primary. When the command finishes, the vtctl binary also exits. This caused the gRPC clients for all the other replicas to stop executing, which in turn caused the gRPC servers to cancel their contexts. In some cases this left replication incorrectly set up on some replicas. This PR fixes that issue by adding an internal option to EmergencyReparentShard. When running from vtctl, we wait for all the replicas to return, while we continue to exit early when running ERS from vtorc and vtctldserver.

Related Issue(s)

Fixes Bug in ERS with vtctl stopped replication on a tablet #9529

Checklist

Should this PR be backported?
Tests were added or are not required
Documentation was added or is not required

Deployment Notes


GuptaManan100 · 2022-01-22T12:51:19Z

This bug fix will not work, since vtctldserver also uses the same codepath as the vtctl binary when queried with vtctlclient. As a result, we would also start waiting for all replicas on the vtctldserver side, which isn't the expected behaviour.

GuptaManan100 added 2 commits January 20, 2022 12:38

test: add a failing test for stopped replication on running ers from …

136ed1d

…vtctl binary Signed-off-by: Manan Gupta <manan@planetscale.com>

feat: wait for all replicas when running ers from vtctl

d8787c6

Signed-off-by: Manan Gupta <manan@planetscale.com>

GuptaManan100 added Type: Bug Component: Cluster management release notes labels Jan 20, 2022

GuptaManan100 requested review from deepthi and rohit-nayak-ps as code owners January 20, 2022 07:44

GuptaManan100 closed this Jan 22, 2022

GuptaManan100 mentioned this pull request Jan 22, 2022

Bug in ERS with vtctl stopped replication on a tablet #9529

Closed
