Wait for all replicas to reparent when running ERS from vtctl #9541

Closed

GuptaManan100
Member

Description

When running EmergencyReparentShard (ERS) from the vtctl binary, the command used to finish as soon as a single replica had successfully reparented itself to the new primary. Because the vtctl binary exits when the command finishes, the gRPC clients for all the other replicas stopped executing, which in turn caused the gRPC servers to cancel their contexts. In some cases this left replication incorrectly set up on some replicas. This PR fixes the issue by adding an internal option to EmergencyReparentShard: when run from vtctl, ERS waits for all replicas to return, while ERS run from vtorc and vtctldserver continues to exit early.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

…vtctl binary

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Member Author

This bug fix will not work, since vtctldserver uses the same code path as the vtctl binary when queried with vtctlclient. As a result, the vtctldserver side would also start waiting for all replicas, which isn't the expected behaviour.

Development

Successfully merging this pull request may close these issues.

Bug in ERS with vtctl stopped replication on a tablet