Add a -flush_pool flag to the PlannedReparentShard command #2279

Closed
rnavarro opened this issue Nov 21, 2016 · 2 comments

@rnavarro
Contributor

We find ourselves running PlannedReparentShard a lot, and as it stands it's a pretty painful and disruptive process for us.

  1. Our Percona version crashes when performing this operation due to a semi-sync bug (not what I'm writing about), detailed here: https://bugs.launchpad.net/percona-server/+bug/1641193
  2. The vttablet health checks fail during this process because the vttablet process blocks while waiting for the transaction pool to flush. (When we had our transaction timeout set to 10m, this actually caused Kubernetes to consider the health check at /debug/vars "failed", and it restarted the tablet container mid-reparent.)
  3. Our application sees a fair number of disruptive "SHUTTING DOWN" errors from the vtgates/vttablets while performing a PlannedReparentShard. Our transaction timeout is set to 5 minutes, so in the worst case a tablet is "down" and returning "SHUTTING DOWN" errors for up to 5 minutes.

The really important one is the last one. Given that no new queries can pass through the vttablet process anyway, I propose that we add an optional -flush_pool flag to the PlannedReparentShard command to forcefully terminate and roll back transactions, immediately clearing the transaction pool.

This should allow us to reparent more quickly, with less disruption to the services using Vitess.

@alainjobart
Contributor

After talking to Sugu, he agreed to take this on.

We are also proposing a slightly different flag: a duration specifying how long to wait before killing all existing transactions. With a value of 0s, it would behave the same as the flag you propose. An actual duration is more flexible, though: you could use 5s or 10s to spare short-lived transactions that are still in flight while killing the long-lived ones.

@sougou
Contributor

sougou commented Nov 28, 2016

In #2301, I've added a new flag, transaction_shutdown_grace_period, that makes vttablet shut down sooner if there are lingering transactions.

@sougou sougou closed this as completed Nov 28, 2016
frouioui pushed a commit to planetscale/vitess that referenced this issue Nov 21, 2023