Add a -flush_pool flag to the PlannedReparentShard command #2279

rnavarro · 2016-11-21T18:36:42Z

We find ourselves running PlannedReparentShard a lot. As it stands, it's a pretty painful and disruptive process for us.

Our Percona version crashes when performing this operation due to a semi-sync bug (not what I'm writing about), detailed here: https://bugs.launchpad.net/percona-server/+bug/1641193
The vttablet health checks fail during this process because the vttablet process blocks while waiting for the transaction pool to flush. (When we had our transaction timeout set to 10m, this actually caused Kubernetes to consider the health check at /debug/vars "failed", and it restarted the tablet container mid-reparent.)
Our application experiences a fair amount of disruptive "SHUTTING DOWN" errors from the vtgates/vttablets while performing a PlannedReparentShard. We have our transaction timeout limits set to 5 minutes, so in the worst-case scenario we have a tablet "down", throwing back "SHUTTING DOWN" errors, for up to 5 minutes.

The really important one is the last one. Given that no new queries can pass through the vttablet process anyway, I propose that we add an optional -flush_pool flag to the PlannedReparentShard command to forcefully terminate and roll back transactions, immediately clearing the transaction pool.

This should allow us to reparent more quickly, with less disruption to the services using Vitess.

alainjobart · 2016-11-21T22:26:38Z

After talking to Sugu, he agreed to take this on.

We are also proposing a slightly different flag: a duration specifying how long to wait before we kill all existing transactions. With a value of 0s, it would behave the same as the flag you propose. An actual duration is more flexible, though: you could use 5s or 10s to spare ongoing short-lived transactions but still kill the long-lived ones.
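
To make the duration semantics concrete, here is a minimal Go sketch of the idea, not Vitess code. The names txPool, Begin, Finish, Shutdown and runDemo are hypothetical and invented for illustration only. The pool waits up to the grace period for open transactions to finish on their own, then force-closes whatever remains; a grace period of 0 skips the wait entirely, which matches the proposed -flush_pool behavior.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// txPool is a hypothetical stand-in for a transaction pool.
// It only tracks the IDs of open transactions.
type txPool struct {
	mu   sync.Mutex
	open map[int]struct{}
}

func newTxPool() *txPool { return &txPool{open: make(map[int]struct{})} }

// Begin registers an open transaction.
func (p *txPool) Begin(id int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.open[id] = struct{}{}
}

// Finish marks a transaction as committed or rolled back by its owner.
func (p *txPool) Finish(id int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.open, id)
}

func (p *txPool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.open)
}

// Shutdown waits up to grace for open transactions to finish, then
// force-closes the rest and returns their IDs in ascending order.
// With grace == 0 the wait loop never runs, so everything still open
// is closed immediately.
func (p *txPool) Shutdown(grace time.Duration) []int {
	deadline := time.Now().Add(grace)
	for time.Now().Before(deadline) {
		if p.size() == 0 {
			return nil // everything finished within the grace period
		}
		time.Sleep(time.Millisecond) // simple polling keeps the sketch short
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	var killed []int
	for id := range p.open {
		killed = append(killed, id) // a real pool would roll back here
		delete(p.open, id)
	}
	sort.Ints(killed)
	return killed
}

// runDemo runs both cases: a non-zero grace period and a zero one.
func runDemo() (graceful, immediate []int) {
	// Case 1: transaction 1 is short-lived, transaction 2 never finishes.
	p := newTxPool()
	p.Begin(1)
	p.Begin(2)
	go func() {
		time.Sleep(5 * time.Millisecond)
		p.Finish(1)
	}()
	graceful = p.Shutdown(200 * time.Millisecond)

	// Case 2: zero grace period, both transactions are closed at once.
	q := newTxPool()
	q.Begin(1)
	q.Begin(2)
	immediate = q.Shutdown(0)
	return graceful, immediate
}

func main() {
	graceful, immediate := runDemo()
	fmt.Println("killed after grace period:", graceful)
	fmt.Println("killed with zero grace:", immediate)
}
```

In the first case the short-lived transaction completes during the wait and only the long-lived one is closed; in the second case both are closed without waiting. The polling loop is a simplification chosen for brevity, and a real implementation would more likely use a notification mechanism.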

sougou · 2016-11-28T20:49:37Z

In #2301, I've added a new flag, transaction_shutdown_grace_period, that will make vttablet shut down sooner if there are lingering transactions.

enisoc mentioned this issue Nov 21, 2016

Limit random MySQL server-id to 2^31-1 #2280

Closed

sougou assigned alainjobart Nov 21, 2016

alainjobart assigned sougou and unassigned alainjobart Nov 21, 2016

sougou closed this as completed Nov 28, 2016

frouioui pushed a commit to planetscale/vitess that referenced this issue Nov 21, 2023

cherry pick of 13183 (vitessio#2279)

6c0ab57
