The vttablet health checks fail during this process because the vttablet process blocks while waiting for the transaction pool to flush. (When our transaction timeout was set to 10m, this caused Kubernetes to treat the health check at /debug/vars as failed and restart the tablet container mid-reparent.)
Our application sees a fair number of disruptive "SHUTTING DOWN" errors from the vtgates/vttablets while a PlannedReparentShard is in progress. Our transaction timeout is set to 5 minutes, so in the worst case a tablet is "down" and returning "SHUTTING DOWN" errors for up to 5 minutes.
The really important one is that last one. Given that no new queries can pass through the vttablet process anyway, I propose that we add an optional -flush_pool flag to the PlannedReparentShard process that forcefully terminates and rolls back transactions to clear the transaction pool immediately.
This should allow us to reparent more quickly, with less disruption to the services using Vitess.
We are also proposing a slightly different flag: a duration specifying how long to wait before killing all existing transactions. With a value of 0s, it would behave the same as the flag you propose. An actual duration is more flexible, though: with 5s or 10s you would spare ongoing short-lived transactions and kill only the long-lived ones.
We find ourselves doing PlannedReparentShard a lot, and as it stands it's a pretty painful and disruptive process for us.