Overview of the Issue
If a tablet is unreachable, VTOrc keeps running into the `UnreachablePrimary` analysis. `FullStatus` calls keep timing out because the tablet is unreachable.
For `UnreachablePrimary` failure types, we run `runEmergentOperations`. We do this without acquiring a topo lock because all this function tries to do is reload the tablet information in a fast path. This function creates a goroutine to reload that tablet information: `go emergentlyReadTopologyInstance(analysisEntry.AnalyzedInstanceAlias, analysisEntry.Analysis)`
This is problematic because it means we start a new goroutine to reload the tablet information every second! Each of these goroutines tries to run the `FullStatus` RPC. This leads to an explosion in the number of goroutines, which can cause VTOrc to OOM.
The goroutines do reach a steady state eventually: even though we create a new goroutine every second, all the goroutines spawned 15 seconds earlier will have finished, so the count levels off. In my testing it was something like this -
This is still not the desired behaviour, since we end up with so many goroutines all trying to call `FullStatus`. This increases the network traffic as well.
Reproduction Steps
In the testing framework I was able to reproduce this by making `FullStatus` slow using a `time.Sleep` and then making the primary tablet unreachable.
Binary Version
Operating System and Environment details
Log Fragments
No response