Report lag stats in poller #7490
Signed-off-by: crowu <y.wu4515@gmail.com>
```go
	"vitess.io/vitess/go/vt/mysqlctl"
	vtrpcpb "vitess.io/vitess/go/vt/proto/vtrpc"
	"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGaugesWithMultiLabels(
```
None of the vttablet metrics have keyspace/shard dimensions, because by definition a tablet belongs to only one keyspace/shard. This can be a simple gauge (`NewGauge`).
Also, the name should include the units: `replicationLagMs` or `replicationLagNs`.
`HeartbeatLag` is being reported in nanoseconds, so we should probably do the same here.
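The reviewer's point can be illustrated with Go's standard `expvar` package, which Vitess's `stats` variables are published through (an assumption about wiring not shown in this PR; the real code calls `stats.NewGauge`). Because a vttablet serves exactly one keyspace/shard, a single integer variable whose name carries the unit is enough; keyspace/shard labels would only repeat the same value. This is a minimal sketch, not the PR's implementation.

```go
package main

import (
	"expvar"
	"fmt"
)

// replicationLagSec is a hypothetical stand-in for the PR's gauge: one
// process-wide value, unit in the name, no keyspace/shard labels.
var replicationLagSec = expvar.NewInt("replicationLagSec")

func main() {
	// A poller would overwrite the value on every poll.
	replicationLagSec.Set(3)
	// expvar serves variables as JSON under /debug/vars; String() returns
	// the same JSON encoding for this single variable.
	fmt.Println(replicationLagSec.String())
}
```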
Sure, I renamed the gauge to `replicationLagSec`, since we always assume the lag is in seconds (e.g., we have `SecondsBehindMaster`, and we also cast the duration to seconds on line 60).
That makes more sense than ns :)
```go
	"vitess.io/vitess/go/vt/mysqlctl"
	vtrpcpb "vitess.io/vitess/go/vt/proto/vtrpc"
	"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGauge("replicationLagSec", "replication lag in seconds")
```
Sorry to be nitpicky, but could you rename the variable too? It can match the gauge name (`replicationLagSec`), or you could rename both the variable and the gauge to `replicationLagSeconds`.
Yep, good catch. I was going to do that initially as well :-)
Signed-off-by: crowu <y.wu4515@gmail.com>
Description
I think polling is the default recommendation, given how the VTGate gateway works. This PR reports lag stats from the poller so that we can track which replica is "unhealthy".
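A minimal sketch of the idea in this description, under stated assumptions: each poll reads the replica's lag and records it in a gauge whose variable and exported name both carry the unit, as agreed in review. `pollLag` is a hypothetical stand-in for querying replication status, the tick interval is arbitrary, and `expvar` substitutes for Vitess's `stats.NewGauge` so the example runs on its own.

```go
package main

import (
	"expvar"
	"fmt"
	"time"
)

// replicationLagSeconds: variable and exported name match and include the unit.
var replicationLagSeconds = expvar.NewInt("replicationLagSeconds")

// pollLag is hypothetical: a real poller would read replication status
// (e.g. SecondsBehindMaster). Here it returns a fixed lag for illustration.
func pollLag() (time.Duration, error) {
	return 4 * time.Second, nil
}

// pollOnce records the latest lag in whole seconds. On error it keeps the
// previous value; whether to keep, zero, or flag a stale value is a design
// choice this sketch does not settle.
func pollOnce() {
	lag, err := pollLag()
	if err != nil {
		return
	}
	replicationLagSeconds.Set(int64(lag / time.Second))
}

func main() {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		pollOnce()
	}
	fmt.Println(replicationLagSeconds.Value()) // 4
}
```

With one gauge per tablet, a monitoring system that scrapes every replica can compare the values side by side to spot the lagging one.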
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: