Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report lag stats in poller #7490

Merged
merged 3 commits into from
Feb 17, 2021
Merged

Conversation

5antelope
Copy link
Member

@5antelope 5antelope commented Feb 14, 2021

Signed-off-by: crowu y.wu4515@gmail.com

Description

I think if polling is the default recommendation given how VTGate gateway works. This PR reports lag stats from poller so that we can track which replica is "unhealthy"

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/mysqlctl"
vtrpcpb "vitess.io/vitess/go/vt/proto/vtrpc"
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGaugesWithMultiLabels(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the vttablet metrics have the keyspace/shard dimensions because by definition a tablet belongs to only one keyspace/shard. This can be a simple gauge (NewGauge).
Also the name should include the units - replicationLagMs or replicationLagNs.
HeartbeatLag is being reported in nanoseconds so we should probably do the same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I renamed the gauge to replicationLagSec since we always assume the lag in seconds (e.g., we have SecondsBehindMaster and also cast the duration to sec on line 60)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes more sense than ns :)

Signed-off-by: crowu <y.wu4515@gmail.com>
"vitess.io/vitess/go/vt/mysqlctl"
vtrpcpb "vitess.io/vitess/go/vt/proto/vtrpc"
"vitess.io/vitess/go/vt/vterrors"
)

var replicationLagGauges = stats.NewGauge("replicationLagSec", "replication lag in seconds")
Copy link
Member

@deepthi deepthi Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be nitpicky, but could you rename the variable? It can be the same: replicationLagSec or even rename both the variable and gauge to replicationLagSeconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch. I was going to do that initially as well :-)

Signed-off-by: crowu <y.wu4515@gmail.com>
@deepthi deepthi merged commit 84b5ab3 into vitessio:master Feb 17, 2021
@askdba askdba added this to the v10.0 milestone Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants