Throttler: stats in /debug/vars #10443
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
Porting over my question from the previous PR:
To begin with, that's how they were named, imported from But, also, the metrics names are parameterized; The existing I do see examples of parameterized connection pool params. But I don't have clarity on how we might deal with parameterized app names. I also question the value of the current
That is a valid question :)
What I haven't looked at is whether this conversion is something we have implemented in our own Prometheus backend code.
Looks like we do convert
This is what
$ curl -s http://127.0.0.1:15100/metrics | grep throttler
# HELP vttablet_throttler_aggregated_mysql_self aggregated value for mysql.self
# TYPE vttablet_throttler_aggregated_mysql_self gauge
vttablet_throttler_aggregated_mysql_self 60620.693474
# HELP vttablet_throttler_aggregated_mysql_shard aggregated value for mysql.shard
# TYPE vttablet_throttler_aggregated_mysql_shard gauge
vttablet_throttler_aggregated_mysql_shard 60620.497982
# HELP vttablet_throttler_check_any_error total number of failed checks
# TYPE vttablet_throttler_check_any_error counter
vttablet_throttler_check_any_error 484982
# HELP vttablet_throttler_check_any_mysql_self_error
# TYPE vttablet_throttler_check_any_mysql_self_error counter
vttablet_throttler_check_any_mysql_self_error 242490
# HELP vttablet_throttler_check_any_mysql_self_total
# TYPE vttablet_throttler_check_any_mysql_self_total counter
vttablet_throttler_check_any_mysql_self_total 242518
# HELP vttablet_throttler_check_any_mysql_shard_error
# TYPE vttablet_throttler_check_any_mysql_shard_error counter
vttablet_throttler_check_any_mysql_shard_error 242492
# HELP vttablet_throttler_check_any_mysql_shard_total
# TYPE vttablet_throttler_check_any_mysql_shard_total counter
vttablet_throttler_check_any_mysql_shard_total 242518
# HELP vttablet_throttler_check_any_total total number of checks
# TYPE vttablet_throttler_check_any_total counter
vttablet_throttler_check_any_total 485036
# HELP vttablet_throttler_check_mysql_self_seconds_since_healthy seconds since last healthy cehck for mysql.self
# TYPE vttablet_throttler_check_mysql_self_seconds_since_healthy gauge
vttablet_throttler_check_mysql_self_seconds_since_healthy 60619
# HELP vttablet_throttler_check_mysql_shard_seconds_since_healthy seconds since last healthy cehck for mysql.shard
# TYPE vttablet_throttler_check_mysql_shard_seconds_since_healthy gauge
vttablet_throttler_check_mysql_shard_seconds_since_healthy 60619
# HELP vttablet_throttler_check_vitess_error total number of failed checks for vitess
# TYPE vttablet_throttler_check_vitess_error counter
vttablet_throttler_check_vitess_error 484982
# HELP vttablet_throttler_check_vitess_mysql_self_error
# TYPE vttablet_throttler_check_vitess_mysql_self_error counter
...
TIL and I really wasn't aware of
Let me look into camel casing metric names.
Converting to Draft while looking into a few things
@deepthi how about the following:
$ curl -s http://127.0.0.1:15100/debug/vars | jq . | grep Throttler | grep -v Pool
"ThrottlerAggregatedMysqlSelf": 0.191718,
"ThrottlerAggregatedMysqlShard": 0.960054,
"ThrottlerCheckAnyError": 27,
"ThrottlerCheckAnyMysqlSelfError": 13,
"ThrottlerCheckAnyMysqlSelfTotal": 38,
"ThrottlerCheckAnyMysqlShardError": 14,
"ThrottlerCheckAnyMysqlShardTotal": 42,
"ThrottlerCheckAnyTotal": 80,
"ThrottlerCheckMysqlSelfSecondsSinceHealthy": 0,
"ThrottlerCheckMysqlShardSecondsSinceHealthy": 0,
"ThrottlerProbesLatency": 355523,
"ThrottlerProbesTotal": 74,
$ curl -s http://127.0.0.1:15100/metrics | grep -i throttler | grep -v pool
# HELP vttablet_throttler_aggregated_mysql_self aggregated value for mysql.self
# TYPE vttablet_throttler_aggregated_mysql_self gauge
vttablet_throttler_aggregated_mysql_self 0.827354
# HELP vttablet_throttler_aggregated_mysql_shard aggregated value for mysql.shard
# TYPE vttablet_throttler_aggregated_mysql_shard gauge
vttablet_throttler_aggregated_mysql_shard 0.591876
# HELP vttablet_throttler_check_any_error total number of failed checks
# TYPE vttablet_throttler_check_any_error counter
vttablet_throttler_check_any_error 27
# HELP vttablet_throttler_check_any_mysql_self_error
# TYPE vttablet_throttler_check_any_mysql_self_error counter
vttablet_throttler_check_any_mysql_self_error 13
# HELP vttablet_throttler_check_any_mysql_self_total
# TYPE vttablet_throttler_check_any_mysql_self_total counter
vttablet_throttler_check_any_mysql_self_total 57
# HELP vttablet_throttler_check_any_mysql_shard_error
# TYPE vttablet_throttler_check_any_mysql_shard_error counter
vttablet_throttler_check_any_mysql_shard_error 14
# HELP vttablet_throttler_check_any_mysql_shard_total
# TYPE vttablet_throttler_check_any_mysql_shard_total counter
vttablet_throttler_check_any_mysql_shard_total 64
# HELP vttablet_throttler_check_any_total total number of checks
# TYPE vttablet_throttler_check_any_total counter
vttablet_throttler_check_any_total 121
# HELP vttablet_throttler_check_mysql_self_seconds_since_healthy seconds since last healthy cehck for mysql.self
# TYPE vttablet_throttler_check_mysql_self_seconds_since_healthy gauge
vttablet_throttler_check_mysql_self_seconds_since_healthy 0
# HELP vttablet_throttler_check_mysql_shard_seconds_since_healthy seconds since last healthy cehck for mysql.shard
# TYPE vttablet_throttler_check_mysql_shard_seconds_since_healthy gauge
vttablet_throttler_check_mysql_shard_seconds_since_healthy 0
# HELP vttablet_throttler_probes_latency probes latency
# TYPE vttablet_throttler_probes_latency gauge
vttablet_throttler_probes_latency 347382
# HELP vttablet_throttler_probes_total total probes
# TYPE vttablet_throttler_probes_total counter
vttablet_throttler_probes_total 114
In the above:
I confess to dislike the CamelCase approach, because in a scenario where I need more than one word to describe something, such as in the above
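The two listings above show the same metrics under two spellings: CamelCase on /debug/vars and snake_case with a vttablet_ prefix on /metrics. A minimal sketch of that mapping follows, assuming the names start out dotted (as in throttler.aggregated.mysql.self) and are plain ASCII. The helper names camelWord, dottedToCamel and camelToSnake are hypothetical and are not the functions used in this PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// camelWord capitalizes the first letter of a single word and lowercases
// the rest. Hypothetical helper; assumes ASCII input.
func camelWord(w string) string {
	if w == "" {
		return w
	}
	return strings.ToUpper(w[:1]) + strings.ToLower(w[1:])
}

// dottedToCamel joins the dot-separated parts of a name in CamelCase,
// e.g. "throttler.aggregated.mysql.self" -> "ThrottlerAggregatedMysqlSelf".
func dottedToCamel(name string) string {
	var b strings.Builder
	for _, word := range strings.Split(name, ".") {
		b.WriteString(camelWord(word))
	}
	return b.String()
}

// camelToSnake inserts an underscore before every upper-case letter except
// the first and lowercases everything. This is a naive conversion: a run of
// capitals such as "DDL" would be split letter by letter.
func camelToSnake(name string) string {
	var b strings.Builder
	for i, r := range name {
		if unicode.IsUpper(r) && i > 0 {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	camel := dottedToCamel("throttler.aggregated.mysql.self")
	fmt.Println(camel)                             // ThrottlerAggregatedMysqlSelf
	fmt.Println("vttablet_" + camelToSnake(camel)) // vttablet_throttler_aggregated_mysql_self
}
```

Because each dotted part becomes exactly one capitalized word, the snake_case form can be recovered mechanically from the CamelCase form, which is what keeps the two endpoints consistent in this sketch.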
…trics Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
In terms of naming, this now looks good.
The stats package allows you to attach labels to the same metric. For example,
Labels don't have to be any particular case as you can see from this example. Does this help?
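To illustrate the idea of attaching labels to one metric, here is a self-contained sketch that uses only the standard library rather than the Vitess stats package. The type labeledCounter, the metric name throttler_checks, the label app and the app names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// labeledCounter is a hypothetical stand-in for a counter that carries a
// single label: one metric name, many label values, one count per value.
type labeledCounter struct {
	mu     sync.Mutex
	name   string
	label  string
	counts map[string]int64
}

func newLabeledCounter(name, label string) *labeledCounter {
	return &labeledCounter{name: name, label: label, counts: map[string]int64{}}
}

// Add increments the count for one label value. The value is used as-is,
// so it does not need to follow any particular case convention.
func (c *labeledCounter) Add(value string, delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[value] += delta
}

// Lines renders the counter in a Prometheus-like text form, sorted by
// label value so the output is deterministic.
func (c *labeledCounter) Lines() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	values := make([]string, 0, len(c.counts))
	for v := range c.counts {
		values = append(values, v)
	}
	sort.Strings(values)
	lines := make([]string, 0, len(values))
	for _, v := range values {
		lines = append(lines, fmt.Sprintf("%s{%s=%q} %d", c.name, c.label, v, c.counts[v]))
	}
	return lines
}

func main() {
	checks := newLabeledCounter("throttler_checks", "app")
	checks.Add("vitess", 2)
	checks.Add("OnlineDDL", 1) // mixed-case label value, kept verbatim
	for _, line := range checks.Lines() {
		fmt.Println(line)
	}
}
```

With this shape the app name lives in the label value instead of the metric name, so a parameterized app does not create a new metric name.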
Yes, thank you!
go/stats/counter_map.go
@@ -0,0 +1,95 @@
/*
Copyright 2019 The Vitess Authors.
🙄
fixed
go/stats/counter_test.go
	t.Errorf("want %#v, got %#v", v, gotv)
}
v.Set(3.14)
if v.Get() != 3.14 {
nit: I personally find stretchr assertions like assert.Equal(t, 3.14, v.Get()) much easier to read
You're absolutely right; I copied+pasted existing tests and kept the original code conventions (I assume this was written way before testify was in use). I'll update.
fixed
for _, tc := range tt {
	t.Run(tc.word, func(t *testing.T) {
		camel := SingleWordCamel(tc.word)
		assert.Equal(t, tc.expect, camel)
🤩
@@ -307,6 +307,10 @@ func (e *Exporter) NewGauge(name string, help string) *stats.Gauge {
	return lvar
}

func (e *Exporter) NewGaugeFloat64(name string, help string) *stats.GaugeFloat64 {
	return nil
I don't follow why this is returning nil. Maybe add a comment?
See original comment, where this is explained. I'll also add as code comment.
added
Description
This PR exposes throttler metrics on /debug/vars. For example:
The above shows us the aggregated metrics for existing metrics (first two lines), then check results for each app:
- Total means how many checks were made by the app
- Error is how many times the throttler returned with non-success, out of Total
- Any is a combination of all apps
- MysqlShard are checks for shard lag (the standard /throttler/check API call)
- MysqlSelf are checks for the state of the specific tablet's MySQL (/throttler/check-self API call)

Implementation notes
The metrics exported here require a float64 gauge. See for example throttler.aggregated.mysql.shard, which tells us the replication lag on a shard. It is imperative that we have subsecond resolution, and a fraction number makes sense (it would be possible to achieve the same with uint64 as nanoseconds, but we inherit the fraction behavior from freno, and it's been in vitess for multiple versions now).
To that effect, I created Gauge64, which in turn means changes in prometheus, opentsdb, statsd, exporter, ....
Notably, exporter assumes all counters/gauges are int64 based; notice I haven't found a good solution and implemented like so:
Please review carefully those parts and let me know if there is a risk in there.
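As a rough illustration of the float64 gauge discussed in the implementation notes, the following sketch stores the IEEE 754 bits of the value in a uint64 so that reads and writes can be atomic. The type name floatGauge and this storage strategy are assumptions for the example; the gauge added in this PR may be implemented differently.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// floatGauge is a hypothetical float64 gauge. The value is kept as its
// IEEE 754 bit pattern in a uint64 so it can be loaded and stored atomically.
type floatGauge struct {
	bits uint64
}

// Set stores a new value.
func (g *floatGauge) Set(v float64) {
	atomic.StoreUint64(&g.bits, math.Float64bits(v))
}

// Get returns the current value.
func (g *floatGauge) Get() float64 {
	return math.Float64frombits(atomic.LoadUint64(&g.bits))
}

func main() {
	var lag floatGauge
	lag.Set(0.960054) // sub-second replication lag, in seconds

	fmt.Printf("%.6f\n", lag.Get()) // 0.960054
	// Forcing the same reading through an int64-based interface drops the
	// fraction entirely, which is why an integer gauge does not fit here.
	fmt.Println(int64(lag.Get())) // 0
}
```

The second print shows the concern with an int64-only exporter: a lag of 0.96 seconds would be reported as 0.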
Related Issue(s)
Checklist
Initial PR had these variables under a different format:
This comment is updated to reflect the new format.