server/schedule: refine metrics about store limit #2404
I am not sure the cost and available-token metrics are useful (because of the metrics collection interval?), but we could record the capacity of the store limit so that nobody needs pd-ctl to find it. What do you think?
Codecov Report
@@ Coverage Diff @@
## master #2404 +/- ##
==========================================
- Coverage 76.32% 76.28% -0.05%
==========================================
Files 206 206
Lines 22018 22035 +17
==========================================
+ Hits 16805 16809 +4
- Misses 3936 3941 +5
- Partials 1277 1285 +8
|
I added a metric, pd_schedule_store_limit_rate, to show PD's store limit setting.
}

// CollectStoreLimitMetrics collects the metrics about store limit
func (oc *OperatorController) CollectStoreLimitMetrics() {
Where is the newly added function called?
It is called in server/cluster/cluster.go:115.
|
PTAL @rleungx |
Subsystem: "schedule",
Name: "store_limit",
Help: "Limit of store.",
}, []string{"store", "type", "limit_type"})
The metrics will not be compatible with the old version of the Grafana template. Why can't it keep the original way?
The old way had two types, "take" and "available". I use a new metric, storeLimitAvailableGauge, to record what was the "available" type of storeLimitGauge, and a separate metric, storeLimitCostCounter (a counter, not a gauge as before), to record what was the "take" type.
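To illustrate why a counter can suit the "take" values better than a gauge, here is a minimal standalone sketch. It does not use the PD code or the Prometheus client; the `sample` type, the `scrape` and `costBetween` helpers, and the cost values are all hypothetical. It assumes a gauge is overwritten on every take, so a scraper only sees the last value before each collection, while a counter keeps a running total from which the cost per interval can be recovered.

```go
package main

import "fmt"

// sample is what a scraper would see at one collection tick (hypothetical type).
type sample struct {
	gauge   float64 // last value written before the scrape
	counter float64 // running total at the scrape
}

// scrape replays per-operator costs and takes a sample after every
// `every` events, mimicking a fixed metrics collection interval.
func scrape(costs []float64, every int) []sample {
	var out []sample
	var gauge, counter float64
	for i, c := range costs {
		gauge = c    // gauge semantics: overwrite with the latest cost
		counter += c // counter semantics: accumulate every cost
		if (i+1)%every == 0 {
			out = append(out, sample{gauge, counter})
		}
	}
	return out
}

// costBetween recovers the total cost between two scrapes from the counter.
func costBetween(prev, cur sample) float64 {
	return cur.counter - prev.counter
}

func main() {
	// Hypothetical costs of six operators; a scrape happens after every third one.
	costs := []float64{1000, 1000, 200, 1000, 200, 200}
	s := scrape(costs, 3)
	fmt.Println("gauge at scrapes:", s[0].gauge, s[1].gauge)
	fmt.Println("cost in second interval:", costBetween(s[0], s[1]))
}
```

In this sketch the gauge reads the same at both scrapes even though the two intervals carried different total costs, whereas the counter difference reflects everything taken between the scrapes.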
}
storeIDStr := strconv.FormatUint(storeID, 10)
storeLimitAvailableGauge.WithLabelValues(storeIDStr, n).Set(float64(storeLimit.Available()) / float64(storelimit.RegionInfluence[v]))
storeLimitRateGauge.WithLabelValues(storeIDStr, n).Set(storeLimit.Rate() * StoreBalanceBaseTime)
Users are more likely to want to know how many operators a store can produce in one minute; for example, the default is 15 opm.
The metric storeLimitRateGauge shows the operator production rate.
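The two conversions in the diff above can be sketched in isolation. This is only an illustration: `baseTimeSeconds` and `regionInfluence` are stand-ins for `StoreBalanceBaseTime` and `storelimit.RegionInfluence[v]`, and their values (60 and 1000) are assumptions made for the example rather than values confirmed in this thread. The helper names are hypothetical as well.

```go
package main

import "fmt"

// Assumed values for illustration only; the real constants live in the PD source.
const (
	baseTimeSeconds = 60.0   // stand-in for StoreBalanceBaseTime
	regionInfluence = 1000.0 // stand-in for storelimit.RegionInfluence[v]
)

// opsPerMinute converts a per-second limiter rate into operators per minute,
// mirroring storeLimit.Rate() * StoreBalanceBaseTime.
func opsPerMinute(ratePerSecond float64) float64 {
	return ratePerSecond * baseTimeSeconds
}

// availableOps converts raw available tokens into operator units,
// mirroring Available() / RegionInfluence[v].
func availableOps(availableTokens int64) float64 {
	return float64(availableTokens) / regionInfluence
}

func main() {
	fmt.Println(opsPerMinute(0.25)) // 0.25 operators per second over a 60 s base time
	fmt.Println(availableOps(2500)) // 2500 tokens at 1000 tokens per operator
}
```

Under these assumptions, a limiter rate of 0.25 operators per second corresponds to the 15 opm default mentioned in the review comment.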
Force-pushed from 7005cd3 to b2dfaa0.
|
/merge |
|
Your auto merge job has been accepted, waiting for:
|
|
/run-all-tests |
|
/merge |
|
/run-all-tests |
Codecov Report
@@ Coverage Diff @@
## master #2404 +/- ##
==========================================
+ Coverage 77.01% 77.07% +0.05%
==========================================
Files 205 205
Lines 21948 22016 +68
==========================================
+ Hits 16903 16968 +65
- Misses 3755 3768 +13
+ Partials 1290 1280 -10
|
|
/merge |
|
/run-all-tests |
|
@wangrzneu merge failed. |
|
/merge |
|
/run-all-tests |
|
/merge |
|
/run-all-tests |
|
@wangrzneu merge failed. |
|
/run-all-tests |
|
/merge |
|
/run-all-tests |
|
@wangrzneu merge failed. |
|
/merge |
|
/run-all-tests |
What problem does this PR solve?
Refine the store limit metrics: add storeLimitCostCounter and storeLimitAvailableGauge to monitor the status of the store limiter.
What is changed and how it works?
refine metrics about store limit
Check List
Tests
sysbench --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root --table-size=1000000 oltp_insert prepare
pd-ctl -u 0.0.0.0:32774 store limit all 1 region-add and pd-ctl -u 0.0.0.0:32774 store limit all 1 region-remove to set the store limit
pd-ctl -u 0.0.0.0:32774 store delete 3057 to remove one store
curl 0.0.0.0:32774/metrics | grep store_limit to get the metrics. The metrics should be like:
Side effects
The metric pd_schedule_store_limit is removed.