Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for expirationd stats #100

Closed
no1seman opened this issue Nov 23, 2021 · 17 comments · Fixed by #111
Closed

Add metrics for expirationd stats #100

no1seman opened this issue Nov 23, 2021 · 17 comments · Fixed by #111
Assignees
Labels
customer feature A new functionality

Comments

@no1seman
Copy link

It wiil be great to add expirationd stats metrics to have an availability to see them on timeline.

@filonenko-mikhail
Copy link

It's not our business, sorry. Gently kindly closing.

@opomuc
Copy link
Contributor

opomuc commented Jan 12, 2022

why is this not our business?

it can be used by our customers on various occasions.

@opomuc opomuc reopened this Jan 12, 2022
@filonenko-mikhail
Copy link

it's expirationd business

@sharonovd sharonovd transferred this issue from tarantool/metrics Jan 12, 2022
@Totktonada Totktonada added feature A new functionality teamE labels Jan 12, 2022
@ligurio ligurio closed this as completed Jan 12, 2022
@ligurio ligurio reopened this Jan 12, 2022
@opomuc
Copy link
Contributor

opomuc commented Jan 12, 2022

I think we should not forget dashboard for such metrics. what do you think @DifferentialOrange ? can we somehow make custom panels that are displayed only if certain metrics are in the result?

@DifferentialOrange
Copy link
Member

I think we should not forget dashboard for such metrics. what do you think @DifferentialOrange ? can we somehow make custom panels that are displayed only if certain metrics are in the result?

AFAIK, Grafana do not support this (and I doubt it ever will). But if we do tarantool/grafana-dashboard#128 , "excessive" panels will be less annoying.

@no1seman
Copy link
Author

Seems that professional likers needed: grafana/grafana#13468

@ArtDu ArtDu added the customer label Feb 2, 2022
@vrogach2020
Copy link

vrogach2020 commented Feb 2, 2022

Suggested RFC:

  1. Add export method for metric that returns stats of all active tasks in formatted form for metrics.
    For example get_all_tasks_stats():
task1:
    checked_count = 1000
    expired_count = 122
    restarts = 3
    working_time = 600

task2:
    checked_count = 2000
    expired_count = 12
    restarts = 0
    working_time = 30

Stats is already collected by expirationd so this data can be obtained via expirationd.stats method.

  1. Add a metric registration method to enable expiration stats publication.

This method should register a callback in metrics.
As a target metric type we can use gauge that represent a snapshot of the collected stats.

For example: expirationd.enable_metrics().

local metrics = require('metrics')

local expiration_gauge = metrics.gauge('expirationd', 'Expirationd statistics')

metrics.register_callback(function()
    local stats = json.encode( get_all_tasks_stats() )
    expiration_gauge:set(stats, {app = 'tarantool'})
end)
  1. This metric must support prometheus.

@vrogach2020
Copy link

@Totktonada please review

@Totktonada
Copy link
Member

Comments on the RFC

Main points

  • There should be a way to disable expirationd metrics (but AFAIU nothing prevent us from enabling them by default). I propose to add expirationd.cfg({metrics = <boolean..>}). expirationd.{enable,disable}_metrics() is okay, but expirationd.cfg() looks more natural for a tarantool module, where module level options are often managed this way.
  • (Possible out of scope here) It would be good to have an API from metrics to register a module callback to enable/disable its metrics. This way all metrics could be managed from one place (from metrics).
  • (Possible out of scope here) It worth to note that we'll possibly want to add the new expiration metrics support into https://github.com/tarantool/grafana-dashboard as well.

Minor/easy points

  • rworking_time -- typo? working_time should be here?
  • Is there a need to add get_all_tasks_stats() if we already have expirationd.stats()?

@DifferentialOrange, @ligurio, you may be interested about this RFC.

@vrogach2020
Copy link

@Totktonada thanks for comments.

  • I've corrected typo 'rworking_time'.
  • expirationd.stats( task_name) now returns stats only on a specified task. So we need a method to collect them all. Maybe it can have a different name not 'get_all_tasks_stats' or implemented somehow in a metrics callback it doesn't matter.

@Totktonada
Copy link
Member

expirationd.stats() returns a map of task name and task statistics if the name argument is omitted.

expirationd/expirationd.lua

Lines 800 to 804 in 838c2d1

--- Return task statistics in table.
--
-- @string[opt] name
-- Task name. If `name` is nil, then return map of `name`:`stats`, else
-- return map with stats.

@no1seman
Copy link
Author

no1seman commented Feb 2, 2022

Seems that this must be added to this issue: #98

@Totktonada
Copy link
Member

Thanks for noticing! I propose to track the problem here and close #98 as duplicate. Here we have more information.

I would highligh the comment from @savinov as possible follow up:

Tasks liveness and errors counter would be usefull too.

@no1seman
Copy link
Author

no1seman commented Feb 2, 2022

If there is no any important requirements - let's close duplicate

@DifferentialOrange
Copy link
Member

About Prometheus

3. This metric must support prometheus.

If it is in tarantool/metrics, it already supports Prometheus. The only point you may need to address is metrics naming policy, see tarantool/metrics#117 (although even incorrect names doesn't break anything, it's just a warning).

About filling metrics collectors

stats of all active tasks in formatted form for metrics.

local stats = json.encode( get_all_tasks_stats() )
expiration_gauge:set(stats, {app = 'tarantool'})

I don't know if it is a really rough draft or non-acquaintance with tarantool/metrics, but just in case of second bet I'll write some corrections here.

Each metrics collector represents metric of a different nature. Thus you'll need four (restarts, checked_records, expired_records, working_time) different collectors. (You may unite records collectors and use three instead of four, but I think it may confuse users while not being more convenient and cost-effective than using four collectors).

Each metrics collector may represent different time series depending on labels. (In fact, we will store a single value in Tarantool and DB like Prometheus will store a whole time series.) In case of expirationd, it seems that the only label separating different time series is task name (task1, task2).

Each metrics collector observation accepts a single numeric value and labels table. Thus, if input data is

task1:
    checked_count = 1000
    expired_count = 122
    restarts = 3
    working_time = 600

task2:
    checked_count = 2000
    expired_count = 12
    restarts = 0
    working_time = 30

the code will be as follows.

metrics.register_callback(function()
    local stats = get_all_tasks_stats()

    for task_name, task_stats in pairs(stats) do
        checked_count_gauge:set(stats.checked_count, { name = task_name })
        expired_count_gauge:set(stats.expired_count, { name = task_name })
        restarts_gauge:set(stats.restarts, { name = task_name })
        working_time_gauge:set(stats.working_time, { name = task_name })
    end
end)

I don't know what { app = 'tarantool' } label should mean. Maybe it's your client approach to separate metrics, but we don't use it anywhere in Tarantool. If you want to designate that this is Tarantool metrics and not a Java ones, we use tnt_ prefixes for metric name: https://github.com/tarantool/metrics/blob/d0f697de8d4c96253d4e46d75c707c003f175a07/metrics/utils.lua#L3

Also I would prefer to use name and not task_name because for Telegraph+InfluxDB you'll need to explicitly specify a list of expected labels and we already have name in default metrics (https://www.tarantool.io/ru/doc/latest/book/monitoring/grafana_dashboard/). If we add a new label, Telegraph+InfluxDB users which will try to us expirationd metrics will need to change Telegraph labels (and it could be so easily forgotten).

About the nature of metrics collectors

As a target metric type we can use gauge that represent a snapshot of the collected stats.

Maybe this is a matter of personal preferences, but I like metrics to be the type they fit to. checked_count, expired_count and restarts definitely seems like a counters: they can only increase or remain the same, thus being monotonic non-decreasing sequence (which could be reset, for example, on Tarantool restart, while I'm never used expirationd and do not know if expirationd stats reset between runs). working_time also seems like it is monotonic and non-decreasing.

So I would prefer to use a counter here. Also tarantool/metrics#117 works vice versa: if it is a counter, you're better name it with _total suffix.

The only drawback is set code: we do not provide a :set() method for a counter, but since metrics 0.11.0 you can do

c:reset(labels)
c:inc(value, labels)

to set values to counters.
https://github.com/tarantool/metrics/blob/d0f697de8d4c96253d4e46d75c707c003f175a07/metrics/utils.lua#L12-L18

About the callback

2. Add a metric registration method to enable expiration stats publication.

Just in case you may not know: since metrics 0.10.0 you can unregister_callback and it can be used in your disable_stats.

@vrogach2020
Copy link

vrogach2020 commented Feb 3, 2022

@DifferentialOrange thanks for clearing up the important details. You are right - it's a draft code. I've just copied the example code from here . Maybe we should fix the example if it's unusable in real life?

@DifferentialOrange
Copy link
Member

DifferentialOrange commented Feb 3, 2022

@DifferentialOrange thanks for clearing up the important details. You are right - it's a draft code. I've just copied the example code from here . Maybe we should fix the example if it's unusable in real life?

Well, { app = 'tarantool' } is a bit weird, but I won't say it is unusable. On the other hand,

local current_cpu_usage = math.random()
cpu_usage_gauge:set(current_cpu_usage, {app = 'tarantool'})

clearly shows that you need to pass a number here, not a json.

(Though I remember a time when a user from community tried to use exactly this code to measure CPU and then complained about math.random() not representing CPU load. But it is quite a different story.)

@Totktonada Totktonada added the 5sp label Jun 6, 2022
oleg-jukovec added a commit that referenced this issue Jun 17, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 20, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 21, 2022
The patch adds the ability to export statistics to metrics >= 0.10.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 24, 2022
The patch adds the ability to export statistics to metrics >= 0.11.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 24, 2022
The patch adds the ability to export statistics to metrics >= 0.11.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 24, 2022
Into the GitHub workflow and the Makefile.

Closes #100
oleg-jukovec added a commit that referenced this issue Jun 24, 2022
The patch adds the ability to export statistics to metrics >= 0.11.0.
expirationd does not require the metrics package itself and tries to
use an installed one.

It also adds a new API method expirationd.cfg({options}).

Part of #100
oleg-jukovec added a commit that referenced this issue Jun 24, 2022
Into the GitHub workflow and the Makefile.

Closes #100
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer feature A new functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants