
Expose probe metrics to Prometheus #3600

Merged: bboreham merged 5 commits into master from expose-probe-metrics on Aug 20, 2019

@bboreham (Collaborator) commented May 10, 2019

Fixes #2318

We are already timing all report, tag and tick operations via the armon/go-metrics library, added in #658. If Prometheus is in use, expose those metrics that way.

Adjust metrics naming to fit with Prometheus norms.
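To see how those names come out, here is a minimal Go sketch. It assumes, from our reading of go-metrics, that the service name is prepended to the key when no service label is used, that the parts are joined with `_`, and that characters Prometheus does not allow in metric names (such as `-`) become `_`. `flattenKey` is a hypothetical helper, not the library's code.

```go
package main

import (
	"regexp"
	"strings"
)

// invalidNameChars matches characters that are not valid in a Prometheus
// metric name. Assumption: a simplified stand-in for the rewriting done by
// the go-metrics Prometheus sink, not its actual pattern.
var invalidNameChars = regexp.MustCompile(`[^a-zA-Z0-9_:]`)

// flattenKey (hypothetical) prepends the service name to the key parts,
// joins everything with "_" and replaces invalid characters with "_".
func flattenKey(serviceName string, parts []string) string {
	all := parts
	if serviceName != "" {
		all = append([]string{serviceName}, parts...)
	}
	return invalidNameChars.ReplaceAllString(strings.Join(all, "_"), "_")
}
```

For example, `flattenKey("scope-probe", []string{"duration", "seconds"})` returns `"scope_probe_duration_seconds"`, the metric name in the output below.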

The previous way these metrics were exposed was via SIGUSR1, and we can only have one "sink", so make it either-or.
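The either-or choice can be pictured with a small Go sketch. `Sink`, `inmemSink`, `promSink` and `newSink` are hypothetical stand-ins (the real go-metrics sink interface has more methods); the only point is that exactly one sink receives the samples.

```go
package main

// Sink is a hypothetical, cut-down stand-in for a metrics sink: anything that
// accepts a named sample.
type Sink interface {
	AddSample(name string, value float64)
}

// inmemSink stands in for the in-memory sink whose contents were dumped on SIGUSR1.
type inmemSink struct{ samples map[string][]float64 }

func (s *inmemSink) AddSample(name string, v float64) {
	s.samples[name] = append(s.samples[name], v)
}

// promSink stands in for a Prometheus sink; here it keeps only a running sum and count.
type promSink struct {
	sum   map[string]float64
	count map[string]int
}

func (s *promSink) AddSample(name string, v float64) {
	s.sum[name] += v
	s.count[name]++
}

// newSink returns exactly one sink: the Prometheus one when Prometheus is in
// use, otherwise the in-memory one.
func newSink(prometheusInUse bool) Sink {
	if prometheusInUse {
		return &promSink{sum: map[string]float64{}, count: map[string]int{}}
	}
	return &inmemSink{samples: map[string][]float64{}}
}
```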

Example output:

# HELP scope_probe_duration_seconds scope_probe_duration_seconds
# TYPE scope_probe_duration_seconds summary
scope_probe_duration_seconds{module="Docker",operation="reporter",quantile="0.5"} 0.0005071309860795736
scope_probe_duration_seconds{module="Docker",operation="reporter",quantile="0.9"} 0.0005071309860795736
scope_probe_duration_seconds{module="Docker",operation="reporter",quantile="0.99"} 0.0005071309860795736
scope_probe_duration_seconds_sum{module="Docker",operation="reporter"} 0.0005071309860795736
scope_probe_duration_seconds_count{module="Docker",operation="reporter"} 1
scope_probe_duration_seconds{module="Docker",operation="tagger",quantile="0.5"} 3.490999915811699e-06
scope_probe_duration_seconds{module="Docker",operation="tagger",quantile="0.9"} 3.626000079748337e-06
scope_probe_duration_seconds{module="Docker",operation="tagger",quantile="0.99"} 3.626000079748337e-06
scope_probe_duration_seconds_sum{module="Docker",operation="tagger"} 0.00013303000559972133
scope_probe_duration_seconds_count{module="Docker",operation="tagger"} 4
scope_probe_duration_seconds{module="Endpoint",operation="reporter",quantile="0.5"} 0.000113927002530545
scope_probe_duration_seconds{module="Endpoint",operation="reporter",quantile="0.9"} 0.000113927002530545
scope_probe_duration_seconds{module="Endpoint",operation="reporter",quantile="0.99"} 0.000113927002530545
scope_probe_duration_seconds_sum{module="Endpoint",operation="reporter"} 0.000113927002530545
scope_probe_duration_seconds_count{module="Endpoint",operation="reporter"} 1
...
scope_probe_duration_seconds{module="Process",operation="ticker",quantile="0.5"} 0.0027851660270243883
scope_probe_duration_seconds{module="Process",operation="ticker",quantile="0.9"} 0.0027851660270243883
scope_probe_duration_seconds{module="Process",operation="ticker",quantile="0.99"} 0.0027851660270243883
scope_probe_duration_seconds_sum{module="Process",operation="ticker"} 0.0027851660270243883
scope_probe_duration_seconds_count{module="Process",operation="ticker"} 1
# HELP scope_probe_procspy_namespaces scope_probe_procspy_namespaces
# TYPE scope_probe_procspy_namespaces gauge
scope_probe_procspy_namespaces 1
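Since a summary exposes cumulative `_sum` and `_count` series, the mean duration per module and operation is `_sum / _count` (for the Docker tagger above, 0.000133 s / 4, about 33 µs). A minimal Go sketch of reading that out of the text format follows, assuming simple `name{labels} value` lines with no timestamps; `meanDurations` is a hypothetical helper, and a real consumer would use a proper exposition-format parser.

```go
package main

import (
	"strconv"
	"strings"
)

// meanDurations (hypothetical) parses Prometheus text-format lines for one
// summary metric and returns sum/count keyed by label set. Comment lines and
// quantile lines are ignored.
func meanDurations(exposition, metric string) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]float64{}
	for _, line := range strings.Split(exposition, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value follows the last space; label values may contain spaces.
		sp := strings.LastIndex(line, " ")
		if sp < 0 {
			continue
		}
		series, valueText := line[:sp], line[sp+1:]
		value, err := strconv.ParseFloat(valueText, 64)
		if err != nil {
			continue
		}
		name, labels := series, ""
		if i := strings.Index(series, "{"); i >= 0 {
			name, labels = series[:i], series[i:]
		}
		switch name {
		case metric + "_sum":
			sums[labels] = value
		case metric + "_count":
			counts[labels] = value
		}
	}
	means := map[string]float64{}
	for labels, c := range counts {
		if c > 0 {
			means[labels] = sums[labels] / c
		}
	}
	return means
}
```

Calling `meanDurations(text, "scope_probe_duration_seconds")` on output like the above returns one entry per `{module,operation}` label set.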

@fbarl (Contributor) left a comment

Nice one! 👍

Two more comments:

  1. Count of conntrack errors, from #2318 (instrument probes with some prometheus metrics): do we do that one already?
  2. It would be nice to update the docs somewhere to tell users they can change the flag to get the exposed Prometheus metrics.

cfg := &metrics.Config{
ServiceName: "scope-probe",
TimerGranularity: time.Second,
FilterDefault: true, // Don't filter metrics by default
Contributor:

I guess the comment here is a bit confusing as it reads as a negation of the code :)
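For readers puzzled by the same comment: in go-metrics, `FilterDefault` decides what happens to a metric that matches none of the allow/block prefix rules, and `true` means it is kept, so with no rules configured nothing is filtered. A simplified Go model of that rule follows; `filter` and `keep` are hypothetical names, and the longest-prefix tie-breaking is our assumption about the library's behaviour rather than its code.

```go
package main

import "strings"

// filter is a hypothetical, simplified model of prefix-based metric filtering.
type filter struct {
	allowed       []string
	blocked       []string
	filterDefault bool // outcome for names matching no prefix: true = keep
}

// keep applies the longest matching prefix; a blocked prefix wins a tie with
// an identical allowed one. With no match, filterDefault decides.
func (f filter) keep(name string) bool {
	best, keep := -1, f.filterDefault
	for _, p := range f.allowed {
		if strings.HasPrefix(name, p) && len(p) > best {
			best, keep = len(p), true
		}
	}
	for _, p := range f.blocked {
		if strings.HasPrefix(name, p) && len(p) >= best {
			best, keep = len(p), false
		}
	}
	return keep
}
```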

metrics.MeasureSinceWithLabels([]string{"duration", "seconds"}, t, []metrics.Label{
{Name: "operation", Value: "ticker"},
{Name: "module", Value: ticker.Name()},
})
Contributor:

What @rade suggested in #2318 was to track failed events as well, but you don't seem to differentiate between error and success cases in any of the three measurements. Would it make sense to split them by outcome in all three?

Collaborator Author:

Maybe, but I'm just exposing what we already had. I can change "Fixes" to "fixes part of".

@fbarl (Contributor), May 17, 2019:

I can change "Fixes" to "fixes part of".

Yeah, either that or adding the cases if it's a small enough change - your call!

@ig0rsky commented Jul 1, 2019

Any news on this? @fbarl @bboreham

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bryan@weave.works>
@bboreham (Collaborator Author) commented Jul 9, 2019

I added a few more metrics, so I believe this can close #2318 now.

@fbarl (Contributor) left a comment

LGTM! 👍

Thanks for the changes!

@bboreham bboreham merged commit 5cba126 into master Aug 20, 2019
@bboreham bboreham deleted the expose-probe-metrics branch August 20, 2019 13:35
Successfully merging this pull request may close this issue: instrument probes with some prometheus metrics (#2318)