
fix(prometheus_exporter sink): agg histograms dont encode buckets correctly #10165

Merged

tobz merged 4 commits into master from tobz/fix-prometheus-sink-agg-histograms on Nov 24, 2021


@tobz (Contributor) commented Nov 24, 2021

In #9178, we changed the histogram handle used for internal metrics, which broke how the Prometheus sinks (exporter and remote write) encode aggregated histograms.

Previously, the Prometheus sinks compensated for aggregated histograms whose buckets were not cumulative by accumulating the bucket counts during encoding. After the change to the internal histogram handle, the buckets were already cumulative, so the sinks accumulated them a second time and rendered incorrect bucket counts.
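
To make the failure mode concrete, here is a minimal Rust sketch, assuming the sinks' compensation step is a plain running sum that turns per-bucket counts into Prometheus-style cumulative `le` buckets. The `to_cumulative` helper and the bucket values are hypothetical and not taken from Vector's code; the point is only to show what happens when buckets that are already cumulative get accumulated again.

```rust
/// Turns per-bucket (non-cumulative) counts into cumulative counts, where
/// each bucket holds every observation less than or equal to its upper bound,
/// as Prometheus `le` buckets do. Hypothetical helper for illustration only.
fn to_cumulative(counts: &[u64]) -> Vec<u64> {
    let mut running = 0u64;
    counts
        .iter()
        .map(|&c| {
            running += c;
            running
        })
        .collect()
}

fn main() {
    // Hypothetical bucket upper bounds (`le`) and per-bucket counts:
    // 2 observations <= 0.1, 3 in (0.1, 0.5], 1 in (0.5, 1.0].
    let bounds = [0.1, 0.5, 1.0];
    let per_bucket = [2u64, 3, 1];

    // Correct: the compensation step runs once on non-cumulative input.
    let once = to_cumulative(&per_bucket);
    // Buggy: the input was already cumulative, so it is accumulated again.
    let twice = to_cumulative(&once);

    for ((le, correct), doubled) in bounds.iter().zip(&once).zip(&twice) {
        println!("le={le}: correct={correct} double-accumulated={doubled}");
    }
}
```

With per-bucket counts `[2, 3, 1]`, accumulating once gives `[2, 5, 6]`, whose last bucket matches the six observations. Accumulating again gives `[2, 7, 13]`, which overstates every bucket after the first and no longer agrees with the histogram's total count.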

This PR switches the Histogram handle used for internal metrics to be non-cumulative, which restores the expected output from the Prometheus exporter and remote write sinks. We've also adjusted the AgentDDSketch histogram interpolation logic to compensate for the non-cumulative histograms, and added more robust unit tests for histogram interpolation.
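
For intuition about why the interpolation logic depends on whether buckets are cumulative, the following Rust sketch expands non-cumulative buckets into synthetic samples by spreading each bucket's count evenly across its value range, which a sketch such as AgentDDSketch could then ingest. This is an illustrative assumption, not Vector's AgentDDSketch implementation; `interpolate_buckets`, the zero lower bound for the first bucket, and the handling of an infinite upper bound are all hypothetical choices.

```rust
/// Hypothetical helper: expands non-cumulative histogram buckets into
/// synthetic samples by spreading each bucket's count evenly over the range
/// (previous upper bound, this upper bound]. Assumes increasing upper bounds
/// and non-negative observations (the first bucket starts at 0.0).
fn interpolate_buckets(upper_bounds: &[f64], counts: &[u64]) -> Vec<f64> {
    assert_eq!(upper_bounds.len(), counts.len(), "expected one count per bucket");
    let mut samples = Vec::new();
    let mut lower = 0.0_f64;
    for (&upper, &count) in upper_bounds.iter().zip(counts) {
        if upper.is_infinite() {
            // A +Inf bucket has no finite width, so pin its samples to the
            // last finite bound instead of interpolating.
            samples.extend(std::iter::repeat(lower).take(count as usize));
        } else if count > 0 {
            // Each of the `count` samples gets an equal share of the range.
            let width = (upper - lower) / count as f64;
            for i in 1..=count {
                samples.push(lower + width * i as f64);
            }
        }
        lower = upper;
    }
    samples
}

fn main() {
    let bounds = [1.0, 2.0, 4.0];
    // Non-cumulative: 2 observations in (0, 1], 1 in (1, 2], none in (2, 4].
    let per_bucket = [2u64, 1, 0];
    let samples = interpolate_buckets(&bounds, &per_bucket);
    println!("{samples:?}"); // prints [0.5, 1.0, 2.0]
}
```

The total number of samples equals the sum of the per-bucket counts. Feeding cumulative counts such as `[2, 3, 3]` into the same function would produce eight samples instead of three, which is why the interpolation has to agree with the histogram handle about which form the buckets are in.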

Closes #10145.

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

@netlify (bot) commented Nov 24, 2021

✔️ Deploy Preview for vector-project ready!

🔨 Explore the source changes: 3069e1b

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/619e7c0b4f4a9c00082290c8

😎 Browse the preview: https://deploy-preview-10165--vector-project.netlify.app

@github-actions github-actions bot added domain: core Anything related to core crates i.e. vector-core, core-common, etc domain: sinks Anything related to the Vector's sinks labels Nov 24, 2021
@github-actions (bot) commented:

Soak Test Results

Baseline: 5f9b930
Comparison: 96570b3
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis values indicate less consistent behavior, which makes it harder to predict fitness in the field.
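
As a rough illustration of what the 'skewness' and 'kurtosis' columns capture, the Rust sketch below computes population skewness and excess kurtosis from central moments. It is an assumption that the soak tooling uses these particular (biased, moment-based) estimators; the function name and the sample data are hypothetical and not taken from the soak runs.

```rust
/// Population skewness and excess kurtosis computed from central moments.
/// Assumption: the soak tooling may use different estimators; this only
/// illustrates what the two columns measure. Expects a non-empty,
/// non-constant series (otherwise the result is NaN).
fn skewness_and_excess_kurtosis(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // k-th central moment: the average of (x - mean)^k.
    let moment = |k: i32| xs.iter().map(|x| (x - mean).powi(k)).sum::<f64>() / n;
    let (m2, m3, m4) = (moment(2), moment(3), moment(4));
    let skewness = m3 / m2.powf(1.5);
    // Subtracting 3 makes a normal distribution score 0.
    let excess_kurtosis = m4 / (m2 * m2) - 3.0;
    (skewness, excess_kurtosis)
}

fn main() {
    // Hypothetical, evenly spread throughput samples.
    let samples = [1.0, 2.0, 3.0, 4.0, 5.0];
    let (skew, kurt) = skewness_and_excess_kurtosis(&samples);
    println!("skewness={skew:.2} kurtosis={kurt:.2}"); // prints "skewness=0.00 kurtosis=-1.30"
}
```

Under this definition a normal distribution scores 0 on both measures, symmetric data has zero skewness, and data flatter than a normal distribution, like these evenly spread samples, has negative excess kurtosis.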


datadog_agent_remap_blackhole

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    10.45Mi    10.48Mi    10.48Mi    10.49Mi    -0.25           -0.27
comparison  10.45Mi    10.48Mi    10.48Mi    10.48Mi    -0.07           -0.94

datadog_agent_remap_datadog_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    19.51Mi    19.57Mi    19.58Mi    19.58Mi    -0.41           -0.63
comparison  18.12Mi    19.22Mi    19.24Mi    19.24Mi    0.03            -1.71

fluent_elasticsearch

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    52.19Mi    52.64Mi    52.67Mi    52.68Mi    0.28            -0.76
comparison  54.02Mi    54.26Mi    54.30Mi    54.31Mi    -0.53           -1.08

fluent_remap_aws_firehose

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    39.76Mi    39.98Mi    40.01Mi    40.02Mi    0.08            -1.48
comparison  39.30Mi    39.47Mi    39.50Mi    39.51Mi    -0.14           -0.97

splunk_hec_route_s3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.44Mi     5.69Mi     5.81Mi     5.81Mi     0.38            -0.02
comparison  5.49Mi     5.67Mi     5.70Mi     5.72Mi     -0.04           -0.52

splunk_transforms_splunk3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    2.40Mi     2.58Mi     2.60Mi     2.60Mi     0.13            -1.08
comparison  2.45Mi     2.57Mi     2.66Mi     2.68Mi     1.00            3.30

syslog_humio_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    6.96Mi     7.07Mi     7.08Mi     7.08Mi     -0.02           -1.70
comparison  7.31Mi     7.44Mi     7.44Mi     7.44Mi     0.12            -1.51

syslog_log2metric_humio_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.01Mi     5.05Mi     5.05Mi     5.05Mi     -0.05           -1.44
comparison  4.95Mi     4.98Mi     4.98Mi     4.98Mi     -0.80           -0.13

syslog_log2metric_splunk_hec_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.16Mi     5.18Mi     5.18Mi     5.18Mi     -0.04           -1.04
comparison  5.04Mi     5.08Mi     5.08Mi     5.08Mi     -0.15           -1.24

syslog_loki

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    4.05Mi     4.27Mi     4.31Mi     4.33Mi     -0.35           -0.43
comparison  4.14Mi     4.38Mi     4.39Mi     4.40Mi     -0.13           -1.04

syslog_regex_logs2metric_ddmetrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    3.79Mi     3.81Mi     3.81Mi     3.81Mi     -0.27           -0.58
comparison  3.76Mi     3.78Mi     3.79Mi     3.79Mi     0.12            -0.93

syslog_splunk_hec_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    7.06Mi     7.07Mi     7.07Mi     7.07Mi     -0.33           -0.56
comparison  7.16Mi     7.19Mi     7.20Mi     7.20Mi     -0.42           -1.08

jszwedko added this to the Vector 0.18.1 milestone Nov 24, 2021
@jszwedko (Member) commented:

Thanks for looking into this and fixing it so quickly. We recently had the idea to start listing "known issues" on releases. Would you want to add this one to 0.18.0? That will help people reading the release notes decide whether to hold off on upgrading.

I marked this to be released in 0.18.1 which isn't scheduled yet, but will probably be next week.

github-actions bot added the domain: external docs label Nov 24, 2021
tobz enabled auto-merge (squash) November 24, 2021 16:02
@github-actions (bot) commented:

Soak Test Results

Baseline: 808c1e3
Comparison: d0705f5
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis values indicate less consistent behavior, which makes it harder to predict fitness in the field.


datadog_agent_remap_blackhole

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    10.43Mi    10.47Mi    10.47Mi    10.47Mi    -0.79           1.05
comparison  10.37Mi    10.39Mi    10.40Mi    10.40Mi    -0.07           -0.32

datadog_agent_remap_datadog_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    18.51Mi    18.57Mi    18.58Mi    18.58Mi    0.22            -0.75
comparison  18.34Mi    18.39Mi    18.40Mi    18.40Mi    -0.14           -0.82

fluent_elasticsearch

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    55.18Mi    55.47Mi    55.53Mi    55.53Mi    -0.32           -0.96
comparison  53.97Mi    54.28Mi    54.32Mi    54.33Mi    -0.42           -0.78

fluent_remap_aws_firehose

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    40.06Mi    40.22Mi    40.26Mi    40.27Mi    0.02            0.59
comparison  39.09Mi    39.32Mi    39.35Mi    39.36Mi    0.74            -0.49

http_pipelines_blackhole

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    0.00       556.00     1.59Ki     1.59Ki     2.01            3.19
comparison  0.00       1.05Ki     1.05Ki     1.05Ki     0.23            -1.95

splunk_hec_route_s3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.31Mi     5.50Mi     5.60Mi     5.60Mi     0.51            -0.12
comparison  5.41Mi     5.59Mi     5.62Mi     5.62Mi     -0.33           -0.56

splunk_transforms_splunk3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    2.36Mi     2.59Mi     2.63Mi     2.64Mi     0.22            -0.64
comparison  2.40Mi     2.57Mi     2.59Mi     2.59Mi     0.07            -0.90

syslog_humio_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    7.32Mi     7.36Mi     7.36Mi     7.36Mi     -1.34           1.46
comparison  7.14Mi     7.16Mi     7.17Mi     7.17Mi     0.20            -1.10

syslog_log2metric_humio_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    4.90Mi     4.91Mi     4.91Mi     4.92Mi     0.16            -0.65
comparison  4.96Mi     4.98Mi     4.99Mi     4.99Mi     0.38            0.19

syslog_log2metric_splunk_hec_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.16Mi     5.19Mi     5.19Mi     5.19Mi     -0.05           -1.45
comparison  5.15Mi     5.17Mi     5.18Mi     5.18Mi     0.61            -0.81

syslog_loki

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    3.87Mi     4.01Mi     4.03Mi     4.04Mi     -0.16           -1.31
comparison  4.09Mi     4.25Mi     4.32Mi     4.32Mi     0.92            -0.19

syslog_regex_logs2metric_ddmetrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    3.74Mi     3.75Mi     3.76Mi     3.76Mi     0.45            -0.51
comparison  3.73Mi     3.75Mi     3.75Mi     3.76Mi     0.30            -0.60

syslog_splunk_hec_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    7.11Mi     7.18Mi     7.19Mi     7.19Mi     -0.13           -1.23
comparison  7.14Mi     7.16Mi     7.16Mi     7.16Mi     0.12            -0.66

…rectly

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>
…ketch interpolation

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>
Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>
Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>
tobz force-pushed the tobz/fix-prometheus-sink-agg-histograms branch from d0705f5 to 3069e1b November 24, 2021 17:53
github-actions bot added the domain: sources label Nov 24, 2021
@github-actions (bot) commented:

Soak Test Results

Baseline: 808c1e3
Comparison: 3069e1b
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis values indicate less consistent behavior, which makes it harder to predict fitness in the field.


datadog_agent_remap_blackhole

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    10.43Mi    10.46Mi    10.46Mi    10.46Mi    0.11            -0.63
comparison  10.45Mi    10.48Mi    10.49Mi    10.49Mi    -0.10           0.08

datadog_agent_remap_datadog_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    18.75Mi    18.81Mi    18.82Mi    18.82Mi    0.16            -1.19
comparison  19.01Mi    19.16Mi    19.18Mi    19.18Mi    -0.32           -1.13

fluent_elasticsearch

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    53.44Mi    53.68Mi    53.71Mi    53.73Mi    -0.35           -0.12
comparison  53.09Mi    53.34Mi    53.37Mi    53.37Mi    0.16            -0.93

fluent_remap_aws_firehose

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    38.84Mi    39.17Mi    39.20Mi    39.20Mi    -0.46           -1.15
comparison  39.22Mi    39.36Mi    39.39Mi    39.39Mi    0.23            -0.99

http_pipelines_blackhole

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    0.00       1.63Ki     1.63Ki     1.63Ki     1.66            0.91
comparison  0.00       3.15Ki     3.15Ki     3.20Ki     0.09            -1.62

splunk_hec_route_s3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.33Mi     5.54Mi     5.62Mi     5.62Mi     0.44            -0.05
comparison  5.32Mi     5.54Mi     5.58Mi     5.59Mi     0.22            -0.85

splunk_transforms_splunk3

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    2.45Mi     2.74Mi     2.78Mi     2.80Mi     0.09            -1.33
comparison  2.47Mi     2.57Mi     2.59Mi     2.60Mi     0.07            -0.78

syslog_humio_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    7.14Mi     7.19Mi     7.19Mi     7.19Mi     -0.17           -0.90
comparison  7.01Mi     7.04Mi     7.05Mi     7.05Mi     0.13            -1.05

syslog_log2metric_humio_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    4.91Mi     4.93Mi     4.93Mi     4.93Mi     0.07            -0.71
comparison  5.04Mi     5.06Mi     5.06Mi     5.06Mi     -0.94           0.61

syslog_log2metric_splunk_hec_metrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    5.11Mi     5.14Mi     5.15Mi     5.15Mi     -0.03           -1.19
comparison  5.19Mi     5.21Mi     5.21Mi     5.21Mi     -0.18           -0.99

syslog_loki

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    3.83Mi     4.61Mi     4.65Mi     4.66Mi     0.05            -1.57
comparison  4.00Mi     4.25Mi     4.27Mi     4.28Mi     0.10            -1.24

syslog_regex_logs2metric_ddmetrics

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    3.77Mi     3.79Mi     3.79Mi     3.79Mi     -1.16           0.88
comparison  3.80Mi     3.82Mi     3.82Mi     3.82Mi     -0.15           -1.24

syslog_splunk_hec_logs

EXPERIMENT  VALUE_min  VALUE_p90  VALUE_p99  VALUE_max  VALUE_skewness  VALUE_kurtosis
baseline    7.02Mi     7.10Mi     7.10Mi     7.10Mi     0.44            -1.36
comparison  7.36Mi     7.37Mi     7.37Mi     7.37Mi     -0.26           -0.60

tobz merged commit 0fa7893 into master Nov 24, 2021
tobz deleted the tobz/fix-prometheus-sink-agg-histograms branch November 24, 2021 19:09
jszwedko pushed a commit that referenced this pull request Nov 29, 2021
…rectly (#10165)

* fix(prometheus_exporter sink): agg histograms dont encode buckets correctly

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>
Labels
domain: core Anything related to core crates i.e. vector-core, core-common, etc domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks domain: sources Anything related to the Vector's sources
Development

Successfully merging this pull request may close these issues.

Histogram metric of adaptive concurrency rtt wrong