fix(prometheus_exporter sink): agg histograms dont encode buckets correctly #10165

tobz · 2021-11-24T01:57:37Z

In #9178, we changed the histogram handle used for internal metrics, which broke how the Prometheus sinks (exporter and remote write) encode aggregated histograms.

Previously, the Prometheus sinks compensated for aggregated histograms whose buckets were not cumulative by accumulating the bucket counts themselves. After the change to the internal histogram handle, the buckets arrived already cumulative, so the sinks accumulated them a second time and rendered incorrect bucket counts.
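To make the double-accumulation concrete, here is a minimal Rust sketch. It is not Vector's actual code: `to_cumulative` is a hypothetical helper that turns per-bucket (non-cumulative) counts into the cumulative form Prometheus uses for `_bucket{le="..."}` series, and the bucket counts are made up for illustration. Running the conversion a second time on counts that are already cumulative inflates every bucket after the first.

```rust
/// Hypothetical helper: converts per-bucket (non-cumulative) counts into
/// cumulative counts, where each entry includes all lower buckets.
fn to_cumulative(counts: &[u64]) -> Vec<u64> {
    let mut running = 0u64;
    counts
        .iter()
        .map(|&c| {
            running += c;
            running
        })
        .collect()
}

fn main() {
    // Made-up counts for upper bounds [1.0, 5.0, 10.0, +Inf].
    let non_cumulative = [2u64, 3, 1, 0];

    // Correct: one conversion from non-cumulative to cumulative.
    let correct = to_cumulative(&non_cumulative);
    println!("{:?}", correct); // prints [2, 5, 6, 6]

    // Bug: the input was already cumulative, so converting again
    // counts lower buckets multiple times.
    let double_counted = to_cumulative(&correct);
    println!("{:?}", double_counted); // prints [2, 7, 13, 19]
}
```

The `+Inf` bucket should always equal the total sample count (6 here); the double-counted output reports 19, which is an easy sanity check for this class of bug.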

This PR switches the histogram handle used for internal metrics back to non-cumulative buckets, which restores the expected output from the Prometheus exporter and remote write sinks. It also adjusts the AgentDDSketch histogram interpolation logic to handle non-cumulative histograms, and adds more robust unit tests for histogram interpolation.
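Why interpolation depends on non-cumulative input can be pictured with a small, hypothetical Rust sketch. It is not Vector's `AgentDDSketch` code: `Bucket` and `interpolate` are illustrative names, and the sketch assumes finite, ascending bucket upper bounds with the first bucket starting at 0.0. Each bucket's count is spread as evenly spaced synthetic samples across that bucket's range, which only yields the right total when each count covers its own range alone.

```rust
/// Hypothetical bucket: `count` samples fell in (previous upper bound, `upper`].
struct Bucket {
    upper: f64,
    count: u32,
}

/// Spreads each bucket's count as evenly spaced synthetic samples across the
/// bucket's range. Assumes finite, ascending upper bounds starting above 0.0
/// and non-cumulative counts.
fn interpolate(buckets: &[Bucket]) -> Vec<f64> {
    let mut lower = 0.0;
    let mut samples = Vec::new();
    for b in buckets {
        let width = b.upper - lower;
        for i in 0..b.count {
            // Midpoint of the i-th of `count` equal slices of the bucket.
            samples.push(lower + width * (i as f64 + 0.5) / b.count as f64);
        }
        lower = b.upper;
    }
    samples
}

fn main() {
    let non_cumulative = [
        Bucket { upper: 1.0, count: 2 },
        Bucket { upper: 5.0, count: 1 },
    ];
    println!("{:?}", interpolate(&non_cumulative)); // prints [0.25, 0.75, 3.0]

    // The same data in cumulative form yields 5 samples instead of 3.
    let cumulative = [
        Bucket { upper: 1.0, count: 2 },
        Bucket { upper: 5.0, count: 3 },
    ];
    println!("{}", interpolate(&cumulative).len()); // prints 5
}
```

Midpoint placement is just one simple choice for where synthetic samples go; the point of the sketch is that feeding cumulative counts into any per-bucket scheme like this over-counts samples.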

Closes #10145.

Signed-off-by: Toby Lawrence toby@nuclearfurnace.com

netlify · 2021-11-24T01:57:43Z

✔️ Deploy Preview for vector-project ready!

🔨 Explore the source changes: 3069e1b

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/619e7c0b4f4a9c00082290c8

😎 Browse the preview: https://deploy-preview-10165--vector-project.netlify.app

github-actions · 2021-11-24T02:38:17Z

Soak Test Results

Baseline: 5f9b930
Comparison: 96570b3
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.
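The comment does not say exactly how the soak harness computes its skewness and kurtosis columns. As a hedged illustration only, the Rust sketch below uses the common population-moment definitions, with kurtosis reported as *excess* kurtosis (normal distribution = 0). That assumption fits the many negative kurtosis values in the tables but is not confirmed by the source. `skew_kurtosis` is a hypothetical name.

```rust
/// Returns (skewness, excess kurtosis) using population moments.
/// Hypothetical helper; assumes at least two distinct values
/// (zero variance would divide by zero).
fn skew_kurtosis(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // p-th central moment: average of (x - mean)^p.
    let moment = |p: i32| xs.iter().map(|x| (x - mean).powi(p)).sum::<f64>() / n;
    let (m2, m3, m4) = (moment(2), moment(3), moment(4));
    let skewness = m3 / m2.powf(1.5);
    let excess_kurtosis = m4 / (m2 * m2) - 3.0;
    (skewness, excess_kurtosis)
}

fn main() {
    // Evenly spread, symmetric samples.
    let (s, k) = skew_kurtosis(&[1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("skewness = {:.2}, excess kurtosis = {:.2}", s, k);
    // prints skewness = 0.00, excess kurtosis = -1.30
}
```

Flat, evenly spread samples produce negative excess kurtosis, while a few outlying throughput samples push both statistics up, which is why large positive values are read as inconsistent behavior.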

`datadog_agent_remap_blackhole`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 10.45Mi | 10.48Mi | 10.48Mi | 10.49Mi | -0.25 | -0.27 |
| comparison | 10.45Mi | 10.48Mi | 10.48Mi | 10.48Mi | -0.07 | -0.94 |

`datadog_agent_remap_datadog_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 19.51Mi | 19.57Mi | 19.58Mi | 19.58Mi | -0.41 | -0.63 |
| comparison | 18.12Mi | 19.22Mi | 19.24Mi | 19.24Mi | 0.03 | -1.71 |

`fluent_elasticsearch`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 52.19Mi | 52.64Mi | 52.67Mi | 52.68Mi | 0.28 | -0.76 |
| comparison | 54.02Mi | 54.26Mi | 54.30Mi | 54.31Mi | -0.53 | -1.08 |

`fluent_remap_aws_firehose`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 39.76Mi | 39.98Mi | 40.01Mi | 40.02Mi | 0.08 | -1.48 |
| comparison | 39.30Mi | 39.47Mi | 39.50Mi | 39.51Mi | -0.14 | -0.97 |

`splunk_hec_route_s3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.44Mi | 5.69Mi | 5.81Mi | 5.81Mi | 0.38 | -0.02 |
| comparison | 5.49Mi | 5.67Mi | 5.70Mi | 5.72Mi | -0.04 | -0.52 |

`splunk_transforms_splunk3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 2.40Mi | 2.58Mi | 2.60Mi | 2.60Mi | 0.13 | -1.08 |
| comparison | 2.45Mi | 2.57Mi | 2.66Mi | 2.68Mi | 1.00 | 3.30 |

`syslog_humio_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 6.96Mi | 7.07Mi | 7.08Mi | 7.08Mi | -0.02 | -1.70 |
| comparison | 7.31Mi | 7.44Mi | 7.44Mi | 7.44Mi | 0.12 | -1.51 |

`syslog_log2metric_humio_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.01Mi | 5.05Mi | 5.05Mi | 5.05Mi | -0.05 | -1.44 |
| comparison | 4.95Mi | 4.98Mi | 4.98Mi | 4.98Mi | -0.80 | -0.13 |

`syslog_log2metric_splunk_hec_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.16Mi | 5.18Mi | 5.18Mi | 5.18Mi | -0.04 | -1.04 |
| comparison | 5.04Mi | 5.08Mi | 5.08Mi | 5.08Mi | -0.15 | -1.24 |

`syslog_loki`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 4.05Mi | 4.27Mi | 4.31Mi | 4.33Mi | -0.35 | -0.43 |
| comparison | 4.14Mi | 4.38Mi | 4.39Mi | 4.40Mi | -0.13 | -1.04 |

`syslog_regex_logs2metric_ddmetrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 3.79Mi | 3.81Mi | 3.81Mi | 3.81Mi | -0.27 | -0.58 |
| comparison | 3.76Mi | 3.78Mi | 3.79Mi | 3.79Mi | 0.12 | -0.93 |

`syslog_splunk_hec_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 7.06Mi | 7.07Mi | 7.07Mi | 7.07Mi | -0.33 | -0.56 |
| comparison | 7.16Mi | 7.19Mi | 7.20Mi | 7.20Mi | -0.42 | -1.08 |

jszwedko · 2021-11-24T04:28:11Z

Thanks for looking into this and fixing it so quickly. We recently had the idea to start listing "known issues" on releases. Would you want to add this one to the 0.18.0 known issues? That will help people who run into it decide whether to hold off on upgrading.

I marked this to be released in 0.18.1 which isn't scheduled yet, but will probably be next week.

github-actions · 2021-11-24T16:49:56Z

Soak Test Results

Baseline: 808c1e3
Comparison: d0705f5
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.

`datadog_agent_remap_blackhole`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 10.43Mi | 10.47Mi | 10.47Mi | 10.47Mi | -0.79 | 1.05 |
| comparison | 10.37Mi | 10.39Mi | 10.40Mi | 10.40Mi | -0.07 | -0.32 |

`datadog_agent_remap_datadog_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 18.51Mi | 18.57Mi | 18.58Mi | 18.58Mi | 0.22 | -0.75 |
| comparison | 18.34Mi | 18.39Mi | 18.40Mi | 18.40Mi | -0.14 | -0.82 |

`fluent_elasticsearch`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 55.18Mi | 55.47Mi | 55.53Mi | 55.53Mi | -0.32 | -0.96 |
| comparison | 53.97Mi | 54.28Mi | 54.32Mi | 54.33Mi | -0.42 | -0.78 |

`fluent_remap_aws_firehose`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 40.06Mi | 40.22Mi | 40.26Mi | 40.27Mi | 0.02 | 0.59 |
| comparison | 39.09Mi | 39.32Mi | 39.35Mi | 39.36Mi | 0.74 | -0.49 |

`http_pipelines_blackhole`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 0.00 | 556.00 | 1.59Ki | 1.59Ki | 2.01 | 3.19 |
| comparison | 0.00 | 1.05Ki | 1.05Ki | 1.05Ki | 0.23 | -1.95 |

`splunk_hec_route_s3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.31Mi | 5.50Mi | 5.60Mi | 5.60Mi | 0.51 | -0.12 |
| comparison | 5.41Mi | 5.59Mi | 5.62Mi | 5.62Mi | -0.33 | -0.56 |

`splunk_transforms_splunk3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 2.36Mi | 2.59Mi | 2.63Mi | 2.64Mi | 0.22 | -0.64 |
| comparison | 2.40Mi | 2.57Mi | 2.59Mi | 2.59Mi | 0.07 | -0.90 |

`syslog_humio_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 7.32Mi | 7.36Mi | 7.36Mi | 7.36Mi | -1.34 | 1.46 |
| comparison | 7.14Mi | 7.16Mi | 7.17Mi | 7.17Mi | 0.20 | -1.10 |

`syslog_log2metric_humio_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 4.90Mi | 4.91Mi | 4.91Mi | 4.92Mi | 0.16 | -0.65 |
| comparison | 4.96Mi | 4.98Mi | 4.99Mi | 4.99Mi | 0.38 | 0.19 |

`syslog_log2metric_splunk_hec_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.16Mi | 5.19Mi | 5.19Mi | 5.19Mi | -0.05 | -1.45 |
| comparison | 5.15Mi | 5.17Mi | 5.18Mi | 5.18Mi | 0.61 | -0.81 |

`syslog_loki`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 3.87Mi | 4.01Mi | 4.03Mi | 4.04Mi | -0.16 | -1.31 |
| comparison | 4.09Mi | 4.25Mi | 4.32Mi | 4.32Mi | 0.92 | -0.19 |

`syslog_regex_logs2metric_ddmetrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 3.74Mi | 3.75Mi | 3.76Mi | 3.76Mi | 0.45 | -0.51 |
| comparison | 3.73Mi | 3.75Mi | 3.75Mi | 3.76Mi | 0.30 | -0.60 |

`syslog_splunk_hec_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 7.11Mi | 7.18Mi | 7.19Mi | 7.19Mi | -0.13 | -1.23 |
| comparison | 7.14Mi | 7.16Mi | 7.16Mi | 7.16Mi | 0.12 | -0.66 |


github-actions · 2021-11-24T18:33:54Z

Soak Test Results

Baseline: 808c1e3
Comparison: 3069e1b
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.

`datadog_agent_remap_blackhole`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 10.43Mi | 10.46Mi | 10.46Mi | 10.46Mi | 0.11 | -0.63 |
| comparison | 10.45Mi | 10.48Mi | 10.49Mi | 10.49Mi | -0.10 | 0.08 |

`datadog_agent_remap_datadog_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 18.75Mi | 18.81Mi | 18.82Mi | 18.82Mi | 0.16 | -1.19 |
| comparison | 19.01Mi | 19.16Mi | 19.18Mi | 19.18Mi | -0.32 | -1.13 |

`fluent_elasticsearch`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 53.44Mi | 53.68Mi | 53.71Mi | 53.73Mi | -0.35 | -0.12 |
| comparison | 53.09Mi | 53.34Mi | 53.37Mi | 53.37Mi | 0.16 | -0.93 |

`fluent_remap_aws_firehose`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 38.84Mi | 39.17Mi | 39.20Mi | 39.20Mi | -0.46 | -1.15 |
| comparison | 39.22Mi | 39.36Mi | 39.39Mi | 39.39Mi | 0.23 | -0.99 |

`http_pipelines_blackhole`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 0.00 | 1.63Ki | 1.63Ki | 1.63Ki | 1.66 | 0.91 |
| comparison | 0.00 | 3.15Ki | 3.15Ki | 3.20Ki | 0.09 | -1.62 |

`splunk_hec_route_s3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.33Mi | 5.54Mi | 5.62Mi | 5.62Mi | 0.44 | -0.05 |
| comparison | 5.32Mi | 5.54Mi | 5.58Mi | 5.59Mi | 0.22 | -0.85 |

`splunk_transforms_splunk3`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 2.45Mi | 2.74Mi | 2.78Mi | 2.80Mi | 0.09 | -1.33 |
| comparison | 2.47Mi | 2.57Mi | 2.59Mi | 2.60Mi | 0.07 | -0.78 |

`syslog_humio_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 7.14Mi | 7.19Mi | 7.19Mi | 7.19Mi | -0.17 | -0.90 |
| comparison | 7.01Mi | 7.04Mi | 7.05Mi | 7.05Mi | 0.13 | -1.05 |

`syslog_log2metric_humio_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 4.91Mi | 4.93Mi | 4.93Mi | 4.93Mi | 0.07 | -0.71 |
| comparison | 5.04Mi | 5.06Mi | 5.06Mi | 5.06Mi | -0.94 | 0.61 |

`syslog_log2metric_splunk_hec_metrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 5.11Mi | 5.14Mi | 5.15Mi | 5.15Mi | -0.03 | -1.19 |
| comparison | 5.19Mi | 5.21Mi | 5.21Mi | 5.21Mi | -0.18 | -0.99 |

`syslog_loki`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 3.83Mi | 4.61Mi | 4.65Mi | 4.66Mi | 0.05 | -1.57 |
| comparison | 4.00Mi | 4.25Mi | 4.27Mi | 4.28Mi | 0.10 | -1.24 |

`syslog_regex_logs2metric_ddmetrics`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 3.77Mi | 3.79Mi | 3.79Mi | 3.79Mi | -1.16 | 0.88 |
| comparison | 3.80Mi | 3.82Mi | 3.82Mi | 3.82Mi | -0.15 | -1.24 |

`syslog_splunk_hec_logs`

| EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | 7.02Mi | 7.10Mi | 7.10Mi | 7.10Mi | 0.44 | -1.36 |
| comparison | 7.36Mi | 7.37Mi | 7.37Mi | 7.37Mi | -0.26 | -0.60 |


github-actions bot added the domain: core and domain: sinks labels Nov 24, 2021

blt approved these changes Nov 24, 2021


jszwedko added this to the Vector 0.18.1 milestone Nov 24, 2021

github-actions bot added the domain: external docs label Nov 24, 2021

tobz enabled auto-merge (squash) November 24, 2021 16:02

tobz added 4 commits November 24, 2021 12:53

fix(prometheus_exporter sink): agg histograms dont encode buckets correctly (a80ecf5)

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

switch back to non-cumulative internal metrics histo handle + fix ddsketch interpolation (1505b3c)

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

update known issues doc for 0.18.0 release (9a2ef56)

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

smol test fix (3069e1b)

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

tobz force-pushed the tobz/fix-prometheus-sink-agg-histograms branch from d0705f5 to 3069e1b Compare November 24, 2021 17:53

github-actions bot added the domain: sources label Nov 24, 2021

tobz merged commit 0fa7893 into master Nov 24, 2021

tobz deleted the tobz/fix-prometheus-sink-agg-histograms branch November 24, 2021 19:09
