enhancement(throttle transform): Allow throttling by bytes #14280
base: master
Conversation
Soak Test Results (Baseline: 2586b52)

Explanation: A soak test is an integrated performance test for vector in a repeatable rig, with varying configurations for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed, and to what degree, by a pull request. Where appropriate, units are scaled per core. The table below, if present, lists those experiments that have experienced a statistically significant change in throughput performance between the baseline and comparison SHAs with 90.0% confidence, or that have been detected as newly erratic. Negative values mean the baseline is faster; positive values mean the comparison is. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table is omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%. Fine details of change detection per experiment.

Soak Test Results (Baseline: c745d25): No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
In addition to my comments on the unit/mode enums below, this needs to update the documentation in `website/cue/reference/components/transforms/throttle.cue`, along with a big warning about the increased CPU requirements the extra encoding step will cause.
src/transforms/throttle.rs (outdated):

```rust
#[serde(default = "default_unit")]
unit: ThrottleUnit,
```
I think `mode` would describe this setting better, as it is changing between different modes of operation, with the "bytes" mode turning on encoding.
This is how I originally named it, but I decided against it because I can see "mode" becoming overloaded. There is another ticket about throttling having different modes for when capacities are hit (dropping data vs. applying backpressure). I moved to "unit" because it was a bit more precise in terms of what `threshold` ends up being a measure of.

I'm totally okay with making the change to `mode` if you still prefer that, but I wanted to offer the counterargument.
src/transforms/throttle.rs (outdated):

```rust
/// The throttling unit to use.
#[serde(default = "default_unit")]
unit: ThrottleUnit,

/// The encoding to use if throttling by bytes
encoding: Option<EncodingConfig>,
```
I think this could be done with an enum using serde's internally tagged mode, which would make it impossible to configure the "bytes" mode without an encoding:

```rust
enum ThrottleMode {
    Events,
    Bytes {
        encoding: EncodingConfig,
    },
}
```
From what I understand, this is possible, with a caveat. If we want these fields at the top level, then I believe we need to use serde's `flatten` attribute. However, this cannot be used in conjunction with `deny_unknown_fields`. Removing `deny_unknown_fields` seems to spoil the certainty that this change is supposed to introduce. It is very possible I am missing something, though.

Alternatively, we can avoid flattening. Nothing wrong with this, but the naming gets a little tricky because we may end up with configuration that looks like this. I can try to experiment with options, but any further suggestions are appreciated. I am learning as I go.

```yaml
mode:
  mode: events
```
Yes, serde enum encoding is tricky and, yes, dropping `deny_unknown_fields` is undesirable. I don't have a good suggestion here other than a custom deserializer, so probably leaving the encoding as a separate parameter is the most reasonable path forward.

Edit: Since you have already made the change and we have other components with `deny_unknown_fields`, I'd say leave it here and we'll see what others think.
I happened to come across a similar scenario with the syslog source. It has a `mode` field which is an enum with a lot of configuration options. It uses flattening and internal tagging while accepting unknown fields, just as this PR's current state does. I think this is a sign that this is the preferred approach.

https://github.com/vectordotdev/vector/blob/master/src/sources/syslog.rs#L36-L37
Soak Test Results (Baseline: 2c58589): No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
I believe this PR is now ready for a more proper review. I have updated the cue docs and swapped out the naming from
Hi @jutley,

We discussed this contribution during our team check-in. We are excited for this transform to receive this feature. One concern we have with this PR is that the requirement to serialize the data in order to count the bytes can introduce a significant and unexpected overhead to this transform. While our concern is based on assumptions, those assumptions are based on our experience with, and past benchmarking of, event serialization in the hot path. One potential solution we could see is using work that is currently in flight (and should land this week) to estimate the JSON-encoded size of an event with minimal overhead. There are two downsides to this:

We were wondering if either of these two downsides would be an issue for your use case? If not, we would probably prefer that direction for this PR, while keeping the possibility open (thus baking it into the config API) to add exact byte-size counting in the future, if enough users need it and accept the trade-off of reduced throughput. Additionally, we could introduce a new soak test in this PR that takes an existing test and adds the throttle transform in the middle, using byte-size throttling, to measure the impact on overall throughput.
@JeanMertz These ideas sound totally reasonable. Using a true encoder has always felt simultaneously elegant and incredibly clumsy to me. I'll wait for this encoding estimation work to land and then continue from there. I think the direction you are suggesting will work.

That said, there is a bit of flexibility that the encoding provides that I was planning on exploring. In particular, I think it will be useful to be able to choose the fields to use in the encoding, or to choose the

If this feature doesn't offer much configuration, I may look to implement different methods for getting a byte count for an event. If nothing else, I think it'll be useful (at least for me) to support the full JSON and just the message.
@JeanMertz I managed to find a little time to get back to this PR. I rewrote it to use the estimated JSON size, and am also supporting the bytes of the
@JeanMertz Is there anything I can do to help move this along?
Regression Test Results (Run ID: 9c2dc1af-ed9f-4eff-b0cf-ef77391eb92f)

Explanation: A regression test is an integrated performance test for vector in a repeatable rig. The table below, if present, lists those experiments that have experienced a statistically significant change in throughput performance between the baseline and comparison SHAs.

No interesting changes in throughput were observed. Fine details of change detection per experiment.
Hi @jutley!

Thanks for this contribution! Picking this up since @JeanMertz is out on leave.

I think this is a great feature, but I think we could model this a bit better by allowing specification of all three limits at once rather than a mode switch. This would let users, for example, throttle by both the number of events and their JSON size, which otherwise wouldn't be possible. I also think it makes the `threshold` configuration field a little less confusing to understand.
My suggested configuration UX would be something like:

```toml
[transforms.foo]
type = "throttle"
key_field = "service"
window_secs = 60
threshold.events = 1000           # default to unlimited
threshold.json_bytes = 100000     # default to unlimited
threshold.message_bytes = 1000000 # default to unlimited
```

Where the first threshold to be hit would start throttling the incoming messages. This would mean maintaining three quotas rather than just one.

What do you think of this UX? I realize you are new to Rust, so adding this functionality may be difficult, but we can help guide you.
@jszwedko That's a great idea. Seems like an obvious choice now that it's been said! I think I can figure out the implementation by referencing what already exists. I'll reach out if I need help!
@jszwedko I started working on this (slowly), and I am running into a limitation with the RateLimiters. If one RateLimiter allows a message but the next RateLimiter does not, then the message should be dropped. However, in making that check, the message is still counted against the first RateLimiter. Looking at the RateLimiter docs, I'm not seeing any kind of dry-run functionality. Any suggestions?
Aha, yes, you appear to be right. There isn't a way to check the quota without consuming it, or to replace tokens outside of the refresh interval 🤔 Given that, I'm not seeing an easy way to maintain multiple quotas for the same input stream. I opened boinkor-net/governor#167 on the upstream crate to see if there is something we are missing. Absent that support, I think the best we can do is something like the current implementation, where you can only choose one of the characteristics to throttle on.

However! I was also thinking of a potentially more flexible interface for this, which would be to allow providing a VRL expression to express how many tokens are used by a given incoming event. The interface would be something like:

```toml
[transforms.foo]
type = "throttle"
key_field = "service"
window_secs = 60
threshold = 1000
tokens = "1" # the default, take one token for each event
# or
tokens = "len(.message)" # take N tokens based on the length of the message
# or
tokens = "len(encode_json(.))" # take N tokens based on the JSON-encoded length of the event
# or
tokens = """
if .status == "500" {
  0
} else {
  1
}
""" # for 500s take a different number of tokens
```

The JSON one would be rather expensive since it involves encoding, and I know that was the motivation for using the estimated JSON encoding in this PR. I think we could resolve that by adding a new function to VRL. This also again has the advantage of having
That approach makes sense. I suspect that if someone wants the "multi-mode" approach, they can chain throttle transforms together. That should allow us to guarantee something like "no more than x events and y bytes per second per key". There are probably some subtle mathematical differences between the chained approach and the multi-mode approach, but I suspect they won't be particularly important for most real-world scenarios. We're dropping data intentionally either way.

It'll take me a little longer to get around to exploring this approach. I think the
Yeah, agreed, we can split that bit off into its own PR since it'll involve an addition to VRL.
Looks like there is a governor feature request that I missed which discusses our exact issue: boinkor-net/governor#156
Just leaving a note to say that I took a look at the recommended approach, and while I do believe it should be possible, I am just not proficient enough in Rust to handle it in a timely manner. Small things take me a long time, and I just don't have the time for this. If anyone else is interested in picking this up, feel free!
Implements #11854.
This is my first time programming in Rust, so some things may be strange. Feedback is greatly appreciated.

I started looking into updating the .cue files for documentation. I saw there are reusable components here, such as the `encoding` configuration I would like to use. However, this is currently only scoped to sinks; using it here in a transform is totally new. I'm not sure whether this is a sign that I am doing something in a very strange way, or if I should just update the necessary pieces to support the `encoding` feature within transforms. If someone can help me answer this, it would be a big help.