fix: actually batch TCP source decoder outputs #10506

Merged
merged 6 commits into from
Dec 21, 2021

Member

@lukesteensen lukesteensen commented Dec 17, 2021

This came out of some strange soak test behavior in #10432. Essentially, we had reduced the change down to what should have been insignificant performance-wise (roughly, one additional allocation per batch via ready_chunks vs enqueued), but were still seeing a ~10% penalty on certain soaks.

The theory was that for those tests to be so hypersensitive to the performance of send_all, they must be heavily bottlenecked by it. Since send_all is not supposed to be a bottleneck, I looked into why that might be. The sources in all of those tests were based on the TcpSource trait and used our new codec work to implement their respective protocols.

The interesting bit ended up being roughly here where we treat the output of the decoder stream as a "batch", setting up acks, annotating events, emitting metrics, and sending upstream for each "batch". The problem is that these are often (always in the case of syslog) not actually batches at all but single events wrapped in a SmallVec, which means we're doing all of the work mentioned previously for every single event as it comes through. This is significantly less efficient than amortizing those costs over an actual good-sized batch of events.
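
To make the cost model concrete, here is a minimal sketch that only counts how often the per-"batch" work runs. `BatchStats`, `per_frame`, and `accumulated` are hypothetical names, `Vec<u32>` stands in for a `SmallVec` of events, and the synchronous `chunks` call is a simplification of accumulating whatever is ready under load. None of this is Vector's actual code.

```rust
/// Hypothetical per-"batch" bookkeeping. In the real source this is where acks
/// are set up, events annotated, metrics emitted, and the batch sent upstream;
/// here we only count how many times that fixed-cost work runs.
#[derive(Default)]
struct BatchStats {
    batches: usize,
    events: usize,
}

impl BatchStats {
    fn handle_batch(&mut self, batch: Vec<u32>) {
        // Fixed cost paid once per call, regardless of how many events it carries.
        self.batches += 1;
        self.events += batch.len();
    }
}

/// Treat every decoder output as a batch: with one event per frame,
/// the fixed cost is paid once per event.
fn per_frame(frames: &[Vec<u32>]) -> BatchStats {
    let mut stats = BatchStats::default();
    for frame in frames {
        stats.handle_batch(frame.clone());
    }
    stats
}

/// Accumulate up to `capacity` frames (must be > 0) before doing the
/// per-batch work once for the whole group.
fn accumulated(frames: &[Vec<u32>], capacity: usize) -> BatchStats {
    let mut stats = BatchStats::default();
    for chunk in frames.chunks(capacity) {
        let batch: Vec<u32> = chunk.iter().flatten().copied().collect();
        stats.handle_batch(batch);
    }
    stats
}
```

With 10,000 single-event frames, `per_frame` runs the fixed per-batch work 10,000 times, while `accumulated` with a capacity of 1024 runs it 10 times for the same 10,000 events.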

The solution ended up being quite a bit more complicated than I'd have liked, primarily due to the way that TcpSource has grown into a bit of a nightmare-ish piece of complexity with all of our small additions over the years. The biggest hurdle was the way that it ties acknowledgements to the generic frame type, such that we can't accumulate events across frames without keeping those frames accessible to later build acks. This was addressed by tweaking the acking trait to build acks from groups of frames instead of individual frames, allowing us to accumulate frames themselves in a new stream combinator (ReadyFrames) before passing them into the existing logic. This leaves the basic structure intact, but ensures that we're actually trying to group up a significant number of events per batch when we're under load.
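
The key change is that acks are derived from a group of frames rather than from each frame individually. A minimal sketch of that shape follows, assuming a hypothetical `Frame` type that carries a sequence number and a hypothetical `GroupAcker` trait (not Vector's actual trait), with an example policy of acknowledging the highest sequence number in the group.

```rust
/// Hypothetical frame: the metadata needed to ack it plus its decoded events.
struct Frame {
    sequence_number: u32,
    events: Vec<String>,
}

/// Builds one ack from a *group* of frames instead of a single frame.
trait GroupAcker {
    fn build_ack(&self, frames: &[Frame]) -> Option<Vec<u8>>;
}

/// Example policy: ack the highest sequence number seen in the group,
/// big-endian encoded. Returns `None` for an empty group.
struct HighestSequenceAcker;

impl GroupAcker for HighestSequenceAcker {
    fn build_ack(&self, frames: &[Frame]) -> Option<Vec<u8>> {
        frames
            .iter()
            .map(|f| f.sequence_number)
            .max()
            .map(|seq| seq.to_be_bytes().to_vec())
    }
}

/// Build the ack while the frames are still available, then flatten
/// their events into a single batch.
fn process_group(acker: &impl GroupAcker, frames: Vec<Frame>) -> (Vec<String>, Option<Vec<u8>>) {
    let ack = acker.build_ack(&frames);
    let events = frames.into_iter().flat_map(|f| f.events).collect();
    (events, ack)
}
```

Keeping the frames until the ack is built is what allows events from many frames to travel together without losing the information needed for acknowledgement.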

I do want to emphasize that this is not a design flaw with our codec system, and the SmallVec pattern still seems to me like a good one. It's also very likely that we had essentially equivalent behavior prior to the introduction of codecs, so I don't believe they introduced a performance regression. The problem is really that the way we integrated codecs into sources like these gives the impression that batching is happening when it's not. This wouldn't have been a terribly surprising finding if we'd looked at the code and seen that we were obviously sending single events at a time, but at a glance the code here did look like it was doing the right thing with respect to batching.

It's likely that there's more relatively low-hanging performance fruit in some of the sources that have been around longest, and we'll just need to keep this pattern in mind as we look for it, since it's not as obvious as it would be otherwise. I'm thinking about patterns and tools for detecting situations like this automatically, but a lot of it would be significantly improved by simply revisiting some of these sources and straightening out some of the old-style code that's been added to incrementally over quite a long period.

@netlify

netlify bot commented Dec 17, 2021

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: 14c246d

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/61c0ffe9aadc8000086f343d

@github-actions github-actions bot added domain: codecs Anything related to Vector's codecs (encoding/decoding) domain: sources Anything related to the Vector's sources labels Dec 17, 2021
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@lukesteensen lukesteensen force-pushed the tcp-source-batching-issue branch from 8d9d9ea to f8b2054 Compare December 17, 2021 15:28
@github-actions

Soak Test Results

Baseline: bd8f097
Comparison: f8b2054
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configuration for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine quickly whether, and to what degree, a pull request changes Vector's performance. Test units below are bytes/second/CPU, except for "skewness". The further "skewness" is from 0.0, the more it indicates that Vector lacks consistency in behavior, making predictions of fitness in the field challenging.

The abbreviated table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. The abbreviated table will be omitted if no statistically interesting changes are observed.

| experiment | Δ mean | Δ mean % | confidence |
|---|---|---|---|
| fluent_elasticsearch | 7.91MiB | 13.82 | 100.00% |
| syslog_humio_logs | 2.01MiB | 27.62 | 100.00% |
| syslog_log2metric_splunk_hec_metrics | 1.99MiB | 29.11 | 100.00% |
| syslog_splunk_hec_logs | 1.73MiB | 24.03 | 100.00% |
| syslog_regex_logs2metric_ddmetrics | 1.52MiB | 24.87 | 100.00% |
| syslog_log2metric_humio_metrics | 922.07KiB | 12.46 | 100.00% |

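
As a worked check of the filtering rule in the explanation, the sketch below assumes "Δ mean %" is the relative change of the comparison mean against the baseline mean. The bot does not state the formula, but the rows here are consistent with it. Only the ±8.87% threshold is modeled, not the 90% confidence test. For example, fluent_remap_aws_firehose changes by about +8.09% in the fine details, which is under the threshold and so absent from the abbreviated table.

```rust
/// Relative change of the comparison mean against the baseline mean, in percent.
/// Assumption: this is how the bot derives "Δ mean %".
fn delta_mean_pct(baseline_mean: f64, comparison_mean: f64) -> f64 {
    (comparison_mean - baseline_mean) / baseline_mean * 100.0
}

/// Keep only changes whose magnitude exceeds the threshold (8.87% in these runs).
fn exceeds_threshold(delta_pct: f64, threshold_pct: f64) -> bool {
    delta_pct.abs() > threshold_pct
}
```

Using the rounded means for fluent_elasticsearch (57.24MiB baseline, 65.16MiB comparison) gives about 13.8%, matching the 13.82 reported above.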
Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | baseline mean | baseline stdev | baseline outlier percentage | comparison mean | comparison stdev | comparison outlier percentage | t-statistic | p-value | erratic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fluent_elasticsearch | 7.91MiB | 13.82 | 57.24MiB | 1.8MiB | 0 | 65.16MiB | 481.51KiB | 0.276243 | -78.7868 | 4.7414e-239 | False |
| fluent_remap_aws_firehose | 4.89MiB | 8.09 | 60.51MiB | 1.0MiB | 0 | 65.41MiB | 1.01MiB | 0 | -62.0404 | 2.02699e-269 | False |
| syslog_humio_logs | 2.01MiB | 27.62 | 7.29MiB | 244.51KiB | 0 | 9.31MiB | 272.28KiB | 0 | -103.997 | 0 | False |
| syslog_log2metric_splunk_hec_metrics | 1.99MiB | 29.11 | 6.84MiB | 96.62KiB | 0 | 8.83MiB | 68.32KiB | 0 | -322.121 | 0 | False |
| syslog_splunk_hec_logs | 1.73MiB | 24.03 | 7.22MiB | 142.43KiB | 0 | 8.95MiB | 102.12KiB | 0 | -192.994 | 0 | False |
| syslog_regex_logs2metric_ddmetrics | 1.52MiB | 24.87 | 6.13MiB | 187.3KiB | 0 | 7.65MiB | 546.95KiB | 0 | -51.4222 | 1.42829e-189 | False |
| syslog_log2metric_humio_metrics | 922.07KiB | 12.46 | 7.23MiB | 90.53KiB | 0 | 8.13MiB | 71.28KiB | 1.86916 | -148.788 | 0 | False |
| datadog_agent_remap_datadog_logs_acks | 777.42KiB | 2.07 | 36.64MiB | 1.05MiB | 0 | 37.4MiB | 394.71KiB | 0.314465 | -12.4884 | 7.87639e-31 | False |
| http_to_http_acks | 273.33KiB | 5.07 | 5.26MiB | 2.54MiB | 0 | 5.53MiB | 2.35MiB | 0 | -1.46546 | 0.143234 | True |
| syslog_loki | 225.75KiB | 3.21 | 6.87MiB | 75.79KiB | 5.49451 | 7.09MiB | 258.63KiB | 0 | -15.0342 | 4.06327e-40 | True |
| http_to_http_noack | 189.76KiB | 0.89 | 20.85MiB | 1.54MiB | 1.38504 | 21.04MiB | 1.7MiB | 0 | -1.47862 | 0.139733 | False |
| splunk_transforms_splunk3 | 94.5KiB | 2.36 | 3.91MiB | 1.44MiB | 0.275482 | 4.0MiB | 1.45MiB | 0.554017 | -0.859569 | 0.390312 | False |
| http_pipelines_blackhole_acks | 84.22KiB | 10.95 | 769.35KiB | 699.6KiB | 1.63488 | 853.57KiB | 747.94KiB | 2.05882 | -1.54313 | 0.123258 | False |
| http_pipelines_no_grok_blackhole | 63.57KiB | 3.35 | 1.85MiB | 956.16KiB | 0 | 1.92MiB | 1.1MiB | 0.58309 | -0.807043 | 0.419927 | False |
| splunk_hec_route_s3 | 24.5KiB | 0.31 | 7.74MiB | 2.08MiB | 1.65746 | 7.76MiB | 2.57MiB | 1.64835 | -0.137906 | 0.890354 | False |
| http_pipelines_blackhole | 14.72KiB | 1.8 | 815.85KiB | 623.41KiB | 1.10497 | 830.58KiB | 659.31KiB | 0.940439 | -0.298306 | 0.765564 | False |
| http_datadog_filter_blackhole | -35.28KiB | -2.62 | 1.32MiB | 818.5KiB | 0.31348 | 1.28MiB | 892.82KiB | 1.89873 | 0.518833 | 0.60406 | False |
| splunk_hec_indexer_ack_blackhole | -90.83KiB | -0.4 | 22.08MiB | 1.27MiB | 0 | 21.99MiB | 1.41MiB | 0.277778 | 0.886741 | 0.375519 | False |
| datadog_agent_remap_datadog_logs | -254.34KiB | -0.71 | 34.99MiB | 1.12MiB | 0 | 34.74MiB | 1.07MiB | 0 | 3.05223 | 0.00235466 | False |
| splunk_hec_to_splunk_hec_logs_acks | -443.31KiB | -2.46 | 17.63MiB | 1.44MiB | 0.554017 | 17.2MiB | 1.17MiB | 0.552486 | 4.44091 | 1.04255e-05 | False |
| datadog_agent_remap_blackhole | -612.54KiB | -2.16 | 27.65MiB | 378.88KiB | 0.527704 | 27.05MiB | 381.28KiB | 0.78534 | 22.2291 | 1.05876e-84 | False |
| splunk_hec_to_splunk_hec_logs_noack | -731.35KiB | -4.06 | 17.57MiB | 1.16MiB | 0.58651 | 16.86MiB | 1.21MiB | 1.38122 | 7.98502 | 5.76164e-15 | False |
| datadog_agent_remap_blackhole_acks | -742.42KiB | -2.55 | 28.48MiB | 843.38KiB | 0 | 27.75MiB | 445.96KiB | 1.78042 | 14.9925 | 2.55018e-43 | False |

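
The t-statistic column appears consistent with Welch's unequal-variance t-test, computed as (baseline mean minus comparison mean) divided by the combined standard error, using the total sample counts from the per-run details. This is an inference from the numbers, not something the bot documents. For fluent_elasticsearch (341 baseline and 362 comparison samples), the sketch gives roughly -78.8, close to the reported -78.7868; the small gap comes from the rounded inputs.

```rust
/// Summary statistics for one variant of one experiment (units: MiB/s/CPU).
struct Summary {
    mean: f64,
    stdev: f64,
    samples: f64,
}

/// Welch's t-statistic. Negative values mean the comparison mean is higher,
/// matching the sign convention observed in the table.
fn welch_t(baseline: &Summary, comparison: &Summary) -> f64 {
    let standard_error = (baseline.stdev.powi(2) / baseline.samples
        + comparison.stdev.powi(2) / comparison.samples)
        .sqrt();
    (baseline.mean - comparison.mean) / standard_error
}
```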
Fine details of each soak run.
(experiment, variant) total samples mean std min average p90 p95 p99 max skewness
('fluent_remap_aws_firehose', 'comparison') 294 65.41MiB 1.01MiB 63.57MiB 65.51MiB 66.69MiB 66.91MiB 67.19MiB 67.52MiB -0.01438
('fluent_elasticsearch', 'comparison') 362 65.16MiB 481.51KiB 63.8MiB 65.16MiB 65.76MiB 65.86MiB 66.16MiB 66.37MiB -0.132971
('fluent_remap_aws_firehose', 'baseline') 363 60.51MiB 1.0MiB 58.3MiB 60.51MiB 61.85MiB 62.07MiB 62.61MiB 62.79MiB 0.028074
('fluent_elasticsearch', 'baseline') 341 57.24MiB 1.8MiB 54.0MiB 56.59MiB 59.75MiB 60.13MiB 60.56MiB 60.91MiB 0.216809
('datadog_agent_remap_datadog_logs_acks', 'comparison') 318 37.4MiB 394.71KiB 36.34MiB 37.41MiB 37.93MiB 38.05MiB 38.22MiB 38.26MiB -0.0360316
('datadog_agent_remap_datadog_logs_acks', 'baseline') 343 36.64MiB 1.05MiB 34.4MiB 36.4MiB 38.0MiB 38.18MiB 38.36MiB 38.62MiB 0.0448121
('datadog_agent_remap_datadog_logs', 'baseline') 364 34.99MiB 1.12MiB 32.69MiB 34.83MiB 36.42MiB 36.62MiB 37.01MiB 37.42MiB 0.0604584
('datadog_agent_remap_datadog_logs', 'comparison') 362 34.74MiB 1.07MiB 32.68MiB 34.58MiB 36.22MiB 36.59MiB 37.02MiB 37.39MiB 0.298029
('datadog_agent_remap_blackhole_acks', 'baseline') 382 28.48MiB 843.38KiB 26.84MiB 28.41MiB 29.54MiB 29.67MiB 29.98MiB 30.34MiB 0.0778349
('datadog_agent_remap_blackhole_acks', 'comparison') 337 27.75MiB 445.96KiB 26.42MiB 27.75MiB 28.29MiB 28.44MiB 28.76MiB 28.91MiB -0.208816
('datadog_agent_remap_blackhole', 'baseline') 379 27.65MiB 378.88KiB 26.75MiB 27.68MiB 28.12MiB 28.23MiB 28.51MiB 28.95MiB 0.0718451
('datadog_agent_remap_blackhole', 'comparison') 382 27.05MiB 381.28KiB 25.97MiB 27.05MiB 27.49MiB 27.64MiB 27.88MiB 28.39MiB -0.0669603
('splunk_hec_indexer_ack_blackhole', 'baseline') 360 22.08MiB 1.27MiB 19.04MiB 22.05MiB 23.6MiB 23.9MiB 24.74MiB 25.15MiB -0.0879681
('splunk_hec_indexer_ack_blackhole', 'comparison') 360 21.99MiB 1.41MiB 18.62MiB 21.93MiB 23.75MiB 24.44MiB 25.19MiB 26.49MiB 0.156346
('http_to_http_noack', 'comparison') 317 21.04MiB 1.7MiB 16.86MiB 20.97MiB 23.29MiB 23.99MiB 24.74MiB 25.55MiB 0.056651
('http_to_http_noack', 'baseline') 361 20.85MiB 1.54MiB 16.21MiB 20.88MiB 22.85MiB 23.41MiB 24.2MiB 24.87MiB -0.142867
('splunk_hec_to_splunk_hec_logs_acks', 'baseline') 361 17.63MiB 1.44MiB 13.23MiB 17.82MiB 19.37MiB 19.7MiB 20.39MiB 21.34MiB -0.415939
('splunk_hec_to_splunk_hec_logs_noack', 'baseline') 341 17.57MiB 1.16MiB 14.31MiB 17.64MiB 19.11MiB 19.4MiB 19.85MiB 20.83MiB -0.18024
('splunk_hec_to_splunk_hec_logs_acks', 'comparison') 362 17.2MiB 1.17MiB 12.63MiB 17.19MiB 18.66MiB 18.86MiB 19.73MiB 20.41MiB -0.29965
('splunk_hec_to_splunk_hec_logs_noack', 'comparison') 362 16.86MiB 1.21MiB 13.44MiB 16.85MiB 18.38MiB 18.86MiB 19.59MiB 20.43MiB -0.0823417
('syslog_humio_logs', 'comparison') 340 9.31MiB 272.28KiB 8.84MiB 9.47MiB 9.6MiB 9.63MiB 9.65MiB 9.72MiB -0.145463
('syslog_splunk_hec_logs', 'comparison') 364 8.95MiB 102.12KiB 8.65MiB 8.96MiB 9.08MiB 9.1MiB 9.13MiB 9.14MiB -0.243342
('syslog_log2metric_splunk_hec_metrics', 'comparison') 363 8.83MiB 68.32KiB 8.65MiB 8.83MiB 8.92MiB 8.94MiB 8.97MiB 9.01MiB 0.004002
('syslog_log2metric_humio_metrics', 'comparison') 321 8.13MiB 71.28KiB 7.81MiB 8.13MiB 8.22MiB 8.23MiB 8.28MiB 8.31MiB -0.588717
('splunk_hec_route_s3', 'comparison') 364 7.76MiB 2.57MiB 701.35KiB 7.54MiB 11.25MiB 12.26MiB 14.53MiB 15.61MiB 0.377828
('splunk_hec_route_s3', 'baseline') 362 7.74MiB 2.08MiB 2.74MiB 7.48MiB 10.28MiB 11.16MiB 13.69MiB 15.2MiB 0.576924
('syslog_regex_logs2metric_ddmetrics', 'comparison') 363 7.65MiB 546.95KiB 6.41MiB 7.87MiB 8.22MiB 8.25MiB 8.3MiB 8.33MiB -0.73841
('syslog_humio_logs', 'baseline') 341 7.29MiB 244.51KiB 6.9MiB 7.41MiB 7.56MiB 7.58MiB 7.63MiB 7.64MiB -0.117517
('syslog_log2metric_humio_metrics', 'baseline') 363 7.23MiB 90.53KiB 7.04MiB 7.23MiB 7.35MiB 7.36MiB 7.4MiB 7.41MiB 0.0217777
('syslog_splunk_hec_logs', 'baseline') 362 7.22MiB 142.43KiB 6.97MiB 7.21MiB 7.4MiB 7.42MiB 7.45MiB 7.46MiB 0.0258115
('syslog_loki', 'comparison') 319 7.09MiB 258.63KiB 6.58MiB 7.04MiB 7.48MiB 7.53MiB 7.59MiB 7.61MiB 0.352337
('syslog_loki', 'baseline') 364 6.87MiB 75.79KiB 6.58MiB 6.87MiB 6.95MiB 6.97MiB 7.0MiB 7.02MiB -1.20552
('syslog_log2metric_splunk_hec_metrics', 'baseline') 343 6.84MiB 96.62KiB 6.65MiB 6.84MiB 6.97MiB 6.98MiB 7.01MiB 7.06MiB 0.0633796
('syslog_regex_logs2metric_ddmetrics', 'baseline') 362 6.13MiB 187.3KiB 5.74MiB 6.09MiB 6.36MiB 6.39MiB 6.43MiB 6.48MiB -0.0512587
('http_to_http_acks', 'comparison') 360 5.53MiB 2.35MiB 0B 5.12MiB 8.97MiB 9.53MiB 10.43MiB 11.56MiB 0.122181
('http_to_http_acks', 'baseline') 362 5.26MiB 2.54MiB 1004.86KiB 4.78MiB 9.07MiB 9.83MiB 10.53MiB 10.96MiB 0.255567
('splunk_transforms_splunk3', 'comparison') 361 4.0MiB 1.45MiB 1.07MiB 3.93MiB 5.89MiB 6.64MiB 7.67MiB 8.16MiB 0.424385
('splunk_transforms_splunk3', 'baseline') 363 3.91MiB 1.44MiB 457.58KiB 3.78MiB 5.72MiB 6.59MiB 7.51MiB 7.96MiB 0.35125
('http_pipelines_no_grok_blackhole', 'comparison') 343 1.92MiB 1.1MiB 0B 1.82MiB 3.47MiB 3.79MiB 4.68MiB 5.79MiB 0.483692
('http_pipelines_no_grok_blackhole', 'baseline') 363 1.85MiB 956.16KiB 0B 1.78MiB 3.09MiB 3.49MiB 4.09MiB 4.43MiB 0.367945
('http_datadog_filter_blackhole', 'baseline') 319 1.32MiB 818.5KiB 0B 1.2MiB 2.38MiB 2.97MiB 3.32MiB 4.03MiB 0.610547
('http_datadog_filter_blackhole', 'comparison') 316 1.28MiB 892.82KiB 0B 1.11MiB 2.54MiB 2.93MiB 3.66MiB 3.97MiB 0.827861
('http_pipelines_blackhole_acks', 'comparison') 340 853.57KiB 747.94KiB 0B 737.88KiB 1.87MiB 2.15MiB 3.15MiB 3.4MiB 0.99065
('http_pipelines_blackhole', 'comparison') 319 830.58KiB 659.31KiB 0B 768.02KiB 1.68MiB 2.03MiB 2.49MiB 3.04MiB 0.702382
('http_pipelines_blackhole', 'baseline') 362 815.85KiB 623.41KiB 0B 692.96KiB 1.58MiB 2.0MiB 2.46MiB 3.04MiB 0.835605
('http_pipelines_blackhole_acks', 'baseline') 367 769.35KiB 699.6KiB 0B 553.97KiB 1.69MiB 2.13MiB 2.7MiB 3.11MiB 1.00811

Contributor

@blt blt left a comment

Overall very happy with this. I have some comments around constants mostly but this strikes me as very close to merging.

```
Poll::Ready(Some(Ok((frame, size)))) => {
    self.enqueued.push(frame);
    self.enqueued_size += size;
    if self.enqueued.len() >= 1024 {
```
Contributor

I'd like to see this constant pulled in as a struct member, maybe not configurable? Or maybe we allow the max enqueued size to be set by the user, feed that to Vec::with_capacity?

Member Author

I added it as a member and made it match the passed capacity. I'm now slightly meh about the name capacity since it's also a soft limit, but will probably leave it unless someone has a better idea.
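
For readers following the capacity discussion, here is a minimal, synchronous sketch of a ReadyFrames-style combinator with `capacity` as a struct member, used both for `Vec::with_capacity` and as the flush threshold. It assumes a simplified, hypothetical `PollSource` trait (no `Context` or `Waker`, unlike a real `futures::Stream`), items of `(frame, size)` as in the excerpt above, and no error handling; `Scripted` is a hypothetical test source. The actual implementation in the PR may differ.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Simplified stand-in for an async stream: ready with an item, ended, or pending.
trait PollSource {
    type Item;
    fn poll_next(&mut self) -> Poll<Option<Self::Item>>;
}

/// Groups frames that are already available, up to `capacity` frames per group.
struct ReadyFrames<S, F> {
    inner: S,
    capacity: usize,
    enqueued: Vec<F>,
    enqueued_size: usize,
}

impl<S, F> ReadyFrames<S, F>
where
    S: PollSource<Item = (F, usize)>,
{
    fn new(inner: S, capacity: usize) -> Self {
        Self {
            inner,
            capacity,
            // The same member sizes the buffer and bounds each group.
            enqueued: Vec::with_capacity(capacity),
            enqueued_size: 0,
        }
    }

    /// Yields `(frames, total_size)` groups. A real implementation would also
    /// avoid polling the inner stream again after it has ended.
    fn poll_next(&mut self) -> Poll<Option<(Vec<F>, usize)>> {
        loop {
            match self.inner.poll_next() {
                Poll::Ready(Some((frame, size))) => {
                    self.enqueued.push(frame);
                    self.enqueued_size += size;
                    if self.enqueued.len() >= self.capacity {
                        return Poll::Ready(Some(self.flush()));
                    }
                }
                // Nothing accumulated: pass the inner state through.
                Poll::Ready(None) if self.enqueued.is_empty() => return Poll::Ready(None),
                Poll::Pending if self.enqueued.is_empty() => return Poll::Pending,
                // Source ended or has nothing ready: emit what we have.
                Poll::Ready(None) | Poll::Pending => return Poll::Ready(Some(self.flush())),
            }
        }
    }

    fn flush(&mut self) -> (Vec<F>, usize) {
        let frames = std::mem::replace(&mut self.enqueued, Vec::with_capacity(self.capacity));
        (frames, std::mem::take(&mut self.enqueued_size))
    }
}

/// Replays a fixed sequence of poll results, then reports the end of the stream.
struct Scripted<T>(VecDeque<Poll<Option<T>>>);

impl<T> PollSource for Scripted<T> {
    type Item = T;
    fn poll_next(&mut self) -> Poll<Option<T>> {
        self.0.pop_front().unwrap_or(Poll::Ready(None))
    }
}
```

In this sketch `capacity` caps the number of frames per group, not events or bytes, so it is at most a soft limit on batch size. Under light load a group is flushed as soon as the source has nothing ready, so batching only grows when frames are actually queued up.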

```
    }
}

struct LogstashAcker {
    protocol: LogstashProtocolVersion,
    sequence_number: u32,
    // TODO: this is very likely overkill, since there are only two protocol versions and it seems
```
Contributor

Something you intend to resolve in this PR? If not, I'd be satisfied to see an issue created, referenced in this comment block.

Member Author

Good question. I don't think it's necessary to address, but I was hoping that someone with more knowledge of the logstash protocol would be able to offer an opinion (/cc @jszwedko, since you're on much of the blame). My goal with this version was to be as defensive as possible, but that doesn't lead to the most straightforward implementation and I'd prefer not to leave something that's misleading (i.e. implies that we'll ever actually have mixed protocol versions, if we won't).

Member

I agree it's unlikely, but it is possible. I'd be OK disallowing mixed protocols on a single connection if it would simplify things here, but we should have an explicit assertion that the protocol never changes.

I think you could probably simplify this to just be a tuple of version number and sequence number rather than storing the sequence numbers per protocol version. I think we can assume, even if multiple protocol versions are being used, that a single TCP stream represents one stream of events (rather than a stream per protocol version).
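
A minimal sketch of the suggested simplification, assuming a hypothetical `ProtocolVersion` enum and `SimpleAcker` type (not Vector's code): the acker keeps a single `(version, sequence number)` pair per connection and panics if a frame arrives with a different protocol version, which is one way to make the "never changes" assumption explicit.

```rust
/// Hypothetical protocol versions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ProtocolVersion {
    V1,
    V2,
}

/// Tracks one (protocol version, latest sequence number) pair per connection,
/// treating the TCP stream as a single stream of events.
#[derive(Default)]
struct SimpleAcker {
    state: Option<(ProtocolVersion, u32)>,
}

impl SimpleAcker {
    /// Record a frame, asserting the protocol version never changes mid-connection.
    fn observe(&mut self, version: ProtocolVersion, sequence_number: u32) {
        if let Some((current, _)) = self.state {
            assert_eq!(current, version, "protocol version changed within a single connection");
        }
        self.state = Some((version, sequence_number));
    }

    /// The (version, sequence number) to acknowledge, if any frame was seen.
    fn ack_target(&self) -> Option<(ProtocolVersion, u32)> {
        self.state
    }
}
```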

```
bytes.push(self.protocol.into());
bytes.push(LogstashFrameType::Ack.into());
bytes.extend(self.sequence_number.to_be_bytes().iter());
let mut bytes: Vec<u8> = Vec::with_capacity(6 * self.sequence_numbers.len());
```
Contributor

Oh hmm, why 6 in the first place?

Member Author

@lukesteensen lukesteensen Dec 17, 2021

I believe 6 bytes is for protocol (u8) + frame type (u8) + sequence number (u32), so a known fixed size of an ack frame in this protocol.

Contributor

Ah, gotcha. Thank you.

Contributor

A future reader would probably appreciate having a constant along the lines of

`const ACK_FRAME_SIZE: usize = 6; // protocol (u8) + frame type (u8) + sequence number (u32)`

or using `std::mem::size_of::<Protocol>() + std::mem::size_of::<FrameType>() + std::mem::size_of::<SequenceNumber>()` for that matter 😄
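
Following that suggestion, here is a small sketch that computes the ack frame size with `std::mem::size_of` over the wire types (`u8`, `u8`, `u32`) rather than the hypothetical `Protocol`/`FrameType`/`SequenceNumber` types, and encodes one 6-byte ack per sequence number in big-endian order. The frame type byte `b'A'` is an assumption based on the Lumberjack v2 protocol and should be checked against `LogstashFrameType::Ack`; `encode_acks` is a hypothetical helper.

```rust
/// Ack frame type byte (assumed 'A', as in Lumberjack v2).
const ACK_FRAME_TYPE: u8 = b'A';

/// protocol (u8) + frame type (u8) + sequence number (u32) = 6 bytes.
const ACK_FRAME_SIZE: usize =
    std::mem::size_of::<u8>() + std::mem::size_of::<u8>() + std::mem::size_of::<u32>();

/// Encode one ack frame per sequence number into a single buffer.
fn encode_acks(protocol: u8, sequence_numbers: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(ACK_FRAME_SIZE * sequence_numbers.len());
    for sequence_number in sequence_numbers {
        bytes.push(protocol);
        bytes.push(ACK_FRAME_TYPE);
        bytes.extend_from_slice(&sequence_number.to_be_bytes());
    }
    bytes
}
```

Because `size_of` is a `const fn`, the constant is still computed at compile time, and it stays correct if the sequence number type ever changes.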

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@lukesteensen lukesteensen marked this pull request as ready for review December 17, 2021 19:48
@github-actions

Soak Test Results

Baseline: 21e2525
Comparison: 8bd1799
Total Vector CPUs: 4

| experiment | Δ mean | Δ mean % | confidence |
|---|---|---|---|
| fluent_remap_aws_firehose | 8.25MiB | 14.31 | 100.00% |
| fluent_elasticsearch | 7.82MiB | 13.88 | 100.00% |
| syslog_humio_logs | 1.87MiB | 25.55 | 100.00% |
| syslog_log2metric_splunk_hec_metrics | 1.74MiB | 24.82 | 100.00% |
| syslog_splunk_hec_logs | 1.45MiB | 19.68 | 100.00% |
| syslog_log2metric_humio_metrics | 1.26MiB | 17.59 | 100.00% |
| syslog_regex_logs2metric_ddmetrics | 1.25MiB | 19.8 | 100.00% |

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | baseline mean | baseline stdev | baseline outlier percentage | comparison mean | comparison stdev | comparison outlier percentage | t-statistic | p-value | erratic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fluent_remap_aws_firehose | 8.25MiB | 14.31 | 57.64MiB | 542.94KiB | 0.274725 | 65.88MiB | 842.27KiB | 0 | -161.841 | 0 | False |
| fluent_elasticsearch | 7.82MiB | 13.88 | 56.3MiB | 980.81KiB | 0 | 64.12MiB | 709.3KiB | 0 | -125.979 | 0 | False |
| syslog_humio_logs | 1.87MiB | 25.55 | 7.3MiB | 320.29KiB | 0 | 9.17MiB | 379.24KiB | 0 | -73.2979 | 0 | False |
| syslog_log2metric_splunk_hec_metrics | 1.74MiB | 24.82 | 7.02MiB | 53.12KiB | 0.26738 | 8.76MiB | 80.27KiB | 0 | -354.199 | 0 | False |
| syslog_splunk_hec_logs | 1.45MiB | 19.68 | 7.38MiB | 210.85KiB | 0 | 8.84MiB | 106.78KiB | 0 | -117.149 | 0 | False |
| syslog_log2metric_humio_metrics | 1.26MiB | 17.59 | 7.18MiB | 170.65KiB | 0 | 8.44MiB | 70.1KiB | 1.37741 | -133.674 | 0 | False |
| syslog_regex_logs2metric_ddmetrics | 1.25MiB | 19.8 | 6.31MiB | 220.81KiB | 0.826446 | 7.55MiB | 551.02KiB | 0 | -41.2216 | 4.5295e-160 | False |
| syslog_loki | 405.41KiB | 5.91 | 6.7MiB | 287.83KiB | 0 | 7.1MiB | 325.67KiB | 6.15836 | -17.4581 | 1.17999e-56 | True |
| datadog_agent_remap_blackhole | 364.69KiB | 1.3 | 27.43MiB | 374.43KiB | 0.526316 | 27.79MiB | 436.73KiB | 0 | -11.9059 | 9.41357e-30 | False |
| splunk_hec_to_splunk_hec_logs_noack | 152.07KiB | 0.85 | 17.46MiB | 1.33MiB | 0.879765 | 17.61MiB | 1.21MiB | 0.277008 | -1.54261 | 0.123386 | False |
| splunk_hec_route_s3 | 67.83KiB | 0.86 | 7.71MiB | 2.21MiB | 1.38122 | 7.78MiB | 2.2MiB | 1.38122 | -0.404865 | 0.685697 | False |
| http_pipelines_blackhole | 11.07KiB | 1.34 | 826.41KiB | 675.09KiB | 1.36612 | 837.49KiB | 614.94KiB | 0.273224 | -0.231962 | 0.816633 | False |
| splunk_transforms_splunk3 | -5.84KiB | -0.14 | 4.03MiB | 1.43MiB | 2.76243 | 4.03MiB | 1.45MiB | 0.828729 | 0.0531493 | 0.957628 | False |
| http_datadog_filter_blackhole | -15.22KiB | -1.12 | 1.33MiB | 942.32KiB | 2.50784 | 1.31MiB | 859.86KiB | 0.550964 | 0.219174 | 0.826583 | False |
| datadog_agent_remap_blackhole_acks | -18.36KiB | -0.06 | 28.28MiB | 1.08MiB | 0 | 28.26MiB | 354.83KiB | 1.31579 | 0.289457 | 0.772385 | False |
| http_pipelines_no_grok_blackhole | -43.31KiB | -2.25 | 1.88MiB | 956.7KiB | 2.18579 | 1.83MiB | 1.08MiB | 1.09589 | 0.566567 | 0.571186 | False |
| http_pipelines_blackhole_acks | -44.91KiB | -5.22 | 859.77KiB | 709.52KiB | 0.549451 | 814.86KiB | 737.45KiB | 1.88679 | 0.80751 | 0.419664 | False |
| http_to_http_noack | -209.07KiB | -0.95 | 21.53MiB | 1.48MiB | 0 | 21.32MiB | 1.54MiB | 1.0989 | 1.82156 | 0.0689338 | False |
| http_to_http_acks | -332.07KiB | -5.44 | 5.97MiB | 2.35MiB | 0 | 5.64MiB | 2.42MiB | 0 | 1.82673 | 0.0681536 | True |
| datadog_agent_remap_datadog_logs_acks | -591.14KiB | -1.52 | 37.89MiB | 428.47KiB | 1.17647 | 37.31MiB | 4.27MiB | 1.65746 | 2.55635 | 0.0109782 | False |
| splunk_hec_indexer_ack_blackhole | -1.09MiB | -4.83 | 22.53MiB | 1.27MiB | 0.277778 | 21.44MiB | 1.23MiB | 1.875 | 11.3521 | 1.88376e-27 | False |
| splunk_hec_to_splunk_hec_logs_acks | -1.23MiB | -6.78 | 18.12MiB | 1.24MiB | 1.65289 | 16.9MiB | 1.51MiB | 0.3125 | 11.5376 | 5.03278e-28 | False |
| datadog_agent_remap_datadog_logs | -2.05MiB | -5.79 | 35.43MiB | 522.37KiB | 0 | 33.37MiB | 427.38KiB | 0.573066 | 57.6547 | 2.81685e-258 | False |

Fine details of each soak run.
(experiment, variant) total samples mean std min average p90 p95 p99 max skewness
('fluent_remap_aws_firehose', 'comparison') 371 65.88MiB 842.27KiB 64.12MiB 65.8MiB 66.98MiB 67.19MiB 67.56MiB 67.73MiB 0.148997
('fluent_elasticsearch', 'comparison') 363 64.12MiB 709.3KiB 62.29MiB 64.16MiB 65.0MiB 65.13MiB 65.5MiB 66.16MiB -0.14142
('fluent_remap_aws_firehose', 'baseline') 364 57.64MiB 542.94KiB 55.96MiB 57.68MiB 58.25MiB 58.4MiB 58.63MiB 58.86MiB -0.449529
('fluent_elasticsearch', 'baseline') 363 56.3MiB 980.81KiB 54.02MiB 56.26MiB 57.51MiB 57.8MiB 58.41MiB 58.91MiB 0.121542
('datadog_agent_remap_datadog_logs_acks', 'baseline') 340 37.89MiB 428.47KiB 36.73MiB 37.92MiB 38.4MiB 38.57MiB 38.77MiB 39.16MiB -0.147368
('datadog_agent_remap_datadog_logs_acks', 'comparison') 362 37.31MiB 4.27MiB 0B 37.75MiB 38.64MiB 38.8MiB 39.14MiB 39.36MiB -8.05656
('datadog_agent_remap_datadog_logs', 'baseline') 339 35.43MiB 522.37KiB 33.96MiB 35.42MiB 36.1MiB 36.28MiB 36.53MiB 36.66MiB -0.0647993
('datadog_agent_remap_datadog_logs', 'comparison') 349 33.37MiB 427.38KiB 31.99MiB 33.34MiB 33.93MiB 34.05MiB 34.31MiB 34.5MiB -0.0294002
('datadog_agent_remap_blackhole_acks', 'baseline') 333 28.28MiB 1.08MiB 26.16MiB 28.79MiB 29.48MiB 29.62MiB 29.8MiB 30.01MiB -0.318381
('datadog_agent_remap_blackhole_acks', 'comparison') 380 28.26MiB 354.83KiB 27.03MiB 28.27MiB 28.69MiB 28.91MiB 29.1MiB 29.21MiB 0.127406
('datadog_agent_remap_blackhole', 'comparison') 335 27.79MiB 436.73KiB 26.67MiB 27.77MiB 28.33MiB 28.52MiB 28.85MiB 28.97MiB 0.184956
('datadog_agent_remap_blackhole', 'baseline') 380 27.43MiB 374.43KiB 26.29MiB 27.44MiB 27.92MiB 28.03MiB 28.21MiB 28.33MiB -0.0598193
('splunk_hec_indexer_ack_blackhole', 'baseline') 360 22.53MiB 1.27MiB 19.28MiB 22.45MiB 24.32MiB 24.67MiB 25.42MiB 26.19MiB 0.189923
('http_to_http_noack', 'baseline') 363 21.53MiB 1.48MiB 17.63MiB 21.59MiB 23.41MiB 23.94MiB 24.94MiB 25.17MiB -0.0444062
('splunk_hec_indexer_ack_blackhole', 'comparison') 320 21.44MiB 1.23MiB 17.97MiB 21.45MiB 23.03MiB 23.34MiB 24.16MiB 24.82MiB -0.117444
('http_to_http_noack', 'comparison') 364 21.32MiB 1.54MiB 15.82MiB 21.48MiB 23.25MiB 23.75MiB 24.76MiB 26.51MiB -0.0976546
('splunk_hec_to_splunk_hec_logs_acks', 'baseline') 363 18.12MiB 1.24MiB 13.65MiB 18.24MiB 19.59MiB 19.82MiB 20.54MiB 20.98MiB -0.605247
('splunk_hec_to_splunk_hec_logs_noack', 'comparison') 361 17.61MiB 1.21MiB 14.63MiB 17.52MiB 19.16MiB 19.55MiB 20.45MiB 21.39MiB 0.10398
('splunk_hec_to_splunk_hec_logs_noack', 'baseline') 341 17.46MiB 1.33MiB 12.88MiB 17.52MiB 19.16MiB 19.46MiB 20.21MiB 21.04MiB -0.216603
('splunk_hec_to_splunk_hec_logs_acks', 'comparison') 320 16.9MiB 1.51MiB 11.95MiB 16.99MiB 18.75MiB 19.11MiB 19.59MiB 19.88MiB -0.371902
('syslog_humio_logs', 'comparison') 362 9.17MiB 379.24KiB 8.05MiB 9.38MiB 9.58MiB 9.61MiB 9.67MiB 9.69MiB -0.0727721
('syslog_splunk_hec_logs', 'comparison') 364 8.84MiB 106.78KiB 8.6MiB 8.83MiB 8.97MiB 9.0MiB 9.02MiB 9.06MiB 0.0120929
('syslog_log2metric_splunk_hec_metrics', 'comparison') 362 8.76MiB 80.27KiB 8.57MiB 8.76MiB 8.86MiB 8.87MiB 8.91MiB 8.94MiB -0.17242
('syslog_log2metric_humio_metrics', 'comparison') 363 8.44MiB 70.1KiB 8.24MiB 8.44MiB 8.52MiB 8.55MiB 8.59MiB 8.69MiB -0.0151594
('splunk_hec_route_s3', 'comparison') 362 7.78MiB 2.2MiB 2.26MiB 7.64MiB 10.55MiB 11.67MiB 14.44MiB 15.14MiB 0.470176
('splunk_hec_route_s3', 'baseline') 362 7.71MiB 2.21MiB 2.44MiB 7.57MiB 10.63MiB 11.59MiB 13.49MiB 15.29MiB 0.4304
('syslog_regex_logs2metric_ddmetrics', 'comparison') 367 7.55MiB 551.02KiB 6.41MiB 7.87MiB 8.03MiB 8.06MiB 8.11MiB 8.2MiB -0.839017
('syslog_splunk_hec_logs', 'baseline') 342 7.38MiB 210.85KiB 7.05MiB 7.28MiB 7.63MiB 7.66MiB 7.7MiB 7.71MiB 0.104227
('syslog_humio_logs', 'baseline') 363 7.3MiB 320.29KiB 6.82MiB 7.48MiB 7.66MiB 7.67MiB 7.7MiB 7.71MiB -0.0527372
('syslog_log2metric_humio_metrics', 'baseline') 364 7.18MiB 170.65KiB 6.9MiB 7.14MiB 7.38MiB 7.4MiB 7.44MiB 7.46MiB 0.0445765
('syslog_loki', 'comparison') 341 7.1MiB 325.67KiB 6.27MiB 7.15MiB 7.4MiB 7.42MiB 7.45MiB 7.51MiB -1.09891
('syslog_log2metric_splunk_hec_metrics', 'baseline') 374 7.02MiB 53.12KiB 6.86MiB 7.02MiB 7.08MiB 7.1MiB 7.12MiB 7.19MiB -0.0345396
('syslog_loki', 'baseline') 363 6.7MiB 287.83KiB 6.14MiB 6.84MiB 6.98MiB 7.0MiB 7.05MiB 7.09MiB -0.609052
('syslog_regex_logs2metric_ddmetrics', 'baseline') 363 6.31MiB 220.81KiB 5.77MiB 6.3MiB 6.56MiB 6.58MiB 6.63MiB 6.67MiB -0.622926
('http_to_http_acks', 'baseline') 362 5.97MiB 2.35MiB 0B 6.52MiB 9.23MiB 9.53MiB 10.22MiB 11.2MiB -0.153873
('http_to_http_acks', 'comparison') 362 5.64MiB 2.42MiB 1.46MiB 5.99MiB 8.97MiB 9.53MiB 10.16MiB 12.72MiB 0.0729791
('splunk_transforms_splunk3', 'baseline') 362 4.03MiB 1.43MiB 274.35KiB 4.08MiB 5.69MiB 6.25MiB 8.36MiB 9.62MiB 0.372521
('splunk_transforms_splunk3', 'comparison') 362 4.03MiB 1.45MiB 915.16KiB 3.87MiB 5.76MiB 6.55MiB 7.88MiB 9.24MiB 0.49948
('http_pipelines_no_grok_blackhole', 'baseline') 366 1.88MiB 956.7KiB 61.78KiB 1.77MiB 3.01MiB 3.52MiB 4.51MiB 5.08MiB 0.516165
('http_pipelines_no_grok_blackhole', 'comparison') 365 1.83MiB 1.08MiB 0B 1.71MiB 3.35MiB 3.92MiB 4.57MiB 6.2MiB 0.779809
('http_datadog_filter_blackhole', 'baseline') 319 1.33MiB 942.32KiB 0B 1.2MiB 2.52MiB 3.22MiB 4.19MiB 4.42MiB 0.968471
('http_datadog_filter_blackhole', 'comparison') 363 1.31MiB 859.86KiB 0B 1.14MiB 2.44MiB 2.84MiB 3.55MiB 4.54MiB 0.711481
('http_pipelines_blackhole_acks', 'baseline') 364 859.77KiB 709.52KiB 0B 786.16KiB 1.91MiB 2.09MiB 2.57MiB 3.8MiB 0.875235
('http_pipelines_blackhole', 'comparison') 366 837.49KiB 614.94KiB 0B 755.36KiB 1.56MiB 1.9MiB 2.42MiB 2.84MiB 0.538965
('http_pipelines_blackhole', 'baseline') 366 826.41KiB 675.09KiB 0B 734.57KiB 1.67MiB 2.02MiB 2.88MiB 3.46MiB 0.933518
('http_pipelines_blackhole_acks', 'comparison') 318 814.86KiB 737.45KiB 0B 586.84KiB 1.89MiB 2.16MiB 2.99MiB 3.64MiB 1.14067

Member

@jszwedko jszwedko left a comment

Looks good! I left a couple of comments. Nice bump to performance for these TCP sources.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Contributor

@pablosichert pablosichert left a comment

Awesome catch and deep dive, @lukesteensen!

> This wouldn't have been a terribly surprising finding if we'd looked at the code and seen that we were obviously sending single events at a time, but at a glance the code here did look like it was doing the right thing with respect to batching.

Fully agree, this has been quite a pitfall. I assume it'll be worth surveying whether other sources share the same flaw?

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Member

@jszwedko jszwedko left a comment

Nice work!

@lukesteensen lukesteensen enabled auto-merge (squash) December 20, 2021 22:44
@binarylogic
Contributor

Yes, excellent find. Impressive performance improvements.

@github-actions

Soak Test Results

Baseline: 03416da
Comparison: 14c246d
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether and to what degree a pull request changes vector performance. Test units below are bytes/second/CPU, except for "skewness". The further "skewness" is from 0.0, the more it indicates that vector behaves inconsistently, which makes predicting its fitness in the field challenging.

The abbreviated table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. The abbreviated table is omitted if no statistically interesting changes are observed.
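Below is a rough Rust sketch of the reporting rule described above. Two details are inferred from the tables rather than stated by the bot, so treat them as assumptions. First, the "confidence" column appears to equal 1 - p-value: `http_pipelines_no_grok_blackhole` has p ≈ 0.0038 and 99.62% confidence. Second, experiments flagged as erratic appear to be excluded: `http_to_http_acks` shows +11.46% at p ≈ 0.001 but is not listed. The `Row` struct and the function names are hypothetical.

```rust
/// One experiment from the "fine details of change detection" table.
/// Struct and field names are hypothetical; means can be in any unit as long
/// as baseline and comparison use the same one.
struct Row {
    baseline_mean: f64,
    comparison_mean: f64,
    p_value: f64,
    erratic: bool,
}

/// Δ mean % = (comparison mean - baseline mean) / baseline mean * 100.
/// Negative values mean the baseline was faster.
fn delta_mean_pct(row: &Row) -> f64 {
    (row.comparison_mean - row.baseline_mean) / row.baseline_mean * 100.0
}

/// Assumed rule for the abbreviated table: confident enough, changed by more
/// than the cutoff in either direction, and not flagged as erratic.
fn is_reported(row: &Row, confidence: f64, cutoff_pct: f64) -> bool {
    let row_confidence = 1.0 - row.p_value;
    row_confidence >= confidence && delta_mean_pct(row).abs() > cutoff_pct && !row.erratic
}
```

Take `confidence = 0.90` and `cutoff_pct = 8.87`. `syslog_humio_logs` (7.22MiB to 9.53MiB) comes out at about +32.0% and is reported. `http_pipelines_blackhole` (-8.12%, p ≈ 0.17) is not.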

| experiment | Δ mean | Δ mean % | confidence |
| --- | --- | --- | --- |
| fluent_remap_aws_firehose | 5.83MiB | 9.78 | 100.00% |
| fluent_elasticsearch | 5.48MiB | 9.23 | 100.00% |
| syslog_humio_logs | 2.31MiB | 32.01 | 100.00% |
| syslog_log2metric_splunk_hec_metrics | 2.09MiB | 29.63 | 100.00% |
| syslog_splunk_hec_logs | 1.95MiB | 27.53 | 100.00% |
| syslog_regex_logs2metric_ddmetrics | 1.43MiB | 23.22 | 100.00% |
| syslog_log2metric_humio_metrics | 1.13MiB | 16.07 | 100.00% |
| http_pipelines_no_grok_blackhole | -196.67KiB | -9.43 | 99.62% |
| splunk_transforms_splunk3 | -552.71KiB | -11.62 | 100.00% |
| http_datadog_filter_blackhole | -859.74KiB | -39.51 | 100.00% |
| splunk_hec_to_splunk_hec_logs_acks | -2.13MiB | -11.62 | 100.00% |
| http_to_http_noack | -2.27MiB | -9.98 | 100.00% |

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | baseline mean | baseline stdev | baseline outlier percentage | comparison mean | comparison stdev | comparison outlier percentage | t-statistic | p-value | erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fluent_remap_aws_firehose | 5.83MiB | 9.78 | 59.58MiB | 1.73MiB | 0 | 65.4MiB | 701.65KiB | 0 | -59.6495 | 3.88426e-225 | False |
| fluent_elasticsearch | 5.48MiB | 9.23 | 59.37MiB | 1.86MiB | 0 | 64.85MiB | 799.53KiB | 0.271003 | -51.8881 | 5.84845e-200 | False |
| syslog_humio_logs | 2.31MiB | 32.01 | 7.22MiB | 293.25KiB | 0 | 9.53MiB | 81.3KiB | 0 | -148.199 | 0 | False |
| syslog_log2metric_splunk_hec_metrics | 2.09MiB | 29.63 | 7.04MiB | 167.41KiB | 0 | 9.13MiB | 313.16KiB | 0 | -114.9 | 0 | False |
| syslog_splunk_hec_logs | 1.95MiB | 27.53 | 7.07MiB | 66.05KiB | 1.93906 | 9.01MiB | 203.9KiB | 0 | -172.142 | 0 | False |
| syslog_regex_logs2metric_ddmetrics | 1.43MiB | 23.22 | 6.17MiB | 201.17KiB | 0 | 7.61MiB | 536.06KiB | 0 | -48.8451 | 1.59508e-184 | False |
| syslog_log2metric_humio_metrics | 1.13MiB | 16.07 | 7.03MiB | 63.98KiB | 0.276243 | 8.16MiB | 242.8KiB | 0 | -87.6487 | 4.79415e-268 | False |
| http_to_http_acks | 615.04KiB | 11.46 | 5.24MiB | 2.43MiB | 0 | 5.84MiB | 2.47MiB | 0 | -3.29979 | 0.00101509 | True |
| http_pipelines_blackhole_acks | 17.78KiB | 2.21 | 804.66KiB | 640.11KiB | 2.86624 | 822.44KiB | 725.97KiB | 2.18069 | -0.327493 | 0.743404 | False |
| syslog_loki | -43.9KiB | -0.61 | 7.08MiB | 130.53KiB | 0.273224 | 7.03MiB | 251.09KiB | 10.2639 | 2.88555 | 0.00407502 | True |
| http_pipelines_blackhole | -73.5KiB | -8.12 | 904.78KiB | 703.36KiB | 2.76243 | 831.28KiB | 743.13KiB | 2.77008 | 1.36575 | 0.172446 | False |
| http_pipelines_no_grok_blackhole | -196.67KiB | -9.43 | 2.04MiB | 992.42KiB | 0.831025 | 1.85MiB | 821.02KiB | 0 | 2.90442 | 0.00379593 | False |
| splunk_transforms_splunk3 | -552.71KiB | -11.62 | 4.65MiB | 1.68MiB | 0.872093 | 4.11MiB | 1.48MiB | 0 | 4.4537 | 9.87863e-06 | False |
| splunk_hec_route_s3 | -553.44KiB | -6.42 | 8.42MiB | 2.46MiB | 0.544959 | 7.88MiB | 2.39MiB | 0.831025 | 3.00225 | 0.00277187 | False |
| datadog_agent_remap_datadog_logs_acks | -811.59KiB | -2.08 | 38.04MiB | 6.16MiB | 4.14365 | 37.25MiB | 388.18KiB | 0.550964 | 2.44405 | 0.0149979 | False |
| http_datadog_filter_blackhole | -859.74KiB | -39.51 | 2.13MiB | 941.2KiB | 0.277778 | 1.29MiB | 795.23KiB | 1.11111 | 13.2388 | 7.39225e-36 | False |
| datadog_agent_remap_blackhole | -1.26MiB | -4.32 | 29.13MiB | 481.52KiB | 0.262467 | 27.87MiB | 325.97KiB | 0.263158 | 43.248 | 8.04558e-196 | False |
| splunk_hec_to_splunk_hec_logs_noack | -1.46MiB | -7.83 | 18.59MiB | 1.34MiB | 0 | 17.13MiB | 1.28MiB | 0.552486 | 14.9299 | 4.11814e-44 | False |
| splunk_hec_indexer_ack_blackhole | -1.75MiB | -7.56 | 23.2MiB | 1.31MiB | 0.828729 | 21.45MiB | 1.2MiB | 0.828729 | 18.7892 | 2.42878e-64 | False |
| splunk_hec_to_splunk_hec_logs_acks | -2.13MiB | -11.62 | 18.34MiB | 1.62MiB | 0.828729 | 16.21MiB | 1.12MiB | 1.47059 | 20.3917 | 1.01725e-71 | False |
| http_to_http_noack | -2.27MiB | -9.98 | 22.78MiB | 1.78MiB | 0.826446 | 20.51MiB | 1.8MiB | 0 | 17.0869 | 3.09642e-55 | False |
| datadog_agent_remap_blackhole_acks | -2.32MiB | -7.61 | 30.54MiB | 1.36MiB | 0 | 28.22MiB | 1.12MiB | 0 | 25.0166 | 1.73209e-99 | False |
| datadog_agent_remap_datadog_logs | -3.07MiB | -8.27 | 37.1MiB | 380.64KiB | 0.854701 | 34.03MiB | 1007.08KiB | 0 | 55.5011 | 1.4042e-207 | False |

Fine details of each soak run.

| (experiment, variant) | total samples | mean | std | min | average | p90 | p95 | p99 | max | skewness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ('fluent_remap_aws_firehose', 'comparison') | 341 | 65.4MiB | 701.65KiB | 63.67MiB | 65.33MiB | 66.3MiB | 66.57MiB | 66.88MiB | 67.14MiB | 0.156151 |
| ('fluent_elasticsearch', 'comparison') | 369 | 64.85MiB | 799.53KiB | 62.23MiB | 64.84MiB | 65.81MiB | 65.99MiB | 66.32MiB | 66.68MiB | -0.281525 |
| ('fluent_remap_aws_firehose', 'baseline') | 366 | 59.58MiB | 1.73MiB | 56.29MiB | 60.11MiB | 61.57MiB | 61.79MiB | 62.05MiB | 62.3MiB | -0.171968 |
| ('fluent_elasticsearch', 'baseline') | 363 | 59.37MiB | 1.86MiB | 56.14MiB | 59.55MiB | 61.72MiB | 61.97MiB | 62.59MiB | 63.0MiB | 0.111965 |
| ('datadog_agent_remap_datadog_logs_acks', 'baseline') | 362 | 38.04MiB | 6.16MiB | 0B | 39.1MiB | 39.64MiB | 39.73MiB | 40.02MiB | 40.68MiB | -5.71735 |
| ('datadog_agent_remap_datadog_logs_acks', 'comparison') | 363 | 37.25MiB | 388.18KiB | 35.83MiB | 37.24MiB | 37.72MiB | 37.86MiB | 38.07MiB | 38.35MiB | -0.0780825 |
| ('datadog_agent_remap_datadog_logs', 'baseline') | 351 | 37.1MiB | 380.64KiB | 35.7MiB | 37.11MiB | 37.54MiB | 37.65MiB | 37.82MiB | 37.88MiB | -0.532857 |
| ('datadog_agent_remap_datadog_logs', 'comparison') | 363 | 34.03MiB | 1007.08KiB | 32.22MiB | 33.81MiB | 35.35MiB | 35.84MiB | 36.38MiB | 37.09MiB | 0.506074 |
| ('datadog_agent_remap_blackhole_acks', 'baseline') | 380 | 30.54MiB | 1.36MiB | 27.9MiB | 30.53MiB | 32.12MiB | 32.29MiB | 32.61MiB | 32.88MiB | -0.0899886 |
| ('datadog_agent_remap_blackhole', 'baseline') | 381 | 29.13MiB | 481.52KiB | 27.73MiB | 29.11MiB | 29.78MiB | 29.95MiB | 30.18MiB | 30.34MiB | 0.0437608 |
| ('datadog_agent_remap_blackhole_acks', 'comparison') | 334 | 28.22MiB | 1.12MiB | 26.14MiB | 27.78MiB | 29.65MiB | 29.82MiB | 30.19MiB | 30.65MiB | 0.175675 |
| ('datadog_agent_remap_blackhole', 'comparison') | 380 | 27.87MiB | 325.97KiB | 27.04MiB | 27.88MiB | 28.28MiB | 28.38MiB | 28.57MiB | 28.92MiB | -0.0085109 |
| ('splunk_hec_indexer_ack_blackhole', 'baseline') | 362 | 23.2MiB | 1.31MiB | 19.31MiB | 23.24MiB | 24.88MiB | 25.27MiB | 26.16MiB | 26.49MiB | -0.106325 |
| ('http_to_http_noack', 'baseline') | 363 | 22.78MiB | 1.78MiB | 16.74MiB | 22.94MiB | 24.93MiB | 25.52MiB | 26.35MiB | 26.51MiB | -0.333903 |
| ('splunk_hec_indexer_ack_blackhole', 'comparison') | 362 | 21.45MiB | 1.2MiB | 18.45MiB | 21.42MiB | 22.97MiB | 23.45MiB | 24.45MiB | 25.24MiB | 0.178517 |
| ('http_to_http_noack', 'comparison') | 362 | 20.51MiB | 1.8MiB | 15.64MiB | 20.42MiB | 22.73MiB | 23.24MiB | 24.75MiB | 25.8MiB | 0.0673862 |
| ('splunk_hec_to_splunk_hec_logs_noack', 'baseline') | 362 | 18.59MiB | 1.34MiB | 14.81MiB | 18.65MiB | 20.32MiB | 20.8MiB | 21.55MiB | 22.2MiB | -0.0279382 |
| ('splunk_hec_to_splunk_hec_logs_acks', 'baseline') | 362 | 18.34MiB | 1.62MiB | 13.11MiB | 18.33MiB | 20.35MiB | 21.09MiB | 21.83MiB | 23.15MiB | -0.0904685 |
| ('splunk_hec_to_splunk_hec_logs_noack', 'comparison') | 362 | 17.13MiB | 1.28MiB | 13.74MiB | 17.16MiB | 18.8MiB | 19.22MiB | 20.14MiB | 20.35MiB | -0.0030082 |
| ('splunk_hec_to_splunk_hec_logs_acks', 'comparison') | 340 | 16.21MiB | 1.12MiB | 11.74MiB | 16.24MiB | 17.64MiB | 18.09MiB | 18.73MiB | 19.28MiB | -0.151024 |
| ('syslog_humio_logs', 'comparison') | 374 | 9.53MiB | 81.3KiB | 9.31MiB | 9.54MiB | 9.64MiB | 9.66MiB | 9.69MiB | 9.73MiB | -0.0541147 |
| ('syslog_log2metric_splunk_hec_metrics', 'comparison') | 362 | 9.13MiB | 313.16KiB | 8.67MiB | 9.06MiB | 9.48MiB | 9.5MiB | 9.54MiB | 9.57MiB | 0.00743233 |
| ('syslog_splunk_hec_logs', 'comparison') | 341 | 9.01MiB | 203.9KiB | 8.66MiB | 9.06MiB | 9.25MiB | 9.28MiB | 9.39MiB | 9.59MiB | 0.118676 |
| ('splunk_hec_route_s3', 'baseline') | 367 | 8.42MiB | 2.46MiB | 3.52MiB | 8.28MiB | 11.63MiB | 12.63MiB | 15.04MiB | 16.42MiB | 0.355939 |
| ('syslog_log2metric_humio_metrics', 'comparison') | 362 | 8.16MiB | 242.8KiB | 7.77MiB | 8.17MiB | 8.44MiB | 8.45MiB | 8.5MiB | 8.54MiB | 0.00303026 |
| ('splunk_hec_route_s3', 'comparison') | 361 | 7.88MiB | 2.39MiB | 2.94MiB | 7.57MiB | 11.11MiB | 12.29MiB | 14.13MiB | 14.78MiB | 0.464269 |
| ('syslog_regex_logs2metric_ddmetrics', 'comparison') | 363 | 7.61MiB | 536.06KiB | 6.4MiB | 7.89MiB | 8.09MiB | 8.14MiB | 8.2MiB | 8.25MiB | -0.779266 |
| ('syslog_humio_logs', 'baseline') | 362 | 7.22MiB | 293.25KiB | 6.76MiB | 7.17MiB | 7.55MiB | 7.56MiB | 7.58MiB | 7.61MiB | -0.0441193 |
| ('syslog_loki', 'baseline') | 366 | 7.08MiB | 130.53KiB | 6.67MiB | 7.05MiB | 7.26MiB | 7.28MiB | 7.33MiB | 7.35MiB | 0.14927 |
| ('syslog_splunk_hec_logs', 'baseline') | 361 | 7.07MiB | 66.05KiB | 6.82MiB | 7.07MiB | 7.14MiB | 7.16MiB | 7.2MiB | 7.24MiB | -0.455878 |
| ('syslog_log2metric_splunk_hec_metrics', 'baseline') | 375 | 7.04MiB | 167.41KiB | 6.79MiB | 7.05MiB | 7.23MiB | 7.26MiB | 7.29MiB | 7.33MiB | -0.00549784 |
| ('syslog_loki', 'comparison') | 341 | 7.03MiB | 251.09KiB | 6.26MiB | 7.1MiB | 7.25MiB | 7.26MiB | 7.3MiB | 7.33MiB | -1.5364 |
| ('syslog_log2metric_humio_metrics', 'baseline') | 362 | 7.03MiB | 63.98KiB | 6.87MiB | 7.03MiB | 7.11MiB | 7.14MiB | 7.19MiB | 7.21MiB | 0.203079 |
| ('syslog_regex_logs2metric_ddmetrics', 'baseline') | 364 | 6.17MiB | 201.17KiB | 5.75MiB | 6.12MiB | 6.43MiB | 6.46MiB | 6.5MiB | 6.53MiB | -0.0954363 |
| ('http_to_http_acks', 'comparison') | 361 | 5.84MiB | 2.47MiB | 883.88KiB | 6.31MiB | 9.26MiB | 9.74MiB | 10.36MiB | 11.05MiB | -0.0453454 |
| ('http_to_http_acks', 'baseline') | 363 | 5.24MiB | 2.43MiB | 0B | 4.94MiB | 8.01MiB | 9.5MiB | 10.03MiB | 10.75MiB | 0.0692661 |
| ('splunk_transforms_splunk3', 'baseline') | 344 | 4.65MiB | 1.68MiB | 1.13MiB | 4.5MiB | 6.91MiB | 7.68MiB | 8.81MiB | 10.34MiB | 0.515863 |
| ('splunk_transforms_splunk3', 'comparison') | 341 | 4.11MiB | 1.48MiB | 793.2KiB | 4.12MiB | 6.08MiB | 6.59MiB | 7.68MiB | 8.37MiB | 0.197361 |
| ('http_datadog_filter_blackhole', 'baseline') | 360 | 2.13MiB | 941.2KiB | 0B | 2.16MiB | 3.26MiB | 3.56MiB | 4.29MiB | 5.73MiB | 0.170801 |
| ('http_pipelines_no_grok_blackhole', 'baseline') | 361 | 2.04MiB | 992.42KiB | 0B | 1.92MiB | 3.22MiB | 3.94MiB | 4.38MiB | 4.93MiB | 0.400566 |
| ('http_pipelines_no_grok_blackhole', 'comparison') | 363 | 1.85MiB | 821.02KiB | 93.77KiB | 1.84MiB | 2.89MiB | 3.21MiB | 3.78MiB | 4.06MiB | 0.168355 |
| ('http_datadog_filter_blackhole', 'comparison') | 360 | 1.29MiB | 795.23KiB | 0B | 1.24MiB | 2.39MiB | 2.66MiB | 3.21MiB | 3.75MiB | 0.485969 |
| ('http_pipelines_blackhole', 'baseline') | 362 | 904.78KiB | 703.36KiB | 0B | 768.02KiB | 1.91MiB | 2.15MiB | 2.74MiB | 3.7MiB | 1.04316 |
| ('http_pipelines_blackhole', 'comparison') | 361 | 831.28KiB | 743.13KiB | 0B | 616.77KiB | 1.9MiB | 2.18MiB | 2.9MiB | 3.85MiB | 1.05259 |
| ('http_pipelines_blackhole_acks', 'comparison') | 321 | 822.44KiB | 725.97KiB | 0B | 616.77KiB | 1.91MiB | 2.16MiB | 2.9MiB | 3.47MiB | 1.03347 |
| ('http_pipelines_blackhole_acks', 'baseline') | 314 | 804.66KiB | 640.11KiB | 0B | 631.16KiB | 1.55MiB | 1.92MiB | 2.77MiB | 3.4MiB | 1.12571 |

@lukesteensen lukesteensen merged commit e4d391e into master Dec 21, 2021
@lukesteensen lukesteensen deleted the tcp-source-batching-issue branch December 21, 2021 04:22
@lukesteensen
Member Author

I'm hoping that the last set of numbers, from after the auto-merge, is just a weird artifact of something else landing on master, but I'm rerunning the soaks locally between the merge commit and its parent to be sure.

@lukesteensen
Member Author

Here are those local results, for the record:

Soak Test Results

Baseline: 03416da
Comparison: e4d391e
Total Vector CPUs: 7

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether and to what degree a pull request changes vector performance. Test units below are bytes/second/CPU, except for "skewness". The further "skewness" is from 0.0, the more it indicates that vector behaves inconsistently, which makes predicting its fitness in the field challenging.

The abbreviated table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 95.0% confidence. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. The abbreviated table is omitted if no statistically interesting changes are observed.

| experiment | Δ mean | Δ mean % | confidence |
| --- | --- | --- | --- |
| fluent_remap_aws_firehose | 12.35MiB | 46.44 | 100.00% |
| fluent_elasticsearch | 5.42MiB | 24.56 | 100.00% |
| syslog_log2metric_splunk_hec_metrics | 1.35MiB | 60.73 | 100.00% |
| syslog_log2metric_humio_metrics | 1.3MiB | 39.81 | 100.00% |
| syslog_regex_logs2metric_ddmetrics | 1.15MiB | 43.59 | 100.00% |
| syslog_splunk_hec_logs | 656.17KiB | 29.46 | 100.00% |
| syslog_humio_logs | 637.68KiB | 28.51 | 100.00% |
| syslog_loki | 370.61KiB | 9.76 | 100.00% |

Fine details of change detection per experiment.

| experiment | Δ mean | Δ mean % | baseline mean | baseline stdev | baseline outlier percentage | comparison mean | comparison stdev | comparison outlier percentage | t-statistic | p-value | erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fluent_remap_aws_firehose | 12.35MiB | 46.44 | 26.6MiB | 2.17MiB | 0 | 38.95MiB | 1.68MiB | 0 | -59.9269 | 2.98695e-181 | False |
| fluent_elasticsearch | 5.42MiB | 24.56 | 22.08MiB | 943.01KiB | 1.11111 | 27.5MiB | 1.25MiB | 0.531915 | -47.6388 | 1.30043e-153 | False |
| syslog_log2metric_splunk_hec_metrics | 1.35MiB | 60.73 | 2.23MiB | 116.62KiB | 2.77778 | 3.58MiB | 268.17KiB | 4.41989 | -63.766 | 5.34174e-155 | False |
| syslog_log2metric_humio_metrics | 1.3MiB | 39.81 | 3.27MiB | 146.91KiB | 1.65746 | 4.57MiB | 102.3KiB | 2.77778 | -100.143 | 2.00587e-244 | False |
| syslog_regex_logs2metric_ddmetrics | 1.15MiB | 43.59 | 2.63MiB | 255.32KiB | 0 | 3.78MiB | 186.94KiB | 0 | -49.8126 | 4.81936e-155 | False |
| syslog_splunk_hec_logs | 656.17KiB | 29.46 | 2.17MiB | 58.59KiB | 3.33333 | 2.82MiB | 67.61KiB | 7.22222 | -98.4047 | 1.32787e-257 | False |
| syslog_humio_logs | 637.68KiB | 28.51 | 2.18MiB | 66.71KiB | 5 | 2.81MiB | 88.47KiB | 8.93855 | -77.0757 | 1.69467e-213 | False |
| syslog_loki | 370.61KiB | 9.76 | 3.71MiB | 104.53KiB | 1.11111 | 4.07MiB | 104.27KiB | 0.555556 | -33.6778 | 5.18423e-113 | False |
| http_to_http_acks | 326.86KiB | 9.71 | 3.29MiB | 1.6MiB | 0 | 3.61MiB | 1.59MiB | 0 | -1.89782 | 0.0585246 | False |
| datadog_agent_remap_blackhole | 246.5KiB | 1.57 | 15.36MiB | 334.06KiB | 1.0582 | 15.61MiB | 370.83KiB | 1.5873 | -6.78969 | 4.4584e-11 | False |
| splunk_hec_route_s3 | 20.09KiB | 0.38 | 5.18MiB | 1.21MiB | 0 | 5.2MiB | 1.19MiB | 0 | -0.145644 | 0.884299 | False |
| splunk_hec_indexer_ack_blackhole | 15.16KiB | 0.11 | 13.63MiB | 467.45KiB | 0 | 13.64MiB | 593.53KiB | 2.22222 | -0.269261 | 0.787893 | False |
| http_datadog_filter_blackhole | 7.8KiB | 0.76 | 1.01MiB | 624.41KiB | 2.22222 | 1.01MiB | 519.03KiB | 0 | -0.128923 | 0.897493 | False |
| http_pipelines_blackhole | 6.62KiB | 1.33 | 497.31KiB | 402.78KiB | 2.76243 | 503.93KiB | 406.01KiB | 0 | -0.143768 | 0.885785 | False |
| http_pipelines_no_grok_blackhole | -5.78KiB | -0.53 | 1.06MiB | 546.6KiB | 2.77778 | 1.06MiB | 622.18KiB | 1.10497 | 0.0938182 | 0.925307 | False |
| http_pipelines_blackhole_acks | -8.25KiB | -1.66 | 496.5KiB | 421.33KiB | 2.77778 | 488.25KiB | 572.26KiB | 2.22222 | 0.155766 | 0.876313 | False |
| splunk_transforms_splunk3 | -19.16KiB | -0.72 | 2.6MiB | 867.45KiB | 1.10497 | 2.58MiB | 913.63KiB | 0 | 0.20435 | 0.838196 | False |
| splunk_hec_to_splunk_hec_logs_noack | -59.9KiB | -0.53 | 10.98MiB | 742.32KiB | 1.11111 | 10.92MiB | 807.44KiB | 1.11111 | 0.732754 | 0.464191 | False |
| datadog_agent_remap_datadog_logs | -119.04KiB | -0.56 | 20.91MiB | 387.42KiB | 0 | 20.79MiB | 341.82KiB | 0 | 3.09126 | 0.0021518 | False |
| datadog_agent_remap_blackhole_acks | -125.07KiB | -0.75 | 16.34MiB | 381.84KiB | 1.0582 | 16.22MiB | 347.66KiB | 1.57068 | 3.33752 | 0.000930141 | False |
| splunk_hec_to_splunk_hec_logs_acks | -128.42KiB | -1.18 | 10.62MiB | 738.19KiB | 0.531915 | 10.49MiB | 935.29KiB | 0 | 1.46047 | 0.14508 | False |
| http_to_http_noack | -156.32KiB | -1.12 | 13.61MiB | 531.09KiB | 3.14465 | 13.45MiB | 837.87KiB | 1.65746 | 2.07913 | 0.0384308 | False |
| datadog_agent_remap_datadog_logs_acks | -230.02KiB | -1.08 | 20.81MiB | 395.31KiB | 0 | 20.59MiB | 343.49KiB | 0.555556 | 5.89293 | 8.8983e-09 | False |

Fine details of each soak run.

| (experiment, variant) | total samples | mean | std | min | average | p90 | p95 | p99 | max | skewness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ('fluent_remap_aws_firehose', 'comparison') | 171 | 38.95MiB | 1.68MiB | 34.82MiB | 38.89MiB | 41.12MiB | 42.04MiB | 42.51MiB | 42.7MiB | -0.0799153 |
| ('fluent_elasticsearch', 'comparison') | 188 | 27.5MiB | 1.25MiB | 24.88MiB | 27.4MiB | 29.11MiB | 29.78MiB | 30.62MiB | 31.56MiB | 0.439941 |
| ('fluent_remap_aws_firehose', 'baseline') | 180 | 26.6MiB | 2.17MiB | 20.91MiB | 26.75MiB | 29.19MiB | 29.9MiB | 30.74MiB | 31.62MiB | -0.201598 |
| ('fluent_elasticsearch', 'baseline') | 180 | 22.08MiB | 943.01KiB | 20.26MiB | 21.88MiB | 23.22MiB | 23.79MiB | 24.47MiB | 25.41MiB | 0.738621 |
| ('datadog_agent_remap_datadog_logs', 'baseline') | 180 | 20.91MiB | 387.42KiB | 19.97MiB | 20.91MiB | 21.42MiB | 21.46MiB | 21.68MiB | 21.77MiB | -0.108593 |
| ('datadog_agent_remap_datadog_logs_acks', 'baseline') | 180 | 20.81MiB | 395.31KiB | 19.8MiB | 20.83MiB | 21.33MiB | 21.43MiB | 21.61MiB | 21.69MiB | -0.178576 |
| ('datadog_agent_remap_datadog_logs', 'comparison') | 180 | 20.79MiB | 341.82KiB | 19.89MiB | 20.78MiB | 21.22MiB | 21.33MiB | 21.55MiB | 21.63MiB | 0.0196883 |
| ('datadog_agent_remap_datadog_logs_acks', 'comparison') | 180 | 20.59MiB | 343.49KiB | 19.54MiB | 20.6MiB | 21.04MiB | 21.11MiB | 21.27MiB | 21.32MiB | -0.24677 |
| ('datadog_agent_remap_blackhole_acks', 'baseline') | 189 | 16.34MiB | 381.84KiB | 15.33MiB | 16.34MiB | 16.83MiB | 16.95MiB | 17.1MiB | 17.26MiB | -0.163057 |
| ('datadog_agent_remap_blackhole_acks', 'comparison') | 191 | 16.22MiB | 347.66KiB | 15.19MiB | 16.23MiB | 16.69MiB | 16.76MiB | 16.91MiB | 16.95MiB | -0.203762 |
| ('datadog_agent_remap_blackhole', 'comparison') | 189 | 15.61MiB | 370.83KiB | 14.13MiB | 15.64MiB | 16.02MiB | 16.18MiB | 16.29MiB | 16.45MiB | -0.73033 |
| ('datadog_agent_remap_blackhole', 'baseline') | 189 | 15.36MiB | 334.06KiB | 14.47MiB | 15.38MiB | 15.73MiB | 15.83MiB | 16.09MiB | 16.48MiB | -0.0764159 |
| ('splunk_hec_indexer_ack_blackhole', 'comparison') | 180 | 13.64MiB | 593.53KiB | 11.95MiB | 13.67MiB | 14.37MiB | 14.46MiB | 15.04MiB | 15.6MiB | 0.0736474 |
| ('splunk_hec_indexer_ack_blackhole', 'baseline') | 180 | 13.63MiB | 467.45KiB | 12.36MiB | 13.6MiB | 14.23MiB | 14.32MiB | 14.59MiB | 14.81MiB | -0.0339293 |
| ('http_to_http_noack', 'baseline') | 159 | 13.61MiB | 531.09KiB | 11.73MiB | 13.57MiB | 14.26MiB | 14.45MiB | 14.87MiB | 15.22MiB | -0.110941 |
| ('http_to_http_noack', 'comparison') | 181 | 13.45MiB | 837.87KiB | 11.05MiB | 13.36MiB | 14.52MiB | 14.72MiB | 15.0MiB | 15.06MiB | -0.23034 |
| ('splunk_hec_to_splunk_hec_logs_noack', 'baseline') | 180 | 10.98MiB | 742.32KiB | 8.65MiB | 11.01MiB | 11.85MiB | 12.02MiB | 12.53MiB | 12.97MiB | -0.32287 |
| ('splunk_hec_to_splunk_hec_logs_noack', 'comparison') | 180 | 10.92MiB | 807.44KiB | 9.21MiB | 10.83MiB | 12.08MiB | 12.33MiB | 12.71MiB | 13.2MiB | 0.428717 |
| ('splunk_hec_to_splunk_hec_logs_acks', 'baseline') | 188 | 10.62MiB | 738.19KiB | 8.67MiB | 10.68MiB | 11.48MiB | 11.62MiB | 11.95MiB | 12.8MiB | -0.315129 |
| ('splunk_hec_to_splunk_hec_logs_acks', 'comparison') | 181 | 10.49MiB | 935.29KiB | 8.27MiB | 10.35MiB | 11.7MiB | 12.12MiB | 12.68MiB | 13.09MiB | 0.412669 |
| ('splunk_hec_route_s3', 'comparison') | 180 | 5.2MiB | 1.19MiB | 2.28MiB | 5.32MiB | 6.66MiB | 6.98MiB | 7.38MiB | 7.46MiB | -0.299011 |
| ('splunk_hec_route_s3', 'baseline') | 144 | 5.18MiB | 1.21MiB | 2.76MiB | 5.13MiB | 6.76MiB | 7.08MiB | 8.0MiB | 8.17MiB | 0.258007 |
| ('syslog_log2metric_humio_metrics', 'comparison') | 180 | 4.57MiB | 102.3KiB | 4.2MiB | 4.58MiB | 4.68MiB | 4.72MiB | 4.76MiB | 4.77MiB | -0.838085 |
| ('syslog_loki', 'comparison') | 180 | 4.07MiB | 104.27KiB | 3.73MiB | 4.08MiB | 4.19MiB | 4.21MiB | 4.26MiB | 4.27MiB | -0.4487 |
| ('syslog_regex_logs2metric_ddmetrics', 'comparison') | 180 | 3.78MiB | 186.94KiB | 3.32MiB | 3.81MiB | 3.99MiB | 4.03MiB | 4.11MiB | 4.13MiB | -0.53829 |
| ('syslog_loki', 'baseline') | 180 | 3.71MiB | 104.53KiB | 3.35MiB | 3.72MiB | 3.82MiB | 3.85MiB | 3.87MiB | 3.88MiB | -0.709051 |
| ('http_to_http_acks', 'comparison') | 180 | 3.61MiB | 1.59MiB | 801.27KiB | 3.85MiB | 5.56MiB | 5.74MiB | 6.36MiB | 6.66MiB | -0.178902 |
| ('syslog_log2metric_splunk_hec_metrics', 'comparison') | 181 | 3.58MiB | 268.17KiB | 2.84MiB | 3.61MiB | 3.88MiB | 3.93MiB | 4.0MiB | 4.03MiB | -0.780749 |
| ('http_to_http_acks', 'baseline') | 180 | 3.29MiB | 1.6MiB | 0B | 3.48MiB | 5.4MiB | 5.75MiB | 6.57MiB | 6.95MiB | 0.102644 |
| ('syslog_log2metric_humio_metrics', 'baseline') | 181 | 3.27MiB | 146.91KiB | 2.78MiB | 3.29MiB | 3.44MiB | 3.47MiB | 3.52MiB | 3.58MiB | -0.648935 |
| ('syslog_splunk_hec_logs', 'comparison') | 180 | 2.82MiB | 67.61KiB | 2.62MiB | 2.81MiB | 2.89MiB | 2.92MiB | 2.99MiB | 3.02MiB | 0.0177744 |
| ('syslog_humio_logs', 'comparison') | 179 | 2.81MiB | 88.47KiB | 2.6MiB | 2.8MiB | 2.92MiB | 3.01MiB | 3.07MiB | 3.09MiB | 1.01465 |
| ('syslog_regex_logs2metric_ddmetrics', 'baseline') | 180 | 2.63MiB | 255.32KiB | 2.03MiB | 2.62MiB | 2.97MiB | 3.07MiB | 3.2MiB | 3.21MiB | 0.134854 |
| ('splunk_transforms_splunk3', 'baseline') | 181 | 2.6MiB | 867.45KiB | 296.27KiB | 2.55MiB | 3.76MiB | 4.05MiB | 4.56MiB | 4.84MiB | 0.193485 |
| ('splunk_transforms_splunk3', 'comparison') | 180 | 2.58MiB | 913.63KiB | 889.02KiB | 2.48MiB | 3.78MiB | 4.24MiB | 4.75MiB | 4.85MiB | 0.431641 |
| ('syslog_log2metric_splunk_hec_metrics', 'baseline') | 180 | 2.23MiB | 116.62KiB | 2.06MiB | 2.2MiB | 2.41MiB | 2.45MiB | 2.55MiB | 2.65MiB | 1.08469 |
| ('syslog_humio_logs', 'baseline') | 180 | 2.18MiB | 66.71KiB | 1.99MiB | 2.19MiB | 2.26MiB | 2.28MiB | 2.32MiB | 2.36MiB | -0.589208 |
| ('syslog_splunk_hec_logs', 'baseline') | 180 | 2.17MiB | 58.59KiB | 2.0MiB | 2.18MiB | 2.24MiB | 2.25MiB | 2.29MiB | 2.33MiB | -0.598986 |
| ('http_pipelines_no_grok_blackhole', 'baseline') | 180 | 1.06MiB | 546.6KiB | 35.31KiB | 1.03MiB | 1.76MiB | 1.92MiB | 2.6MiB | 2.7MiB | 0.577393 |
| ('http_pipelines_no_grok_blackhole', 'comparison') | 181 | 1.06MiB | 622.18KiB | 0B | 964.11KiB | 1.79MiB | 2.0MiB | 2.39MiB | 5.03MiB | 1.7931 |
| ('http_datadog_filter_blackhole', 'comparison') | 180 | 1.01MiB | 519.03KiB | 71.26KiB | 973.93KiB | 1.72MiB | 1.88MiB | 2.16MiB | 2.27MiB | 0.31899 |
| ('http_datadog_filter_blackhole', 'baseline') | 180 | 1.01MiB | 624.41KiB | 35.31KiB | 912.13KiB | 1.86MiB | 2.22MiB | 2.9MiB | 3.09MiB | 0.970683 |
| ('http_pipelines_blackhole', 'comparison') | 135 | 503.93KiB | 406.01KiB | 0B | 420.59KiB | 1.02MiB | 1.23MiB | 1.57MiB | 1.64MiB | 0.742422 |
| ('http_pipelines_blackhole', 'baseline') | 181 | 497.31KiB | 402.78KiB | 0B | 474.17KiB | 1015.23KiB | 1.12MiB | 1.69MiB | 1.96MiB | 0.946044 |
| ('http_pipelines_blackhole_acks', 'baseline') | 180 | 496.5KiB | 421.33KiB | 0B | 378.64KiB | 984.16KiB | 1.36MiB | 1.69MiB | 2.18MiB | 1.18981 |
| ('http_pipelines_blackhole_acks', 'comparison') | 180 | 488.25KiB | 572.26KiB | 0B | 318.72KiB | 1.12MiB | 1.36MiB | 2.08MiB | 4.91MiB | 3.36829 |

jdrouet pushed a commit that referenced this pull request Dec 27, 2021
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Labels
domain: codecs - Anything related to Vector's codecs (encoding/decoding)
domain: sources - Anything related to the Vector's sources
5 participants