fix(datadog_logs sink): retry HTTP requests and improve datadog sink error handling consistency #13130

Merged: neuronull merged 12 commits into master from neuronull/datadog_logs_sink_http_retries on Jun 23, 2022

@neuronull neuronull commented Jun 13, 2022

Closes #12859

  • Some HttpErrors for datadog_logs sink are now retry-able.
  • Added a unit test to validate the retry behavior.
  • Conformed datadog_metrics sink to have the same error reporting approach as datadog_logs sink.
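
To illustrate the retry behavior the first two bullets describe, here is a minimal sketch. It is not the code from this PR: `CallError`, `is_retriable`, and `call_with_retries` are hypothetical names, and the assumption that an aborted connection is the retry-able case comes from the linked issue rather than from the diff. A real sink would also add backoff between attempts.

```rust
/// Hypothetical, simplified error type for one HTTP call made by a sink.
#[derive(Debug, PartialEq)]
enum CallError {
    /// Transport-level failure, such as an aborted connection.
    ConnectionAborted,
    /// The request itself is malformed, so sending it again cannot help.
    InvalidRequest,
}

/// Only transient, transport-level failures are worth retrying.
fn is_retriable(err: &CallError) -> bool {
    matches!(err, CallError::ConnectionAborted)
}

/// Runs `op` until it succeeds, hits a non-retriable error,
/// or has been attempted `max_attempts` times.
fn call_with_retries<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, CallError>,
) -> Result<T, CallError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retriable error with attempts left: count it and loop again.
            Err(err) if is_retriable(&err) && attempt < max_attempts => attempt += 1,
            // Non-retriable error, or attempts exhausted: give up.
            Err(err) => return Err(err),
        }
    }
}
```

A unit test in this style can count how often the closure runs: two aborted connections followed by a success should yield exactly three calls, while an invalid request should yield exactly one.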

Note: Now that the two sinks have similar error reporting behavior, there is a bit more repeated code. I held off on further refactoring to pull common code into sinks/datadog/mod.rs, as I wasn't sure how far we wanted to take it. For example, the RetryLogic implementations are almost the same, and the Service call() functions have some commonalities. I'm generally a fan of removing as much redundant code as possible, but I realize that may not be desirable in all scenarios.

@neuronull added the "sink: datadog_metrics", "sink: datadog_logs", and "ci-condition: integration tests enable" labels on Jun 13, 2022
@neuronull neuronull self-assigned this Jun 13, 2022

netlify bot commented Jun 13, 2022

Deploy Preview for vector-project ready!

Name Link
🔨 Latest commit 3e03766
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/62b38cb1c49c860008c3a929
😎 Deploy Preview https://deploy-preview-13130--vector-project.netlify.app

@github-actions bot added the "domain: sinks" label on Jun 13, 2022
@github-actions

Soak Test Results

Baseline: 8046713
Comparison: 211d651
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configuration for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector performance is changed by a pull request and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
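
The thresholds in this explanation can be written out as a small sketch. This is an illustration of the stated rules only, not the soak test tooling itself: the function and constant names are hypothetical, confidence is assumed to be expressed as a fraction, and the "newly erratic" comparison between baseline and comparison runs is left out because the explanation does not define it precisely.

```rust
/// Thresholds quoted in the soak test explanation.
const MIN_CONFIDENCE: f64 = 0.90;
const MIN_DELTA_MEAN_PCT: f64 = 8.87;
const ERRATIC_COV: f64 = 0.3;

/// Coefficient of variation: standard deviation relative to the mean.
/// Both arguments must use the same unit (for example MiB).
fn coefficient_of_variation(mean: f64, stdev: f64) -> f64 {
    stdev / mean
}

/// An experiment is erratic if its coefficient of variation exceeds 0.3.
fn is_erratic(mean: f64, stdev: f64) -> bool {
    coefficient_of_variation(mean, stdev) > ERRATIC_COV
}

/// A change is reported only if it is both statistically confident
/// and larger than the ±8.87% cut-off in either direction.
fn is_interesting(delta_mean_pct: f64, confidence: f64) -> bool {
    confidence >= MIN_CONFIDENCE && delta_mean_pct.abs() > MIN_DELTA_MEAN_PCT
}
```

With the figures from the first results table, http_to_http_acks (baseline mean 17.52MiB, stdev 7.91MiB) has a coefficient of variation of roughly 0.45 and is therefore erratic, while the 6.78% change for http_pipelines_no_grok_blackhole stays below the cut-off despite 100% confidence.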

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
http_pipelines_no_grok_blackhole 1.15MiB 6.78 100.00% 17.0MiB 1.85MiB 38.69KiB 0 0.108642 18.15MiB 1.84MiB 38.35KiB 0 0.101194 False False
socket_to_socket_blackhole 360.11KiB 2.53 100.00% 13.9MiB 602.77KiB 12.27KiB 0 0.0423261 14.26MiB 411.48KiB 8.4KiB 0 0.0281811 False False
datadog_agent_remap_blackhole 1021.64KiB 1.7 100.00% 58.77MiB 8.46MiB 177.51KiB 0 0.14395 59.77MiB 8.52MiB 178.17KiB 0 0.142434 False False
syslog_loki 266.12KiB 1.68 100.00% 15.45MiB 500.48KiB 10.2KiB 0 0.0316191 15.71MiB 177.52KiB 3.63KiB 0 0.01103 False False
http_to_http_acks 166.8KiB 0.93 51.72% 17.52MiB 7.91MiB 165.34KiB 0 0.45135 17.68MiB 8.17MiB 170.74KiB 0 0.461805 True True
http_pipelines_blackhole_acks 31.02KiB 0.7 93.27% 4.33MiB 594.35KiB 12.17KiB 0 0.133926 4.36MiB 577.68KiB 11.8KiB 0 0.129267 False False
splunk_transforms_splunk3 65.32KiB 0.5 68.78% 12.76MiB 2.18MiB 45.59KiB 0 0.171217 12.82MiB 2.19MiB 45.8KiB 0 0.170644 False False
http_to_http_noack 69.37KiB 0.28 99.97% 23.78MiB 909.2KiB 18.53KiB 0 0.0373342 23.85MiB 240.3KiB 4.91KiB 0 0.00983933 False False
datadog_agent_remap_datadog_logs 179.38KiB 0.26 94.20% 67.13MiB 2.41MiB 50.27KiB 0 0.0358539 67.31MiB 3.84MiB 80.15KiB 0 0.0570316 False False
http_pipelines_blackhole 11.29KiB 0.25 48.38% 4.49MiB 588.88KiB 12.07KiB 0 0.128095 4.5MiB 614.24KiB 12.53KiB 0 0.133284 False False
datadog_agent_remap_blackhole_acks 129.44KiB 0.19 64.86% 65.43MiB 4.79MiB 99.87KiB 0 0.073126 65.56MiB 4.62MiB 96.52KiB 0 0.0704521 False False
fluent_elasticsearch 114.51KiB 0.14 97.80% 79.36MiB 2.43MiB 49.95KiB 0 0.0306188 79.47MiB 52.5KiB 1.05KiB 0 0.000645006 False False
datadog_agent_remap_datadog_logs_acks 55.23KiB 0.08 64.25% 64.78MiB 1.06MiB 22.2KiB 0 0.0163851 64.84MiB 2.67MiB 55.75KiB 0 0.0411325 False False
splunk_hec_indexer_ack_blackhole 10.14KiB 0.04 35.26% 23.76MiB 789.56KiB 16.06KiB 0 0.0324478 23.77MiB 750.07KiB 15.26KiB 0 0.030812 False False
splunk_hec_to_splunk_hec_logs_acks 8.92KiB 0.04 32.82% 23.77MiB 750.24KiB 15.27KiB 0 0.030818 23.78MiB 712.13KiB 14.5KiB 0 0.0292419 False False
http_to_http_json 1.35KiB 0.01 11.21% 23.84MiB 330.59KiB 6.75KiB 0 0.0135375 23.84MiB 330.61KiB 6.75KiB 0 0.0135376 False False
splunk_hec_to_splunk_hec_logs_noack -125.05B -0 1.05% 23.84MiB 321.94KiB 6.59KiB 0 0.0131874 23.84MiB 321.36KiB 6.58KiB 0 0.0131637 False False
file_to_blackhole -174.69KiB -0.18 95.44% 95.32MiB 2.67MiB 55.5KiB 0 0.0279633 95.15MiB 3.27MiB 67.45KiB 0 0.0343274 False False
syslog_regex_logs2metric_ddmetrics -72.38KiB -0.55 99.95% 12.75MiB 722.77KiB 14.7KiB 0 0.0553272 12.68MiB 721.26KiB 14.67KiB 0 0.0555194 False False
syslog_log2metric_humio_metrics -89.9KiB -0.62 100.00% 14.11MiB 257.73KiB 5.26KiB 0 0.0178306 14.03MiB 174.23KiB 3.57KiB 0 0.012129 False False
syslog_humio_logs -173.41KiB -1 100.00% 16.98MiB 359.15KiB 7.36KiB 0 0.0206531 16.81MiB 377.92KiB 7.73KiB 0 0.0219518 False False
splunk_hec_route_s3 -207.83KiB -1.02 99.88% 19.84MiB 2.1MiB 43.93KiB 0 0.105888 19.64MiB 2.25MiB 47.0KiB 0 0.114462 False False
syslog_splunk_hec_logs -203.52KiB -1.15 100.00% 17.23MiB 244.33KiB 4.99KiB 0 0.0138468 17.03MiB 300.77KiB 6.15KiB 0 0.017244 False False
syslog_log2metric_splunk_hec_metrics -460.06KiB -2.49 100.00% 18.01MiB 943.92KiB 19.2KiB 0 0.0511613 17.56MiB 813.0KiB 16.57KiB 0 0.0451924 False False

@binarylogic
Contributor

Agree with sharing code. I'm also curious about this behavior across other HTTP-based sinks? We should be retrying these errors for all sinks.

@jszwedko
Member

> Agree with sharing code. I'm also curious about this behavior across other HTTP-based sinks? We should be retrying these errors for all sinks.

Agreed. We could take a similar tack to the one we use for the AWS components, where they all have custom retry behavior but then fall back to shared retry logic:

vector/src/aws/mod.rs

Lines 36 to 73 in f6f38d0

pub fn is_retriable_error<T>(error: &SdkError<T>) -> bool {
    match error {
        SdkError::TimeoutError(_) | SdkError::DispatchFailure(_) => true,
        SdkError::ConstructionFailure(_) => false,
        SdkError::ResponseError { err: _, raw } | SdkError::ServiceError { err: _, raw } => {
            // This header is a direct indication that we should retry the request. Eventually it'd
            // be nice to actually schedule the retry after the given delay, but for now we just
            // check that it contains a positive value.
            let retry_header = raw.http().headers().get("x-amz-retry-after").is_some();
            // Certain 400-level responses will contain an error code indicating that the request
            // should be retried. Since we don't retry 400-level responses by default, we'll look
            // for these specifically before falling back to more general heuristics. Because AWS
            // services use a mix of XML and JSON response bodies and the AWS SDK doesn't give us
            // a parsed representation, we resort to a simple string match.
            //
            // S3: RequestTimeout
            // SQS: RequestExpired, ThrottlingException
            // ECS: RequestExpired, ThrottlingException
            // Kinesis: RequestExpired, ThrottlingException
            // Cloudwatch: RequestExpired, ThrottlingException
            //
            // Now just look for those when it's a client_error
            let re = RETRIABLE_CODES.get_or_init(|| {
                RegexSet::new(&["RequestTimeout", "RequestExpired", "ThrottlingException"])
                    .expect("invalid regex")
            });
            let status = raw.http().status();
            let response_body = String::from_utf8_lossy(raw.http().body().bytes().unwrap_or(&[]));
            retry_header
                || status.is_server_error()
                || status == http::StatusCode::TOO_MANY_REQUESTS
                || (status.is_client_error() && re.is_match(response_body.as_ref()))
        }
    }
}
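
The "custom behavior, then fall back to shared logic" shape could look roughly like the following for HTTP-based sinks. This is only a sketch under stated assumptions: `RetryClassifier`, `custom_verdict`, `shared_is_retriable`, `PlainHttp`, and `ThrottleAware` are hypothetical names and do not exist in Vector, and the body match is a plain substring check rather than the `RegexSet` used above.

```rust
/// Shared fallback used when a component has no opinion of its own
/// (hypothetical helper): retry on 429 and on any 5xx status.
fn shared_is_retriable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

/// Hypothetical trait: a component overrides `custom_verdict` only for
/// its special cases and inherits the shared fallback for everything else.
trait RetryClassifier {
    /// `None` means "no component-specific rule applies".
    fn custom_verdict(&self, _status: u16, _body: &str) -> Option<bool> {
        None
    }

    fn is_retriable(&self, status: u16, body: &str) -> bool {
        self.custom_verdict(status, body)
            .unwrap_or_else(|| shared_is_retriable(status))
    }
}

/// A component with no special cases.
struct PlainHttp;
impl RetryClassifier for PlainHttp {}

/// A component whose service signals throttling inside a 4xx response body.
struct ThrottleAware;
impl RetryClassifier for ThrottleAware {
    fn custom_verdict(&self, status: u16, body: &str) -> Option<bool> {
        if (400..500).contains(&status) && body.contains("ThrottlingException") {
            Some(true)
        } else {
            None
        }
    }
}
```

The point of returning `Option<bool>` rather than `bool` from the custom hook is that a component can stay silent on responses it does not recognize, so the shared rule remains the single place where the default policy lives.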

@neuronull
Contributor Author

neuronull commented Jun 14, 2022

> I'm also curious about this behavior across other HTTP-based sinks? We should be retrying these errors for all sinks.

It looks like a lot of them (splunk_hec_metrics, splunk_hec_logs, clickhouse, influxdb, sematext, prometheus) use the generic HttpRetryLogic, which retries everything...

Some others which don't use HttpRetryLogic:

  • elasticsearch: retries everything
  • loki: retries all http::HttpError
  • gcs_common: retries everything; in fact, it has a comment about merging with HttpRetryLogic
  • azure blob: retries only server errors or too-many-requests errors

That's not an exhaustive analysis, but it seems that many of the sinks retry all HttpErrors rather than being selective, as was suggested in the comments here in this PR...

Perhaps HttpRetryLogic should not retry everything (and should instead adopt something like what is in this PR), and perhaps all the HTTP-based sinks should use HttpRetryLogic (at least for http::HttpErrors)?
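
For comparison with "retries everything", a selective response policy along the lines of the azure blob behavior listed above might be sketched as follows. The `Verdict` enum and `classify_status` function are defined here purely for illustration and are not claimed to match Vector's own retry types or the logic in this PR.

```rust
/// Hypothetical three-way outcome for a completed HTTP response.
#[derive(Debug, PartialEq)]
enum Verdict {
    Successful,
    Retry(String),
    DontRetry(String),
}

/// Selective policy: retry only server errors and "too many requests";
/// treat every other non-2xx status as a permanent failure.
fn classify_status(status: u16) -> Verdict {
    match status {
        200..=299 => Verdict::Successful,
        429 => Verdict::Retry("too many requests".to_string()),
        500..=599 => Verdict::Retry(format!("server error {}", status)),
        other => Verdict::DontRetry(format!("unexpected status {}", other)),
    }
}
```

Carrying a reason string in the `Retry` and `DontRetry` cases keeps the decision and its explanation together, which is one way to make error reporting consistent across sinks.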

@github-actions

Soak Test Results

Baseline: 3210771
Comparison: 4e1e889
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
http_pipelines_no_grok_blackhole 1.4MiB 8.49 100.00% 16.44MiB 1.29MiB 27.04KiB 0 0.0786202 17.84MiB 1.48MiB 30.79KiB 0 0.0827329 False False
splunk_transforms_splunk3 497.3KiB 3.9 100.00% 12.44MiB 2.16MiB 45.08KiB 0 0.173477 12.93MiB 2.19MiB 45.88KiB 0 0.169527 False False
splunk_hec_route_s3 694.4KiB 3.59 100.00% 18.91MiB 2.23MiB 46.68KiB 0 0.118033 19.59MiB 2.18MiB 45.66KiB 0 0.111474 False False
http_pipelines_blackhole_acks 152.55KiB 3.5 100.00% 4.26MiB 578.77KiB 11.83KiB 0 0.1326 4.41MiB 586.3KiB 11.98KiB 0 0.129788 False False
datadog_agent_remap_blackhole_acks 1.77MiB 2.94 100.00% 60.33MiB 3.2MiB 66.66KiB 0 0.0529718 62.1MiB 2.1MiB 43.81KiB 0 0.0337704 False False
syslog_log2metric_humio_metrics 401.32KiB 2.82 100.00% 13.89MiB 312.48KiB 6.37KiB 0 0.0219632 14.28MiB 167.8KiB 3.43KiB 0 0.0114708 False False
syslog_loki 408.32KiB 2.63 100.00% 15.18MiB 523.08KiB 10.66KiB 0 0.0336517 15.58MiB 259.31KiB 5.3KiB 0 0.016255 False False
socket_to_socket_blackhole 346.7KiB 2.48 100.00% 13.66MiB 559.52KiB 11.4KiB 0 0.0399945 14.0MiB 410.67KiB 8.38KiB 0 0.0286447 False False
syslog_log2metric_splunk_hec_metrics 422.35KiB 2.47 100.00% 16.73MiB 1.07MiB 22.28KiB 0 0.0639507 17.14MiB 1.04MiB 21.67KiB 0 0.0605989 False False
syslog_splunk_hec_logs 386.21KiB 2.41 100.00% 15.68MiB 1.21MiB 25.27KiB 0 0.0770958 16.06MiB 1.26MiB 26.27KiB 0 0.0781932 False False
syslog_humio_logs 412.6KiB 2.4 100.00% 16.78MiB 125.2KiB 2.57KiB 0 0.00728505 17.18MiB 126.21KiB 2.58KiB 0 0.00717123 False False
syslog_regex_logs2metric_ddmetrics 207.0KiB 1.73 100.00% 11.71MiB 1.0MiB 20.91KiB 0 0.0856348 11.91MiB 1.04MiB 21.62KiB 0 0.0871245 False False
datadog_agent_remap_datadog_logs_acks 909.97KiB 1.38 100.00% 64.22MiB 1.21MiB 25.34KiB 0 0.0188576 65.11MiB 2.93MiB 61.36KiB 0 0.0450646 False False
datadog_agent_remap_datadog_logs 652.95KiB 1 100.00% 63.95MiB 2.27MiB 47.43KiB 0 0.0355096 64.59MiB 2.88MiB 60.35KiB 0 0.0446583 False False
http_pipelines_blackhole 34.67KiB 0.82 88.60% 4.13MiB 714.63KiB 14.64KiB 0 0.168929 4.16MiB 801.62KiB 16.33KiB 0 0.187951 False False
datadog_agent_remap_blackhole 380.49KiB 0.59 100.00% 63.5MiB 2.3MiB 48.32KiB 0 0.0362878 63.87MiB 2.6MiB 54.36KiB 0 0.0406303 False False
http_to_http_acks 105.73KiB 0.58 34.60% 17.79MiB 8.09MiB 169.14KiB 0 0.454672 17.89MiB 7.87MiB 164.44KiB 0 0.439672 True True
http_to_http_noack 85.48KiB 0.35 99.99% 23.76MiB 1023.72KiB 20.85KiB 0 0.042064 23.85MiB 244.58KiB 4.99KiB 0 0.0100143 False False
fluent_elasticsearch 89.18KiB 0.11 95.77% 79.39MiB 2.13MiB 43.89KiB 0 0.0268443 79.47MiB 53.9KiB 1.08KiB 0 0.000662197 False False
http_to_http_json 2.86KiB 0.01 22.18% 23.84MiB 357.77KiB 7.31KiB 0 0.0146525 23.84MiB 346.17KiB 7.07KiB 0 0.0141756 False False
splunk_hec_to_splunk_hec_logs_noack 46.01B 0 0.37% 23.84MiB 331.68KiB 6.79KiB 0 0.0135857 23.84MiB 333.7KiB 6.83KiB 0 0.0136683 False False
splunk_hec_indexer_ack_blackhole -10.47KiB -0.04 36.20% 23.77MiB 754.4KiB 15.35KiB 0 0.030991 23.76MiB 791.7KiB 16.1KiB 0 0.0325373 False False
splunk_hec_to_splunk_hec_logs_acks -12.96KiB -0.05 42.28% 23.77MiB 805.25KiB 16.38KiB 0 0.0330793 23.75MiB 811.61KiB 16.51KiB 0 0.0333584 False False
file_to_blackhole -143.9KiB -0.15 89.17% 95.32MiB 3.06MiB 63.64KiB 0 0.0321236 95.18MiB 3.05MiB 63.05KiB 0 0.0320668 False False

@neuronull
Contributor Author

> I'm also curious about this behavior across other HTTP-based sinks? We should be retrying these errors for all sinks.

> That's not an exhaustive analysis, but it seems that many of the sinks retry all HttpErrors rather than being selective, as was suggested in the comments here in this PR...

> Perhaps HttpRetryLogic should not retry everything (and should instead adopt something like what is in this PR), and perhaps all the HTTP-based sinks should use HttpRetryLogic (at least for http::HttpErrors)?

That sounds like it is outside the scope of this issue. Jesse has referenced this in #10870.

tobz
tobz previously requested changes Jun 17, 2022
Contributor

@tobz tobz left a comment

Good incremental change/cleanup, just a few comments.

src/sinks/datadog/metrics/service.rs (outdated, resolved)
src/sinks/datadog/mod.rs (outdated, resolved)
@neuronull neuronull requested a review from tobz June 17, 2022 17:28
@github-actions

Soak Test Results

Baseline: ce08bc8
Comparison: 2a64ffe
Total Vector CPUs: 4

Changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

experiment Δ mean Δ mean % confidence
http_pipelines_no_grok_blackhole 1.79MiB 10.48 100.00%
Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
http_pipelines_no_grok_blackhole 1.79MiB 10.48 100.00% 17.09MiB 1.3MiB 27.1KiB 0 0.0758043 18.89MiB 1.42MiB 29.67KiB 0 0.0752884 False False
syslog_log2metric_humio_metrics 569.24KiB 4.09 100.00% 13.59MiB 463.55KiB 9.45KiB 0 0.0333096 14.14MiB 375.66KiB 7.69KiB 0 0.0259333 False False
splunk_hec_route_s3 688.77KiB 3.48 100.00% 19.35MiB 2.23MiB 46.57KiB 0 0.115115 20.02MiB 2.17MiB 45.44KiB 0 0.108536 False False
http_pipelines_blackhole_acks 143.03KiB 3.23 100.00% 4.33MiB 549.83KiB 11.25KiB 0 0.123969 4.47MiB 560.46KiB 11.45KiB 0 0.122417 False False
http_pipelines_blackhole 129.36KiB 2.99 100.00% 4.23MiB 626.11KiB 12.83KiB 0 0.14455 4.36MiB 637.38KiB 13.0KiB 0 0.142884 False False
syslog_splunk_hec_logs 490.67KiB 2.9 100.00% 16.54MiB 287.77KiB 5.87KiB 0 0.0169867 17.02MiB 328.46KiB 6.72KiB 0 0.0188431 False False
socket_to_socket_blackhole 389.6KiB 2.77 100.00% 13.74MiB 564.18KiB 11.49KiB 0 0.0401023 14.12MiB 552.07KiB 11.25KiB 0 0.0381837 False False
syslog_humio_logs 422.98KiB 2.53 100.00% 16.34MiB 545.8KiB 11.19KiB 0 0.0326066 16.76MiB 615.71KiB 12.6KiB 0 0.0358765 False False
syslog_log2metric_splunk_hec_metrics 435.47KiB 2.51 100.00% 16.96MiB 926.69KiB 18.86KiB 0 0.0533463 17.39MiB 856.09KiB 17.44KiB 0 0.0480765 False False
datadog_agent_remap_blackhole_acks 1.24MiB 1.87 100.00% 65.98MiB 2.61MiB 54.56KiB 0 0.0396047 67.21MiB 1.97MiB 41.23KiB 0 0.0293588 False False
syslog_regex_logs2metric_ddmetrics 177.77KiB 1.42 100.00% 12.22MiB 571.27KiB 11.64KiB 0 0.045631 12.4MiB 660.03KiB 13.44KiB 0 0.0519827 False False
datadog_agent_remap_blackhole 827.73KiB 1.3 100.00% 62.06MiB 4.1MiB 85.97KiB 0 0.0660509 62.87MiB 3.79MiB 79.24KiB 0 0.0602008 False False
datadog_agent_remap_datadog_logs_acks 608.97KiB 0.9 100.00% 65.85MiB 1.8MiB 37.74KiB 0 0.0273979 66.44MiB 3.08MiB 64.36KiB 0 0.0463128 False False
datadog_agent_remap_datadog_logs 578.23KiB 0.87 100.00% 64.92MiB 3.91MiB 81.75KiB 0 0.0602846 65.48MiB 4.28MiB 89.68KiB 0 0.0654006 False False
syslog_loki 124.04KiB 0.79 100.00% 15.37MiB 511.59KiB 10.43KiB 0 0.0324875 15.5MiB 251.91KiB 5.14KiB 0 0.0158718 False False
http_to_http_noack 121.49KiB 0.5 100.00% 23.73MiB 1.16MiB 24.23KiB 0 0.0489828 23.85MiB 244.85KiB 5.0KiB 0 0.0100246 False False
fluent_elasticsearch 47.0KiB 0.06 82.11% 79.43MiB 1.7MiB 34.94KiB 0 0.0213519 79.47MiB 53.86KiB 1.08KiB 0 0.000661747 False False
http_to_http_json -897.69B -0 6.78% 23.84MiB 354.94KiB 7.25KiB 0 0.0145365 23.84MiB 358.45KiB 7.32KiB 0 0.0146805 False False
splunk_hec_to_splunk_hec_logs_noack 584.27B 0 4.80% 23.84MiB 325.04KiB 6.66KiB 0 0.0133141 23.84MiB 330.3KiB 6.76KiB 0 0.0135293 False False
splunk_hec_indexer_ack_blackhole -272.0B -0 1.00% 23.77MiB 734.81KiB 14.96KiB 0 0.0301803 23.77MiB 732.94KiB 14.92KiB 0 0.0301038 False False
splunk_hec_to_splunk_hec_logs_acks -15.23KiB -0.06 50.02% 23.77MiB 761.47KiB 15.5KiB 0 0.0312783 23.75MiB 807.15KiB 16.41KiB 0 0.0331756 False False
file_to_blackhole -258.32KiB -0.26 99.32% 95.36MiB 3.08MiB 64.11KiB 0 0.0323331 95.11MiB 3.42MiB 70.6KiB 0 0.0359475 False False
http_to_http_acks -218.15KiB -1.18 65.14% 18.12MiB 7.99MiB 167.1KiB 0 0.441147 17.91MiB 7.75MiB 161.95KiB 0 0.432637 True True

Contributor

@tobz tobz left a comment

Looks good to me. Left some non-blocking nits.

src/sinks/datadog/metrics/service.rs (outdated, resolved)
src/sinks/datadog/mod.rs (outdated, resolved)
src/sinks/datadog/logs/service.rs (outdated, resolved)
Member

@jszwedko jszwedko left a comment

Looks good! I agree with Toby's comments including that none are blocking.

src/sinks/datadog/logs/service.rs (outdated, resolved)
src/sinks/datadog/mod.rs (outdated, resolved)
@github-actions

Soak Test Results

Baseline: 91d88e8
Comparison: 3e03766
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
http_text_to_http_json 1.09MiB 2.82 100.00% 38.73MiB 796.84KiB 16.27KiB 0 0.0200895 39.82MiB 775.75KiB 15.83KiB 0 0.0190214 False False
datadog_agent_remap_blackhole_acks 923.04KiB 1.36 100.00% 66.27MiB 5.19MiB 108.11KiB 0 0.0783104 67.17MiB 3.61MiB 75.53KiB 0 0.0536765 False False
syslog_regex_logs2metric_ddmetrics 114.02KiB 0.94 100.00% 11.9MiB 696.02KiB 14.17KiB 0 0.0570975 12.01MiB 625.02KiB 12.74KiB 0 0.0507977 False False
syslog_humio_logs 143.17KiB 0.87 100.00% 15.99MiB 468.24KiB 9.56KiB 0 0.0285846 16.13MiB 529.2KiB 10.84KiB 0 0.0320264 False False
http_pipelines_blackhole_acks 7.03KiB 0.6 98.06% 1.15MiB 118.75KiB 2.42KiB 0 0.100935 1.16MiB 87.83KiB 1.79KiB 0 0.0742076 False False
splunk_hec_route_s3 114.84KiB 0.59 89.14% 18.93MiB 2.48MiB 51.68KiB 0 0.131145 19.04MiB 2.37MiB 49.49KiB 0 0.124381 False False
socket_to_socket_blackhole 56.67KiB 0.41 100.00% 13.5MiB 349.29KiB 7.13KiB 0 0.025263 13.55MiB 365.64KiB 7.46KiB 0 0.0263379 False False
datadog_agent_remap_blackhole 269.69KiB 0.4 97.53% 66.12MiB 4.51MiB 93.85KiB 0 0.0681289 66.39MiB 3.59MiB 74.89KiB 0 0.0540788 False False
splunk_hec_to_splunk_hec_logs_noack 5.47KiB 0.02 40.41% 23.83MiB 383.87KiB 7.84KiB 0 0.0157267 23.84MiB 328.15KiB 6.7KiB 0 0.0134407 False False
splunk_hec_indexer_ack_blackhole -7.33KiB -0.03 22.19% 23.75MiB 893.26KiB 18.17KiB 0 0.0367223 23.74MiB 914.01KiB 18.59KiB 0 0.0375863 False False
file_to_blackhole -46.18KiB -0.05 38.26% 95.36MiB 2.96MiB 61.3KiB 0 0.031003 95.31MiB 3.32MiB 69.18KiB 0 0.0348714 False False
splunk_hec_to_splunk_hec_logs_acks -17.53KiB -0.07 50.54% 23.75MiB 854.33KiB 17.38KiB 0 0.0351233 23.73MiB 929.26KiB 18.89KiB 0 0.0382311 False False
http_to_http_json -41.6KiB -0.17 99.85% 23.84MiB 340.61KiB 6.95KiB 0 0.0139465 23.8MiB 545.15KiB 11.12KiB 0 0.0223601 False False
datadog_agent_remap_datadog_logs_acks -106.27KiB -0.17 70.60% 62.68MiB 2.42MiB 50.59KiB 0 0.0385403 62.58MiB 4.21MiB 87.69KiB 0 0.0672933 False False
http_to_http_noack -59.46KiB -0.24 99.15% 23.83MiB 521.0KiB 10.64KiB 0 0.0213483 23.77MiB 978.26KiB 19.93KiB 0 0.0401829 False False
fluent_elasticsearch -214.49KiB -0.26 100.00% 79.47MiB 52.9KiB 1.07KiB 0 0.000649939 79.26MiB 1.83MiB 37.57KiB 0 0.0230442 False False
syslog_loki -55.2KiB -0.36 99.89% 15.09MiB 331.47KiB 6.79KiB 0 0.0214448 15.04MiB 761.88KiB 15.49KiB 0 0.0494675 False False
http_pipelines_blackhole -6.39KiB -0.37 98.64% 1.66MiB 40.53KiB 847.46B 0 0.0237697 1.66MiB 120.19KiB 2.45KiB 0 0.0707564 False False
syslog_log2metric_humio_metrics -53.27KiB -0.38 99.97% 13.78MiB 406.43KiB 8.3KiB 0 0.0287999 13.73MiB 602.04KiB 12.26KiB 0 0.0428223 False False
syslog_splunk_hec_logs -71.18KiB -0.43 99.79% 16.17MiB 860.21KiB 17.51KiB 0 0.0519423 16.1MiB 737.89KiB 15.06KiB 0 0.0447488 False False
datadog_agent_remap_datadog_logs -340.56KiB -0.5 99.97% 67.07MiB 338.25KiB 6.92KiB 0 0.00492378 66.74MiB 4.47MiB 93.14KiB 0 0.0670029 False False
syslog_log2metric_splunk_hec_metrics -193.91KiB -1.1 100.00% 17.19MiB 1.02MiB 21.31KiB 0 0.0594053 17.01MiB 1.16MiB 24.12KiB 0 0.0680297 False False
http_pipelines_no_grok_blackhole -162.58KiB -1.36 100.00% 11.7MiB 356.77KiB 7.28KiB 0 0.0297617 11.55MiB 1.2MiB 24.97KiB 0 0.103898 False False
http_to_http_acks -340.57KiB -1.88 85.78% 17.69MiB 7.94MiB 165.99KiB 0 0.448763 17.36MiB 7.76MiB 162.13KiB 0 0.447296 True True

Member

@jszwedko jszwedko left a comment

Nice work!

@neuronull neuronull merged commit f45171b into master Jun 23, 2022
@neuronull neuronull deleted the neuronull/datadog_logs_sink_http_retries branch June 23, 2022 17:07
Labels
  • ci-condition: integration tests enable (Run integration tests on this PR)
  • domain: sinks (Anything related to the Vector's sinks)
  • sink: datadog_logs (Anything `datadog_logs` sink related)
  • sink: datadog_metrics (Anything `datadog_metrics` sink related)
Development

Successfully merging this pull request may close these issues.

The datadog_logs does not retry requests due to aborted connections
8 participants