Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify batch_size terminology #916

Closed
lukesteensen opened this issue Sep 23, 2019 · 2 comments · Fixed by #1493
Closed

Clarify batch_size terminology #916

lukesteensen opened this issue Sep 23, 2019 · 2 comments · Fixed by #1493
Labels
domain: config Anything related to configuring Vector meta: idea Anything in the idea phase. Needs further discussion and consensus before work can begin.

Comments

@lukesteensen
Copy link
Member

We use batch_size as a common configuration value, but it means different things in different contexts. For some sinks, it means a number of events and for others, it means a total byte size of the request body.

We should audit our sink configs and make sure that this terminology isn't confusing. Depending on the sink, it may be better to rename some of these uses to something along the lines of max_request_body or similar.

@lukesteensen lukesteensen added meta: idea Anything in the idea phase. Needs further discussion and consensus before work can begin. domain: config Anything related to configuring Vector labels Sep 23, 2019
@LucioFranco
Copy link
Contributor

Here is a list of sinks that do not use bytes as the unit for batch_size:

  • aws_cloudwatch_logs
  • aws_cloudwatch_metrics
  • aws_kinesis_firehose
  • aws_kinesis_streams
  • datadog_metrics

These sinks each use a specialized buffer type for their batching. Because of this their batch_size refers to the amount of batched events.

I suggest we breakout out batching config types into two separate ones and rename the size option to max_size.

struct BatchBytesConfig {
	max_size: Option<usize>,
	timeout_secs: Option<u64>,
}

and for the sinks mentioned above we can have something like this:

struct BatchEventsConfig {
	max_events: Option<usize>,
	timeout_secs: Option<u64>,
}

The end goal here is that sinks that use a size based batcher can use BatchBytesConfig and those that batch based on events can use BatchEventsConfig.

This in the end would make the batch configs look like so:

[sink.out]
type = "aws_s3"

[sink.out.batch]
max_size = 1024

and for an event based batcher

[sink.out]
type = "aws_cloudwatch_logs"

[sink.out.batch]
max_events = 1000

cc @lukesteensen @binarylogic @a-rodin @Jeffail

@binarylogic
Copy link
Contributor

I'm on board with your examples 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: config Anything related to configuring Vector meta: idea Anything in the idea phase. Needs further discussion and consensus before work can begin.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants