Improve Vector's throughput for UDP protocols #8518

Open
jszwedko opened this issue Jul 29, 2021 · 2 comments
Labels
source: socket Anything `socket` source related source: syslog Anything `syslog` source related type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@jszwedko
Member

jszwedko commented Jul 29, 2021

Currently we process UDP packets serially in the syslog source when `mode` is set to `udp`:

https://github.com/timberio/vector/blob/410fac0b7fbb0c7361056105ba362f7ee6c112ca/src/sources/syslog.rs#L422-L444

It seems like we should be able to improve throughput by introducing concurrency here, given that processing each packet involves some decoding and parsing work.

This is also true for the socket source:

Perhaps one worker per "connection" similar to the tcp mode where we could use a source ip/port pair to partition them; or just by having a fixed (configurable?) number of workers processing packets.
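As a rough illustration of the partitioning idea above, here is a minimal std-only sketch, not Vector's actual code. It assumes a fixed worker count and hashes each datagram's source address to pick a worker, so packets from one source ip/port pair stay in order on one worker. The names `partition_for`, `parse_packet`, and `process_partitioned` are hypothetical, and `parse_packet` is only a stand-in for the real decoding and parsing work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: map a datagram's source address to one of `workers` partitions.
fn partition_for(source: &SocketAddr, workers: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    (hasher.finish() % workers as u64) as usize
}

/// Stand-in for the per-packet decoding and parsing work (assumption, not Vector's parser).
fn parse_packet(payload: &[u8]) -> String {
    String::from_utf8_lossy(payload).trim().to_string()
}

/// Fan packets out to `workers` threads keyed by source address and
/// return the parsed events collected by each worker.
fn process_partitioned(
    packets: Vec<(SocketAddr, Vec<u8>)>,
    workers: usize,
) -> Vec<Vec<String>> {
    assert!(workers > 0, "at least one worker is required");
    let mut senders = Vec::with_capacity(workers);
    let mut handles = Vec::with_capacity(workers);

    for _ in 0..workers {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // Each worker parses its own queue until every sender is dropped.
            let events: Vec<String> = rx.iter().map(|p| parse_packet(&p)).collect();
            events
        }));
    }

    // The receive loop only routes; parsing happens on the workers.
    for (source, payload) in packets {
        let idx = partition_for(&source, workers);
        senders[idx].send(payload).expect("worker hung up");
    }
    drop(senders); // close the channels so the workers can finish

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

In a real source the `packets` vector would be replaced by the socket receive loop, and the worker count would presumably come from configuration.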

@jszwedko jszwedko added source: syslog Anything `syslog` source related type: enhancement A value-adding code change that enhances its existing functionality. labels Jul 29, 2021
@jszwedko jszwedko added the source: socket Anything `socket` source related label Aug 25, 2021
@jszwedko jszwedko changed the title from "Improve syslog source udp throughput" to "Improve Vector's throughput for UDP protocols" Feb 25, 2022
@hhromic
Contributor

hhromic commented Feb 25, 2022

> Perhaps one worker per "connection" similar to the tcp mode where we could use a source ip/port pair to partition them; or just by having a fixed (configurable?) number of workers processing packets.

The idea of using the (source ip/port) tuple to mark a "connection" or "session" is indeed common in many applications. However, in my experience at my company, where we deal with a large number of observability data feeds, many of those systems send data from a single source ip/port that rarely rotates. So parallelising by this tuple won't do much in these cases, unfortunately.

Mentioning that for your consideration on the design 👍
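To make the concern concrete, here is a small sketch comparing how many packets each worker receives under the two strategies. The function names `load_by_source` and `load_round_robin` are hypothetical and the code only counts assignments; it does no real packet processing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Per-worker packet counts when each packet is routed by a hash of its source address.
fn load_by_source(sources: &[SocketAddr], workers: usize) -> Vec<usize> {
    let mut counts = vec![0; workers];
    for source in sources {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        counts[(hasher.finish() % workers as u64) as usize] += 1;
    }
    counts
}

/// Per-worker packet counts when packets are dealt out in turn, ignoring the source.
fn load_round_robin(packet_count: usize, workers: usize) -> Vec<usize> {
    let mut counts = vec![0; workers];
    for i in 0..packet_count {
        counts[i % workers] += 1;
    }
    counts
}
```

With eight packets that all come from the same ip/port and four workers, `load_by_source` puts all eight on a single worker, while `load_round_robin` gives each worker two. The trade-off is that a source-agnostic scheme like round-robin no longer guarantees that packets from one sender are processed in arrival order.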

@davidpellcb

We're interested in potentially replacing the dogstatsd portion of Datadog Agent with a Vector agent dedicated to receiving statsd application metrics, due to performance issues we've had with dogstatsd under load. It would be nice to see if this provided better performance once optimized!
