Enable preserving event order between `vector` source and sink #13845
Comments
Hi @PerfectDay20! I think what is happening here is that the `vector` sink sends multiple requests concurrently, so batches can arrive, and be written, out of order. Let me know if that makes sense! This is intended behavior of that source and sink, but I can see the use case for preserving order, so we can repurpose this issue as a feature request.
Thanks, @jszwedko! This makes sense, and I tried with …
@jszwedko I suggest putting information about this scenario somewhere in the documentation. We also ran into the same problem with changed event order in Vector. Right now it is not clear enough what should be configured, and how, to get the desired result (preserving event order). Ideally a dedicated setting like `preserve_event_order = true/false` could help here and would be much more understandable. However, I think we can start with a piece of documentation.
👍 agreed. A …
Hi guys! My setup involves sending data through Vector with HTTPS and certificate authentication, using the following configuration: file source to vector sink, and then vector source to file sink. I only encounter reordering issues when the internet connectivity becomes exceptionally poor. It's worth noting that my Vector agent (sender) and aggregator (receiver) are geographically distant from each other, which could be contributing to the problem.

With the concurrency set to 1, the throughput is approximately 7 times slower than with "adaptive" (default) concurrency, which is not an acceptable trade-off for my use case. This situation prompts me to wonder whether the Vector protocol is adequately optimized for situations characterized by high latency and unstable internet connections.

I'm interested in knowing if you have any recommended design solutions or best practices that could facilitate high-volume, ordered delivery over long distances. Ideally, I would prefer to maintain the current setup with adaptive concurrency, which operates effectively, and implement a transform akin to `dedup`, but designed for ensuring order consistency.
Hey! It's expected that you would see much lower throughput with a concurrency of `1`. …
@jszwedko does this mean that with a concurrency of `1` the buffers won't fill up?
No, the buffers will still fill up as normal; they will just egress Vector one request at a time.
So the next batch might be sent before the acknowledgement is received for the previous one?
I believe retries are taken into account: that is, the next request won't be sent until the previous one is accepted. There will only be one batch in flight, but the in-memory buffers can still queue up events.
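To summarize the workaround discussed in this thread: pinning the sink's request concurrency to `1` serializes requests at the cost of throughput. A minimal sketch for a `vector` sink; the component names and address are hypothetical:

```toml
# Hypothetical sender-side sink pinned to one in-flight request so that
# batches egress Vector in order. Events still queue in the in-memory
# buffer as usual; only the number of concurrent requests changes.
[sinks.to_aggregator]
type = "vector"
inputs = ["my_source"]                  # hypothetical upstream component
address = "aggregator.example.com:6000"
request.concurrency = 1                 # one request in flight at a time
```

With a single request in flight, a retried batch also cannot be overtaken by a newer one, which matches the behavior described above.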
Problem
During a performance test between two Vector instances on two machines in the same DC, I found the data in the received file is disordered.
Machine1 runs an `http` source feeding a local `file` sink and a `vector` sink; Machine2 runs a `vector` source feeding a `file` sink.
The data file is 956 MB, with each line prefixed with a line number: 0, 1, 2, …
In the received file, the line numbers are disordered.
The client that writes to the HTTP source is a simple Java method:
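The original snippet was not captured here; a minimal sketch of such a client, assuming one numbered line is POSTed per request via `HttpURLConnection` (the class name and endpoint are hypothetical):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OrderedLineClient {
    // Hypothetical endpoint of the Vector `http` source on machine1.
    private static final String ENDPOINT = "http://machine1:8080/";

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1_000_000; i++) {
            // One numbered line per request: "<line number> <500-char payload>".
            post(i + " " + "a842a1434a".repeat(50));
        }
    }

    private static void post(String line) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
        }
        // Read the status so the request completes before the next line is sent.
        conn.getResponseCode();
        conn.disconnect();
    }
}
```

Note that even if each request completes before the next one starts, the downstream `vector` sink batches and sends events on its own schedule, so client-side ordering alone does not guarantee ordering at the final sink.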
The file written by machine1's file sink is ordered, while the file written by machine2 is ordered in some tests and disordered in others.
At first, I thought this might be caused by the Vector sink's concurrent sending and retries. But when I disabled retries with `request.retry_attempts = 0`, the file was still complete but the data was still disordered, so I assume this is not caused by failed requests and retries. I read through the docs and searched the issues but found no guarantees about data order, so I wonder what the cause of the disorder is. Is this the expected behavior?
Configuration
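The original configuration was not captured here. A minimal sketch of the topology described above; all component names, addresses, and paths are hypothetical:

```toml
# Machine1 (sender): HTTP source fanned out to a local file sink and a
# vector sink pointed at machine2.
[sources.http_in]
type = "http"
address = "0.0.0.0:8080"

[sinks.local_copy]
type = "file"
inputs = ["http_in"]
path = "/tmp/machine1.log"
encoding.codec = "text"

[sinks.to_machine2]
type = "vector"
inputs = ["http_in"]
address = "machine2:6000"
```

```toml
# Machine2 (receiver): vector source feeding a file sink.
[sources.from_machine1]
type = "vector"
address = "0.0.0.0:6000"

[sinks.received]
type = "file"
inputs = ["from_machine1"]
path = "/tmp/machine2.log"
encoding.codec = "text"
```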
Version
vector 0.23.0 (x86_64-unknown-linux-gnu 38c2435 2022-07-11)
Example Data
(line number + space + long text)
0 a842a1434a... (500 chars)
1 a842a1434a...
2 a842a1434a...
3 a842a1434a...
4 a842a1434a...
5 a842a1434a...
6 a842a1434a...
7 a842a1434a...
8 a842a1434a...
9 a842a1434a...