Socket source with bytes framing and TCP mode only sends data after connection is closed #17136
Hello, thanks for providing these details and reproducible configs. I think what is going on here (both for the `socket` source case and the `socket` sink case) is that when using the
meaning that these boundaries haven't been crossed in your example cases. To illustrate this, if the framing of the That is because the socket sink now knows when to cut the boundary for an event. With the framing method set to bytes, even though the HTTP source received one event, the I/O stream is still open as far as the socket sink is concerned. This also supports @tobz's observation in the Discord thread about netcat behavior (when only using the Receiving config).
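A minimal model of the behaviour described above, as a sketch rather than Vector's actual decoder: `frame_bytes` and `frame_newline` are hypothetical helpers that take the chunks read from one TCP connection and return the events a framer would emit. The sketch assumes, as observed in this issue, that bytes framing only emits when the stream ends, while newline-delimited framing emits at every `\n` regardless of how the bytes were split across reads.

```rust
/// Hypothetical model of bytes framing over a stream: nothing is emitted
/// until EOF, and then everything read from the connection is one event.
fn frame_bytes(reads: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut buf = Vec::new();
    for chunk in reads {
        buf.extend_from_slice(chunk);
    }
    if buf.is_empty() {
        Vec::new()
    } else {
        vec![buf]
    }
}

/// Hypothetical model of newline-delimited framing: one event per '\n',
/// independent of where the read boundaries fall.
fn frame_newline(reads: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut events = Vec::new();
    let mut pending = Vec::new();
    for chunk in reads {
        for &b in chunk.iter() {
            if b == b'\n' {
                events.push(std::mem::take(&mut pending));
            } else {
                pending.push(b);
            }
        }
    }
    // This sketch chooses to flush a trailing partial line at EOF.
    if !pending.is_empty() {
        events.push(pending);
    }
    events
}

fn main() {
    // The three payloads from the issue's example data, read as separate chunks.
    let reads: [&[u8]; 3] = [b"abc", b"123", b"ab"];
    for event in frame_bytes(&reads) {
        println!("bytes:   {}", String::from_utf8_lossy(&event));
    }
    // The same payloads, newline-terminated and split at arbitrary points.
    let nl_reads: [&[u8]; 2] = [b"abc\n12", b"3\nab\n"];
    for event in frame_newline(&nl_reads) {
        println!("newline: {}", String::from_utf8_lossy(&event));
    }
}
```

With the issue's example data, `frame_bytes` yields the single event `abc123ab`, matching the concatenated `message` reported in the issue, while the newline-terminated variant yields three events.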
Hi @dekelpilli, have you had a chance to review my previous comment?
Hi @neuronull, I still believe this is not how the source should behave. My understanding is that the "underlying I/O's boundary" should just be a single TCP packet, rather than a full connection. Where the structure of a TCP packet is This is my understanding of "split between messages or stream segments", rather than splitting messages when the connection is closed. I also believe this is far more useful: if for TCP we take the I/O boundary to mean the start of a connection until the end of that connection, then bytes framing on TCP is not a very useful feature (as it would require breaking and recreating the connection to get messages to progress).
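TCP delivers a byte stream rather than discrete messages, so the boundaries a receiver sees between reads are not guaranteed to match the sender's writes. The simulation below is a sketch, not Vector code: `segment`, `frame_per_read`, and `frame_newline` are hypothetical helpers, and the two cut points are made up to stand in for different packet splits of the same bytes.

```rust
/// Split one byte stream at the given (increasing) offsets, simulating how the
/// same bytes might be delivered in differently sized reads/segments.
fn segment<'a>(data: &'a [u8], cuts: &[usize]) -> Vec<&'a [u8]> {
    let mut out = Vec::new();
    let mut start = 0;
    for &cut in cuts {
        out.push(&data[start..cut]);
        start = cut;
    }
    out.push(&data[start..]);
    out
}

/// Hypothetical "one event per read" framing.
fn frame_per_read(reads: &[&[u8]]) -> Vec<String> {
    reads
        .iter()
        .filter(|r| !r.is_empty())
        .map(|r| String::from_utf8_lossy(r).into_owned())
        .collect()
}

/// Newline-delimited framing over the whole stream; empty lines are skipped
/// in this sketch.
fn frame_newline(reads: &[&[u8]]) -> Vec<String> {
    let joined: Vec<u8> = reads.concat();
    joined
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| String::from_utf8_lossy(line).into_owned())
        .collect()
}

fn main() {
    let data = b"abc\n123\nab\n";
    // Same bytes, two different simulated segmentations.
    let a = segment(data, &[4, 8]); // "abc\n" | "123\n" | "ab\n"
    let b = segment(data, &[2, 9]); // "ab" | "c\n123\na" | "b\n"
    println!("per-read a: {:?}", frame_per_read(&a));
    println!("per-read b: {:?}", frame_per_read(&b));
    println!("newline a:  {:?}", frame_newline(&a));
    println!("newline b:  {:?}", frame_newline(&b));
}
```

Framing per read produces different events for the two splits, while the newline-delimited framer returns `abc`, `123`, `ab` both times.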
I completely agree with you that the Maybe it'd be better to step back to understand your use case a bit better. Would you be able to describe it? Answering questions like: what is sending the data? What is the data? What processing are you hoping to do in Vector? That would help us understand how it could be best modeled in Vector (including if it would make sense for the
Our own (thin) service that is receiving UDP messages and streaming them to Vector. At the moment, it's also adding newlines to the end of the messages as we're using
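For a relay like the one described above, a minimal sketch of the UDP side might look like this. `relay_one` is a hypothetical helper, not part of any existing service or of Vector. It assumes payloads never contain `\n` themselves (a payload that does would be split into several events by a newline-delimited framer), and it writes to any `std::io::Write`, so a `std::net::TcpStream` connected to Vector's `socket` source could be passed as `out`.

```rust
use std::io::{self, Write};
use std::net::UdpSocket;

/// Receive one UDP datagram and forward it to `out` with a trailing newline,
/// so a downstream newline-delimited framer sees one event per datagram.
/// Assumption: payloads contain no '\n' of their own.
fn relay_one<W: Write>(udp: &UdpSocket, out: &mut W) -> io::Result<usize> {
    // Large enough for any UDP payload; larger datagrams would be truncated.
    let mut buf = [0u8; 65_535];
    let (n, _from) = udp.recv_from(&mut buf)?;
    out.write_all(&buf[..n])?;
    out.write_all(b"\n")?;
    out.flush()?;
    Ok(n)
}

fn main() -> io::Result<()> {
    let listener = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    sender.send_to(b"hello", listener.local_addr()?)?;

    // A Vec<u8> stands in for the TCP stream to Vector in this demo.
    let mut out: Vec<u8> = Vec::new();
    relay_one(&listener, &mut out)?;
    println!("{:?}", String::from_utf8_lossy(&out));
    Ok(())
}
```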
Various application and device logs, which go through some transforms in Vector before ending up in storage/Loki sinks. At the moment, this issue isn't impacting us very much because we have already implemented a workaround. That said, I do believe that if the current behaviour of the
Thanks for the additional details, @dekelpilli! Using the newline-delimited framing for that case makes sense in light of the known issues for UDP (which would be nice to fix). I agree that the connection-based framing is not likely to be used much, but it is at least something senders using TCP have more control over than TCP packet splitting. That is, they can open a connection, send an "event", and close it, but they can't guarantee that a single TCP packet will carry a single event, since TCP packets can be split in transit. Given the above discussion, I'll close this issue, but I appreciate you raising it!
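The connection-per-event pattern mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not Vector's implementation: `send_event` opens one TCP connection per event and closes it, and `receive_events` is a hypothetical toy receiver that, like the bytes-framed source in this issue, treats everything read up to end-of-stream as one event.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Sender side: one TCP connection per event. Dropping the stream closes it,
/// and that close is the boundary a bytes-framed receiver waits for.
fn send_event(addr: SocketAddr, payload: &[u8]) -> std::io::Result<()> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(payload)?;
    Ok(()) // `stream` is dropped here, closing the connection
}

/// Toy receiver: accept `n` connections and read each one to EOF as one event.
fn receive_events(listener: TcpListener, n: usize) -> std::io::Result<Vec<Vec<u8>>> {
    let mut events = Vec::new();
    for stream in listener.incoming().take(n) {
        let mut buf = Vec::new();
        stream?.read_to_end(&mut buf)?;
        events.push(buf);
    }
    Ok(events)
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let receiver = thread::spawn(move || receive_events(listener, 3));

    let payloads: [&[u8]; 3] = [b"abc", b"123", b"ab"];
    for payload in payloads {
        send_event(addr, payload)?;
    }

    let events = receiver.join().expect("receiver thread panicked")?;
    for event in &events {
        println!("{}", String::from_utf8_lossy(event));
    }
    Ok(())
}
```

Each payload arrives as its own event because the connection close marks the boundary; the cost is a new TCP handshake for every event.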
I'm not aware of any sources that separate "stream segments" so I updated the language a bit to account for how the `tcp` mode of the `socket` source handles `bytes` framing. Reference: #17136 Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Opened #17745 to clarify the docs.
Problem
As the title says, the `socket` source with `bytes` framing doesn't seem to behave correctly when the mode is `tcp`. In addition to the tests detailed in the Discord link below, I also set up tests using Vector on both sides. The data would only appear on the receiving Vector instance after the sending Vector instance was stopped, and even then all of the data was concatenated into one event. I believe the netcat tests in the Discord thread also show that this is an issue with the `socket` source rather than with the sink.
Configuration
Receiving:
Sending:
Version
vector 0.28.2 (x86_64-apple-darwin 986dd37 2023-04-10)
Debug Output
No response
Example Data
I sent three separate HTTP messages to the sending Vector: `abc`, `123`, and `ab`. Once I shut down that Vector instance, the following data appeared in the receiving Vector:
{"message":"abc123ab","timestamp":"2023-04-13T03:06:18.673551Z"}
Additional Context
No response