fix(file source): more robust handling of split reads #4089
Conversation
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
/// This function will attempt to read a new line from its file, blocking,
/// up to some maximum but unspecified amount of time. `read_line` will open
Does this function still block?
It still does multiple blocking filesystem reads, so yes. This comment predates the addition of the sleep.
Seems great, and the shutdown tests are failing because of another PR, so they can be ignored for this one.
+1.
break;
}

let sz = line.len();
trace!(
This could shift into an internal event perhaps?
The way we do events here is a little different since it's its own crate, but not a bad idea. I'll make a note to do that in the next set of changes.
Can somebody trigger a container build of the version with this fix, please? The latest one on Docker Hub is dated 5 days ago (24 Sep).
@mikhno-s Apologies! We're working on an issue with the nightly builds. Hopefully, it will be fixed tomorrow.
Fixes #2992
This is an initial minimal-ish fix for the file source being too willing to return lines that did not end in a newline. Our original fix in #1236 was to wait up to a millisecond for the rest of a line to show up before considering it complete and returning it anyway. It turns out that in high-throughput situations, it's not uncommon for partial log lines to be visible for periods of time longer than this. This causes Vector to return a single logical line split across multiple events, which obviously breaks many kinds of downstream processing.
The problem with the sleep approach is that we can't know how long we need to sleep to see the rest of the line, and any amount of time we sleep is time we're not doing useful work, potentially lowering our overall throughput. This PR eliminates the sleep, instead opting to make a single logical line read operation able to be suspended and resumed across calls. The core of that change is to switch from a single line buffer to one that's specific to each file watcher. This means the intermediate state won't get overwritten between calls and we're safe to return control to the main loop.
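The mechanism described above can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than Vector's actual code: the `Watcher` struct, field names, and the `read_line` signature here are all assumptions. The key idea is that the partial-line buffer lives on the watcher itself, so a read that ends mid-line simply suspends and the next call resumes where it left off, with no sleep involved.

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch: each file watcher owns its own buffer, so a
/// partial line survives between calls instead of being returned early.
struct Watcher<R> {
    reader: R,
    /// Bytes of a line whose trailing newline we haven't seen yet.
    partial: Vec<u8>,
}

impl<R: BufRead> Watcher<R> {
    /// Try to read one complete line. Returns `Ok(None)` when only a
    /// partial line is available; those bytes stay in `self.partial`
    /// and the next call picks up where this one left off.
    fn read_line(&mut self) -> io::Result<Option<Vec<u8>>> {
        loop {
            let available = self.reader.fill_buf()?;
            if available.is_empty() {
                // No newline yet: suspend, keeping `partial` intact.
                return Ok(None);
            }
            match available.iter().position(|&b| b == b'\n') {
                Some(i) => {
                    // Complete line: emit everything accumulated so far.
                    self.partial.extend_from_slice(&available[..i]);
                    self.reader.consume(i + 1);
                    return Ok(Some(std::mem::take(&mut self.partial)));
                }
                None => {
                    // Mid-line: stash the bytes and keep going.
                    let len = available.len();
                    self.partial.extend_from_slice(available);
                    self.reader.consume(len);
                }
            }
        }
    }
}
```

Because the state is per-watcher, returning control to the main loop between calls is safe: an interrupted write is simply held until the rest of the line (or the next line) appears, which is the behavior the PR describes.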
This opens up a couple of edge cases that will require a larger change to handle, but they generally seem very unlikely to cause issues. Mostly, we are less likely to return legitimately interrupted writes as their own event. These are likely extremely uncommon, and will now be prepended to the following line, just as they appear if you read the file normally. If there is no following line, we won't flush the partial message until the file is deleted, which may not happen. If Vector is shut down before it sees another newline, it doesn't have a mechanism to flush this intermediate state.
There's some further refactoring here that should happen, but I cut it short in favor of getting the fix out sooner:
- `BytesMut` should be used instead of double buffering via `BufReader`
- `read_until_with_max_size` should be inlined into `read_line`, because the boundary doesn't really make sense anymore

Both of these refactorings, and plugging any edge cases, can happen as part of the larger file source cleanup that should be coming up soon.
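For context on the second item, a rough, hypothetical sketch of the kind of logic the name `read_until_with_max_size` implies (this is not Vector's implementation; the function name here and the discard-overlong-lines behavior are assumptions): read up to a delimiter, but drop any line that grows past a size cap rather than emitting it.

```rust
use std::io::{self, BufRead};

// Hypothetical sketch: accumulate bytes up to a newline, discarding any
// line that exceeds `max_size` and moving on to the next one.
fn read_capped_line<R: BufRead>(
    reader: &mut R,
    max_size: usize,
) -> io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    let mut discarding = false;
    loop {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            // No complete line yet; caller retries later.
            return Ok(None);
        }
        match available.iter().position(|&b| b == b'\n') {
            Some(i) => {
                if !discarding {
                    buf.extend_from_slice(&available[..i]);
                }
                reader.consume(i + 1);
                if discarding || buf.len() > max_size {
                    // Overlong line: drop it and start on the next one.
                    buf.clear();
                    discarding = false;
                } else {
                    return Ok(Some(buf));
                }
            }
            None => {
                let len = available.len();
                if !discarding {
                    buf.extend_from_slice(available);
                    if buf.len() > max_size {
                        buf.clear();
                        discarding = true;
                    }
                }
                reader.consume(len);
            }
        }
    }
}
```

Once per-watcher state exists (as in this PR), the delimiter search and the size cap naturally belong in the same loop as the line assembly, which is why the separate `read_until_with_max_size` boundary stops pulling its weight.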