Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(file source): more robust handling of split reads #4089

Merged
merged 4 commits into from Sep 28, 2020

Conversation

lukesteensen
Copy link
Member

Fixes #2992

This is an initial minimal-ish fix for the file source being too willing to return lines that did not end in a newline. Our original fix in #1236 was to wait up to a millisecond for the rest of a line to show up before considering it complete and returning it anyway. It turns out that in high-throughput situations, it's not uncommon for partial log lines to be visible for periods of time longer than this. This causes Vector to return a single logical line split across multiple events, which obviously breaks many kinds of downstream processing.

The problem with the sleep approach is that we can't know how long we need to sleep to see the rest of the line, and any amount of time we sleep is time we're not doing useful work, potentially lowering our overall throughput. This PR eliminates the sleep, instead opting to make a single logical line read operation able to be suspended and resumed across calls. The core of that change is to switch from a single line buffer to one that's specific to each file watcher. This means the intermediate state won't get overwritten between calls and we're safe to return control to the main loop.

This opens up a couple of edge cases that will require a larger change to handle, but they generally seem very unlikely to cause issues. Mostly, we are less likely to return legitimately interrupted writes as their own event. These are likely extremely uncommon, and will now be prepended to the following line, just as they appear if you read the file normally. If there is no following line, we won't flush the partial message until the file is deleted, which may not happen. If Vector is shut down before it sees another newline, it doesn't have a mechanism to flush this intermediate state.

There's some further refactoring here that should happen, but I cut it short in favor of getting the fix out sooner:

  1. We should read directly into the BytesMut instead of double buffering via BufReader
  2. read_until_with_max_size should be inlined into read_line because the boundary doesn't really make sense anymore

Both these refactoring and plugging any edge cases can happen as part of the larger file source cleanup that should be coming up soon.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Comment on lines 149 to 150
/// This function will attempt to read a new line from its file, blocking,
/// up to some maximum but unspecified amount of time. `read_line` will open
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function still block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still does multiple blocking filesystems reads, so yes. This comment predates the addition of the sleep.

Copy link
Contributor

@ktff ktff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seams great, and the shutdown tests are failing because of an other PR, so they can be ignored for this one.

@jamtur01
Copy link
Contributor

+1.

break;
}

let sz = line.len();
trace!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could shift into an internal event perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we do events here is a little different since it's its own crate, but not a bad idea. I'll make a note to do that in the next set of changes.

@lukesteensen lukesteensen merged commit c5ec225 into master Sep 28, 2020
@lukesteensen lukesteensen deleted the fix-file-source branch September 28, 2020 15:59
@mikhno-s
Copy link

Can somebody trigger container building the version with this fix, please? The last one in docker hub dated 5 days ago (24 sep)
Thanks

@jamtur01
Copy link
Contributor

@mikhno-s Apologies! We're working on an issue with the nightly builds. Hopefully, it will be fixed tomorrow.

mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
)

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File source have problems with reading files under high load
5 participants