Skip to content

Don't overwrite line content after a read timeout #1276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 14, 2021

Conversation

tobim
Copy link
Member

@tobim tobim commented Jan 11, 2021

📔 Description

Fix incomplete events when a timeout is hit in the middle of a line.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

By commit. You can check before / after with:

cat conn.log | pv -L 1000 | vast import zeek

@tobim tobim added the bug Incorrect behavior label Jan 11, 2021
@tobim tobim requested a review from lava January 11, 2021 17:10
Copy link
Member

@lava lava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general while this should fix the issue where a valid line has a large pause while it is read, it looks like it symmetrically breaks the use where a line is interrupted and never completed, so the next valid line is discarded because it has junk in the beginning.

I wonder if we shouldn't have two separate timeouts for these cases, e.g. something like 30s to wait for the completion of a line and 500ms until we trigger a flush of the current table slice that was created from all finished lines.

Otherwise, if we're convinced that the first kind of error is more important to have fixed than the second kind, I'd also be ok with merging this.

@tobim
Copy link
Member Author

tobim commented Jan 13, 2021

it looks like it symmetrically breaks the use where a line is interrupted and never completed, so the next valid line is discarded because it has junk in the beginning.

From my understanding it only breaks if we hit a timeout in the middle of a line and the next data that arrives already belongs to the next line/event. I assume that to be extremely rare.

@tobim tobim force-pushed the story/ch21321/broken-lines-on-timeout branch 2 times, most recently from 4722bb7 to 192b09a Compare January 13, 2021 21:36
tobim added 5 commits January 14, 2021 10:36
The reader implementations got incomplete lines on resuming after
a timeout before this change. The old code cleared the line content
on every call to getline, which is incorrect if the content wasn't
parsed yet.
This reflects the changed behavior of not clearing the content of the
output argument any more.
@tobim tobim force-pushed the story/ch21321/broken-lines-on-timeout branch from 192b09a to 8acec7f Compare January 14, 2021 09:37
Copy link
Member

@lava lava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some external discussion it seems like the regression is an intentional trade-off, so no objections from me.

@tobim tobim merged commit 2ce9ddb into master Jan 14, 2021
@tobim tobim deleted the story/ch21321/broken-lines-on-timeout branch January 14, 2021 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants