Skip to content

Support \r\n-style newlines in VAST #865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2020
Merged

Support \r\n-style newlines in VAST #865

merged 1 commit into from
May 13, 2020

Conversation

lava
Copy link
Member

@lava lava commented May 11, 2020

Support both Unix-, Windows- and Mac-style line endings
for all input formats, either uniformly or mixed.

NOTE: I'd actually like to run benchmarks for this change before merging, because I'm not sure if the libc-readline has some "magic sauce" that makes it far more performant.

@lava lava force-pushed the story/ch12203 branch from 7cafa48 to 1eec3b4 Compare May 11, 2020 15:09
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works.

Another way to go after this would be to write a custom line_parser in our parser DSL. It's not as generic as this one, but way faster. For now, what you added is good enough because we will split the I/O path from the parsing soon, so that we only need to focus on buffers of bytes eventually. Then we no longer need this heavy-weight I/O stream machinery from the standard library.

@lava
Copy link
Member Author

lava commented May 11, 2020

@mavam Where would the input to the line_parser come from in this scenario? It looks like the parser would still have to pull out the line byte-by-byte from the ifstream, so i'm not sure where the speedup could come from.

@mavam
Copy link
Member

mavam commented May 11, 2020

Where would the input to the line_parser come from in this scenario?

You'd always work on a contiguous block of memory. Better cache utilization. Also no virtual function calls.

The new API will forward a chunk that contains a block of lines to the format parser. Then we can put the \r\n logic simply at the end of an existing parser, put a kleene star around it, and have an efficient parser line-based parser.

@mavam
Copy link
Member

mavam commented May 11, 2020

Switching to this API will still take a bit. @dominiklohmann and I have talked about it extensively and we're looking forward to loop you in on this.

@dominiklohmann
Copy link
Member

Switching to this API will still take a bit. @dominiklohmann and I have talked about it extensively and we're looking forward to loop you in on this.

JFYI: Said looping-in has taken place this morning.

On this PR, I think this is a good way to fix for our current API. I'll let @mavam have the last word on this, but all I'm missing here is a changelog entry.

@mavam
Copy link
Member

mavam commented May 12, 2020

Yep, I think we should mention that we now support \r\n and merge.

(Modulo failing checks.)

@lava lava force-pushed the story/ch12203 branch from 1eec3b4 to 99b62f3 Compare May 12, 2020 09:19
@lava lava force-pushed the story/ch12203 branch from 99b62f3 to b8ac60d Compare May 12, 2020 16:16
@lava lava marked this pull request as ready for review May 12, 2020 16:18
Support both Unix-, Windows- and Mac-style line endings
for all input formats, either uniformly or mixed.
@lava lava force-pushed the story/ch12203 branch from b8ac60d to 4e4e5b6 Compare May 12, 2020 16:19
@lava lava merged commit db254cb into master May 13, 2020
@lava lava deleted the story/ch12203 branch May 13, 2020 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants