internal::file_reader fails on multi-lined Windows files #14

Closed
hGriff0n opened this issue Feb 7, 2016 · 4 comments

@hGriff0n

hGriff0n commented Feb 7, 2016

Hi, I appear to have found a slight bug in internal::file_reader.

If a file containing multiple newlines is created on Windows, file_reader fails with "unable to fread() file size errno 0". Creating the same file on Linux, or converting it with 'dos2unix', lets file_reader parse the file successfully. A single-line Windows file also parses successfully. The root of the problem seems to be that Windows uses "\r\n" to indicate newlines, or more specifically, that fseek/ftell and fread count the "\r\n" sequence differently.

In file_reader::read(), fread is given the number of characters it is expected to read. That number is calculated in file_reader::size(), which uses fseek/ftell to count the characters in the file (counting "\r\n" as two characters). However, because the file is opened in text mode, std::fread converts every "\r\n" to '\n' on Windows, so it actually reads fewer characters than requested. Since it cannot read the specified number of characters, it returns 0, the if check fails, and the error shown above is thrown. In all other respects, the read completes successfully.

Also, if you stop execution right before the throw and look at the constructed string, the last N characters of the string are '\0', with N being the exact number of newlines in the file.
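
The mismatch described above can be checked with a small standalone sketch. The helper names (`write_raw`, `size_via_ftell`, `bytes_via_fread`) are hypothetical and are not part of PEGTL; they only compare the size reported by fseek/ftell with the number of bytes fread actually delivers for a given fopen mode.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes `text` byte for byte (binary mode, so no
// newline translation happens on any platform).
bool write_raw(const char* path, const std::string& text) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(text.data(), 1, text.size(), f) == text.size();
    return std::fclose(f) == 0 && ok;
}

// Hypothetical helper: file size as reported by fseek/ftell, or -1 on failure.
long size_via_ftell(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (!f) return -1;
    long n = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) n = std::ftell(f);
    std::fclose(f);
    return n;
}

// Hypothetical helper: number of bytes fread delivers when asked for `want`
// bytes, or -1 if the file cannot be opened.
long bytes_via_fread(const char* path, const char* mode, long want) {
    std::FILE* f = std::fopen(path, mode);
    if (!f) return -1;
    std::string buf(static_cast<std::size_t>(want), '\0');
    const std::size_t got = std::fread(&buf[0], 1, buf.size(), f);
    std::fclose(f);
    return static_cast<long>(got);
}
```

For a file containing "a\r\nb\r\nc\r\n", both helpers agree on 9 bytes in "rb" mode. With mode "r" on Windows, fread would be expected to deliver fewer bytes than ftell reports, one fewer per "\r\n", which matches the trailing '\0' characters noted above; on Linux the two modes behave the same.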

@ColinH
Member

ColinH commented Feb 7, 2016

Could you please check whether changing the second argument to std::fopen() in pegtl/internal/file_reader.hh line 64 from "r" to "rb" fixes the issue?

Of course, the line endings will then be \r\n rather than \n; is that acceptable to you? Otherwise we could try using std::feof() instead of checking the number of bytes read.
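
The two options mentioned here can be sketched roughly as follows. This is not the code in pegtl/internal/file_reader.hh; `file_closer`, `open_sized`, `read_binary`, and `read_text_tolerant` are hypothetical names chosen for illustration, and the error messages are made up.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct file_closer {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using file_ptr = std::unique_ptr<std::FILE, file_closer>;

// Opens `path` with `mode`, stores the fseek/ftell size in `size`, rewinds.
file_ptr open_sized(const char* path, const char* mode, std::size_t& size) {
    file_ptr f(std::fopen(path, mode));
    if (!f) throw std::runtime_error("unable to fopen() file");
    if (std::fseek(f.get(), 0, SEEK_END) != 0)
        throw std::runtime_error("unable to fseek() to end of file");
    const long n = std::ftell(f.get());
    if (n < 0) throw std::runtime_error("unable to ftell() file size");
    if (std::fseek(f.get(), 0, SEEK_SET) != 0)
        throw std::runtime_error("unable to fseek() to start of file");
    size = static_cast<std::size_t>(n);
    return f;
}

// Option 1: binary mode. No newline translation takes place, so the ftell
// size and the fread count agree; "\r\n" stays in the returned data.
std::string read_binary(const char* path) {
    std::size_t size = 0;
    file_ptr f = open_sized(path, "rb", size);
    std::string out(size, '\0');
    if (size != 0 && std::fread(&out[0], size, 1, f.get()) != 1)
        throw std::runtime_error("unable to fread() file");
    return out;
}

// Option 2: text mode. A short read is accepted as long as the stream hit
// end-of-file without an error; the string is trimmed to what was read.
std::string read_text_tolerant(const char* path) {
    std::size_t size = 0;
    file_ptr f = open_sized(path, "r", size);
    std::string out(size, '\0');
    const std::size_t got = std::fread(&out[0], 1, out.size(), f.get());
    if (got < out.size() && (std::ferror(f.get()) || !std::feof(f.get())))
        throw std::runtime_error("unable to fread() file");
    out.resize(got);
    return out;
}
```

The binary variant hands "\r\n" through to the grammar unchanged, while the text variant depends on the platform's newline translation, so the same file can yield different contents on Windows and Linux.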

@hGriff0n
Author

hGriff0n commented Feb 7, 2016

Yeah, changing to "rb" fixes the issue. The grammar I'm using ignores the '\r' in those instances though.

@ColinH
Member

ColinH commented Feb 8, 2016

Ok, I've changed it to "rb" in the PEGTL master branch, too.

@ColinH ColinH closed this as completed Feb 8, 2016
@ColinH ColinH self-assigned this Apr 6, 2016
@ColinH
Member

ColinH commented Apr 6, 2016

We have now updated the eol rule to accept both Unix and DOS line endings.
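
As a rough illustration of what accepting both line endings means, here is a standalone sketch. It is not PEGTL's eol rule or its implementation; `match_eol` and `count_line_endings` are hypothetical functions operating on a plain std::string.

```cpp
#include <cstddef>
#include <string>

// Returns the length of the line ending that starts at `pos` in `in`:
// 1 for "\n" (Unix), 2 for "\r\n" (DOS), 0 if no line ending starts there.
std::size_t match_eol(const std::string& in, std::size_t pos) {
    if (pos < in.size() && in[pos] == '\n') return 1;
    if (pos + 1 < in.size() && in[pos] == '\r' && in[pos + 1] == '\n') return 2;
    return 0;
}

// Counts line endings of either kind, consuming "\r\n" as a single ending.
std::size_t count_line_endings(const std::string& in) {
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < in.size();) {
        const std::size_t len = match_eol(in, pos);
        if (len != 0) {
            ++count;
            pos += len;
        } else {
            ++pos;
        }
    }
    return count;
}
```

A lone '\r' that is not followed by '\n' is not treated as a line ending in this sketch.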
