Hi, I appear to have found a slight bug in internal::file_reader.
If a file with multiple newlines is created on Windows, file_reader fails with "unable to fread() file size errno 0". Creating the same file on Linux, or converting it with 'dos2unix', lets file_reader parse the file successfully. A single-line Windows file also parses successfully. The core of the problem seems to be that Windows uses "\r\n" to indicate newlines, or more specifically, that fseek/ftell and fread count the "\r\n" sequence differently.
In file_reader::read(), fread is given the number of characters it is expected to read. That number is calculated in file_reader::size(), which uses fseek/ftell to count the number of characters in the file (counting "\r\n" as two characters). However, because the file is opened in text mode, each "\r\n" is translated to '\n' as std::fread reads it, so fewer characters arrive than were requested. Since the specified number of characters cannot be read, fread returns 0, the if check fails, and the error shown above is thrown. In all other respects, the read completes successfully.
Also, if you stop execution right before the throw and look at the constructed string, the last N characters of the string are '\0', with N being the exact number of newlines in the file.
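To make the mismatch described above easier to reproduce, here is a minimal sketch that reports both numbers for a given file. The helper name `measure` is hypothetical and not part of PEGTL. It assumes that fseek/ftell works on the stream, as file_reader::size() already does.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical diagnostic helper, not part of PEGTL.
// Returns { size reported by fseek/ftell, characters delivered by fread }.
// Returns { -1, 0 } if the file cannot be opened or measured.
std::pair<long, std::size_t> measure(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (f == nullptr) {
        return {-1, 0};
    }
    long reported = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        reported = std::ftell(f);
    }
    if (reported < 0) {
        std::fclose(f);
        return {-1, 0};
    }
    std::rewind(f);
    std::string buf(static_cast<std::size_t>(reported), '\0');
    // Element size 1, so the return value is the number of characters read.
    const std::size_t got =
        buf.empty() ? 0 : std::fread(&buf[0], 1, buf.size(), f);
    std::fclose(f);
    return {reported, got};
}
```

Calling `measure(path, "r")` on a Windows build with a CRLF file should show the second value smaller than the first by the number of "\r\n" pairs, which would match the trailing '\0' count noted above. With `"rb"`, or on Linux, the two values should be equal.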
Could you please check whether changing the second argument to std::fopen() in pegtl/internal/file_reader.hh line 64 from "r" to "rb" fixes the issue?
Of course then the line endings will be \r\n rather than \n, is that acceptable to you? Otherwise we could try using std::feof() instead of checking the number of read bytes.
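For reference, the two options could look roughly like the sketch below. This is not the actual file_reader code; `read_file` and `file_closer` are hypothetical names used only for illustration. With `binary == true` the file is opened with "rb" and a short read is treated as an error. With `binary == false` the file is opened with "r", a short read is accepted as long as std::ferror() reports no error, and the string is shrunk to the number of characters actually read.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical RAII deleter so the FILE* is closed on every exit path.
struct file_closer {
    void operator()(std::FILE* f) const { std::fclose(f); }
};

// Hypothetical helper, not the PEGTL implementation.
std::string read_file(const std::string& path, bool binary) {
    std::unique_ptr<std::FILE, file_closer> f(
        std::fopen(path.c_str(), binary ? "rb" : "r"));
    if (!f) {
        throw std::runtime_error("unable to fopen() " + path);
    }
    if (std::fseek(f.get(), 0, SEEK_END) != 0) {
        throw std::runtime_error("unable to fseek() " + path);
    }
    const long size = std::ftell(f.get());
    if (size < 0) {
        throw std::runtime_error("unable to ftell() " + path);
    }
    std::rewind(f.get());

    // In text mode the ftell() value is treated only as an upper bound.
    std::string data(static_cast<std::size_t>(size), '\0');
    const std::size_t got =
        data.empty() ? 0 : std::fread(&data[0], 1, data.size(), f.get());
    if (std::ferror(f.get())) {
        throw std::runtime_error("unable to fread() " + path);
    }
    if (binary && got != data.size()) {
        throw std::runtime_error("short read in binary mode: " + path);
    }
    data.resize(got);  // drops the trailing '\0' padding after a short read
    return data;
}
```

The binary variant keeps "\r\n" in the returned string, so the grammar has to accept both line endings. The text variant returns '\n' only on Windows, but byte offsets in the string then no longer match offsets in the file on disk.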