Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler asserts & crashes on multiline string containing CRLF line ending (mainly affects Windows) #12550

Closed
moosichu opened this issue Aug 21, 2022 · 9 comments
Labels
bug Observed behavior contradicts documented or intended behavior frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Milestone

Comments

@moosichu
Copy link
Sponsor Contributor

moosichu commented Aug 21, 2022

Zig Version

0.10.0-dev.8439+4a98385b0

Steps to Reproduce

A simple build.zig containing:

const test_multiline =
    \\//
    \\// This file has been automatically generated by build.zig
    \\//
    \\
;

will do the trick (just make sure line endings are CRLF).

Expected Behavior

Zig not to crash. Although not sure how it should deal with this case, should they be converted to LF line endings automatically or something else?

Actual Behavior

zig crashes with the following output:

thread 14660 panic: reached unreachable code
$PATH_TO_ZIG\zig\lib\std\debug.zig:281:14: 0x7ff7dd112a79 in assert (zig.exe.obj)
    if (!ok) unreachable; // assertion failure
             ^
$PATH_TO_ZIG\zig\lib\std\zig\Ast.zig:120:11: 0x7ff7dd3ba563 in tokenSlice (zig.exe.obj)
    assert(token.tag == token_tag);
          ^
$PATH_TO_ZIG\zig\src\Module.zig:3726:63: 0x7ff7dd661ff7 in astGenFile (zig.exe.obj)
            const bad_off = @intCast(u32, file.tree.tokenSlice(parse_err.token + @boolToInt(parse_err.token_is_prev)).len);
                                                              ^
$PATH_TO_ZIG\zig\src\Compilation.zig:3281:19: 0x7ff7dd66570e in workerAstGenFile (zig.exe.obj)
    mod.astGenFile(file) catch |err| switch (err) {
                  ^
$PATH_TO_ZIG\zig\src\ThreadPool.zig:83:13: 0x7ff7dd6655b2 in runFn (zig.exe.obj)
        arguments: Args,
            ^
$PATH_TO_ZIG\zig\src\ThreadPool.zig:129:18: 0x7ff7dd445786 in worker (zig.exe.obj)
            runFn(&run_node.data);
                 ^
$PATH_TO_ZIG\zig\lib\std\Thread.zig:393:13: 0x7ff7dd63d551 in callFn__anon_103843 (zig.exe.obj)
            @call(.{}, f, args);
            ^
$PATH_TO_ZIG\zig\lib\std\Thread.zig:504:30: 0x7ff7dd446040 in entryFn (zig.exe.obj)
                    .detached => self.thread.free(),
                             ^
Unable to dump stack trace: FileNotFound
@moosichu moosichu added the bug Observed behavior contradicts documented or intended behavior label Aug 21, 2022
@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Aug 21, 2022

So looking at the issue in a debugger, the token_tag seems to a nonsense value here - so going to read the code around how the token_tags array is generated and try and found out why it would be so broken here.

image

@Vexu Vexu added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Aug 21, 2022
@Vexu Vexu added this to the 0.10.0 milestone Aug 21, 2022
@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Aug 21, 2022

This does seem to be Windows only having tested on other platforms, so will come back to it once #12420 has been resolved, as those test fixes might just end up fixing this anyway. But if not will happily work on trying to debug this.

@nektro
Copy link
Contributor

nektro commented Aug 21, 2022

does it compile if you run zig fmt .? zig compiler does not accept files with windows line endings

@moosichu
Copy link
Sponsor Contributor Author

Doing that causes the same panic to happen (almost identical stack trace).

@moosichu moosichu changed the title Immediate assert when trying to build personal project on Windows Compiler asserts & crashes on multiline string containing CRLF line ending (mainly affects Windows) Aug 24, 2022
@moosichu
Copy link
Sponsor Contributor Author

zig compiler does not accept files with windows line endings

So this was the issue in the end (finally had the time to do some more investigation), but not sure how it should deal with this (obviously a crash is not the right solution here).

@nektro
Copy link
Contributor

nektro commented Aug 24, 2022

compile error is the ideal

workaround is including this in your project's root .gitattributes

* text=auto
*.zig text eol=lf

@moosichu
Copy link
Sponsor Contributor Author

I've managed to get something along the lines of this working locally:

...\build.zig:1:36: error: expected test, comptime, var decl, or container field, found 'carriage return'
const builtin = @import("builtin");
                                   ^
...\build.zig:1:36: note: invalid byte: '\r'

Although the goal now is to flesh out the error message to make it more descriptive. Will clean up what I have later & make a PR in a bit.

@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Aug 27, 2022

The more I think about this the more that I think an error is the wrong approach, especially when considering programmer intent & the first time user experience (especially for someone learning zig, or even learning to program using zig!).

Our options are:

  1. Accept CRLF line-endings, and treat them as-is when found in multiline string literals.
  2. Accept CRLF line-endings, but convert the to LF when found in multiline string literals.
  3. Accept CRLF line-endings, but error if found in multiline string literals.
  4. Error on any occurence of "CR".

Note: It's multiline string literals in particular that make this a more nuanced problem.

Option 1: Accept CRLF line-endings as-is

Benefits:

  • Simple & easy to reason about. Works are you would expect in isolation.

Cons:

  • Breaks the cross-compilation ethos that zig strives for. Creates situations where you could download the same source-code on Windows & have that compile to a different program than it would if compiled from any other host. This would be a common case as well as git's default settings on Windows will do this. Also worth bearing in mind that if you were to cross-compile to another target from Windows, you would now have multiline string literals with CRLF line-endings encoded into your program in this case as well!

Conclusion:
My opinion is that the fact that in the default case, people checking-out a project on Windows could unknowingly be effectively compiling a different program than you would otherwise expect in an ecosystem that so heavily focuses on the determinism & simplicity of the cross-platform experience makes this a no-go as a solution.

Option 2: Accept CRLF line-endings but convert to LF in program

Benefits:

  • For the most common cases, everything "just works":
    • When checking-out a program on Windows using git's default settings.
    • When creating a zig program in any text edtior & compiling it.
    • When cross-compiling a program from Windows to any other platform.
  • 99.9% of the time, "CRLF" characters in a text file mean "I want a newline" not "I want the CRLF bytes specifically" here.
  • The documentation can make a note of this fact.

Cons:

  • In that 0.1% of the time where you do care about the CRLF character in a multiline string being preserved, the compiler could be doing something unexpected. (If that case even exists.).

Conclusion:
I think that con is such an edge-case in practise, that is unlikely to really ever effect anyone & can be easily fixed. Additionally if the documentation just makes it clear that is what happens, I don't see the problem.

On the other hand, the first-time user experience of zig is super important. It's the kind of program I wish I had when first setting-out into learning lower-level programming. Whilst the language itself is really cool, it's the tooling & the fact that you just need a single binary to build any zig (& C/C++!) from any platform, for any platform that really sets it apart from the rest.

Imagine the case of a beginner programmer, someone wanting to contribute to a project for the first or maybe even just creating a simple program in notepad following a tutorial. They are likely using Windows, and don't necesarily know about different line-ending encodings yet. To me the idea that in the default case, zig would error creates such an unnecessary barrier to entry out of concern for an inredibly rare-edge case would be getting the priorities wrong.

It's worth pointing at that with regards to the status quo, the documentation says that CRLF characters are OK but discouraged. Updating that to say the CRLF characters in multiline string literals will be encoded as LF characters would be fairly easy.

Option 3: Accept CRLF line-endings, but error if found in multiline string literals.

Pros:

  • Programs without multiline string literals will still work, and programs aren't being changed from their intent in any-case.

Cons:

  • Creates a weirld inconsistent experience, and all the cons of erroring would still apply, but now only when coming across a multline string.

Conclusion:
This has the same issues as option 4, but introduces the inconsistency of those issues only cropping up if a certain langauge feature is used.

Option 4: Error on any occurence of "CR".

Pros:

  • When a program compiles, the intent is incredibly clear with 0 ambiguity.

Cons:

  • The default experience on Windows will cause a compile error, which can be non-trivial to fix.

Conclusion:
At first this seems like the correct choice, but I think the "non-trivial" part of the con is underestimated by experienced programmers with extensive experience in both git & zig. When I first came across this issue, it was super easy for me to fix as I know about line-endings, I know how a .gitattributes file works & how to change these things in a text editor.

However, a big part of zig's appeal to me is that fact that it's a great learning tool for aspiring lower-level programmers, who may not have that experience. I think stuff like this can introduce friction in a very non-intuitive way (which other language errors in this case?) that is unnessecary.

@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Aug 27, 2022

Closing this as it's a dupe of #11414

However, also looking at ziglang/zig-spec#38:

However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR.

Seems like option 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Projects
None yet
Development

No branches or pull requests

3 participants