Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle carriage return characters according to the spec #12661

Merged
merged 5 commits into from
Feb 19, 2023

Conversation

moosichu
Copy link
Sponsor Contributor

@moosichu moosichu commented Aug 28, 2022

Taking guidance from: ziglang/zig-spec#38, carriage returns are now allowed preceding line-feeds when found in doc-comments or multiline strings. The CRLF is also only interpreted as a newline in this case. However - they are otherwise rejected when found in any other context except when used as plain whitespace.

This resolves #11414 and resolves #12674. It replaces #12692 which is a part of this PR.

Ideally the error reporting should be improved by #12449 sometime in the future, but I think I might take a look at that once this has been merged.

Copy link
Member

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test, you'll have to use string literals in test/compile_errors.zig.

@moosichu moosichu changed the title Fix crash when carriage return is encountered in multiline string Correctly handle carriage return characters according to the spec Aug 30, 2022
@moosichu moosichu force-pushed the fix/carriage-return-crash branch 3 times, most recently from 582b344 to 4af4034 Compare September 5, 2022 07:54
@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Sep 5, 2022

I've added a couple of tests that should hopefully do the trick, one to check the case that should work and the other to test the case that should error.

@moosichu moosichu force-pushed the fix/carriage-return-crash branch 2 times, most recently from cd43521 to 8fc2537 Compare September 5, 2022 12:31
@moosichu moosichu requested a review from Vexu September 5, 2022 12:32
@moosichu moosichu force-pushed the fix/carriage-return-crash branch 2 times, most recently from 65f116d to 965779a Compare September 5, 2022 12:35
@moosichu moosichu force-pushed the fix/carriage-return-crash branch 2 times, most recently from 7889cfc to 18dade0 Compare September 16, 2022 09:06
@andrewrk
Copy link
Member

CI checks are failing

@moosichu
Copy link
Sponsor Contributor Author

Yeah - I haven't had the time to address those test issues for the the past while, but should hopefully be able to at the end of this week if that's ok :) Sorry for the delay on this one!

@moosichu
Copy link
Sponsor Contributor Author

Things should be fixed now! Ended up being something really simple, but only just had the time to get to it now.

@andrewrk andrewrk self-assigned this Oct 31, 2022
@moosichu
Copy link
Sponsor Contributor Author

moosichu commented Dec 5, 2022

Are there any further changes desired here? Just double checking as I might have some time available to act on feedback this month if needed.

@moosichu
Copy link
Sponsor Contributor Author

Just resolved the merge conflicts that I saw had cropped-up (hopefully 🤞)

This resolves a crash that can occur for invalid bytes like carriage
returns that are valid characters when not parsed from within literals.

There are potentially other edge cases this could resolve as well, as
the calling code for this function didn't account for any potential
'pending_invalid_tokens' that could be queued up by the tokenizer from
within another state.
Follow the guidance of ziglang#38:

> However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR.

Zig fmt already had code for deleting carriage returns, but would still
crash - now it no longer does so. Carriage returns encountered before
line-feeds are now appropriately removed on program compilation as well.
Previous commit was much less strict about this, this more closely
matches the desired spec of only allow CR characters in a CRLF pair, but
not otherwise.
Missed this comment from ziglang/zig-spec#83:

> CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.
@Vexu Vexu merged commit 346ec15 into ziglang:master Feb 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants