-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode escapes: support u{N...} #2823
Conversation
<tr> | ||
<td><code>\UNNNNNN</code></td> | ||
<td>hexadecimal 24-bit Unicode character code UTF-8 encoded (6 digits)</td> | ||
<td><code>\u{NNNNNN}</code></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the clearest way to write this is. Could also be something like:
\u{N...}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "1 or more digits" you have below is sufficient
}, | ||
else => { | ||
state = State.CharLiteralEnd; | ||
}, | ||
}, | ||
|
||
State.CharLiteralHexEscape => switch (c) { | ||
'0'...'9', 'a'...'z', 'A'...'F' => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was a bug (found when new tests were added)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. thanks!
}, | ||
}, | ||
|
||
State.CharLiteralUnicodeInvalid => switch (c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a little creative here because I thought this behavior might prevent some confusing error output. If it doesn't actually help, I'd be totally fine removing this special state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's run with this and see what happens.
break; | ||
} | ||
if (t.char_code > 0x10ffff) { | ||
tokenize_error(&t, "unicode value out of range: %x", t.char_code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this down to the else
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, easy merge
<tr> | ||
<td><code>\UNNNNNN</code></td> | ||
<td>hexadecimal 24-bit Unicode character code UTF-8 encoded (6 digits)</td> | ||
<td><code>\u{NNNNNN}</code></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "1 or more digits" you have below is sufficient
}, | ||
else => { | ||
state = State.CharLiteralEnd; | ||
}, | ||
}, | ||
|
||
State.CharLiteralHexEscape => switch (c) { | ||
'0'...'9', 'a'...'z', 'A'...'F' => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. thanks!
}, | ||
}, | ||
|
||
State.CharLiteralUnicodeInvalid => switch (c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's run with this and see what happens.
On neither stage1 or stage2 did you reject UTF-16 surrogate pairs, 0xd800 - 0xdfff. |
@shawnl The purpose of this PR was to change the grammar, not introduce new validation logic. |
Closes #2129
TODO
Notes
0x10ffff
.\uNNNN
and\UNNNNNN
syntaxes were removed.