invalid multibyte escape with regexp literals
#134
Comments
@yujinakayama Could you please compose a table indicating which combinations of regexp bodies, regexp encoding options and source encodings cause parser's behavior to deviate from Ruby's? This would speed up solving this issue a lot.
@yujinakayama Ping.
I'm really confused by the combinations of parameters. 😱 Note that Parser was run on UTF-8 source in this verification.
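For the Ruby side of such a table, a small script can enumerate the combinations. This is a minimal sketch assuming MRI: it uses eval with a magic encoding comment to stand in for the source file's encoding, and the helper name ruby_behavior, as well as the chosen bodies and options, are hypothetical rather than taken from the original table. Parser's output would still have to be collected separately and compared row by row.

```ruby
# Hypothetical helper: compiles /body/option under the given source encoding
# (simulated with a magic comment) and returns the resulting Regexp#encoding
# name, or the exception class name if Ruby rejects the literal.
def ruby_behavior(body, option, source_encoding)
  code = "# encoding: #{source_encoding}\n/#{body}/#{option}.encoding"
  eval(code).to_s
rescue SyntaxError, StandardError => e
  e.class.name
end

# Single quotes keep the escapes literal, so '\xff' is the 4 characters \ x f f.
BODIES  = ['abc', '\xff', '\xc2\xa1']
OPTIONS = ['', 'n', 'u']
SOURCES = ['UTF-8', 'US-ASCII']

SOURCES.each do |src|
  BODIES.each do |body|
    OPTIONS.each do |opt|
      result = ruby_behavior(body, opt, src)
      puts format('%-9s %-10s %-3s %s', src, body, "/#{opt}", result)
    end
  end
end
```

Each printed row lists the source encoding, the regexp body, the option and either the resulting encoding or the exception class, which is the shape of table the request asks for.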
@yujinakayama Awesome, thanks! I'll take a look, but don't expect a quick solution. It's such a horrible mess that it'll take a while to untangle.
Noise: I pray that Ruby drops non-UTF-8 encoded source. I really have hopes for rubinius-x here.
@yujinakayama Fascinating. I've removed all the cases where Ruby and parser have identical behavior and categorized the leftover ones:
So the cases are:
@yujinakayama Actually, this is not a bug. You're parsing that file in the wrong mode. Look:
That file wouldn't actually run under 2.0, and parser in 1.9 mode handles it just fine.
rubocop/rubocop#796
I've tried to fix this, but I'm not sure what the best way is.
Here are the points I've investigated so far:
(/\xff/ in UTF-8 source)
The US_ASCII encoding rejects character codes that have a non-zero 8th bit, but Ruby actually, and strangely, accepts such a regexp and converts it to ASCII-8BIT encoding.
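The ASCII-8BIT conversion can be reproduced with a short sketch. It assumes MRI and assumes the case in question is a literal like /\xff/ compiled with a US-ASCII source encoding (Ruby 1.9's default), simulated here with a magic comment passed to eval. It shows the accepted literal's encoding, that the encoding is fixed, and how that affects matching.

```ruby
# Simulate a US-ASCII source file with a magic comment; the single-quoted
# part keeps \xff as an escape inside the regexp literal.
code = "# encoding: US-ASCII\n" + '/\xff/'
re = eval(code)

puts re.encoding          # encoding Ruby picked for the accepted literal
puts re.fixed_encoding?   # whether the regexp is pinned to that encoding

# Matching against a binary (ASCII-8BIT) string containing byte 0xFF.
puts re.match?("\xff".b)

# A UTF-8 string with non-ASCII characters is incompatible with a
# fixed ASCII-8BIT regexp, so Ruby raises instead of returning false.
begin
  re.match?("caf\u00e9")
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

Because the regexp ends up with a fixed ASCII-8BIT encoding, it can only be matched against binary strings or ASCII-only strings, which is one visible consequence of the conversion.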