I've tried to fix this, but I'm not sure of the best way to do it.
Here are the points I've investigated so far:
@yujinakayama Could you please put together a table showing which combinations of regexp bodies, regexp encoding options and source encoding options make parser's behavior deviate from Ruby's? That would speed up solving this issue a lot.
I'm really confused by the combinations of the parameters. 😱
Note that Parser was run on UTF-8 source for this verification.
@yujinakayama Awesome, thanks! I'll take a look, but don't expect a quick solution. It's such a horrible mess that it'll take a while to untangle.
Noise: I pray that Ruby drops support for non-UTF-8 encoded source. I really have hope for rubinius-x here.
@yujinakayama Fascinating. I've removed all the cases where Ruby and parser behave identically and categorized the remaining ones:
So the cases are:
@yujinakayama Actually, this is not a bug. You're parsing that file in the wrong mode.
$ ruby -v
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
$ ruby -e '/\xff/'
-e:1: invalid multibyte escape: /\xff/
$ ./bin/ruby-parse --19 -e '/xff/'
$ ./bin/ruby-parse --19 -e 'if /\xff/ =~ foo; end'
(send nil :foo)) nil nil)
$ ./bin/ruby-parse --20 -e '/xff/'
$ ./bin/ruby-parse --20 -e 'if /\xff/ =~ foo; end'
Failed on: (fragment:0)
/home/whitequark/Work/parser/lib/parser/builders/default.rb:726:in `initialize': invalid multibyte escape: /\xff/ (RegexpError)
from /home/whitequark/Work/parser/lib/parser/builders/default.rb:726:in `new'
That file wouldn't actually run under 2.0, and parser in 1.9 mode handles it just fine.
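For anyone who wants to reproduce the difference outside of parser, here is a minimal sketch in plain Ruby. It assumes the traceback above comes from a `Regexp.new` call on the regexp body (which is what the `new` frame at line 726 suggests) and that the body arrives as a UTF-8 string. `regexp_error` is a hypothetical helper made up for this illustration, not something that exists in parser.

```ruby
# Hypothetical helper, not part of parser: compile a regexp body and
# return the RegexpError message, or nil if the body compiles.
def regexp_error(body)
  Regexp.new(body)
  nil
rescue RegexpError => e
  e.message
end

# The four characters \ x f f in a UTF-8 string, as they would appear
# in a UTF-8 source file.
utf8_body = "\\xff".encode(Encoding::UTF_8)

# The same characters tagged as binary (ASCII-8BIT), roughly what an
# /n regexp option asks for.
binary_body = utf8_body.b

# On Ruby 2.0 and later the UTF-8 body should be rejected with an
# "invalid multibyte escape" message; the binary body should compile.
p regexp_error(utf8_body)
p regexp_error(binary_body)
```

The point is only that the same escape is accepted or rejected depending on the encoding attached to the regexp body, which is why the target Ruby version and the encoding options matter so much here.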