regexpengine=2 with whole pattern (\@>) and multi-line (\_.) atom fails to match when searching #3651

Open
inkarkat opened this issue Dec 1, 2018 · 0 comments

Example

I'm trying to match a BEGIN...END block only if it contains no MIDDLE, like the second block in the following example:

BEGIN
    some
    MIDDLE
    stuff
END
BEGIN
    other
    stuff
END

I came up with the following regular expression, using the whole pattern multi (\@>) to avoid backtracking and a non-greedy multi-line atom (\_.\{-}) to match just a single block.

/\%(BEGIN\_.\{-}\%(MIDDLE\|END\)\)\@>\%(END\)\@<=/

In Vim version 8.1.553 (huge version with GTK2 GUI, on Ubuntu 16.04.5 LTS x64), this only matches with the old engine ('regexpengine' set to 1); it does not match with the new, NFA-based one. I think this is a bug; the behavior should not differ.

Investigations

The problem is not triggered when the entire match is on a single line. The following joins the text into one line, and this gives 1 as expected:

vim --clean \
    -c 'let @x = "BEGIN some MIDDLE stuff END BEGIN other stuff END "' \
    -c 'silent put! x' \
    -c 'echomsg search("\\%#=2\\%(BEGIN\\_.\\{-}\\%(MIDDLE\\|END\\)\\)\\@>\\%(END\\)\\@<=")'

The problem is only triggered when doing text searches (with / or :echo search()), but not with =~ or matchstr(). The following gives old:1 new:1:

vim --clean \
    -c 'let @x = "BEGIN\nsome\nMIDDLE\nstuff\nEND\nBEGIN\nother\nstuff\nEND\n"' \
    -c 'echon " old:" @x =~ "\\%#=1\\%(BEGIN\\_.\\{-}\\%(MIDDLE\\|END\\)\\)\\@>\\%(END\\)\\@<="' \
    -c 'echon " new:" @x =~ "\\%#=2\\%(BEGIN\\_.\\{-}\\%(MIDDLE\\|END\\)\\)\\@>\\%(END\\)\\@<="'

How to reproduce

The following two invocations differ only in the engine they select; the first correctly gives 6 (the line of the second BEGIN), the second wrongly gives 0:

vim --clean \
    -c 'let @x = "BEGIN\nsome\nMIDDLE\nstuff\nEND\nBEGIN\nother\nstuff\nEND\n"' \
    -c 'silent put! x' \
    -c 'echomsg search("\\%#=1\\%(BEGIN\\_.\\{-}\\%(MIDDLE\\|END\\)\\)\\@>\\%(END\\)\\@<=")'
vim --clean \
    -c 'let @x = "BEGIN\nsome\nMIDDLE\nstuff\nEND\nBEGIN\nother\nstuff\nEND\n"' \
    -c 'silent put! x' \
    -c 'echomsg search("\\%#=2\\%(BEGIN\\_.\\{-}\\%(MIDDLE\\|END\\)\\)\\@>\\%(END\\)\\@<=")'

I tried to write an automated test for it, but I only found coverage of old-vs-new regexp engines in (old-style) test44.in / test99.in, and this multi-line example here doesn't seem to fit in with that very well. I guess you'll be able to quickly build a new-style test from the code above.
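
As a starting point, a new-style test might look roughly like the sketch below. The function name Test_atomic_multiline_search is hypothetical, and the sketch assumes the usual new-style conventions (a Test_ function in a src/testdir/test_*.vim file, picked up by the test runner). The expected line number 6 is taken from the old-engine result above, so on an affected version the regexpengine=2 iteration is expected to fail.

```vim
" Sketch of a new-style test; the function name is hypothetical.
func Test_atomic_multiline_search()
  new
  " Same buffer contents as in the reproduction above.
  call setline(1, ['BEGIN', 'some', 'MIDDLE', 'stuff', 'END',
        \ 'BEGIN', 'other', 'stuff', 'END'])
  let pat = '\%(BEGIN\_.\{-}\%(MIDDLE\|END\)\)\@>\%(END\)\@<='

  " Run the same search with the old (1) and the NFA (2) engine.
  for re in [1, 2]
    call cursor(1, 1)
    " The first block contains MIDDLE, so the match must start at line 6.
    call assert_equal(6, search('\%#=' . re . pat, 'W'), 'regexpengine=' . re)
  endfor

  bwipe!
endfunc
```

Using setline() instead of :put avoids the trailing empty line, and the 'W' flag keeps the search from wrapping, so both engines start from the same position.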
