Regular expressions with backreferences to a group including .* and \n are matching where they shouldn’t #6239

Open
chris-morgan opened this issue Jun 11, 2020 · 4 comments

Comments

@chris-morgan

Describe the bug
Backreferences are supposed to match exactly the text that the referenced group matched. But once .* and \n are involved in the group, the regular expression engine can match different text in the backreference, effectively letting the backreference expand the .* a second time.

To Reproduce
Detailed steps to reproduce the behavior:

  1. Run vim --clean (or gvim --clean, etc.)
  2. Insert the following:
    foo
    bar
    barnaby
    baz
    
  3. Search for any duplicated lines: /\(^.*\n\)\1<Enter>
  4. Observe that this matches bar\nbarnaby\n. (It’s like it searched for \(^.*\)\n\1.*\n instead.)

Expected behavior
There should be no matches. In the case that did match, \1 is bar\n, but the text at the backreference position is barn (the start of barnaby), not bar\n.

(\(^.*$\)\n\1\n does not exhibit this bug.)
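For comparison, here is a minimal sketch of the expected behaviour using Python's `re` module. This is a conventional backtracking engine, not Vim's. The sketch assumes the buffer can be modelled as one newline-terminated string, and it uses `re.MULTILINE` so that `^` and `$` match at line boundaries. Python writes groups as `(`…`)` where Vim writes `\(`…`\)`. The helper `find_duplicate` is hypothetical and exists only for illustration.

```python
import re

# Sample buffer from the reproduction steps, modelled as one string
# (assumption: each buffer line ends with "\n").
BUFFER = "foo\nbar\nbarnaby\nbaz\n"

# Python spellings of the Vim patterns \(^.*\n\)\1 and \(^.*$\)\n\1\n.
# As in Vim, '.' does not match a newline here.
DUP_WITH_NEWLINE = re.compile(r"(^.*\n)\1", re.MULTILINE)
DUP_WITH_DOLLAR = re.compile(r"(^.*$)\n\1\n", re.MULTILINE)


def find_duplicate(pattern, text):
    """Hypothetical helper: return the matched text, or None if no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Neither pattern matches: "bar\n" is followed by "barn", not "bar\n".
    print(find_duplicate(DUP_WITH_NEWLINE, BUFFER))  # None
    print(find_duplicate(DUP_WITH_DOLLAR, BUFFER))   # None
    # With a genuinely duplicated line, both patterns find it.
    dup = "foo\nbar\nbar\nbaz\n"
    print(repr(find_duplicate(DUP_WITH_NEWLINE, dup)))  # 'bar\nbar\n'
    print(repr(find_duplicate(DUP_WITH_DOLLAR, dup)))   # 'bar\nbar\n'
```

Under these semantics the two patterns agree, which is the behaviour the report expects from Vim.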

Environment (please complete the following information):

  • Vim 8.2.814
  • OS: Arch Linux
@chrisbra
Member

Can confirm, and the 'regexpengine' setting does not seem to make a difference.

@jlittlenz

As well, \(^.*\n\)\1 does not match duplicated lines at the end of the file.

@mvduin

mvduin commented Jul 2, 2021

It gets weirder: the pattern ^\(.*\n\)\1xyz matches neither

foo
foo
xyz

nor

foo
foobar
xyz

yet the pattern ^\(.*\n\)\1fyz matches both of these.
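As a point of reference, the sketch below runs both patterns through Python's `re` module. It is not Vim's engine, and it assumes each buffer is modelled as one newline-terminated string. Under standard backreference semantics the results are the reverse of what Vim reports. `xyz` should match only the buffer with two identical lines, and `fyz` should match neither. The helper `matches` is hypothetical.

```python
import re

# Python spellings of the Vim patterns ^\(.*\n\)\1xyz and ^\(.*\n\)\1fyz.
XYZ = re.compile(r"^(.*\n)\1xyz", re.MULTILINE)
FYZ = re.compile(r"^(.*\n)\1fyz", re.MULTILINE)

# The two test buffers from the comment (assumed newline-terminated).
SAME_LINES = "foo\nfoo\nxyz\n"
DIFFERENT_LINES = "foo\nfoobar\nxyz\n"


def matches(pattern, text):
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for name, text in [("SAME_LINES", SAME_LINES),
                       ("DIFFERENT_LINES", DIFFERENT_LINES)]:
        print(name, "xyz:", matches(XYZ, text), "fyz:", matches(FYZ, text))
    # Output under Python's re:
    # SAME_LINES xyz: True fyz: False
    # DIFFERENT_LINES xyz: False fyz: False
```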

@mvduin

mvduin commented Jul 2, 2021

It looks like the bug has nothing to do with .*, only with backreferences to groups ending in a newline: @chris-morgan's example still reproduces if you search for ^\(bar\n\)\1 instead of ^\(.*\n\)\1. Similarly, replacing .* with foo in my previous comment does not change the results.
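To show what the literal-group variants should do, here is a small table-driven sketch using Python's `re` as a stand-in for standard backreference semantics. It is not Vim, and the buffers are assumed to be newline-terminated strings. Each case pairs a Python spelling of one of the patterns above with the result a conventional engine gives. `check_cases` is a hypothetical helper.

```python
import re

# chris-morgan's buffer and mvduin's two buffers (assumed newline-terminated).
BUFFER = "foo\nbar\nbarnaby\nbaz\n"
SAME_LINES = "foo\nfoo\nxyz\n"
DIFFERENT_LINES = "foo\nfoobar\nxyz\n"

# Literal-group variants: no .* at all, just a backreference to a group
# that ends in a newline. The third field is the standard-semantics result.
CASES = [
    (r"^(bar\n)\1", BUFFER, False),
    (r"^(foo\n)\1xyz", SAME_LINES, True),
    (r"^(foo\n)\1xyz", DIFFERENT_LINES, False),
    (r"^(foo\n)\1fyz", SAME_LINES, False),
    (r"^(foo\n)\1fyz", DIFFERENT_LINES, False),
]


def check_cases(cases):
    """Hypothetical helper: return (pattern, text, expected, actual) tuples."""
    results = []
    for pattern, text, expected in cases:
        actual = re.search(pattern, text, re.MULTILINE) is not None
        results.append((pattern, text, expected, actual))
    return results


if __name__ == "__main__":
    for pattern, text, expected, actual in check_cases(CASES):
        status = "ok" if expected == actual else "MISMATCH"
        print(f"{status}: {pattern!r} on {text!r} -> {actual}")
```

Running the same patterns in an affected Vim and comparing against the third column is one way to check a fix.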

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants