Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Respect [absence of] flags semantics of regular expression #6

Closed
RReverser opened this issue Nov 21, 2015 · 12 comments
Closed

Respect [absence of] flags semantics of regular expression #6

RReverser opened this issue Nov 21, 2015 · 12 comments

Comments

@RReverser
Copy link

[Continuation of Twitter discussion]

I'd propose to always respect g and/or y flags, including in matchAll method in order to preserve existing semantics, as otherwise implicit addition of "global" flag specifically in this method might be pretty surprising for users. Also, this will reduce implementation complexity, as neither polyfill nor engine will need to reconstruct RegExp, but reuse existing one.

That is, I propose to yield all matches when g or y flag is present (same behavior as with exec loop) but yield only first match when it's not. This way behavior will be consistent with exec, match and replace calls, and it will make it possible to use same preconstructed regular expression object and expect same behavior in any of given methods.

@ljharb
Copy link
Member

ljharb commented Nov 22, 2015

My initial thought was that "match all" implied that you wanted all the matches for the pattern, regardless of the global/sticky flags.

However, it also makes sense to me to respect the global flag, and when neither global or sticky are present, to simply return an iterator that yields only one match. This seems more in the spirit of conforming to the full RegExp API.

@abozhilov
Copy link

@ljharb if you respect global and sticky, then the name is more likely to be execAll and it should appear in RegExp.prototype instead.
Unfortunately neither String#matchAll nor RegExp#execAll are clear enough, that they return Iterator. Both lead to inconsistency with String#match and RegExp#exec.

I think the name should state clear enough that the method returns an Iterator.

In my opinion this method should be exposed as Symbol in RegExp.prototype, e.g. [Symbol.execIterator], then the library authors can expose it with name whatever they want. e.g. I'm going to implement Iter.exec in ES-Iter https://github.com/abozhilov/ES-Iter, but in my case the name is pretty clear what the method returns.

@RReverser
Copy link
Author

However, it also makes sense to me to respect the global flag, and when neither global or sticky are present, to simply return an iterator that yields only one match. This seems more in the spirit of conforming to the full RegExp API.

Yay!

To summarize related comment from parallel thread - #5 (comment) - what I propose is:

  • When g flag is absent, we conform to full API by returning just one match - this way regular expressions can be reused in a safe manner with the same behavior.
  • When g flag is present, same - we conform by returning all the matches, no matter where are they in the string.
  • When y flag is absent, we conform by returning only one match for /^regex/ - if it's present at the beginning of string.
  • When y flag is present, we conform by respecting lastIndex between calls, and thus for /^regex/y returning all consecutive matches. That means, while /g flag makes it possible to match all regex occurances in regex, regex something something regex, /y flag is different in the way that it will match only first three occurences in regexregexregex something something regex but not the last one, which is very useful for building simple tokenizers where you do want to match patterns only if they strictly follow previous matches, and not in any random position in the string.

P.S. Note that I'm not proposing anything new here - all the described situations just match existing behavior of .exec, except that with matchAll we can safely assume that it's executed against entire string and doesn't conflict with lastIndex from other calls.

@ljharb
Copy link
Member

ljharb commented Nov 22, 2015

@abozhilov There's no convention in the language for what returns an iterator and doesn't - and all the prototype keys/values/entries methods don't convey it either. The committee does not want to use a naming convention for this purpose.

@ljharb
Copy link
Member

ljharb commented Nov 22, 2015

@RReverser You have four items there. (1) and (3) I think I'm convinced by, and (2) is already in the proposal.

As for (4) - when i do var r = /regex/y; var str = 'regexregexregex something something regex'; and then r.exec(str); multiple times, I get the following:

  • ["regex"] with { input: str, index: 0 } and r.lastIndex === 5
  • ["regex"] with { input: str, index: 5 } and r.lastIndex === 10
  • ["regex"] with { input: str, index: 10 } and r.lastIndex === 15
  • null

I think it makes the most sense to be consistent with all four of these, and I think I am indeed convinced that this will make for the most useful, coherent, and consistent method, and I think the committee will be convinced by these arguments.

I'll leave this open until I've made the changes (which will take time since I'm still making the changes for Symbol.matchAll).

@abozhilov
Copy link

@ljharb I agree, but the name tends to confusion.

  1. Languages which not favor global flag, they use suffix all for global match. Python has re.findall, PHP preg_match_all. If the name is match All then it's better to yield all matches, but it leads to another inconsistencies.
  2. String#match returns array or null and matchAll does not hint, that it returns something different.
  3. If you change the semantic similar to RegExp#exec and keep the name matchAll then inconsistencies will be doubled, first with languages which not favor global and with String#match.

Please reconsider the name, at least to be discussed with the committee, thanks.

@RReverser
Copy link
Author

@abozhilov I believe name could be raise as a separate issue here in order not to pollute discussion thread. Personally, I agree that all might be confusing, and I like @WebReflection's proposal the most - matches - this would be consistent with other iterator methods we have on collection types and Object - values, entries etc.

@ljharb
Copy link
Member

ljharb commented Nov 23, 2015

The concern there is that matches sounds both like a noun - like entries or values - and like a verb, ie, "x matches y" - so the committee already considered and rejected that name. @abozhilov, the committee already discussed many names and this is the one we all agreed on.

Please do file a new issue about the naming, as this issue is solely about the g and y flags.

@littledan
Copy link
Member

OK, is the conclusion here to respect the original values for both g and y then? Personally, I would've expected that g is added and y doesn't take effect--it's just a little hard for me to see a use case otherwise, and it feels more like something to trip up on than a feature.

@ljharb
Copy link
Member

ljharb commented Apr 25, 2017

#6 (comment) is the conclusion we'd come to.

@ljharb ljharb self-assigned this Jul 29, 2017
@ljharb
Copy link
Member

ljharb commented Jul 29, 2017

To restate the conclusion: just like .exec, matchAll will work with all regexes no matter their flags, and it will respect the lastIndex (although it won't mutate lastIndex on the regex)

@littledan
Copy link
Member

I'm not really a fan of the resolution here. It just seems like the proposal here, and the recently merged spec text, will make the result harder to use. String.prototype.split is a good model for always adding the y flag; I could go either way on the g flag given the precedent there.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants