Skip to content
This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Why throw SyntaxErrors in 8.1 Runtime Semantics: GetSubstitution #29

Closed
msaboff opened this issue Sep 2, 2017 · 21 comments
Closed

Why throw SyntaxErrors in 8.1 Runtime Semantics: GetSubstitution #29

msaboff opened this issue Sep 2, 2017 · 21 comments

Comments

@msaboff
Copy link

msaboff commented Sep 2, 2017

In Table 1: Replacement Text Symbol Substitutions, SyntaxErrors are thrown in two situations. The first in the case that the $<groupName is not terminated by a closing '>'. The second is when the groupName is not a named group in the regular expression. This introduces the runtime first exceptions for the non-function flavor of String.prototype.replace.

It seems that both cases can be handled by defining alternative semantics. For the missing closing '>', treat the $<groupName as the replacement.

In the second case, use empty string as the replacement.

This would eliminate the need for programmers to wrap calls with try / catch for proper coding.

@littledan
Copy link
Member

These ideas are reasonable and consistent with other ways the replace string works, but I'm not sure if we should switch to them.

The current semantics were chosen based on a general design principle that, for new language features, we should be stricter and throw exceptions where it's reasonable. This is malformed syntax. Another example of something similar is how Unicode RegExps don't allow invalid escape sequences to just be ignored (e.g., RegExp("\k") is like RegExp("k"), but RegExp("\k", "u") is a SyntaxError). However, maybe this case isn't appropriate for such treatment, since the replace string has no way to be "compiled".

FWIW these aren't the only way exceptions can be thrown from this code path. WIth RegExp subclassing or monkey-patching, a groups object may be provided which doesn't behave appropriately, and leads exceptions to be thrown (either directly or from the ToString on the result). I don't see a good way to ignore this; it would be pretty extreme to actually catch and suppress these errors.

Any more thoughts? @schuay @hashseed @mathiasbynens

@littledan
Copy link
Member

We discussed part of your suggestion this bug thread: #14

@msaboff
Copy link
Author

msaboff commented Sep 11, 2017

It is clear from @littledan's reply and from bug thread #14, that there isn't an overwhelming drive for throwing a SyntaxError in the two cases.

First off, I would contend that SyntaxError is wrong for String.prototype.replace() and a malformed named replacement reference. In the case of new RegExp("("), throwing a SyntaxError makes sense as, but for String.replace() the proposed throwing of SyntaxError is not thrown during a strict parsing phase but during runtime processing of replacements to create the result. By using a Proxy or Subclassing, this processing is observable.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string. Having defined semantics for both the missing closing '>' and a properly formed group name that isn't present in the RegExp, would make be easier for interested programmers to validate proper operation of String.prototype.replace().

Lastly, I can accept that new features should throw errors when extending existing methods where appropriate. In this case though, the proposal describes throws a SyntaxError during the runtime processing of a String utility method. All other existing String methods throw either Range or Type errors. In each of those cases, the error is detected and thrown early in the processing of the method and with a clear reason for the error. This proposed SyntaxError is thrown during processing and only catches the first of what could be several errors and provides very little direct indication as to the cause.

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error. In existing code that calls String.prototype.replace() via helper methods where the RegExp and replacement string are passed as arguments, such code would require rework to catch the new error. Defining semantics for malformed named references allows such code to operate in the presence of malformed group names in replacement strings.

The code currently checked into JavaScriptCore implements the changes I propose. That code will be updated as appropriate based on the resolution of this issue.

@littledan
Copy link
Member

It is clear from @littledan's reply and from bug thread #14, that there isn't an overwhelming drive for throwing a SyntaxError in the two cases.

I agree; this is a sort of matter of taste.

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

@schuay
Copy link

schuay commented Sep 12, 2017

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

SyntaxError also seems reasonable to me. Why do you think TypeError would be more suitable?

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

+1, and an exception gives more information than just failing silently.

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

No, this makes no difference to our implementation in V8.

  1. We currently bail out to runtime to handle anything but trivial replacement strings (anything containing a '$' character).
  2. Even if we were to fully implement GetSubstitution on the fast path, it shouldn't make a difference whether it throws or not.

I still don't have a strong opinion either way. If there are good arguments for silent failure vs. throwing, then I'm happy to go with that.

@littledan
Copy link
Member

littledan commented Sep 12, 2017

@msaboff Would throwing an exception in the middle of processing pose more of a challenge to throw an error partway through for issues in the replacement string, than the Get for what the actual named capture is?

Some possible alternative semantics, if we want all the exceptions thrown upfront:

  • If the RegExp has named captures:
    • Scan the string for $<.
    • Throw on any malformed elements before doing the replace
    • ? Get() all of the named captures that are used in $<> elements and put them in a list, throwing an exception if something isn't present
  • Call ! GetSubstitution, which now will not throw

That preserves the current semantics of more actively giving users errors, while moving all exceptions to the beginning of the replacement.

Thoughts?

@schuay
Copy link

schuay commented Sep 12, 2017

-1 from me for upfront scanning & named group collection, seems like that would complicate things unnecessarily.

I'm not sure I understand what the issues are with throwing during result construction, @msaboff could you perhaps elaborate on that?

@msaboff
Copy link
Author

msaboff commented Sep 12, 2017

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

SyntaxError also seems reasonable to me. Why do you think TypeError would be more suitable?

It seems to me that TypeError is more consistent with existing uses elsewhere.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

+1, and an exception gives more information than just failing silently.

I said nothing about failing silently. My suggestion is that we define semantics for all forms of $ expressions in the replacement string. This would effectively eliminate "failing".

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

No, this makes no difference to our implementation in V8.

The same is true about JavaScriptCore. I don't not advocate a 2 pass solution to the problem. That would impact performance.

@mathiasbynens
Copy link
Member

It seems to me that TypeError is more consistent with existing uses elsewhere.

Could you give an example?

@msaboff
Copy link
Author

msaboff commented Sep 12, 2017

-1 from me for upfront scanning & named group collection, seems like that would complicate things unnecessarily.

I concur. I don't want upfront scanning for error and then replacement processing, i.e. two passes over the replacement string.

I'm not sure I understand what the issues are with throwing during result construction, @msaboff could you perhaps elaborate on that?

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist. It is a fundamental operational change in the API. That operational change is observable via proxies or subclassing. To take advantage of the new capabilities a programmer needs to not only update their RegExps and replacement strings, but they need to account for the new error path. By defining semantics for "malformed" group references, we introduce the functionality with much less change to the API.

Consider existing code with calls to String.prototype.replace where replacement strings contain "$<". Before this change, you'd get "$<" in the result. Now you get a SyntaxError. Although the occurrence of such replacement strings is arguably low, this is a breaking change.

@littledan
Copy link
Member

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist.

Is it at all a problem for language design or JSC's implementation that the Get from the groups object may throw, observably in the middle of the replacement? This is also a new error path that didn't exist previously. I get the point about how there were previously no syntax errors, and now there are, but Unicode RegExps made an analogous change.

Consider existing code with calls to String.prototype.replace where replacement strings contain "$<". Before this change, you'd get "$<" in the result. Now you get a SyntaxError. Although the occurrence of such replacement strings is arguably low, this is a breaking change.

The breakage would only happen if you use it with a new RegExp with named groups. So, this can break existing libraries, but it can't break existing deployed websites. Even with your proposed fix, if an existing replacement string contained $<foo> and you had a group named foo, it would be replaced, giving incorrect output.

@msaboff
Copy link
Author

msaboff commented Sep 12, 2017

It seems to me that TypeError is more consistent with existing uses elsewhere.

Could you give an example?

In 19.5.5.4 SyntaxError, it states:

Indicates that a parsing error has occurred.

In 19.5.5.5 TypeError is states:

TypeError is used to indicate an unsuccessful operation when none of the other NativeError objects are an appropriate indication of the failure cause

In almost all cases, SyntaxError is thrown for malformed Java Script or JSON source, malformed RegExp expression, failed parameter sanitization, and strict mode errors. In 8.1 Runtime Semantics: GetSubstitution the steps describe copy with replacement of $ prefixed strings.

This isn't a hill for me to die on though as I don't think we should be throwing any error.

@msaboff
Copy link
Author

msaboff commented Sep 12, 2017

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist.

Is it at all a problem for language design or JSC's implementation that the Get from the groups object may throw, observably in the middle of the replacement? This is also a new error path that didn't exist previously. I get the point about how there were previously no syntax errors, and now there are, but Unicode RegExps made an analogous change.

No, throwing from Get isn't an issue. The difference is that exceptions along Get and other property access paths are not new and well understood. My issue is the direct introduction of throwing an error in an algorithm that didn't have throws for any of the existing cases that also handle malformed $xxx replacement constructs.

@schuay
Copy link

schuay commented Sep 20, 2017

+cc @ajklein

@mathiasbynens
Copy link
Member

Now that implementations are starting to ship, let’s get this issue resolved.

I’m okay with making the proposed change.

@ajklein
Copy link

ajklein commented Oct 9, 2017

@msaboff's arguments sound good to me, I'd be happy with this spec change (and as @mathiasbynens says, I'd like to get this resolved so we can move forward with un-flagging V8's implementation).

@schuay
Copy link

schuay commented Oct 10, 2017

+1 from me as well.

@hashseed
Copy link

Also +1. Let's implement this and ship!

@schuay
Copy link

schuay commented Oct 10, 2017

kisg pushed a commit to paul99/v8mips that referenced this issue Oct 10, 2017
The specced semantics of GetSubstitution are expected to change in the
case of malformed named references, or named references to nonexistent
named groups. The former will evaluate to the identity replacement of
'$<', while the latter will result in replacement by the empty string.

See also:
tc39/proposal-regexp-named-groups#29

Bug: v8:5437, v8:6912
Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: I879288f775774cb0ec563f9d9129a99710efb77c
Reviewed-on: https://chromium-review.googlesource.com/708654
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48426}
@littledan
Copy link
Member

OK, glad we could resolve this. I'll fix up the spec and test262 tests soon to reflect the V8/JSC consensus of not throwing errors in these cases.

@littledan
Copy link
Member

littledan commented Oct 11, 2017

OK, committed a patch to attempt to match the implementations; will update tests soon. cc @ljharb

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants