Skip to content
This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Investigate need for reflection/genericity over groups #12

Open
littledan opened this issue Jan 25, 2017 · 9 comments
Open

Investigate need for reflection/genericity over groups #12

littledan opened this issue Jan 25, 2017 · 9 comments

Comments

@littledan
Copy link
Member

Raised in various ways by @bterlson and @ljharb at the January 2017 TC39 presentation.

@ljharb
Copy link
Member

ljharb commented Jan 25, 2017

Some thoughts:

const { a, b, c } = match.groups; // if it's always an object
const { a, b, c } = match.groups || {}; // if it's absent when there's no matches

const { groups: { a, b, c } } = match; // if it's always an object
const { groups: { a, b, c } = {} } = match; // if it's absent when there's no matches

One possibly performant way we could make it always be an object is, instead of making groups an own data property, making its property descriptor be something like:

{
  enumerable: true,
  configurable: true,
  get() {
    if (!this.[[matchedGroups]]) { // internal slot
      this.[[matchedGroups]] = {};
    }
    return this.[[matchedGroups]];
  },
  set(value) {
    Object.defineProperty(this, 'groups', { enumerable: true, configurable: true, value, writable: true });
  }
}

However, given that match.groups || {} or = {} isn't that problematic, i think it's not that big a deal that it always be an object (but it'd be nice).

@schuay
Copy link

schuay commented Feb 23, 2017

From an efficiency point of view, I'd still argue for keeping the groups object as a plain JS object with named captures added as data properties by RegExpBuiltinExec (i.e. the current proposal). The object should be added only when the RegExp contains named captures.

In V8, RegExpResult creation (as well as other RegExp glue code) is now done in CodeStubAssembler, which means we're limited in what we can do without bailing out to (slow) runtime. For instance, we can efficiently construct objects from existing maps, but map manipulations (such as adding accessors at runtime) have to be done in C++.

Additional advantages of using a plain JS object with properties:

  • Avoids messy logic related to modified properties on the result object, e.g.: modifying a numbered capture on the result object never affects named captures on the result:
var result = /(?<a>.)/u.exec("x");
result[1] = "42";
result.group.a;  // Still "x".
  • Interacts nicely with destructuring and the replace method.

Disadvantages:

  • CreateDataProperty currently still requires a slow runtime call. A fast path seems feasible for dictionary mode objects (which the groups object currently is). This would require some effort on V8 implementer side.
  • Adds unconditional overhead to RegExpBuiltinExec to check whether we need to construct a groups object. Currently this is an object load, a fixed array element load, and a conditional jump. We could probably optimize this to avoid the element load.

Implementation-wise, a 'group(name)' getter installed on the result object itself which references an internal name-string map is probably more efficient, but also less convenient for users.

@schuay
Copy link

schuay commented Feb 23, 2017

More context on the previous post: this was in response to an offline discussion about different possibilities of exposing named captures on the result object.

Some of these were:

  • A result.group(name) getter.
  • A result.group object with named getters on its proto, possibly lazily constructed.
  • A result.group Map object.
  • Storing named capture as data properties on result itself.
  • A result.group plain JS object with data properties.

The previous reply argues for the last option.

@littledan
Copy link
Member Author

At the January 2017 TC39 meeting, we discussed some other properties that we might like to have from the groups property (cc @bterlson and @michaelficarra). I've talked it over with some of the current and former V8 team (@hashseed, @schuay and @ErikCorryGoogle) and concluded that we should actually stay with the current specification. The proposed changes and responses are:

  • Creating the keys on the groups object only for the groups that are present in the RegExp: The counterargument is that, since numbered groups are all added and initialized to undefined, we should do the same for named groups. What makes named groups different? Further, if we always make all of the groups properties, then there will be a single "hidden class" for the groups object, making subsequent property accesses faster in V8 (I believe other engines do similar optimizations).
  • Always making a groups property, which is an empty object (possibly based on a class hierarchy): from @hashseed:

by adding the groups object unconditionally, we would retroactively change the behavior of existing code that use any regexp. A frozen empty object is weird, because I expect the ordinary groups object to be mutable.

@ljharb
Copy link
Member

ljharb commented Feb 23, 2017

@littledan the resolution from the first bullet implies that all named groups will be present on the object, which makes sense to me. Does the second imply that there will be an unconditional unfrozen groups object on every match object, or that it will only be created on match objects when the regex had named groups?

@littledan
Copy link
Member Author

littledan commented Feb 23, 2017

That it will only be created on match objects when the RegExp had named groups. The quoted section could be rephrased to, "If we added the groups..."

@ljharb
Copy link
Member

ljharb commented Feb 23, 2017

Sounds good.

@getify
Copy link

getify commented Nov 17, 2017

Is there any concern over behavior ambiguity where the groups object, if created, has a __proto__ of Object.prototype, and where a named group could be of the name "__proto__" or "toString", or otherwise shadowing any of the other built-ins on Object.prototype?

What if a named group property name is attempted to be added to groups, but the Object.prototype has been extended with a setter (or writable:false property descriptor) of that same name?

Wouldn't it be simpler if the groups object was defined to be a plain dictionary object with __proto__ of null, like Object.create(null), to avoid these quirks? Or couldn't it argue more in favor of that above-mentioned option of it being a Map instance?

@littledan
Copy link
Member Author

I agree about the proto of groups. Actually, it already is null, see https://tc39.github.io/proposal-regexp-named-groups/#sec-regexpbuiltinexec

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants