RegExp match array offsets for ECMAScript
RegExp match array offsets provide additional information about the position of a
captured substring relative to the
index of the match.
An example implementation can be found in regexp-measure. Note that regexp-measure was built around
the Stage 0 proposal and is no longer up to date with respect to the current proposed API design.
Champion: Ron Buckton (@rbuckton)
For detailed status of this proposal see TODO, below.
Authors:
- Ron Buckton (@rbuckton)
RegExp objects can provide information about a match when calling the exec method. This result is
an Array containing information about the substrings that were matched, along with additional
properties to indicate the input string, the index in the input at which the match was found, as
well as a groups object containing the substrings for any named capture groups.
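As a brief illustration of the result shape described above, the following sketch runs exec as it exists today. The pattern and input string are invented for this example and do not come from the proposal.

```javascript
// Current behavior: exec returns an Array with extra own properties.
// The pattern and input are illustrative assumptions.
const re = /(?<year>\d{4})-(?<month>\d{2})/;
const input = "on 2019-07";
const match = re.exec(input);

console.log(match[0]);           // "2019-07" (the whole match)
console.log(match[1]);           // "2019"    (first capture group)
console.log(match.index);        // 3         (start of the whole match only)
console.log(match.input);        // "on 2019-07"
console.log(match.groups.month); // "07"      (named capture group)
```

Note that index describes only the whole match; nothing in the result says where "2019" or "07" begin or end.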
However, there are several more advanced scenarios where this information may not necessarily be
sufficient. For example, an ECMAScript implementation of TextMate Language syntax highlighting
needs not just the index of the match, but also the offsets for individual capture groups. We also
have no mechanism to indicate whether a capture group was merely empty vs. unmatched (either
optional or in an unchosen alternative of a disjunction).
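To make the gap concrete, here is a minimal sketch of a common workaround: searching the input for the captured text starting at the match index. The pattern and input are invented for illustration. The guess fails as soon as the captured text also occurs earlier in the match.

```javascript
// Naive workaround (illustrative only): locate a capture by searching for
// its text starting at the index of the whole match.
const wordPair = /(\w+) (\w+)/;
const text = "ab ab";
const m = wordPair.exec(text);

// The second capture is the second "ab", which starts at position 3,
// but indexOf reports the first occurrence of that text instead.
const guessedStart = text.indexOf(m[2], m.index);

console.log(m[2]);         // "ab"
console.log(guessedStart); // 0, not 3
```

A tool such as a syntax highlighter would color the wrong range here, which is why the offsets need to come from the matcher itself.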
As such, we propose the addition of an optional second argument to exec that would take a callback
function used to map the offsets of each capture to the value stored in the resulting match array.
This callback would be supplied with three arguments: the start position of the match within the
input (-1 if unmatched), the end position of the match within the input (-1 if unmatched), and the
input string itself. The structure of the resulting match array itself does not change (it would
still have own index, input, and (optional) groups properties); rather, the value of each element
would be the result of mapping that capture's offsets through the provided function.
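The callback form described above can be approximated in userland. The sketch below is not the proposal's specification or its reference implementation: execWithOffsets is a hypothetical helper name, and it assumes an engine that supports the `d` (hasIndices) flag standardized in ES2022, which it uses to obtain each capture's start and end before handing them to the callback.

```javascript
// Hypothetical helper (an assumption for illustration, not proposal API):
// emulates exec(input, callback) on top of the ES2022 `d` flag.
function execWithOffsets(regexp, input, mapOffsets) {
  // Clone the pattern with `d` so the engine records capture indices.
  const flags = regexp.flags.includes("d") ? regexp.flags : regexp.flags + "d";
  const clone = new RegExp(regexp.source, flags);
  clone.lastIndex = regexp.lastIndex;

  const match = clone.exec(input);
  regexp.lastIndex = clone.lastIndex; // keep global/sticky state in sync
  if (match === null) return null;

  // Unmatched captures have no index pair; report them as (-1, -1).
  const mapPair = (pair) =>
    pair === undefined
      ? mapOffsets(-1, -1, input)
      : mapOffsets(pair[0], pair[1], input);

  // Same structure as a normal match array: elements plus own properties.
  const result = Array.from(match.indices, (pair) => mapPair(pair));
  result.index = match.index;
  result.input = match.input;
  if (match.indices.groups !== undefined) {
    result.groups = Object.create(null);
    for (const name of Object.keys(match.indices.groups)) {
      result.groups[name] = mapPair(match.indices.groups[name]);
    }
  }
  return result;
}

const offsets = execWithOffsets(/a+(?<Z>z)?/, "xaaaz", (start, end) => [start, end]);
console.log(offsets[0]);       // [1, 5]
console.log(offsets.groups.Z); // [4, 5]
```

Passing a callback such as `(start, end) => start === -1 ? null : [start, end]` gives unmatched captures a null entry, while an empty capture still yields a pair whose start equals its end.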
In addition, we propose a similar change to both
- Oniguruma NodeJS bindings:
    const re1 = /a+(?<Z>z)?/;

    // offsets are positions within the input string:
    const s1 = "xaaaz";
    const m1 = re1.exec(s1, (start, end) => [start, end]);
    m1[0][0] === 1;
    m1[0][1] === 5;
    s1.slice(...m1[0]) === "aaaz";

    m1[1][0] === 4;
    m1[1][1] === 5;
    s1.slice(...m1[1]) === "z";

    m1.groups["Z"][0] === 4;
    m1.groups["Z"][1] === 5;
    s1.slice(...m1.groups["Z"]) === "z";

    // capture groups that are not matched (either optional or in the unmatched
    // alternative of a disjunction) have an offset of -1:
    const m2 = re1.exec("xaaay", (start, end) => start === -1 ? null : [start, end]);
    m2[1] === null;
    m2.groups["Z"] === null;

    // the following two statements are functionally equivalent:
    re1.exec(s1);
    re1.exec(s1, (start, end, input) => input.slice(start, end));
The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:
Stage 1 Entrance Criteria
- Identified a "champion" who will advance the addition.
- Prose outlining the problem or need and the general shape of a solution.
- Illustrative examples of usage.
- High-level API.
Stage 2 Entrance Criteria
Stage 3 Entrance Criteria
- Complete specification text.
- Designated reviewers have signed off on the current spec text.
- The ECMAScript editor has signed off on the current spec text.