<!doctype html>
<meta charset="utf-8">
<link rel="stylesheet" href="./spec.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<script src="./spec.js"></script>
<pre class="metadata">
title: RegExp.escape
stage: 2
contributors: Jordan Harband
</pre>
<emu-clause id="sec-text-processing" number="22">
<h1>Text Processing</h1>
<emu-clause id="sec-regexp-regular-expression-objects" number="2">
<h1>RegExp (Regular Expression) Objects</h1>
<emu-clause id="sec-properties-of-the-regexp-constructor" number="5">
<h1>Properties of the RegExp Constructor</h1>
<ins>
<emu-clause id="sec-regexp.escape" number="2">
<h1>RegExp.escape ( _S_ )</h1>
<p>This method takes a string and returns a similar string in which each character that is potentially special in a regular expression |Pattern| has been replaced by an escape sequence representing that character.</p>
<p>It performs the following steps when called:</p>
<emu-alg>
1. If _S_ is not a String, throw a *TypeError* exception.
1. Let _cpList_ be StringToCodePoints(_S_).
1. Let _escapedList_ be a new empty List.
1. For each code point _c_ in _cpList_, do
  1. If _escapedList_ is empty and _c_ is matched by |DecimalDigit|, then
    1. Append the code point U+005C (REVERSE SOLIDUS) to _escapedList_.
    1. Append the code point U+0078 (LATIN SMALL LETTER X) to _escapedList_.
    1. Append the code point U+0033 (DIGIT THREE) to _escapedList_.
    1. Append _c_ to _escapedList_.
  1. Else,
    1. Append the code points in EncodeForRegExpEscape(_c_) to _escapedList_.
1. Return CodePointsToString(_escapedList_).
</emu-alg>
<emu-note>
<p>`escape` takes a string and escapes it so that it can be represented literally within a pattern. In contrast, EscapeRegExpPattern (as the name implies) takes a pattern and escapes it so that it can be represented as a string. While the two are related, they do not share the same character escape set or perform similar actions.</p>
</emu-note>
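<emu-note>
<p>The steps above can be sketched in plain ECMAScript as follows. This is an illustrative approximation under stated assumptions, not a polyfill: the names `regExpEscapeSketch`, `encodeForRegExpEscapeSketch`, `PUNCTUATORS` and `WHITE_SPACE` are hypothetical, and |WhiteSpace| is approximated with a Unicode property escape for the Zs category plus TAB, VT, FF and U+FEFF.</p>

```javascript
// Hypothetical names throughout; a sketch of the algorithm, not a polyfill.
// Every ASCII punctuator except U+005F (LOW LINE).
const PUNCTUATORS = new Set("(){}[]|,.?*+-^$=<>\\/#&!%:;@~'\"`");
// Assumed approximation of |WhiteSpace|: TAB, VT, FF, U+FEFF and category Zs.
const WHITE_SPACE = /^[\t\v\f\uFEFF\p{Zs}]$/u;

function encodeForRegExpEscapeSketch(c) {
  // Code points outside the escape set are returned unchanged.
  if (!PUNCTUATORS.has(c) && !WHITE_SPACE.test(c)) return c;
  const hex = c.codePointAt(0).toString(16);
  if (hex.length <= 2) return "\\x" + hex.padStart(2, "0");
  if (hex.length <= 4) return "\\u" + hex.padStart(4, "0");
  return "\\u{" + hex + "}";
}

function regExpEscapeSketch(s) {
  if (typeof s !== "string") throw new TypeError("expected a string");
  let escaped = "";
  for (const c of s) { // string iteration proceeds by code point
    if (escaped === "" && /^[0-9]$/.test(c)) {
      // A leading decimal digit becomes \x3N, where N is that digit.
      escaped += "\\x3" + c;
    } else {
      escaped += encodeForRegExpEscapeSketch(c);
    }
  }
  return escaped;
}

console.log(regExpEscapeSketch("1+1")); // \x31\x2b1
console.log(regExpEscapeSketch("a.b")); // a\x2eb
```

<p>A pattern built from the sketch's output should match only the original text: `new RegExp(regExpEscapeSketch("a.b")).test("axb")` is `false`, whereas the unescaped pattern `a.b` would match.</p>
</emu-note>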
</emu-clause>
<emu-clause id="sec-encode" type="abstract operation">
<h1>
EncodeForRegExpEscape (
_c_: a code point,
): a List of code points
</h1>
<dl class="header">
<dt>description</dt>
<dd>If _c_ is a RegExp punctuator that needs escaping or is matched by |WhiteSpace|, it produces the code points of an escape sequence for _c_: *"\x"* followed by two hexadecimal digits when the value of _c_ is at most 0xFF, or a *"\u"* escape containing the hexadecimal value of _c_ when it is larger. In all other cases, it returns a List containing only _c_.</dd>
</dl>
<emu-alg>
1. Let _codePoints_ be a new empty List.
1. Let _punctuators_ be the following String, which consists of every ASCII punctuator except U+005F (LOW LINE): *"(){}[]|,.?\*+-^$=<>\/#&!%:;@~'"`"*.
1. Let _toEscape_ be StringToCodePoints(_punctuators_).
1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
  1. Append the code point U+005C (REVERSE SOLIDUS) to _codePoints_.
  1. Let _hex_ be Number::toString(𝔽(_c_), 16).
  1. If the length of _hex_ is 1 or 2, then
    1. Set _hex_ to StringPad(_hex_, 2, *"0"*, ~start~).
    1. Append the code point U+0078 (LATIN SMALL LETTER X) to _codePoints_.
    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
  1. Else if the length of _hex_ is 3 or 4, then
    1. Set _hex_ to StringPad(_hex_, 4, *"0"*, ~start~).
    1. Append the code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
  1. Else,
    1. Append the code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
    1. Append the code point U+007B (LEFT CURLY BRACKET) to _codePoints_.
    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
    1. Append the code point U+007D (RIGHT CURLY BRACKET) to _codePoints_.
1. Else,
  1. Append _c_ to _codePoints_.
1. Return _codePoints_.
</emu-alg>
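<emu-note>
<p>The choice among the three escape forms depends only on the length of the hexadecimal representation of the code point. The following sketch isolates that choice; `hexEscapeSketch` is a hypothetical helper that takes a numeric code point value and assumes the caller has already decided that the code point needs escaping.</p>

```javascript
// Hypothetical helper: formats a code point value that is already known
// to need escaping, choosing the form by the length of its hex digits.
function hexEscapeSketch(codePoint) {
  const hex = codePoint.toString(16); // lowercase, no leading zeros
  if (hex.length <= 2) {
    return "\\x" + hex.padStart(2, "0"); // \xHH
  }
  if (hex.length <= 4) {
    return "\\u" + hex.padStart(4, "0"); // \uHHHH
  }
  return "\\u{" + hex + "}"; // \u{H...}, for values above 0xFFFF
}

console.log(hexEscapeSketch(0x09));    // \x09
console.log(hexEscapeSketch(0x2e));    // \x2e
console.log(hexEscapeSketch(0x1680));  // \u1680
console.log(hexEscapeSketch(0x1f600)); // \u{1f600}
```

<p>The last call only exercises the braced form. It is an assumption of this note, not a statement of the algorithm, that no code point in the current escape set lies above U+FFFF; if that holds, the braced form is never produced in practice. Patterns containing it would also require the `u` or `v` flag.</p>
</emu-note>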
</emu-clause>
</ins>
</emu-clause>
</emu-clause>
</emu-clause>