Commit 6939b44

Merge pull request #625 from gibson042/2021-11-ascii-consistency
Editorial: Improve case sensitivity and conversion
2 parents 2703d06 + 746b610 commit 6939b44

3 files changed: 54 additions & 28 deletions

spec/locale-sensitive-functions.html

Lines changed: 50 additions & 24 deletions
@@ -24,7 +24,7 @@ <h1>String.prototype.localeCompare ( _that_ [ , _locales_ [ , _options_ ] ] )</h
 </p>
 
 <emu-alg>
-1. Let _O_ be RequireObjectCoercible(*this* value).
+1. Let _O_ be ? RequireObjectCoercible(*this* value).
 1. Let _S_ be ? ToString(_O_).
 1. Let _thatValue_ be ? ToString(_that_).
 1. Let _collator_ be ? Construct(%Collator%, &laquo; _locales_, _options_ &raquo;).
@@ -56,34 +56,54 @@ <h1>String.prototype.toLocaleLowerCase ( [ _locales_ ] )</h1>
 </p>
 
 <emu-alg>
-1. Let _O_ be RequireObjectCoercible(*this* value).
+1. Let _O_ be ? RequireObjectCoercible(*this* value).
 1. Let _S_ be ? ToString(_O_).
-1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
-1. If _requestedLocales_ is not an empty List, then
-  1. Let _requestedLocale_ be _requestedLocales_[0].
-1. Else,
-  1. Let _requestedLocale_ be DefaultLocale().
-1. Let _noExtensionsLocale_ be the String value that is _requestedLocale_ with any Unicode locale extension sequences (<emu-xref href="#sec-unicode-locale-extension-sequences"></emu-xref>) removed.
-1. Let _availableLocales_ be a List with language tags that includes the languages for which the Unicode Character Database contains language sensitive case mappings. Implementations may add additional language tags if they support case mapping for additional locales.
-1. Let _locale_ be BestAvailableLocale(_availableLocales_, _noExtensionsLocale_).
-1. If _locale_ is *undefined*, let _locale_ be *"und"*.
-1. Let _cpList_ be a List containing in order the code points of _S_ as defined in es2022, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>, starting at the first element of _S_.
-1. Let _cuList_ be a List where the elements are the result of a lower case transformation of the ordered code points in _cpList_ according to the Unicode Default Case Conversion algorithm or an implementation-defined conversion algorithm. A conforming implementation's lower case transformation algorithm must always yield the same _cpList_ given the same _cuList_ and locale.
-1. Let _L_ be a String whose elements are the UTF-16 Encoding (defined in es2022, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>) of the code points of _cuList_.
-1. Return _L_.
+1. Return ? TransformCase(_S_, _locales_, ~lower~).
 </emu-alg>
 
-<p>
-Lower case code point mappings may be derived according to a tailored version of the Default Case Conversion Algorithms of the Unicode Standard. Implementations may use locale specific tailoring defined in SpecialCasings.txt and/or CLDR and/or any other custom tailoring.
-</p>
-
-<emu-note>
-The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both `toLocaleUpperCase` and `toLocaleLowerCase` have context-sensitive behaviour, the functions are not symmetrical. In other words, `s.toLocaleUpperCase().toLocaleLowerCase()` is not necessarily equal to `s.toLocaleLowerCase()`.
-</emu-note>
-
 <emu-note>
 The `toLocaleLowerCase` function is intentionally generic; it does not require that its *this* value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
 </emu-note>
+
+<emu-clause id="sec-transform-case" type="abstract operation">
+<h1>
+TransformCase (
+_S_: a String,
+_locales_: an ECMAScript language value,
+_targetCase_: ~lower~ or ~upper~,
+)
+</h1>
+<dl class="header">
+<dt>description</dt>
+<dd>It interprets _S_ as a sequence of UTF-16 encoded code points, as described in <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>, and returns the result of implementation- and locale-dependent (ILD) transformation into _targetCase_ as a new String value.</dd>
+</dl>
+<emu-alg>
+1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
+1. If _requestedLocales_ is not an empty List, then
+  1. Let _requestedLocale_ be _requestedLocales_[0].
+1. Else,
+  1. Let _requestedLocale_ be ! DefaultLocale().
+1. Let _noExtensionsLocale_ be the String value that is _requestedLocale_ with any Unicode locale extension sequences (<emu-xref href="#sec-unicode-locale-extension-sequences"></emu-xref>) removed.
+1. Let _availableLocales_ be a List with language tags that includes the languages for which the Unicode Character Database contains language sensitive case mappings. Implementations may add additional language tags if they support case mapping for additional locales.
+1. Let _locale_ be ! BestAvailableLocale(_availableLocales_, _noExtensionsLocale_).
+1. If _locale_ is *undefined*, set _locale_ to *"und"*.
+1. Let _codePoints_ be ! StringToCodePoints(_S_).
+1. If _targetCase_ is ~lower~, then
+  1. Let _newCodePoints_ be a List whose elements are the result of a lower case transformation of _codePoints_ according to an implementation-derived algorithm using _locale_ or the Unicode Default Case Conversion algorithm.
+1. Else,
+  1. Assert: _targetCase_ is ~upper~.
+  1. Let _newCodePoints_ be a List whose elements are the result of an upper case transformation of _codePoints_ according to an implementation-derived algorithm using _locale_ or the Unicode Default Case Conversion algorithm.
+1. Return ! CodePointsToString(_newCodePoints_).
+</emu-alg>
+
+<p>
+Code point mappings may be derived according to a tailored version of the Default Case Conversion Algorithms of the Unicode Standard. Implementations may use locale-sensitive tailoring defined in the file <a href="https://unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt"><code>SpecialCasing.txt</code></a> of the Unicode Character Database and/or CLDR and/or any other custom tailoring. Regardless of tailoring, a conforming implementation's case transformation algorithm must always yield the same result given the same input code points, locale, and target case.
+</p>
+
+<emu-note>
+The case mapping of some code points may produce multiple code points, and therefore the result may not be the same length as the input. Because both `toLocaleUpperCase` and `toLocaleLowerCase` have context-sensitive behaviour, the functions are not symmetrical. In other words, `s.toLocaleUpperCase().toLocaleLowerCase()` is not necessarily equal to `s.toLocaleLowerCase()` and `s.toLocaleLowerCase().toLocaleUpperCase()` is not necessarily equal to `s.toLocaleUpperCase()`.
+</emu-note>
+</emu-clause>
 </emu-clause>
 
 <emu-clause id="sup-string.prototype.tolocaleuppercase">
@@ -94,9 +114,15 @@ <h1>String.prototype.toLocaleUpperCase ( [ _locales_ ] )</h1>
 </p>
 
 <p>
-This function interprets a String value as a sequence of code points, as described in es2022, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>. This function behaves in exactly the same way as `String.prototype.toLocaleLowerCase`, except that characters are mapped to their _uppercase_ equivalents. A conforming implementation's upper case transformation algorithm must always yield the same result given the same sequence of code points and locale.
+This function interprets a String value as a sequence of code points, as described in es2022, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>. The following steps are taken:
 </p>
 
+<emu-alg>
+1. Let _O_ be ? RequireObjectCoercible(*this* value).
+1. Let _S_ be ? ToString(_O_).
+1. Return ? TransformCase(_S_, _locales_, ~upper~).
+</emu-alg>
+
 <emu-note>
 The `toLocaleUpperCase` function is intentionally generic; it does not require that its *this* value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
 </emu-note>
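
Reviewer note: TransformCase is an abstract operation and cannot be called from script, but its observable behaviour can be probed through the two public methods that now delegate to it. The TypeScript sketch below is illustrative only. `TargetCase` and `transformCase` are hypothetical names that are not part of the spec or of any library, and the values in the comments assume an engine whose case conversion for the `en` locale follows the Unicode default mappings, including the unconditional entries in SpecialCasing.txt.

```typescript
type TargetCase = "lower" | "upper";

// Hypothetical wrapper with the same parameter shape as
// TransformCase(S, locales, targetCase). It only delegates to the built-ins.
function transformCase(
  s: string,
  locales: string | string[] | undefined,
  targetCase: TargetCase
): string {
  return targetCase === "lower"
    ? s.toLocaleLowerCase(locales)
    : s.toLocaleUpperCase(locales);
}

// One code point can map to several: U+00DF (ß) upper-cases to "SS".
const sharpS = "\u00DF";
const upperSharpS = transformCase(sharpS, "en", "upper");

// Upper-then-lower differs from lower alone: "ss" versus "ß".
const upperThenLower = transformCase(upperSharpS, "en", "lower");
const lowerOnly = transformCase(sharpS, "en", "lower");

// Lower-then-upper differs from upper alone:
// U+0130 lower-cases to "i" + U+0307, which upper-cases to "I" + U+0307.
const dottedI = "\u0130";
const lowerThenUpper = transformCase(
  transformCase(dottedI, "en", "lower"),
  "en",
  "upper"
);
const upperOnly = transformCase(dottedI, "en", "upper");

console.log(upperSharpS.length, upperThenLower === lowerOnly, lowerThenUpper === upperOnly);
```

Both asymmetries described in the expanded note show up here without any locale-specific tailoring; tailored locales such as Turkish or Lithuanian can add further differences, depending on the implementation.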

spec/locales-currencies-tz.html

Lines changed: 3 additions & 3 deletions
@@ -153,8 +153,8 @@ <h1>IsWellFormedCurrencyCode ( _currency_ )</h1>
 
 <emu-alg>
 1. Let _normalized_ be the result of mapping _currency_ to upper case as described in <emu-xref href="#sec-case-sensitivity-and-case-mapping"></emu-xref>.
-1. If the number of elements in _normalized_ is not 3, return *false*.
-1. If _normalized_ contains any character that is not in the range *"A"* to *"Z"* (U+0041 to U+005A), return *false*.
+1. If the length of _normalized_ is not 3, return *false*.
+1. If _normalized_ contains any code unit outside of 0x0041 through 0x005A (corresponding to Unicode characters LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z), return *false*.
 1. Return *true*.
 </emu-alg>
 </emu-clause>
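
Reviewer note: the reworded steps can be sketched in TypeScript as follows. This is a minimal illustration, not the spec's definition. It assumes that the upper-case mapping in the cross-referenced section changes only the ASCII letters a through z (that section is not part of this diff), and `asciiUpperCase` and `isWellFormedCurrencyCode` are illustrative names.

```typescript
// Assumed ASCII-only mapping: code units 0x0061-0x007A become 0x0041-0x005A,
// every other code unit is left untouched.
function asciiUpperCase(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    out += String.fromCharCode(cu >= 0x61 && cu <= 0x7a ? cu - 0x20 : cu);
  }
  return out;
}

function isWellFormedCurrencyCode(currency: string): boolean {
  const normalized = asciiUpperCase(currency);
  // "Length" here is the number of UTF-16 code units.
  if (normalized.length !== 3) return false;
  for (let i = 0; i < normalized.length; i++) {
    const cu = normalized.charCodeAt(i);
    // Reject any code unit outside 0x0041 (A) through 0x005A (Z).
    if (cu < 0x41 || cu > 0x5a) return false;
  }
  return true;
}

console.log(isWellFormedCurrencyCode("usd"), isWellFormedCurrencyCode("u$d"));
```

Working in code units rather than "characters" matters for inputs such as three copies of U+017F (LATIN SMALL LETTER LONG S): a full Unicode `toUpperCase()` would turn them into `"SSS"`, whereas the ASCII-only mapping assumed here leaves them unchanged and the check rejects them.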
@@ -220,7 +220,7 @@ <h1>DefaultTimeZone ( )</h1>
 <h1>Measurement Unit Identifiers</h1>
 
 <p>
-The ECMAScript 2022 Internationalization API Specification identifies measurement units using a <em>core unit identifier</em> as defined by <a href="https://unicode.org/reports/tr35/tr35-general.html#Unit_Elements">Unicode Technical Standard #35, Part 2, Section 6</a>. Their canonical form is a string containing all lowercase letters with zero or more hyphens.
+The ECMAScript 2022 Internationalization API Specification identifies measurement units using a <em>core unit identifier</em> as defined by <a href="https://unicode.org/reports/tr35/tr35-general.html#Unit_Elements">Unicode Technical Standard #35, Part 2, Section 6</a>. Their canonical form is a string containing only Unicode Basic Latin lower case letters (U+0061 LATIN SMALL LETTER A through U+007A LATIN SMALL LETTER Z) with zero or more medial hyphens (U+002D HYPHEN-MINUS).
 </p>
 
 <p>
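
Reviewer note: the tightened wording of the canonical form can be expressed as a single pattern. The TypeScript sketch below checks only the lexical shape stated in the new sentence (Basic Latin lower case letters, with hyphens allowed strictly between the first and last character). It does not check whether an identifier is a unit the specification actually sanctions, and `CANONICAL_UNIT_SHAPE` and `hasCanonicalUnitShape` are hypothetical names.

```typescript
// First and last characters must be a-z; anything between may be a-z or "-".
const CANONICAL_UNIT_SHAPE = /^[a-z](?:[a-z-]*[a-z])?$/;

function hasCanonicalUnitShape(unit: string): boolean {
  return CANONICAL_UNIT_SHAPE.test(unit);
}

console.log(hasCanonicalUnitShape("kilometer-per-hour"), hasCanonicalUnitShape("Meter"));
```

Under the old wording ("all lowercase letters with zero or more hyphens"), a string such as `"mètre"` or `"-meter"` was arguably not excluded; the new wording rules both out.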

spec/numberformat.html

Lines changed: 1 addition & 1 deletion
@@ -176,7 +176,7 @@ <h1>SetNumberFormatUnitOptions ( _intlObj_, _options_ )</h1>
 1. If the result of IsWellFormedUnitIdentifier(_unit_) is *false*, throw a *RangeError* exception.
 1. Let _unitDisplay_ be ? GetOption(_options_, *"unitDisplay"*, *"string"*, &laquo; *"short"*, *"narrow"*, *"long"* &raquo;, *"short"*).
 1. If _style_ is *"currency"*, then
-  1. Let _currency_ be the result of converting _currency_ to upper case as specified in <emu-xref href="#sec-case-sensitivity-and-case-mapping"></emu-xref>.
+  1. Let _currency_ be the result of mapping _currency_ to upper case as specified in <emu-xref href="#sec-case-sensitivity-and-case-mapping"></emu-xref>.
   1. Set _intlObj_.[[Currency]] to _currency_.
   1. Set _intlObj_.[[CurrencyDisplay]] to _currencyDisplay_.
   1. Set _intlObj_.[[CurrencySign]] to _currencySign_.
