This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Change the CanonicalizeLanguageTag operation so that it removes duplicate attributes/keywords in Unicode locale extension sequences just as Intl.Locale does #83

Merged 3 commits on Jan 24, 2020.
51 changes: 46 additions & 5 deletions spec.html
@@ -88,9 +88,9 @@ <h1>ApplyUnicodeExtensionToTag( _tag_, _options_, _relevantExtensionKeys_ )</h1>
1. Append the Record{[[Key]]: _key_, [[Value]]: _value_} to _keywords_.
1. Set _result_.[[<_key_>]] to _value_.
1. Let _locale_ be the String value that is _tag_ with all Unicode locale extension sequences removed.
1. Let _newExtension_ be the canonicalized Unicode BCP 47 U Extension based on _attributes_ and _keywords_ as defined in <a href="https://www.unicode.org/reports/tr35/#u_Extension">UTS #35 section 3.6</a>.
1. Let _newExtension_ be a Unicode BCP 47 U Extension based on _attributes_ and _keywords_.
1. If _newExtension_ is not the empty String, then
  1. Let _locale_ be ? InsertUnicodeExtension(_locale_, _newExtension_).
  1. Let _locale_ be ! InsertUnicodeExtensionAndCanonicalize(_locale_, _newExtension_).
1. Set _result_.[[locale]] to _locale_.
1. Return _result_.
</emu-alg>
@@ -135,10 +135,10 @@ <h1>UnicodeExtensionComponents( _extension_ )</h1>
</emu-alg>
</emu-clause>

<emu-clause id="sec-insert-unicode-extension" aoid=InsertUnicodeExtension>
<h1>InsertUnicodeExtension( _locale_, _extension_ )</h1>
<emu-clause id="sec-insert-unicode-extension-and-canonicalize" aoid=InsertUnicodeExtensionAndCanonicalize>
<h1>InsertUnicodeExtensionAndCanonicalize( _locale_, _extension_ )</h1>
<p>
The InsertUnicodeExtension abstract operation inserts _extension_, which must be a Unicode locale extension sequence, into _locale_, which must be a String value with a structurally valid and canonicalized BCP 47 language tag. The following steps are taken:
The InsertUnicodeExtensionAndCanonicalize abstract operation inserts _extension_, which must be a Unicode locale extension sequence, into _locale_, which must be a String value with a structurally valid and canonicalized BCP 47 language tag. The following steps are taken:
</p>
<p>
The following algorithm refers to <a href="https://www.unicode.org/reports/tr35/#Identifiers">UTS 35's Unicode Language and Locale Identifiers grammar</a>.
@@ -438,6 +438,47 @@ <h1>get Intl.Locale.prototype.region</h1>
<emu-clause id="sec-locale-modified-algorithms">
<h1>Modified algorithms</h1>

<emu-clause id="sec-canonicalizelanguagetag" aoid="CanonicalizeLanguageTag">
<h1>CanonicalizeLanguageTag ( _locale_ )</h1>

<p>
The CanonicalizeLanguageTag abstract operation returns the canonical and case-regularized form of the _locale_ argument (which must be a String value that is a structurally valid Unicode BCP 47 Locale Identifier as verified by the IsStructurallyValidLanguageTag abstract operation).
<del>A conforming implementation shall take the steps specified in the &ldquo;BCP 47 Language Tag to Unicode BCP 47 Locale Identifier&rdquo; algorithm, from <a href="https://unicode.org/reports/tr35/#BCP_47_Language_Tag_Conversion">Unicode Technical Standard #35 LDML § 3.3.1 BCP 47 Language Tag Conversion</a>.</del>
<ins>The following steps are taken:</ins>
</p>

<emu-alg>
1. <ins>Let _localeId_ be the string _locale_ after performing the steps specified in the &ldquo;<a href="https://www.unicode.org/reports/tr35/tr35.html#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to Unicode BCP 47 Locale Identifier</a>&rdquo; algorithm, from <a href="https://unicode.org/reports/tr35/#BCP_47_Language_Tag_Conversion">Unicode Technical Standard #35 LDML § 3.3.1 BCP 47 Language Tag Conversion</a>, on it. (The result is a Unicode BCP 47 locale identifier, in canonical syntax but not necessarily in canonical form.)
1. <ins>Let _localeId_ be the string _localeId_ after performing the algorithm to <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">transform it to canonical form</a>. (The result is a Unicode BCP 47 locale identifier, in both canonical syntax and canonical form.)
1. <ins>If _localeId_ contains a substring _extension_ that is a Unicode locale extension sequence, then
  1. <ins>Let _components_ be ! UnicodeExtensionComponents(_extension_).
  1. <ins>Let _attributes_ be _components_.[[Attributes]].
  1. <ins>Let _keywords_ be _components_.[[Keywords]].
  1. <ins>Let _newExtension_ be `"u"`.
  1. <ins>For each element _attr_ of _attributes_ in List order, do
    1. <ins>Append `"-"` to _newExtension_.
    1. <ins>Append _attr_ to _newExtension_.
  1. <ins>For each element _keyword_ of _keywords_ in List order, do
    1. <ins>Append `"-"` to _newExtension_.
    1. <ins>Append _keyword_.[[Key]] to _newExtension_.
    1. <ins>If _keyword_.[[Value]] is not the empty String, then
      1. <ins>Append `"-"` to _newExtension_.
      1. <ins>Append _keyword_.[[Value]] to _newExtension_.
  1. <ins>Assert: _newExtension_ is not equal to `"u"`.
  1. <ins>Let _localeId_ be _localeId_ with the substring corresponding to _extension_ replaced by the string _newExtension_.
1. <ins>Return _localeId_.
</emu-alg>

<emu-note>
<ins>The third step of this algorithm ensures that a Unicode locale extension sequence in the returned language tag contains:</ins>

<ul>
<li><ins>only the first instance of any attribute duplicated in the input, and</ins></li>
<li><ins>only the first keyword for a given key in the input.</ins></li>
</ul>
</emu-note>
</emu-clause>

<emu-clause id="sec-canonicalizelocalelist" aoid="CanonicalizeLocaleList">
<h1>CanonicalizeLocaleList ( _locales_ )</h1>

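The third step of the new CanonicalizeLanguageTag algorithm can be sketched in TypeScript as follows. This is an illustrative sketch, not the normative text. The names `Keyword`, `ExtensionComponents`, `unicodeExtensionComponents`, and `rebuildUnicodeExtension` are hypothetical. The parser is a simplified stand-in for UnicodeExtensionComponents that assumes its input is already lowercased, structurally valid, and written without a leading separator (for example `"u-ca-gregory"`). It also assumes that a two-character subtag is a key, and that longer subtags are attributes before the first key and type subtags after it.

```typescript
// Hypothetical types for illustration; they mirror the [[Attributes]] and
// [[Keywords]] fields used by the algorithm but are not defined by the spec.
interface Keyword {
  key: string;
  value: string; // type subtags joined by "-", or "" when the key has no type
}

interface ExtensionComponents {
  attributes: string[];
  keywords: Keyword[];
}

// Simplified stand-in for UnicodeExtensionComponents (assumption: the input is
// lowercased and structurally valid). Keeps only the first occurrence of each
// attribute and only the first keyword for each key.
function unicodeExtensionComponents(extension: string): ExtensionComponents {
  const subtags = extension.split("-").slice(1); // drop the leading "u"
  const attributes: string[] = [];
  const keywords: Keyword[] = [];
  let current: Keyword | null = null;

  for (const subtag of subtags) {
    if (subtag.length === 2) {
      // A two-character subtag starts a new keyword. A keyword whose key was
      // already seen is parsed but never stored, so its type is discarded.
      current = { key: subtag, value: "" };
      if (!keywords.some((k) => k.key === subtag)) {
        keywords.push(current);
      }
    } else if (current === null) {
      // Longer subtags before the first key are attributes.
      if (attributes.indexOf(subtag) === -1) {
        attributes.push(subtag);
      }
    } else {
      // Longer subtags after a key extend that key's type.
      current.value = current.value === "" ? subtag : current.value + "-" + subtag;
    }
  }
  return { attributes, keywords };
}

// Rebuilds the extension from the deduplicated components, following the
// "Append" steps of the algorithm in List order.
function rebuildUnicodeExtension(extension: string): string {
  const { attributes, keywords } = unicodeExtensionComponents(extension);
  let newExtension = "u";
  for (const attr of attributes) {
    newExtension += "-" + attr;
  }
  for (const keyword of keywords) {
    newExtension += "-" + keyword.key;
    if (keyword.value !== "") {
      newExtension += "-" + keyword.value;
    }
  }
  return newExtension;
}
```

Under these assumptions, `rebuildUnicodeExtension("u-attr-attr-ca-gregory-ca-buddhist")` keeps the first `attr` and the first `ca` keyword and drops the later duplicates, which matches the behavior described in the note to the algorithm.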