tc39 · zbraniecki · Jan 24, 2020 · Nov 18, 2019 · Nov 18, 2019 · Jan 24, 2020
diff --git a/spec.html b/spec.html
@@ -88,9 +88,9 @@ <h1>ApplyUnicodeExtensionToTag( _tag_, _options_, _relevantExtensionKeys_ )</h1>
               1. Append the Record{[[Key]]: _key_, [[Value]]: _value_} to _keywords_.
           1. Set _result_.[[<_key_>]] to _value_.
         1. Let _locale_ be the String value that is _tag_ with all Unicode locale extension sequences removed.
-        1. Let _newExtension_ be the canonicalized Unicode BCP 47 U Extension based on _attributes_ and _keywords_ as defined in <a href="https://www.unicode.org/reports/tr35/#u_Extension">UTS #35 section 3.6</a>.
+        1. Let _newExtension_ be a Unicode BCP 47 U Extension based on _attributes_ and _keywords_.
         1. If _newExtension_ is not the empty String, then
-          1. Let _locale_ be ? InsertUnicodeExtension(_locale_, _newExtension_).
+          1. Set _locale_ to ! InsertUnicodeExtensionAndCanonicalize(_locale_, _newExtension_).
         1. Set _result_.[[locale]] to _locale_.
         1. Return _result_.
       </emu-alg>
@@ -135,10 +135,10 @@ <h1>UnicodeExtensionComponents( _extension_ )</h1>
       </emu-alg>
     </emu-clause>
 
-    <emu-clause id="sec-insert-unicode-extension" aoid=InsertUnicodeExtension>
-      <h1>InsertUnicodeExtension( _locale_, _extension_ )</h1>
+    <emu-clause id="sec-insert-unicode-extension-and-canonicalize" aoid=InsertUnicodeExtensionAndCanonicalize>
+      <h1>InsertUnicodeExtensionAndCanonicalize( _locale_, _extension_ )</h1>
       <p>
-        The InsertUnicodeExtension abstract operation inserts _extension_, which must be a Unicode locale extension sequence, into _locale_, which must be a String value with a structurally valid and canonicalized BCP 47 language tag. The following steps are taken:
+        The InsertUnicodeExtensionAndCanonicalize abstract operation inserts _extension_, which must be a Unicode locale extension sequence, into _locale_, which must be a String value with a structurally valid and canonicalized BCP 47 language tag. The following steps are taken:
       </p>
       <p>
         The following algorithm refers to <a href="https://www.unicode.org/reports/tr35/#Identifiers">UTS 35's Unicode Language and Locale Identifiers grammar</a>.
@@ -438,6 +438,47 @@ <h1>get Intl.Locale.prototype.region</h1>
 <emu-clause id="sec-locale-modified-algorithms">
   <h1>Modified algorithms</h1>
 
+    <emu-clause id="sec-canonicalizelanguagetag" aoid="CanonicalizeLanguageTag">
+      <h1>CanonicalizeLanguageTag ( _locale_ )</h1>
+
+      <p>
+        The CanonicalizeLanguageTag abstract operation returns the canonical and case-regularized form of the _locale_ argument (which must be a String value that is a structurally valid Unicode BCP 47 Locale Identifier as verified by the IsStructurallyValidLanguageTag abstract operation).
+        <del>A conforming implementation shall take the steps specified in the &ldquo;BCP 47 Language Tag to Unicode BCP 47 Locale Identifier&rdquo; algorithm, from <a href="https://unicode.org/reports/tr35/#BCP_47_Language_Tag_Conversion">Unicode Technical Standard #35 LDML § 3.3.1 BCP 47 Language Tag Conversion</a>.</del>
+        <ins>The following steps are taken:</ins>
+      </p>
+
+      <emu-alg>
+        1. <ins>Let _localeId_ be the result of performing the steps of the &ldquo;<a href="https://www.unicode.org/reports/tr35/tr35.html#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to Unicode BCP 47 Locale Identifier</a>&rdquo; algorithm, from <a href="https://unicode.org/reports/tr35/#BCP_47_Language_Tag_Conversion">Unicode Technical Standard #35 LDML § 3.3.1 BCP 47 Language Tag Conversion</a>, on _locale_. (The result is a Unicode BCP 47 locale identifier, in canonical syntax but not necessarily in canonical form.)</ins>
+        1. <ins>Set _localeId_ to the string that results from <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">transforming _localeId_ to canonical form</a>. (The result is a Unicode BCP 47 locale identifier, in both canonical syntax and canonical form.)</ins>
+        1. <ins>If _localeId_ contains a substring _extension_ that is a Unicode locale extension sequence, then</ins>
+          1. <ins>Let _components_ be ! UnicodeExtensionComponents(_extension_).</ins>
+          1. <ins>Let _attributes_ be _components_.[[Attributes]].</ins>
+          1. <ins>Let _keywords_ be _components_.[[Keywords]].</ins>
+          1. <ins>Let _newExtension_ be `"u"`.</ins>
+          1. <ins>For each element _attr_ of _attributes_ in List order, do</ins>
+            1. <ins>Append `"-"` to _newExtension_.</ins>
+            1. <ins>Append _attr_ to _newExtension_.</ins>
+          1. <ins>For each element _keyword_ of _keywords_ in List order, do</ins>
+            1. <ins>Append `"-"` to _newExtension_.</ins>
+            1. <ins>Append _keyword_.[[Key]] to _newExtension_.</ins>
+            1. <ins>If _keyword_.[[Value]] is not the empty String, then</ins>
+              1. <ins>Append `"-"` to _newExtension_.</ins>
+              1. <ins>Append _keyword_.[[Value]] to _newExtension_.</ins>
+          1. <ins>Assert: _newExtension_ is not equal to `"u"`.</ins>
+          1. <ins>Set _localeId_ to _localeId_ with the substring corresponding to _extension_ replaced by the string _newExtension_.</ins>
+        1. <ins>Return _localeId_.</ins>
+      </emu-alg>
+
+      <emu-note>
+        <ins>The third step of this algorithm ensures that a Unicode locale extension sequence in the returned language tag contains:</ins>
+
+        <ul>
+          <li><ins>only the first instance of any attribute duplicated in the input, and</ins></li>
+          <li><ins>only the first keyword for a given key in the input.</ins></li>
+        </ul>
+      </emu-note>
+    </emu-clause>
+
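As a rough illustration of step 3 of CanonicalizeLanguageTag above, the following Python sketch rebuilds the Unicode locale extension sequence of an already canonicalized tag so that only the first instance of each attribute and the first keyword for each key survive. It is not part of the spec change. All function names are hypothetical, `unicode_extension_components` only approximates UnicodeExtensionComponents (whose steps are not shown in this hunk) based on the behavior described in the note, and the handling of private-use subtags is simplified.

```python
import re


def unicode_extension_components(extension):
    """Hypothetical stand-in for UnicodeExtensionComponents.

    Splits a sequence such as "u-attr-ca-gregory" into attributes and
    keywords, keeping only the first occurrence of any duplicate.
    """
    attributes = []
    keywords = {}   # key -> value; dicts preserve insertion order
    key = None      # key of the keyword currently being read
    keep = False    # True if that keyword is the first one for its key
    for subtag in extension.split("-")[1:]:  # skip the leading "u"
        if len(subtag) == 2:
            # A two-character subtag starts a new keyword.
            key = subtag
            keep = key not in keywords
            if keep:
                keywords[key] = ""
        elif key is None:
            # Subtags before the first key are attributes.
            if subtag not in attributes:
                attributes.append(subtag)
        elif keep:
            # Further subtags extend the value of the current keyword.
            current = keywords[key]
            keywords[key] = subtag if current == "" else current + "-" + subtag
    return attributes, keywords


def rebuild_extension(attributes, keywords):
    """Mirrors the sub-steps that build _newExtension_ starting from "u"."""
    new_extension = "u"
    for attr in attributes:
        new_extension += "-" + attr
    for key, value in keywords.items():
        new_extension += "-" + key
        if value != "":
            new_extension += "-" + value
    assert new_extension != "u"
    return new_extension


def dedupe_unicode_extension(locale_id):
    """Applies the sketch of step 3 to a canonicalized locale identifier."""
    # Assumption: anything after "-x-" is private use and is left untouched.
    main, sep, private = locale_id.partition("-x-")
    match = re.search(r"(?<=-)u(?:-[0-9a-z]{2,8})+", main)
    if match is None:
        return locale_id
    attributes, keywords = unicode_extension_components(match.group())
    new_extension = rebuild_extension(attributes, keywords)
    return main[:match.start()] + new_extension + main[match.end():] + sep + private


print(dedupe_unicode_extension("en-u-ca-gregory-nu-latn-ca-buddhist"))
# prints "en-u-ca-gregory-nu-latn"
```

The second `ca` keyword is dropped because a keyword for that key was already recorded, which matches the behavior the note attributes to the third step.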
     <emu-clause id="sec-canonicalizelocalelist" aoid="CanonicalizeLocaleList">
       <h1>CanonicalizeLocaleList ( _locales_ )</h1>