Fix percent-encoding for ISO-2022-JP

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points, as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail, this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
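For illustration only, the approach described above can be sketched in Python roughly as follows. This is not part of the change. `percent_encode_after_encoding`, its parameters, and `EXAMPLE_SET` are hypothetical names. Python's `codecs` incremental encoders stand in for the Encoding Standard's encoders, and their tables and error behavior may differ in details. The sketch also assumes that a 0x20 byte is fully handled once "+" has been appended.

```python
import codecs

# Hypothetical percent-encode set used only for this illustration:
# C0 controls, U+007F, space, and a few ASCII punctuation characters.
EXAMPLE_SET = {chr(c) for c in range(0x20)} | {"\x7f", " ", '"', "#", "<", ">"}


def percent_encode_after_encoding(encoding, text, percent_encode_set,
                                  space_as_plus=False):
    """Sketch: percent-encode `text` after encoding it with one encoder."""
    # A single encoder instance is kept for the whole string so that
    # escape-sequence state (as in ISO-2022-JP) carries across code points.
    encoder = codecs.getincrementalencoder(encoding)(errors="strict")
    output = []

    def emit(data):
        for byte in data:
            if space_as_plus and byte == 0x20:
                output.append("+")
                continue  # assumption: the space needs no further handling
            # Non-ASCII bytes are always treated as members of the set.
            if byte >= 0x80 or chr(byte) in percent_encode_set:
                output.append(f"%{byte:02X}")
            else:
                output.append(chr(byte))

    for ch in text:
        try:
            emit(encoder.encode(ch))
        except UnicodeEncodeError:
            # Manual error handling: "&#" + decimal code point + ";",
            # already percent-encoded.
            output.append(f"%26%23{ord(ch)}%3B")

    # Flush the encoder so a stateful encoding can return to its initial state.
    emit(encoder.encode("", final=True))
    return "".join(output)


if __name__ == "__main__":
    print(percent_encode_after_encoding("shift_jis", "\u2261", EXAMPLE_SET))
    print(percent_encode_after_encoding("shift_jis", "\u203d", EXAMPLE_SET))
    print(percent_encode_after_encoding("iso2022_jp", "\u3042\u3044", EXAMPLE_SET))
```

Because the encoder is shared across the loop, the two hiragana in the last call are wrapped in one pair of escape sequences rather than one pair per code point, which is the behavior a per-code-point algorithm cannot provide.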
whatwg · Nov 2, 2020 · 2ce4938
1 parent 04f5bd0
commit 2ce4938
Showing 1 changed file with 42 additions and 53 deletions.
diff --git a/url.bs b/url.bs
@@ -217,81 +217,70 @@ inclusive, and U+007E (~).
 all code points, except the <a>ASCII alphanumeric</a>, U+002A (*), U+002D (-), U+002E (.), and
 U+005F (_).
 
-<p>To <dfn for="code point">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
-<var>encoding</var>, <a for=/>code point</a> <var>codePoint</var>, and a
-<var>percentEncodeSet</var>, run these steps:
+<p>To <dfn for=string>percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
+<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and an
+optional boolean <var>spaceAsPlus</var> (default false), run these steps:
 
 <ol>
- <li><p>Let <var>bytes</var> be the result of <a lt=encode>encoding</a> <var>codePoint</var> using
- <var>encoding</var>.
+ <li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.
 
- <li>
-  <p>If <var>bytes</var> starts with 0x26 (&amp;) 0x23 (#) and ends with 0x3B (;), then:
-
-  <ol>
-   <li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
+ <li><p>Let <var>inputQueue</var> be <var>input</var> converted to an <a for=/>I/O queue</a>.
 
-   <li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".
-
-   <li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".
-
-   <li><p>Return <var>output</var>.
-  </ol>
+ <li><p>Let <var>output</var> be the empty string.
 
-  <p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
+ <li>
+  <p>Let <var>potentialError</var> be 0.
 
- <li><p>Let <var>output</var> be the empty string.</p></li>
+  <p class=note>This needs to be a non-null value to initiate the subsequent while loop.
 
  <li>
-  <p>For each <var>byte</var> of <var>bytes</var>:
+  <p>While <var>potentialError</var> is non-null:
 
   <ol>
-   <li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
-   is <var>byte</var>'s <a for=byte>value</a>.
+   <li><p>Let <var>encodeOutput</var> be an empty <a for=/>I/O queue</a>.
 
-   <li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.
+   <li><p>Set <var>potentialError</var> to the result of running <a>encode or fail</a> with
+   <var>inputQueue</var>, <var>encoder</var>, and <var>encodeOutput</var>.
 
-   <li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
-   <var>isomorph</var> to <var>output</var>.
+   <li>
+    <p>For each <var>byte</var> of <var>encodeOutput</var> converted to a byte sequence:
 
-   <li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
-   <var>output</var>.
-  </ol>
+    <ol>
+     <li><p>If <var>spaceAsPlus</var> is true and <var>byte</var> is 0x20 (SP), then append
+     U+002B (+) to <var>output</var> and <a for=iteration>continue</a>.
 
- <li><p>Return <var>output</var>.
-</ol>
+     <li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
+     is <var>byte</var>'s <a for=byte>value</a>.
 
-<p>To <dfn for="string">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
-<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and a
-boolean <var>spaceAsPlus</var>, run these steps:
+     <li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.
 
-<ol>
- <li><p>Let <var>output</var> be the empty string.</p></li>
+     <li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
+     <var>isomorph</var> to <var>output</var>.
 
- <li>
-  <p>For each <var>codePoint</var> of <var>input</var>:
+     <li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
+     <var>output</var>.
+    </ol>
 
-  <ol>
-   <li><p>If <var>spaceAsPlus</var> is true and <var>codePoint</var> is U+0020, then append
-   U+002B (+) to <var>output</var>.
+   <li>
+    <p>If <var>potentialError</var> is non-null, then append "<code>%26%23</code>", followed by the
+    shortest sequence of <a for=/>ASCII digits</a> representing <var>potentialError</var> in base
+    ten, followed by "<code>%3B</code>", to <var>output</var>.
 
-   <li><p>Otherwise, run <a for="code point">percent-encode after encoding</a> with
-   <var>encoding</var>, <var>codePoint</var>, and <var>percentEncodeSet</var>, and append the result
-   to <var>output</var>.
+    <p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
   </ol>
 
  <li><p>Return <var>output</var>.
 </ol>
 
 <p>To <dfn for="code point" id=utf-8-percent-encode>UTF-8 percent-encode</dfn> a
 <a for=/>code point</a> <var>codePoint</var> using a <var>percentEncodeSet</var>, return the result
-of running <a for="code point">percent-encode after encoding</a> with <a for=/>UTF-8</a>,
-<var>codePoint</var>, and <var>percentEncodeSet</var>.
+of running <a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>,
+<var>codePoint</var> as a <a for=/>string</a>, and <var>percentEncodeSet</var>.
 
 <p>To <dfn export for=string>UTF-8 percent-encode</dfn> a <a for=/>string</a> <var>input</var> using
 a <var>percentEncodeSet</var>, return the result of running
-<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>,
-<var>percentEncodeSet</var>, and false.
+<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>, and
+<var>percentEncodeSet</var>.
 
 <hr>
 
@@ -319,20 +308,20 @@ a <var>percentEncodeSet</var>, return the result of running
    <td>"<code>‽%25%2E</code>"
    <td>0xE2 0x80 0xBD 0x25 0x2E
   <tr>
-   <td rowspan=3><a for="code point">Percent-encode after encoding</a> with <a>Shift_JIS</a>,
+   <td rowspan=3><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>,
    <var>input</var>, and the <a>userinfo percent-encode set</a>
-   <td>U+0020
+   <td>"<code> </code>"
    <td>"<code>%20</code>"
   <tr>
-   <td>U+2261 (≡)
+   <td>"<code>≡</code>"
    <td>"<code>%81%DF</code>"
   <tr>
-   <td>U+203D (‽)
+   <td>"<code>‽</code>"
    <td>"<code>%26%238253%3B</code>"
   <tr>
-   <td><a for="code point">Percent-encode after encoding</a> with <a>ISO-2022-JP</a>,
-   <var>input</var>, and the <a>userinfo percent-encode set</a>
-   <td>U+00A5 (¥)
+   <td><a for=string>Percent-encode after encoding</a> with <a>ISO-2022-JP</a>, <var>input</var>,
+   and the <a>userinfo percent-encode set</a>
+   <td>"<code>¥</code>"
    <td>"<code>%1B(J\%1B(B</code>"
   <tr>
    <td><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>, <var>input</var>, the