From e87711a448d11b33f94a524765278aa7bfed7852 Mon Sep 17 00:00:00 2001
From: Domenic Denicola <d@domenic.me>
Date: Wed, 3 Jun 2020 10:04:27 -0400
Subject: [PATCH] Fix and symmetrize "less than" and "prefix"

We had a definition of (code unit) "prefix" for strings, but "starts
with" for byte sequences, which led to usage errors within the
corresponding "less than" algorithms.

This makes "prefix" the primary operation for both strings and byte
sequences, with dedicated <dfn>s for "starts with" for both. Then, it
fixes "less than" for byte sequences to use "prefix" instead of "starts
with", which makes the algorithm correct and closes #309.

Additionally, this fixes the code unit prefix algorithm to be correct
instead of backward, and updates the variable names from "a" and "b" to
"potentialPrefix" and "input" for clarity.
---
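
Reviewer note (not part of the commit message): a minimal Python sketch
of the corrected byte sequence steps, for checking the argument order by
hand. The function names is_prefix, starts_with, and byte_less_than are
hypothetical and exist only for this illustration; the spec text in the
diff below remains the definition.

```python
def is_prefix(potential_prefix: bytes, input_bytes: bytes) -> bool:
    """Sketch of "potentialPrefix is a prefix of input" for byte sequences."""
    i = 0
    while True:
        # The i-th byte of each sequence, or None once i is past its length.
        prefix_byte = potential_prefix[i] if i < len(potential_prefix) else None
        input_byte = input_bytes[i] if i < len(input_bytes) else None
        # Running out of potentialPrefix first means every byte matched.
        if prefix_byte is None:
            return True
        # A mismatch, or input ending early, means it is not a prefix.
        if prefix_byte != input_byte:
            return False
        i += 1


def starts_with(input_bytes: bytes, potential_prefix: bytes) -> bool:
    """Synonym with the arguments swapped, as in the new <dfn>."""
    return is_prefix(potential_prefix, input_bytes)


def byte_less_than(a: bytes, b: bytes) -> bool:
    """Sketch of "byte less than" using prefix rather than starts with."""
    if is_prefix(b, a):
        return False
    if is_prefix(a, b):
        return True
    # Neither is a prefix of the other, so a differing index must exist.
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    return a[n] < b[n]
```

With these steps a proper prefix sorts first: b"ab" is byte less than
b"abc", and no byte sequence is byte less than itself.
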
 infra.bs | 54 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 23 deletions(-)
diff --git a/infra.bs b/infra.bs
index 49747df..ef9761c 100644
--- a/infra.bs
+++ b/infra.bs
@@ -491,8 +491,9 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.
 
 <hr>
 
-<p>A <a>byte sequence</a> <var>a</var> <dfn export for="byte sequence">starts with</dfn> a
-<a>byte sequence</a> <var>b</var> if the following steps return true:
+<p>A <a>byte sequence</a> <var>potentialPrefix</var> is a
+<dfn export for="byte sequence">prefix</dfn> of a <a>byte sequence</a> <var>input</var> if the
+following steps return true:
 
 <ol>
  <li><p>Let <var>i</var> be 0.
@@ -501,32 +502,37 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.
   <p><a>While</a> true:
 
   <ol>
-   <li><p>Let <var>aByte</var> be the <var>i</var>th <a>byte</a> of <var>a</var> if <var>i</var> is
-   less than <var>a</var>'s <a for="byte sequence">length</a>; otherwise null.
+   <li><p>Let <var>potentialPrefixByte</var> be the <var>i</var>th <a>byte</a> of
+   <var>potentialPrefix</var> if <var>i</var> is less than <var>potentialPrefix</var>'s
+   <a for="byte sequence">length</a>; otherwise null.
 
-   <li><p>Let <var>bByte</var> be the <var>i</var>th <a>byte</a> of <var>b</var> if <var>i</var> is
-   less than <var>b</var>'s <a for="byte sequence">length</a>; otherwise null.
+   <li><p>Let <var>inputByte</var> be the <var>i</var>th <a>byte</a> of <var>input</var> if
+   <var>i</var> is less than <var>input</var>'s <a for="byte sequence">length</a>; otherwise null.
 
-   <li><p>If <var>bByte</var> is null, then return true.
+   <li><p>If <var>potentialPrefixByte</var> is null, then return true.
 
-   <li><p>Return false if <var>aByte</var> is not <var>bByte</var>.
+   <li><p>Return false if <var>potentialPrefixByte</var> is not <var>inputByte</var>.
 
    <li><p>Set <var>i</var> to <var>i</var> + 1.
   </ol>
  </li>
 </ol>
 
+<p>"<var>input</var> <dfn export for="byte sequence">starts with</dfn> <var>potentialPrefix</var>"
+can be used as a synonym for "<var>potentialPrefix</var> is a <a for="byte sequence">prefix</a> of
+<var>input</var>".
+
 <p>A <a>byte sequence</a> <var>a</var> is <dfn export>byte less than</dfn> a <a>byte sequence</a>
 <var>b</var> if the following steps return true:
 
 <ol>
- <li><p>If <var>b</var> <a for="byte sequence">starts with</a> <var>a</var>, then return false.
+ <li><p>If <var>b</var> is a <a for="byte sequence">prefix</a> of <var>a</var>, then return false.
 
- <li><p>If <var>a</var> <a for="byte sequence">starts with</a> <var>b</var>, then return true.
+ <li><p>If <var>a</var> is a <a for="byte sequence">prefix</a> of <var>b</var>, then return true.
 
  <li><p>Let <var>n</var> be the smallest index such that the <var>n</var>th <a>byte</a> of
  <var>a</var> is different from the <var>n</var>th byte of <var>b</var>. (There has to be such an
- index, since neither byte sequence starts with the other.)
+ index, since neither byte sequence is a prefix of the other.)
 
  <li><p>If the <var>n</var>th byte of <var>a</var> is less than the <var>n</var>th byte of
  <var>b</var>, then return true.
@@ -698,9 +704,8 @@ point encoding choices, such as normalization form or the order of combining mar
 are visually or even canonically equivalent according to Unicode might still not be
 <a for=string>identical to</a> each other. [[HTML]] [[UNICODE]]
 
-<p>A <a>string</a> <var>a</var> is a
-<dfn export lt="code unit prefix|starts with">code unit prefix</dfn> of a <a>string</a> <var>b</var>
-if the following steps return true:
+<p>A <a>string</a> <var>potentialPrefix</var> is a <dfn export>code unit prefix</dfn> of a
+<a>string</a> <var>input</var> if the following steps return true:
 
 <ol>
  <li><p>Let <var>i</var> be 0.
@@ -709,15 +714,16 @@ if the following steps return true:
   <p><a>While</a> true:
 
   <ol>
-   <li><p>Let <var>aCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>a</var> if
-   <var>i</var> is less than <var>a</var>'s <a for=string>length</a>; otherwise null.
+   <li><p>Let <var>potentialPrefixCodeUnit</var> be the <var>i</var>th <a>code unit</a> of
+   <var>potentialPrefix</var> if <var>i</var> is less than <var>potentialPrefix</var>'s
+   <a for=string>length</a>; otherwise null.
 
-   <li><p>Let <var>bCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>b</var> if
-   <var>i</var> is less than <var>b</var>'s <a for=string>length</a>; otherwise null.
+   <li><p>Let <var>inputCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>input</var> if
+   <var>i</var> is less than <var>input</var>'s <a for=string>length</a>; otherwise null.
 
-   <li><p>If <var>bCodeUnit</var> is null, then return true.
+   <li><p>If <var>potentialPrefixCodeUnit</var> is null, then return true.
 
-   <li><p>Return false if <var>aCodeUnit</var> is different from <var>bCodeUnit</var>.
+   <li><p>Return false if <var>potentialPrefixCodeUnit</var> is not <var>inputCodeUnit</var>.
 
    <li><p>Set <var>i</var> to <var>i</var> + 1.
   </ol>
@@ -726,12 +732,13 @@ if the following steps return true:
 
 <p>When it is clear from context that <a>code units</a> are in play, e.g., because one of the
 strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~),
-"<var>a</var> starts with <var>b</var>" can be used as a synonym for "<var>b</var> is a
-<a>code unit prefix</a> of <var>a</var>".
+"<var>input</var> <dfn export for="string">starts with</dfn> <var>potentialPrefix</var>" can be used
+as a synonym for "<var>potentialPrefix</var> is a <a>code unit prefix</a> of <var>input</var>".
 
 <p class=example id=code-unit-prefix-example>With unknown values, it is good to be explicit:
 <var ignore>targetString</var> is a <a>code unit prefix</a> of <var>userInput</var>. But with a
-literal, we can use plainer language: <var>userInput</var> starts with "<code>!</code>".
+literal, we can use plainer language: <var>userInput</var> <a for="string">starts with</a>
+"<code>!</code>".
 
 <p>A <a>string</a> <var>a</var> is <dfn export>code unit less than</dfn> a <a>string</a>
 <var>b</var> if the following steps return true:
@@ -1548,6 +1555,7 @@ Aryeh Gregor,
 Chris Rebert,
 Daniel Ehrenberg,
 Dominic Farolino,
+Gabriel Pivovarov,
 Jake Archibald,
 Jeff Hodges,
 Jungkee Song,