Fix and symmetrize "less than" and "prefix"
We had a definition of (code unit) "prefix" for strings, but "starts
with" for byte sequences, which led to usage errors within the
corresponding "less than" algorithms.

This makes "prefix" the primary operation for both strings and byte
sequences, with dedicated <dfn>s for "starts with" for both. Then, it
fixes "less than" for byte sequences to use "prefix" instead of "starts
with", which makes the algorithm correct and closes #309.
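
As a rough illustration of why the direction of the check matters, the post-change "prefix" and "byte less than" steps can be transcribed into Python. This is only a sketch: the names `is_prefix`, `starts_with`, and `byte_less_than` are made up for this example, and the final comparison assumes that the part of the "byte less than" steps not shown in the diff simply returns false.

```python
def is_prefix(potential_prefix: bytes, input_: bytes) -> bool:
    """Step-by-step transcription of "potentialPrefix is a prefix of input"."""
    i = 0
    while True:
        prefix_byte = potential_prefix[i] if i < len(potential_prefix) else None
        input_byte = input_[i] if i < len(input_) else None
        # Running out of potentialPrefix first means every byte matched.
        if prefix_byte is None:
            return True
        # Also covers input running out first (input_byte is None).
        if prefix_byte != input_byte:
            return False
        i += 1


def starts_with(input_: bytes, potential_prefix: bytes) -> bool:
    """Synonym with the arguments in the opposite order."""
    return is_prefix(potential_prefix, input_)


def byte_less_than(a: bytes, b: bytes) -> bool:
    if is_prefix(b, a):
        return False
    if is_prefix(a, b):
        return True
    # Neither is a prefix of the other, so a differing index must exist.
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    # Assumption: the steps elided from the diff return false otherwise.
    return a[n] < b[n]
```

With these definitions, `byte_less_than(b"a", b"ab")` is true and `byte_less_than(b"ab", b"a")` is false. Swapping the arguments of the two prefix checks would reverse both results.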

Additionally, this fixes the code unit prefix algorithm to be correct
instead of backward, and updates the variable names from "a" and "b" to
"potentialPrefix" and "input" for clarity.
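
To make the "backward" code unit prefix fix concrete, here is a minimal Python sketch comparing the corrected steps with the removed ones. The helper names (`code_units`, `is_code_unit_prefix`, `is_code_unit_prefix_backward`) are hypothetical, and strings are converted to UTF-16 code units explicitly because Python strings are sequences of code points.

```python
def code_units(s: str) -> list:
    """Return the UTF-16 code units of s as integers (lone surrogates allowed)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_code_unit_prefix(potential_prefix: str, input_: str) -> bool:
    """Corrected steps: succeed when potentialPrefix runs out."""
    prefix_units = code_units(potential_prefix)
    input_units = code_units(input_)
    i = 0
    while True:
        prefix_unit = prefix_units[i] if i < len(prefix_units) else None
        input_unit = input_units[i] if i < len(input_units) else None
        if prefix_unit is None:
            return True
        if prefix_unit != input_unit:
            return False
        i += 1


def is_code_unit_prefix_backward(a: str, b: str) -> bool:
    """Removed steps for "a is a code unit prefix of b": succeed when b runs out."""
    a_units = code_units(a)
    b_units = code_units(b)
    i = 0
    while True:
        a_unit = a_units[i] if i < len(a_units) else None
        b_unit = b_units[i] if i < len(b_units) else None
        if b_unit is None:
            return True
        if a_unit != b_unit:
            return False
        i += 1
```

Here `is_code_unit_prefix("!", "!important")` is true, while the removed steps accept only the reverse pairing, `is_code_unit_prefix_backward("!important", "!")`.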
domenic committed Jun 3, 2020
1 parent 87650ca commit e87711a
Showing 1 changed file with 31 additions and 23 deletions.
infra.bs
@@ -491,8 +491,9 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.

<hr>

-<p>A <a>byte sequence</a> <var>a</var> <dfn export for="byte sequence">starts with</dfn> a
-<a>byte sequence</a> <var>b</var> if the following steps return true:
+<p>A <a>byte sequence</a> <var>potentialPrefix</var> is a
+<dfn export for="byte sequence">prefix</dfn> of a <a>byte sequence</a> <var>input</var> if the
+following steps return true:

<ol>
<li><p>Let <var>i</var> be 0.
@@ -501,32 +502,37 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.
<p><a>While</a> true:

<ol>
-<li><p>Let <var>aByte</var> be the <var>i</var>th <a>byte</a> of <var>a</var> if <var>i</var> is
-less than <var>a</var>'s <a for="byte sequence">length</a>; otherwise null.
+<li><p>Let <var>potentialPrefixByte</var> be the <var>i</var>th <a>byte</a> of
+<var>potentialPrefix</var> if <var>i</var> is less than <var>potentialPrefix</var>'s
+<a for="byte sequence">length</a>; otherwise null.

-<li><p>Let <var>bByte</var> be the <var>i</var>th <a>byte</a> of <var>b</var> if <var>i</var> is
-less than <var>b</var>'s <a for="byte sequence">length</a>; otherwise null.
+<li><p>Let <var>inputByte</var> be the <var>i</var>th <a>byte</a> of <var>input</var> if
+<var>i</var> is less than <var>input</var>'s <a for="byte sequence">length</a>; otherwise null.

-<li><p>If <var>bByte</var> is null, then return true.
+<li><p>If <var>potentialPrefixByte</var> is null, then return true.

-<li><p>Return false if <var>aByte</var> is not <var>bByte</var>.
+<li><p>Return false if <var>potentialPrefixByte</var> is not <var>inputByte</var>.

<li><p>Set <var>i</var> to <var>i</var> + 1.
</ol>
</li>
</ol>

+<p>"<var>input</var> <dfn export for="byte sequence">starts with</dfn> <var>potentialPrefix</var>"
+can be used as a synonym for "<var>potentialPrefix</var> is a <a for="byte sequence">prefix</a> of
+<var>input</var>".

<p>A <a>byte sequence</a> <var>a</var> is <dfn export>byte less than</dfn> a <a>byte sequence</a>
<var>b</var> if the following steps return true:

<ol>
-<li><p>If <var>b</var> <a for="byte sequence">starts with</a> <var>a</var>, then return false.
+<li><p>If <var>b</var> is a <a for="byte sequence">prefix</a> of <var>a</var>, then return false.

-<li><p>If <var>a</var> <a for="byte sequence">starts with</a> <var>b</var>, then return true.
+<li><p>If <var>a</var> is a <a for="byte sequence">prefix</a> of <var>b</var>, then return true.

<li><p>Let <var>n</var> be the smallest index such that the <var>n</var>th <a>byte</a> of
<var>a</var> is different from the <var>n</var>th byte of <var>b</var>. (There has to be such an
-index, since neither byte sequence starts with the other.)
+index, since neither byte sequence is a prefix of the other.)

<li><p>If the <var>n</var>th byte of <var>a</var> is less than the <var>n</var>th byte of
<var>b</var>, then return true.
@@ -698,9 +704,8 @@ point encoding choices, such as normalization form or the order of combining mar
are visually or even canonically equivalent according to Unicode might still not be
<a for=string>identical to</a> each other. [[HTML]] [[UNICODE]]

-<p>A <a>string</a> <var>a</var> is a
-<dfn export lt="code unit prefix|starts with">code unit prefix</dfn> of a <a>string</a> <var>b</var>
-if the following steps return true:
+<p>A <a>string</a> <var>potentialPrefix</var> is a <dfn export>code unit prefix</dfn> of a
+<a>string</a> <var>input</var> if the following steps return true:

<ol>
<li><p>Let <var>i</var> be 0.
@@ -709,15 +714,16 @@ if the following steps return true:
<p><a>While</a> true:

<ol>
-<li><p>Let <var>aCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>a</var> if
-<var>i</var> is less than <var>a</var>'s <a for=string>length</a>; otherwise null.
+<li><p>Let <var>potentialPrefixCodeUnit</var> be the <var>i</var>th <a>code unit</a> of
+<var>potentialPrefix</var> if <var>i</var> is less than <var>potentialPrefix</var>'s
+<a for=string>length</a>; otherwise null.

-<li><p>Let <var>bCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>b</var> if
-<var>i</var> is less than <var>b</var>'s <a for=string>length</a>; otherwise null.
+<li><p>Let <var>inputCodeUnit</var> be the <var>i</var>th <a>code unit</a> of <var>input</var> if
+<var>i</var> is less than <var>input</var>'s <a for=string>length</a>; otherwise null.

-<li><p>If <var>bCodeUnit</var> is null, then return true.
+<li><p>If <var>potentialPrefixCodeUnit</var> is null, then return true.

-<li><p>Return false if <var>aCodeUnit</var> is different from <var>bCodeUnit</var>.
+<li><p>Return false if <var>potentialPrefixCodeUnit</var> is not <var>inputCodeUnit</var>.

<li><p>Set <var>i</var> to <var>i</var> + 1.
</ol>
@@ -726,12 +732,13 @@ if the following steps return true:

<p>When it is clear from context that <a>code units</a> are in play, e.g., because one of the
strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~),
-"<var>a</var> starts with <var>b</var>" can be used as a synonym for "<var>b</var> is a
-<a>code unit prefix</a> of <var>a</var>".
+"<var>input</var> <dfn export for="string">starts with</dfn> <var>potentialPrefix</var>" can be used
+as a synonym for "<var>potentialPrefix</var> is a <a>code unit prefix</a> of <var>input</var>".

<p class=example id=code-unit-prefix-example>With unknown values, it is good to be explicit:
<var ignore>targetString</var> is a <a>code unit prefix</a> of <var>userInput</var>. But with a
-literal, we can use plainer language: <var>userInput</var> starts with "<code>!</code>".
+literal, we can use plainer language: <var>userInput</var> <a for="string">starts with</a>
+"<code>!</code>".

<p>A <a>string</a> <var>a</var> is <dfn export>code unit less than</dfn> a <a>string</a>
<var>b</var> if the following steps return true:
@@ -1548,6 +1555,7 @@ Aryeh Gregor,
Chris Rebert,
Daniel Ehrenberg,
Dominic Farolino,
+Gabriel Pivovarov,
Jake Archibald,
Jeff Hodges,
Jungkee Song,