From e87711a448d11b33f94a524765278aa7bfed7852 Mon Sep 17 00:00:00 2001
From: Domenic Denicola
Date: Wed, 3 Jun 2020 10:04:27 -0400
Subject: [PATCH] Fix and symmetrize "less than" and "prefix"

We had a definition of (code unit) "prefix" for strings, but "starts
with" for byte sequences, which led to usage errors within the
corresponding "less than" algorithms.

This makes "prefix" the primary operation for both strings and byte
sequences, with dedicated definitions for "starts with" for both. Then,
it fixes "less than" for byte sequences to use "prefix" instead of
"starts with", which makes the algorithm correct and closes #309.

Additionally, this fixes the code unit prefix algorithm to be correct
instead of backward, and updates the variable names from "a" and "b" to
"potentialPrefix" and "input" for clarity.
---
 infra.bs | 54 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/infra.bs b/infra.bs
index 49747df..ef9761c 100644
--- a/infra.bs
+++ b/infra.bs
@@ -491,8 +491,9 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.
-A byte sequence a starts with a
-byte sequence b if the following steps return true:
+A byte sequence potentialPrefix is a
+prefix of a byte sequence input if the
+following steps return true:

  1. Let i be 0.
@@ -501,32 +502,37 @@ contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.

    While true:

-    1. Let aByte be the ith byte of a if i is
-       less than a's length; otherwise null.
+    1. Let potentialPrefixByte be the ith byte of
+       potentialPrefix if i is less than potentialPrefix's
+       length; otherwise null.
-    2. Let bByte be the ith byte of b if i is
-       less than b's length; otherwise null.
+    2. Let inputByte be the ith byte of input if
+       i is less than input's length; otherwise null.
-    3. If bByte is null, then return true.
+    3. If potentialPrefixByte is null, then return true.
-    4. Return false if aByte is not bByte.
+    4. Return false if potentialPrefixByte is not inputByte.

    5. Set i to i + 1.

+"input starts with potentialPrefix"
+can be used as a synonym for "potentialPrefix is a prefix of
+input".
+

A byte sequence a is byte less than a byte sequence b if the following steps return true:

-  1. If b starts with a, then return false.
+  1. If b is a prefix of a, then return false.
-  2. If a starts with b, then return true.
+  2. If a is a prefix of b, then return true.

  3. Let n be the smallest index such that the nth byte of a is different from the nth byte of b. (There has to be such an
-  index, since neither byte sequence starts with the other.)
+  index, since neither byte sequence is a prefix of the other.)

  4. If the nth byte of a is less than the nth byte of b, then return true.
@@ -698,9 +704,8 @@ point encoding choices, such as normalization form or the order of combining mar
are visually or even canonically equivalent according to Unicode might still not be identical to each other. [[HTML]] [[UNICODE]]
-A string a is a
-code unit prefix of a string b
-if the following steps return true:
+A string potentialPrefix is a code unit prefix of a
+string input if the following steps return true:

    1. Let i be 0.
@@ -709,15 +714,16 @@ if the following steps return true:

      While true:

-      1. Let aCodeUnit be the ith code unit of a if
-         i is less than a's length; otherwise null.
+      1. Let potentialPrefixCodeUnit be the ith code unit of
+         potentialPrefix if i is less than potentialPrefix's
+         length; otherwise null.
-      2. Let bCodeUnit be the ith code unit of b if
-         i is less than b's length; otherwise null.
+      2. Let inputCodeUnit be the ith code unit of input if
+         i is less than input's length; otherwise null.
-      3. If bCodeUnit is null, then return true.
+      3. If potentialPrefixCodeUnit is null, then return true.
-      4. Return false if aCodeUnit is different from bCodeUnit.
+      4. Return false if potentialPrefixCodeUnit is not inputCodeUnit.

      5. Set i to i + 1.

      @@ -726,12 +732,13 @@ if the following steps return true:

When it is clear from context that code units are in play, e.g., because one of the strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~),
-"a starts with b" can be used as a synonym for "b is a
-code unit prefix of a".
+"input starts with potentialPrefix" can be used
+as a synonym for "potentialPrefix is a code unit prefix of input".

With unknown values, it is good to be explicit: targetString is a code unit prefix of userInput. But with a
-literal, we can use plainer language: userInput starts with "!".
+literal, we can use plainer language: userInput starts with
+"!".

A string a is code unit less than a string b if the following steps return true:
@@ -1548,6 +1555,7 @@ Aryeh Gregor,
Chris Rebert,
Daniel Ehrenberg,
Dominic Farolino,
+Gabriel Pivovarov,
Jake Archibald,
Jeff Hodges,
Jungkee Song,
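For readers checking the corrected byte sequence logic, here is a minimal Python sketch of the steps as they read after this patch. It is not part of infra.bs or of any library: the function names (`is_prefix`, `starts_with`, `byte_less_than`, `old_byte_less_than`) are hypothetical, Python `bytes` stands in for a byte sequence, and `None` stands in for null. `old_byte_less_than` transcribes the removed "less than" steps to show why they were incorrect.

```python
from typing import Optional


def is_prefix(potential_prefix: bytes, input_: bytes) -> bool:
    """Hypothetical: True if potential_prefix is a prefix of input_."""
    i = 0
    while True:
        # The ith byte of each sequence, or None once i runs past its length.
        potential_prefix_byte: Optional[int] = (
            potential_prefix[i] if i < len(potential_prefix) else None
        )
        input_byte: Optional[int] = input_[i] if i < len(input_) else None
        # Every byte of potential_prefix matched, so it is a prefix.
        if potential_prefix_byte is None:
            return True
        # A mismatch, or input_ ran out first (input_byte is None).
        if potential_prefix_byte != input_byte:
            return False
        i += 1


def starts_with(input_: bytes, potential_prefix: bytes) -> bool:
    """Hypothetical: "input starts with potentialPrefix" synonym."""
    return is_prefix(potential_prefix, input_)


def byte_less_than(a: bytes, b: bytes) -> bool:
    """Hypothetical: "a is byte less than b", following the new steps."""
    if is_prefix(b, a):
        return False
    if is_prefix(a, b):
        return True
    # Neither is a prefix of the other, so a differing index must exist.
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    return a[n] < b[n]


def old_byte_less_than(a: bytes, b: bytes) -> bool:
    """Hypothetical transcription of the removed steps (buggy on purpose)."""
    if starts_with(b, a):  # i.e., a is a prefix of b
        return False
    if starts_with(a, b):  # i.e., b is a prefix of a
        return True
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    return a[n] < b[n]
```

With this sketch, `byte_less_than(b"ab", b"abc")` is True, as it should be, while `old_byte_less_than(b"ab", b"abc")` is False: under the old wording "b starts with a" holds whenever a is a prefix of b, so the removed step 1 returned false for exactly the inputs that should return true. The corrected function agrees with Python's built-in lexicographic `bytes` comparison.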
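A similar hypothetical Python sketch covers the code unit prefix algorithm. Python `str` values are sequences of code points rather than UTF-16 code units, so the assumed helper `code_units` first re-encodes a string as UTF-16 (keeping lone surrogates via the `surrogatepass` error handler). `old_code_unit_prefix` transcribes the removed steps, which returned true as soon as the second string ran out and therefore tested the relationship backward. None of these names come from the spec.

```python
from typing import List, Optional


def code_units(s: str) -> List[int]:
    """Hypothetical helper: a Python str as a list of UTF-16 code units."""
    # "surrogatepass" keeps lone surrogates, which strings may contain.
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[j:j + 2], "little") for j in range(0, len(data), 2)]


def is_code_unit_prefix(potential_prefix: str, input_: str) -> bool:
    """Hypothetical: True if potential_prefix is a code unit prefix of input_."""
    prefix_units = code_units(potential_prefix)
    input_units = code_units(input_)
    i = 0
    while True:
        # The ith code unit of each string, or None past its length.
        potential_prefix_code_unit: Optional[int] = (
            prefix_units[i] if i < len(prefix_units) else None
        )
        input_code_unit: Optional[int] = (
            input_units[i] if i < len(input_units) else None
        )
        if potential_prefix_code_unit is None:
            return True
        if potential_prefix_code_unit != input_code_unit:
            return False
        i += 1


def old_code_unit_prefix(a: str, b: str) -> bool:
    """Hypothetical transcription of the removed (backward) steps."""
    a_units, b_units = code_units(a), code_units(b)
    i = 0
    while True:
        a_code_unit = a_units[i] if i < len(a_units) else None
        b_code_unit = b_units[i] if i < len(b_units) else None
        # Stops when b runs out: this really checks "b is a prefix of a".
        if b_code_unit is None:
            return True
        if a_code_unit != b_code_unit:
            return False
        i += 1
```

Here `is_code_unit_prefix("ab", "abc")` is True, whereas `old_code_unit_prefix("abc", "ab")` is also True, which is the backward answer this patch fixes. Comparing code units also means a lone high surrogate `"\ud83d"` is a code unit prefix of `"\U0001F600"` (U+1F600, encoded as 0xD83D 0xDE00), even though Python's own `str.startswith`, which compares code points, reports False for that pair.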