diff --git a/index.html b/index.html
index 9a496ab..92ae9cb 100644
--- a/index.html
+++ b/index.html
@@ -1385,7 +1385,7 @@

String Matching of Syntactic Content in Document Formats and Protocols

The Matching Algorithm

This section defines the algorithm for matching strings in a formal language or syntax. Specifications need to define the options called out below. Best-practice recommendations appear in the sub-sections that follow.

    -
  1. Convert the strings to be compared to a sequence of Unicode code points. This might entail transcoding from a legacy character encoding.
    +
  1. Convert the strings to be compared to a sequence of Unicode code points. This might entail transcoding from a legacy character encoding.
  2. Expand all character escapes and includes.
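
The two steps visible in this hunk can be sketched as follows. This is a minimal illustration, not the specification's own implementation: the function names are hypothetical, and the escape syntax is assumed to be XML-style numeric character references (`&#xE9;`, `&#233;`), since the actual escape mechanism depends on the format being matched. Includes are not handled here.

```python
import re

# Assumption: escapes are XML-style numeric character references.
# A real format would substitute its own escape syntax here.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def to_code_points(data: bytes, encoding: str) -> str:
    """Step 1: transcode bytes to a sequence of Unicode code points.

    A Python str is already a sequence of code points, so decoding
    from the (possibly legacy) encoding is all that is needed.
    """
    return data.decode(encoding)


def expand_escapes(text: str) -> str:
    """Step 2: replace each numeric character reference with its character."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF:
            # Not a valid code point: leave the escape text untouched.
            return match.group(0)
        return chr(code_point)

    return _NCR.sub(replace, text)


def prepare(data: bytes, encoding: str) -> str:
    """Apply both steps so that later comparison sees plain code points."""
    return expand_escapes(to_code_points(data, encoding))
```

With this sketch, a Latin-1 byte string containing a literal é and an ASCII byte string containing the escape `&#xE9;` both reduce to the same code point sequence, which is the precondition for the later matching steps.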

@@ -1590,7 +1590,7 @@

    Additional Match Tailoring

    [S] Specifications MUST clearly define any additional tailoring done as part of the matching process.

    -

    Some specifications might wish to include additional tailoring to assist with matching in a given vocabulary. Examples of this might include removing additional textual differences described in Section 2, mapping together or removing characters that are part of the syntax, or performing a whitespace trim.

    +

    Some specifications might wish to include additional tailoring to assist with matching in a given vocabulary. Examples of this might include removing additional textual differences described in Section 2, mapping together or removing characters that are part of the syntax, or performing a whitespace trim.

    Any additional tailoring needs to avoid interfering with the way that different languages are represented in Unicode. For example, a process that attempts to remove accents from letters by decomposing the text and then removing all of the combining characters will break languages that rely on combining marks, such as the Devanagari text in Example 2. (Such a process would also fail to remove all of the potential accents and would probably harm the meaning and representation of the text.)
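
The problem described above can be reproduced with a short sketch. The function name is hypothetical, and the sample words are illustrative choices rather than the text of Example 2; the approach shown (NFD decomposition followed by dropping every character in a Unicode mark category) is the naive process the paragraph warns against, not a recommended tailoring.

```python
import unicodedata


def strip_marks_naively(text: str) -> str:
    """Decompose the text, then drop every combining mark.

    This is the flawed "accent removal" approach: it treats all
    characters in the Mn, Mc, and Me categories as disposable.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


# Works as intended for a Latin-script word with precomposed accents.
latin = strip_marks_naively("r\u00e9sum\u00e9")          # "résumé"

# Destroys a Devanagari word: vowel signs and the virama are marks,
# so only the bare consonants survive and the word is no longer the same.
hindi = strip_marks_naively("\u0939\u093f\u0928\u094d\u0926\u0940")  # "हिन्दी"

# Misses "accents" that have no canonical decomposition, such as ø.
danish = strip_marks_naively("S\u00f8ren")               # "Søren"
```

Here `latin` becomes `resume`, `hindi` is reduced to three bare consonant letters, and `danish` is returned unchanged, which illustrates both failure modes: damage to scripts that depend on combining marks, and incomplete removal of accents.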