Rewrite "UTF-8 sequence(s)" to "UTF-8 code point(s)"
emlun committed Feb 28, 2024
1 parent f337fce commit 8f89a8e
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions index.bs
@@ -4930,7 +4930,7 @@ Such truncation SHOULD also respect UTF-8 code point boundaries or [=grapheme cl
The resulting truncated value MAY be shorter than the chosen size limit
but MUST NOT be shorter than the longest prefix substring that satisfies the size limit and ends on a [=grapheme cluster=] boundary.

-For example, in <a href="#fig-stringTruncation">figure <span class="figure-num-following"></span></a> the string is 65 bytes long. If truncating to 64 bytes then the final 0x88 byte must be removed purely because of space reasons. Since that leaves a partial UTF-8 sequence the remainder of that sequence may also be removed. Since that leaves a partial [=grapheme cluster=] an authenticator may remove the remainder of that cluster.
+For example, in <a href="#fig-stringTruncation">figure <span class="figure-num-following"></span></a> the string is 65 bytes long. If truncating to 64 bytes then the final 0x88 byte must be removed purely because of space reasons. Since that leaves a partial UTF-8 code point the remainder of that code point may also be removed. Since that leaves a partial [=grapheme cluster=] an authenticator may remove the remainder of that cluster.

<figure id="fig-stringTruncation">
<img src="images/string-truncation.svg"></img>
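The byte-then-code-point truncation described in the hunk above can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the WebAuthn specification or from any user agent: the function name `truncate_utf8` is hypothetical, the input is assumed to already be valid UTF-8, and the 65-byte sample string is made up so that, like the figure's string, it ends in 0x88 (it is not the figure's actual content). The sketch stops at code point boundaries only; truncating at [=grapheme cluster=] boundaries additionally requires Unicode text segmentation (UAX #29), which is not shown.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Return the longest prefix of `data` that is at most `limit` bytes long
    and ends on a UTF-8 code point boundary.

    Hypothetical helper; assumes `data` is valid UTF-8.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(data) <= limit:
        return data
    end = limit
    # Bytes of the form 0b10xxxxxx are continuation bytes, so a code point
    # cannot start at them. Back up until data[end] begins a code point,
    # which drops the partial code point left by the byte-level cut.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# Hypothetical 65-byte input: 61 ASCII bytes followed by U+1F388,
# which UTF-8 encodes as F0 9F 8E 88, so the final byte is 0x88.
s = ("a" * 61 + "\U0001F388").encode("utf-8")
t = truncate_utf8(s, 64)
print(len(t))  # 61: the 0x88 byte and the rest of its code point are removed
print(t.decode("utf-8") == "a" * 61)  # True: the result is valid UTF-8
```

Since every grapheme cluster boundary is also a code point boundary, this result is never shorter than the longest prefix that ends on a grapheme cluster boundary, so it stays within the lower bound stated in the hunk's context lines.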
@@ -4939,7 +4939,7 @@ For example, in <a href="#fig-stringTruncation">figure <span class="figure-num-f

[=Conforming User Agents=] are responsible for ensuring that the authenticator behavior observed by [=[RPS]=] conforms to this specification with respect to string handling. For example, if an authenticator is known to behave incorrectly when asked to store large strings, the user agent SHOULD perform the truncation for it in order to maintain the model from the point of view of the [=[RP]=]. User-agents that do this SHOULD truncate at [=grapheme cluster=] boundaries.

-Truncation based on UTF-8 sequences alone may cause a [=grapheme cluster=] to be truncated. This could make the grapheme cluster render as a different glyph, potentially changing the meaning of the string, instead of removing the glyph entirely.
+Truncation based on UTF-8 code points alone may cause a [=grapheme cluster=] to be truncated. This could make the grapheme cluster render as a different glyph, potentially changing the meaning of the string, instead of removing the glyph entirely.

In addition to that, truncating on byte boundaries alone causes a known issue that user agents should be aware of: if the authenticator is using [[!FIDO-CTAP]] then future messages from the authenticator may contain invalid CBOR since the value is typed as a CBOR string and thus is required to be valid UTF-8. User agents are tasked with handling this to avoid burdening authenticators with understanding character encodings and Unicode character properties. Thus, when dealing with [=authenticators=], user agents SHOULD:
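The list that follows `SHOULD:` in `index.bs` lies outside this diff's context and is not reproduced here. As a separate illustration of the underlying problem only: if a user agent receives a stored value whose bytes were cut mid-code point, Python's standard incremental UTF-8 decoder can drop just that trailing fragment while still rejecting invalid bytes anywhere else. The function name `decode_truncated_utf8` is hypothetical, and dropping the fragment is one possible policy assumed here, not necessarily the behavior the specification prescribes.

```python
import codecs


def decode_truncated_utf8(data: bytes) -> str:
    """Decode `data` as UTF-8, discarding an incomplete code point at the very
    end (as left by byte-boundary truncation).

    Hypothetical helper; invalid bytes anywhere else still raise
    UnicodeDecodeError because the decoder uses errors="strict".
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    # With final=False, a trailing incomplete multi-byte sequence is held in
    # the decoder's internal buffer instead of being reported as an error.
    # The buffer is never flushed, so that fragment is simply discarded.
    return decoder.decode(data, final=False)


# A hypothetical value cut to 64 bytes on a byte boundary, leaving
# F0 9F 8E (the first three bytes of U+1F388) at the end.
cut = ("a" * 61 + "\U0001F388").encode("utf-8")[:64]
print(decode_truncated_utf8(cut) == "a" * 61)  # True
```

Keeping `errors="strict"` means only a truncated tail is tolerated; corruption in the middle of the value is still surfaced as an error rather than silently dropped.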
