Specify language and direction encoding. #1530

agl · 2020-12-02T22:14:11Z

ve7jtb

Looks good to me. Authenticators with internal displays are going to have some work to do, however.

w3c/webauthn#1530 specifies a method for encoding language and direction information in strings that will be stored on authenticators. Since we may start to see this in credentials in the wild, this change adds examples to the UI test so that we can see what they'll look like. (On Linux, at least, they don't render as garbage or crash. However, the zh-Hant example doesn't display, possibly because I'm missing suitable fonts.)

Change-Id: I9e778a08df85b8ae34032341c6bcd84291ff210c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2576795
Commit-Queue: Martin Kreichgauer <martinkr@google.com>
Auto-Submit: Adam Langley <agl@chromium.org>
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
Cr-Commit-Position: refs/heads/master@{#834370}

equalsJeffH

LGTM, thx @agl!

nadalin · 2020-12-09T20:10:24Z

@aphillips Please review

equalsJeffH · 2020-12-09T20:11:36Z

On the 2020-12-09 call: @nadalin to chase up @aphillips for review.

aphillips · 2020-12-09T20:49:30Z

index.bs


-If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further.
+Each arbitrary string in the API will have some accommodation for the potentially limited resources available to an [=authenticator=]. If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further.


Such truncation MAY also respect

I would suggest that this should be SHOULD, since arbitrary byte truncation is harmful and UTF-8 sequence boundary truncation is easily accomplished using a bitmask. SHOULD still permits truly minimal authenticators to truncate on byte boundaries. Because respecting grapheme cluster boundaries is more complex, the resulting requirement should probably be:

Such truncation SHOULD also respect UTF-8 sequence boundaries and MAY also choose to respect grapheme cluster boundaries...

Given the language/direction encoding below, should the first step here be to remove the language/direction metadata? (This is one reason I suggested suffixing at least the language metadata.)

Changed to SHOULD, although I don't expect any impact: there are millions of fixed-function devices already out in the world and I understand that even small features like this are troublesome. But perhaps some would be more likely to do the right thing.
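The bitmask approach mentioned in this thread can be sketched as follows. This is a minimal Python illustration, not text from the spec: `truncate_utf8` is a hypothetical helper, and it assumes the authenticator stores raw UTF-8 bytes under a byte budget `max_len`. Every UTF-8 continuation byte has the bit pattern `10xxxxxx`, so `(b & 0xC0) == 0x80` identifies one; moving the cut point back past continuation bytes guarantees that no code point is split. The result can therefore be up to three bytes shorter than `max_len`. Respecting grapheme cluster boundaries would additionally need UTR29 segmentation data and is not shown.

```python
def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Truncate UTF-8 bytes to at most max_len bytes without splitting a code point.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    if len(data) <= max_len:
        return data  # already fits; nothing to truncate
    end = max_len
    # data[end] is the first byte that would be dropped. While it is a
    # continuation byte (10xxxxxx), the code point it belongs to straddles
    # the cut, so move the cut back towards that code point's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# "日本語" is three 3-byte code points; a 7-byte budget keeps two of them.
print(truncate_utf8("日本語".encode("utf-8"), 7).decode("utf-8"))  # 日本
```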

aphillips · 2020-12-09T20:59:10Z

index.bs

@@ -4007,6 +4019,23 @@ In addition to that, truncating on byte boundaries alone causes a known issue th
 1. Ensure that any strings sent to authenticators are validly encoded.
 1. Handle the case where strings have been truncated resulting in an invalid encoding. For example, any partial code point at the end may be dropped or replaced with [U+FFFD](http://unicode.org/cldr/utility/character.jsp?a=FFFD).

+### Language and Direction Encoding ### {#sctn-strings-langdir}
+
+In order to be correctly displayed in context, the language and base direction of a string [may be required](https://www.w3.org/TR/string-meta/#why-is-this-important). Strings in this API may have to be written to fixed-function [=authenticators=] and then later read back and displayed on a different platform. Thus language and direction metadata is encoded in the string itself to ensure that it is transported atomically.
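The context lines at the top of this hunk ask clients to handle strings whose truncation left an invalid encoding, for example by dropping a trailing partial code point or replacing it with U+FFFD. A minimal Python sketch of both options, using the standard library's incremental UTF-8 decoder; `decode_truncated_utf8` is a hypothetical name, not part of any WebAuthn API.

```python
import codecs


def decode_truncated_utf8(data: bytes, replace_partial: bool = False) -> str:
    """Decode UTF-8 that may have been truncated in the middle of a code point.

    Hypothetical helper for illustration. By default a trailing partial code
    point is dropped; with replace_partial=True it becomes U+FFFD instead.
    Invalid bytes elsewhere in the input are always replaced with U+FFFD.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    # final=False makes the decoder hold back an incomplete sequence at the
    # end instead of treating it as an error.
    text = decoder.decode(data, final=False)
    if replace_partial:
        # Flushing with final=True turns any held-back bytes into U+FFFD.
        text += decoder.decode(b"", final=True)
    return text


truncated = "héllo".encode("utf-8")[:2]  # b"h\xc3": the "é" was cut in half
print(decode_truncated_utf8(truncated))                        # h
print(decode_truncated_utf8(truncated, replace_partial=True))  # h followed by U+FFFD
```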


I'm curious why you chose prefixing rather than suffixing. With prefixing, one runs the risk of losing content due to truncation; wouldn't suffixing be less harmful in that respect? I'll admit that having the RLM/LRM before the text is more helpful, though.

The other thing that should be considered (and our document string-meta calls this out) is that applications should avoid adding additional bidi controls if the required controls already exist on the string.

I assumed that text libraries would require such markers to come first in the string but, if not, suffixing behaves better under truncation, as you note. However, truncation becomes more complex because a language tag could be truncated into a different, still valid, language tag: for example, “ar-SA” into “ar”. That might be even worse than not knowing the language.

Thus, in order that the consumer can know whether the language has been truncated or not I have made either a directionality tag or U+E007F mandatory since it then acts as a terminator for the language tag. I've noted that a directionality tag should only be used if necessary to produce the correct result.
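The scheme described here (language suffixed, followed by a mandatory terminator so a truncated tag can be detected) can be illustrated with a small Python sketch. The concrete code points are assumptions beyond what this thread states: it spells the language tag with Unicode tag characters (U+E0001 LANGUAGE TAG followed by each ASCII character of the tag shifted up by 0xE0000), which fits the U+E007F CANCEL TAG terminator mentioned above, and uses U+200E (LRM) / U+200F (RLM) as the directionality marks. `encode_lang_dir` and `decode_lang_dir` are hypothetical names.

```python
from __future__ import annotations

# Assumed encoding: Unicode tag characters for the language tag, then one
# mandatory terminator code point (direction mark or CANCEL TAG).
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
LRM = "\u200E"               # U+200E LEFT-TO-RIGHT MARK
RLM = "\u200F"               # U+200F RIGHT-TO-LEFT MARK
TAG_OFFSET = 0xE0000         # ASCII 0x20-0x7E maps to U+E0020-U+E007E
TERMINATORS = {CANCEL_TAG: None, LRM: "ltr", RLM: "rtl"}


def encode_lang_dir(text: str, lang: str, direction: str | None = None) -> str:
    """Suffix `text` with `lang` and a terminator.

    Pass direction ("ltr" or "rtl") only when a direction mark is needed to
    produce the correct result; otherwise CANCEL TAG terminates the tag.
    """
    if not lang or not all(0x21 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    marks = {None: CANCEL_TAG, "ltr": LRM, "rtl": RLM}
    if direction not in marks:
        raise ValueError("direction must be None, 'ltr' or 'rtl'")
    tag = LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    return text + tag + marks[direction]


def decode_lang_dir(s: str) -> tuple[str, str | None, str | None, bool]:
    """Split `s` into (text, lang, direction, truncated)."""
    start = s.rfind(LANGUAGE_TAG)
    if start == -1:
        return s, None, None, False  # no language metadata present
    text, suffix = s[:start], s[start + 1:]
    last = suffix[-1:]
    terminated = last in TERMINATORS
    tag_chars = suffix[:-1] if terminated else suffix
    if not all(0xE0020 <= ord(c) <= 0xE007E for c in tag_chars):
        return s, None, None, False  # not a well-formed suffix; leave untouched
    if not terminated:
        # Terminator missing: the suffix was truncated and the tag may have
        # been shortened into another valid one (e.g. "ar-SA" -> "ar").
        return text, None, None, True
    lang = "".join(chr(ord(c) - TAG_OFFSET) for c in tag_chars) or None
    return text, lang, TERMINATORS[last], False


encoded = encode_lang_dir("hello", "en-US")
print(decode_lang_dir(encoded))       # ('hello', 'en-US', None, False)
print(decode_lang_dir(encoded[:-4]))  # ('hello', None, None, True)
```

In the second call, truncation has cut "en-US" down to "en" and removed the terminator. The sketch reports the language as unknown rather than trusting a plausible but wrong tag, which is the outcome the mandatory terminator is meant to enable.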

@aphillips Are you OK with Adam's response?

@aphillips This is holding us back from submitting for CR, so we would like to get this closed. Please review.

equalsJeffH

@agl has updated this PR per @aphillips' further comments, i.e., switched to encoding the language and directionality as a suffix, and indicated that a directionality indicator should only be used when necessary to produce the correct result.

Since we've addressed his outstanding comments, I suggest that we go ahead and merge this if @aphillips has not reviewed it by end of day Monday.

nadalin · 2020-12-14T00:54:22Z

@equalsJeffH OK. If anyone objects to this approach, please speak up before EOB on 12/14/2020.

nadalin · 2020-12-15T01:45:36Z

@equalsJeffH Please merge

Specify language and direction encoding.

7837936

Fixes w3c#1526.

agl requested a review from aphillips December 2, 2020 22:14

ve7jtb approved these changes Dec 2, 2020

equalsJeffH approved these changes Dec 9, 2020

equalsJeffH assigned agl Dec 9, 2020

aphillips reviewed Dec 9, 2020

Address comments

48f64df

equalsJeffH approved these changes Dec 14, 2020

nadalin approved these changes Dec 14, 2020

agl merged commit f180c7a into w3c:master Dec 15, 2020

agl deleted the i18n3 branch December 15, 2020 17:25

WebAuthnBot pushed a commit that referenced this pull request Dec 15, 2020

Built by Travis-CI: f180c7a Merge pull request #1530 from agl/i18n3

cc256c1

WebAuthnBot pushed a commit that referenced this pull request Dec 15, 2020

Built by Travis-CI: f180c7a Merge pull request #1530 from agl/i18n3

0fc63b2

agl commented Dec 2, 2020 (edited by pr-preview bot)

ve7jtb left a comment

equalsJeffH left a comment

nadalin commented Dec 9, 2020

equalsJeffH commented Dec 9, 2020

aphillips Dec 9, 2020

aphillips Dec 9, 2020

agl Dec 10, 2020

aphillips Dec 9, 2020

agl Dec 10, 2020

nadalin Dec 11, 2020

nadalin Dec 13, 2020

equalsJeffH left a comment (edited)

nadalin commented Dec 14, 2020

nadalin commented Dec 15, 2020

