-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify language and direction encoding. #1530
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Authenticators with internal displays are going to have some work to do, however.
w3c/webauthn#1530 specifies a method for encoding language and direction information in strings that will be stored on authenticators. Since we may start to see this in credentials in the wild, this change adds examples to the UI test so that we see what they'll look like. (On Linux, at least, they at least don't render as garbage or crash. However the zh-Hant example doesn't display, possibly because I'm missing suitable fonts.) Change-Id: I9e778a08df85b8ae34032341c6bcd84291ff210c Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2576795 Commit-Queue: Martin Kreichgauer <martinkr@google.com> Auto-Submit: Adam Langley <agl@chromium.org> Reviewed-by: Martin Kreichgauer <martinkr@google.com> Cr-Commit-Position: refs/heads/master@{#834370}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thx @agl!
@aphillips Please review |
on 2020-12-09 call: @nadalin to chase-up @aphillips for review. |
index.bs
Outdated
|
||
If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further. | ||
Each arbitrary string in the API will have some accommodation for the potentially limited resources available to an [=authenticator=]. If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such truncation MAY also respect
I would suggest that this should be SHOULD
, since arbitrary byte truncation is harmful and since UTF-8 sequence boundary truncation is easily accomplished using a bitmask. SHOULD
still permits truly minimal authenticators to use byte boundaries. Because grapheme cluster boundaries is more complex, probably the resulting requirement should be:
Such truncation SHOULD also respect UTF-8 sequence boundaries and MAY also choose to respect
grapheme cluster
boundaries...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the language/direction encoding below, should the first step here be: remove the language/direction metadata? (This is one reason I suggested suffixing at least the language metadata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to SHOULD, although I don't expect any impact: there are millions of fixed-function devices already out in the world and I understand that even small features like this are troublesome. But perhaps some would be more likely to do the right thing.
@@ -4007,6 +4019,23 @@ In addition to that, truncating on byte boundaries alone causes a known issue th | |||
1. Ensure that any strings sent to authenticators are validly encoded. | |||
1. Handle the case where strings have been truncated resulting in an invalid encoding. For example, any partial code point at the end may be dropped or replaced with [U+FFFD](http://unicode.org/cldr/utility/character.jsp?a=FFFD). | |||
|
|||
### Language and Direction Encoding ### {#sctn-strings-langdir} | |||
|
|||
In order to be correctly displayed in context, the language and base direction of a string [may be required](https://www.w3.org/TR/string-meta/#why-is-this-important). Strings in this API may have to be written to fixed-function [=authenticators=] and then later read back and displayed on a different platform. Thus language and direction metadata is encoded in the string itself to ensure that it is transported atomically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why you chose prefixing versus suffixing. With prefixing, one runs the risk of losing content due to truncation. Suffixing would be least harmful this way? I'll admit that having the RLM/LRM before the text is more helpful, though.
The other thing that should be considered (and our document string-meta calls this out) is that applications should avoid adding additional bidi controls if the required controls already exist on the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that text libraries would require that such things came first in the string but, if not, suffixing acts better under truncation as you note. However, truncation becomes more complex because a language tag could be truncated into another valid language. For example, “ar-SA” into “ar”. That might be even worse than not knowing the language.
Thus, in order that the consumer can know whether the language has been truncated or not I have made either a directionality tag or U+E007F mandatory since it then acts as a terminator for the language tag. I've noted that a directionality tag should only be used if necessary to produce the correct result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aphillips Are you ok with Adam's response ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aphillips This is holding us from submitting for CR, so we would like to get this closed, please review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agl has updated this PR per @aphillips' futher comments, i.e., switched to encoding the language and directionality as suffixed information, and indicated that directionality indicator should only be used when necessary to produce the correct result.
since we've addressed his outstanding comments, I think, as a suggestion, we can go ahead and merge this if @aphillips has not reviewed it by the end of day Monday.
@equalsJeffH OK, if any one objects with this approach please speak up before EOB on 12/14/2020 |
@equalsJeffH Please merge |
Fixes #1526.
Preview | Diff