Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify language and direction encoding. #1530

Merged
merged 2 commits into from
Dec 15, 2020
Merged

Specify language and direction encoding. #1530

merged 2 commits into from
Dec 15, 2020

Conversation

agl
Copy link
Contributor

@agl agl commented Dec 2, 2020

Fixes #1526.


Preview | Diff

@agl agl requested a review from aphillips December 2, 2020 22:14
Copy link
Contributor

@ve7jtb ve7jtb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Authenticators with internal displays are going to have some work to do, however.

pull bot pushed a commit to Yannic/chromium that referenced this pull request Dec 8, 2020
w3c/webauthn#1530 specifies a method for
encoding language and direction information in strings that will be
stored on authenticators. Since we may start to see this in credentials
in the wild, this change adds examples to the UI test so that we see
what they'll look like.

(On Linux, at least, they at least don't render as garbage or crash.
However the zh-Hant example doesn't display, possibly because I'm
missing suitable fonts.)

Change-Id: I9e778a08df85b8ae34032341c6bcd84291ff210c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2576795
Commit-Queue: Martin Kreichgauer <martinkr@google.com>
Auto-Submit: Adam Langley <agl@chromium.org>
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
Cr-Commit-Position: refs/heads/master@{#834370}
Copy link
Contributor

@equalsJeffH equalsJeffH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx @agl!

@nadalin
Copy link
Contributor

nadalin commented Dec 9, 2020

@aphillips Please review

@equalsJeffH
Copy link
Contributor

on 2020-12-09 call: @nadalin to chase-up @aphillips for review.

index.bs Outdated

If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further.
Each arbitrary string in the API will have some accommodation for the potentially limited resources available to an [=authenticator=]. If string value truncation is the chosen accommodation then authenticators MAY truncate in order to make the string fit within a length equal or greater than the specified minimum supported length. Such truncation MAY also respect UTF-8 sequence boundaries or [=grapheme cluster=] boundaries [[UTR29]]. This defines the maximum truncation permitted and authenticators MUST NOT truncate further.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such truncation MAY also respect

I would suggest that this should be SHOULD, since arbitrary byte truncation is harmful and since UTF-8 sequence boundary truncation is easily accomplished using a bitmask. SHOULD still permits truly minimal authenticators to use byte boundaries. Because grapheme cluster boundaries is more complex, probably the resulting requirement should be:

Such truncation SHOULD also respect UTF-8 sequence boundaries and MAY also choose to respect grapheme cluster boundaries...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the language/direction encoding below, should the first step here be: remove the language/direction metadata? (This is one reason I suggested suffixing at least the language metadata)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to SHOULD, although I don't expect any impact: there are millions of fixed-function devices already out in the world and I understand that even small features like this are troublesome. But perhaps some would be more likely to do the right thing.

@@ -4007,6 +4019,23 @@ In addition to that, truncating on byte boundaries alone causes a known issue th
1. Ensure that any strings sent to authenticators are validly encoded.
1. Handle the case where strings have been truncated resulting in an invalid encoding. For example, any partial code point at the end may be dropped or replaced with [U+FFFD](http://unicode.org/cldr/utility/character.jsp?a=FFFD).

### Language and Direction Encoding ### {#sctn-strings-langdir}

In order to be correctly displayed in context, the language and base direction of a string [may be required](https://www.w3.org/TR/string-meta/#why-is-this-important). Strings in this API may have to be written to fixed-function [=authenticators=] and then later read back and displayed on a different platform. Thus language and direction metadata is encoded in the string itself to ensure that it is transported atomically.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why you chose prefixing versus suffixing. With prefixing, one runs the risk of losing content due to truncation. Suffixing would be least harmful this way? I'll admit that having the RLM/LRM before the text is more helpful, though.

The other thing that should be considered (and our document string-meta calls this out) is that applications should avoid adding additional bidi controls if the required controls already exist on the string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that text libraries would require that such things came first in the string but, if not, suffixing acts better under truncation as you note. However, truncation becomes more complex because a language tag could be truncated into another valid language. For example, “ar-SA” into “ar”. That might be even worse than not knowing the language.

Thus, in order that the consumer can know whether the language has been truncated or not I have made either a directionality tag or U+E007F mandatory since it then acts as a terminator for the language tag. I've noted that a directionality tag should only be used if necessary to produce the correct result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aphillips Are you ok with Adam's response ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aphillips This is holding us from submitting for CR, so we would like to get this closed, please review

Copy link
Contributor

@equalsJeffH equalsJeffH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agl has updated this PR per @aphillips' futher comments, i.e., switched to encoding the language and directionality as suffixed information, and indicated that directionality indicator should only be used when necessary to produce the correct result.

since we've addressed his outstanding comments, I think, as a suggestion, we can go ahead and merge this if @aphillips has not reviewed it by the end of day Monday.

@nadalin
Copy link
Contributor

nadalin commented Dec 14, 2020

@equalsJeffH OK, if any one objects with this approach please speak up before EOB on 12/14/2020

@nadalin
Copy link
Contributor

nadalin commented Dec 15, 2020

@equalsJeffH Please merge

@agl agl merged commit f180c7a into w3c:master Dec 15, 2020
@agl agl deleted the i18n3 branch December 15, 2020 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for language/direction metadata in PKCE 'name'?
5 participants