Trailing position of metadata #1398

Open
aphillips opened this issue Jul 6, 2021 · 2 comments
Labels
needs-resolution i18n expects this item to be resolved to their satisfaction. s:webauthn https://w3c.github.io/webauthn/ t:bidi_strings 3.5 Handling base direction for strings t:lang_strings 2.4 Identifying the language of strings wg:webauthn https://www.w3.org/groups/wg/webauthn

Comments

@aphillips
Contributor

aphillips commented Jul 6, 2021

This is a tracker issue. Only discuss things here if they are i18n WG internal meta-discussions about the issue. Contribute to the actual discussion at the following link:

§ w3c/webauthn#1646

@aphillips aphillips added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. s:webauthn https://w3c.github.io/webauthn/ labels Jul 6, 2021
@r12a
Contributor

r12a commented Jul 7, 2021

I'm not persuaded that suffixing is a good idea at all. You already mentioned that including RLM/LRM at the start of the string is more efficient, and we don't want them to use language tag code points, which are the things that take up lots of initial bytes. So really we're talking about one extra character in the string, and then only where the bidi algorithm needs help.

So I don't think the argument about preserving data rather than metadata is convincing. And anyway, if strings are going to be truncated: (a) it's likely to be less problematic to lose one code point at the end (since we are truncating already) than to lose directional information; (b) the format they describe isn't JSON-LD, so I don't think it's comparable to @lang; and (c) if they intend for metadata to be appended, they should require the consumer to capture and apply the metadata before truncating.

And, as Martin mentioned, using paired controls, such as the language tags or the RLI...PDI etc code points, where some of the metadata is effectively post-pended, is dangerous in scenarios where truncation becomes a possibility, since a missing end code point can cause problems when the text is inserted into a location. (I think we may need to make that point in string-meta, btw.)
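To illustrate the truncation hazard with paired controls, here is a minimal Python sketch. The helper name `has_unbalanced_isolates`, the sample text, and the 8-code-point limit are all assumptions made for the example; none of them come from the WebAuthn spec. It compares a string wrapped in RLI...PDI with one that only has a leading RLM.

```python
LRI, RLI, FSI = "\u2066", "\u2067", "\u2068"  # isolate initiators
PDI = "\u2069"                                # POP DIRECTIONAL ISOLATE
RLM = "\u200F"                                # RIGHT-TO-LEFT MARK


def has_unbalanced_isolates(text):
    """Return True if an isolate initiator is left without a matching PDI."""
    depth = 0
    for ch in text:
        if ch in (LRI, RLI, FSI):
            depth += 1
        elif ch == PDI and depth > 0:
            depth -= 1
    return depth > 0


sample = "مرحبا بالعالم"          # hypothetical display name
wrapped = RLI + sample + PDI      # paired controls: the close is at the end
marked = RLM + sample             # single leading mark, nothing to close

# Naive truncation to 8 code points (hypothetical limit).
truncated_wrapped = wrapped[:8]
truncated_marked = marked[:8]

print(has_unbalanced_isolates(wrapped))            # balanced before truncation
print(has_unbalanced_isolates(truncated_wrapped))  # PDI was cut off
print(has_unbalanced_isolates(truncated_marked))   # nothing to unbalance
```

The truncated RLI string keeps its opening isolate but loses the closing PDI, which is the case that can affect surrounding text once the value is inserted somewhere. The RLM-prefixed string loses only trailing content.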

Btw, although it does say it, I think we could improve the first para to more clearly indicate that what was put in the spec drew on Addison's personal thoughts before they could be discussed by the i18n WG.

@aphillips
Contributor Author

I think it is reasonable to separate language and directional metadata here.

Language tags can be quite long, and even the shortest language tag, when encoded using the Unicode language tag characters, would require 16 bytes to encode (start tag, alpha2 primary language, cancel tag) in UTF-8. Since the tag characters probably should be removed before displaying or processing the string, a trailing position might be cleaner (it's easier to truncate a string than to take a substring from the front).
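The 16-byte figure can be checked with a short Python sketch. The function name `to_tag_characters` is hypothetical, and the framing (U+E0001 start, tag characters, U+E007F cancel) follows the description in this comment rather than any particular spec text. Every code point in the U+E0000 block takes 4 bytes in UTF-8.

```python
LANGUAGE_TAG = "\U000E0001"  # start of a tag-character language tag
CANCEL_TAG = "\U000E007F"    # cancel tag


def to_tag_characters(lang):
    """Encode an ASCII language tag as Unicode tag characters.

    Each ASCII character is mapped to its counterpart in the U+E0000
    block by adding 0xE0000 to its code point.
    """
    if not lang.isascii():
        raise ValueError("language tags are expected to be ASCII")
    body = "".join(chr(0xE0000 + ord(c)) for c in lang)
    return LANGUAGE_TAG + body + CANCEL_TAG


# Four code points of four bytes each for a two-letter tag.
print(len(to_tag_characters("en").encode("utf-8")))
# A longer tag such as "ar-eg" grows by four bytes per character.
print(len(to_tag_characters("ar-eg").encode("utf-8")))
```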

Either way, adding tag characters produces problems for string concatenation and other string operations. And naive implementations that don't process the field can display tofu or garbage as if it were part of the data. Overall, using in-string metadata is a bad idea.

When talking about direction, I think it is helpful to separate bidi controls from metadata. Including LRM/RLM or an LRI/RLI/FSI + PDI enclosing sequence is, to my mind, "altering the contents" of the string to help it display correctly. Processes such as truncation (particularly with the paired controls!) or additional attempts to produce a display-ready sequence alter the meaning and display of the content. These arguments are not new: we talk exhaustively about this in String-Meta as reasons not to use this approach to communicate direction.

To me, bidi metadata should instead be explicit, which includes not using invisible controls to convey the value. A field like direction with values such as ltr and rtl is a better choice by far.
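As a rough sketch of what explicit metadata could look like, the following Python uses a hypothetical `LocalizedString` type. The class name, field names, and `truncated` method are assumptions for illustration, not anything WebAuthn defines. Because direction and language live outside the string value, truncating the value cannot damage them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalizedString:
    """Hypothetical container keeping metadata separate from the text."""
    value: str
    lang: Optional[str] = None       # e.g. "ar"
    direction: Optional[str] = None  # "ltr", "rtl", or None if unknown

    def __post_init__(self):
        if self.direction not in (None, "ltr", "rtl"):
            raise ValueError(f"unsupported direction: {self.direction!r}")

    def truncated(self, max_chars):
        """Truncate the value only; lang and direction are carried over.

        Slicing by code point is naive (it can split a grapheme cluster),
        but it never touches the metadata.
        """
        return LocalizedString(self.value[:max_chars], self.lang, self.direction)


name = LocalizedString("مرحبا بالعالم", lang="ar", direction="rtl")
short = name.truncated(5)
print(short.direction, short.lang)
```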

Overall, it would have been better if, given that WebAuthn could not/would not introduce additional fields, they had adopted a serialization scheme using ASCII characters that was unambiguous and machine readable. I note that the RDF solution found in JSON-LD does this pretty well. Amusingly, the example given there uses 16 bytes to encode an average-sized language tag and the direction:

"HTML و CSS: تصميم و إنشاء مواقع الويب"^^i18n:ar-eg_rtl

... but I'd still tend to say that failing to address our comment at all and coming back in v3 to introduce true metadata would have been the better option.
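For comparison, a visible ASCII suffix like the one in the example above is straightforward to split off by machine. The sketch below is only an illustration: the function name `parse_i18n_literal` and the regular expression are assumptions modelled on that single example, not the grammar from the JSON-LD or RDF specifications.

```python
import re

# Assumed shape: "<value>"^^i18n:<lang>[_<ltr|rtl>], based on the example only.
_I18N_LITERAL = re.compile(
    r'^"(?P<value>.*)"\^\^i18n:(?P<lang>[A-Za-z0-9-]*)(?:_(?P<dir>ltr|rtl))?$',
    re.DOTALL,
)


def parse_i18n_literal(literal):
    """Split a serialized literal into (value, lang, direction)."""
    match = _I18N_LITERAL.match(literal)
    if match is None:
        raise ValueError("not in the assumed i18n literal form")
    return match["value"], match["lang"] or None, match["dir"]


example = '"HTML و CSS: تصميم و إنشاء مواقع الويب"^^i18n:ar-eg_rtl'
value, lang, direction = parse_i18n_literal(example)
print(lang, direction)

# The metadata suffix itself is plain ASCII, one byte per character.
print(len("^^i18n:ar-eg_rtl".encode("utf-8")))
```

Unlike tag characters or bidi controls, the suffix is visible if a naive implementation displays it unprocessed, and removing it does not require knowledge of invisible code points.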

@aphillips aphillips removed the pending Issue not yet sent to WG, or raised by tracker tool & needing labels. label Jul 9, 2021
@xfq xfq added the needs-resolution i18n expects this item to be resolved to their satisfaction. label Jul 10, 2021
@r12a r12a added t:bidi_strings 3.5 Handling base direction for strings t:lang_strings 2.4 Identifying the language of strings labels Jul 14, 2022
@w3cbot w3cbot added the wg:webauthn https://www.w3.org/groups/wg/webauthn label Feb 9, 2024

4 participants