Should there be a UTF-8 health warning? #335

Open
aphillips opened this issue Aug 5, 2021 · 2 comments
Labels
editorial, i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on.)

Comments

@aphillips

PushMessageData interface
https://www.w3.org/TR/push-api/#pushmessagedata-interface

In #276 we asked about the inherent UTF-8 requirement for the text (and, to a far lesser extent, json) methods. The default implementation of these methods assumes that the message's bytes are, in fact, encoded as UTF-8 if the message is to be treated as text. The I18N WG is happy that UTF-8 is the default encoding and that it is the only supported encoding. But we note that there is no mention of UTF-8 or Unicode outside of the message data interface. Other data can be sent down the wire and retrieved using arrayBuffer or blob, but there is no mention of character encodings aside from the references to utf-8 decode and utf-8 encode in this section. So our ask is:

Should there be a health warning about using non-UTF-8 encodings?

[Note: this came out of I18N WG reviewing our previous comments in our periodic review cycle]

@aphillips added the i18n-needs-resolution label on Aug 5, 2021
@marcoscaceres
Member

Hi @aphillips,

> Should there be a health warning about using non-UTF-8 encodings?

We can probably add a note or something. My reading is that the "utf-8 decode" will just add replacement characters but will always succeed (even with garbage).
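To illustrate the behaviour described above, here is a minimal sketch that uses the standard `TextDecoder` as a stand-in for the spec's "utf-8 decode" step. The helper names `decodeAsUtf8` and `isValidUtf8` and the sample bytes are hypothetical, not part of the Push API. It assumes a runtime that provides `TextDecoder` globally (browsers, service workers, recent Node.js).

```typescript
// Assumption: TextDecoder in its default (non-fatal) mode approximates what
// text() does. It never throws; invalid sequences become U+FFFD.
function decodeAsUtf8(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Hypothetical pre-check: fatal mode throws a TypeError on malformed input,
// so a caller can detect non-UTF-8 bytes instead of silently accepting U+FFFD.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// "café" encoded as windows-1252: the "é" is the single byte 0xE9,
// which is an incomplete sequence when read as UTF-8.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

console.log(decodeAsUtf8(legacyBytes) === "caf\uFFFD"); // true: decode "succeeds" with a replacement character
console.log(isValidUtf8(legacyBytes)); // false
```

The non-fatal decode returns a string in every case, which is why a mismatch in encoding may go unnoticed unless the caller looks for U+FFFD or validates the bytes first.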

Should we add a note just saying something about replacement characters? Or do you mean something else by "health warning about using non-UTF-8 encodings"?

If you have an example from another spec, that would be really helpful!

@aphillips
Author

The problem here is that there is no actual mention of character encoding besides the utf-8 decode. Yes, the decode will succeed regardless of the encoding of the bytes, but this interface can also be used for sending bytes. I would at least mention that failing to use UTF-8 will produce replacement characters or mojibake garbage. Perhaps:

Note that textual content is expected to use the UTF-8 character encoding. Content using a different character encoding needs to be decoded from an arrayBuffer() or blob().
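
As a rough sketch of what the proposed note implies for content in another encoding: the raw bytes are taken as an `ArrayBuffer` and decoded explicitly with `TextDecoder`. The helper `decodeWithEncoding`, the sample bytes, and the choice of `windows-1252` are assumptions for illustration only. Support for non-UTF-8 labels depends on the runtime (for example, Node.js needs a full-ICU build).

```typescript
// Hypothetical helper: decode raw payload bytes using an encoding label that
// sender and receiver have agreed on out of band.
function decodeWithEncoding(buffer: ArrayBuffer, label: string): string {
  return new TextDecoder(label).decode(buffer);
}

// Sample payload: "café" encoded as windows-1252 (0xE9 is "é").
const sampleBuffer = new ArrayBuffer(4);
new Uint8Array(sampleBuffer).set([0x63, 0x61, 0x66, 0xe9]);

console.log(decodeWithEncoding(sampleBuffer, "windows-1252")); // "café"
```

In a push event handler this might be called as `decodeWithEncoding(event.data.arrayBuffer(), "windows-1252")`. The encoding has to be known in advance, because the PushMessageData interface itself exposes no encoding label.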

