Should there be a UTF-8 health warning? #335

aphillips · 2021-08-05T21:58:15Z

PushMessageData interface
https://www.w3.org/TR/push-api/#pushmessagedata-interface

In #276 we asked about the inherent UTF-8 requirement for the text (and, to a far lesser extent, json) methods. These methods' default implementation assumes that the message's bytes are, in fact, encoded as UTF-8 if the message is to be treated as text. The I18N WG is happy that UTF-8 is the default encoding and that it is the only supported encoding. But we note that there is no mention of UTF-8 or Unicode outside of the message data interface. Other data can be sent down the wire and retrieved using arrayBuffer or blob, but character encodings are not mentioned anywhere aside from the references to utf-8 decode and utf-8 encode in this section. So our ask is:

Should there be a health warning about using non-UTF-8 encodings?

[Note: this came out of I18N WG reviewing our previous comments in our periodic review cycle]


marcoscaceres · 2021-10-21T04:59:35Z

Hi @aphillips,

Should there be a health warning about using non-UTF-8 encodings?

We can probably add a note or something. My reading is that the "utf-8 decode" will just add replacement characters but will always succeed (even with garbage).

Should we add a note just saying something about replacement characters? Or do you mean something else by "health warning about using non-UTF-8 encodings"?

If you have an example from another spec, that would be really helpful!
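For illustration, the replacement-character behavior described above can be reproduced with the standard `TextDecoder`. This is a minimal sketch, assuming that `text()` behaves like a non-fatal UTF-8 decoder (which is how "utf-8 decode" reads); the byte values and variable names are made up for the example.

```typescript
// Bytes for "café" in ISO-8859-1 / windows-1252.
// The final 0xE9 is not a valid UTF-8 sequence on its own.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Non-fatal decoding (the TextDecoder default) never throws:
// the invalid byte is replaced with U+FFFD REPLACEMENT CHARACTER.
const lenient: string = new TextDecoder("utf-8").decode(latin1Bytes);

// With { fatal: true } the same input raises a TypeError instead,
// which is one way to detect non-UTF-8 content explicitly.
let strictFailed = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(latin1Bytes);
} catch {
  strictFailed = true;
}

console.log(JSON.stringify(lenient), strictFailed);
```

The lenient result here is "caf" followed by a single U+FFFD, so the decode "succeeds" while silently losing the original character.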

aphillips · 2022-01-18T15:56:07Z

The problem here is that there is no actual mention of character encoding besides the utf-8 decode. Yes, the decode will succeed regardless of the encoding of bytes, but this interface can also be used for sending bytes. I would at least mention that failing to use UTF-8 will produce replacement characters or mojibake garbage. Perhaps:

Note that textual content is expected to use the UTF-8 character encoding. Content using a different character encoding needs to be decoded from an arrayBuffer() or blob().
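A rough sketch of what the proposed note implies for content in another encoding: take the raw bytes and decode them with an explicit encoding label rather than relying on the UTF-8 text path. The helper `decodeAs` and the sample bytes are hypothetical, and the buffer below stands in for what `arrayBuffer()` would return.

```typescript
// Hypothetical helper, not part of the Push API:
// decode raw bytes using an explicitly chosen encoding label.
function decodeAs(buffer: ArrayBuffer, label: string): string {
  return new TextDecoder(label).decode(buffer);
}

// Stand-in for the result of arrayBuffer(): "héllo" encoded as UTF-16LE.
const payload = new ArrayBuffer(10);
new Uint8Array(payload).set([
  0x68, 0x00, 0xe9, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x6f, 0x00,
]);

// Decoding with the correct label recovers the text.
const decoded: string = decodeAs(payload, "utf-16le");

// Treating the same bytes as UTF-8 yields NUL characters and U+FFFD.
const misread: string = decodeAs(payload, "utf-8");

console.log(decoded, JSON.stringify(misread));
```

The sender and receiver still have to agree on the encoding label out of band, since the bytes themselves do not carry it.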

aphillips added the i18n-needs-resolution label (an issue the Internationalization Group has raised and is looking for a response on) · Aug 5, 2021

aphillips mentioned this issue Aug 5, 2021

Should text have a UTF-8 health warning? w3c/i18n-activity#1403

Open

marcoscaceres added the editorial label Oct 21, 2021

marcoscaceres mentioned this issue Oct 21, 2021

Progressing spec to CR #334

Open
