
Should UTF-8 'as specified in' point to the Encoding spec? #253

Closed
r12a opened this issue Sep 15, 2017 · 8 comments
Labels
i18n-needs-resolution, imsc1.0.1, WR-commenter-no-response, WR-resolved

Comments

@r12a

r12a commented Sep 15, 2017

6.1 Document Encoding
https://www.w3.org/TR/ttml-imsc1.0.1/#document-encoding

A Document Instance SHALL use UTF-8 character encoding as specified in [UNICODE].

The i18n WG discussed this and suggests that 'as specified in' should probably either point to the Encoding spec or be dropped.
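Operationally, the requirement in 6.1 is about byte-level well-formedness: a conforming Document Instance has to decode as UTF-8 without error. A minimal sketch of such a check follows. It assumes Python's built-in strict `utf-8` codec as the reference decoder, and the helper name `is_well_formed_utf8` is hypothetical (it is not part of IMSC, TTML, or any TTML tool).

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8.

    Hypothetical helper: it relies on Python's strict UTF-8 decoder, which
    rejects overlong forms, encoded surrogates and values above U+10FFFF.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    good = "<p>café</p>".encode("utf-8")
    bad = b"<p>caf\xe9</p>"  # "café" encoded as ISO-8859-1, not UTF-8
    print(is_well_formed_utf8(good))  # True
    print(is_well_formed_utf8(bad))   # False
```

Note that the check only needs the Unicode definition of well-formed UTF-8; it says nothing about how a processor should recover from ill-formed input.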

r12a added the i18n-needs-resolution label Sep 15, 2017
@palemieux
Contributor

@r12a Isn't UTF-8 specified at http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf#G7404 ?

@css-meeting-bot
Member

The Working Group just discussed Should UTF-8 'as specified in' point to the Encoding spec? #253.

The full IRC log of that discussion:
<nigel> Topic: Should UTF-8 'as specified in' point to the Encoding spec? #253
<nigel> github: https://github.com//issues/253
<nigel> Nigel: I see you asked @r12a a question, @palemieux, so I think we're just waiting for that.
<nigel> Pierre: That's right.

@asmusf

asmusf commented Nov 6, 2017

There are some options, especially in handling error conditions, that the Encoding spec may select from but that are left indeterminate in the Unicode specification.

From the perspective of the Unicode Consortium, it is seen as preferable for specific error-condition handling to be within the scope of application-specific standards, such as the Encoding spec (not all conceivable use cases of UTF-8 benefit from being 100% aligned in handling certain details of illegal code sequences, beyond knowing that they are illegal).

The last UTC meeting had a discussion on this very topic, and I heard something to the effect that the above was the outcome (but perhaps someone who was there could confirm).

The Encoding spec should then be sure to not deviate from the [Unicode] specification of UTF-8, except as far as optional behavior is concerned, where it may decide to limit the available options to achieve identical behavior even in the case of ill-formed input.
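To illustrate the distinction drawn above: Unicode defines which byte sequences are ill-formed, while a consuming application still has to pick a policy for them. The WHATWG Encoding Standard's decoders, for example, distinguish a "fatal" error mode from a "replacement" mode that substitutes U+FFFD. The sketch below approximates those two policies with Python's codec error handlers; it is an analogy under that assumption, not an implementation of the Encoding spec, and both function names are hypothetical.

```python
def decode_fatal(data: bytes) -> str:
    """Stop at the first ill-formed sequence (analogous to a 'fatal' error mode)."""
    return data.decode("utf-8", errors="strict")


def decode_with_replacement(data: bytes) -> str:
    """Substitute U+FFFD for ill-formed sequences and keep decoding
    (analogous to a 'replacement' error mode)."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    truncated = b"caf\xc3"  # lead byte of a two-byte sequence with no continuation byte
    print(ascii(decode_with_replacement(truncated)))  # 'caf\ufffd'
    try:
        decode_fatal(truncated)
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
```

Both functions agree on what counts as ill-formed; they differ only in what they do next, which is the part Asmus describes as belonging to application-specific standards.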

@palemieux
Copy link
Contributor

@asmusf Where would I find and/or reference the Encoding spec?

@css-meeting-bot
Member

The Working Group just discussed Should UTF-8 'as specified in' point to the Encoding spec? #253, and agreed to the following resolutions:

  • RESOLUTION: We do not need to refer to the Encoding spec and do not need to make any change.

The full IRC log of that discussion:
<nigel> Topic: Should UTF-8 'as specified in' point to the Encoding spec? #253
<nigel> github: https://github.com//issues/253
<nigel> glenn: My response is "no" it should point to Unicode.
<nigel> r12a: The first question is: does it have to point anywhere? People don't normally reference
<nigel> .. a definition of UTF-8.
<nigel> glenn: There are different definitions of UTF-8 so we need to nail which one we mean.
<nigel> .. All current TTML specs refer to Unicode for that definition.
<nigel> r12a: Unicode defines UTF-8, the encoding specification defines it in terms of conversion
<nigel> .. between legacy encodings and UTF-8 code points, but does not define UTF-8 per se.
<nigel> .. Which of those do you need or want?
<nigel> glenn: We don't refer to legacy encodings anywhere so we don't have a normative requirement
<nigel> .. to say anything. There's just UTF-8, and how it got there is outside of the scope.
<nigel> atai: How does HTML5 handle this?
<nigel> nigel: It seems like nobody thinks we need to make any changes here?
<cyril> regarding the previous resolution, there should be matching normative behavior that produces what the note says. You cannot simply have a note that says "does not result"
<nigel> RESOLUTION: We do not need to refer to the Encoding spec and do not need to make any change.
<nigel> nigel: I can't label this because the labels aren't on the repo for WD tracking.
<nigel> .. I've raised #282 for Thierry to add them.
<cyril> rrsagent, pointer
<RRSAgent> See https://www.w3.org/2017/11/09-tt-irc#T19-03-00

@nigelmegitt
Contributor

@r12a please could you confirm if this discussion has addressed the issue to your satisfaction (for WR comment tracking purposes please could you respond within the next 2 weeks?)

@nigelmegitt
Contributor

As resolved in today's TTWG meeting, we have marked this as commenter no response and will close the issue.

@r12a
Author

r12a commented Feb 19, 2018

The i18n WG notes that no change was made, but does not consider this important enough to object formally.
