Added introduction #552

palemieux · 2020-05-16T18:26:37Z

nigelmegitt

I like the way this is going - it does seem to be an improvement to bring the introductory bits together up front to set the scene.

nigelmegitt · 2020-05-18T13:34:40Z

imsc1/spec/ttml-ww-profiles.html

+    are intended to be used across subtitle and caption delivery applications worldwide. It defines extensions to [[ttml2]], as well
+    as incorporates extensions specified in [[SMPTE2052-1]] and [[EBU-TT-D]].</p>
+
+    <p>In the <a>Text Profile</a>, timed text is expressed using Unicode text exclusively, whereas, in the <a>Image Profile</a>,


Probably a minor nit that we can quietly move along from, but I just wanted to note that the phrase "Unicode text" made me stop and do research. (note: this wording was present previously and moved here from what was §5.1)

Two things:

I'm not sure how well understood "Unicode text" is as a general concept, and

whereas Unicode assigns a unique number to every character, the Unicode encoding that we use also allows private use area codes to be specified, which feel like they are "not Unicode" in some sense, even though they are still encoded in the same way. In v1.2 we actually have a specific use for the PUA codes associated with the fonts we are referencing.

If anyone knows a better phrasing here, please suggest it!

Character Information Item is the formal term if I am not mistaken.

Oh dear, that's not a common everyday term!

whereas Unicode assigns a unique number to every character, the Unicode encoding that we use also allows private use area codes to be specified, which feel like they are "not Unicode" in some sense ...

Correct! One of the basic principles of the Unicode Standard is to separate text content encoding from any specific requirements for text display (e.g. ligatures). Prior to Unicode, certain code points had already been occupied by ligatures and other special presentation features, so to avoid conflicts and ambiguities, the existing presentation features (and other special-purpose symbols) were left as PUA codes in Unicode. They do violate the basic principles that Unicode is built on, and are thus "not Unicode" in some sense, but were left as is for the sake of backward compatibility. It is clear why ligatures (for example) should not be encoded as part of the text (if you want text to be editable and searchable), and for many "new" languages (e.g. Devanagari) ligatures have never had PUA codes assigned, but for legacy implementations certain presentation features had to be accommodated. The Unicode Standard itself is much more than just a list of code points, so compliant text encoding also implies compliance with the applicable rules.

Suggestion: replace all references to "Unicode text" with "Unicode-compliant text string", or "text encoded according to the Unicode Standard", or something similar.

Thanks @vlevantovsky either of those suggestions would work for me.

See c2470d7

I'm confused here. In section 8.1, a document instance is required to be encoded in UTF-8, and the other Unicode encodings are not allowed. Does this text really mean encoding, or code points?

cf. https://unicode.org/standard/principles.html

The Unicode Standard defines codes for characters used in all the major languages written today.
The Unicode Standard defines three encoding forms that allow the same data to be transmitted in a byte, word or double word oriented format (i.e. in 8, 16 or 32-bits per code unit). All three encoding forms encode the same common character repertoire and can be efficiently transformed into one another without loss of data.

I am not sure I understand your concern. The paragraph you mentioned clearly says that "all three encoding forms encode the same common character repertoire and can be efficiently transformed into one another", so regardless of whether the spec allows any encoding form to be used, or just one of them, the resulting text string is still compliant with the Unicode Standard.

My original comment was specifically related to Unicode having a provision for PUA code points, which @nigelmegitt rightfully described as seemingly "not Unicode" in spirit. One of the basic Unicode principles is to encode text in the logical order of characters, without any concern for language, writing direction, or any particular presentation features: the encoding conveys text content and remains neutral to anything related to text display. Any use of a PUA code point that encodes a presentation feature (e.g. a ligature that replaces a combination of characters in a word) is a violation of this principle.
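
To make the distinction between code points and encoding forms concrete, here is a minimal sketch, assuming Python 3 and its standard codecs. The helper names `code_points` and `is_private_use` are hypothetical and are not part of the specification or of this pull request. The sketch shows that a string keeps the same code points whichever encoding form carries it, and that a private use code point is still a Unicode code point (general category `Co`).

```python
import unicodedata


def code_points(text):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_private_use(ch):
    """Return True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


# Illustrative sample: a Latin letter, U+3110, and a private use code point.
sample = "a\u3110\ue000"

# Round-trip through the three encoding forms; the code points are unchanged.
for form in ("utf-8", "utf-16-be", "utf-32-be"):
    decoded = sample.encode(form).decode(form)
    assert code_points(decoded) == code_points(sample)

print(code_points(sample))
print([is_private_use(ch) for ch in sample])
```

Whether a private use code point carries an agreed meaning is a matter between the parties exchanging the document, which this check cannot determine.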

how about

timed text is expressed exclusively using code for characters defined in [[[Unicode]]]

@vlevantovsky this text changed around the Unicode text part that you commented on, to resolve comments by @himorin - you might want to take a look.

imsc1/spec/ttml-ww-profiles.html

css-meeting-bot · 2020-05-21T15:26:03Z

The Timed Text Working Group just discussed IMSC 1.2 Introduction, and agreed to the following:

SUMMARY: @nigelmegitt to open new issue for example, normal PR review to continue.

The full IRC log of that discussion

<nigel> Topic: IMSC 1.2 Introduction
<nigel> github: https://github.com//pull/552
<nigel> Nigel: It feels like what's there now is probably good enough, though I think the main
<nigel> .. remaining comments are from me.
<nigel> Pierre: I want to make sure that Atsushi's comment got resolved.
<nigel> Nigel: Atsushi's comment was about the Unicode text wording.
<nigel> Atsushi: I assume this part wants to mention that this specification should mention that
<nigel> .. Unicode code points should be used in the encoding but not anything else.
<nigel> Pierre: To answer that, Unicode is being used maybe not very formally here. To the casual
<nigel> .. reader Unicode text means something.
<nigel> Atsushi: I think I should point to some reference here but sorry I haven't. I'm curious about
<nigel> .. using the word "encoding" here.
<nigel> .. The actual definition is "code point" in Unicode.
<nigel> Nigel: Is there something misleading about the current wording "encoded according to the Unicode standard"?
<nigel> Atsushi: 3 encodings are defined. Encoding is a transformation from code point identifier to byte stream.
<nigel> .. PUA has no meaning in encoding, it's within a code point of Unicode.
<nigel> Nigel: PUA is not mentioned, it's something that is understood by experts.
<atsushi> > The Unicode Standard defines codes for characters used in all the major languages written today.
<nigel> Atsushi: [proposes to say that the document consists of Unicode code points]
<nigel> Pierre: That's fine by me
<nigel> Nigel: Is PUA included in that set?
<nigel> Atsushi: Included.
<nigel> .. PUA is defined by each party, not standardised with a match between character and code point.
<nigel> Pierre: I would be really happy to see the exact proposal on the ticket, because that
<nigel> .. would also allow @vlevantovsky to comment. Could I ask you to make a proposal in
<nigel> .. the pull request for the exact text? That would be great.
<nigel> Atsushi: Let me do that now.
<nigel> Nigel: While Atsushi is doing that, I think it's safe to mention that my comments that
<nigel> .. are still outstanding (thank you for addressing the others), are all about adding an
<nigel> .. example. I think what we have already is good enough, and a clear improvement,
<nigel> .. and crucially, satisfies the APA WG issue, so the best thing seems to me to be to
<nigel> .. move addition of an example to a new issue, and I should try to prepare a pull request
<nigel> .. for that separately. It would be great to do it before IMSC 1.2 PR, but not essential.
<nigel> .. In other words, it could go to a next version.
<nigel> Pierre: Atsushi's change is fine with me.
<atsushi> https://glyphwiki.org/wiki/u3110
<nigel> Nigel: I might have used "character codes"
<nigel> Pierre: Or "code points"
<nigel> Atsushi: This U+3110 is a code point defined by Unicode.
<nigel> .. 3110 is the code point, and this will be transformed into several forms, like in UTF-8
<nigel> .. it will be 3 bytes.
<nigel> Pierre: Understood. How about my proposal "using code points defined in Unicode"
<nigel> Atsushi: Should be fine also.
<nigel> Pierre: I will make the change now.
<nigel> .. I just want to point out that because the only representation is UTF-8 it is true
<nigel> .. that the only representation is Unicode, right, but you're saying that is too specific?
<nigel> .. In other words it is not wrong to say it is encoded according to Unicode.
<nigel> Atsushi: I actually wondered if people would think other encodings would be valid
<nigel> .. like UTF-16, which is a Unicode encoding.
<nigel> Pierre: Right, and it's forbidden in IMSC.
<nigel> Atsushi: I just wanted to be clear about that.
<nigel> Pierre: [makes the change]
<nigel> .. Nigel, you will resolve your review comment and open a new issue?
<nigel> Nigel: Yes.
<nigel> Pierre: Then we can close this after our usual 2 week period.
<nigel> Nigel: Yes.
<nigel> .. Any other comments on the introduction text before we move on?
<nigel> SUMMARY: @nigelmegitt to open new issue for example, normal PR review to continue.
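
The U+3110 example from the log can be checked with a short sketch, assuming Python 3. The function name `encoded_lengths` is hypothetical. It encodes a single code point in each of the three Unicode encoding forms and reports the byte counts, which illustrates why "code point" and "encoding" are different things even though the discussion notes that IMSC documents use UTF-8.

```python
def encoded_lengths(code_point):
    """Return the number of bytes used by each encoding form for one code point."""
    ch = chr(code_point)
    # The -be variants are used so that no byte order mark is added to the count.
    return {form: len(ch.encode(form)) for form in ("utf-8", "utf-16-be", "utf-32-be")}


lengths = encoded_lengths(0x3110)
utf8_bytes = chr(0x3110).encode("utf-8")

print(lengths)
print(utf8_bytes.hex(" "))
```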

nigelmegitt

Looks good to me.

Added introduction (#522)

3bfe2d4

palemieux added this to the IMSC1.2-PR milestone May 16, 2020

palemieux self-assigned this May 16, 2020

palemieux mentioned this pull request May 16, 2020

APA WG comment: Add introduction #522

Closed

nigelmegitt requested changes May 18, 2020

Address review comments

c2470d7

nigelmegitt reviewed May 19, 2020

nigelmegitt mentioned this pull request May 19, 2020

TTWG Meeting 2020-05-21 w3c/ttwg#115

Closed

palemieux added 3 commits May 19, 2020 08:59

Fixed typo

d63d389

Improved introduction

ba2fd4f

Address comment #552 (comment)

db0379a

nigelmegitt mentioned this pull request May 21, 2020

Introduction: include an example pair of documents, one Text and one Image profile #553

Open

nigelmegitt approved these changes May 21, 2020

Merge remote-tracking branch 'origin/master' into issues/0522-add-int…

43c5672

…roduction

palemieux merged commit 3091d08 into master Jun 2, 2020

palemieux deleted the issues/0522-add-introduction branch June 4, 2020 19:42

palemieux commented May 16, 2020

nigelmegitt left a comment

nigelmegitt May 18, 2020

palemieux May 18, 2020

nigelmegitt May 18, 2020

vlevantovsky May 18, 2020

nigelmegitt May 18, 2020

palemieux May 18, 2020

himorin May 19, 2020

vlevantovsky May 19, 2020

himorin May 21, 2020

nigelmegitt May 21, 2020

css-meeting-bot commented May 21, 2020

nigelmegitt left a comment
