Added introduction #552

Merged
merged 6 commits into from
Jun 2, 2020

palemieux
Contributor

@palemieux palemieux commented May 16, 2020

@palemieux palemieux added this to the IMSC1.2-PR milestone May 16, 2020
@palemieux palemieux self-assigned this May 16, 2020
Contributor

@nigelmegitt nigelmegitt left a comment

I like the way this is going - it does seem to be an improvement to bring the introductory bits together up front to set the scene.

are intended to be used across subtitle and caption delivery applications worldwide. It defines extensions to [[ttml2]], as well
as incorporates extensions specified in [[SMPTE2052-1]] and [[EBU-TT-D]].</p>

<p>In the <a>Text Profile</a>, timed text is expressed using Unicode text exclusively, whereas, in the <a>Image Profile</a>,

Contributor

Probably a minor nit that we can quietly move along from, but I just wanted to note that the phrase "Unicode text" made me stop and do research. (note: this wording was present previously and moved here from what was §5.1)

Two things:

  1. I'm not sure how well understood "Unicode text" is as a general concept, and
  2. whereas Unicode assigns a unique number to every character, the Unicode encoding that we use also allows private use area codes to be specified, which feel like they are "not Unicode" in some sense, even though they are still encoded in the same way. In v1.2 we actually have a specific use for the PUA codes associated with the fonts we are referencing.

If anyone knows a better phrasing here, please suggest it!

Contributor Author

@palemieux palemieux May 18, 2020

Character Information Item is the formal term if I am not mistaken.

Contributor

Oh dear, that's not a common everyday term!

@vlevantovsky vlevantovsky May 18, 2020

whereas Unicode assigns a unique number to every character, the Unicode encoding that we use also allows private use area codes to be specified, which feel like they are "not Unicode" in some sense ...

Correct! One of the basic principles of the Unicode Standard is to separate the encoding of text content from any specific requirements for text display (such as ligatures). Prior to Unicode, certain code points had already been occupied by ligatures and other special presentation features, so to avoid conflicts and ambiguities, the existing presentation features (and other special-purpose symbols) were left as PUA codes in Unicode. They do violate the basic principles that Unicode is built on, and are thus "not Unicode" in some sense, but were left as is for the sake of backward compatibility. It is clear why ligatures, for example, should not be encoded as part of the text (if you want the text to be editable and searchable), and for many "new" scripts (e.g. Devanagari) ligatures have never had PUA codes assigned, but for legacy implementations certain presentation features had to be accommodated. The Unicode Standard itself is much more than just a list of code points, so compliant text encoding also implies compliance with the applicable rules.

Suggestion: replace all references to "Unicode text" with "Unicode-compliant text string", or "text encoded according to the Unicode Standard", or something similar.

Contributor

Thanks @vlevantovsky, either of those suggestions would work for me.

Contributor Author

See c2470d7

Contributor

I'm confused here. In section 8.1, a document instance is required to be encoded in UTF-8, which does not allow all of the encodings defined by Unicode. Does this really mean encoding, or code points?

cf. https://unicode.org/standard/principles.html

The Unicode Standard defines codes for characters used in all the major languages written today.
The Unicode Standard defines three encoding forms that allow the same data to be transmitted in a byte, word or double word oriented format (i.e. in 8, 16 or 32-bits per code unit). All three encoding forms encode the same common character repertoire and can be efficiently transformed into one another without loss of data.
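
As an illustrative aside on the quoted passage (a minimal sketch, not part of the PR or of the IMSC specification): the same code point yields a different byte sequence under each encoding form, and each sequence decodes back to the same character. The helper name `encoding_forms` is hypothetical; U+3110 is the code point cited in the meeting log for this PR.

```python
def encoding_forms(ch: str) -> dict:
    """Return the bytes of one character under the three Unicode encoding forms.

    Big-endian codec variants are used so that no byte order mark is prepended.
    """
    return {
        "utf-8": ch.encode("utf-8"),
        "utf-16": ch.encode("utf-16-be"),
        "utf-32": ch.encode("utf-32-be"),
    }


ch = "\u3110"  # a single code point, U+3110
forms = encoding_forms(ch)

print(f"code point: U+{ord(ch):04X}")
for name, data in forms.items():
    # The code point is identical in every case; only the byte form differs.
    print(f"{name}: {data.hex()} ({len(data)} bytes)")
```

For U+3110 this gives 3 bytes in UTF-8, 2 bytes in UTF-16 and 4 bytes in UTF-32, which is the distinction between a code point and its encoding that the question turns on.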

@vlevantovsky vlevantovsky May 19, 2020

I am not sure I understand your concern. The paragraph you mentioned clearly says that "all three encoding forms encode the same common character repertoire and can be efficiently transformed into one another", so regardless of whether the spec allows any encoding form to be used, or just one of them, the resulting text string is still compliant with the Unicode Standard.

My original comment was specifically related to Unicode having a provision for PUA code points, which @nigelmegitt rightfully described as seemingly "not Unicode" in spirit. One of the basic Unicode principles is to encode text in the logical order of characters, without any concern for language, writing direction, or any particular presentation features: the encoding conveys text content and remains neutral to anything related to text display. Any use of a PUA code point that encodes a presentation feature (e.g. a ligature that replaces a combination of characters in a word) is a violation of this principle.
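
To make the PUA point concrete, here is a minimal sketch (an illustration only, not something IMSC requires or the PR contains). The Unicode Standard reserves three private-use ranges, and a string can be scanned for code points that fall inside them. The names `PUA_RANGES`, `is_private_use` and `find_private_use` are hypothetical.

```python
# The three private-use ranges reserved by the Unicode Standard:
# the BMP Private Use Area and the two supplementary private-use planes.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_private_use(text: str) -> list:
    """Return (index, 'U+XXXX') pairs for every private-use code point in text."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if is_private_use(c)
    ]


sample = "a\ue000b"  # U+E000 is the first code point of the BMP Private Use Area
print(find_private_use(sample))
```

A private-use code point is still a valid Unicode code point and encodes like any other; what Unicode leaves undefined is its meaning, which is agreed privately, for example by a font.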

Contributor

how about

timed text is expressed exclusively using code for characters defined in [[[Unicode]]]

Contributor

@vlevantovsky this text changed around the Unicode text part that you commented on, to resolve comments by @himorin - you might want to take a look.

@css-meeting-bot
Member

The Timed Text Working Group just discussed IMSC 1.2 Introduction, and agreed to the following:

  • SUMMARY: @nigelmegitt to open new issue for example, normal PR review to continue.
The full IRC log of that discussion:
<nigel> Topic: IMSC 1.2 Introduction
<nigel> github: https://github.com//pull/552
<nigel> Nigel: It feels like what's there now is probably good enough, though I think the main
<nigel> .. remaining comments are from me.
<nigel> Pierre: I want to make sure that Atsushi's comment got resolved.
<nigel> Nigel: Atsushi's comment was about the Unicode text wording.
<nigel> Atsushi: I assume this part is meant to say that Unicode code points should be
<nigel> .. used in the encoding, and nothing else.
<nigel> Pierre: To answer that, Unicode is being used maybe not very formally here. To the casual
<nigel> .. reader Unicode text means something.
<nigel> Atsushi: I think I should point to some reference here but sorry I haven't. I'm curious about
<nigel> .. using the word "encoding" here.
<nigel> .. The actual definition is "code point" in Unicode.
<nigel> Nigel: Is there something misleading about the current wording "encoded according to the Unicode standard"?
<nigel> Atsushi: 3 encodings are defined. Encoding is a transformation from code point identifier to byte stream.
<nigel> .. PUA has no meaning in encoding, it's within a code point of Unicode.
<nigel> Nigel: PUA is not mentioned, it's something that is understood by experts.
<atsushi> > The Unicode Standard defines codes for characters used in all the major languages written today.
<nigel> Atsushi: [proposes to say that the document consists of Unicode code points]
<nigel> Pierre: That's fine by me
<nigel> Nigel: Is PUA included in that set?
<nigel> Atsushi: Included.
<nigel> .. PUA is defined by each party, not standardised with a match between character and code point.
<nigel> Pierre: I would be really happy to see the exact proposal on the ticket, because that
<nigel> .. would also allow @vlevantovsky to comment. Could I ask you to make a proposal in
<nigel> .. the pull request for the exact text? That would be great.
<nigel> Atsushi: Let me do that now.
<nigel> Nigel: While Atsushi is doing that, I think it's safe to mention that my comments that
<nigel> .. are still outstanding (thank you for addressing the others), are all about adding an
<nigel> .. example. I think what we have already is good enough, and a clear improvement,
<nigel> .. and crucially, satisfies the APA WG issue, so the best thing seems to me to be to
<nigel> .. move addition of an example to a new issue, and I should try to prepare a pull request
<nigel> .. for that separately. It would be great to do it before IMSC 1.2 PR, but not essential.
<nigel> .. In other words, it could go to a next version.
<nigel> Pierre: Atsushi's change is fine with me.
<atsushi> https://glyphwiki.org/wiki/u3110
<nigel> Nigel: I might have used "character codes"
<nigel> Pierre: Or "code points"
<nigel> Atsushi: This U+3110 is a code point defined by Unicode.
<nigel> .. 3110 is the code point, and this will be transformed into several forms, like in UTF-8
<nigel> .. it will be 3 bytes.
<nigel> Pierre: Understood. How about my proposal "using code points defined in Unicode"
<nigel> Atsushi: Should be fine also.
<nigel> Pierre: I will make the change now.
<nigel> .. I just want to point out that because the only representation is UTF-8 it is true
<nigel> .. that the only representation is Unicode, right, but you're saying that is too specific?
<nigel> .. In other words it is not wrong to say it is encoded according to Unicode.
<nigel> Atsushi: I actually wondered if people would think other encodings would be valid
<nigel> .. like UTF-16, which is a Unicode encoding.
<nigel> Pierre: Right, and it's forbidden in IMSC.
<nigel> Atsushi: I just wanted to be clear about that.
<nigel> Pierre: [makes the change]
<nigel> .. Nigel, you will resolve your review comment and open a new issue?
<nigel> Nigel: Yes.
<nigel> Pierre: Then we can close this after our usual 2 week period.
<nigel> Nigel: Yes.
<nigel> .. Any other comments on the introduction text before we move on?
<nigel> SUMMARY: @nigelmegitt to open new issue for example, normal PR review to continue.

Contributor

@nigelmegitt nigelmegitt left a comment

Looks good to me.

@palemieux palemieux merged commit 3091d08 into master Jun 2, 2020
@palemieux palemieux deleted the issues/0522-add-introduction branch June 4, 2020 19:42

5 participants