Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMSC subtitle/caption rendering appears to need(?) VTTCue class which is not in our spec #211

Closed
jpiesing opened this issue Aug 20, 2019 · 29 comments

Comments

@jpiesing
Copy link

The Web Media API spec doesn't include WebVTT - a situation I wholly agree with.
But ...
When researching how subtitle rendering works when video & audio is presented via MSE, I was referred to this file from DASH.js.

https://github.com/Dash-Industry-Forum/dash.js/blob/master/src/streaming/text/TextTracks.js

In this code, the IMSC subtitles/captions are parsed and rendered in JS but the timing is all done through HTML5 by creating *Cue objects using the constructor, adding them to TextTracks and relying on the time marches on algorithm to fire onenter and onexit events as appropriate.

The problem is that the constructor for TextTrackCue is not intended to be called. You can find various references to this via Google. DASH.js includes the following logic for this;

Cue = window.VTTCue || window.TextTrackCue;

If the reality is that the timing aspect of rendering IMSC subtitles requires the VTTCue class then we should consider including that class in the Web Media API spec.

@wilaw
Copy link

wilaw commented Aug 20, 2019

But that code snippet suggests that VTTCue class is optional as long as TextTrackCue is supported (since its an OR). Therefore it seems fine to omit VTTCue from WMAS as long as TextTrackCue is included?

@chrisn
Copy link
Member

chrisn commented Aug 20, 2019

TextTrackCue doesn't have a constructor (see WHATWG HTML spec), so only VTTCue objects can be created from script. So it's not clear that that code snippet actually works, unless an earlier iteration of TextTrackCue did have a constructor?

@jpiesing
Copy link
Author

But that code snippet suggests that VTTCue class is optional as long as TextTrackCue is supported (since its an OR). Therefore it seems fine to omit VTTCue from WMAS as long as TextTrackCue is included?

The Cue variable is used in two places - to provide the constructor for a TextTrackCue or a VTTCue.

cue = new Cue(currentItem.start - timeOffset, currentItem.end - timeOffset, currentItem.data);

or

      } else {
            if (currentItem.data) {
                cue = new Cue(currentItem.start - timeOffset, currentItem.end - timeOffset, currentItem.data);

To quote MDN on TextTrackCue ...

TextTrackCue is an abstract class which is used as the basis for the various derived cue types, such as VTTCue; you will instead work with those derived types. These cues represent a string of text that is presented for some duration of time during the performance of a TextTrack. The cue includes the start time (the time at which the text will be displayed) and the end time (the time at which it will be removed from the display), as well as other information.

Chrome prevents construction of TextTrackCue instances since 2014.

https://www.chromestatus.com/feature/5818821692620800

The VTTCue interface, part of WebVTT. Previously the members of VTTCue were on the TextTrackCue interface and there was a TextTrackCue constructor, now the VTTCue constructor must be used instead.

@JohnRiv
Copy link
Member

JohnRiv commented Aug 20, 2019

Looking into that line of code further, I don't think VTTCue is required. Of course, the best way to determine this is to have someone test & verify, but here's what my brief digging around the Internet turned up that lead me to thinking that:

  1. It appears VTTCue is not supported in Internet Explorer or Edge

  2. I found an issue citing that specific line of code: Captions don't render on IE11/Edge videojs/videojs-contrib-dash#191. Note the link to the code in the issue is for a different line now because the link points to the development branch, but if you look at that line of code at the time that issue was opened, you'll see it's the same line. There are references to other issues & PRs in that issue that are worth reviewing, but this one comment in particular caught my attention & lead me to believe VTTCue is not required for captions to function

@jpiesing
Copy link
Author

2\. There are references to other issues & PRs in that issue that are worth reviewing, but [this one comment in particular](https://github.com/videojs/videojs-contrib-dash/pull/192#pullrequestreview-44361362) caught my attention & lead me to believe `VTTCue` is not required for captions to function

You're far more familiar with the types of magic that can be achieved with JavaScript but and this is probably being far too naive but ...

If the 'time marches on' algorithm is to be used for subtitle / caption timing then somewhere, something has to be created that is close enough to TextTrackCue that it can successfully be passed to the TextTrack.addCue method.
If that something is not created by calling the constructor for VTTCue or the (former) constructor for TextTrackCue then where/how is it created? Is it possible to define a subclass of TextTrackCue in JavaScript?

@JohnRiv
Copy link
Member

JohnRiv commented Aug 20, 2019

I tested in Internet Explorer and Edge on Windows, and both browsers do allow for new TextTrackCue() to be called, so despite the WHATWG spec and MDN saying TextTrackCue does not have a constructor, it does in those browsers. Since IE and Edge for Windows don't support VTTCue but other browsers do, then

Cue = window.VTTCue || window.TextTrackCue;

handles creating a reference to a cue that works across all major browsers.

Also FYI, the Chromium-based Edge that was released today does support VTTCue and throws an error if you try to call new TextTrackCue() (same behavior as Chrome).

If that something is not created by calling the constructor for VTTCue or the (former) constructor for TextTrackCue then where/how is it created?

One of those constructors will have to be called to create it (depending on which the browser supports)

Is it possible to define a subclass of TextTrackCue in JavaScript?

Although ES6 added the ability to extend a class, there still is no such thing as an abstract class or an interface in JavaScript, and TextTrackCue is essentially an abstract class for browser vendors to leverage. So I don't know of a way for a web developer to successfully extend TextTrackCue in a way that would allow for new to be called on it.

@jpiesing
Copy link
Author

I tested in Internet Explorer and Edge on Windows, and both browsers do allow for new TextTrackCue() to be called, so despite the WHATWG spec and MDN saying TextTrackCue does not have a constructor, it does in those browsers. Since IE and Edge for Windows don't support VTTCue but other browsers do, then

Cue = window.VTTCue || window.TextTrackCue;

handles creating a reference to a cue that works across all major browsers.

Does our spec then need to require that implementations must support at least one of VTTCue or the constructor for TextTrackCue?

@chrisn
Copy link
Member

chrisn commented Aug 21, 2019

Does our spec then need to require that implementations must support at least one of VTTCue or the constructor for TextTrackCue?

This would move the Web Media API Snapshot from being "comprised of references to existing specifications in W3C and other specification groups" towards specifying functionality itself.

@chrisn
Copy link
Member

chrisn commented Aug 21, 2019

As an aside, the proposed DataCue API offers a solution for arbitrary structured timed metadata and timed events, although this is a longer term solution. Input from this community is welcome to help move the proposal forward.

@jpiesing
Copy link
Author

This would move the Web Media API Snapshot from being "comprised of references to existing specifications in W3C and other specification groups" towards specifying functionality itself.

I wouldn't quite put it that way. It would be a conditional reference to part of the WebVTT spec.

@chrisn
Copy link
Member

chrisn commented Aug 21, 2019

That would be a reasonable approach. I was responding more to the specifying a TextTrack constructor part of your question. Of course, the other option there could be to work with WHATWG to add it to the HTML spec.

@jpiesing
Copy link
Author

That would be a reasonable approach. I was responding more to the specifying a TextTrack constructor part of your question. Of course, the other option there could be to work with WHATWG to add it to the HTML spec.

I apologise for giving that impression but was not what I had in mind. If implementations support a TextTrackCue constructor then there may be something we can reference.

@chrisn
Copy link
Member

chrisn commented Aug 21, 2019

Sorry, I wasn't being clear earlier. My assumption is that the only reference for a TextTrack constructor would be an old edition of the HTML spec (I haven't checked to see if there was one that included it), and so having the Web Media API refer to the current HTML spec, as well as the old edition, wouldn't be desirable.

@JohnRiv
Copy link
Member

JohnRiv commented Sep 23, 2019

At the CTA Fall Forum, there was consensus that we will explicitly agree to not include these details in the Web Media API spec.

@JohnRiv JohnRiv closed this as completed Sep 23, 2019
@jpiesing
Copy link
Author

At the CTA Fall Forum, there was consensus that we will explicitly agree to not include these details in the Web Media API spec.

Please can you share some of the reasons for this?

@JohnRiv
Copy link
Member

JohnRiv commented Sep 23, 2019

Please can you share some of the reasons for this?

Yes, sorry for not adding these details originally.

The fact that player libraries today decide between VTTCue or TextTrackCue depending on the browser is an artifact of how browsers have handled cues to date. We already reference the HTML spec in our spec, and the HTML spec includes a section on TextTrackCue, so in a way we're already covered as the HTML spec handles the Text track API.

In 2020 this would be an issue worth revisiting as we see how cues play out over the next year in W3C & WHATWG standards and in browser implementations.

@jpiesing
Copy link
Author

The fact that player libraries today decide between VTTCue or TextTrackCue depending on the browser is an artifact of how browsers have handled cues to date. We already reference the HTML spec in our spec, and the HTML spec includes a section on TextTrackCue, so in a way we're already covered as the HTML spec handles the Text track API.

Unfortunately this doesn't address the issue.
WMAS 2018 references WHATWG HTML.
In WHATWG HTML, TextTrackCue doesn't have a constructor.
Please re-open this issue.

@JohnRiv
Copy link
Member

JohnRiv commented Oct 2, 2019

Reopening...

My apologies, when we originally discussed this we said it may be too complicated to try to integrate into our spec and we should explore options and either consciously include something or consciously decide to not include something related to this... that seems to have shifted to definitely including something, which I'm happy to discuss.

So that we can decide on what that should be, I have captured what I believe to be the current state:

  1. We want the Web Media API Snapshot 2019 to define options for how text track cues should be defined

  2. The way that works in practice is most modern browsers use VTTCue and IE/Non-Chromium-Edge use TextTrackCue. In practice in either case case, the cue is instantiated via the new keyword. This is standard in the case of VTTCue and non-standard in the case of TextTrackCue.

  3. All major JS video player libraries handle statement 2

  4. We don't want to require VTTCue, at least in the 2019 iteration of the spec.

  5. We don't want to reference multiple versions of the HTML5 spec (where TextTrackCue is defined... I'm not sure if an older version ever stated that it could have a constructor, but since we don't want to reference multiple versions of the spec, I won't research if there is a previous version that does)

Given the above, my first thought is that the User Agent Integration Specifications section of our spec would be the correct place to address this, as it is not based on a spec. So we'd end up with something along the lines of:

4.4 Text Track Cues

  • Devices MUST support either the VTTCue interface [webvtt1] or the TextTrackCue interface [HTML]. If VTTCue is not supported, then the TextTrackCue interface MUST support being instantiated via new.

@jpiesing and others, what are your thoughts on the above?

@JohnRiv JohnRiv reopened this Oct 2, 2019
@jpiesing
Copy link
Author

jpiesing commented Oct 3, 2019

Given the above, my first thought is that the User Agent Integration Specifications section of our spec would be the correct place to address this, as it is not based on a spec. So we'd end up with something along the lines of:

@JohnRiv User agent integration was originally for things that someone porting a browser would need to worry about. This is nothing to do with porting the browser - once the choice of browser has been made, the integrator effectively has no choice of which constructor will be supported in that device.

There are no good solutions to this. I would be happy with something under "Media specifications" a little like this ...

Devices MUST support a mechanism to construct instances of TextTrackCue or an interface that inherits from it.
Note: Current user agent implementations meet this requirement either by supporting VTTCue or by supporting a constructor for TextTrackCue that is no longer included in [HTML].

This would avoid problematic normative references :)

@johnluther
Copy link

I think @jpiesing 's language is more accurate, but would another bullet under "Exceptions" for HTML be a better place for it?

@JohnRiv
Copy link
Member

JohnRiv commented Oct 3, 2019

Thanks for the background & suggested text @jpiesing. Better stated than my first attempt :-)

I agree with @johnluther that associating it with HTML would be preferred over Media Specifications because:

  1. The other items we currently mention under "Media Specifications" are all full specs as opposed to a section of a spec
  2. We can reference the text track cue section of the HTML spec, which mentions both TextTrackCue and VTTCue

If we put it in the HTML section of Core web specifications, I'd lean towards it being an additional note rather than an exception though, since every "exception" instance in our spec is to call out a feature that is not yet widely supported, and in our case TextTrackCue is supported across the board, just the UA implementations differ.

I could also see a case where it would make sense as a bullet under Other web specifications since we already mention other features required as part of the HTML specification there.

Based on @jpiesing's text & the above, here are a couple options:

  1. A new bullet point under HTML in Core web specifications between the current Note and the Exceptions:
  • Note: Devices MUST support a mechanism to construct instances of TextTrackCue or an interface that inherits from it.
    • Note: Current user agent implementations meet this requirement either by supporting VTTCue or by supporting a constructor for TextTrackCue that is no longer included in the HTML specification [HTML].
  1. Update the last bullet point under Other web specifications to be:
  • Note: Cross-document messaging, Channel messaging, Web storage, Web workers and text track cue are also required as part of the HTML specification [HTML].
    • Note: Current user agent implementations support a mechanism to construct instances of TextTrackCue or an interface that inherits from it either by supporting VTTCue or by supporting a constructor for TextTrackCue that is no longer included in the HTML specification [HTML].

There may be some additional background around that Other web specifications section that I'm not familiar with, so please feel free to educate me. Right now I'm leaning towards the first option but I'd be happy with either or up for another.

@jpiesing
Copy link
Author

jpiesing commented Oct 5, 2019

* Note: Devices _MUST_ support a mechanism to construct instances of [TextTrackCue](https://html.spec.whatwg.org/multipage/media.html#text-track-cue) or an interface that inherits from it.

@JohnRiv I'm not sure about putting "MUST" in a note. Some places see that as contradictory or even exclude it.

@JohnRiv
Copy link
Member

JohnRiv commented Oct 7, 2019

@JohnRiv I'm not sure about putting "MUST" in a note. Some places see that as contradictory or even exclude it.

Good to know. We could then change option 1 to be either:

  • change "MUST" to "must" (as the other note uses the non-keyword form "Must")
  • change "MUST" to "SHOULD"
  • Remove "Note:"

Or just go with option 2

@johnluther
Copy link

I vote for SHOULD.

@jpiesing
Copy link
Author

jpiesing commented Oct 7, 2019

I think it's important that it's MUST or must so would prefer to remove "Note:".

@JohnRiv
Copy link
Member

JohnRiv commented Oct 8, 2019

I could go either way here... since all UAs do either construct instances of TextTrackCue or an interface that inherits from it, it seems OK to use a "MUST" here... I'm also starting to wonder if our other Note without an all-caps MUST is an issue... how about we address both and update the full HTML bullet points to the following:

@JohnRiv
Copy link
Member

JohnRiv commented Oct 11, 2019

It's been a few days since the last comment and we need to wrap this up next week so, I'm adding the "Call for Consensus" label to this. If there are no objections the copy in my previous comment directly above will be used for the 2019 update.

@jpiesing
Copy link
Author

It's been a few days since the last comment and we need to wrap this up next week so, I'm adding the "Call for Consensus" label to this. If there are no objections the copy in my previous comment directly above will be used for the 2019 update.

Fine by me.

JohnRiv added a commit that referenced this issue Oct 15, 2019
JohnRiv added a commit that referenced this issue Oct 16, 2019
* Added link to Web browsers and other interactive user agents

Took some web searching to find that conformance class, so I figured a
link to it would be helpful for others

* Added info about TextTrackCue and removed "Note" from HTML "MUST"
comments.

This addresses #211
@JohnRiv
Copy link
Member

JohnRiv commented Oct 22, 2019

Closed via #222

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants