Text track cue event timing accuracy #5306

Closed
chrisn opened this issue Feb 19, 2020 · 2 comments
Labels
clarification (Standard could be clearer) · editorial (Changes that do not affect how the standard is understood.) · topic: media

Comments

@chrisn
Member

chrisn commented Feb 19, 2020

The timing accuracy of TextTrackCue enter and exit events, and TextTrack cuechange events, with respect to the media timeline, is important for applications that render web content that is intended to be synchronised to audio or video media playback in some way.

An example use case (from here): a subtitle or caption author wants to ensure that subtitle changes are aligned as closely as possible to shot changes in the video. At a typical frame rate of 25 frames per second, each frame lasts 40 milliseconds, and so the web app would need to receive the cue enter or exit event within 20 milliseconds to be able to respond and render in time.
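
The arithmetic behind those figures can be written out as a minimal sketch. It assumes that the 20 millisecond target corresponds to half a frame at 25 frames per second; the function names are hypothetical and are not part of any API.

```typescript
// Duration of one video frame in milliseconds.
function frameDurationMs(framesPerSecond: number): number {
  if (!(framesPerSecond > 0)) {
    throw new RangeError("framesPerSecond must be positive");
  }
  return 1000 / framesPerSecond;
}

// Assumption: the tolerance for a cue event is half a frame, so that the
// app still has time to respond and render before the next frame.
function cueEventToleranceMs(framesPerSecond: number): number {
  return frameDurationMs(framesPerSecond) / 2;
}

const perFrame = frameDurationMs(25);      // 40 ms per frame
const tolerance = cueEventToleranceMs(25); // 20 ms
console.log(perFrame, tolerance);
```

At higher frame rates the same reasoning gives a tighter budget, for example half of 20 milliseconds at 50 frames per second.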

In practice we see some variation between user agents in how often the time marches on steps are run. For example, in Chromium, the steps are run at the lowest rate specified for timeupdate, i.e., at most every 250 milliseconds during media playback. As a consequence, web apps that need greater timing accuracy must run a timer or rAF loop and poll the media element’s currentTime, which is power intensive. (Implementation work is now in progress to improve the timing accuracy in Chromium.)
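
For illustration, the polling workaround might look roughly like the sketch below. It is not taken from any particular application: `CueLike`, `TimeSource`, `boundariesCrossed` and `pollCues` are hypothetical names, and the scheduler is passed in as a parameter so that the same code can be driven by `requestAnimationFrame` in a browser or by a fake scheduler elsewhere.

```typescript
// Hypothetical minimal shapes; a real TextTrackCue and HTMLMediaElement
// expose startTime/endTime and currentTime in seconds in the same way.
interface CueLike { id: string; startTime: number; endTime: number; }
interface TimeSource { readonly currentTime: number; }
type CueBoundary = { cue: CueLike; kind: "enter" | "exit" };

// Cue boundaries passed while playback moved forward from `previous`
// to `current` (both in seconds).
function boundariesCrossed(
  cues: readonly CueLike[], previous: number, current: number
): CueBoundary[] {
  const crossed: CueBoundary[] = [];
  for (const cue of cues) {
    if (previous < cue.startTime && cue.startTime <= current) {
      crossed.push({ cue, kind: "enter" });
    }
    if (previous < cue.endTime && cue.endTime <= current) {
      crossed.push({ cue, kind: "exit" });
    }
  }
  return crossed;
}

// Samples currentTime on every scheduled tick and reports the boundaries
// crossed since the previous sample. Returns a function that stops polling.
function pollCues(
  media: TimeSource,
  cues: readonly CueLike[],
  schedule: (tick: () => void) => void,
  onBoundary: (boundary: CueBoundary) => void
): () => void {
  let last = media.currentTime;
  let stopped = false;
  const tick = (): void => {
    if (stopped) return;
    const now = media.currentTime;
    if (now > last) {
      for (const boundary of boundariesCrossed(cues, last, now)) {
        onBoundary(boundary);
      }
    }
    last = now; // a backwards seek simply resets the reference point
    schedule(tick);
  };
  schedule(tick);
  return () => { stopped = true; };
}
```

In a browser, `schedule` could be `(tick) => requestAnimationFrame(tick)` and `media` the video element. The loop runs on every animation frame for as long as polling is active, which is where the power cost mentioned above comes from.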

Proposals

Add a note indicating timing expectation

To clarify expectations for content authors, and to help implementation consistency, we would like to add a (non-normative) note before the time marches on steps, after the paragraph that begins “When the current playback position of a media element changes”:

To support use cases that depend on the timing accuracy of cue event firing, such as synchronizing captions with shot changes in a video, user agents should fire cue events as close as possible to their position on the media timeline, and ideally within 20 milliseconds.

Clarify wording relating to missed cues

The spec also says:

(These steps are thus run as often as possible or needed — if one iteration takes a long time, this can cause certain cues to be skipped over as the user agent rushes ahead to "catch up".)

My understanding of the time marches on steps is that they guarantee that the enter and exit events of cues that have been skipped between successive executions of time marches on will be fired (this is described in the steps involving missed cues), so the above only applies to activeCues during a cuechange event. So I would like to propose this change:

(These steps are thus run as often as possible or needed — if one iteration takes a long time, this can cause certain cues to be skipped over as the user agent rushes ahead to "catch up", and so will not appear in the activeCues list during a cuechange event handler.)
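
The distinction between current and missed cues can be sketched as follows. This is a simplified illustration of the selection rules in the time marches on steps, assuming normal forward playback between two runs; it ignores seeking, pause-on-exit and event ordering, and the names `CueTimes` and `classifyCues` are hypothetical.

```typescript
// Hypothetical minimal cue shape, with times in seconds.
interface CueTimes { id: string; startTime: number; endTime: number; }

// Splits cues into those active at `currentTime` and those that started
// and ended entirely between `lastTime` and `currentTime`.
function classifyCues(
  cues: readonly CueTimes[], lastTime: number, currentTime: number
): { current: CueTimes[]; missed: CueTimes[] } {
  // Current cues: start at or before the current position, end after it.
  const current = cues.filter(
    (c) => c.startTime <= currentTime && c.endTime > currentTime
  );
  // Missed cues: start at or after the last run and end at or before the
  // current position. Such a cue can never also be a current cue.
  const missed = cues.filter(
    (c) => c.startTime >= lastTime && c.endTime <= currentTime
  );
  return { current, missed };
}

// A short cue falls wholly between two runs at 0.9 s and 1.2 s.
const result = classifyCues(
  [
    { id: "short", startTime: 1.0, endTime: 1.1 },
    { id: "long", startTime: 1.05, endTime: 3.0 },
  ],
  0.9,
  1.2
);
console.log(result.current.map((c) => c.id), result.missed.map((c) => c.id));
```

Under these assumptions the short cue is classified as missed: its enter and exit events would still be fired, but it would not be among the active cues seen by a cuechange handler.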

I’d be happy to draft a pull request for review if there’s implementer interest.

Raising this issue following discussion in a breakout session at TPAC 2019.

cc @foolip @nigelmegitt

@foolip foolip added the topic: media, clarification, and editorial labels Feb 20, 2020
@foolip
Member

foolip commented Feb 20, 2020

This clarification sounds great to me. The spec as written requires a fairly detailed understanding of how the time marches on steps run more often than the "timeupdate" event, and the fact that it's not sensible to really implement it by running the steps as often as possible means a choice has to be made. Non-normative guidance on this would be useful, or we could even make it a should.

cc @whatwg/media @eric-carlson

@chrisn
Member Author

chrisn commented May 26, 2020

This was discussed on the W3C Media Working Group call, 12 May 2020, notes here.

chrisn added a commit to chrisn/html that referenced this issue Jul 1, 2020
Added text to set non-normative timing accuracy expectation for
text track cue events.

Resolves whatwg#5306
@domenic domenic closed this as completed in 22d409a Jul 9, 2020
mfreed7 pushed a commit to mfreed7/html that referenced this issue Sep 11, 2020
This sets a timing accuracy expectation for text track cue events.

Closes whatwg#5306. See also
https://www.w3.org/2020/05/12-mediawg-minutes.html#t05.