Add Support for Media-Encoded Events #189

Open
tinskip opened this issue Aug 22, 2017 · 15 comments
Labels: feature request, question, TPAC-2022-discussion
Milestone: Backlog

tinskip commented Aug 22, 2017

There are various types of events / cue points which may be encoded into media containers. Examples of these are MPEG-DASH 'emsg' boxes, and SCTE-35 'tones' in MPEG2-TS. Processing of media-embedded captions/subtitles might be another use case. Currently player applications have to parse media streams in order to be aware of these events, and retrieve the data encapsulated within them, which is inefficient at best. With the advent of MSE, the player app should not have to be aware of media container internals.

Adding support for these events in MSE would probably be best in the form of JavaScript events, perhaps with the ability to register handlers for specific event types. Ideally the data sent to the handler would be parsed into some kind of event-specific message, or perhaps a dictionary, but even receiving just the raw box data would be an improvement.
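For illustration, something like the following is what I have in mind. To be clear, this is a purely hypothetical sketch: neither an `emsg` event on `SourceBuffer` nor the payload shape shown here exists in MSE today.

```js
// Purely hypothetical sketch: the 'emsg' event on SourceBuffer and the payload
// fields shown here are imagined, not part of the MSE spec.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.640028"');

  // Imagined: the UA parses appended segments and fires one event per emsg box.
  sourceBuffer.addEventListener('emsg', (e) => {
    // Imagined payload: scheme/value identify the event stream, startTime and
    // endTime are on the media timeline, data is the raw message_data bytes.
    console.log(e.schemeIdUri, e.value, e.startTime, e.endTime, e.data);
  });

  fetch('segment0.m4s')
    .then((response) => response.arrayBuffer())
    .then((buffer) => sourceBuffer.appendBuffer(buffer));
});
```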

@dwsinger

If we do this, we should probably unify it with the existing support for handling text tracks with kind=metadata, as the two are similar mechanisms (text tracks are better for representing states that change; emsg and the like are better for events that happen).
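For reference, this is roughly how kind=metadata text tracks are consumed today with standard HTML APIs (a minimal sketch, no MSE involvement):

```js
// Minimal sketch of consuming an existing kind=metadata text track.
const video = document.querySelector('video');

video.textTracks.addEventListener('addtrack', (event) => {
  const track = event.track;
  if (track.kind !== 'metadata') return;

  // 'hidden' keeps cue timing active without rendering anything on screen.
  track.mode = 'hidden';

  track.addEventListener('cuechange', () => {
    for (let i = 0; i < track.activeCues.length; i++) {
      const cue = track.activeCues[i];
      console.log(cue.startTime, cue.endTime, cue);
    }
  });
});
```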

wilaw commented Aug 14, 2018

Voicing support for this feature request on behalf of the DASH Industry Forum, Akamai and the CTA WAVE project. WAVE specifically is interested in establishing a reliable in-band messaging workflow around EMSG with CMAF.

With MSEv1, JS players must parse incoming segments to look for embedded EMSG boxes. We would like a cleaner implementation in which the SourceBuffer performs all box parsing (since it is already parsing the incoming segments), freeing the JS application to manage only the logic of handling the events.
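For illustration, this is roughly the scanning each player has to implement today before calling appendBuffer(). A rough sketch only: it assumes emsg appears only as a top-level box and ignores 64-bit box sizes.

```js
// Rough sketch of the MSEv1 status quo: scan top-level ISO BMFF boxes in a
// segment for 'emsg' before handing the bytes to SourceBuffer.appendBuffer().
function findEmsgBoxes(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset); // box sizes are big-endian
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (size < 8 || offset + size > view.byteLength) break; // largesize / truncated boxes not handled
    if (type === 'emsg') {
      boxes.push(new Uint8Array(arrayBuffer, offset, size));
    }
    offset += size;
  }
  return boxes;
}

// Usage: scan for events, then append the segment as usual.
// const emsgBoxes = findEmsgBoxes(segmentBuffer);
// sourceBuffer.appendBuffer(segmentBuffer);
```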

Will Law

@nigelmegitt

This was also discussed at TPAC 2019 when I raised the idea of exposing subtitles/captions through MSE, and as I recall @mwatson2 was also enthusiastic.

chrisn commented Jul 26, 2020

The DataCue proposal (see explainer) in WICG intends to support the DASH emsg and SCTE-35 use cases, as an extension of the existing timed text track support in HTML.
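A rough sketch of what the app-facing side could look like under that proposal. Note that DataCue is not broadly implemented; the constructor arguments shown here follow the explainer's proposed shape rather than any shipped API, so treat them as assumptions.

```js
// Hedged sketch: surface a parsed emsg as a cue on a metadata text track.
// DataCue and its constructor shape here follow the WICG proposal and are
// assumptions, not a shipped API.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata');
track.mode = 'hidden';

const emsgBytes = new Uint8Array(0); // placeholder for message_data parsed elsewhere
const cue = new DataCue(
  /* startTime */ 30.0,
  /* endTime   */ 40.0,
  /* value     */ { data: emsgBytes },          // assumed payload shape
  /* type      */ 'urn:scte:scte35:2013:bin'    // example scheme_id_uri
);
track.addCue(cue);

track.addEventListener('cuechange', () => {
  for (let i = 0; i < track.activeCues.length; i++) {
    const c = track.activeCues[i];
    console.log(c.type, c.value);
  }
});
```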

@mwatson2 added the feature request and agenda labels on Sep 21, 2020
@mwatson2 modified the milestone: Backlog on Sep 21, 2020
@mwatson2

Is this feature request addressed by the DataCue proposal?

chrisn commented Sep 22, 2020

> Is this feature request addressed by the DataCue proposal?

In part, yes. DASH emsg is part of DataCue, but media-embedded captions/subtitles are not.

chrisn commented Sep 22, 2020

Exposing subtitles/captions through MSE is a proposed topic for the upcoming joint meeting on October 15 between Timed Text WG, Media & Entertainment IG, and Media WG (agenda).

JohnRiv commented Sep 22, 2020

This is still of interest to CTA WAVE for MSE v2.

@mwatson2 removed the agenda label on Sep 28, 2020
@wolenetz added this to the Backlog milestone on Sep 28, 2020
@wolenetz

We need further information and a concrete proposal for how to expose subtitles/captions through MSE (along with DataCue support for EMSG). Please assist the editors in this regard.

@wolenetz

In more detail, relative to these slides expected to be discussed at TPAC tomorrow, what is the precise mapping of the content in emsg to the content in the proposed DataCue?

In particular (and not limited to): what must a UA do to determine, interoperably, the timing of the resulting cue when encountering an emsg? Note that emsg is a top-level box, at least as presented in those slides, and in the CMAF-specific version there can be any number of emsg boxes associated with a CMAF chunk (which is proposed to map to an MSE media segment). If emsg processing is only supported in that scenario, the PTS delta could presumably be determined. However, since none of the preceding top-level CMAF chunk boxes are required at cardinality >= 1, one or more emsg boxes could be partially appended by the JS app before the UA's MSE implementation recognizes that it is "parsing a media segment" and restricts MSE operations like setting timestampOffset. It could therefore be nondeterministic which timestamp offset is applied to the emsg PTS delta when determining the start/end times of the generated DataCue.

irajs commented Oct 15, 2020

> In particular (and not limited to): what must a UA do to determine, interoperably, the timing of the resulting cue when encountering an emsg? Note that emsg is a top-level box, at least as presented in those slides, and in the CMAF-specific version there can be any number of emsg boxes associated with a CMAF chunk (which is proposed to map to an MSE media segment). If emsg processing is only supported in that scenario, the PTS delta could presumably be determined. However, since none of the preceding top-level CMAF chunk boxes are required at cardinality >= 1, one or more emsg boxes could be partially appended by the JS app before the UA's MSE implementation recognizes that it is "parsing a media segment" and restricts MSE operations like setting timestampOffset. It could therefore be nondeterministic which timestamp offset is applied to the emsg PTS delta when determining the start/end times of the generated DataCue.

The emsg has a start time value in its body. So while the parsing time of an emsg may vary, the earliest presentation time of the chunk carrying the emsg (for emsg v0) or the earliest time of the media presentation (for emsg v1) is known. That information should be available to the MSE implementation as it parses the chunk carrying the emsg, so the start and end times of the DataCue can be precisely calculated by the MSE implementation.
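A small sketch of the arithmetic being described; the field names follow the emsg box definition, and `earliestSegmentTime` / `timestampOffset` stand for values the MSE implementation already has while parsing. Exactly how the timestamp offset should be applied is the open question raised above.

```js
// Sketch of mapping an emsg box to cue start/end times (in seconds).
// 'emsg' is assumed to hold the already-parsed box fields.
function emsgToCueTimes(emsg, earliestSegmentTime, timestampOffset) {
  let start;
  if (emsg.version === 0) {
    // v0: presentation_time_delta is relative to the earliest presentation
    // time of the segment/chunk carrying the emsg.
    start = earliestSegmentTime + emsg.presentation_time_delta / emsg.timescale;
  } else {
    // v1: presentation_time is already on the media timeline.
    start = emsg.presentation_time / emsg.timescale;
  }
  // Apply the same offset the SourceBuffer applies to the media samples so the
  // event stays aligned with them; which offset value that is, is the
  // nondeterminism discussed above.
  start += timestampOffset;
  const end = start + emsg.event_duration / emsg.timescale;
  return { start, end };
}
```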

@technogeek00

@wolenetz Following up from the TPAC conversation with respect to emsg timing relations and MSE API functionality:

  • In both the V0 (time relative to segment/chunk) and V1 (fixed to the presentation timeline) cases, I concur that the application-layer adjustment of the timestamp offset will need to be properly taken into account to ensure events are surfaced at the correct media-relative times.
  • Since emsg can be placed arbitrarily from a generic ISO BMFF profile perspective, this may be a case where stricter UA processing guidelines for CMAF profiles of ISO BMFF could be used to guarantee consistent behavior; i.e. if an appended init segment is recognized by the UA as carrying CMAF-conforming signals (see the sketch after this list), the UA could then enforce delivery guarantees based on well-formed structures. Alternatively, the behavior of non-conforming streams could simply be declared unspecified; I'm not sure which approach is typically taken.
  • Assuming we can signal restricted processing to the UA as part of MSE initialization, the UA can then depend on the emsg timing being related to the timing of the next appended moof.
  • If segments were appended such that the time the emsg maps to is spliced out, I concur that the emsg should be removed as part of that splice and not surfaced; expecting anything else is too inconsistent.
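As a sketch of the CMAF-signal detection mentioned in the second bullet: check the ftyp box of an appended init segment for a CMAF brand such as 'cmfc'. Illustrative only; it assumes ftyp is the first top-level box and ignores 64-bit box sizes.

```js
// Illustrative sketch: treat an init segment as CMAF-conforming if its ftyp
// box lists a CMAF brand ('cmfc' or 'cmf2') as the major or a compatible brand.
function initSegmentSignalsCmaf(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.byteLength < 16) return false;

  const boxSize = view.getUint32(0);
  const boxType = fourcc(view, 4);
  if (boxType !== 'ftyp') return false;

  // ftyp payload: major_brand (4), minor_version (4), then compatible_brands.
  const end = Math.min(boxSize, view.byteLength);
  const brands = [fourcc(view, 8)];
  for (let offset = 16; offset + 4 <= end; offset += 4) {
    brands.push(fourcc(view, offset));
  }
  return brands.includes('cmfc') || brands.includes('cmf2');
}

function fourcc(view, offset) {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3));
}
```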

@dwsinger

FYI, MPEG is (still) working on a spec. to put events into tracks, with media-related timing, etc.

@wolenetz added the TPAC-2022-discussion label on Sep 16, 2022
chrisn commented Sep 30, 2022

Discussion at TPAC 2022 (minutes):

  • Even the in-band text track support has been removed from the Chromium MSE implementation; implementing this would be non-trivial
  • Pull requests against the MSE or bytestream format specs would be helpful

chrisn commented Sep 30, 2022

This topic has been discussed in the DASH-IF Events Task Force. From the 30 Sep 2022 meeting: more implementation experience using player libraries such as DASH.js is needed before being ready to pursue MSE integration.
