Add Support for Media-Encoded Events #189

Open
tinskip opened this issue Aug 22, 2017 · 15 comments
Labels: feature request, question, TPAC-2022-discussion
Milestone: Backlog

tinskip commented Aug 22, 2017

There are various types of events / cue points which may be encoded into media containers. Examples of these are MPEG-DASH 'emsg' boxes, and SCTE-35 'tones' in MPEG2-TS. Processing of media-embedded captions/subtitles might be another use case. Currently player applications have to parse media streams in order to be aware of these events, and retrieve the data encapsulated within them, which is inefficient at best. With the advent of MSE, the player app should not have to be aware of media container internals.

Adding support for these events in MSE would probably be best in the form of JavaScript events, perhaps with the ability to register handlers for specific event types. Ideally the data sent to the handler would be parsed into some kind of event-specific message, or perhaps a dictionary, but even receiving just the raw box data would be an improvement.
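For illustration, something like the following is what I have in mind. To be clear, this is a purely hypothetical sketch: neither an `emsg` event on `SourceBuffer` nor the payload shape shown here exists in MSE today.

```js
// Purely hypothetical sketch: the 'emsg' event on SourceBuffer and the payload
// fields shown here are imagined, not part of the MSE spec.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.640028"');

  // Imagined: the UA parses appended segments and fires one event per emsg box.
  sourceBuffer.addEventListener('emsg', (e) => {
    // Imagined payload: scheme/value identify the event stream, startTime and
    // endTime are on the media timeline, data is the raw message_data bytes.
    console.log(e.schemeIdUri, e.value, e.startTime, e.endTime, e.data);
  });

  fetch('segment0.m4s')
    .then((response) => response.arrayBuffer())
    .then((buffer) => sourceBuffer.appendBuffer(buffer));
});
```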

@dwsinger

If we do this, we should probably unify it with the existing support for handling text tracks with kind=metadata, as the two are similar mechanisms (text tracks are better for representing states that change; emsg and the like are better for events that happen).
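For reference, this is roughly how kind=metadata text tracks are consumed today with standard HTML APIs (a minimal sketch, no MSE involvement):

```js
// Minimal sketch of consuming an existing kind=metadata text track.
const video = document.querySelector('video');

video.textTracks.addEventListener('addtrack', (event) => {
  const track = event.track;
  if (track.kind !== 'metadata') return;

  // 'hidden' keeps cue timing active without rendering anything on screen.
  track.mode = 'hidden';

  track.addEventListener('cuechange', () => {
    for (let i = 0; i < track.activeCues.length; i++) {
      const cue = track.activeCues[i];
      console.log(cue.startTime, cue.endTime, cue);
    }
  });
});
```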

wilaw commented Aug 14, 2018

Voicing support for this feature request on behalf of the DASH Industry Forum, Akamai and the CTA WAVE project. WAVE specifically is interested in establishing a reliable in-band messaging workflow around EMSG with CMAF.

With MSEv1, JS players must parse incoming segments to look for embedded EMSG boxes. We would like a cleaner implementation in which the SourceBuffer performs all box parsing (since it is already parsing the incoming segments), freeing the JS application to manage only the logic of handling the events.
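For illustration, this is roughly the scanning each player has to implement today before calling appendBuffer(). A rough sketch only: it assumes emsg appears only as a top-level box and ignores 64-bit box sizes.

```js
// Rough sketch of the MSEv1 status quo: scan top-level ISO BMFF boxes in a
// segment for 'emsg' before handing the bytes to SourceBuffer.appendBuffer().
function findEmsgBoxes(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset); // box sizes are big-endian
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (size < 8 || offset + size > view.byteLength) break; // largesize / truncated boxes not handled
    if (type === 'emsg') {
      boxes.push(new Uint8Array(arrayBuffer, offset, size));
    }
    offset += size;
  }
  return boxes;
}

// Usage: scan for events, then append the segment as usual.
// const emsgBoxes = findEmsgBoxes(segmentBuffer);
// sourceBuffer.appendBuffer(segmentBuffer);
```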

Will Law

@nigelmegitt

This was also discussed at TPAC 2019 when I raised the idea of exposing subtitles/captions through MSE, and as I recall @mwatson2 was also enthusiastic.

chrisn commented Jul 26, 2020

The DataCue proposal (see explainer) in WICG intends to support the DASH emsg and SCTE-35 use cases, as an extension of the existing timed text track support in HTML.
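A rough sketch of what the app-facing side could look like under that proposal. Note that DataCue is not broadly implemented; the constructor arguments shown here follow the explainer's proposed shape rather than any shipped API, so treat them as assumptions.

```js
// Hedged sketch: surface a parsed emsg as a cue on a metadata text track.
// DataCue and its constructor shape here follow the WICG proposal and are
// assumptions, not a shipped API.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata');
track.mode = 'hidden';

const emsgBytes = new Uint8Array(0); // placeholder for message_data parsed elsewhere
const cue = new DataCue(
  /* startTime */ 30.0,
  /* endTime   */ 40.0,
  /* value     */ { data: emsgBytes },          // assumed payload shape
  /* type      */ 'urn:scte:scte35:2013:bin'    // example scheme_id_uri
);
track.addCue(cue);

track.addEventListener('cuechange', () => {
  for (let i = 0; i < track.activeCues.length; i++) {
    const c = track.activeCues[i];
    console.log(c.type, c.value);
  }
});
```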

@mwatson2 added the feature request and agenda labels on Sep 21, 2020
@mwatson2 modified the milestone: Backlog on Sep 21, 2020
@mwatson2

Is this feature request addressed by the DataCue proposal?

chrisn commented Sep 22, 2020

> Is this feature request addressed by the DataCue proposal?

In part, yes. DASH emsg is part of DataCue, but media-embedded captions/subtitles are not.

chrisn commented Sep 22, 2020

Exposing subtitles/captions through MSE is a proposed topic for the upcoming joint meeting on October 15 between Timed Text WG, Media & Entertainment IG, and Media WG (agenda).

JohnRiv commented Sep 22, 2020

This is still of interest to CTA WAVE for MSE v2.

@mwatson2 removed the agenda label on Sep 28, 2020
@wolenetz added this to the Backlog milestone on Sep 28, 2020
@wolenetz

We need further information and a concrete proposal for how to expose subtitles/captions through MSE (along with DataCue support for EMSG). Please assist the editors in this regard.

@wolenetz

In more detail, relative to these slides expected to be discussed at TPAC tomorrow, what is the precise mapping of the content in emsg to the content in the proposed DataCue?

In particular (and not limited to): what must a UA do to determine, interoperably, the timing of the resulting cue when encountering an emsg? Note that emsg is a top-level box, at least as presented in those slides, and in the CMAF-specific version there can be any number of emsg boxes associated with a CMAF chunk (which is proposed to map to an MSE media segment). If emsg processing is only supported in that scenario, the PTS delta could presumably be determined. However, since none of the preceding top-level CMAF chunk boxes are required at cardinality >= 1, one or more emsg boxes could be partially appended by the JS app before the UA's MSE implementation recognizes that it is "parsing a media segment" and restricts MSE operations like setting timestampOffset. It could therefore be nondeterministic which timestamp offset is applied to the emsg PTS delta when determining the start/end times of the generated DataCue.

irajs commented Oct 15, 2020

> In particular (and not limited to): what must a UA do to determine, interoperably, the timing of the resulting cue when encountering an emsg? Note that emsg is a top-level box, at least as presented in those slides, and in the CMAF-specific version there can be any number of emsg boxes associated with a CMAF chunk (which is proposed to map to an MSE media segment). If emsg processing is only supported in that scenario, the PTS delta could presumably be determined. However, since none of the preceding top-level CMAF chunk boxes are required at cardinality >= 1, one or more emsg boxes could be partially appended by the JS app before the UA's MSE implementation recognizes that it is "parsing a media segment" and restricts MSE operations like setting timestampOffset. It could therefore be nondeterministic which timestamp offset is applied to the emsg PTS delta when determining the start/end times of the generated DataCue.

The emsg has a start time value in its body. So while the parsing time of an emsg may vary, the earliest presentation time of the chunk carrying the emsg (for emsg v0) or the earliest time of the media presentation (for emsg v1) is known. That information should be available to the MSE implementation as it parses the chunk carrying the emsg, so the start and end times of the DataCue can be precisely calculated by the MSE implementation.
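A small sketch of the arithmetic being described; the field names follow the emsg box definition, and `earliestSegmentTime` / `timestampOffset` stand for values the MSE implementation already has while parsing. Exactly how the timestamp offset should be applied is the open question raised above.

```js
// Sketch of mapping an emsg box to cue start/end times (in seconds).
// 'emsg' is assumed to hold the already-parsed box fields.
function emsgToCueTimes(emsg, earliestSegmentTime, timestampOffset) {
  let start;
  if (emsg.version === 0) {
    // v0: presentation_time_delta is relative to the earliest presentation
    // time of the segment/chunk carrying the emsg.
    start = earliestSegmentTime + emsg.presentation_time_delta / emsg.timescale;
  } else {
    // v1: presentation_time is already on the media timeline.
    start = emsg.presentation_time / emsg.timescale;
  }
  // Apply the same offset the SourceBuffer applies to the media samples so the
  // event stays aligned with them; which offset value that is, is the
  // nondeterminism discussed above.
  start += timestampOffset;
  const end = start + emsg.event_duration / emsg.timescale;
  return { start, end };
}
```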

@technogeek00

@wolenetz Following up from the TPAC conversation with respect to emsg timing relations and MSE API functionality:

  • In both the V0 (time relative to segment/chunk) and V1 (fixed to the presentation timeline) cases, I concur that the application-layer adjustment of the timestamp offset will need to be properly taken into account to ensure events are surfaced at the correct media-relative times.
  • Since emsg can be placed arbitrarily from a generic ISO BMFF profile perspective, this may be a case where stricter UA processing guidelines for CMAF profiles of ISO BMFF could be used to guarantee consistent behavior; i.e. if an appended init segment is recognized by the UA as carrying CMAF-conforming signals (see the sketch after this list), the UA could then enforce delivery guarantees based on well-formed structures. Alternatively, the behavior of non-conforming streams could simply be declared unspecified; I'm not sure which approach is typically taken.
  • Assuming we can signal restricted processing to the UA as part of MSE initialization, the UA can then depend on the emsg timing being related to the timing of the next appended moof.
  • If segments were appended such that the time the emsg maps to is spliced out, I concur that the emsg should be removed as part of that splice and not surfaced; expecting anything else is too inconsistent.
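As a sketch of the CMAF-signal detection mentioned in the second bullet: check the ftyp box of an appended init segment for a CMAF brand such as 'cmfc'. Illustrative only; it assumes ftyp is the first top-level box and ignores 64-bit box sizes.

```js
// Illustrative sketch: treat an init segment as CMAF-conforming if its ftyp
// box lists a CMAF brand ('cmfc' or 'cmf2') as the major or a compatible brand.
function initSegmentSignalsCmaf(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.byteLength < 16) return false;

  const boxSize = view.getUint32(0);
  const boxType = fourcc(view, 4);
  if (boxType !== 'ftyp') return false;

  // ftyp payload: major_brand (4), minor_version (4), then compatible_brands.
  const end = Math.min(boxSize, view.byteLength);
  const brands = [fourcc(view, 8)];
  for (let offset = 16; offset + 4 <= end; offset += 4) {
    brands.push(fourcc(view, offset));
  }
  return brands.includes('cmfc') || brands.includes('cmf2');
}

function fourcc(view, offset) {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3));
}
```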

@dwsinger

FYI, MPEG is (still) working on a spec. to put events into tracks, with media-related timing, etc.

@wolenetz added the TPAC-2022-discussion label on Sep 16, 2022
chrisn commented Sep 30, 2022

Discussion at TPAC 2022 (minutes):

  • Even the in-band text track support has been removed from the Chromium MSE implementation; implementing this would be non-trivial
  • Pull requests against the MSE or bytestream format specs would be helpful

chrisn commented Sep 30, 2022

This topic has been discussed in the DASH-IF Events Task Force. From the 30 Sep 2022 meeting: more implementation experience using player libraries such as DASH.js is needed before being ready to pursue MSE integration.
