Clarify how track buffer ranges are updated. #15

Closed

wolenetz opened this issue Oct 13, 2015 · 14 comments

@wolenetz
Member

Migrated from w3c bugzilla tracker. For history prior to migration, please see:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242

@wolenetz wolenetz self-assigned this Oct 13, 2015
@wolenetz wolenetz added this to the V1 milestone Oct 15, 2015
@jdsmith3000
Contributor

We've looked at this some more, and think we might want to reconsider putting this on V.Next. The issue looks like an edge case that would not be encountered in normal streaming media, as in the long frame display example. It seems like a fit on V1 only if a clear change can be defined for it soon.

@wolenetz
Member Author

There were multiple issues identified in the previous w3c bug that still need to be broken out into distinct spec issues (which I'll do as part of follow-up). See especially acolwell's response in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242#c5.

With respect to the original issue (which demonstrated a lack of spec clarity around what buffered should show for IPBBB.. when not all of the B frames have been appended yet), I believe much of the direction was provided in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242#c2. @jdsmith3000, which example were you referring to with "the long frame display example"?

@wolenetz
Member Author

Action on me: split this into multiple issues (one covering the original issue, with the direction of fix outlined in bugzilla's comment 2, and several covering the items outlined in bugzilla's comment 5).

@dmlap

dmlap commented Mar 2, 2016

We've run into a scenario that I don't think is completely captured in the bugzilla issue but seems very closely related, so I'll record it here.

The current extensions to HTMLMediaElement.buffered specify that the buffered attribute returns the intersection of the buffered TimeRanges of all active SourceBuffers when a MediaSource is being used. In some cases, this doesn't align with the goal behavior described in the bugzilla issue:

I think the principle should be that if the playback would stall at some point, then this would be indicated as a gap in the buffer ranges, but if playback would continue then there should be no gap.

Given that the sampling rates for audio and video are different, it's often the case that audio starts a short while after the video, or vice versa. In those cases, the user agent reports a non-zero starting time for the first buffered time range at the media element level, even though that gap would be happily ignored for the purposes of playback. This can cause confusion for application code that attempts to be smart and backfill gaps that appear in the buffer.
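The intersection behavior described above can be sketched with plain arrays. This is a minimal illustration, not the spec's actual algorithm (which also handles the "ended" state and highest end time); the function names and `[start, end]` pair representation are invented here:

```javascript
// Minimal sketch of the HTMLMediaElement.buffered extension: the media
// element's buffered ranges are the intersection of the buffered ranges
// of all active SourceBuffers. Ranges are [start, end] pairs in seconds.
function intersectRanges(a, b) {
  const out = [];
  for (const [aStart, aEnd] of a) {
    for (const [bStart, bEnd] of b) {
      const start = Math.max(aStart, bStart);
      const end = Math.min(aEnd, bEnd);
      if (start < end) out.push([start, end]);
    }
  }
  return out;
}

function mediaElementBuffered(perSourceBufferRanges) {
  return perSourceBufferRanges.reduce(intersectRanges);
}

// Video buffered from 0s, audio only from 0.1s: the reported range starts
// at 0.1s even though playback would begin without stalling.
const videoRanges = [[0, 10]];
const audioRanges = [[0.1, 10]];
console.log(mediaElementBuffered([videoRanges, audioRanges])); // [[0.1, 10]]
```

An app that tries to "backfill" the `[0, 0.1]` hole it sees here is chasing a gap that the user agent would never stall on.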

@wolenetz
Member Author

I believe at least some minimum clarification is needed for how a jagged-starting stream's (audio and video tracks starting slightly offset from each other) buffered TimeRanges are calculated, for each of two cases:

  1. muxed AV tracks: the coded frame group start time should be used as the start time for the muxed AV tracks' buffered ranges
  2. A and V in distinct SourceBuffers: In practice, these commonly don't both start at precisely PTS 0, rather there is some jaggedness. At least a non-normative note needs to be added to describe acceptable implementation choices for how much of the "jaggedness" is allowed while retaining the buffered range starting at 0. FWIW, in Chrome, we allow 1 second maximum.

I believe addressing these specific concerns is appropriately scoped to V1, and does not require breaking this out into multiple issues (in V1 time-frame).
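Case 2 above can be sketched as a simple threshold check. The constant and function names here are hypothetical, and the fixed-tolerance approach is only one possible implementation choice; the 1-second figure matches the Chrome behavior quoted above:

```javascript
// Hypothetical illustration of case 2: if every track's first buffered
// frame starts within some implementation-defined tolerance of time 0,
// report the combined buffered range as starting at 0; otherwise report
// the actual (jagged) start.
const JAGGED_START_TOLERANCE = 1.0; // seconds; Chrome reportedly allows 1s

function reportedRangeStart(trackStartTimes, tolerance = JAGGED_START_TOLERANCE) {
  const latestStart = Math.max(...trackStartTimes);
  return latestStart <= tolerance ? 0 : latestStart;
}

reportedRangeStart([0, 0.083]); // → 0   (jaggedness within tolerance)
reportedRangeStart([0, 2.5]);   // → 2.5 (gap too large to paper over)
```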

@wolenetz
Member Author

For clarity, I think this falls into V1 per step 6 of the triage process currently.

@wolenetz
Member Author

#15 (comment) tracks the approach planned for solving this issue. In addition, should we consider formalizing what to do in the following case?
3) muxed AV tracks overlapping the end of a previously appended set of muxed AV frames: should the new coded frame group start time be used as the overlap-removal start time in the coded frame processing algorithm step 14's first case ("If highest end timestamp for track buffer is not set") instead of presentation timestamp?
Note that we currently have no 'segments' appendMode memory of "coded frame group start time", which is necessary for fixing 1) and 3). I think we might be able to re-use "group start time", so long as we also update that variable appropriately in 'segments' appendMode.
@jdsmith3000 Before I prepare a PR for these, what do you think? 3) is less important to clarify, I think (though it's how Chrome manages this), since it's quite an edge case. 1) and especially 2) are more commonly encountered in media, so clarifying their behavior in the spec will help interoperability.
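The choice in 3) can be sketched roughly as follows. All names are illustrative rather than spec text, and `highestEndTimestamp === undefined` stands in for step 14's "highest end timestamp for track buffer is not set" case:

```javascript
// Rough sketch of the overlap-removal choice in 3). Under the current
// spec text, the first case of coded frame processing step 14 removes
// frames starting at the new frame's presentation timestamp; the
// proposal would start at the coded frame group start time instead, so
// all tracks of a muxed append are cleared from a consistent point.
function overlapRemovalStart(trackBuffer, framePts, groupStartTime, useProposal) {
  if (trackBuffer.highestEndTimestamp === undefined) {
    return useProposal ? groupStartTime : framePts;
  }
  // Other step 14 cases (highest end timestamp set) are unchanged and
  // omitted from this sketch.
  return trackBuffer.highestEndTimestamp;
}

overlapRemovalStart({}, 2.04, 2.0, true);  // → 2.0  (proposed behavior)
overlapRemovalStart({}, 2.04, 2.0, false); // → 2.04 (current spec text)
```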

@wolenetz
Member Author

@jdsmith3000 friendly ping :)

@jdsmith3000
Contributor

I need to confirm details of this with our Media team. I've discussed it with them and they've committed to a response by Friday. I'll comment again then.

@wolenetz
Member Author

Thanks, @jdsmith3000. I look forward to a response Friday so we can get this fixed ASAP.

@jdsmith3000
Contributor

It appears these issues arise because MSE cannot describe buffered ranges accurately for a sourceBuffer with muxed AV content: there is a single buffered range for the sourceBuffer even though it may contain multiple tracks. In practice, if apps are using the buffered range as an indicator of fullness to control buffering logic, this is not a problem. However, if apps are trying to perform frame-accurate edits, issues like these will occur. That's what's behind my previous comment that these issues do not seem like V1 topics.

On the three issues you highlight:

  1. This sounds like you mean “the earliest coded frame group start time across all tracks in the sourceBuffer”. Barring a larger MSE change to describe separate buffered ranges for each track within a sourceBuffer, this seems a sensible clarification and is simple to make. It should consider not just the start time but also the end of the range, which may not line up precisely across the tracks either. For muxed AV content in a single sourceBuffer, there are multiple track buffers, acknowledged in step 5: "Let track buffer equal the track buffer that the coded frame will be added to."
  2. This might be resolved with a non-normative clarification about implementations choosing to allow playback when there is a “jagged” start. We don't recommend making the buffered range start at 0 in that case as the app should know the actual buffered range. In our implementation we report the actual buffered range, but we do allow playback to start (without pausing to buffer) if there is up to a 1 second gap.
  3. It would make sense to use separate coded frame group start times from each of the tracks, perform the removal on that basis, and then re-determine the overall buffered range. Otherwise, if you pick a single timestamp to remove across all tracks, and then append the new data, you may end up with a gap in one of the tracks. That seems like a bigger change than we want at this stage. Since this is an edge case, it seems like one we should postpone.
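The gap risk described in 3) can be illustrated with a small check. This is an invented sketch, not spec text: it models removing frames at a single cut timestamp across all tracks and then appending new data whose per-track start times differ slightly, as audio and video starts usually do:

```javascript
// Illustration of point 3: audio and video track buffers rarely align on
// the same timestamps, so cutting every track at one removal time and
// then appending new data can leave a hole in one track. A gap appears
// when the kept data ends before that track's new data begins.
function gapAfterOverwrite(oldRange, removalStart, newTrackStart) {
  const keptEnd = Math.min(oldRange[1], removalStart);
  return newTrackStart > keptEnd;
}

// Video: old data to 10.0s, cut at 8.0s, new video starts at 8.0s → no gap.
gapAfterOverwrite([0, 10.0], 8.0, 8.0);   // → false
// Audio: old data to 10.0s, cut at 8.0s, new audio starts at 8.021s → gap.
gapAfterOverwrite([0, 10.0], 8.0, 8.021); // → true
```

Using each track's own coded frame group start time as its removal point, as suggested above, keeps `removalStart` equal to `newTrackStart` per track and avoids the hole.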

Also, since “sequence” mode is mentioned, here are some thoughts on it. “Sequence” append mode doesn't work well for muxed AV content in a single sourceBuffer due to the same issues described above, and the fact that there is only one timestampOffset that would get applied to all tracks even though appending a single MEDIA segment will result in slightly different durations appended for each track within the sourceBuffer. Audio and video durations just do not align precisely.

That shouldn't matter in practice. “Sequence” append mode should ideally not be used for muxed AV content, and should only be used for appending elementary audio streams. We might want to consider limiting “sequence” append mode to audio elementary stream formats.

@paulbrucecotton

@wolenetz: Jerry has provided his input here. What are the next steps?

@wolenetz
Member Author

wolenetz commented May 2, 2016

@paulbrucecotton I'll produce a PR shortly. I think @jdsmith3000's response to my #15 (comment) is pretty much what I was thinking.

Regarding the 'sequence' appendMode clarifications: those are out of scope of this issue, and I'm not sure they need clarification in the MSE spec. Note that 'sequence' appendMode relies on the same discontinuity detection logic as 'segments' appendMode, though 'sequence' uses it to adjust timestampOffset. In 'sequence' appendMode there could certainly be problems with muxed AV, especially audio, where it might not play "continuously" and align with the resequenced timestamps, since the discontinuity detection logic allows for some small gap. Likewise, I've seen 'sequence' appendMode most applicable to single-stream SourceBuffers, especially audio, for things like gapless playback. In fact, the audio/aac and audio/mpeg MSE bytestreams require 'sequence' mode behavior. @jdsmith3000, perhaps clarifying our thoughts/warnings about the probable playback experience for muxed AV in 'sequence' appendMode with a non-normative note might qualify as a distinct V1NonBlocking spec issue: please file one if you agree.

@guest271314

@jdsmith3000

Also, since “sequence” mode is mentioned, here are some thoughts on it. “Sequence” append mode doesn't work well for muxed AV content in a single sourceBuffer due to the same issues described above, and the fact that there is only one timestampOffset that would get applied to all tracks even though appending a single MEDIA segment will result in slightly different durations appended for each track within the sourceBuffer. Audio and video durations just do not align precisely.

That shouldn't matter in practice. “Sequence” append mode should ideally not be used for muxed AV content, and should only be used for appending elementary audio streams. We might want to consider limiting “sequence” append mode to audio elementary stream formats.

I have been able to use "sequence" mode to play multiple media tracks in sequence in both Firefox and Chromium. Using "segments" mode with the same code does not render the expected result in Chromium; only one second of media playback is rendered.
