Clarify how track buffer ranges are updated. #15

Closed

wolenetz opened this issue Oct 13, 2015 · 14 comments

@wolenetz
Member

Migrated from w3c bugzilla tracker. For history prior to migration, please see:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242

@wolenetz wolenetz self-assigned this Oct 13, 2015
@wolenetz wolenetz added this to the V1 milestone Oct 15, 2015
@jdsmith3000
Contributor

We've looked at this some more, and think we might want to reconsider putting this on V.Next. The issue looks like an edge case that would not be encountered in normal streaming media, as in the long frame display example. It seems like a fit on V1 only if a clear change can be defined for it soon.

@wolenetz
Member Author

There were multiple issues identified in the previous w3c bug that still need to be broken out into distinct spec issues (which I'll do as part of follow-up). See especially acolwell's response in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242#c5.

With respect to the original issue (which demonstrated a lack of spec clarity around what buffered should show for IPBBB.. when not all of the B frames have been appended yet), I believe much of the direction was provided in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27242#c2. @jdsmith3000, which example were you referring to with "the long frame display example"?

@wolenetz
Member Author

Action on me: split this into multiple issues (one covering the original issue, with the direction of fix outlined in bugzilla's comment 2, and several covering the items outlined in bugzilla's comment 5).

@dmlap

dmlap commented Mar 2, 2016

We've run into a scenario that I don't think is completely captured in the bugzilla issue but seems very closely related, so I'll record it here.

The current extensions to HTMLMediaElement.buffered specify that the buffered attribute returns the intersection of the buffered TimeRanges of all active SourceBuffers when a MediaSource is being used. In some cases, this doesn't align with the goal behavior described in the bugzilla issue:

I think the principle should be that if the playback would stall at some point, then this would be indicated as a gap in the buffer ranges, but if playback would continue then there should be no gap.

Given that the sampling rates for audio and video are different, it's often the case that audio starts a short while after the video, or vice versa. In those cases, the user agent reports a non-zero starting time for the first buffered time range at the media element level, even though that gap would be happily ignored for the purposes of playback. This can cause confusion for application code that attempts to be smart and backfill gaps that appear in the buffer.
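The intersection behavior described above can be sketched with plain arrays. This is a minimal illustration, not the spec's actual algorithm (which also handles the "ended" state and highest end time); the function names and `[start, end]` pair representation are invented here:

```javascript
// Minimal sketch of the HTMLMediaElement.buffered extension: the media
// element's buffered ranges are the intersection of the buffered ranges
// of all active SourceBuffers. Ranges are [start, end] pairs in seconds.
function intersectRanges(a, b) {
  const out = [];
  for (const [aStart, aEnd] of a) {
    for (const [bStart, bEnd] of b) {
      const start = Math.max(aStart, bStart);
      const end = Math.min(aEnd, bEnd);
      if (start < end) out.push([start, end]);
    }
  }
  return out;
}

function mediaElementBuffered(perSourceBufferRanges) {
  return perSourceBufferRanges.reduce(intersectRanges);
}

// Video buffered from 0s, audio only from 0.1s: the reported range starts
// at 0.1s even though playback would begin without stalling.
const videoRanges = [[0, 10]];
const audioRanges = [[0.1, 10]];
console.log(mediaElementBuffered([videoRanges, audioRanges])); // [[0.1, 10]]
```

An app that tries to "backfill" the `[0, 0.1]` hole it sees here is chasing a gap that the user agent would never stall on.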

@wolenetz
Member Author

I believe at least some minimum clarification is needed for how a jagged-starting stream's (audio and video tracks starting slightly offset from each other) buffered TimeRanges are calculated, for each of two cases:

  1. muxed AV tracks: the coded frame group start time should be used as the start time for the muxed AV tracks' buffered ranges
  2. A and V in distinct SourceBuffers: In practice, these commonly don't both start at precisely PTS 0, rather there is some jaggedness. At least a non-normative note needs to be added to describe acceptable implementation choices for how much of the "jaggedness" is allowed while retaining the buffered range starting at 0. FWIW, in Chrome, we allow 1 second maximum.

I believe addressing these specific concerns is appropriately scoped to V1, and does not require breaking this out into multiple issues (in V1 time-frame).
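Case 2 above can be sketched as a simple threshold check. The constant and function names here are hypothetical, and the fixed-tolerance approach is only one possible implementation choice; the 1-second figure matches the Chrome behavior quoted above:

```javascript
// Hypothetical illustration of case 2: if every track's first buffered
// frame starts within some implementation-defined tolerance of time 0,
// report the combined buffered range as starting at 0; otherwise report
// the actual (jagged) start.
const JAGGED_START_TOLERANCE = 1.0; // seconds; Chrome reportedly allows 1s

function reportedRangeStart(trackStartTimes, tolerance = JAGGED_START_TOLERANCE) {
  const latestStart = Math.max(...trackStartTimes);
  return latestStart <= tolerance ? 0 : latestStart;
}

reportedRangeStart([0, 0.083]); // → 0   (jaggedness within tolerance)
reportedRangeStart([0, 2.5]);   // → 2.5 (gap too large to paper over)
```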

@wolenetz
Member Author

For clarity, I think this falls into V1 per step 6 of the triage process currently.

@wolenetz
Member Author

#15 (comment) tracks the approach planned for solving this issue. In addition, should we consider formalizing what to do in the following case?
3) muxed AV tracks overlapping the end of a previously appended set of muxed AV frames: should the new coded frame group start time be used as the overlap-removal start time in the coded frame processing algorithm step 14's first case ("If highest end timestamp for track buffer is not set") instead of presentation timestamp?
Note that we currently have no 'segments' appendMode memory of "coded frame group start time", which is necessary for fixing 1) and 3). I think we might be able to re-use "group start time", so long as we also update that variable appropriately in 'segments' appendMode.
@jdsmith3000 Before I prepare a PR for these, what do you think? 3) is less important to clarify, I think (though it's how Chrome manages this), since it's quite an edge case. 1) and especially 2) are more commonly encountered in media, so clarifying their behavior in the spec will help interoperability.
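The choice in 3) can be sketched roughly as follows. All names are illustrative rather than spec text, and `highestEndTimestamp === undefined` stands in for step 14's "highest end timestamp for track buffer is not set" case:

```javascript
// Rough sketch of the overlap-removal choice in 3). Under the current
// spec text, the first case of coded frame processing step 14 removes
// frames starting at the new frame's presentation timestamp; the
// proposal would start at the coded frame group start time instead, so
// all tracks of a muxed append are cleared from a consistent point.
function overlapRemovalStart(trackBuffer, framePts, groupStartTime, useProposal) {
  if (trackBuffer.highestEndTimestamp === undefined) {
    return useProposal ? groupStartTime : framePts;
  }
  // Other step 14 cases (highest end timestamp set) are unchanged and
  // omitted from this sketch.
  return trackBuffer.highestEndTimestamp;
}

overlapRemovalStart({}, 2.04, 2.0, true);  // → 2.0  (proposed behavior)
overlapRemovalStart({}, 2.04, 2.0, false); // → 2.04 (current spec text)
```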

@wolenetz
Member Author

@jdsmith3000 friendly ping :)

@jdsmith3000
Contributor

I need to confirm details of this with our Media team. I've discussed it with them and they've committed to a response by Friday. I'll comment again then.

@wolenetz
Member Author

Thanks, @jdsmith3000. I look forward to a response Friday so we can get this fixed ASAP.

@jdsmith3000
Contributor

It appears these issues arise because MSE cannot describe buffered ranges accurately for a sourceBuffer with muxed AV content: there is a single buffered range for the sourceBuffer even though it may contain multiple tracks. In practice, if apps are using the buffered range as an indicator of fullness to control buffering logic, this is not a problem. However, if apps are trying to perform frame-accurate edits, issues like these will occur. That's what's behind my previous comment that these issues do not seem like V1 topics.

On the three issues you highlight:

  1. This sounds like you mean “the earliest coded frame group start time across all tracks in the sourceBuffer”. Barring a larger MSE change to describe separate buffered ranges for each track within a sourceBuffer, this seems a sensible clarification and is simple to make. It should consider not just the start time but also the end of the range, which may not line up precisely across the tracks either. For muxed AV content in a single sourceBuffer, there are multiple track buffers, acknowledged in step 5: "Let track buffer equal the track buffer that the coded frame will be added to."
  2. This might be resolved with a non-normative clarification about implementations choosing to allow playback when there is a “jagged” start. We don't recommend making the buffered range start at 0 in that case as the app should know the actual buffered range. In our implementation we report the actual buffered range, but we do allow playback to start (without pausing to buffer) if there is up to a 1 second gap.
  3. It would make sense to use separate coded frame group start times from each of the tracks, perform the removal on that basis, and then re-determine the overall buffered range. Otherwise, if you pick a single timestamp to remove across all tracks, and then append the new data, you may end up with a gap in one of the tracks. That seems like a bigger change than we want at this stage. Since this is an edge case, it seems like one we should postpone.
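The gap risk described in 3) can be illustrated with a small check. This is an invented sketch, not spec text: it models removing frames at a single cut timestamp across all tracks and then appending new data whose per-track start times differ slightly, as audio and video starts usually do:

```javascript
// Illustration of point 3: audio and video track buffers rarely align on
// the same timestamps, so cutting every track at one removal time and
// then appending new data can leave a hole in one track. A gap appears
// when the kept data ends before that track's new data begins.
function gapAfterOverwrite(oldRange, removalStart, newTrackStart) {
  const keptEnd = Math.min(oldRange[1], removalStart);
  return newTrackStart > keptEnd;
}

// Video: old data to 10.0s, cut at 8.0s, new video starts at 8.0s → no gap.
gapAfterOverwrite([0, 10.0], 8.0, 8.0);   // → false
// Audio: old data to 10.0s, cut at 8.0s, new audio starts at 8.021s → gap.
gapAfterOverwrite([0, 10.0], 8.0, 8.021); // → true
```

Using each track's own coded frame group start time as its removal point, as suggested above, keeps `removalStart` equal to `newTrackStart` per track and avoids the hole.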

Also, since “sequence” mode is mentioned, here are some thoughts on it. “Sequence” append mode doesn't work well for muxed AV content in a single sourceBuffer due to the same issues described above, and the fact that there is only one timestampOffset that would get applied to all tracks even though appending a single MEDIA segment will result in slightly different durations appended for each track within the sourceBuffer. Audio and video durations just do not align precisely.

That shouldn't matter in practice. “Sequence” append mode should ideally not be used for muxed AV content, and should only be used for appending elementary audio streams. We might want to consider limiting “sequence” append mode to audio elementary stream formats.

@paulbrucecotton

@wolenetz: Jerry has provided his input here. What are the next steps?

@wolenetz
Member Author

wolenetz commented May 2, 2016

@paulbrucecotton I'll produce a PR shortly. I think @jdsmith3000's response to my #15 (comment) is pretty much what I was thinking.

Regarding the 'sequence' appendMode clarifications: those are out of scope of this issue, and I'm not sure they need clarification in the MSE spec. Note that 'sequence' appendMode relies on the same discontinuity detection logic as 'segments' appendMode, though 'sequence' uses it to adjust timestampOffset. In 'sequence' appendMode there could certainly be problems with muxed AV, especially audio, where it might not play "continuously" and align with the resequenced timestamps, since the discontinuity detection logic allows for some small gap. Likewise, I've seen 'sequence' appendMode most applicable to single-stream SourceBuffers, especially audio, for things like gapless playback. In fact, the audio/aac and audio/mpeg MSE bytestreams require 'sequence' mode behavior. @jdsmith3000, perhaps clarifying our thoughts/warnings about the probable playback experience for muxed AV in 'sequence' appendMode with a non-normative note might qualify as a distinct V1NonBlocking spec issue: please file one if you agree.

@guest271314

@jdsmith3000

Also, since “sequence” mode is mentioned, here are some thoughts on it. “Sequence” append mode doesn't work well for muxed AV content in a single sourceBuffer due to the same issues described above, and the fact that there is only one timestampOffset that would get applied to all tracks even though appending a single MEDIA segment will result in slightly different durations appended for each track within the sourceBuffer. Audio and video durations just do not align precisely.

That shouldn't matter in practice. “Sequence” append mode should ideally not be used for muxed AV content, and should only be used for appending elementary audio streams. We might want to consider limiting “sequence” append mode to audio elementary stream formats.

I have been able to use "sequence" mode to play multiple media tracks in sequence in both Firefox and Chromium. Using "segments" mode with the same code does not render the expected result in Chromium; only one second of media playback is rendered.
