Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing duration change consequences and inconsistent outcome for user agents. #124

Closed
jyavenard opened this issue Jul 16, 2016 · 19 comments
Assignees
Milestone

Comments

@jyavenard
Copy link
Member

In the updated spec, we have:
http://w3c.github.io/media-source/#duration-change-algorithm

"5. If a user agent is unable to partially render audio frames or text cues that start before and end after the duration, then run the following steps:
Note

This condition can occur because the coded frame removal algorithm preserves audio frames and text cues that start before and end after the duration.

1. Update new duration to the highest end time reported by the buffered attribute across all SourceBuffer objects in sourceBuffers.
2. Update duration to new duration.

"

I'll explain the issue with an example. Assuming we have a video sourcebuffer with the following 3 samples:
[0,2)[2,4)[4,6)

if we call remove with the arguments (3, 6), the resulting sourcebuffer will now contain:
[0,2)[2,4)

as a sample is only removed as : "Remove all media data, from this track buffer, that contain starting timestamps greater than or equal to start and less than the remove end timestamp." (from Coded Frame Removal)

The sample [2,4) isn't truncated.

When setting duration to 3 however, if the usage agent is able to partially render audio frames, then the resulting duration will stay 3.
However, for a user agent enable to partially render audio frames, the duration will now be

"Update new duration to the highest end time reported by the buffered attribute across all SourceBuffer objects in sourceBuffers."

so duration for those user agents will be 4 instead.

Both type of user agents will have a buffered range of [0,4), yet one has a duration of 3 and the other 4 (which is even more confusing as we do not have to worry about audio frames here)

I don't believe that there is anything in the spec, indicating that playback would have to stop once currentTime reaches duration; on the contrary:
https://w3c.github.io/html/semantics-embedded-content.html#offsets-into-the-media-resource

"The duration attribute must return the time of the end of the media resource, in seconds, on the media timeline. ".

When played, that last sample will update the duration, in which case once playback has ended we would have a final duration = 4.

As such, I believe that

"5.If a user agent is unable to partially render audio frames or text cues that start before and end after the duration, then run the following steps" should be removed to ensure consistency across user agents. Step 4 "Update duration to new duration."

would need to be removed.

Additionally, as side effects of those steps that change the duration to what was originally set.
if you do:
mediaSource.duration = 3;
mediaSource.duration = 3;

as in both cases the duration was actually set to 4, the first step of the duration change algorithm

"1. If the current value of duration is equal to new duration, then return."

will always be false; and the durationchange event will be fired twice (this issue arised in the web-platform-test media-source/mediasource-duration.html which specifically check that durationchange event is not fired when setting the duration to the same value multiple times in a row).

As a workaround for this issue, I suggest that in the final step, prior "Update the media duration to new duration and run the HTMLMediaElement duration change algorithm."

we do: "If the value of old duration is equal to new duration, then return."
This will ensure that no durationchange event will be fired.

@jyavenard
Copy link
Member Author

Also pointing out another discrepancy related to the duration.

Say we have the following two tracks, one audio and one video.
video: [0,2)[2,4)[4,6)
audio: [0,2.5)[2.5,5)[5,7.5)

after a call to each sourcebuffer with remove(3,6) we now have:
video: [0,2)[2,4)
audio: [0,2.5)[2.5,5)

doing mediaSource.duration = 3
will cause mediaSource.duration to be 4 when the mediaSource isn't in ended mode.

a follow up call to mediaSource.endOfStream() will now call the duration change algorithm with the value of the highest end time which is now 5.

and so mediaSource.duration is now 5.

So, 3, 4 or 5 are all possible returned values when setting the duration to 3.

BTW, this exact scenario is happening in the media-source/mediasource-duration.html platform test (available at https://github.com/w3c/web-platform-tests/blob/master/media-source/mediasource-duration.html)

(which now reports failure after implementing the new MSE spec)

@wolenetz
Copy link
Member

wolenetz commented Jul 19, 2016

@jyavenard Thanks for reporting.
Briefly, it looks like there are three issues here:

  1. Pre-existing problem with MediaSource.buffered (and SourceBuffer.buffered) on user agents that do support partially rendering frames: the duration could be successfully set to less than the highest range end time. It seems to me we should either:
    a) prohibit partially rendering case completely, or
    b) adjust the SourceBuffer.buffered range logic to "Let highest end time be the minimum of the parent MediaSource's duration and the largest track buffer ranges end time across all the track buffers managed by this SourceBuffer object. This would have weird side-effect where simply slightly increasing a previously decreased duration could also lengthen the presentation interval of the highest buffered range, or
    c) something else which I haven't thought of yet. Suggestions welcome.

  2. Repeated setting of mediaSource.duration = A, on an implementation which doesn't support partially rendering frames, leads to multiple durationchange events being fired. Assuming we don't prohibit partially rendering case completely in (1), then I agree we should probably suppress those redundant durationchanges as you suggest.

  3. endOfStream can increase duration, too, confusingly, depending on whether or not the user agent supports partially rendering frames (if there was a previous duration reduction). I'm thinking this also depends on solving (1) first.

@jdsmith3000 What do you think?

@jyavenard
Copy link
Member Author

I don't really understand what the ability to render partial frames has anything to do with the change of duration, the removal of frames via the range removal algorithm or buffered range calculation.

The range removal algorithm will never truncate a frame; either a whole frame is removed or none of it. The buffered range calculation also only uses whole frames (as they are now stored in the track buffer).

The only time partial rendering of frames matters is during the coded frame processing algorithm where a new added frame may overlap an existing one. But once the overlap is done, the frame can be considered as whole again (its end timestamp has been adjusted and the original duration of that frame is no longer available).

It could have mattered in the past, where changing the duration could have removed frames (and one could have considered the possibility of truncating partial frames then).

As such, should the ability to render partial frame matters at all when it comes to reporting the buffered range or changing the duration? I think not.

So I believe that the ability to render partial frames should be excluded. It can still be allowed considering the above (and it's quite nice feature for gapless transition and so on).

The inconsistent behaviour of the duration change algorithm makes it difficult to write universal tests such as the web platform test.

@wolenetz
Copy link
Member

I'll also investigate how Chromium does this, since that might provide the least-regressing route to clarification in the spec. I suspect it caps the buffered range end times to not exceed the current duration value, thereby side-stepping some of the resulting spec confusion.

@wolenetz
Copy link
Member

Indeed, Chromium caps SourceBuffer.buffered's highest end time to be <= highest_end_time (per spec), and also to <= current duration (not in spec, but a perhaps helpful clarification): https://cs.chromium.org/chromium/src/media/filters/chunk_demuxer.cc?sq=package:chromium&rcl=1469027825&l=153

@jyavenard
Copy link
Member Author

That could be an easy alternative, especially now that we can't set a duration to a value that would require removing samples from the source buffer.

However, we have to be very careful when drafting this and not introduce a cycling-dependency.
The duration change algorithm rely on the buffered range to calculate the effective duration, and so if the buffered range calculation also rely on that same duration it's going to cause issues (similar to the original range removal algorithm that relied on the duration being set).

What was the reason for chromium to implement that change? it's disappointing when a user agent do their own things, with chromium market share that now makes it a de-facto rule.

@jdsmith3000
Copy link
Contributor

I am looking at this now. It is true that the partial audio/text frame logic can lead to different durations based on implementation ability, but it would be consistent for a given UA, right?

I do wonder about the benefit of allow this variance. It seems like the rules for implementations that can't handle the partial frames could perhaps be applied everywhere. That way, duration would always be greater than or equal to what's buffered. @wolenetz: Is this what you are suggesting?

@paulbrucecotton
Copy link

@jdsmith3000 and @wolenetz: Should this be tagged V1Editorial?

@jdsmith3000
Copy link
Contributor

I'm looking into our implementation. Having the can handle it/can't handle it options seems undesirable. @wolenetz Please confirm that you adjust duration to new duration always when partial audio or text tracks are buffered beyond duration limits.

I'm not sure if this is just editorial, since it could affect implementations.

@wolenetz wolenetz self-assigned this Aug 9, 2016
@wolenetz
Copy link
Member

wolenetz commented Aug 10, 2016

@jyavenard (#124 (comment)): Chromium's detection of playback reaching end of media resource is a bit broken: it renders everything read from each of the streams until all the currently active streams have signalled that the end of stream has been reached. This should only occur if endOfStream() was previously called. The actual duration of rendered media might be beyond the set duration, since nothing in Chromium's duration change algorithm updates the streams or renderers to indicate a partial frame. Hence, Chromium appears to be doing a hybrid approach of the duration change logic (it effectively doesn't support partial end-of-media-resource-frame-rendering), yet it clamps the buffered range logic to make it look somewhat like it does support such partial frame rendering.

Plan A: I propose, to not break existing web apps which might get confused if indeed a buffered range end time was seen to be > duration (not possible in Chromium's hybrid/broken implementation currently), to:

  1. Change Chromium to not do this hybrid approach; rather: never set duration less than the buffered range end time.
  2. Change the spec and Chromium implementations (and other UAs might need to do similar change) (to address Confusing duration change consequences and inconsistent outcome for user agents. #124 (comment)) to:
    a) disallow the partial-frame-rendering logic. assume no UA can do this, and
    b) to ensure that duration is always >= the highest end time of any buffered frame, update the duration change algorithm to, instead of "Update new duration to the highest end time reported by the buffered attribute across all SourceBuffer objects in sourceBuffers.", instead be "Update new duration to the highest end time of any buffered coded frames for all SourceBuffer objects in sourceBuffers", and
    c) to eliminate redundant durationchange events, move the "update duration to new duration" to after all updates to new duration have completed, and after a second check for "if the current value of duration is equal to new duration, then return".

@jyavenard : would this approach best match FF's implementation? Per @jdsmith3000 (who's currently OoO) in #124 (comment), it seems Edge might already not support handling partial frames.

If no UA currently handles partial end-of-media-resource frame rendering, the net observable change would be:

  1. elimination of potentially redundant "durationchange" events, if the duration didn't actually change, after all the adjustments based on what's currently buffered.
  2. for muxed SourceBuffers whose intersection ranges have a highest end time which is < the highest buffered coded frame end time in the SourceBuffer, the new duration would be increased to that highest buffered coded frame end time (e.g. like what occurs when readyState is "ended" currently), even if not "ended".

This seems partially substantive, especially if these net observable changes are important to web authors, or any UA makes reliance on the previous allowance for partial end-of-media-resource frame rendering. @paulbrucecotton Please advise on potentially needing a CfC for resolving this. @jyavenard please advise on impact to FF implementation.

--edit: marked this proposal as Plan A.

@wolenetz
Copy link
Member

Plan B: As one alternative to the approach presented in #124 (comment), and with less substantive change, we could reduce the net observable change by dropping 2.c. This still assumes no UA currently supports partial end-of-media-resource frame rendering, but still fixes the existing problem around muxed SourceBuffers.

As in Plan A, Chromium would need to be fixed to not do its non-standard buffered range clipping to duration (which is effectively a no-op due to also fixing Chromium to comply with the steps around not supporting partial end-of-media-resource frame rendering).

@jyavenard
Copy link
Member Author

@wolenetz:

. The actual duration of rendered media might be beyond the set duration, since nothing in Chromium's duration change algorithm updates the streams or renderers to indicate a partial frame.

ok, I understand the problem you were describing at first (it got me confused for a while).

Ignoring duration change for the moment
A frame can only ever be added whole, and the frame removal algorithm also never remove partial frames.

So duration will always be superior or equal to the highest end time found in any track.

I'm rather confused on where this talk about rendering partial frames originate from and why it has any impact on the way duration is calculated.

However, as currently the duration change does take into account the ability to render partial frame. So it's the only place where this partial frame business is coming into account.
Am I to understand that implicitly, it means that if the duration has been set to between the start and the end of the last frame; then we should in theory not render that last portion of the partial frame?

If neither Blink, Edge or Firefox, and I'm guessing neither webkit have this ability (not that it would be a hard thing to do, Firefox in particular already has the ability to seek within a frame) perhaps removing that ambiguity altogether is the best thing to do.
Gecko do not clamp, never had. It has always been doing things fully as per spec. So I'm hoping that there's nothing we can break seeing that it's always been that way and would already been broken. So that backward compatibility shouldn't be an issue

BTW, Firefox does the same as what you describe as to how frames are played, when endOfStream has been called, and normal playback is going (e.g. not seeking) it will simply demux all frames until each stream reports EOS and it will continue playing until there are no more frames to decode.
Seeing that the duration has already been adjusted to be exactly at the end of the last frame, currentTime when playback ends is always exactly equal to the set duration, and no durationchange event will be fired.

What I do believe however, is that if Chrome is doing something in a particular, yet reasonable way, then we should amend the spec so that it's well documented and prevent getting other UA to get bug reports because "Chrome doesn't do it that way" :)

@wolenetz wolenetz added this to the V1 milestone Aug 16, 2016
@wolenetz
Copy link
Member

I'll produce a PR.

@wolenetz
Copy link
Member

I think Plan B would be best (less substantive, and also retains some notification to web apps that something about duration has changed, in this (redundant durationchange) case it indicates that the desired duration (different than current duration) was adjusted. only in the case where the desired duration is exactly the current duration will durationchange be suppressed.

@jdsmith3000
Copy link
Contributor

I agree with removing text here about partial frame rendering. @wolenetz: I believe you plan to remove step 4 (the partial frame conditional statement) and renumber 4.1 and 4.2 as steps that always execute. Correct? If so, that makes sense to me.

@wolenetz
Copy link
Member

On looking deeper into this, I wonder if this might break web apps, if going with plan A or B instead of allowing for behavior like what Chromium's hybrid currently implements (giving apps the impression of support for partial frame rendering, capping buffered range highest end times to duration, yet perhaps for some stream types like audio playing the last few decoded samples which might be beyond the duration (but within a coded frame presentation interval which begins at or below duration).

App breakage or implementation/spec fragility arises out of plan A or B in the cases where duration is set by the app but results in something the app wasn't expecting (a higher duration). In degenerate cases, such as an extremely long text track cue, the resulting duration could be huge.

So, plan C will be what I'll produce a PR for. It will include a non-normative note which allows implementations which cannot render partial audio frames to play those last few samples, so long as the app-visible currentTime never exceeds duration (Chromium looks like it might have a bug here, where currentTIme might exceed duration even in this current implementation).

@jdsmith3000 @jyavenard Please comment ASAP if you think Plan C is severely flawed.

@jdsmith3000
Copy link
Contributor

I disagree. Plan B could change behavior if a particular UA supports partial frame playback now and changes to the revised spec that eliminates it; but it standardizes on a valid behavior under the current spec and so should not be considered a breaking change. The Chrome hybrid, if anything, changes playback behavior because it poses as a partial frame UA, but play beyond duration. How can that be better?

@wolenetz
Copy link
Member

Plan B's change of "duration" to something the app did not expect afford less interoperability than a few milliseconds of audio perhaps playing beyond the duration. Note that the allowance in Plan C is not normative; implementations can choose to improve their quality and not play those few decoded audio samples that exceed duration. Even without that improvement, plan C forbids implementations from reporting a currentTime value that exceeds duration. The concern here is of app breakage, which I believe you also mentioned concerns about during the editors' sync yesterday, @jdsmith3000.

@wolenetz
Copy link
Member

wolenetz commented Aug 18, 2016

Following discussion in #148, the plan B is the way forward. I'm producing a new PR for that now.

-- edit: undid accidental issue closure..

@wolenetz wolenetz reopened this Aug 18, 2016
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 30, 2016
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu

--HG--
extra : rebase_source : b25f0e2a76c517c0dca0a9def00edd6eff38d8ad
xeonchen pushed a commit to xeonchen/gecko-cinnabar that referenced this issue Aug 30, 2016
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu
kuoe0 pushed a commit to kuoe0/gecko-dev that referenced this issue Sep 7, 2016
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Sep 30, 2019
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu

UltraBlame original commit: 7f33fad15f7c48a7544f2045179f36bfee81a9be
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Sep 30, 2019
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu

UltraBlame original commit: 7f33fad15f7c48a7544f2045179f36bfee81a9be
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Sep 30, 2019
…c. r=gerald

The MSE spec was recently updated to use the highest end time across tracks rather than across the buffered ranges.

See w3c/media-source#124 and the fix described in w3c/media-source#154

MozReview-Commit-ID: 4CqI8d2e9gu

UltraBlame original commit: 7f33fad15f7c48a7544f2045179f36bfee81a9be
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants