Expose an explicit set/get low-latency versus "smoothing" MSE API rather than relying on implementation-specific, implicit bytestream hints that the stream might be "live" #21

Open
wolenetz opened this Issue Oct 13, 2015 · 10 comments

Comments

@wolenetz
Contributor

wolenetz commented Oct 13, 2015

Migrated from w3c bugzilla tracker. For history prior to migration, please see:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28379

It was previously assigned to Adrian Bateman. Editors will sync soon to determine who should take this bug.

@wolenetz wolenetz self-assigned this Oct 15, 2015

@wolenetz wolenetz added this to the V.Next milestone Oct 15, 2015

@wolenetz

Contributor

wolenetz commented Oct 15, 2015

It sounds like a set/get low latency API might solve this.

@jdsmith3000

Contributor

jdsmith3000 commented Oct 19, 2015

This issue requests app control over the latency model, and that's clearly a new feature request. It might be possible to detect a live stream and set lower latency buffering, but it's not clear that would be the best thing to do on all live streams. An API that lets the app communicate intent is likely needed to resolve this adequately.

On V.Next already.
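As a rough illustration of such an intent-communicating API, a set/get latency mode might look like the sketch below. This is purely hypothetical: `latencyMode` and its values are invented names for illustration, not part of any MSE draft.

```javascript
// Hypothetical sketch of a set/get latency-model API. "latencyMode" is
// NOT a real MSE attribute; it models the feature this issue requests:
// the app states intent, and the UA maps it to a buffering strategy.
class MediaSourceLatencyHintSketch {
  constructor() {
    // Default mirrors today's implicit behavior: buffer for smoothness.
    this._latencyMode = "smoothing";
  }
  get latencyMode() {
    return this._latencyMode;
  }
  set latencyMode(mode) {
    if (mode !== "low" && mode !== "smoothing") {
      throw new TypeError('latencyMode must be "low" or "smoothing"');
    }
    this._latencyMode = mode;
  }
}
```

A design like this leaves the heuristics to the implementation while letting the app opt out of smoothing when interactivity matters more than stutter-free playback.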

@paulbrucecotton


paulbrucecotton commented Nov 17, 2015

The Media Task Force has agreed to designate this issue as V.Next:
https://lists.w3.org/Archives/Public/public-html-media/2015Nov/0027.html

@greentorus


greentorus commented Aug 2, 2016

Feature proposals/"requests":

a) The low-latency mode should also support video streams (e.g. H.264) with a single initial keyframe followed by P-frames only.
Keyframes arriving from time to time mean significantly larger packets from time to time, which take longer to transmit and therefore arrive later at the client. That is no issue for buffered VOD situations, but it causes stuttering in low-latency situations with close to zero buffering. Having only P-frames loses seeking, but low-latency use cases like video chat or cloud gaming don't need seeking anyway.

b) The low-latency mode should work well with adding each new video frame individually to the source buffer.
Adding multiple video frames to the source buffer at once would introduce unnecessary buffering and therefore increase the delay.
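For what it's worth, per-frame appending already runs into SourceBuffer's one-append-at-a-time rule: `appendBuffer()` throws while `updating` is true, so the app must queue frames itself. A minimal sketch, assuming only a SourceBuffer-like object exposing `updating`, `appendBuffer()`, and the `updateend` event:

```javascript
// Queue-based per-frame appender. Each frame is handed to the
// SourceBuffer as soon as it is idle; frames arriving while an append
// is in flight wait in a FIFO queue (the buffering cost point b) above
// wants to keep as close to zero as possible).
function makeFrameAppender(sourceBuffer) {
  const queue = [];
  function pump() {
    // appendBuffer() would throw InvalidStateError while updating.
    if (!sourceBuffer.updating && queue.length > 0) {
      sourceBuffer.appendBuffer(queue.shift());
    }
  }
  // When the previous append completes, push the next queued frame.
  sourceBuffer.addEventListener("updateend", pump);
  return {
    appendFrame(frame) {
      queue.push(frame);
      pump();
    },
  };
}
```

In a low-latency scenario the queue should almost always be empty; a persistently growing queue means the network is delivering frames faster than the SourceBuffer can ingest them.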

@jyavenard


jyavenard commented Aug 2, 2016

What you want to do, and the type of video data you use (a single starting keyframe followed only by P-frames), is currently fundamentally incompatible with the SourceBuffer architecture and spirit.

MSE requires regularly spaced keyframes to work, in particular so that data can be evicted from the SourceBuffer. The concept of dealing with individual frames would have to be removed, and eviction would have to be allowed by byte offset only.

An alternative would be for SourceBuffer.remove() to take either a percentage or a byte offset. Seeking would have to be disallowed, and the live seekable attribute would always return an empty range.
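To make the byte-offset eviction idea concrete, here is a toy model (hypothetical; `removeBytes` is not part of any MSE draft): with no recurring keyframes to align on, the buffer can only drop already-played payloads from the front until the requested byte count is freed.

```javascript
// Toy model of byte-offset eviction for a keyframe-less stream.
// "ByteEvictableBuffer" is an invented stand-in, not a real MSE type.
class ByteEvictableBuffer {
  constructor() {
    this.chunks = []; // already-played frame payloads, oldest first
  }
  append(bytes) {
    this.chunks.push(bytes);
  }
  get byteLength() {
    return this.chunks.reduce((n, c) => n + c.length, 0);
  }
  // Evict whole payloads from the front until at least `bytes` are
  // freed (or the buffer is empty); returns the number of bytes freed.
  removeBytes(bytes) {
    let freed = 0;
    while (freed < bytes && this.chunks.length > 0) {
      freed += this.chunks.shift().length;
    }
    return freed;
  }
}
```

A percentage-based variant would simply translate the percentage into a byte count against `byteLength` before evicting.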

@greentorus


greentorus commented Aug 2, 2016

Yes, I see the point that this is, for now, fundamentally incompatible with the current MSE philosophy. But what I have in mind is: low-latency MSE is a very interesting feature for many applications, and as this issue shows we are not the first ones interested in it ;-) And single-keyframe video streams are one important aspect of good low latency, I think. So extending the MSE architecture to make that possible would be useful and worth it.
Maybe there are very simple approaches, simpler than percentages or byte offsets. For example (as mentioned on the Mozilla board), low-latency use cases are usually personalized and interactive and therefore don't need seeking anyway. So one simple solution could be that seeking and sourcebuffer::remove are officially simply not possible (returning an error) if the video has only one keyframe (so far).

@Codeusa


Codeusa commented Sep 26, 2018

Have there been any updates on this, or a real low-latency live mode for MSE vNext?

@wolenetz

Contributor

wolenetz commented Oct 2, 2018

Not tangible, though I have discussed some approaches face-to-face with @jyavenard earlier this year.

@wolenetz

Contributor

wolenetz commented Oct 2, 2018

@greentorus / #21 (comment): It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

I propose we keep this issue (renamed and refocused) to be more like what #133 wants (an explicit MSE API to set/get the implementation's low vs "smoothing" latency model). Please file a separate issue if the "single keyframe plus lots of P frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.

@wolenetz wolenetz changed the title should buffering model be an option? Expose an explicit set/get low-latency versus "smoothing" MSE API rather than relying on implementation-specific, implicit bytestream hints that the stream might be "live" Oct 2, 2018

@mmmmichael


mmmmichael commented Oct 3, 2018

> @greentorus / #21 (comment): It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

Yes, we are also considering the MediaStream/WebRTC API.

However, compared to MSE, MediaStream/WebRTC involves a lot of unnecessary high-level complexity and protocol restrictions just to display a live video stream.

Also, as a minor secondary reason, it seems the MSE video pipeline is better optimized for higher resolution in many browser implementations. For example, the MSE implementation in Firefox under Windows seems to use hardware decoding based on the Windows Media Foundation, but its MediaStream implementation seems to use software-only decoding.

> I propose we keep this issue (renamed and refocused) to be more like what #133 wants (an explicit MSE API to set/get the implementation's low vs "smoothing" latency model). Please file a separate issue if the "single keyframe plus lots of P frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.

We don't care what the solution is, as long as it provides low latency. So an explicit latency model sounds good.

However, real low latency does not seem possible without "single keyframe plus lots of P-frames".

For example, suppose the user has a 20 Mbps network connection. This supports a 2160p 60 fps video stream, typically with 1 keyframe per second. Depending on the scenario, a keyframe often consumes up to 1/2 of that total bandwidth or even more (in this case around 10 Mb), while the P-frames are very small (around 150 kb). This is no problem when using high-latency buffering, but it means that transferring a keyframe takes 1/2 second, so the minimum possible latency is also 1/2 second. When using only P-frames, the minimum possible latency is 1/2 of 1/60 = 1/120 second. Note that decreasing the number of keyframes per second decreases bandwidth but doesn't decrease latency, which stays at 1/2 second. The only exception seems to be not sending any keyframes after a single initial one; then, after some initialization hiccup, the minimum latency is 1/120 second.
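The arithmetic above can be spelled out, using the comment's example figures (which are assumptions about a typical encode, not measurements):

```javascript
// Example figures from the comment above (assumed, not measured):
const linkMbps = 20;   // available bandwidth
const fps = 60;        // 2160p60 stream
const keyframeMb = 10; // one keyframe ~1/2 of the per-second budget

// Transferring one keyframe occupies the link for keyframeMb / linkMbps
// seconds, which becomes the latency floor regardless of keyframe rate:
const keyframeDelayS = keyframeMb / linkMbps; // 0.5 s

// With P-frames only, each frame uses about half of one 60 fps frame
// slot's bandwidth budget, so it transfers in half a frame interval:
const pFrameDelayS = (1 / fps) / 2; // 1/120 s
```

The key point is that the keyframe-driven latency floor depends on keyframe size over link rate, not on how often keyframes are sent.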

This problem is why we started experimenting with "single keyframe plus lots of P-frames".

How could this problem be avoided in the "low vs 'smoothing' latency model" proposal while still having regular keyframes?
