
Expose the actual decoders, or provide "coded frame" bytestream or append API #184

Open · Codeusa opened this issue Jun 15, 2017 · 19 comments · 7 participants

@Codeusa (Author) commented Jun 15, 2017

It seems rather counterintuitive that the API forces video frames to be boxed in a container. When attempting real-time interactive applications like web-based remote desktop, low latency is key, and MSE forces a lot of overhead.

Ideally, raw H.264-encoded frames could be passed straight to the hardware-accelerated decoder and pushed into a video element, which would solve these issues.

@dwsinger commented Jun 15, 2017

Hm. I think MSE was designed to support use-cases like DASH and HLS. If you are doing real-time, I would have thought that the WebRTC infrastructure may be more appropriate?

@Codeusa (Author) commented Jun 15, 2017

WebRTC has its own overhead: you need to go through the process of setting up a STUN/TURN framework, and then use the hacky solution of making it think a media source (webcam) is your stream.

When it comes to real-time video, other platforms let you access the decoders at the lowest level. You shouldn't have to overcomplicate the solution to a "simple" problem.

@jyavenard commented Jun 16, 2017

Mozilla had opened a similar bug to investigate this problem (https://bugzilla.mozilla.org/show_bug.cgi?id=1325491)

You would still need to wrap the data in a container of some kind, because plain raw data doesn't provide sufficient information to properly display those frames.

I do believe we can improve MSE to be more real-time friendly. However, I'm not convinced using raw data would help much here: the overhead of wrapping the content in an MP4 or a WebM is rather low.

@Codeusa (Author) commented Jun 16, 2017

In solutions I've created outside the web, I've only used raw data to achieve 60 FPS real-time video, so I can't speak much to container-format solutions.

The benefit of MSE is the hardware acceleration. However, in my efforts to get real-time streaming working via MSE, delays often show up due to the I-frame delay present when sending fragmented MP4s. A workaround is to send frames individually as soon as they are captured, which is less than ideal since each one has to be boxed and every millisecond counts.

If you have any suggestions for an approach with the standard we currently have, I'd appreciate fresh eyes.

@jyavenard commented Jun 16, 2017

I think you're making too many assumptions as to how MSE implementations work internally.

Sending raw frames vs. having them muxed in an MP4 container would make zero difference to the speed of decoding or the ability to use hardware rather than software decoding; both would be identical.
Same for WebRTC vs. MSE: using MSE doesn't suddenly open the world of hardware acceleration.

The only thing you would save with raw frames is the time it takes to demux an MP4, which really is barely relevant compared to the processing required to decode a frame.

Using an individual frame per fragmented-MP4 fragment vs. multiple frames in one MP4 would also make no difference in practice:
The H.264 hardware decoder available on Windows has a latency of over 30 frames: you need to input over 30 frames before the first one comes out. This is what causes the latency, not how many frames you add at a time or whether they are muxed in an MP4.

If you were to package 30 frames in a single MP4 fragment, or use 30 fragments of one frame each, the latency would still be the same (as far as the first decoded sample is concerned).
In fact, I can assure you that, at least with Firefox, a single fragment with a single frame adds a lot of processing time; packaging, say, 10 frames per fragment gives much better results.
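For anyone who wants to verify the latency claim against the Media Foundation decoder directly, here is a minimal measurement sketch. It assumes an already created and configured H.264 decoder MFT that allocates its own output samples; GetNextInputSample() is a hypothetical helper producing one encoded frame per call, and error handling is trimmed.

```cpp
// Minimal latency-measurement sketch: count how many encoded frames go in
// before the first decoded frame comes out. The MFT is assumed to allocate
// its own output samples (check MFT_OUTPUT_STREAM_INFO flags in real code).
#include <mfapi.h>
#include <mferror.h>
#include <mftransform.h>

IMFSample *GetNextInputSample(); // hypothetical helper: one encoded H.264
                                 // frame wrapped in an IMFSample

int MeasureDecoderLatencyInFrames(IMFTransform *decoder)
{
    int framesIn = 0;
    for (;;) {
        IMFSample *input = GetNextInputSample();
        decoder->ProcessInput(0, input, 0);
        input->Release();
        ++framesIn;

        MFT_OUTPUT_DATA_BUFFER out = {};
        DWORD status = 0;
        HRESULT hr = decoder->ProcessOutput(0, 1, &out, &status);
        if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
            continue;               // decoder is still buffering input
        if (SUCCEEDED(hr)) {
            out.pSample->Release();
            return framesIn;        // first decoded frame appeared
        }
        return -1;                  // stream change or other error, not handled
    }
}
```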

@roman380 commented Jun 16, 2017

BTW, the hardware decoder in Windows can be instructed to enable low-delay mode (CODECAPI_AVLowLatencyMode). I would expect this to reduce decoding latency. Generally speaking, though, it is unlikely that even the standard mode has a processing latency so large that it would basically disqualify the decoder from real-time video scenarios. Encoders carry such latency for their own reasons, but decoders don't.

Also, recalling my experience with the DXVA H.264 decoder, it did produce output with a reasonably small delay in terms of additional data required on its input. It does need some processing time because, for example, it is multithreaded internally and certain synchronization is involved; however, the delay is not as long as many additional input frames of payload data.
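For reference, a minimal sketch of how this low-delay mode is typically requested on Windows 8 and later: the MF_LOW_LATENCY attribute on the decoder MFT is the attribute-store counterpart of CODECAPI_AVLowLatencyMode (error handling trimmed).

```cpp
// Sketch: request low-delay decoding from a Media Foundation decoder MFT.
// MF_LOW_LATENCY (Windows 8+) corresponds to CODECAPI_AVLowLatencyMode;
// set it before feeding any input.
#include <mfapi.h>
#include <mftransform.h>

HRESULT EnableLowLatency(IMFTransform *decoder)
{
    IMFAttributes *attributes = nullptr;
    HRESULT hr = decoder->GetAttributes(&attributes);
    if (FAILED(hr))
        return hr;                  // this MFT exposes no attribute store

    hr = attributes->SetUINT32(MF_LOW_LATENCY, TRUE);
    attributes->Release();
    return hr;
}
```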

@jyavenard commented Jun 16, 2017

CODECAPI_AVLowLatencyMode is only available on Windows 8 and later (and you need a service pack). We also had to disable it because it easily caused crashes (see https://bugzilla.mozilla.org/show_bug.cgi?id=1205083).
It is also incompatible with content that has B-frames.

FWIW, even with CODECAPI_AVLowLatencyMode and H264, the latency is around 10 frames (until then, MF_E_TRANSFORM_NEED_MORE_INPUT is returned).

As for disputing that the latency is that high without it, it may be worth trying yourself first.

@roman380 commented Jul 12, 2017

it may be worth trying yourself first

I finally had a chance to check the decoder output, and whether low latency mode has an effect, on Windows 10.

As I assumed, the decoder MFT does not need 10+ frames of input before output is produced. There is indeed some latency in the default mode, where you keep feeding input before output is available.

In low latency mode it's "one in, one out" and it works great.

Let me make it absolutely clear: in low latency mode one calls IMFTransform::ProcessInput, and the following ProcessOutput call delivers a decoded frame instead of returning MF_E_TRANSFORM_NEED_MORE_INPUT.
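In code, the observed sequence is roughly this (a sketch, error handling omitted):

```cpp
// "One in - one out" with low latency mode enabled:
decoder->ProcessInput(0, encodedFrame, 0);          // one encoded frame in

MFT_OUTPUT_DATA_BUFFER out = {};
DWORD status = 0;
HRESULT hr = decoder->ProcessOutput(0, 1, &out, &status);
// Low latency mode on:  hr == S_OK and out.pSample is the decoded frame.
// Low latency mode off: hr == MF_E_TRANSFORM_NEED_MORE_INPUT for the
//                       first several frames instead.
```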

It could well be that it had issues in the past, quite possibly. But it works now, and low latency mode has great value for near-real-time video apps.

@Andrey-M-C commented Apr 2, 2018

@roman380
Did you try the low latency attribute on the HEVC/H.265 decoder?
From my experience, I don't see this attribute set by default. And even if I set it, the decoder output is 3 frames behind.

@roman380 commented Apr 10, 2018

@Andrey-M-C
I tried a random HEVC-encoded file (presumably there are factors affecting the behavior, including hardware, OS, and the footage itself), and here is what I got:

[screenshot: measuredecodelatency-hevc]

Three frames behind on DXVA2-enabled decoding.

@Andrey-M-C commented Apr 12, 2018

@roman380 Thanks for the response! I see the same pattern. If you set CODECAPI_AVDecNumWorkerThreads to 1 for the software decoder, then you'll be 4 frames behind, since only one decoder thread will be spawned instead of the default four. Is there any way to get a clarification from Microsoft about the absence of the low latency mode in the HEVC MFT?
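For completeness, a sketch of setting that property through ICodecAPI; the VT_UI4 variant type is an assumption here, so verify it against the CodecAPI documentation for your SDK.

```cpp
// Sketch: force the software decoder to a single worker thread via ICodecAPI.
// VT_UI4 is an assumed variant type for CODECAPI_AVDecNumWorkerThreads;
// verify against codecapi.h / the CodecAPI docs on your SDK.
#include <codecapi.h>
#include <icodecapi.h>   // ICodecAPI (declared in strmif.h on older SDKs)
#include <mftransform.h>

HRESULT SetDecoderWorkerThreads(IMFTransform *decoder, ULONG threads)
{
    ICodecAPI *codecApi = nullptr;
    HRESULT hr = decoder->QueryInterface(IID_PPV_ARGS(&codecApi));
    if (FAILED(hr))
        return hr;

    VARIANT v;
    VariantInit(&v);
    v.vt = VT_UI4;
    v.ulVal = threads;   // e.g. 1 => single decoder thread
    hr = codecApi->SetValue(&CODECAPI_AVDecNumWorkerThreads, &v);
    codecApi->Release();
    return hr;
}
```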

@roman380 commented Apr 12, 2018

@Andrey-M-C I agree that the decoder lacks flexibility, and a low delay mode does not even appear to be available. In particular, a sequence of just key frames still results in a 9-frame latency with the software decoder, which suggests the latency is somehow there by design (?).

The best place I am aware of to ask for a comment from MS (apart from opening an issue with support directly) is the MSDN Forums; however, responses there are still late and not very frequent.

@wolenetz (Contributor) commented Oct 2, 2018

I think this issue merits a slight re-framing (pun intended):

  1. A low-latency model/API that lets the app explicitly and normatively modify how the MSE implementation treats decoder output: queue and try to smooth rates, versus "show ASAP, unless the PTS interval was missed (drop in that case)" in the video context, plus "let the app normatively describe tolerance and desired behavior w.r.t. buffered range gaps" for audio and video. These are being discussed (see #21 and #160), independent of:

  2. Finding an alternative to re-muxing into a supported bytestream (e.g. MP4, WebM) to let apps buffer media in MSE more rapidly and ergonomically.

I propose this issue be refocused to target the latter.

@wolenetz changed the title from "Expose the actual decoders" to "Expose the actual decoders, or provide 'coded frame' bytestream or append API" on Oct 2, 2018

@Codeusa (Author) commented Oct 2, 2018

We've actually managed to "trick" Chrome and Firefox into decoding in ultra-low-latency mode -- of course, gaps in data are still a potential issue, but in the linked example it's only 7 ms of delay between the host and client. So at least we know it's possible.

@wizziwig commented Oct 8, 2018

We've actually managed to "trick" Chrome and Firefox into decoding in ultra-low-latency mode -- of course, gaps in data are still a potential issue, but in the linked example it's only 7 ms of delay between the host and client. So at least we know it's possible.

Can you provide any details on how you tricked Chrome and Firefox into hardware-decoding H.264 fast enough to allow less than 7 ms total presentation latency? I would like to try reproducing your results. Was that just decoding, or total end-to-end including encoding, network transport, decoding, and Windows desktop rendering/composition? Thanks.

@jyavenard commented Oct 9, 2018

With the right content, the Windows WMF H.264 decoder may have no latency.
In Firefox you need to set the preference media.wmf.low-latency.enabled to true.

That mode is enabled by default in Chrome, though the Microsoft documentation does state that it's not supposed to work with content having B-frames.

@roman380 commented Oct 9, 2018

.. Microsoft documentation does state that it's not supposed to work with content having B-frames.

Documentation quote: "B slices/frames can be present as long as they do not introduce any frame re-ordering in the encoder."

@jyavenard commented Oct 10, 2018

Almost all YT content has B-frames requiring re-ordering, as most B-frames do. And yet Chrome always enables the low latency mode, and it obviously works.

Edit: oh, I just noticed that the comment about B-frames relates to the encoder only.

We disabled it on Firefox because it caused crashes with some versions of Windows 8.

@roman380 commented Mar 17, 2019

The bug "Enable low-latency decoding on Windows 10 and later" suggests that we might finally have CODECAPI_AVLowLatencyMode back with default settings, doesn't it? I think it's been working well in Chrome for quite some time.
