VideoFrames to VideoDecoderOutputCallback in decoding order, not presentation order. #55

Closed
riju opened this issue May 11, 2020 · 14 comments · Fixed by #320
Labels
need-definition An issues where something needs to be specified normatively

Comments

@riju

riju commented May 11, 2020

Modern video coding standards allow a frame to be predicted temporally from both past and future frames (B-frames). Encoded chunks are usually stored in encoding/decoding order, which may differ from presentation order. VideoFrames are given to VideoEncoder in presentation order; the encoder may reorder them before coding and output encoded video chunks in decoding order. Similarly, encoded chunks are given to VideoDecoder in decoding order.
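
To make the two orders concrete, here is a small TypeScript sketch. The four-frame group, the frame labels, and the `Chunk` type are illustrative assumptions and not part of WebCodecs; timestamps stand in for presentation position only for the purpose of this example.

```typescript
// Illustrative only: a tiny I-B-B-P group. `Chunk` is a hypothetical type,
// not the WebCodecs EncodedVideoChunk interface.
interface Chunk {
  label: string;     // frame type plus presentation index, e.g. "B1"
  timestamp: number; // presentation time in microseconds
}

// Decode order: P3 has to be decoded before B1 and B2, which reference it.
const decodeOrder: Chunk[] = [
  { label: "I0", timestamp: 0 },
  { label: "P3", timestamp: 100000 },
  { label: "B1", timestamp: 33333 },
  { label: "B2", timestamp: 66667 },
];

// In this example, presentation order is simply ascending timestamp.
const presentationOrder: Chunk[] = [...decodeOrder].sort(
  (a, b) => a.timestamp - b.timestamp,
);

const decodeLabels = decodeOrder.map((c) => c.label).join(" ");
const presentationLabels = presentationOrder.map((c) => c.label).join(" ");
console.log(decodeLabels);       // I0 P3 B1 B2
console.log(presentationLabels); // I0 B1 B2 P3
```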

What if VideoDecoder called VideoDecoderOutputCallback as soon as decoding of a video frame finished, i.e. it gave VideoFrames to VideoDecoderOutputCallback in decoding order, not presentation order? In use cases like video playback, the Web App would then need to reorder the VideoFrames into presentation order. Although this adds a bit of complexity to the Web App, it has several advantages:

  • The UA does not need to maintain a queue of decoded VideoFrames that will be given to the Web App some time later, after other frames.
  • VideoFrames are given to the Web App as soon as possible, which gives the Web App more time to process frames before displaying them.
  • It is useful if the Web App is in full control of synchronizing and displaying the frames.

How about adding decodingSequence and presentationSequence, like this:

interface EncodedVideoChunk {
  ... 
  readonly attribute unsigned long decodingSequence;
};

[Exposed=(Window)]
interface VideoFrame {
  ...
  readonly attribute unsigned long presentationSequence;
};

In EncodedVideoChunk, decodingSequence would denote the coded order and be filled in by the Web App before giving the chunk to VideoDecoder. VideoDecoder would fill in presentationSequence of VideoFrame before calling VideoDecoderOutputCallback.

There have been similar discussions in #7.
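
As a rough sketch of the reordering a Web App would have to do under this proposal, the TypeScript below buffers frames that arrive in decode order and releases them by sequence number. It assumes the proposed `presentationSequence` attribute exists and that its values are consecutive integers; `SequencedFrame` and `SequenceReorderer` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape: `presentationSequence` is the attribute proposed in
// this issue, not something a shipped VideoFrame exposes.
interface SequencedFrame {
  presentationSequence: number;
}

// Buffers frames that arrive in decode order and releases them in
// presentation order, assuming sequence numbers are consecutive from `first`.
class SequenceReorderer<T extends SequencedFrame> {
  private readonly pending = new Map<number, T>();
  private readonly emit: (frame: T) => void;
  private next: number;

  constructor(emit: (frame: T) => void, first = 0) {
    this.emit = emit;
    this.next = first;
  }

  push(frame: T): void {
    this.pending.set(frame.presentationSequence, frame);
    // Release every frame that is now contiguous with what was already emitted.
    let ready = this.pending.get(this.next);
    while (ready !== undefined) {
      this.pending.delete(this.next);
      this.emit(ready);
      this.next += 1;
      ready = this.pending.get(this.next);
    }
  }
}

// Usage: frames arrive in decode order 0, 3, 1, 2, 4.
const shown: number[] = [];
const reorderer = new SequenceReorderer<SequencedFrame>((frame) => {
  shown.push(frame.presentationSequence);
});
for (const seq of [0, 3, 1, 2, 4]) {
  reorderer.push({ presentationSequence: seq });
}
console.log(shown.join(",")); // 0,1,2,3,4
```

Every Web App that wants playback would need some variant of this buffer, which is the extra complexity mentioned above.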

@sandersdan
Contributor

This is actually very similar to #7; perhaps they should be merged.

Two key things to consider:

  1. Not all platform decoders can operate in this mode. A WebCodecs implementation may need to rewrite slice headers to remove reordering to get the described behavior, and this requires parsing of the bitstream.
  2. This only really applies to H.264/H.265. VPx and AV1 use a different system of non-output frames that doesn't fit this shape well. Because of that it is more consistent to output in presentation order.

On the plus side, outputting in decode order resolves some difficult questions about backpressure.

@chrisn
Member

chrisn commented May 12, 2020

Are there other use cases where a web app would want to receive frames in decode order? I wonder if this flexibility is needed, and if it's worth the additional complexity (from the web app's point of view).

@kenchris

Nit: decoding_sequence should be called decodingSequence to follow our naming policies (check TAG design principles)

@jyavenard
Member

"UA does not need to maintain a queue of decoded VideoFrames that will be given to the Web App sometimes later after other frames."

Of all the decoding APIs, only Apple's VideoToolbox decoder outputs frames in decoding order.
FFmpeg, Microsoft's WMF, and Android MediaCodec all return frames in presentation order, and this is what you want.
The UA typically doesn't maintain a re-ordering queue; the low-level decoder itself does.

I don't believe there's any use case where returning frames in decode order would be useful to the user.
It makes the user's life more complicated, requires them to write more code, and it's not something the user agent could provide even if it wanted to.

As mentioned, only H.264/H.265 require such a re-ordering queue.
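
For readers unfamiliar with such a queue, the following TypeScript is a simplified sketch of the general idea, not the behavior of any particular decoder. `DecodedPicture`, `ReorderQueue`, the `order` key (standing in for something like the H.264 picture order count), and the `depth` parameter are all assumptions made for illustration.

```typescript
// Hypothetical picture shape: `order` stands in for the codec's output-order
// key (for H.264, the picture order count). Not a WebCodecs type.
interface DecodedPicture {
  order: number;
}

// Keeps up to `depth` pictures waiting. When a new picture would exceed that,
// the waiting picture with the smallest `order` is output. `depth` plays the
// role of the stream's maximum reorder distance and is an assumed input here.
class ReorderQueue<T extends DecodedPicture> {
  private readonly held: T[] = [];
  private readonly depth: number;
  private readonly output: (picture: T) => void;

  constructor(depth: number, output: (picture: T) => void) {
    this.depth = depth;
    this.output = output;
  }

  push(picture: T): void {
    this.held.push(picture);
    if (this.held.length > this.depth) {
      this.popSmallest();
    }
  }

  // Outputs everything still waiting, e.g. at end of stream.
  flush(): void {
    while (this.held.length > 0) {
      this.popSmallest();
    }
  }

  private popSmallest(): void {
    let min = 0;
    for (let i = 1; i < this.held.length; i++) {
      if (this.held[i].order < this.held[min].order) {
        min = i;
      }
    }
    const [picture] = this.held.splice(min, 1);
    this.output(picture);
  }
}

// Usage: pictures arrive in decode order with a reorder distance of 2.
const emitted: number[] = [];
const queue = new ReorderQueue<DecodedPicture>(2, (p) => {
  emitted.push(p.order);
});
for (const order of [0, 3, 1, 2, 6, 4, 5]) {
  queue.push({ order });
}
queue.flush();
console.log(emitted.join(",")); // 0,1,2,3,4,5,6
```

The queue only produces correct output when `depth` is at least the stream's real reorder distance, which is why this logic sits naturally next to the code that parses the bitstream.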

@padenot
Collaborator

padenot commented Jul 6, 2020

I believe the most important point is to decide whether there is ever a use case where having the frames in decode order is useful. For now there seems to be a clear preference for outputting in presentation order, except in the initial message in this issue and the second part of #55 (comment), but that argument isn't strong enough to decide on, considering the Web API design principles (authors come before implementers/speccers). @sandersdan, can you clarify what you mean by "resolving difficult questions about backpressure"?

Some more comments:

* UA does not need to maintain a queue of decoded `VideoFrames` that will be given to the Web App sometimes later after other frames.

If all apps need to write more or less the same code to have something working, it's a good sign that we can make the API better.

* VideoFrames are given to Web App as soon as possible which gives the Web App more time to do some processing to frames before displaying them.

This is real-time: frame processing needs to take less than a particular processing quantum (possibly with artificially added delay) for things to work well. Some frames might have more time, but others won't.

* Useful if the Web App is in full control of the synchronization and displaying the frames

I can't think of a use case where using the frames in decode order makes sense. The web app is in control of synchronization and display of the frames anyway.

@chcunningham
Collaborator

I think we're more or less in consensus here that inputs should be in decode order while outputs should be in presentation order. I'll leave this open for a bit more to hear objections or use cases for alternative modes.

@sandersdan assuming we maintain course, do you think we need clarification in the spec on this point?

@sandersdan
Contributor

We probably do. The decode algorithm is unclear about the relationship and seems to imply a direct relationship between inputs and outputs. For example, I read it to say that if one chunk results in multiple output frames, then they must be adjacent; that's probably true, but not something we should specify.

The actual output order is codec-specific, but could probably be worded in a generic way ("output in the presentation order specified by the codec"?). It's important not to imply involvement of the timestamps though.

@chcunningham
Collaborator

chcunningham commented May 12, 2021

Triage note: marking 'editorial', as the conclusion is to update the spec wording to clarify that output is in presentation order. If we later desire a mode to output in decode order, this can be achieved by extending the config.

@chcunningham chcunningham added the editorial changes to wording, grammar, etc that don't modify the intended behavior label May 12, 2021
@padenot padenot added need-definition An issues where something needs to be specified normatively and removed editorial changes to wording, grammar, etc that don't modify the intended behavior labels May 17, 2021
dalecurtis added a commit that referenced this issue Aug 3, 2021
@marcello3d

Sorry to comment on an old ticket, but I'm seeing behavior in Safari where VideoDecoder (with H.264) seems to be returning frames in decode order rather than presentation order, creating strange playback artifacts. (The same thing plays fine in Chrome.)

I can't tell whether this is something I need to handle in JavaScript or a Safari implementation bug.

@padenot
Collaborator

padenot commented Oct 30, 2023

Frames are to always be output in presentation order, and it's the job of the implementation to ensure this is the case.

https://w3c.github.io/webcodecs/#dom-videodecoder-decode has the spec text for this, specifically this line: https://w3c.github.io/webcodecs/#ref-for-dom-videodecoder-codec-implementation-slot%E2%91%A8

https://bugs.webkit.org/ is the bug tracker for WebKit. @youennf might also know if there's a bug open already.

As always, implementers appreciate test cases; if you have one handy, you can share it with them.

@youennf
Contributor

youennf commented Oct 30, 2023

@marcello3d, please file a bug report. I believe VTB is providing frames in decode order and WebKit is missing some logic to do the necessary reordering.

@youennf
Contributor

youennf commented Oct 30, 2023

@marcello3d, I filed https://bugs.webkit.org/show_bug.cgi?id=263901.
If you already have a test case, could you upload it or add a link to it there?

@marcello3d

marcello3d commented Oct 30, 2023

Yeah, so I just tried adding logging to the sample in this repo, and it reproduces.
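
A check of this kind can be sketched in a few lines of TypeScript. This is an illustrative assumption about what such logging might compute, not the actual changeset: `Stamped` and `countOutOfOrder` are hypothetical names, and the check assumes timestamps increase in presentation order for the stream under test.

```typescript
// Hypothetical helper: counts outputs whose timestamp is lower than that of
// the previous output. Assumes timestamps increase in presentation order for
// the stream being tested.
interface Stamped {
  timestamp: number; // microseconds
}

function countOutOfOrder(frames: Stamped[]): number {
  let violations = 0;
  for (let i = 1; i < frames.length; i++) {
    if (frames[i].timestamp < frames[i - 1].timestamp) {
      violations += 1;
    }
  }
  return violations;
}

// Outputs as they would arrive in presentation order and in decode order.
const inOrder = [0, 33333, 66667, 100000].map((timestamp) => ({ timestamp }));
const decodeOrdered = [0, 100000, 33333, 66667].map((timestamp) => ({ timestamp }));

const inOrderViolations = countOutOfOrder(inOrder);
const decodeOrderedViolations = countOutOfOrder(decodeOrdered);
console.log(inOrderViolations);       // 0
console.log(decodeOrderedViolations); // 1
```

In a real page, the array would be filled from the `timestamp` of each VideoFrame received in the decoder's output callback.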

here is my changeset: https://github.com/w3c/webcodecs/pull/734/files

I'm on Safari Version 17.0 (19616.1.27.211.1), macOS 14.0 (23A344)
It also reproduces on Safari Technology Preview Release 181 (Safari 17.4, WebKit 19618.1.3.1)


This also happens with H.265 on Safari Technology Preview Release 181 (VP8, VP9, and AV1 show no errors). It looks like Safari 17.0 doesn't support H.265, so I didn't test there.

@marcello3d

@youennf OK, I added some comments on Bugzilla; let me know if there's anything else you need!


9 participants