Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coded frame processing is not clear around what to do for SAP Type 2 overlaps #187

Open
wolenetz opened this issue Jul 6, 2017 · 16 comments

Comments

@wolenetz
Copy link
Member

wolenetz commented Jul 6, 2017

SAP Type 2 (allowed for {audio,video}/mp4) random access points can have a frame later in decode sequence than the random access point, but with a lower presentation time than the random access point.

Suppose there is some buffered video (in presentation timeline) (caps=SAP Type 2 random access point, # = decode sequence)
[a2][A0][a1]

And a new decode sequence (e.g., following an abort() or with a decode sequence discontinuous with previous append) is appended: [B0][b1][b2], having presentation time sequence [b2][B0][b1], with [b2]'s presentation interval overlapping the presentation time of [a1], but neither [B0] nor [b1] overlap any of the previously buffered video.

In the current coded frame processing algorithm, [a1] would be retained. Both [b2] and [a1] would have the same presentation time: which should be retained? Should the overlap of [a1] by [b2] cause [a1] to be removed? (And [a1]'s removal would also trigger the removal of [a2] because the it has decode dependency on [a1].) If you seek to the PTS of [b2] / [a1], which should show?

The coded frame processing algorithm already dictates what to do for the first frame following a decode discontinuity: if it overlaps the presentation start time of some previously buffered frame, remove that previously buffered frame and everything previously buffered in decode order from that removed frame until the next previously buffered random access point. It also maintains a track buffer highest end time for use in removing previously buffered media (from a previous coded frame group) overlapped by the new coded frame group as its presentation interval grows into the future.

However, for SAP Type 2 random access points allow a coded frame group's presentation interval to grow into the past, too.

This seems to strongly motivate also tracking a track buffer lowest start time, and adding overlap removal logic to the spec text.

This hasn't been a problem in Chrome yet due to it having incorrectly been using decode time intervals, not presentation time intervals, for managing all buffered range overlaps and reporting buffered range times; now that I'm fixing that implementation to be correct, I see that SAP Type 2 appears to be insufficiently supported by the MSE spec.

@jdsmith3000 @jyavenard @paulbrucecotton FYI - please comment.
-edited to correct to @paulbrucecotton

@wolenetz
Copy link
Member Author

wolenetz commented Jul 6, 2017

@acolwell I did some spelunking through the repo history, and the text around supporting SAP Type 1 and 2 existed in the first commit: e657ebf. Needless to say, it appears SAP Type 2 support (where PTS of later frame in decode order can precede PTS of random access point) has been allowed since MSE was proposed. Am I missing something? WDYT about adding lowest start time logic to the coded frame processing algorithm?

@wolenetz
Copy link
Member Author

wolenetz commented Jul 6, 2017

@plehegar @paulbrucecotton, would adding something like lowest start time be considered an erratum? It clarifies behavior for improved interoperability for a use case that has been required to be supported since MSE was originally proposed.
-edited to correct to @paulbrucecotton

@jyavenard
Copy link
Member

For H264, Firefox ignores the container's sample table flags, and instead uniquely rely on the frame containing a NAL of type IDR_SLICE

When we did rely on the container's sample table flags, we use the same logic as FFmpeg to determine if the frame depended on any others:

For all other codec (except audio, where we marked all frames as keyframe):

An old comment I wrote on the matter:
"the tfhd box set default sample flags. The definitions for those flags are:
The sample flags field in sample fragments (default_sample_flags here and in a Track Fragment Header Box, and sample_flags and first_sample_flags in a Track Fragment Run Box) is coded as a 32-bit value. It has the following structure:
bit(4) reserved=0;
unsigned int(2) is_leading;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_non_sync_sample;
unsigned int(16) sample_degradation_priority;

with an extra note "The flag sample_is_non_sync_sample provides the same information as the sync sample table [8.6.2]. When this value is set 0 for a sample, it is the same as if the sample were not in a movie fragment and marked with an entry in the sync sample table (or, if all samples are sync samples, the sync sample table were absent).”. Here we have no sync table (no stss box).

So we use the mask 0x01010000; if the operation sampleFlag & 0x1010000 gives us 0, we assume it’s a keyframe ."

I have never seen a case where the frames marked as random access point in the container weren't the first frame of the GOP (in both pts and dts order).

So in short, does it matter ? :)

@wolenetz
Copy link
Member Author

wolenetz commented Jul 6, 2017

I have never seen a case where the frames marked as random access point in the container weren't the first frame of the GOP (in both pts and dts order).
So in short, does it matter ? :)

It would be excellent if this were truly the case (as it had been my assumption until early last week, when reparsing the ISOBMFF spec text around SAP Type 2 indicated that we must support this case you haven't ever seen). However, we've seen folks at least having problems with similar streams in non-MSE case, and we have seen some cameras that produce SAP Type 2 streams. I'll add some logging to Chrome to detect this case. Do you have any Firefox telemetry confirming your statement that no-one ever uses H264 with SAP Type 2?

@acolwell
Copy link
Contributor

acolwell commented Jul 6, 2017

@wolenetz I believe I only included SAP Type 2 because I was just quickly looking to see which types mapped to "closed GOPs". I likely didn't pay attention to the fact that it allowed the presentation time of a later access unit to come before the first access unit. I suspect SAP Type 2 is pretty rare since it could cause all sorts of complications. My initial instinct is to just drop frames that have presentation times before the PTS of the first access unit, but I suspect if the encoder is sufficiently crazy this could break decode dependencies. I suppose you would also wiggle out of this by dropping the SAP Type 2 requirement since you likely didn't prove interop on it. :)

I suspect trying to support later access units with earlier presentation times will open a can of worms that just isn't worth solving. I can't really see a way to do it without having full dependency knowledge of the full GOP because the earliest frame could be the last one in decode order if you happen to have a particularly cruel encoder (say one that encodes frames in reverse order!).

@jyavenard
Copy link
Member

The other difficulty added here is that without deep analysis of the h264 stream, there's no real way to know.
All of the MSE specs imply that we work at the container/demuxer level. Which is completely content agnostic.

It would be really unfortunate to change this...

@wolenetz
Copy link
Member Author

wolenetz commented Jul 6, 2017

OK. I'll add some telemetry logging to Chrome and proceed under the assumption that SAP Type 2 occurrences, if they occur, are rare.

Based on the results of such telemetry and perhaps further discussion, we might want to change this issue to "Deprecate ISOBMFF SAP Type 2 support".

@wolenetz wolenetz closed this as completed Jul 6, 2017
@wolenetz wolenetz reopened this Jul 6, 2017
@paulbrucecotton
Copy link

would adding something like lowest start time be considered an erratum?

The W3C Process document clearly defines classes of changes. It would appear that this may be a Type 3 change ie one that might
impact conformance.

From the current HME WG charter:

Note: As of May 2017, the HTML Media Extensions remains to make corrections that do not add new features to its Recommendation(s) accomplished within the scope of the charter.

HME WG is currently chartered to do only Type 1, Type 2 and Type 3 changes. If I am correct that your proposed change is a Type 3 change then it is scope of the HME WG.

/paulc
HME WG Chair

@cconcolato
Copy link

FYI, in GPAC, we have never seen SAP Type 2 used.
And IIRC, it was 'invented' during the DASH standardization within MPEG for the sake of completeness.

@wolenetz
Copy link
Member Author

wolenetz commented Jul 7, 2017

@cconcolato (#187 (comment)) that is more excellent news. Thank you for providing that data point.

@wolenetz
Copy link
Member Author

wolenetz commented Jul 7, 2017

With https://chromium-review.googlesource.com/c/563017/2 applied to Chrome to detect SAP Type 2, spot-checking several sites showed that at least nest.com's desktop site uses SAP Type 2 + MSE; youtube and netflix desktop site spot-checks showed only SAP Type 1. So it appears SAP Type 2 is used at least in some cases. I'm not certain this warrants adding lowest end time case to fully support SAP Type 2 versus a deprecation warning. I've chosen the latter for the moment.

MXEBot pushed a commit to mirror/chromium that referenced this issue Jul 8, 2017
If a random access point doesn't have the earliest presentation time of
other frames that depend on it (eg, other frames in later decode time up
until the next random access point), the MSE spec was not designed to
support processing and buffering it well. With the change to managing
and reporting buffered ranges by PTS intervals instead of DTS intervals,
this could impact interop. This change detects this general case and
logs once per track to chrome://media-internals. Later changes might
include telemetry collection to assist removing or fixing support for
at least SAP Type 2 in the MSE ISOBMFF bytestream spec.

To verify the new log is emitted by the new test, this change also
upgrades FrameProcessorTest's |media_log_| to a StrictMock<MockMediaLog>
and includes new strict verification of logs emitted during
FrameProcessorTests.

See also related spec issue w3c/media-source#187

BUG=739931,718641

Change-Id: I361177dee6a5c70edf17bdbde2f3ea643977e6ec
Reviewed-on: https://chromium-review.googlesource.com/563017
Commit-Queue: Chrome Cunningham <chcunningham@chromium.org>
Reviewed-by: Chrome Cunningham <chcunningham@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485125}
@jyavenard
Copy link
Member

Upon further thought, the use of SAP type 2 will have no negative impact with firefox MSE stack. There's no need for the I-Frame to be the earliest presented frame of the GOP.

@wolenetz
Copy link
Member Author

So far, preliminary (Chrome M61 - dev channel) shows a small % of occurrences of SAP-type-2 usage in MSE. However, this data is very preliminary, I'll update with public numbers once the related telemetry logging is in a more significant portion of Chrome usage.

xqq pushed a commit to xqq/chromium-media that referenced this issue Nov 14, 2017
If a random access point doesn't have the earliest presentation time of
other frames that depend on it (eg, other frames in later decode time up
until the next random access point), the MSE spec was not designed to
support processing and buffering it well. With the change to managing
and reporting buffered ranges by PTS intervals instead of DTS intervals,
this could impact interop. This change detects this general case and
logs once per track to chrome://media-internals. Later changes might
include telemetry collection to assist removing or fixing support for
at least SAP Type 2 in the MSE ISOBMFF bytestream spec.

To verify the new log is emitted by the new test, this change also
upgrades FrameProcessorTest's |media_log_| to a StrictMock<MockMediaLog>
and includes new strict verification of logs emitted during
FrameProcessorTests.

See also related spec issue w3c/media-source#187

BUG=739931,718641

Change-Id: I361177dee6a5c70edf17bdbde2f3ea643977e6ec
Reviewed-on: https://chromium-review.googlesource.com/563017
Commit-Queue: Chrome Cunningham <chcunningham@chromium.org>
Reviewed-by: Chrome Cunningham <chcunningham@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#485125}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 495493bad91c1fb4dd30462bb04b6f79c96ecc88
@jernoble
Copy link

WebKit has received reports that this situation seems to be occurring regularly on CNN.com videos, e.g. this one. Overlapping appends of SAP type 2 content causes "green flashes" where the decoder tries to apply a "type 2" B-frame to a "type 1" I-frame.

@jernoble
Copy link

(Though technically, the content seems to be MPEG-2TS, not strictly ISOBMFF content. But since TS is supported as a MSE stream type, the issue still needs fixing somehow.)

ashkulz pushed a commit to qtwebkit/webkit-mirror that referenced this issue Jan 24, 2020
    [MSE] Decode glitches when watching videos on CNN.com
    https://bugs.webkit.org/show_bug.cgi?id=206412
    <rdar://problem/55685630>

    Reviewed by Xabier Rodriguez-Calvar.

    Source/WebCore:

    Test: media/media-source/media-source-samples-out-of-order.html

    The "Coded frame processing" algorithm has a known shortcoming <w3c/media-source#187>
    when dealing appends of with "SAP Type 2" content, or in general terms, appending data where the resulting samples
    have presentation times that do not increase monotonically. When this occurs, the ordering of samples in presentation
    time will be different from the ordering of samples in decode time. The decoder requires samples to be enqueued in
    decode time order, but the MSE specification only checks for overlapping samples in presentation time order. During
    appends of out-of-order samples, this can lead to new samples being inserted between a previously appended sample and
    the sample on which that sample depends.

    To resolve this, add a new step in the implementation of the "coded frame processing" algorithm in
    SourceBuffer::sourceBufferPrivateDidReceiveSample(). When the incoming frame is a sync sample, search forward
    in the TrackBuffer for all previous samples in between the new sync sample, and the next sync sample. All the
    samples found in this step would fail to decode correctly if enqueued after the new (possibly different resolution)
    sync sample, so they are removed in this step.

    * Modules/mediasource/SampleMap.cpp:
    (WebCore::DecodeOrderSampleMap::findSampleAfterDecodeKey):
    * Modules/mediasource/SampleMap.h:
    * Modules/mediasource/SourceBuffer.cpp:
    (WebCore::SourceBuffer::sourceBufferPrivateDidReceiveSample):

    LayoutTests:

    * media/media-source/media-source-samples-out-of-order-expected.txt: Added.
    * media/media-source/media-source-samples-out-of-order.html: Added.

    git-svn-id: https://svn.webkit.org/repository/webkit/trunk@254761 268f45cc-cd09-0410-ab3c-d52691b4dbfc

git-svn-id: http://svn.webkit.org/repository/webkit/branches/safari-609-branch@255025 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@mwatson2 mwatson2 added this to the V2 milestone Sep 21, 2020
@mwatson2 mwatson2 added the agenda Topic should be discussed in a group call label Sep 21, 2020
@mwatson2 mwatson2 modified the milestones: V2, Backlog Sep 21, 2020
@mwatson2 mwatson2 removed the agenda Topic should be discussed in a group call label Sep 21, 2020
JonWBedard pushed a commit to WebKit/WebKit that referenced this issue Dec 14, 2020
    [MSE] Decode glitches when watching videos on CNN.com
    https://bugs.webkit.org/show_bug.cgi?id=206412
    <rdar://problem/55685630>

    Reviewed by Xabier Rodriguez-Calvar.

    Source/WebCore:

    Test: media/media-source/media-source-samples-out-of-order.html

    The "Coded frame processing" algorithm has a known shortcoming <w3c/media-source#187>
    when dealing appends of with "SAP Type 2" content, or in general terms, appending data where the resulting samples
    have presentation times that do not increase monotonically. When this occurs, the ordering of samples in presentation
    time will be different from the ordering of samples in decode time. The decoder requires samples to be enqueued in
    decode time order, but the MSE specification only checks for overlapping samples in presentation time order. During
    appends of out-of-order samples, this can lead to new samples being inserted between a previously appended sample and
    the sample on which that sample depends.

    To resolve this, add a new step in the implementation of the "coded frame processing" algorithm in
    SourceBuffer::sourceBufferPrivateDidReceiveSample(). When the incoming frame is a sync sample, search forward
    in the TrackBuffer for all previous samples in between the new sync sample, and the next sync sample. All the
    samples found in this step would fail to decode correctly if enqueued after the new (possibly different resolution)
    sync sample, so they are removed in this step.

    * Modules/mediasource/SampleMap.cpp:
    (WebCore::DecodeOrderSampleMap::findSampleAfterDecodeKey):
    * Modules/mediasource/SampleMap.h:
    * Modules/mediasource/SourceBuffer.cpp:
    (WebCore::SourceBuffer::sourceBufferPrivateDidReceiveSample):

    LayoutTests:

    * media/media-source/media-source-samples-out-of-order-expected.txt: Added.
    * media/media-source/media-source-samples-out-of-order.html: Added.

    Identifier: 219507@main
    git-svn-id: https://svn.webkit.org/repository/webkit/trunk@254761 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Identifier: 218903.102@safari-609-branch
git-svn-id: https://svn.webkit.org/repository/webkit/branches/safari-609-branch@255025 268f45cc-cd09-0410-ab3c-d52691b4dbfc
ryanhaddad pushed a commit to WebKit/WebKit that referenced this issue Dec 22, 2020
https://bugs.webkit.org/show_bug.cgi?id=206412
<rdar://problem/55685630>

Reviewed by Xabier Rodriguez-Calvar.

Source/WebCore:

Test: media/media-source/media-source-samples-out-of-order.html

The "Coded frame processing" algorithm has a known shortcoming <w3c/media-source#187>
when dealing appends of with "SAP Type 2" content, or in general terms, appending data where the resulting samples
have presentation times that do not increase monotonically. When this occurs, the ordering of samples in presentation
time will be different from the ordering of samples in decode time. The decoder requires samples to be enqueued in
decode time order, but the MSE specification only checks for overlapping samples in presentation time order. During
appends of out-of-order samples, this can lead to new samples being inserted between a previously appended sample and
the sample on which that sample depends.

To resolve this, add a new step in the implementation of the "coded frame processing" algorithm in
SourceBuffer::sourceBufferPrivateDidReceiveSample(). When the incoming frame is a sync sample, search forward
in the TrackBuffer for all previous samples in between the new sync sample, and the next sync sample. All the
samples found in this step would fail to decode correctly if enqueued after the new (possibly different resolution)
sync sample, so they are removed in this step.

* Modules/mediasource/SampleMap.cpp:
(WebCore::DecodeOrderSampleMap::findSampleAfterDecodeKey):
* Modules/mediasource/SampleMap.h:
* Modules/mediasource/SourceBuffer.cpp:
(WebCore::SourceBuffer::sourceBufferPrivateDidReceiveSample):

LayoutTests:

* media/media-source/media-source-samples-out-of-order-expected.txt: Added.
* media/media-source/media-source-samples-out-of-order.html: Added.


Canonical link: https://commits.webkit.org/219507@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@254761 268f45cc-cd09-0410-ab3c-d52691b4dbfc
JonWBedard pushed a commit to WebKit/WebKit that referenced this issue Jan 6, 2021
    [MSE] Decode glitches when watching videos on CNN.com
    https://bugs.webkit.org/show_bug.cgi?id=206412
    <rdar://problem/55685630>

    Reviewed by Xabier Rodriguez-Calvar.

    Source/WebCore:

    Test: media/media-source/media-source-samples-out-of-order.html

    The "Coded frame processing" algorithm has a known shortcoming <w3c/media-source#187>
    when dealing appends of with "SAP Type 2" content, or in general terms, appending data where the resulting samples
    have presentation times that do not increase monotonically. When this occurs, the ordering of samples in presentation
    time will be different from the ordering of samples in decode time. The decoder requires samples to be enqueued in
    decode time order, but the MSE specification only checks for overlapping samples in presentation time order. During
    appends of out-of-order samples, this can lead to new samples being inserted between a previously appended sample and
    the sample on which that sample depends.

    To resolve this, add a new step in the implementation of the "coded frame processing" algorithm in
    SourceBuffer::sourceBufferPrivateDidReceiveSample(). When the incoming frame is a sync sample, search forward
    in the TrackBuffer for all previous samples in between the new sync sample, and the next sync sample. All the
    samples found in this step would fail to decode correctly if enqueued after the new (possibly different resolution)
    sync sample, so they are removed in this step.

    * Modules/mediasource/SampleMap.cpp:
    (WebCore::DecodeOrderSampleMap::findSampleAfterDecodeKey):
    * Modules/mediasource/SampleMap.h:
    * Modules/mediasource/SourceBuffer.cpp:
    (WebCore::SourceBuffer::sourceBufferPrivateDidReceiveSample):

    LayoutTests:

    * media/media-source/media-source-samples-out-of-order-expected.txt: Added.
    * media/media-source/media-source-samples-out-of-order.html: Added.

    Canonical link: https://commits.webkit.org/219507@main
    git-svn-id: https://svn.webkit.org/repository/webkit/trunk@254761 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Canonical link: https://commits.webkit.org/218903.102@safari-609-branch
git-svn-id: https://svn.webkit.org/repository/webkit/branches/safari-609-branch@255025 268f45cc-cd09-0410-ab3c-d52691b4dbfc
JonWBedard pushed a commit to WebKit/WebKit that referenced this issue Jan 27, 2022
    [MSE] Decode glitches when watching videos on CNN.com
    https://bugs.webkit.org/show_bug.cgi?id=206412
    <rdar://problem/55685630>

    Reviewed by Xabier Rodriguez-Calvar.

    Source/WebCore:

    Test: media/media-source/media-source-samples-out-of-order.html

    The "Coded frame processing" algorithm has a known shortcoming <w3c/media-source#187>
    when dealing appends of with "SAP Type 2" content, or in general terms, appending data where the resulting samples
    have presentation times that do not increase monotonically. When this occurs, the ordering of samples in presentation
    time will be different from the ordering of samples in decode time. The decoder requires samples to be enqueued in
    decode time order, but the MSE specification only checks for overlapping samples in presentation time order. During
    appends of out-of-order samples, this can lead to new samples being inserted between a previously appended sample and
    the sample on which that sample depends.

    To resolve this, add a new step in the implementation of the "coded frame processing" algorithm in
    SourceBuffer::sourceBufferPrivateDidReceiveSample(). When the incoming frame is a sync sample, search forward
    in the TrackBuffer for all previous samples in between the new sync sample, and the next sync sample. All the
    samples found in this step would fail to decode correctly if enqueued after the new (possibly different resolution)
    sync sample, so they are removed in this step.

    * Modules/mediasource/SampleMap.cpp:
    (WebCore::DecodeOrderSampleMap::findSampleAfterDecodeKey):
    * Modules/mediasource/SampleMap.h:
    * Modules/mediasource/SourceBuffer.cpp:
    (WebCore::SourceBuffer::sourceBufferPrivateDidReceiveSample):

    LayoutTests:

    * media/media-source/media-source-samples-out-of-order-expected.txt: Added.
    * media/media-source/media-source-samples-out-of-order.html: Added.

    git-svn-id: https://svn.webkit.org/repository/webkit/trunk@254761 268f45cc-cd09-0410-ab3c-d52691b4dbfc

git-svn-id: https://svn.webkit.org/repository/webkit/branches/safari-609-branch@255025 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@wolenetz
Copy link
Member Author

wolenetz commented Feb 9, 2022

Chrome has a user that reported SAP-Type-2 buffering causing gaps, due to buffering segments backwards in (chunk3,2,1 for example). Obtaining clarity in the spec would help implementations both support SAP-Type-2 in MSE and also improve interop for such media. See https://crbug.com/1008756

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants