
Extend EncodedVideoChunkMetadata for Spatial Scalability #756

Open · wants to merge 4 commits into base: main
Conversation

@aboba (Collaborator) commented Jan 4, 2024

Fixes #619

Rebase and update of PR #654

Related: w3c/webrtc-encoded-transform#220


💥 Error: 400 Bad Request 💥

PR Preview failed to build. (Last tried on Jan 9, 2024, 10:27 PM UTC.)

PR Preview relies on a number of web services to run. There seems to be an issue with the following one:

🚨 CSS Spec Preprocessor, the web service used to build Bikeshed specs.

Error running preprocessor, returned code: 2.
FATAL ERROR: Couldn't find target frameId 'dict-member':
<span data-dict-member-info="" for="EncodedVideoChunkMetadat/frameId"></span>
 ✘  Did not generate, due to errors exceeding the allowed error level.

@aboba (Collaborator, Author) commented Jan 4, 2024

@kalradivyanshu @fippo PTAL.

@kalradivyanshu

@aboba Looks good. So we just set L3T3 in the encoder, each frame will tell us which spatial and temporal layer it belongs to and which frames are its dependencies, and then nothing changes in the decoder: we just make sure all of a frame's dependencies are fed in before the frame itself, and it just works, right?
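For context, the mode string mentioned above can be unpacked into layer counts. This is an illustrative sketch only: `parseScalabilityMode` is a hypothetical helper, not a WebCodecs API; the real encoder is configured simply by setting `VideoEncoderConfig.scalabilityMode` to the string.

```javascript
// Hypothetical helper (not part of WebCodecs): derive layer counts from a
// scalabilityMode string such as "L3T3" or "L2T2_KEY". Modes of the form
// "L<s>T<t>" describe <s> spatial layers and <t> temporal layers.
function parseScalabilityMode(mode) {
  const m = /^L(\d+)T(\d+)/.exec(mode);
  if (!m) throw new Error(`unrecognized scalabilityMode: ${mode}`);
  return { spatialLayers: Number(m[1]), temporalLayers: Number(m[2]) };
}
```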

@aboba (Collaborator, Author) commented Jan 4, 2024

For a frame to be decodable, all its dependencies need to have been received and decoded without an error callback. From the conference server perspective, this means not only tracking what frames were sent to each participant, but also the transport status (whether the frame was completely received) and whether it was successfully decoded. Currently the underlying encoder API limits avenues available for repair to keyframe generation, retransmission and forward error correction. Alternate Long Term Reference (LTR) frames or layer refresh (LRR) are not yet supported.
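The bookkeeping described above can be sketched as follows. This is a minimal illustration under assumed names (`FrameTracker`, `markDecoded`, `isDecodable` are not real APIs): a conference server tracks which frames each participant has decoded without error, and forwards a frame only when all of its dependencies are satisfied.

```javascript
// Sketch: per-participant dependency tracking for SVC frames.
// A frame is decodable only if every frame it depends on was received
// and decoded without an error callback.
class FrameTracker {
  constructor() {
    this.decoded = new Set(); // frameIds decoded without error
  }
  markDecoded(frameId) {
    this.decoded.add(frameId);
  }
  isDecodable(frame) {
    // frame: { frameId, dependencies: [frameId, ...] }
    return frame.dependencies.every((id) => this.decoded.has(id));
  }
}
```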

index.src.html Outdated
8. If |encoderConfig|.{{VideoEncoderConfig/alpha}} is set to `"keep"`:
8. If |encoderConfig|.{{VideoEncoderConfig/scalabilityMode}}
describes multiple [=spatial layers=]:
1. Let |svc| be a new {{SvcOutputMetadata}} instance.
Contributor

This will override everything produced by step 8 for SVC modes that have both temporal and spatial layers, for example L2T2.

Collaborator Author

I think I have fixed this. PTAL.

index.src.html Outdated
@@ -1704,6 +1717,9 @@

dictionary SvcOutputMetadata {
unsigned long temporalLayerId;
unsigned long spatialLayerId;
unsigned long long frameId;
Contributor

frameId and dependencies are never set

Collaborator Author

@Djuffin @tonyherre dependencies would be set to the sequence of frameId values that the encodedChunk depends on. But the bigger question is the behavior of frameId, which is discussed here: w3c/webrtc-encoded-transform#220
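Assuming the proposed semantics above (each chunk's metadata carries a `frameId` and a `dependencies` sequence of earlier `frameId` values), an application could order arriving chunks so that nothing is fed to the decoder before its dependencies. A sketch with a hypothetical `decodableOrder` function:

```javascript
// Sketch: given chunks in arrival order, each { frameId, dependencies },
// return the frameIds in an order where every frame's dependencies
// precede it. Frames whose dependencies never arrive are simply omitted.
function decodableOrder(chunks) {
  const fed = new Set();
  const order = [];
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const c of chunks) {
      if (fed.has(c.frameId)) continue;
      if (c.dependencies.every((d) => fed.has(d))) {
        fed.add(c.frameId);
        order.push(c.frameId);
        progressed = true;
      }
    }
  }
  return order;
}
```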

@aboba (Collaborator, Author) commented Jan 5, 2024

@tonyherre PTAL

index.src.html Outdated
@kalradivyanshu

Thank you so much for this, @aboba. What is left in this PR to get it accepted into the spec?

@aboba (Collaborator, Author) commented Feb 13, 2024

@kalradivyanshu It has been noted that spatial scalability is not widely used today because it is not hardware accelerated and therefore creates power and thermal issues on mobile devices. As a result, applications use spatial simulcast instead. Also, the current WebCodecs API does not support layer refresh, which means that if a spatial frame is lost, a base-layer keyframe is required, rather than just creating a new spatial frame referencing a received base-layer frame (e.g. moving to a new Long-Term Reference).

@Djuffin has argued that these problems need to be fixed before spatial scalability could become popular in WebCodecs, and therefore that it would make sense to focus on a new encoder API that can address the problems rather than just shipping a (potentially unusable) feature.

@kalradivyanshu

Oh, OK. A couple of things:

  1. In my view, spatial scalability is critical under network constraints. In video streaming/communication there is always a trade-off between CPU and network: yes, more CPU will be used, but simulcast needs a keyframe whenever switching layers (especially since WebCodecs doesn't support AV1 switch frames, see "Support for AV1 switch frames" #747). And even with hardware acceleration, keyframes are notoriously much bigger, so not only will they clog the network, they will cause CPU issues as well.

  2. Even with simulcast, if I lose a frame I still have to rely on a keyframe, so the problem remains: switching layers means another keyframe, so either the application has to generate keyframes regularly, or build feedback mechanisms like PLI, which cause scaling issues when broadcasting to a large audience.

While I agree with the issues @Djuffin raised, I honestly feel that since the new API is at least a year away, spatial scalability should be added, or at the very least features like switch frames should be added to make simulcast more usable. Without either, the only solution is simulcast with a keyframe request for every switch, which adds a huge load on the encoder, the decoder, and the network.

Thank you both for all your work!

Successfully merging this pull request may close these issues.

Inconsistent SVC metadata between WebCodecs and WebRTC Encoded Transform API
4 participants