RTCRtpEncodingParameters: scaleResolutionTo #159

henbos · 2023-04-13T08:23:41Z

We all know and love scaleResolutionDownBy... it let's you do this:

Capture 720p and apply expensive video effects on the track.
Send {active, scaleResolutionDownBy:1 (720p)} + {active, scaleResolutionDownBy:2 (360p)} simulcast.

But then server tells us 720p is not needed, so we send {inactive, active:360p}.
Now you might wonder, why are we applying expensive video effects on a 720p track if we're only sending it in 360p?

So we track.applyConstraints() to 1/2 downscale the track and we setParameters() to 2x the scaling factors to counter the fact that the input frame size is smaller. The end result is:

360p track sent as {inactive} + {active, scaleResolutionDownBy:1 (still 360p)}.

Problem: This is racy. The track changing size and the parameters updating the scaling factors are not in-sync, so you might send 720p on the VGA layers for a few frames. You're wasting a key frame on sending the wrong resolution and then you're wasting another key frame to adjust back to the desired 360p resolution. Working around this by temporarily inactivating all encodings while the track is resizing will appear like glitches to the receiver of the stream and there probably would still have to be key frames when the encodings are re-activated. The API was not designed for this.

Proposal: Add "scaleResolutionTo: 360" API which is equivalent to always having the correct scaling factor. In the above example, changing the track's size from 720p to 360p would not result in reconfiguring the encoder and there would be no key frames or races.

The text was updated successfully, but these errors were encountered:

henbos · 2023-04-13T08:26:27Z

Something like this already exists in libwebrtc (RtpEncodingParameters::requested_resolution) so it might be fairly straight-forward to experiment with this in JavaScript

henbos · 2023-04-13T08:31:55Z

@aboba

fippo · 2023-04-13T09:23:36Z

scaling resolution down by a factor (to make things worse: a freely choosable one instead of powers of two) has always been confusing so 👍 for fixing this.

Orphis · 2023-04-13T10:38:19Z

We may not want to have a single value, but a few more to describe how the change is done. While some resolutions are standard and we usually can recognise the usual ones from one dimension, we need both to work with both portrait and landscape tracks.

Quite often, the ratio will be different, so the behaviour needs to be defined there too. Do we want to rescale? Keep the ratio but crop? Or add black bars? Or fit to the height or width parameter and crop the outside parts?
If we're doing this, do we want crop coordinates too in the API? (for example to frame a participants head)

Orphis · 2023-04-13T10:40:51Z

Also, what happens if the track is smaller than the requested resolution? Should the aspect ratio be changed to match the requested resolution's ratio? So is it a min value, max value or target value?

henbos · 2023-04-13T10:52:21Z

Landscape vs portrait mode has to be addressed. There are a few different options I can think of:

If we use a single pixel value (e.g. 360 for 360p) then it could be defined as "height if height < width, otherwise width".
If a {width, height} dictionary is specified, we could crop-and-scale according to some rule.
Or we add more attributes to increase specificity from the app, perhaps something like VideoResizeModeEnum.

Regarding if the track is smaller than requested: I don't think we should ever up-scale, so I only see two options:

Either we send it as-is with scale factor 1.
Or we implicitly inactive that encoding.

I don't have strong opinions, just listing options.

dontcallmedom-bot · 2023-05-16T17:00:37Z

This issue had an associated resolution in WebRTC May 2023 meeting – 16 May 2023 (Issue #159: RTCRtpEncodingParameters: scaleResolutionDownTo):

RESOLUTION: Overall support in the direction, but details need to be fleshed out and complexity assessed

aigarrz · 2024-02-19T11:57:40Z

What should happen when video frame adapter reduces input resolution to below requested size?

Example:

Requested to send two stream simulcast: 720p @ 1M + 540p @ 1.2M
Initially input frame size is 720p
EncoderOveruse triggers down adaptation and reduces input frame size to 540p
Which of these layouts should be sent to peer?
[A] 540p @ 1M + 540p @ 1.2M
[B] 540p @ 1.2M
[C] 540p @ 1M

If the peer is a SFU, only option C avoids bitrate overshooting E2E.

aboba assigned henbos May 14, 2023

aboba added the May 2023 Virtual Interim label May 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RTCRtpEncodingParameters: scaleResolutionTo #159

RTCRtpEncodingParameters: scaleResolutionTo #159

henbos commented Apr 13, 2023

henbos commented Apr 13, 2023

henbos commented Apr 13, 2023

fippo commented Apr 13, 2023 •

edited

Orphis commented Apr 13, 2023

Orphis commented Apr 13, 2023

henbos commented Apr 13, 2023 •

edited

dontcallmedom-bot commented May 16, 2023

aigarrz commented Feb 19, 2024

RTCRtpEncodingParameters: scaleResolutionTo #159

RTCRtpEncodingParameters: scaleResolutionTo #159

Comments

henbos commented Apr 13, 2023

henbos commented Apr 13, 2023

henbos commented Apr 13, 2023

fippo commented Apr 13, 2023 • edited

Orphis commented Apr 13, 2023

Orphis commented Apr 13, 2023

henbos commented Apr 13, 2023 • edited

dontcallmedom-bot commented May 16, 2023

aigarrz commented Feb 19, 2024

fippo commented Apr 13, 2023 •

edited

henbos commented Apr 13, 2023 •

edited