
Add background segmentation mask #142

Open
eehakkin wants to merge 1 commit into main

Conversation

@eehakkin (Contributor) commented May 8, 2024

Hi!

This adds capabilities, constraints and settings for a background segmentation mask. Those are fairly obvious.

For the feature to be useful, the actual background segmentation mask must be provided to web apps. There are various ways to do that:

  1. In my PoC, I changed the stream of video frames to be a stream of interleaved background segmentation mask frames and real video frames, and extended the video frame metadata with a background segmentation mask flag so that web apps can tell the segmentation mask frames and the real video frames apart.
    However, that makes such streams awkward to process and it is very unclear how they should be encoded.
  2. In this PR, the real video frame and the background segmentation mask frame are bundled together, which simplifies processing of the streams and allows encoders to encode the real video frames normally. The background segmentation mask frames, for their part, are mostly for local consumption only. (See the sketch after this list.)
  3. Another option would be to utilize an alpha channel. However, there are problems with that approach:
    • Some pixel formats (such as NV12 and NV21) do not have corresponding alpha channel formats. So it would not be possible to add such an alpha channel and then later drop it in order to get the original frame back. Instead, the whole frame would have to be converted to a different format.
    • There are no canvas composite operations, for instance, that operate on alpha masks, whereas they work great with grayscale masks.
    • Pixels which are certainly background would be completely transparent. For a completely transparent pixel the color is practically irrelevant, and some compression algorithms could group all completely transparent pixels together and thus lose the color information.
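
A minimal sketch (not part of this PR text) of how a web app could consume option 2, assuming the backgroundSegmentationMask constraint and VideoFrame attribute proposed here together with the MediaStreamTrackProcessor insertable-streams API:

```js
// Sketch only: backgroundSegmentationMask (both the constraint and the
// VideoFrame attribute) is the API proposed in this PR, not a shipped API;
// MediaStreamTrackProcessor is the insertable-streams API.
async function processMaskedFrames() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { backgroundSegmentationMask: true },
  });
  const [track] = stream.getVideoTracks();
  const reader =
      new MediaStreamTrackProcessor({ track }).readable.getReader();

  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;

    // With option 2 the mask rides along with the real video frame instead
    // of being interleaved into the stream as a separate frame.
    const mask = frame.backgroundSegmentationMask; // VideoFrame or null
    if (mask) {
      // ... composite locally; encoders only ever see the real frame.
      mask.close();
    }
    frame.close();
  }
}
```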

/cc @riju


Preview | Diff

@riju commented May 14, 2024

Thanks @eehakkin
In the explainer, we list the differences between Blur and Mask, provide example code to create a green screen using this feature, and include a demo of what BG Segmentation MASK looks like in our Chrome PoC and what you can do with it (replacement, gif, image, green screen, etc.).
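
For readers of this thread, a rough sketch of that kind of green-screen composite (this is not the explainer's code); it assumes the mask is an opaque grayscale VideoFrame with white denoting foreground, and it uses only standard 2D canvas blending:

```js
// frame and mask are VideoFrames of the same size; canvas receives the
// composited result. Result = frame * mask + green * (1 - mask).
function greenScreen(frame, mask, canvas) {
  const ctx = canvas.getContext('2d');
  const { width, height } = canvas;

  // Foreground term: frame * mask.
  ctx.globalCompositeOperation = 'source-over';
  ctx.drawImage(mask, 0, 0, width, height);
  ctx.globalCompositeOperation = 'multiply';
  ctx.drawImage(frame, 0, 0, width, height);

  // Background term: green * (1 - mask), built on a scratch canvas.
  // Where ctx.filter is unavailable, invert the mask per pixel instead.
  const scratch = new OffscreenCanvas(width, height);
  const sctx = scratch.getContext('2d');
  sctx.fillStyle = '#00ff00';
  sctx.fillRect(0, 0, width, height);
  sctx.globalCompositeOperation = 'multiply';
  sctx.filter = 'invert(1)';
  sctx.drawImage(mask, 0, 0, width, height);

  // Add the two terms together.
  ctx.globalCompositeOperation = 'lighter';
  ctx.drawImage(scratch, 0, 0);
  ctx.globalCompositeOperation = 'source-over';
}
```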

In many cases it might be important to have access to the original camera feed, so BG MASK keeps the original frames intact, performs segmentation and provides mask frames in addition to the original video frames; web applications thus receive both the original frames and the mask frames in the same video frame stream.

This PR follows up our presentation of BG Segmentation MASK in the monthly WebRTC WG call [Minutes].

PTAL @jan-ivar @aboba @alvestrand @youennf

@eladalon1983 self-requested a review May 14, 2024 11:27
@eladalon1983 (Member) left a comment

I think the general thrust of this effort is very useful for Web applications.

<p>A background segmentation mask with
white denoting certainly foreground,
black denoting certainly background and
grey denoting uncertainty.</p>
@eladalon1983 (Member) commented:

Is it really only "uncertainty" that's represented? Is it perhaps sometimes partial transparency, and sometimes ambiguity?

Could anything be said here to clarify that shades of grey tend more towards the foreground/background based on being lighter/darker?
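
One possible, non-normative reading, just to make the question concrete: treat each mask pixel's luma as a foreground confidence, so lighter shades lean foreground and darker shades lean background. A sketch assuming the mask is drawable to a 2D canvas; the 0.5 cut-off is arbitrary:

```js
// Classify each mask pixel as foreground (1) or background (0) by
// thresholding its grayscale value, read back through a 2D canvas.
function classifyMask(mask, threshold = 0.5) {
  const canvas = new OffscreenCanvas(mask.displayWidth, mask.displayHeight);
  const ctx = canvas.getContext('2d', { willReadFrequently: true });
  ctx.drawImage(mask, 0, 0);
  const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);

  const labels = new Uint8Array(data.length / 4);
  for (let i = 0; i < labels.length; ++i) {
    const confidence = data[i * 4] / 255; // grayscale, so R = G = B
    labels[i] = confidence >= threshold ? 1 : 0;
  }
  return labels;
}
```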

<h3>VideoFrame interface extensions</h3>
<pre class="idl">
partial interface VideoFrame {
  readonly attribute VideoFrame? backgroundSegmentationMask;
@eladalon1983 (Member) commented:

I imagine this isn't going to suffer infinite recursion because the second layer deep will be guaranteed nullable. But it still strikes me as a bit odd to expose a full VideoFrame here, with all its present and future fields, when what we really wish to get is a matrix of integer values of a limited range.

};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundSegmentationMask;
@eladalon1983 (Member) commented:

Would it ever be interesting and feasible to tweak the parameters by which segmentation is done?

@riju commented May 15, 2024:

At least on Windows, the platform model does not allow tweaking segmentation parameters today. Using tensorflow.js with the BodyPix model for Blur, I see there is at least a segmentationThreshold parameter. Maybe it's the same as foregroundThresholdProbability with the MediaPipeSelfieSegmentation model?

Did you have some other parameters in mind?

[screenshot: mediapipe_parameters]

@eladalon1983 (Member) replied:

> Did you have some other parameters in mind?

I am not knowledgeable enough on what parameters would be best to include. I was mostly wondering if this is something we foresee extending from a boolean to a set of parameters, and if so, whether there was a viable path for such future extensions given the current API shape.
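
For reference, a sketch of the boolean-only shape under discussion, assuming the capability surfaces as a sequence of booleans like other boolean constrainable properties (e.g. echoCancellation). If this ever grows beyond a boolean, applyConstraints would presumably remain the entry point, but nothing like that is specified in this PR:

```js
// Sketch: request the mask via the constraint proposed in this PR, then
// check whether the track actually enabled it.
async function requestMask() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { backgroundSegmentationMask: true },
  });
  const [track] = stream.getVideoTracks();

  const capability = track.getCapabilities().backgroundSegmentationMask;
  const enabled = track.getSettings().backgroundSegmentationMask;
  console.log({ supported: capability?.includes(true), enabled });

  // The constraint can also be toggled later without reopening the camera.
  await track.applyConstraints({ backgroundSegmentationMask: false });
  return track;
}
```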
