
Add background segmentation mask #142

Open
eehakkin wants to merge 1 commit into main

Conversation

@eehakkin (Contributor) commented May 8, 2024

Hi!

This adds capabilities, constraints and settings for a background segmentation mask. Those are fairly obvious.

For the feature to be useful, the actual background segmentation mask must be provided to web apps. There are various ways to do that:

  1. In my PoC, I changed the stream of video frames to be a stream of interleaved background segmentation mask frames and real video frames, and extended the video frame metadata with a background segmentation mask flag so that web apps can tell the segmentation mask frames and the real video frames apart.
    However, that makes such streams awkward to process and it is very unclear how they should be encoded.
  2. In this PR, the real video frame and the background segmentation mask frame are bundled together, which simplifies processing of the streams and allows encoders to encode the real video frames normally. The background segmentation mask frames, for their part, are mostly for local consumption only. (See the sketch after this list.)
  3. Another option would be to utilize an alpha channel. However, there are problems with that approach:
    • Some pixel formats (such as NV12 and NV21) do not have corresponding alpha channel formats. So it would not be possible to add such an alpha channel and then later drop it in order to get the original frame back. Instead, the whole frame would have to be converted to a different format.
    • There are no canvas composite operations, for instance, that operate on alpha masks, whereas they work great with grayscale masks.
    • Pixels which are certainly background would be completely transparent. For a completely transparent pixel the color is practically irrelevant, and some compression algorithms could group all completely transparent pixels together and thus lose the color information.
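
A minimal sketch (not part of this PR text) of how a web app could consume option 2, assuming the backgroundSegmentationMask constraint and VideoFrame attribute proposed here together with the MediaStreamTrackProcessor insertable-streams API:

```js
// Sketch only: backgroundSegmentationMask (both the constraint and the
// VideoFrame attribute) is the API proposed in this PR, not a shipped API;
// MediaStreamTrackProcessor is the insertable-streams API.
async function processMaskedFrames() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { backgroundSegmentationMask: true },
  });
  const [track] = stream.getVideoTracks();
  const reader =
      new MediaStreamTrackProcessor({ track }).readable.getReader();

  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;

    // With option 2 the mask rides along with the real video frame instead
    // of being interleaved into the stream as a separate frame.
    const mask = frame.backgroundSegmentationMask; // VideoFrame or null
    if (mask) {
      // ... composite locally; encoders only ever see the real frame.
      mask.close();
    }
    frame.close();
  }
}
```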

/cc @riju


Preview | Diff

@riju commented May 14, 2024

Thanks @eehakkin
In the explainer, we list the differences between Blur and Mask, provide example code to create a green screen using this feature, and include a demo of what BG Segmentation MASK looks like in our Chrome PoC and what you can do with it (replacement, gif, image, green screen, etc.).
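
For readers of this thread, a rough sketch of that kind of green-screen composite (this is not the explainer's code); it assumes the mask is an opaque grayscale VideoFrame with white denoting foreground, and it uses only standard 2D canvas blending:

```js
// frame and mask are VideoFrames of the same size; canvas receives the
// composited result. Result = frame * mask + green * (1 - mask).
function greenScreen(frame, mask, canvas) {
  const ctx = canvas.getContext('2d');
  const { width, height } = canvas;

  // Foreground term: frame * mask.
  ctx.globalCompositeOperation = 'source-over';
  ctx.drawImage(mask, 0, 0, width, height);
  ctx.globalCompositeOperation = 'multiply';
  ctx.drawImage(frame, 0, 0, width, height);

  // Background term: green * (1 - mask), built on a scratch canvas.
  // Where ctx.filter is unavailable, invert the mask per pixel instead.
  const scratch = new OffscreenCanvas(width, height);
  const sctx = scratch.getContext('2d');
  sctx.fillStyle = '#00ff00';
  sctx.fillRect(0, 0, width, height);
  sctx.globalCompositeOperation = 'multiply';
  sctx.filter = 'invert(1)';
  sctx.drawImage(mask, 0, 0, width, height);

  // Add the two terms together.
  ctx.globalCompositeOperation = 'lighter';
  ctx.drawImage(scratch, 0, 0);
  ctx.globalCompositeOperation = 'source-over';
}
```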

In many cases it might be important to have access to the original camera feed, so BG MASK keeps the original frames intact, performs segmentation and provides mask frames in addition to the original video frames; web applications thus receive both the original frames and the mask frames in the same video frame stream.

This PR follows up our presentation of BG Segmentation MASK in the monthly WebRTC WG call [Minutes].

PTAL @jan-ivar @aboba @alvestrand @youennf

@eladalon1983 self-requested a review May 14, 2024 11:27
@eladalon1983 (Member) left a comment

I think the general thrust of this effort is very useful for Web applications.

<p>A background segmentation mask with
white denoting certainly foreground,
black denoting certainly background and
grey denoting uncertainty.</p>
@eladalon1983 (Member) commented:

Is it really only "uncertainty" that's represented? Is it perhaps sometimes partial transparency, and sometimes ambiguity?

Could anything be said here to clarify that shades of grey tend more towards the foreground/background based on being lighter/darker?
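
One possible, non-normative reading, just to make the question concrete: treat each mask pixel's luma as a foreground confidence, so lighter shades lean foreground and darker shades lean background. A sketch assuming the mask is drawable to a 2D canvas; the 0.5 cut-off is arbitrary:

```js
// Classify each mask pixel as foreground (1) or background (0) by
// thresholding its grayscale value, read back through a 2D canvas.
function classifyMask(mask, threshold = 0.5) {
  const canvas = new OffscreenCanvas(mask.displayWidth, mask.displayHeight);
  const ctx = canvas.getContext('2d', { willReadFrequently: true });
  ctx.drawImage(mask, 0, 0);
  const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);

  const labels = new Uint8Array(data.length / 4);
  for (let i = 0; i < labels.length; ++i) {
    const confidence = data[i * 4] / 255; // grayscale, so R = G = B
    labels[i] = confidence >= threshold ? 1 : 0;
  }
  return labels;
}
```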

<h3>VideoFrame interface extensions</h3>
<pre class="idl">
partial interface VideoFrame {
  readonly attribute VideoFrame? backgroundSegmentationMask;
@eladalon1983 (Member) commented:

I imagine this isn't going to suffer infinite recursion because the second layer deep will be guaranteed nullable. But it still strikes me as a bit odd to expose a full VideoFrame here, with all its present and future fields, when what we really wish to get is a matrix of integer values of a limited range.

};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundSegmentationMask;
@eladalon1983 (Member) commented:

Would it ever be interesting and feasible to tweak the parameters by which segmentation is done?

@riju commented May 15, 2024:

At least on Windows, the platform model does not allow tweaking segmentation parameters today. Using tensorflow.js with the BodyPix model for Blur, I see there is at least a segmentationThreshold parameter. Maybe it's the same as foregroundThresholdProbability with the MediaPipeSelfieSegmentation model?

Did you have some other parameters in mind?

[screenshot: mediapipe_parameters]

@eladalon1983 (Member) replied:

> Did you have some other parameters in mind?

I am not knowledgeable enough on what parameters would be best to include. I was mostly wondering if this is something we foresee extending from a boolean to a set of parameters, and if so, whether there was a viable path for such future extensions given the current API shape.
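
For reference, a sketch of the boolean-only shape under discussion, assuming the capability surfaces as a sequence of booleans like other boolean constrainable properties (e.g. echoCancellation). If this ever grows beyond a boolean, applyConstraints would presumably remain the entry point, but nothing like that is specified in this PR:

```js
// Sketch: request the mask via the constraint proposed in this PR, then
// check whether the track actually enabled it.
async function requestMask() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { backgroundSegmentationMask: true },
  });
  const [track] = stream.getVideoTracks();

  const capability = track.getCapabilities().backgroundSegmentationMask;
  const enabled = track.getSettings().backgroundSegmentationMask;
  console.log({ supported: capability?.includes(true), enabled });

  // The constraint can also be toggled later without reopening the camera.
  await track.applyConstraints({ backgroundSegmentationMask: false });
  return track;
}
```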
