
Google Meet background segmentation model #4177

Closed
jameshfisher opened this issue Nov 3, 2020 · 108 comments
@jameshfisher
Contributor

jameshfisher commented Nov 3, 2020

System information

  • TensorFlow.js version (you are using): 2
  • Are you willing to contribute it (Yes/No): No, it's not mine

Describe the feature and the current behavior/state.
This Google AI blog post describes the background segmentation model used in Google Meet. This model would be an excellent complement to the models in the tfjs-models collection. (The existing BodyPix model can be (ab)used for background segmentation, but has quality and performance issues for this use-case. I expect the Google Meet model improves on this.)

Will this change the current api? How?
No, it would be an addition to tfjs-models.

Who will benefit with this feature?
Apps consuming and/or displaying a user-facing camera feed. WebRTC video chat apps are the most obvious, where background blur/replacement is becoming expected. I also expect it could be a useful preprocessing step before applying e.g. PoseNet. It can also be used creatively on still images -- for example, this recent app for enhancing profile pictures integrates a background segmentation solution.
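
For context, the BodyPix workaround looks roughly like this (a minimal sketch against the @tensorflow-models/body-pix API, assuming existing video and canvas DOM elements and an async context; the blur amounts are illustrative):

// Minimal sketch of the current BodyPix background-blur workaround.
import * as bodyPix from '@tensorflow-models/body-pix';

const net = await bodyPix.load();
const segmentation = await net.segmentPerson(video);
// Blur everything outside the person mask; the edge blur softens the boundary.
bodyPix.drawBokehEffect(canvas, video, segmentation, 9, 3);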

@jameshfisher added the type:feature (New feature or request) label on Nov 3, 2020
@rthadur
Contributor

rthadur commented Nov 3, 2020

cc @annxingyuan @tafsiri

@jameshfisher changed the title from "Google Meet background detection model" to "Google Meet background segmentation model" on Nov 3, 2020
@simon-lanf

This would be useful for us.

@tafsiri
Contributor

tafsiri commented Nov 6, 2020

I'll pass this on to our PM.

@jameshfisher
Contributor Author

Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

@jasonmayes
Member

+1 to this! I'd love to see this as part of the TFJS model repos. A lot of people are making Chrome extensions to do great things in video calls etc., and this would make those experiences even more efficient, running at higher FPS.

@alvaroschipper

+1 to this, it would be a great, faster alternative to BodyPix. Really impressed by the performance in Google Meet :)

@kirawi

kirawi commented Dec 16, 2020

Very desirable to have! I just linked to this issue from the Jitsi Meet repository, but I think it would also be very useful for other projects that need this functionality and don't have the capability to develop an in-house model.

@jameshfisher
Contributor Author

jameshfisher commented Dec 16, 2020

The blog post about this model links to this Model Card describing the model, which reads

LICENSED UNDER Apache License, Version 2.0

The Model Card also links to this paper describing Model Cards in general, which says that Model Cards can describe the license of the model they document. So I believe the above license applies to the described model itself (i.e. rather than to the Model Card document).

So it seems like the raw .tflite model here is already Apache-licensed! @jasonmayes would you agree with this / is this Google's position?

(Thanks to @blaueente for originally noting this license in the Model Card!)

@stanhrivnak

> Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

@jameshfisher I have successfully deployed the raw tflite model (BTW, many thanks for the link!) within a desktop app using MediaPipe. But I failed to do so for a web app, since MediaPipe doesn't have any documentation for that yet (just some JS APIs for specific examples, but nothing for custom models). But it sounds like you're saying you did it. How? Did you extract the layers of the model plus the weights, "manually" recreate the same TF model, and then convert it to TFJS? Or did you manage to compile the tflite to wasm and use MediaPipe?
Many thanks!

@kirawi

kirawi commented Dec 22, 2020

@stanhrivnak I found this while looking into it myself: https://gist.github.com/tworuler/bd7bd4c6cd9a8fbbeb060e7b64cfa008 Unfortunately, I'm not familiar with TensorFlow (sad AMD GPU gang), so I have no idea how it works or how to modify it. PINTO0309 uses modified versions of that script for his tflite -> pb conversions.

@PINTO0309

PINTO0309 commented Dec 24, 2020

I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was too exhausted to create a test program for them. I would be very happy if you could help test them. 😃
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

If there are any licensing issues, I will delete them.

@kirawi

kirawi commented Dec 24, 2020

> I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was too exhausted to create a test program for them. I would be very happy if you could help test them. 😃
> https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation
>
> If there are any licensing issues, I will delete them.

Amazing work!

@PINTO0309

A Japanese engineer has implemented it in TFJS. There still seems to be a small problem with the conversion: the output gets shifted to the left. Also, there is no smoothing post-processing ("light wrapping"), so the border is jagged.

EqCOpUxU8AA9G2Z.mp4
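
In the absence of proper light wrapping, one cheap approximation is to feather the mask edge while upscaling it on a 2D canvas. A hypothetical sketch (smallMask is assumed to be a 128x128 canvas holding the binary mask; the blur radius is illustrative):

// Feather the jagged low-res mask while upscaling, as a cheap stand-in for
// Meet's "light wrapping" post-processing.
function upscaleAndFeatherMask(smallMask, width, height, blurPx = 4) {
  const out = document.createElement('canvas');
  out.width = width;
  out.height = height;
  const ctx = out.getContext('2d');
  ctx.filter = `blur(${blurPx}px)`;              // soften the hard 0/1 boundary
  ctx.drawImage(smallMask, 0, 0, width, height); // bilinear upscale + blur
  return out;                                    // composite as an alpha mask
}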

@kirawi

kirawi commented Dec 24, 2020

Is the shifting fixable?

@PINTO0309

I'm using my own tricks in the optimization phase, so that may be affecting the results. Please give me some time to try this out.

@PINTO0309

> Is the shifting fixable?

It worked. However, at the model's 128x128 resolution, the results do not seem to be very accurate.
test (copy 1)
out1

@kirawi

kirawi commented Dec 25, 2020

That's unfortunate, but nonetheless amazing work man!

@kirawi

kirawi commented Dec 26, 2020

Ah wait, I think that is intentional to reduce the computational requirements of the model. The bilateral filter mentioned in the blog further refines the mask, and it might be the case that the model works best with bright colours. I think all things considered, the model does its job fairly well. By the way, mind sharing the test setup you have for the model?

@PINTO0309

PINTO0309 commented Dec 26, 2020

@kirawi
I did not use the bilateral filter and just binarized the output, so the result may not be good.

### Download test.jpg
$ sudo gdown --id 1Tyv6P2zshOCqTgYBLoa0aC3Co8W-9JPG

### Download segm_lite_v509_128x128_float32.tflite
$ sudo gdown --id 1qOlcK8iKki_aAi_OrxE2YLaw5EZvQn1S
import numpy as np
from PIL import Image
try:
    from tflite_runtime.interpreter import Interpreter
except ImportError:
    from tensorflow.lite.python.interpreter import Interpreter

# Load the test image and remember its original size for upscaling later.
img = Image.open('test.jpg')
h = img.size[1]
w = img.size[0]

# Preprocess: resize to the model's 128x128 input, scale to [0, 1], add a batch dim.
img = img.resize((128, 128))
img = np.asarray(img)
img = img / 255.
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

# TensorFlow Lite inference
interpreter = Interpreter(model_path='segm_lite_v509_128x128_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]['index']
output_details = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(input_details, img)
interpreter.invoke()
output = interpreter.get_tensor(output_details)

print(output.shape)

# The model emits two channels (reportedly background / person); binarize each at 0.5.
out1 = output[0][:, :, 0]
out2 = output[0][:, :, 1]

out1 = (out1 > 0.5) * 255
out2 = (out2 > 0.5) * 255

print('out1:', out1.shape)
print('out2:', out2.shape)

# Upscale the masks back to the original image size and save.
out1 = Image.fromarray(np.uint8(out1)).resize((w, h))
out2 = Image.fromarray(np.uint8(out2)).resize((w, h))

out1.save('out1.jpg')
out2.save('out2.jpg')

@w-okada

w-okada commented Dec 26, 2020

I created a demo page that uses PINTO's model converted to TensorFlow.js.

https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/t11_googlemeet-segmentation/index.html

You can change the input device with the control panel on the right side. If you want to use your own camera, please try it.

By default this page uses the new version of PINTO's model, but it still seems to shift to the left a little...

You can also switch to the old version of PINTO's model with the control panel on the right side: select modelPath and click the reload model button.

@PINTO0309

I overlaid the image with the tflite implementation I have at hand. Does it shift when I apply the filter?

Screencast.2020-12-26.10.03.33.mp4

@kirawi

kirawi commented Dec 26, 2020

I don't think it's shifting; it looks more like the one with the white background is capturing more of the background than the other one.

@PINTO0309

@kirawi
I am currently investigating this issue in collaboration with @w-okada on Twitter.

@w-okada

w-okada commented Dec 26, 2020

Hmmm, I spent a lot of time yesterday trying to solve the "shifting" problem, but I couldn't.
Can anybody help me?
This is my simple test code with Node.js.

const tf = require('@tensorflow/tfjs-node');
const fs = require('fs');
const jpeg = require('jpeg-js');
const { createCanvas } = require('canvas')

const readImage = path => {
    const buf = fs.readFileSync(path)
    const pixels = jpeg.decode(buf, true)
    return pixels
}

// Copy the RGB channels out of the decoded RGBA pixel buffer.
const imageByteArray = (image, numChannels) => {
    const pixels = image.data
    const numPixels = image.width * image.height;
    const values = new Int32Array(numPixels * numChannels);

    for (let i = 0; i < numPixels; i++) {
      for (let channel = 0; channel < numChannels; ++channel) {
        values[i * numChannels + channel] = pixels[i * 4 + channel];
      }
    }
    return values
}

const main = async()=>{
    const image = readImage("test.jpg")
    const handler = tf.io.fileSystem("./model/model.json");
    const model = await tf.loadGraphModel(handler)
    const numChannels=3
    const values = imageByteArray(image, numChannels)
    // The pixel buffer is row-major, so the tensor shape must be
    // [height, width, channels]; the original [width, height, channels]
    // order scrambles non-square images.
    const outShape = [image.height, image.width, numChannels];
    let input = tf.tensor3d(values, outShape, 'float32');

    // Preprocess: resize to the 128x128 model input, add a batch dim,
    // and normalize to [0, 1].
    input = tf.image.resizeBilinear(input,[128, 128])
    input = input.expandDims(0)
    input = tf.cast(input, 'float32')
    input = input.div(tf.max(input))

    // Inference, then softmax across the two output channels.
    let predict = model.predict(input)
    predict = predict.softmax()
    const res = predict.arraySync()
    const bm = res[0]
    const width = bm[0].length
    const height = bm.length
    const canvas = createCanvas(width, height)
    const imageData = canvas.getContext("2d").getImageData(0, 0, canvas.width, canvas.height)
    // Paint the mask: semi-transparent red where channel 0 exceeds 0.5.
    for (let rowIndex = 0; rowIndex < canvas.height; rowIndex++) {
        for (let colIndex = 0; colIndex < canvas.width; colIndex++) {
            const pix_offset = ((rowIndex * canvas.width) + colIndex) * 4
            if(bm[rowIndex][colIndex][0]>0.5){
                imageData.data[pix_offset + 0] = 255
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }else{
                imageData.data[pix_offset + 0] = 0
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }
        }
    }
    canvas.getContext("2d").putImageData(imageData, 0, 0)

    // Upscale the 128x128 mask canvas to the original image size and save.
    const tmpCanvas = createCanvas(image.width, image.height)
    tmpCanvas.getContext("2d").drawImage(canvas, 0, 0, tmpCanvas.width, tmpCanvas.height)
    const buf = tmpCanvas.toBuffer('image/png')
    fs.writeFileSync('./res.png', buf)
}

main()

test
res

@stanhrivnak

Hi guys, first of all, many thanks to @PINTO0309, @w-okada, and others for putting your effort into this! Great work so far! I would really love to have this great model from Google in my web app (currently I use BodyPix with custom improvements, but it still sucks). Here are my 2 cents.
I have deployed the original tflite model discussed here (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) within a desktop app using MediaPipe, and it performs amazingly (see the attached video), even under suboptimal light conditions. What you see is the raw model performance without any post-processing (with it, it looks even better), at 128x128 resolution.
https://user-images.githubusercontent.com/64148065/103182841-d2053c80-48ae-11eb-8ba1-1a1518c9defb.mov

The implications are:

  1. There is hope: the model is already good enough, and 128x128 resolution is high enough to give nice results when upsampling to SD/HD. It's also super fast, with inference running well above 25 FPS.
  2. There must be a flaw in the manual conversion to h5/TFJS.

I think the best approach would be to compare the outputs of the original tflite model and the converted TFJS model (or h5/tflite) layer by layer, see where they deviate, and focus on fixing that part.
The problem is that the original tflite model uses some custom ops, so it can't be read in Python directly. But we know the definitions of these ops; here they are (not sure if it uses all 3, but at least "Convolution2DTransposeBias", because that is the error it gives me in Python):
https://github.com/google/mediapipe/tree/master/mediapipe/util/tflite/operations
The other problem is that they're written in C++, so they have to be rewritten in Python, or we need to go with the TensorFlow C++ API. Also, as stated here:
google/mediapipe#35 (comment)
these custom ops are just merged existing operations, so it should be straightforward (a sketch of this decomposition follows below).

So this is my plan. I can only work on it ~2 hours a day, so if you're faster, go for it and let me know! :) Or if you have any other ideas, please share them!
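
Per the linked MediaPipe comment, the fused op should decompose into stock ops. A hedged tfjs sketch (the function name, NHWC layout, and 'same' padding are assumptions, not the model's verified graph wiring):

// Hypothetical decomposition of MediaPipe's custom "Convolution2DTransposeBias"
// op into stock ops: a transposed convolution followed by a bias add.
const tf = require('@tensorflow/tfjs-node');

function conv2dTransposeBias(x, filter, bias, outputShape, stride) {
  // x: [batch, h, w, inDepth]; filter: [kH, kW, outDepth, inDepth]
  const y = tf.conv2dTranspose(x, filter, outputShape, [stride, stride], 'same');
  return tf.add(y, bias); // the bias add that the custom op fuses in
}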

@saghul

saghul commented May 12, 2021

That's the one! Cheers!

@w-okada

w-okada commented May 12, 2021

Wow!!

@euan-smith

Note that although Google did release the Meet model under the Apache 2.0 licence with the model card quoted above, they no longer have it available for download, and there is now a different card with a different licence.

@ashikns

ashikns commented May 12, 2021

Yep. The new model is called "Xeno" Meet segmentation or something. This is the Apache-released model: OneDrive link.

Also, if you tinker around a bit with the Google Meet webpage, you can still download the models directly from Google; you just need to find the right URL in the JS source. At least that was still working as of February.

@mgyong

mgyong commented Jun 9, 2021

Hi, I am the product manager for MediaPipe. Please note that only the MediaPipe Selfie Segmentation Model is open-sourced and licensed under Apache 2 for external use. Other versions, including those used in the Google Meet product, are licensed under Google's Terms and Conditions and are not intended for open source use.

@saghul

saghul commented Jun 10, 2021

@jasonmayes Why was this closed?

@jasonmayes
Member

Closed as the folks from MediaPipe clarified the T&Cs for the models they released.

@lina128 self-assigned this on Jun 11, 2021
@lina128
Collaborator

lina128 commented Jun 11, 2021

Reopening to track the segmentation model release through the TFJS API.

@lina128 reopened this on Jun 11, 2021
@technikhil314

From https://meet.google.com/,

https://meet.google.com/_/rtcvidproc/release/hashed/segm_full_sparse_v1008_0bda82336d236e21e52f2b74129b9883.dat https://meet.google.com/_/rtcvidproc/release/hashed/segm_lite_v1082_c59fbb2b8451df2c2752e562c6523bcc.dat

Looks like the latest model filenames are hashed, and the models cannot be downloaded anymore.

@jimmy7799 Are we doomed then, or has anyone found a way to get the model?

@saghul

saghul commented Oct 2, 2021

Even if you can get it, you are not allowed to use it.

@technikhil314

@saghul I know, I just want to try it out locally. No intention of using it in an open source or commercial project.

@floe

floe commented Oct 3, 2021

JFYI, the MediaPipe Selfie Segmentation model is (a) properly Apache-licensed and (b) can simply be downloaded as an Android AAR archive. See https://drive.google.com/file/d/1dCfozqknMa068vVsO2j_1FgZkW_e3VWv/preview .
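
For web use, the corresponding JS solution can be consumed directly. A minimal sketch against the @mediapipe/selfie_segmentation API (the CDN path and videoElement are illustrative, and this assumes an async context):

// Hedged sketch of the MediaPipe Selfie Segmentation web API.
import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';

const selfieSegmentation = new SelfieSegmentation({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
});
selfieSegmentation.setOptions({ modelSelection: 1 }); // 0 = general, 1 = landscape
selfieSegmentation.onResults((results) => {
  // results.segmentationMask can be drawn to a canvas and used as an alpha mask.
});
await selfieSegmentation.send({ image: videoElement });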

@wangqi-cybrook

I tried the model in MediaPipe, but it looks like the performance is not as good as the Google Meet one.

@no-1ne

no-1ne commented Nov 20, 2021

MediaPipe segmentation seems to be coming to tfjs: https://github.com/tensorflow/tfjs-models/tree/master/body-segmentation/src

@benbro

benbro commented Nov 20, 2021

@no-1ne how is MediaPipe segmentation in tfjs different from using this JS API?

@no-1ne

no-1ne commented Nov 20, 2021

It's the same, I believe; they are porting it for use from within the TFJS ecosystem.

@rthadur
Contributor

rthadur commented May 3, 2022

The MediaPipe segmentation model has been deployed here. Please verify.
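
A quick way to verify it, as a hedged sketch against the @tensorflow-models/body-segmentation API (videoElement and canvasElement are illustrative, and this assumes an async context):

// Run the newly deployed MediaPipe selfie segmentation model through the
// tfjs body-segmentation package and overlay the resulting binary mask.
import * as bodySegmentation from '@tensorflow-models/body-segmentation';

const segmenter = await bodySegmentation.createSegmenter(
  bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
  { runtime: 'tfjs' },
);
const people = await segmenter.segmentPeople(videoElement);
const mask = await bodySegmentation.toBinaryMask(people);
await bodySegmentation.drawMask(canvasElement, videoElement, mask, 0.7, 3);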

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

@google-ml-butler

Closing as stale. Please @mention us if this needs more attention.

@dganzella

I was wondering if there is any HD model (512 px) available?
