
Google Meet background segmentation model #4177

Closed
jameshfisher opened this issue Nov 3, 2020 · 108 comments
@jameshfisher
Contributor

jameshfisher commented Nov 3, 2020

System information

  • TensorFlow.js version (you are using): 2
  • Are you willing to contribute it (Yes/No): No, it's not mine

Describe the feature and the current behavior/state.
This Google AI blog post describes the background segmentation model used in Google Meet. This model would be an excellent complement to the models in the tfjs-models collection. (The existing BodyPix model can be (ab)used for background segmentation, but has quality and performance issues for this use-case. I expect the Google Meet model improves on this.)

Will this change the current api? How?
No, it would be an addition to tfjs-models.

Who will benefit with this feature?
Apps consuming and/or displaying a user-facing camera feed. WebRTC video chat apps are the most obvious, where background blur/replacement is becoming expected. I also expect it could be a useful preprocessing step before applying e.g. PoseNet. It can also be used creatively on still images -- for example, this recent app for enhancing profile pictures integrates a background segmentation solution.
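
For context, the BodyPix workaround looks roughly like this (a minimal sketch against the @tensorflow-models/body-pix API, assuming existing video and canvas DOM elements and an async context; the blur amounts are illustrative):

// Minimal sketch of the current BodyPix background-blur workaround.
import * as bodyPix from '@tensorflow-models/body-pix';

const net = await bodyPix.load();
const segmentation = await net.segmentPerson(video);
// Blur everything outside the person mask; the edge blur softens the boundary.
bodyPix.drawBokehEffect(canvas, video, segmentation, 9, 3);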

@jameshfisher added the type:feature (New feature or request) label on Nov 3, 2020
@rthadur
Contributor

rthadur commented Nov 3, 2020

cc @annxingyuan @tafsiri

@jameshfisher changed the title from "Google Meet background detection model" to "Google Meet background segmentation model" on Nov 3, 2020
@simon-lanf

This would be useful for us.

@tafsiri
Contributor

tafsiri commented Nov 6, 2020

I'll pass this on to our PM.

@jameshfisher
Contributor Author

Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

@jasonmayes
Member

+1 to this! I'd love to see this as part of the TFJS model repos. A lot of people are making Chrome extensions to do great things in video calls etc., and this would make those experiences even more efficient, running at higher FPS.

@alvaroschipper

+1 to this, it would be a great, faster alternative to BodyPix. Really impressed by the performance in Google Meet :)

@kirawi

kirawi commented Dec 16, 2020

Very desirable to have! I just linked to this issue from the Jitsi Meet repository, but I think it would also be very useful for other projects that need this functionality and don't have the capability to develop an in-house model.

@jameshfisher
Contributor Author

jameshfisher commented Dec 16, 2020

The blog post about this model links to this Model Card describing the model, which reads

LICENSED UNDER Apache License, Version 2.0

The Model Card also links to this paper describing Model Cards in general, which says that Model Cards can describe the license of the model they document. So I believe the above license applies to the described model itself (i.e. rather than to the Model Card document).

So it seems like the raw .tflite model here is already Apache-licensed! @jasonmayes would you agree with this / is this Google's position?

(Thanks to @blaueente for originally noting this license in the Model Card!)

@stanhrivnak

> Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

@jameshfisher I have successfully deployed the raw tflite model (BTW, many thanks for the link!) within a desktop app using MediaPipe. But I failed to do so for a web app, since MediaPipe doesn't have any documentation for that yet (just some JS APIs for specific examples, but nothing for custom models). But it sounds like you're saying you did it. How? Did you extract the layers of the model plus the weights, "manually" recreate the same TF model, and then convert it to TFJS? Or did you manage to compile the tflite to wasm and use MediaPipe?
Many thanks!

@kirawi

kirawi commented Dec 22, 2020

@stanhrivnak I found this while looking into it myself: https://gist.github.com/tworuler/bd7bd4c6cd9a8fbbeb060e7b64cfa008 Unfortunately, I'm not familiar with TensorFlow (sad AMD GPU gang), so I have no idea how it works or how to modify it. PINTO0309 uses modified versions of that script for his tflite -> pb conversions.

@PINTO0309

PINTO0309 commented Dec 24, 2020

I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was too exhausted to create a test program for them. I would be very happy if you could help test them. 😃
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

If there are any licensing issues, I will delete them.

@kirawi

kirawi commented Dec 24, 2020

> I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was too exhausted to create a test program for them. I would be very happy if you could help test them. 😃
> https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation
>
> If there are any licensing issues, I will delete them.

Amazing work!

@PINTO0309

A Japanese engineer has implemented it in TFJS. There still seems to be a small problem with the conversion: the output gets shifted to the left. Also, there is no smoothing post-processing ("light wrapping"), so the border is jagged.

EqCOpUxU8AA9G2Z.mp4
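
In the absence of proper light wrapping, one cheap approximation is to feather the mask edge while upscaling it on a 2D canvas. A hypothetical sketch (smallMask is assumed to be a 128x128 canvas holding the binary mask; the blur radius is illustrative):

// Feather the jagged low-res mask while upscaling, as a cheap stand-in for
// Meet's "light wrapping" post-processing.
function upscaleAndFeatherMask(smallMask, width, height, blurPx = 4) {
  const out = document.createElement('canvas');
  out.width = width;
  out.height = height;
  const ctx = out.getContext('2d');
  ctx.filter = `blur(${blurPx}px)`;              // soften the hard 0/1 boundary
  ctx.drawImage(smallMask, 0, 0, width, height); // bilinear upscale + blur
  return out;                                    // composite as an alpha mask
}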

@kirawi

kirawi commented Dec 24, 2020

Is the shifting fixable?

@PINTO0309

I'm using my own tricks in the optimization phase, so that may be affecting the results. Please give me some time to try this out.

@PINTO0309

> Is the shifting fixable?

It worked. However, at the model's 128x128 resolution, the results do not seem to be very accurate.
test (copy 1)
out1

@kirawi

kirawi commented Dec 25, 2020

That's unfortunate, but nonetheless amazing work man!

@kirawi

kirawi commented Dec 26, 2020

Ah wait, I think that is intentional to reduce the computational requirements of the model. The bilateral filter mentioned in the blog further refines the mask, and it might be the case that the model works best with bright colours. I think all things considered, the model does its job fairly well. By the way, mind sharing the test setup you have for the model?

@PINTO0309

PINTO0309 commented Dec 26, 2020

@kirawi
I did not use the bilateral filter and just binarized the output, so the result may not be good.

### Download test.jpg
$ sudo gdown --id 1Tyv6P2zshOCqTgYBLoa0aC3Co8W-9JPG

### Download segm_lite_v509_128x128_float32.tflite
$ sudo gdown --id 1qOlcK8iKki_aAi_OrxE2YLaw5EZvQn1S
import numpy as np
from PIL import Image
try:
    from tflite_runtime.interpreter import Interpreter
except ImportError:
    from tensorflow.lite.python.interpreter import Interpreter

# Load the test image and remember its original size for upscaling later.
img = Image.open('test.jpg')
h = img.size[1]
w = img.size[0]

# Preprocess: resize to the model's 128x128 input, scale to [0, 1], add a batch dim.
img = img.resize((128, 128))
img = np.asarray(img)
img = img / 255.
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

# TensorFlow Lite inference
interpreter = Interpreter(model_path='segm_lite_v509_128x128_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]['index']
output_details = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(input_details, img)
interpreter.invoke()
output = interpreter.get_tensor(output_details)

print(output.shape)

# The model emits two channels (reportedly background / person); binarize each at 0.5.
out1 = output[0][:, :, 0]
out2 = output[0][:, :, 1]

out1 = (out1 > 0.5) * 255
out2 = (out2 > 0.5) * 255

print('out1:', out1.shape)
print('out2:', out2.shape)

# Upscale the masks back to the original image size and save.
out1 = Image.fromarray(np.uint8(out1)).resize((w, h))
out2 = Image.fromarray(np.uint8(out2)).resize((w, h))

out1.save('out1.jpg')
out2.save('out2.jpg')

@w-okada

w-okada commented Dec 26, 2020

I created a demo page that uses PINTO's model converted to TensorFlow.js.

https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/t11_googlemeet-segmentation/index.html

You can change the input device with the control panel on the right side. If you want to use your own camera, please try it.

By default this page uses the new version of PINTO's model, but it still seems to shift to the left a little...

You can also switch to the old version of PINTO's model with the control panel on the right side: select modelPath and click the reload model button.

@PINTO0309

I overlaid the image with the tflite implementation I have at hand. Does it shift when I apply the filter?

Screencast.2020-12-26.10.03.33.mp4

@kirawi

kirawi commented Dec 26, 2020

I don't think it's shifting; it looks more like the one with the white background is capturing more of the background than the other one.

@PINTO0309

@kirawi
I am currently investigating this issue in collaboration with @w-okada on Twitter.

@w-okada

w-okada commented Dec 26, 2020

Hmmm, I spent a lot of time yesterday trying to solve the "shifting" problem, but I couldn't.
Can anybody help me?
This is my simple test code with Node.js.

const tf = require('@tensorflow/tfjs-node');
const fs = require('fs');
const jpeg = require('jpeg-js');
const { createCanvas } = require('canvas')

const readImage = path => {
    const buf = fs.readFileSync(path)
    const pixels = jpeg.decode(buf, true)
    return pixels
}

// Copy the RGB channels out of the decoded RGBA pixel buffer.
const imageByteArray = (image, numChannels) => {
    const pixels = image.data
    const numPixels = image.width * image.height;
    const values = new Int32Array(numPixels * numChannels);

    for (let i = 0; i < numPixels; i++) {
      for (let channel = 0; channel < numChannels; ++channel) {
        values[i * numChannels + channel] = pixels[i * 4 + channel];
      }
    }
    return values
}

const main = async()=>{
    const image = readImage("test.jpg")
    const handler = tf.io.fileSystem("./model/model.json");
    const model = await tf.loadGraphModel(handler)
    const numChannels=3
    const values = imageByteArray(image, numChannels)
    // The pixel buffer is row-major, so the tensor shape must be
    // [height, width, channels]; the original [width, height, channels]
    // order scrambles non-square images.
    const outShape = [image.height, image.width, numChannels];
    let input = tf.tensor3d(values, outShape, 'float32');

    // Preprocess: resize to the 128x128 model input, add a batch dim,
    // and normalize to [0, 1].
    input = tf.image.resizeBilinear(input,[128, 128])
    input = input.expandDims(0)
    input = tf.cast(input, 'float32')
    input = input.div(tf.max(input))

    // Inference, then softmax across the two output channels.
    let predict = model.predict(input)
    predict = predict.softmax()
    const res = predict.arraySync()
    const bm = res[0]
    const width = bm[0].length
    const height = bm.length
    const canvas = createCanvas(width, height)
    const imageData = canvas.getContext("2d").getImageData(0, 0, canvas.width, canvas.height)
    // Paint the mask: semi-transparent red where channel 0 exceeds 0.5.
    for (let rowIndex = 0; rowIndex < canvas.height; rowIndex++) {
        for (let colIndex = 0; colIndex < canvas.width; colIndex++) {
            const pix_offset = ((rowIndex * canvas.width) + colIndex) * 4
            if(bm[rowIndex][colIndex][0]>0.5){
                imageData.data[pix_offset + 0] = 255
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }else{
                imageData.data[pix_offset + 0] = 0
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }
        }
    }
    canvas.getContext("2d").putImageData(imageData, 0, 0)

    // Upscale the 128x128 mask canvas to the original image size and save.
    const tmpCanvas = createCanvas(image.width, image.height)
    tmpCanvas.getContext("2d").drawImage(canvas, 0, 0, tmpCanvas.width, tmpCanvas.height)
    const buf = tmpCanvas.toBuffer('image/png')
    fs.writeFileSync('./res.png', buf)
}

main()

test
res

@stanhrivnak

Hi guys, first of all, many thanks to @PINTO0309, @w-okada, and others for putting your effort into this! Great work so far! I would really love to have this great model from Google in my web app (currently I use BodyPix with custom improvements, but it still sucks). Here are my 2 cents.
I have deployed the original tflite model discussed here (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) within a desktop app using MediaPipe, and it performs amazingly (see the attached video), even under suboptimal light conditions. What you see is the raw model performance without any post-processing (with it, it looks even better), at 128x128 resolution.
https://user-images.githubusercontent.com/64148065/103182841-d2053c80-48ae-11eb-8ba1-1a1518c9defb.mov

The implications are:

  1. There is hope: the model is already good enough, and 128x128 resolution is high enough to give nice results when upsampling to SD/HD. It's also super fast, with inference running well above 25 FPS.
  2. There must be a flaw in the manual conversion to h5/TFJS.

I think the best approach would be to compare the outputs of the original tflite model and the converted TFJS model (or h5/tflite) layer by layer, see where they deviate, and focus on fixing that part.
The problem is that the original tflite model uses some custom ops, so it can't be read in Python directly. But we know the definitions of these ops; here they are (not sure if it uses all 3, but at least "Convolution2DTransposeBias", because that is the error it gives me in Python):
https://github.com/google/mediapipe/tree/master/mediapipe/util/tflite/operations
The other problem is that they're written in C++, so they have to be rewritten in Python, or we need to go with the TensorFlow C++ API. Also, as stated here:
google/mediapipe#35 (comment)
these custom ops are just merged existing operations, so it should be straightforward (a sketch of this decomposition follows below).

So this is my plan. I can only work on it ~2 hours a day, so if you're faster, go for it and let me know! :) Or if you have any other ideas, please share them!
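
Per the linked MediaPipe comment, the fused op should decompose into stock ops. A hedged tfjs sketch (the function name, NHWC layout, and 'same' padding are assumptions, not the model's verified graph wiring):

// Hypothetical decomposition of MediaPipe's custom "Convolution2DTransposeBias"
// op into stock ops: a transposed convolution followed by a bias add.
const tf = require('@tensorflow/tfjs-node');

function conv2dTransposeBias(x, filter, bias, outputShape, stride) {
  // x: [batch, h, w, inDepth]; filter: [kH, kW, outDepth, inDepth]
  const y = tf.conv2dTranspose(x, filter, outputShape, [stride, stride], 'same');
  return tf.add(y, bias); // the bias add that the custom op fuses in
}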

@saghul

saghul commented May 12, 2021

That's the one! Cheers!

@w-okada

w-okada commented May 12, 2021

Wow!!

@euan-smith

Note that although Google did release the Meet model under the Apache 2.0 licence with the model card quoted above, they no longer have it available for download, and there is now a different card with a different licence.

@ashikns

ashikns commented May 12, 2021

Yep. The new model is called "Xeno" Meet segmentation or something. This is the Apache-released model: OneDrive link.

Also, if you tinker around a bit with the Google Meet webpage, you can still download the models directly from Google; you just need to find the right URL in the JS source. At least that was still working as of February.

@mgyong

mgyong commented Jun 9, 2021

Hi, I am the product manager for MediaPipe. Please note that only the MediaPipe Selfie Segmentation Model is open-sourced and licensed under Apache 2 for external use. Other versions, including those used in the Google Meet product, are licensed under Google's Terms and Conditions and are not intended for open source use.

@saghul

saghul commented Jun 10, 2021

@jasonmayes Why was this closed?

@jasonmayes
Member

Closed as the folks from MediaPipe clarified the T&Cs for the models they released.

@lina128 self-assigned this on Jun 11, 2021
@lina128
Collaborator

lina128 commented Jun 11, 2021

Reopening to track the segmentation model release through the TFJS API.

@lina128 reopened this on Jun 11, 2021
@technikhil314

From https://meet.google.com/,

https://meet.google.com/_/rtcvidproc/release/hashed/segm_full_sparse_v1008_0bda82336d236e21e52f2b74129b9883.dat https://meet.google.com/_/rtcvidproc/release/hashed/segm_lite_v1082_c59fbb2b8451df2c2752e562c6523bcc.dat

Looks like the latest model filenames are hashed, and the models cannot be downloaded anymore.

@jimmy7799 Are we doomed then, or has anyone found a way to get the model?

@saghul

saghul commented Oct 2, 2021

Even if you can get it, you are not allowed to use it.

@technikhil314

@saghul I know, I just want to try it out locally. No intention of using it in an open source or commercial project.

@floe

floe commented Oct 3, 2021

JFYI, the MediaPipe Selfie Segmentation model is (a) properly Apache-licensed and (b) can simply be downloaded as an Android AAR archive. See https://drive.google.com/file/d/1dCfozqknMa068vVsO2j_1FgZkW_e3VWv/preview .
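
For web use, the corresponding JS solution can be consumed directly. A minimal sketch against the @mediapipe/selfie_segmentation API (the CDN path and videoElement are illustrative, and this assumes an async context):

// Hedged sketch of the MediaPipe Selfie Segmentation web API.
import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';

const selfieSegmentation = new SelfieSegmentation({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
});
selfieSegmentation.setOptions({ modelSelection: 1 }); // 0 = general, 1 = landscape
selfieSegmentation.onResults((results) => {
  // results.segmentationMask can be drawn to a canvas and used as an alpha mask.
});
await selfieSegmentation.send({ image: videoElement });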

@wangqi-cybrook

I tried the model in MediaPipe, but it looks like the performance is not as good as the Google Meet one.

@no-1ne

no-1ne commented Nov 20, 2021

MediaPipe segmentation seems to be coming to tfjs: https://github.com/tensorflow/tfjs-models/tree/master/body-segmentation/src

@benbro

benbro commented Nov 20, 2021

@no-1ne how is MediaPipe segmentation in tfjs different from using this JS API?

@no-1ne

no-1ne commented Nov 20, 2021

It's the same, I believe; they are porting it for use from within the TFJS ecosystem.

@rthadur
Contributor

rthadur commented May 3, 2022

The MediaPipe segmentation model has been deployed here. Please verify.
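
A quick way to verify it, as a hedged sketch against the @tensorflow-models/body-segmentation API (videoElement and canvasElement are illustrative, and this assumes an async context):

// Run the newly deployed MediaPipe selfie segmentation model through the
// tfjs body-segmentation package and overlay the resulting binary mask.
import * as bodySegmentation from '@tensorflow-models/body-segmentation';

const segmenter = await bodySegmentation.createSegmenter(
  bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
  { runtime: 'tfjs' },
);
const people = await segmenter.segmentPeople(videoElement);
const mask = await bodySegmentation.toBinaryMask(people);
await bodySegmentation.drawMask(canvasElement, videoElement, mask, 0.7, 3);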

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

@google-ml-butler

Closing as stale. Please @mention us if this needs more attention.

@dganzella

I was wondering if there is any HD model (512 px) available?
