
Operations with variable tensor sizes cause GPU Memory leaks #604

Closed
justadudewhohacks opened this issue Aug 13, 2018 · 15 comments


@justadudewhohacks
Contributor

justadudewhohacks commented Aug 13, 2018

TensorFlow.js version

  • tfjs-core 0.11.9
  • tfjs-core 0.12.10

Browser version

  • chrome 67.0.3396.99 (64-Bit)
  • firefox 61.0.1 (64-Bit)

Describe the problem or feature request

Running operations with variable input tensor sizes causes GPU memory leaks (not tracked by the tf.memory() stats, but they can be observed in the Chrome task manager, for example):

for (let i = 0; i < iterations; i++) {
  // random dimensions on every iteration, so each op sees a new tensor shape
  const height = Math.floor(Math.random() * maxTensorSize)
  const width = Math.floor(Math.random() * maxTensorSize)

  console.log(height, width)

  const t1 = tf.ones([height, width])
  const t2 = tf.ones([height, width])

  // do something
  const sum = t1.add(t2)

  t1.dispose()
  t2.dispose()
  sum.dispose()

  await tf.nextFrame()

  console.log(tf.memory())
}

Code to reproduce the bug / link to feature request

https://github.com/justadudewhohacks/tfjs-tensor-size-memoryleak-issue

@Lewuathe
Contributor

@justadudewhohacks Thank you so much for the detailed report and sample application.

However, in my environment the memory leak was not observable in the Chrome task manager.
[screenshot: Chrome task manager, 2018-08-17 16:57]

Even after running the application several times, the memory footprint stayed around 100 MB. I launched the sample application according to the README and checked the Chrome task manager.

@justadudewhohacks
Contributor Author

Hi @Lewuathe,

Thanks for reviewing this. I should have mentioned that you have to enable the GPU memory column in the task manager:

[screen recording: gpu-mem-leak]

I could reproduce this on my desktop machine and laptop (both AMD GPUs, latest version of Chrome), on an Intel GPU, as well as on my Android device.

After some time the browser throws an exception saying the WebGL context was lost. Mobile Chrome on Android crashes almost immediately.

Hope this helps.

@justadudewhohacks
Contributor Author

justadudewhohacks commented Aug 27, 2018

In case someone else is facing the same issue: when training an image classifier or an object detector, you can mitigate it by resizing your images to a fixed input size on a canvas before calling tf.fromPixels, instead of padding and resizing with tensor operations:

export function imageToSquare(img: HTMLImageElement | HTMLCanvasElement, inputSize: number): HTMLCanvasElement {

  const dims = img instanceof HTMLImageElement
    ? { width: img.naturalWidth, height: img.naturalHeight }
    : img
  // scale the longer side down to inputSize, preserving the aspect ratio
  const scale = inputSize / Math.max(dims.height, dims.width)
  const width = scale * dims.width
  const height = scale * dims.height

  // draw onto a fixed inputSize x inputSize canvas, leaving the remaining area blank
  const targetCanvas = document.createElement('canvas')
  targetCanvas.width = inputSize
  targetCanvas.height = inputSize
  targetCanvas.getContext('2d').drawImage(img, 0, 0, width, height)

  return targetCanvas
}
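
For reference, a minimal usage sketch (the 512 input size, the imgElement variable and the inference step are placeholders):

// resize on a plain canvas first, so the tensor shape is always the same
const squareCanvas = imageToSquare(imgElement, 512)
const input = tf.fromPixels(squareCanvas) // always [512, 512, 3]
// ... run the model on `input` ...
input.dispose()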

@nsthorat
Contributor

Ah yes, this is because we cache textures based on their physical shape, so you are basically forcing a cache miss on purpose every single time. We've found that that's usually pretty rare in practice. Resizing to a fixed input size will absolutely fix the problem :)

Just curious, why was your canvas changing size all the time in practice?
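
To illustrate, a variant of the loop from the original report with a fixed shape (dimensions chosen arbitrarily); with the shape constant, the cached textures can be reused and GPU memory should stay flat:

// same shape on every iteration -> the texture cache is hit instead of missed
// (inside an async function, like the original snippet)
const height = 512
const width = 512
for (let i = 0; i < iterations; i++) {
  const t1 = tf.ones([height, width])
  const t2 = tf.ones([height, width])
  const sum = t1.add(t2)

  t1.dispose()
  t2.dispose()
  sum.dispose()

  await tf.nextFrame()
}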

@justadudewhohacks
Contributor Author

Thanks for the clarification, I guess that explains it. You are probably right, in most cases the input size of tensors should be fixed anyway.

I was facing the issue when training models on images which were different in size. I was resizing each image with tf.resizeBilinear, which was causing these memory leaks.

This was also an issue in face-api.js, since you would first run your images through an object detector, which returns multiple bounding boxes of different sizes, and then extract sub images from these regions for further classification (for instance face landmark detection or computing a face descriptor). This was also a performance issue resulting in flaky inference times, since I guess the graph was recompiled every time for different input shapes?

However, by now I am using the code snippet I posted above for resizing, which works pretty well. So this is not really an issue on my side anymore.

@nsthorat
Contributor

Ah yeah, if the output tensors are variably sized, we'll possibly have to recompile the shaders every time.

If you can provide a simple standalone HTML page that shows the issue we can look into making it faster (it's possible we can do things like upload the shape as a uniform to avoid recompilation).

cc @annxingyuan
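
For context, a rough sketch of what uploading the shape as a uniform would mean, with purely hypothetical shader fragments rather than the actual tfjs-core codegen:

// When the output shape is baked into the generated GLSL source, every new
// shape yields a different program that has to be compiled and cached:
function bakedShaderSource(outWidth) {
  return `
    const float outWidth = ${outWidth}.0; // shape baked in -> one program per shape
    // ... rest of the kernel ...
  `
}

// If the shape is instead bound as a uniform at draw time, a single compiled
// program can be reused across output shapes:
const uniformShaderSource = `
  uniform float outWidth; // shape supplied at runtime -> no recompilation
  // ... rest of the kernel ...
`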

@rthadur
Contributor

rthadur commented Dec 5, 2018

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@RohanGautam

This is happening to me too!
For me, I've pinpointed it to the following line in my code:
img = tf.browser.fromPixels(webcamElement);
where webcamElement is a frame from the webcam; all the major webcam setup is from Google's example here.

I'm resizing img and passing it through a CNN. Even when I don't resize it with tf.js and change the shape of the webcam frame itself, the issue persists.

A memory leak is clearly the issue, as tf.memory() tells me the memory increases very rapidly.

I have tried:

Let me know if there is any other info you'd like me to provide! I'm kinda stumped about this issue at the moment.

@RohanGautam

UPDATE: I'm currently able to get it working by calling tf.dispose() on every tensor after I'm done using it. It seems to work for now, but it's kinda janky. Hope it helps someone though!

@nsthorat
Contributor

nsthorat commented May 7, 2019

@RohanGautam, do you think you could post the code you are having problems with?

@RohanGautam

Sure! I've boiled it down to the following minimal code required to reproduce the error:
index.html:

<html>

<head>
    <meta charset="UTF-8">
    <title>MemLeak</title>
    <!-- Load the latest version of TensorFlow.js -->
    <script src="https://unpkg.com/@tensorflow/tfjs"></script>
</head>

<body>
    <video autoplay playsinline muted id="webcam" width="250" height="250"></video>
    <!-- Load index.js after the content of the page -->
    <script src="index.js"></script>
</body>

</html>

index.js:

const webcamElement = document.getElementById('webcam');

async function app() {

    await setupWebcam();
    while (true) {
        // !! source of the leak: a new tensor is created every frame and never disposed !!
        const img = tf.browser.fromPixels(webcamElement);
        // ... doing stuff with the image ...
        console.log(tf.memory())
        await tf.nextFrame();
    }
}


async function setupWebcam() {
    return new Promise((resolve, reject) => {
        const navigatorAny = navigator;
        navigator.getUserMedia = navigator.getUserMedia ||
            navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
            navigatorAny.msGetUserMedia;
        if (navigator.getUserMedia) {
            navigator.getUserMedia({ video: true },
                stream => {
                    webcamElement.srcObject = stream;
                    webcamElement.addEventListener('loadeddata', () => resolve(), false);
                },
                error => reject());
        } else {
            reject();
        }
    });
}

app();

@nsthorat
Contributor

nsthorat commented May 8, 2019

Ah, so these are regular tf.Tensors, not tf.Variables. Check out the guide here for why that is not a bug: https://www.tensorflow.org/js/guide/tensors_operations#memory
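
Applied to the loop above, the guide's advice amounts to disposing the frame tensor on every iteration, roughly along these lines (a sketch):

    // inside app(), as above
    while (true) {
        const img = tf.browser.fromPixels(webcamElement);
        // ... do stuff with the image ...
        img.dispose(); // release the tensor each frame so numTensors stays flat
        console.log(tf.memory());
        await tf.nextFrame();
    }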

@dhasegan

dhasegan commented May 12, 2020

@RohanGautam did you manage to fix your problem? I am having the same issue and tf.dispose and tf.tidy do not work.
My specs:

  • "@tensorflow/tfjs": "1.7.4"

Browser:

  • Chrome Version 81.0.4044.138 (Official Build) (64-bit)

@RohanGautam

@dhasegan

Yeah I did! Was working on it a long while back, so had to dig up the archives. It was fixed in this commit in my personal project.

Basically, it involved disposing everything, including the intermediate products of the computation.

const img = tf.browser.fromPixels(webcamElement);
const resizedImg = tf.image.resizeBilinear(img, [150, 150])
const batchedImage = resizedImg.expandDims(0);
//--------disposing intermediate products too-------------//
tf.dispose(img);
tf.dispose(resizedImg);
tf.dispose(batchedImage);
console.log(tf.memory());

But you say dispose didn't work for you :/ I'd suggest adding console.log(tf.memory()) at your intermediate steps and narrowing down where it's happening.
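
The same pipeline can also be wrapped in tf.tidy, which disposes the intermediates automatically once the callback returns (a sketch, assuming the same 150x150 target size):

const batchedImage = tf.tidy(() => {
  const img = tf.browser.fromPixels(webcamElement);
  const resizedImg = tf.image.resizeBilinear(img, [150, 150]);
  return resizedImg.expandDims(0); // only the returned tensor survives the tidy
});
// ... use batchedImage ...
tf.dispose(batchedImage);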

@dhasegan

tf.memory() is not increasing for me. My input has varying sizes as well, and for each new size a new shader is created and cached in the TFJS library: #3061

There is no cache purge, so it slowly accumulates GPU memory (as seen in the Chrome task manager). You might run into a similar issue if you have different sizes for webcamElement.
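
One way to keep that cache bounded is to give the library a fixed input shape, for example by drawing each webcam frame onto a fixed-size canvas before fromPixels, similar to the imageToSquare snippet earlier in the thread (a sketch, assuming a 250x250 input):

const fixedCanvas = document.createElement('canvas');
fixedCanvas.width = 250;  // assumed fixed input size
fixedCanvas.height = 250;
const ctx = fixedCanvas.getContext('2d');

function frameToTensor(webcamElement) {
  // the canvas dimensions never change, so fromPixels always yields the same shape
  ctx.drawImage(webcamElement, 0, 0, fixedCanvas.width, fixedCanvas.height);
  return tf.browser.fromPixels(fixedCanvas);
}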

shaileshpandit added a commit to ente-io/photos-web that referenced this issue Dec 20, 2021
To capture high resolution face images
Use ImageBitmap till tf models as tf manipulations on variable dimention images leads to gpu memory leak - tensorflow/tfjs#604