Operations with variable tensor sizes cause GPU Memory leaks #604
@justadudewhohacks Thank you so much for the detailed report and sample application. In my environment, however, the memory leak was not observed in the Chrome task manager. Even after running the application several times, the memory footprint stayed around 100 MB. I launched the sample application according to the README and checked the Chrome task manager.
Hi @Lewuathe, thanks for reviewing this. I should have mentioned that you have to toggle the GPU memory column in the task manager. I could reproduce this on my desktop machine and laptop (both AMD GPUs + the latest version of Chrome), on an Intel GPU, as well as on my Android device. After some time the browser throws an exception saying "WebGL context lost". Mobile Chrome on Android crashes almost immediately. Hope this helps.
In case someone is facing the same issue: when training an image classifier or an object detector, you can mitigate the problem by resizing your images to a fixed input size before passing them to the model:

export function imageToSquare(img: HTMLImageElement | HTMLCanvasElement, inputSize: number): HTMLCanvasElement {
  const dims = img instanceof HTMLImageElement
    ? { width: img.naturalWidth, height: img.naturalHeight }
    : img
  const scale = inputSize / Math.max(dims.height, dims.width)
  const width = scale * dims.width
  const height = scale * dims.height

  const targetCanvas = document.createElement('canvas')
  targetCanvas.width = inputSize
  targetCanvas.height = inputSize
  targetCanvas.getContext('2d').drawImage(img, 0, 0, width, height)

  return targetCanvas
}
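The scaling arithmetic in imageToSquare can be checked in isolation. Here is a minimal sketch of the same letterbox math as a standalone function (the name fitToSquare is my own, not from the snippet above), independent of the DOM:

```javascript
// Compute the scaled dimensions used by the imageToSquare snippet above:
// the longer side is mapped to inputSize and the aspect ratio is preserved,
// so the result always fits inside an inputSize x inputSize canvas.
function fitToSquare(width, height, inputSize) {
  const scale = inputSize / Math.max(width, height);
  return { width: scale * width, height: scale * height };
}

console.log(fitToSquare(640, 480, 150)); // longer side (640) becomes 150
```

Because every image ends up drawn onto an inputSize x inputSize canvas, the tensor created from it always has the same shape.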
Ah yes, this is because we cache textures based on their physical shape; you are basically purposefully getting cache misses every single time. We've found that that's usually pretty rare. Resizing to a fixed input size will absolutely fix the problem :) Just curious, why was your canvas changing size all the time in practice?
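The cache behavior described above can be illustrated with a toy model (this is an illustration of the idea only, not tfjs's actual texture cache): if buffers are keyed by physical shape and never purged, every new shape allocates a new entry, so variable input sizes grow the cache without bound while a fixed size reuses a single entry.

```javascript
// Toy shape-keyed cache: one entry per distinct (height, width) pair.
const textureCache = new Map();

function getTexture(height, width) {
  const key = `${height}x${width}`;
  if (!textureCache.has(key)) {
    textureCache.set(key, { height, width }); // stands in for a GPU texture
  }
  return textureCache.get(key);
}

// Variable input sizes: a cache miss (and a new allocation) every frame.
for (let i = 0; i < 100; i++) getTexture(480 + i, 640 + i);
const variableSizeEntries = textureCache.size; // grows with every new shape

// Fixed input size: a single cached texture is reused for every frame.
textureCache.clear();
for (let i = 0; i < 100; i++) getTexture(150, 150);
const fixedSizeEntries = textureCache.size; // stays at one

console.log({ variableSizeEntries, fixedSizeEntries });
```

Since tf.memory() only tracks tensors, this kind of backend-level accumulation shows up in the Chrome task manager's GPU memory column rather than in tf.memory() stats.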
Thanks for the clarification, I guess that explains it. You are probably right, in most cases the input size of tensors should be fixed anyway. I was facing the issue when training models on images which were different in size. I was resizing each image with tf.resizeBilinear, which was causing these memory leaks. This was also an issue in face-api.js, since you would first run your images through an object detector, which returns multiple bounding boxes of different sizes, and then extract sub images from these regions for further classification (for instance face landmark detection or computing a face descriptor). This was also a performance issue resulting in flaky inference times, since I guess the graph was recompiled every time for different input shapes? However, by now I am using the code snippet I posted above for resizing, which works pretty well. So this is not really an issue from my side anymore.
Ah yeah, if the output tensors are variably sized, we'll possibly have to recompile the shaders every time. If you can provide a simple standalone HTML page that shows the issue, we can look into making it faster (it's possible we can do things like upload the shape as a uniform to avoid recompilation). cc @annxingyuan
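The uniform idea mentioned above can be sketched with plain shader source strings (hypothetical GLSL fragments, not tfjs's actual shaders): baking the shape into the source yields a different program per shape, which forces a recompile, while passing the shape as a uniform keeps the source, and hence the compiled program, identical across shapes.

```javascript
// Shape baked into the source: every new shape is a new program to compile.
function shaderWithBakedShape(width) {
  return `void main() { float w = ${width}.0; /* ... */ }`;
}

// Shape passed in as a uniform: one source, one compile, for all shapes.
function shaderWithUniformShape() {
  return `uniform float w; void main() { /* ... */ }`;
}

// The baked variant changes with the shape; the uniform variant does not.
console.log(shaderWithBakedShape(640) === shaderWithBakedShape(641)); // false
console.log(shaderWithUniformShape() === shaderWithUniformShape());   // true
```

A shader cache keyed by source text would therefore hit on every frame in the uniform variant, regardless of the input shape.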
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
This is happening to me too! I'm resizing, and a memory leak is clearly the issue, as I have tried:
Let me know if there is any other info you'd like me to provide! I'm kinda stumped about this issue at the moment. |
UPDATE: I'm currently able to get it working by calling
@RohanGautam, do you think you could post the code you are having problems with? |
Sure! I've boiled it down to the following minimal code required to reproduce the error: <html>
<head>
<meta charset="UTF-8">
<title>MemLeak</title>
<!-- Load the latest version of TensorFlow.js -->
<script src="https://unpkg.com/@tensorflow/tfjs"></script>
</head>
<body>
<video autoplay playsinline muted id="webcam" width="250" height="250"></video>
<!-- Load index.js after the content of the page -->
<script src="index.js"></script>
</body>
</html>
const webcamElement = document.getElementById('webcam');
async function app() {
await setupWebcam();
while (true) {
//!! source of leak !!
const img = tf.browser.fromPixels(webcamElement);
    // ... do stuff with the image ...
console.log(tf.memory())
await tf.nextFrame();
}
}
async function setupWebcam() {
return new Promise((resolve, reject) => {
const navigatorAny = navigator;
navigator.getUserMedia = navigator.getUserMedia ||
navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
navigatorAny.msGetUserMedia;
if (navigator.getUserMedia) {
navigator.getUserMedia({ video: true },
stream => {
webcamElement.srcObject = stream;
webcamElement.addEventListener('loadeddata', () => resolve(), false);
},
error => reject());
} else {
reject();
}
});
}
app();
Ah so regular tf.Tensors not tf.Variables. Check out the guide here for why that is not a bug: https://www.tensorflow.org/js/guide/tensors_operations#memory |
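The behavior that guide describes can be mimicked conceptually (this is a simplified model for illustration, not tfjs's implementation): a tidy-style scope tracks every tensor created inside it and disposes all of them on exit, except the value that is returned.

```javascript
// Minimal conceptual model of scope-based disposal, in the spirit of tf.tidy.
const live = new Set(); // all currently allocated (mock) tensors

function makeTensor(name) {
  const t = {
    name,
    disposed: false,
    dispose() { this.disposed = true; live.delete(this); },
  };
  live.add(t);
  return t;
}

function tidy(fn) {
  const before = new Set(live);     // tensors that existed before the scope
  const result = fn();
  for (const t of live) {
    // Dispose everything created inside the scope, except the return value.
    if (!before.has(t) && t !== result) t.dispose();
  }
  return result;
}

const out = tidy(() => {
  const img = makeTensor('img');         // intermediate: disposed on exit
  const resized = makeTensor('resized'); // intermediate: disposed on exit
  return makeTensor('batched');          // returned: kept alive
});
console.log(live.size); // 1 - only the returned tensor survives
```

This is why wrapping the per-frame work in tf.tidy (or disposing each intermediate tensor manually) keeps tf.memory() flat in the webcam loop above.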
@RohanGautam did you get to fix your problem? I am having the same issue, and tf.dispose and tf.tidy do not work.
Browser:
Yeah I did! I was working on it a long while back, so I had to dig up the archives. It was fixed in this commit in my personal project. Basically it involved disposing everything, including intermediate products of the computation:

const img = tf.browser.fromPixels(webcamElement);
const resizedImg = tf.image.resizeBilinear(img, [150, 150]);
const batchedImage = resizedImg.expandDims(0);
// -------- disposing intermediate products too -------- //
tf.dispose(img);
tf.dispose(resizedImg);
tf.dispose(batchedImage);
console.log(tf.memory());

But you say
There is no cache purge, so it slowly accumulates GPU memory (as seen in the Chrome task manager). You might reach a similar issue if you have different sizes for
To capture high-resolution face images, use ImageBitmap until the tf models, since tf manipulations on variable-dimension images lead to GPU memory leaks - tensorflow/tfjs#604
TensorFlow.js version
Browser version
Describe the problem or feature request
Running operations with variable input tensor sizes causes GPU memory leaks (not tracked by tf.memory() stats, but observable in, for example, the Chrome task manager):
Code to reproduce the bug / link to feature request
https://github.com/justadudewhohacks/tfjs-tensor-size-memoryleak-issue