
memory leak #57

Closed
marisancans opened this issue Jan 27, 2020 · 18 comments

@marisancans

Hello, I'm facing a memory leak and I can't find out why. I am simply looping through a lot of images, and it gradually fills all my memory. This is my setup:
Version facenet-pytorch==2.0.1

from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=64, keep_all=True)
resnet = InceptionResnetV1(pretrained='vggface2').eval()

for nth, img_path in enumerate(img_paths):  # img_paths: a list of pathlib.Path image paths
    img = Image.open(img_path.resolve())
    boxes, probs = mtcnn.detect(img)
@marisancans
Author

Some additional info: I ran the code on another machine (a server) and it looks like there is no leak there. I'm using conda.

Here are my local PC's dependencies:
Python 3.6.10

astroid==2.3.3
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
cloudpickle==1.2.2
cycler==0.10.0
cytoolz==0.10.1
dask==2.9.1
decorator==4.4.1
face-alignment==1.0.1
facenet-pytorch==2.0.1
idna==2.8
imageio==2.6.1
isort==4.3.21
kiwisolver==1.1.0
lazy-object-proxy==1.4.3
matplotlib==3.1.2
mccabe==0.6.1
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
networkx==2.4
numpy==1.17.4
olefile==0.46
opencv-python==4.1.2.30
pandas==0.25.3
Pillow==7.0.0
pycparser==2.19
pylint==2.4.4
pyparsing==2.4.6
python-dateutil==2.8.1
pytz==2019.3
PyWavelets==1.1.1
requests==2.22.0
scikit-image==0.16.2
scipy==1.4.1
six==1.13.0
toolz==0.10.0
torch==1.3.1
torchvision==0.4.2
tornado==6.0.3
tqdm==4.41.1
typed-ast==1.4.1
urllib3==1.25.7
wrapt==1.11.2

@timesler
Owner

Hi @marisancans, I think it is very unlikely that the memory leak is caused by facenet-pytorch since it did not occur on a different system with the same version.

Can you provide a complete working example of code that caused the leak? I would suggest checking if it is related to the version of PIL or torch that you have installed. To check if it is related to PIL, you could add a del img inside your loop.
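
For reference, a minimal sketch of that check, reusing the loop from the original report with the del added:

from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(image_size=64, keep_all=True)

for nth, img_path in enumerate(img_paths):  # img_paths as in the original report
    img = Image.open(img_path.resolve())
    boxes, probs = mtcnn.detect(img)
    del img  # explicitly drop the PIL image to rule out PIL holding on to memory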

@ShadowElement

Same error with facenet-pytorch 2.2.1.

@timesler
Owner

@marisancans @ShadowElement I've been able to reproduce this issue now - it doesn't seem to happen on every system and I am not 100% sure what is happening. My guess is that it has to do with slicing a numpy array without creating a copy.

I'm in the process of tracking down the issue - if I find it and can fix it, I'll let you know.
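
To illustrate the guess (a generic numpy sketch, not the facenet-pytorch code itself): a basic slice is a view whose .base attribute keeps the entire parent array alive until the view itself is dropped or copied.

import numpy as np

big = np.zeros((10_000, 10_000), dtype=np.float32)  # roughly 400 MB

view = big[:10, :10]           # a view: `big` stays alive as long as `view` does
print(view.base is big)        # True

small = big[:10, :10].copy()   # an independent copy: no reference back to `big`
print(small.base is None)      # True

del big  # if only `view` were kept around, the full 400 MB buffer would still be retained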

@ShadowElement

@timesler Thanks

@haydenroche5

haydenroche5 commented Feb 25, 2020

Same issue for me using version 2.2.7. I tried to track down the leak myself with pympler, but didn't see anything leaking. Weird. For what it's worth, the leak happens for me when I'm using a CPU. I haven't tried it out with a GPU.
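
A minimal sketch of that kind of pympler check (pympler only sees Python-level objects, so a leak held in native buffers would not show up here, which may be why nothing appeared):

from PIL import Image
from facenet_pytorch import MTCNN
from pympler import tracker

mtcnn = MTCNN(image_size=64, keep_all=True)
tr = tracker.SummaryTracker()

for nth, img_path in enumerate(img_paths):  # img_paths: a list of image paths, as in the original report
    img = Image.open(img_path)
    boxes, probs = mtcnn.detect(img)
    if nth % 100 == 0:
        tr.print_diff()  # prints a summary of Python objects allocated since the last call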

@timesler
Owner

@haydenroche5 in what environment did you see the leak happening?

@haydenroche5

@timesler I'm using a Google Cloud instance.

HW environment:
8 x Intel(R) Xeon(R) CPU @ 2.30GHz
30 GB RAM

SW environment (non-exhaustive):
Ubuntu 19.10
pillow 6.2.1
python 3.7.5
numpy 1.17.3
facenet-pytorch 2.2.7
torch 1.4.0
torchvision 0.5.0

And to be clear, I've got the same kind of loop as @marisancans in my code. Thanks for looking into this. Happy to provide any other info that might be helpful.

@marisancans
Author

If this helps, this is what I'm using:
Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz

Ubuntu 19.04

envs:
facenet-pytorch==2.0.1
torchvision==0.3.0
torch==1.4.0
Pillow==7.0.0
opencv-python==4.1.2.30
numpy==1.18.1

I'm also running all tests on CPU only, because I'm using a laptop. The leak didn't happen on the server with CUDA. Thanks.

@mhamedLmarbouh

Hi, did anyone manage to find the source of this problem? I am currently suffering from it too. I am running on CPU, and the weird thing is that the memory leak doesn't occur all the time.

@armanhak

armanhak commented May 5, 2020

I have the same problem. I run it in Colab and want to get embeddings of the LFW photos on the GPU, but I get an error even when I take only part of the data (4,000 images): RuntimeError: CUDA out of memory. Tried to allocate 1.30 GiB.

@TerryTran

TerryTran commented May 12, 2020

@timesler: I have the same issue as above. After digging into the code, it seems the memory leak comes from PNet: https://github.com/timesler/facenet-pytorch/blob/master/models/utils/detect_face.py#L50. When I disabled it, the memory leak disappeared. It would be good if others could take a look at this.

@armanhak

armanhak commented May 12, 2020

@TerryTran: There seems to be a problem in this module: https://github.com/timesler/facenet-pytorch/blob/master/models/inception_resnet_v1.py. As I understand it, detect_face.py is only needed to find the face in the photo. I do this: model = InceptionResnetV1(pretrained='vggface2').eval().to(device). Then, when I pass the data through to get embeddings (model(data)), I get an out-of-memory error. I checked GPU consumption with GPUtil.showUtilization(); there is a sharp increase in GPU usage.

@TerryTran

(quoting @armanhak's comment above about the out-of-memory error from InceptionResnetV1 and model(data))

I didn't use InceptionResnetV1 for my testing, only the detect function of MTCNN. Here is my code:

from facenet_pytorch.models.mtcnn import MTCNN
from PIL import Image

mtcnn = MTCNN()  # detector instance (instantiation was omitted from the original snippet)
img = Image.open(img_path)  # img_path: path to a single test image
for i in range(1000):
    boxes, probs = mtcnn.detect(img)

@armanhak

armanhak commented May 29, 2020

My memory problem was fixed when I passed the data in batches in a loop and called the detach() method on each result. It looks something like this:

import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='vggface2').eval()
embeddings = []
for batch in data:  # data: an iterable of image-tensor batches
    embedding = model(batch).detach()  # detach so the autograd graph is not retained
    embeddings.append(embedding)
embeddings = torch.stack(embeddings)  # assumes equal-sized batches; use torch.cat to merge along the batch dimension
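
An equivalent option (a suggestion on my part, not something verified against this exact setup) is to run inference under torch.no_grad(), which avoids building the autograd graph in the first place, so there is nothing to detach:

import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='vggface2').eval()
embeddings = []
with torch.no_grad():  # no graph is recorded, so outputs hold no references to intermediate activations
    for batch in data:  # data: an iterable of image-tensor batches, as above
        embeddings.append(model(batch))
embeddings = torch.cat(embeddings)  # concatenate along the batch dimension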

@pcshih

pcshih commented Jul 24, 2020

Add torch.cuda.empty_cache() after line 351 in mtcnn.py.
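
If you would rather not edit the installed mtcnn.py, here is a sketch of a similar workaround from the calling side (note that empty_cache() only returns PyTorch's unused cached CUDA blocks to the driver; it does not remove live references):

import torch
from PIL import Image

for nth, img_path in enumerate(img_paths):  # same detection loop as above
    img = Image.open(img_path)
    boxes, probs = mtcnn.detect(img)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release unused cached GPU memory after each detection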

@jdongca2003
Contributor

jdongca2003 commented Aug 2, 2020

I created a pull request to fix the GPU out-of-memory issue:

#105

The root cause is that the batch size for rnet and onet is dynamic. Sometimes the batch size for the rnet input data is very large (> 20,000). The solution is to use a fixed, bounded batch size.
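
Roughly, the idea looks like this (an illustrative sketch only, not the actual patch in #105, and it assumes a network that returns a single tensor; the real rnet/onet outputs may be tuples, so the patch itself is a bit more involved):

import torch

def run_in_fixed_batches(net, inp, batch_size=512):
    # Run `net` over `inp` in fixed-size chunks so peak memory stays bounded,
    # instead of letting the effective batch size grow with the number of candidate boxes.
    outs = []
    with torch.no_grad():
        for start in range(0, inp.shape[0], batch_size):
            outs.append(net(inp[start:start + batch_size]))
    return torch.cat(outs)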

@adityapatadia

Did @marisancans's original issue when running on CPU get resolved? We are facing this too.
