
memory leak when using pytorch dataloader #746

Closed
techkang opened this issue May 20, 2019 · 4 comments
Labels
need-feedback 📢 We need your response (question) submodule ⊂ Periphery/subclasses synchronisation ⇶ Multi-thread/processing

Comments

@techkang commented May 20, 2019

When I use tqdm with PyTorch, I get a memory leak.

The code below uses more and more memory until it eventually crashes.

Commenting out dummy = tqdm(total=100) in the main block, or setting num_workers=0, avoids the leak.

from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm


class MinDataset(Dataset):

    def __init__(self):
        super().__init__()

    def __getitem__(self, item):
        return '{:.4f}'.format(item) * 1000000

    def __len__(self):
        return 1000


if __name__ == '__main__':

    min_dataloader = DataLoader(MinDataset(), batch_size=32, num_workers=4)
    dummy = tqdm(total=100)  # commenting this out (or using num_workers=0) avoids the leak
    while True:
        for i, batch in tqdm(enumerate(min_dataloader), total=1000):
            if i == 2:
                break
            pass

I used PyTorch 1.0.1 and tqdm 4.31.1.

@casperdcl commented

as mentioned in the documentation, use enumerate(tqdm(x)) instead of tqdm(enumerate(x))
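A minimal sketch of that fix applied to the reproduction script above (it assumes the MinDataset class defined earlier; this is not an exact snippet from the thread):

from torch.utils.data import DataLoader
from tqdm import tqdm

if __name__ == '__main__':
    min_dataloader = DataLoader(MinDataset(), batch_size=32, num_workers=4)
    while True:
        # wrap the dataloader itself, then enumerate the resulting tqdm object
        for i, batch in enumerate(tqdm(min_dataloader)):
            if i == 2:
                break

Since the DataLoader defines __len__, tqdm can pick up the total on its own here and still render a full progress bar.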

casperdcl self-assigned this May 20, 2019
casperdcl added need-feedback 📢 We need your response (question) submodule ⊂ Periphery/subclasses synchronisation ⇶ Multi-thread/processing labels May 20, 2019
@techkang (Author) commented

It works well when I use tqdm(enumerate(x)). Thank you.

@guaguablue commented

I am puzzled by the last reply, "It works well when I use tqdm(enumerate(x))", given the earlier advice to "use enumerate(tqdm(x)) instead of tqdm(enumerate(x))". I wonder whether @techkang meant to say "It works well when I use enumerate(tqdm(x))".
I also read the documentation, which says: "Replace tqdm(enumerate(...)) with enumerate(tqdm(...)) or tqdm(enumerate(x), total=len(x), ...)". However, I ran into the opposite situation. When I use tqdm in my code like below:
for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc='Computing mAP')):
the program sometimes gets stuck at this step, and if I use tqdm(enumerate(dataloader), total=len(dataloader), ...) instead, the program does not work at all. I also tried tqdm(enumerate(dataloader)); the computation is fine, but tqdm does not work well: I cannot see the progress bar, only output like 'Computing mAP: 5it [00:05, 1.13s/it]'.
PyTorch version: 1.1.0, tqdm: 4.43.0.
Could someone help me with this? Thanks a lot!
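A hedged sketch of the second pattern from the documentation quote above, with total and desc passed explicitly so a labelled full bar can be drawn even though the enumerate object has no length (dataloader and the unpacked names are reused from the comment above; this is not code from the thread):

from tqdm import tqdm

# total= lets tqdm draw a full progress bar even though enumerate() has no __len__;
# desc= restores the 'Computing mAP' label
for batch_i, (imgs, targets, paths, shapes) in tqdm(
        enumerate(dataloader), total=len(dataloader), desc='Computing mAP'):
    pass  # evaluation step goes here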

@casperdcl commented

Also I should mention tqdm.contrib.tenumerate
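A minimal sketch of that helper applied to the original reproduction (assuming a tqdm version recent enough to include tqdm.contrib; min_dataloader is the DataLoader from the first snippet):

from tqdm.contrib import tenumerate

# tenumerate(iterable) is roughly equivalent to enumerate(tqdm(iterable)),
# so the progress bar wraps the dataloader rather than the enumerate object
for i, batch in tenumerate(min_dataloader):
    if i == 2:
        break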

markurtz pushed a commit to neuralmagic/sparseml that referenced this issue Apr 6, 2021
Memory leak seems to be related to the way tqdm wraps data_loader.
More info at: tqdm/tqdm#746