Dataloader stuck #14

Closed
walkerning opened this issue Jul 24, 2019 · 2 comments

Comments

@walkerning
Owner

walkerning commented Jul 24, 2019

Occasionally, the data loader gets stuck at some point during the search process, usually the first time the controller queue is used. This might be related to this issue: pytorch/pytorch/issues/1355.
pytorch/pytorch/issues/1355#issuecomment-308587289 suggests the hang might be caused by shm running out. But 32G of shm is configured here, and the actual usage never comes close to that.
Tried adding some swap space so that the data handling thread would not be killed for running out of memory. It did not help.

It seems this might also be caused by calling iter on the data loader too early and then not using the iterator for a long while.
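If the early iter call is indeed the trigger, one workaround is to defer it until the first batch is actually requested. A minimal sketch of that idea in plain Python; `LazyInfiniteIterator` is a hypothetical name for illustration, not the wrapper used in this repo, and `iterable` stands in for the data loader:

```python
class LazyInfiniteIterator:
    """Cycle over an iterable forever, calling iter() only on first use."""

    def __init__(self, iterable):
        self.iterable = iterable  # e.g. a data loader (assumption)
        self._it = None           # no iterator is created here

    def __iter__(self):
        return self

    def __next__(self):
        if self._it is None:
            # First real request: only now build the underlying iterator.
            self._it = iter(self.iterable)
        try:
            return next(self._it)
        except StopIteration:
            # One pass finished: start a fresh pass to stay "infinite".
            self._it = iter(self.iterable)
            return next(self._it)
```

Constructing the wrapper is cheap, so it can be created during setup without starting any workers until the first `next` call.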

@walkerning
Owner Author

Might be mitigated in f0a4d60.
Maybe add a timeout while waiting for a batch and re-initialize the infinite iterator when it expires?
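A rough sketch of what the timeout-and-reinit idea could look like, using only the standard library. Everything here is an assumption for illustration: `TimedInfiniteIterator`, `make_iter`, and `max_retries` are hypothetical names, and this is not what f0a4d60 does. Note that PyTorch's `DataLoader` also accepts its own `timeout` argument, which may be a simpler first thing to try.

```python
import queue
import threading


class TimedInfiniteIterator:
    """Infinite iterator that rebuilds its source when a fetch times out."""

    def __init__(self, make_iter, timeout, max_retries=3):
        self.make_iter = make_iter    # zero-arg callable returning a fresh iterator
        self.timeout = timeout        # seconds to wait for one item
        self.max_retries = max_retries
        self.reinit_count = 0         # how many times a stuck iterator was dropped
        self._it = None

    @staticmethod
    def _fetch(it, out):
        # Runs in a helper thread so the caller can give up waiting.
        try:
            out.put(("ok", next(it)))
        except StopIteration:
            out.put(("stop", None))
        except Exception as exc:
            out.put(("error", exc))

    def __iter__(self):
        return self

    def __next__(self):
        for _ in range(self.max_retries + 1):
            if self._it is None:
                self._it = self.make_iter()
            out = queue.Queue(maxsize=1)
            threading.Thread(
                target=self._fetch, args=(self._it, out), daemon=True
            ).start()
            try:
                status, value = out.get(timeout=self.timeout)
            except queue.Empty:
                # Fetch is stuck: abandon this iterator and build a new one.
                self._it = None
                self.reinit_count += 1
                continue
            if status == "ok":
                return value
            if status == "error":
                raise value
            # Source exhausted: restart it to keep iterating forever.
            self._it = None
        raise RuntimeError("no item obtained after repeated attempts")
```

The stuck helper thread is left behind as a daemon rather than killed, so this only hides the hang. With a real data loader, the abandoned worker processes would likely still need to be cleaned up separately.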

@walkerning
Owner Author

After f0a4d60, I haven't observed the data loader getting stuck anymore. Maybe it's already resolved.

walkerning added a commit that referenced this issue Sep 22, 2020
[RobNAS] Clean and merge robNAS codes