[WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies #1643
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Any update on this? I am also facing the same issue. I have tried many things over the last 3 days, but no success.
Well, I managed to resolve this.
This works, but only temporarily. Nowadays I am facing a crash after a few hours of training. It usually happens at the beginning of an epoch, while the data is loading.
My environment:
An interesting and at the same time reproducible crash happened when I launched the Microsoft Teams application. Even MS Teams reported an exception regarding virtual memory. No other app stopped working. Thus, MS Teams and PyTorch training became "mutually exclusive". After I applied the trick mentioned above, the problem remains only on the PyTorch side, and only sometimes. A lot of ambiguous words, I know, but that's how it is.
1. Try reducing num_workers to 1 or 0.
Reducing the number of workers will significantly reduce training speed.
I was having the same error thrown with
That doesn't fix it; it will reduce the image loading speed.
@ardeal @krisstern @PonyPC you can set dataloader workers during training, i.e.:
https://github.com/ultralytics/yolov5/blob/76d301bd21b4de3b0f0d067211da07e6de74b2a0/train.py#L454

It seems like a lot of Windows users are encountering this problem, but as @PonyPC mentioned, reducing workers will generally also result in slower training. Are you encountering this during DDP or single-GPU training?

EDIT: I just realized this is the YOLOv3 repo and not YOLOv5. I would strongly encourage all users to migrate to YOLOv5, which is much better maintained. It's possible this issue is already resolved there.
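For example, on the command line that would look something like the following (a sketch assuming the `--workers` and `--batch-size` flags exposed by current ultralytics train.py versions; check `python train.py --help` on your checkout):

```
# lower the dataloader worker count to reduce the total virtual-memory commit on Windows
python train.py --data coco128.yaml --batch-size 16 --workers 2
```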
YOLOv5 has the same problem.
@PonyPC please raise a bug report issue citing a reproducible example in the YOLOv5 repo in that case.
I have managed to mitigate (although not completely solve) this issue. I posted a more detailed explanation in a related StackOverflow answer, but basically try this:
1. Download fixNvPe.py
2. Install its dependency
3. Run it against the torch DLLs (for OP's paths). NOTE: THIS WILL MODIFY YOUR DLLS (although it will back them up).
This fixed it (although you mentioned "not completely"); this has been a better suggestion than anything I found elsewhere.
Hello, I don't quite get it. Do all those steps mean putting fixNvPe.py in C:\ProgramData\Anaconda3\lib\site-packages\torch\lib and running it on the *.dll files there? If I'm wrong, please explain the steps to me. Thank you so much.
You can place fixNvPe.py anywhere you like; what matters is the path you point it at. For example, OP's error message was about C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll.
It is failing to load a DLL from that torch\lib directory. For example, if you downloaded fixNvPe.py to a folder of your choice, you would run it from there and point it at that torch\lib path.
This will 'fix' all of the DLL files in that torch\lib directory. If you get an error message about failing to import pefile, install it with pip first.
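Put concretely, the steps look roughly like this for OP's install (a sketch assuming the script's documented `--input` flag and its `pefile` dependency; the script creates backups of the DLLs it modifies):

```
pip install pefile
python fixNvPe.py --input="C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll"
```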
This problem is about the DataLoader.
The issue is with how multi-process Python works on Windows with the pytorch/CUDA DLLs. The number of workers you set in the DataLoader directly relates to how many Python processes are created, and each time a Python process imports pytorch, loading these DLLs asks Windows to reserve a large chunk of virtual memory.

When Windows is asked to reserve memory, if it says that it returned memory then it guarantees that memory will be available to you, even if you never end up using it. Linux allows overcommitting. By default on Linux, when you ask it to reserve memory, it says "Yeah sure, here you go" and tells you that it reserved the memory. But it hasn't actually done this. It will reserve it when you try to use it, and hope that there is something available at that time.

So, if you allocate memory on Windows, you can be sure you can use that memory. If you allocate memory on Linux, it is possible that when you actually try to use the memory it will not be there, and your program will crash.

On Linux, when it spawns the worker processes, each of them also reserves this memory, but thanks to overcommit those reservations are essentially free until the memory is actually touched, and most of it never is. On Windows, when you spawn the workers, each new Python process re-imports pytorch, reloads the DLLs, and Windows has to genuinely commit that whole reservation, backing it with physical RAM plus the page file.

So, on Windows the total commit is roughly (num_workers + 1) times the memory reserved per Python process, and that total must stay within RAM plus the page file, or you get WinError 1455. Your suggestion of lowering num_workers helps because fewer processes load the DLLs, which shrinks the total reservation.

The trick is to find a balance of all of these variables that keeps that equation true.
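To make the spawn behaviour concrete, here is a minimal, self-contained sketch (a toy dataset stands in for the real one): on Windows every DataLoader worker is a freshly spawned interpreter that re-imports torch and re-commits the DLL reservations, so the entry point must be guarded and `num_workers` kept small enough that (workers + 1) times the per-process commit still fits in RAM plus the page file.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Toy dataset standing in for the real image dataset.
    data = TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64, dtype=torch.long))

    # Each worker is a spawned process on Windows: it re-imports torch and
    # commits the CUDA/caffe2 DLL reservations again, so total commit grows
    # roughly linearly with num_workers. Keep it at 0-2 if the page file is small.
    loader = DataLoader(data, batch_size=8, num_workers=1, persistent_workers=True)

    for images, labels in loader:
        pass  # training step would go here


if __name__ == "__main__":
    # Required on Windows: without this guard, each spawned worker re-executes
    # the module top level and tries to build its own DataLoader.
    main()
```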
Wow, that's working @cobryan05
@PonyPC @cobryan05 hi, I'm not following the convo exactly since we don't have any Windows instances here, but if you have any improvements you'd like to implement I'd recommend submitting a PR.

The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance. Please see our ✅ Contributing Guide to get started.
@glenn-jocher Unfortunately I don't think this is something that can be fixed within YOLOv5. This is an issue with the CUDA and pytorch DLLs. My 'fix' just changes some flags on the DLLs to make them allocate less memory. This would likely be a job for NVIDIA, to fix the flags on their CUDA DLLs (e.g., cusolver64_*.dll in the CUDA release). Perhaps pytorch could help some as well, since they also package some of these (e.g., caffe2_detectron_ops_gpu.dll)... although they use NVIDIA tools to do this, so the blame probably falls back to NVIDIA. Even with my changes to these flags, these DLLs still reserve a whole lot more memory than they actually use. I don't know who is to blame, and since my flag changes got me going I'm not digging further into it.

Edit: I went ahead and submitted the info as a bug report to NVIDIA. Whether or not anything happens with it, or any of the appropriate people at NVIDIA ever see it, who knows? But maybe they'll pick it up and do something about it.
I've also been having this problem with TensorFlow, and as described in detail by @cobryan05, the problem resides in how Windows handles multiprocessing and DLLs. @cobryan05, is it possible to paste the link to the NVIDIA page where you posted this problem? I also want to go there and whine to them about it.
I faced the same problem when the batch size was 8 and num_workers was 6. I solved this problem by making the following changes:
Can I know where I should go to change the number of workers, or where to put this line of code? I'm using a Jupyter notebook.
@szan12 i.e.:
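For a Jupyter notebook, one way is to pass the worker count when launching training from a cell (a sketch assuming the `train.run()` helper present in recent ultralytics train.py versions; on older checkouts fall back to `!python train.py --workers 0 ...`):

```python
# Run from a notebook cell inside the cloned repo directory.
import train  # the repo's train.py

# workers=0 loads data in the main process only, avoiding extra spawned
# Python processes and their DLL memory commits (at the cost of speed).
train.run(data="coco128.yaml", batch_size=8, workers=0)
```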
I had the same error on Windows 10 today, and following the advice above didn't help. Then I suddenly remembered that I had installed CUDA 11.7 along with the already existing CUDA 11.3 and 11.2 versions. I had moved up
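If several CUDA toolkits are installed side by side, a quick check from Python shows which CUDA build torch itself was compiled against and which toolkit directories come first on PATH (purely diagnostic; it changes nothing):

```python
import os
import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)  # CUDA runtime bundled with this torch build
print("CUDA available:", torch.cuda.is_available())

# On Windows the first matching directory on PATH wins DLL resolution,
# so the order of these entries matters when several toolkits are installed.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    if "CUDA" in entry.upper():
        print(entry)
```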
Hello, this problem may be solved now! I use yolov5-6.2 with --batch-size 16 --workers 16, and the virtual memory it needs is much less than before (it needed more than 100 GB before). Maybe because I use torch 1.13.1+cu117 and CUDA 11.8?
I solved it by increasing the page file limit of Windows.
Thank you for sharing your solution. It's great to hear that increasing the page file limit of Windows helped in resolving the issue. It seems that managing the page file size effectively contributed to stability during the training process. If you encounter any more issues or have further questions, feel free to reach out.
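To verify how much headroom the enlarged page file actually gives before a long training run, you can inspect the figures from Python (using psutil; on Windows its swap numbers correspond to the page file, though they may differ slightly from Task Manager):

```python
import psutil

vm = psutil.virtual_memory()
sw = psutil.swap_memory()  # on Windows this reflects the page file

print(f"RAM:       {vm.total / 2**30:.1f} GiB total, {vm.available / 2**30:.1f} GiB free")
print(f"Page file: {sw.total / 2**30:.1f} GiB total, {sw.used / 2**30:.1f} GiB used")
```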
It works, but the entire training process became too slow. Is there any better way to solve this? I've wasted two days on this.
There is no better solution.
@ardeal hi there! It seems you've already tried the recommended solutions. As for improving speed, upgrading your hardware, such as adding memory, using a stronger GPU, or moving to a server-class CPU, may help expedite the training process. If you have further queries or need additional assistance, feel free to ask.
It is not really related to the computer's performance but rather to the fact that even on Linux it will slowly eat up all of your memory and any swap partition you have until it drives training to a halt. The good thing on Linux is that you can just let the OOM killer step in and resume the training (though that is not an option on large datasets; those will still leak memory into oblivion). But on Windows the only solution is to clear pagefile.sys with a hard reboot.
@siddtmb hi! Thanks for your insights. Memory management, particularly in a Windows environment, can indeed introduce challenges. We're continuously working on improving the efficiency of our data loader and overall memory usage within YOLOv3 and appreciate your feedback. For mitigating memory leaks or high memory usage issues:
We recognize the importance of efficient memory usage and are committed to making improvements. Contributions and pull requests are always welcome if you have suggestions or optimizations to share with the community. Your feedback is valuable in guiding those efforts. Thank you for bringing this to our attention.
Hi,
My environment:
Windows 10
python 3.8.5
CPU 10700K + 16GB RAM
GPU 3060Ti (8GB memory)
CUDA 11.0.3_451.82_win10
numpy 1.19.3
torch 1.7.1+cu110
torchvision 0.8.2+cu110
On the master branch, I followed the section at https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data and set batch-size to 2 on my 3060 Ti (8 GB memory). I got the following issue:
Is the issue related to CUDA or GPU memory size?
Thanks and Best Regards,
Ardeal