RTX4090 CUDA out of memory. #7

Closed · WuNein opened this issue Mar 6, 2023 · 3 comments


WuNein commented Mar 6, 2023

I am using the latest NVIDIA PyTorch Docker image (nvidia-docker), which supports CUDA 12.
I compiled the CUDA 11.8 version of the bitsandbytes lib, since the code requires bitxxx_cuda118.so.
Tested on the 7B version: OK.
13B: CUDA out of memory, short by about 1-2 GB.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 23.65 GiB total capacity; 22.68 GiB already allocated; 41.31 MiB free; 23.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

There is no host (CPU) memory OOM; 64 GB of system RAM is installed.

I doubt whether the RTX 4090 can actually run the 13B model.
Please share more detailed information about your device.
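(Aside: the allocator hint in the error message can be tried by setting an environment variable before PyTorch initializes CUDA. A minimal sketch follows; the 128 MiB split size is only an example value, and fragmentation relief alone may not cover a 1-2 GB shortfall.)

```python
# Minimal sketch: apply the max_split_size_mb hint from the OOM message.
# The variable must be set before the CUDA caching allocator is initialized,
# so do it before importing torch (or before any CUDA allocation).
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")  # 128 MiB is an example value

import torch

print(torch.cuda.get_device_name(0))  # sanity check that the GPU is visible
```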

@ElRoberto538

Just a random guess as I haven't tried yet, but is ECC enabled on your card? Try disabling it with nvidia-smi -e 0, re-enable with nvidia-smi -e 1.


WuNein commented Mar 6, 2023

Just a random guess as I haven't tried yet, but is ECC enabled on your card? Try disabling it with nvidia-smi -e 0, re-enable with nvidia-smi -e 1.

The 4090 does not have ECC... The first checkpoint takes 13 GB, and the remaining memory is not enough for the rest.


WuNein commented Mar 6, 2023

Changing max_seq_len to 256 makes it possible to run 13B on a 4090.
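(For context, a rough sketch of why shrinking max_seq_len frees this much memory: the LLaMA code pre-allocates per-layer key/value caches of shape (max_batch_size, max_seq_len, n_heads, head_dim). The 13B architecture constants below, 40 layers, 40 heads, head dim 128, are standard; the batch size of 32 is the upstream example.py default and is only an assumption about this repo's defaults.)

```python
# Back-of-the-envelope KV-cache size for LLaMA-13B, showing the effect of max_seq_len.
# Architecture constants for 13B: 40 layers, 40 heads, head_dim 128 (hidden dim 5120).
# dtype_bytes=2 assumes fp16 caches; max_batch_size=32 is an assumed default.

def kv_cache_bytes(max_seq_len: int, max_batch_size: int,
                   n_layers: int = 40, n_heads: int = 40,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    # Factor of 2 covers the separate key and value caches in each layer.
    return 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim * dtype_bytes

for seq_len in (512, 256):
    gib = kv_cache_bytes(seq_len, max_batch_size=32) / 2**30
    print(f"max_seq_len={seq_len}: ~{gib:.1f} GiB of KV cache")
# max_seq_len=512: ~12.5 GiB of KV cache
# max_seq_len=256: ~6.2 GiB of KV cache
```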

WuNein closed this as completed Mar 6, 2023