RTX4090 CUDA out of memory. #7

Closed · WuNein opened this issue Mar 6, 2023 · 3 comments


WuNein commented Mar 6, 2023

I am using the latest NVIDIA PyTorch Docker image (nvidia-docker), which supports CUDA 12.
I compiled the CUDA 11.8 version of the bitsandbytes lib, since the code requires bitxxx_cuda118.so.
Tested on the 7B version: OK.
13B: CUDA out of memory, short by about 1-2 GB.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 23.65 GiB total capacity; 22.68 GiB already allocated; 41.31 MiB free; 23.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

There is no host (CPU) memory OOM; 64 GB of system RAM is installed.

I doubt whether the RTX 4090 can actually run the 13B model.
Please share more detailed information about your device.
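(Aside: the allocator hint in the error message can be tried by setting an environment variable before PyTorch initializes CUDA. A minimal sketch follows; the 128 MiB split size is only an example value, and fragmentation relief alone may not cover a 1-2 GB shortfall.)

```python
# Minimal sketch: apply the max_split_size_mb hint from the OOM message.
# The variable must be set before the CUDA caching allocator is initialized,
# so do it before importing torch (or before any CUDA allocation).
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")  # 128 MiB is an example value

import torch

print(torch.cuda.get_device_name(0))  # sanity check that the GPU is visible
```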

@ElRoberto538

Just a random guess as I haven't tried yet, but is ECC enabled on your card? Try disabling it with nvidia-smi -e 0, re-enable with nvidia-smi -e 1.


WuNein commented Mar 6, 2023

Just a random guess as I haven't tried yet, but is ECC enabled on your card? Try disabling it with nvidia-smi -e 0, re-enable with nvidia-smi -e 1.

The 4090 does not have ECC... The first checkpoint takes 13 GB, and the remaining memory is not enough for the rest.


WuNein commented Mar 6, 2023

Changing max_seq_len to 256 makes it possible to run 13B on a 4090.
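(For context, a rough sketch of why shrinking max_seq_len frees this much memory: the LLaMA code pre-allocates per-layer key/value caches of shape (max_batch_size, max_seq_len, n_heads, head_dim). The 13B architecture constants below, 40 layers, 40 heads, head dim 128, are standard; the batch size of 32 is the upstream example.py default and is only an assumption about this repo's defaults.)

```python
# Back-of-the-envelope KV-cache size for LLaMA-13B, showing the effect of max_seq_len.
# Architecture constants for 13B: 40 layers, 40 heads, head_dim 128 (hidden dim 5120).
# dtype_bytes=2 assumes fp16 caches; max_batch_size=32 is an assumed default.

def kv_cache_bytes(max_seq_len: int, max_batch_size: int,
                   n_layers: int = 40, n_heads: int = 40,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    # Factor of 2 covers the separate key and value caches in each layer.
    return 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim * dtype_bytes

for seq_len in (512, 256):
    gib = kv_cache_bytes(seq_len, max_batch_size=32) / 2**30
    print(f"max_seq_len={seq_len}: ~{gib:.1f} GiB of KV cache")
# max_seq_len=512: ~12.5 GiB of KV cache
# max_seq_len=256: ~6.2 GiB of KV cache
```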

WuNein closed this as completed Mar 6, 2023