FineTune CUDA out of memory #3

Open
freecow opened this issue Oct 29, 2023 · 10 comments
freecow commented Oct 29, 2023

```
(chatglm3-finetune) root@g101:/data/ChatGLM3/chatglm3-finetune# python finetune.py --dataset_path ./alpaca --lora_rank 4 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --max_steps 52000 --save_steps 1000 --save_total_limit 20 --learning_rate 1e-4 --remove_unused_columns false --logging_steps 50 --output_dir output
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████| 7/7 [00:08<00:00, 1.22s/it]
Traceback (most recent call last):
  File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 70, in <module>
    main()
  File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 55, in main
    model = get_peft_model(model, peft_config).to("cuda:1")
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1016.00 MiB (GPU 1; 23.69 GiB total capacity; 22.27 GiB already allocated; 691.69 MiB free; 22.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
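
For reference, the allocator hint at the end of the error message has to take effect before the first CUDA allocation. A minimal sketch of the env-var route; the 128 MiB split size is an illustrative value, not a recommendation:

```python
import os

# Must be set before the first CUDA allocation; 128 MiB is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the caching allocator picks it up

print(torch.cuda.is_available())
```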

@Jeru2023

Same here; PyTorch reserved far too much memory...

@Jeru2023

Try modifying finetune.py line 38 to set load_in_8bit to True:

```python
model = AutoModel.from_pretrained(
    "{your model path}", load_in_8bit=True, trust_remote_code=True, device_map="auto"
).cuda()
```
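
If the 8-bit route is combined with LoRA, the base model is usually passed through peft's k-bit preparation helper before get_peft_model. A hedged sketch, assuming a peft version that exports prepare_model_for_kbit_training (older releases named it prepare_model_for_int8_training):

```python
from peft import prepare_model_for_kbit_training

# Casts norm/output layers to fp32 and enables input gradients, which LoRA
# training on an 8-bit base model generally needs before get_peft_model().
model = prepare_model_for_kbit_training(model)
```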


freecow commented Oct 29, 2023

In finetune.py line 34, I set load_in_8bit to True and removed the .half() call.

Original:

```python
model = ChatGLMForConditionalGeneration.from_pretrained(
    "model", load_in_8bit=False, trust_remote_code=False, device_map="auto"
).half()
```

Modified:

```python
model = ChatGLMForConditionalGeneration.from_pretrained(
    "model", load_in_8bit=True, trust_remote_code=False, device_map="auto"
)
```

Error message:

```
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
```

@xxw1995 (Owner)

xxw1995 commented Oct 30, 2023

Modify the device_map parameter to specify the device. If you want to use the GPU, change device_map="auto" to device_map="cuda". If you want to use the CPU, change it to device_map="cpu".

@Jeru2023

> Modify the device_map parameter to specify the device. If you want to use the GPU, change device_map="auto" to device_map="cuda". If you want to use the CPU, change it to device_map="cpu".

In my case, device_map needs to be set to cuda:0 instead of cuda.
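
Putting the thread's fixes together, a minimal sketch of a loading call that keeps every weight on a single GPU; the model path and LoRA hyperparameters are placeholders, and target_modules assumes ChatGLM's fused query_key_value projection:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

model = AutoModel.from_pretrained(
    "{your model path}",       # placeholder path
    load_in_8bit=True,
    trust_remote_code=True,
    device_map={"": 0},        # same effect as "cuda:0": all weights on GPU 0
)

peft_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)  # no trailing .to("cuda:1"); weights are already placed
```

With the whole model pinned to one device, the cuda:0 vs cuda:1 mismatch from the embedding lookup above should not occur.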

@chenmins

Just tested it; it needs 26 GB of GPU memory.

```
Mon Oct 30 11:36:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:A1:00.0 Off |                    0 |
| N/A   51C    P0   327W / 400W |  26305MiB / 81920MiB |     95%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       713      C   python                          26303MiB |
+-----------------------------------------------------------------------------+
```

freecow commented Oct 30, 2023

> Just tested it; it needs 26 GB of GPU memory. [nvidia-smi output quoted above]

So it seems a 24G card like the 3090 can't handle this, especially since multi-GPU fine-tuning doesn't appear to be supported either.

@Jeru2023

24G is enough. I'm on a single 4090; one epoch takes 10 seconds, which is pretty fast.

@Jeru2023

> So it seems a 24G card like the 3090 can't handle this, especially since multi-GPU fine-tuning doesn't appear to be supported either.

I tried it on a 3090 today as well; no problems.

@sukibean163 commented Nov 8, 2023

> I tried it on a 3090 today as well; no problems.

Why do I run into the same problem on a 24G 4090?
[two screenshots of the error attached]
