FineTune CUDA out of memory #3

Open
freecow opened this issue Oct 29, 2023 · 10 comments
freecow commented Oct 29, 2023

```
(chatglm3-finetune) root@g101:/data/ChatGLM3/chatglm3-finetune# python finetune.py --dataset_path ./alpaca --lora_rank 4 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --max_steps 52000 --save_steps 1000 --save_total_limit 20 --learning_rate 1e-4 --remove_unused_columns false --logging_steps 50 --output_dir output
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████| 7/7 [00:08<00:00, 1.22s/it]
Traceback (most recent call last):
  File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 70, in <module>
    main()
  File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 55, in main
    model = get_peft_model(model, peft_config).to("cuda:1")
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1016.00 MiB (GPU 1; 23.69 GiB total capacity; 22.27 GiB already allocated; 691.69 MiB free; 22.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
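
For reference, the allocator hint at the end of the error message has to take effect before the first CUDA allocation. A minimal sketch of the env-var route; the 128 MiB split size is an illustrative value, not a recommendation:

```python
import os

# Must be set before the first CUDA allocation; 128 MiB is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the caching allocator picks it up

print(torch.cuda.is_available())
```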

@Jeru2023

Same here; PyTorch reserved far too much memory...

@Jeru2023

Try modifying finetune.py line 38 to set load_in_8bit to True:

```python
model = AutoModel.from_pretrained(
    "{your model path}", load_in_8bit=True, trust_remote_code=True, device_map="auto"
).cuda()
```
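
If the 8-bit route is combined with LoRA, the base model is usually passed through peft's k-bit preparation helper before get_peft_model. A hedged sketch, assuming a peft version that exports prepare_model_for_kbit_training (older releases named it prepare_model_for_int8_training):

```python
from peft import prepare_model_for_kbit_training

# Casts norm/output layers to fp32 and enables input gradients, which LoRA
# training on an 8-bit base model generally needs before get_peft_model().
model = prepare_model_for_kbit_training(model)
```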


freecow commented Oct 29, 2023

In finetune.py line 34, I set load_in_8bit to True and removed the .half() call.

Original:

```python
model = ChatGLMForConditionalGeneration.from_pretrained(
    "model", load_in_8bit=False, trust_remote_code=False, device_map="auto"
).half()
```

Modified:

```python
model = ChatGLMForConditionalGeneration.from_pretrained(
    "model", load_in_8bit=True, trust_remote_code=False, device_map="auto"
)
```

Error message:

```
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
```

@xxw1995 (Owner)

xxw1995 commented Oct 30, 2023

Modify the device_map parameter to specify the device. If you want to use the GPU, change device_map="auto" to device_map="cuda". If you want to use the CPU, change it to device_map="cpu".

@Jeru2023

> Modify the device_map parameter to specify the device. If you want to use the GPU, change device_map="auto" to device_map="cuda". If you want to use the CPU, change it to device_map="cpu".

In my case, device_map needs to be set to cuda:0 instead of cuda.
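
Putting the thread's fixes together, a minimal sketch of a loading call that keeps every weight on a single GPU; the model path and LoRA hyperparameters are placeholders, and target_modules assumes ChatGLM's fused query_key_value projection:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

model = AutoModel.from_pretrained(
    "{your model path}",       # placeholder path
    load_in_8bit=True,
    trust_remote_code=True,
    device_map={"": 0},        # same effect as "cuda:0": all weights on GPU 0
)

peft_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)  # no trailing .to("cuda:1"); weights are already placed
```

With the whole model pinned to one device, the cuda:0 vs cuda:1 mismatch from the embedding lookup above should not occur.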

@chenmins

Just tested it; it needs 26 GB of GPU memory.

```
Mon Oct 30 11:36:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:A1:00.0 Off |                    0 |
| N/A   51C    P0   327W / 400W |  26305MiB / 81920MiB |     95%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       713      C   python                          26303MiB |
+-----------------------------------------------------------------------------+
```

freecow commented Oct 30, 2023

> Just tested it; it needs 26 GB of GPU memory. [nvidia-smi output quoted above]

So it seems a 24G card like the 3090 can't handle this, especially since multi-GPU fine-tuning doesn't appear to be supported either.

@Jeru2023

24G is enough. I'm on a single 4090; one epoch takes 10 seconds, which is pretty fast.

@Jeru2023

> So it seems a 24G card like the 3090 can't handle this, especially since multi-GPU fine-tuning doesn't appear to be supported either.

I tried it on a 3090 today as well; no problems.

@sukibean163 commented Nov 8, 2023

> I tried it on a 3090 today as well; no problems.

Why do I run into the same problem on a 24G 4090?
[two screenshots of the error attached]
