Description
Is there an existing issue for this bug?
- I have searched the existing issues
The bug has not been fixed in the latest main branch
- I have checked the latest main branch
Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)
Yes, I will share a minimal reproducible script.
🐛 Describe the bug
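Setup: ColossalAI 0.4.9 with the Hybrid Parallel Plugin, zero_stage=1, zero_cpu_offload=true, training QwQ-32B on eight A100 GPUs.
With pp=2 and tp=4 the program runs normally, but GPU memory usage is very low: only about 20 GB of each 80 GB card is used, while CPU memory usage is high and fills the server's host memory.
After increasing max_length, training fails with:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
How can I increase GPU memory usage and reduce CPU memory usage?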
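For context, here is a rough back-of-the-envelope estimate of where the memory might be going. It is only a sketch under stated assumptions, not a measurement: about 32e9 parameters, bf16 weights and gradients (2 bytes each), Adam keeping fp32 master weights plus two moments (12 bytes per parameter), and CPU offload placing all optimizer state in host memory. The function name `estimate_memory_gb` and its parameters are hypothetical and not part of ColossalAI.

```python
def estimate_memory_gb(n_params, tp_size, pp_size, n_gpus,
                       bytes_weight=2, bytes_grad=2, bytes_optim=12):
    """Rough static memory estimate (no activations, no buffers).

    Assumptions (not taken from ColossalAI internals):
    - weights and gradients are split evenly across tp_size * pp_size ranks
    - ZeRO stage 1 partitions optimizer state across data-parallel ranks
    - with CPU offload, all optimizer state lives in host memory
    """
    model_shards = tp_size * pp_size
    dp_size = n_gpus // model_shards  # data-parallel degree left over

    weights_gb = n_params * bytes_weight / model_shards / 1e9
    grads_gb = n_params * bytes_grad / model_shards / 1e9
    # Optimizer state per rank: sharded by tp/pp, then partitioned by ZeRO-1.
    optim_gb_per_rank = n_params * bytes_optim / model_shards / dp_size / 1e9

    return {
        "dp_size": dp_size,
        "gpu_static_gb_per_gpu": weights_gb + grads_gb,
        "cpu_optim_gb_per_rank": optim_gb_per_rank,
        "cpu_optim_gb_total": optim_gb_per_rank * n_gpus,
    }


# Configuration from this report: 8 GPUs, tp=4, pp=2 (so dp=1).
est = estimate_memory_gb(n_params=32e9, tp_size=4, pp_size=2, n_gpus=8)
for key, value in est.items():
    print(f"{key}: {value}")
```

Under these assumptions, each GPU holds roughly 16 GB of weights and gradients, which is in the same range as the 20 GB observed, and the offloaded optimizer state adds up to roughly 384 GB of host memory on a single node. Since tp=4 and pp=2 already use all eight GPUs, the data-parallel degree is 1, so ZeRO stage 1 has nothing to partition across in this estimate.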
Environment
No response