About multi-GPU deployment #1199
Comments
For multi-GPU inference, which code are you running? Judging by the error, this looks like a driver/CUDA-level problem rather than a code bug. Please post the exact location of the official code you ran and the full traceback.
I found the cause of the bug: the error occurs when I import both the BLIP-2 and GLM-6B models in the same .py file; importing either model on its own works fine. The relevant code: tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True) Full traceback: Traceback (most recent call last):
Right — normally they should be loaded separately; otherwise device allocation can go wrong.
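One way to avoid the allocation conflict when two models must live in one process is to pin each model to its own GPU explicitly. This is a minimal sketch, not code from this repo: `device_map_for` is an illustrative helper (the `device_map={"": index}` form is the `transformers` convention for placing a whole model on one device), and the BLIP-2 loader class and checkpoint paths are assumptions.

```python
def device_map_for(gpu_index: int) -> dict:
    """Build a transformers-style device_map that places an entire
    model on a single GPU; the empty-string key maps the whole module."""
    return {"": gpu_index}

# Usage sketch (assumes two GPUs and local checkpoints; names are illustrative):
# from transformers import AutoModel, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
# model_GLM = AutoModel.from_pretrained(
#     "./chatglm3-6b", trust_remote_code=True,
#     device_map=device_map_for(0)).eval()      # GLM pinned to GPU 0
# model_blip2 = Blip2ForConditionalGeneration.from_pretrained(
#     "./blip2", device_map=device_map_for(1))  # BLIP-2 pinned to GPU 1
```

With each model confined to its own device, the two loads no longer race for memory on the same GPU during initialization.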
System Info / 系統信息
Multiple 2080 Ti GPUs
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
When running GLM on multiple GPUs, the call generated_text_GLM, history = model_GLM.chat(tokenizer, prompt, history=[]) raises:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling
cublasCreate(handle)
I am sure the CUDA version is correct, and the batch size is only 1. Has anyone run into a similar problem?
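Since the maintainer suspects a driver/CUDA-level issue, one diagnostic sketch (an assumption, not project code) is to pin the environment before any CUDA context is created: restricting the process to one GPU rules out cross-device allocation, and synchronous launches make the real failing call appear in the traceback rather than a later cuBLAS call.

```python
import os

# These must be set before torch (or any CUDA library) is imported,
# because the CUDA context reads them at initialization time.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")   # synchronous kernel launches
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")   # expose only one 2080 Ti

# Only import torch after the environment is pinned, e.g.:
# import torch
# assert torch.cuda.device_count() == 1
```

If the error disappears with a single visible device, the problem is multi-GPU allocation rather than the CUDA install itself.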
Expected behavior / 期待表现