
About multi-GPU deployment #1199

Closed
1 of 2 tasks
Anfeather opened this issue May 8, 2024 · 3 comments

Anfeather commented May 8, 2024

System Info / 系統信息

Multiple NVIDIA GeForce RTX 2080 Ti GPUs

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

When running GLM across multiple GPUs, the call
generated_text_GLM, history = model_GLM.chat(tokenizer, prompt, history=[])
fails with RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle).
I have verified that the CUDA version is correct, and the batch size is only 1. Has anyone run into a similar problem?

Expected behavior / 期待表现

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this May 8, 2024
@zRzRzRzRzRzRzR
Member

For multi-GPU inference, which code are you running? Judging from the error, this looks like a driver/CUDA-level problem rather than a bug in the code. Please post the location of the official code you are running and the complete error message.

@Anfeather
Author


I found the cause of the bug. The error is raised when I load both the blip2 and glm-6b models in the same Python file; loading either model on its own works fine. The relevant code is as follows:
import torch
from transformers import AutoModel, AutoTokenizer, Blip2ForConditionalGeneration, Blip2Processor

# Load BLIP-2 and let accelerate shard it across the available GPUs
local_path = "./blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(local_path)
model_large = Blip2ForConditionalGeneration.from_pretrained(
    local_path, torch_dtype=torch.float16, device_map="auto"
)
model_large.eval()

# Load ChatGLM3-6B, also sharded automatically across the same GPUs
tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
model_GLM = AutoModel.from_pretrained("./chatglm3-6b", trust_remote_code=True, device_map="auto")
model_GLM = model_GLM.eval()
generated_text_GLM, history = model_GLM.chat(tokenizer, "你好", history=[])

The full error message is as follows:

Traceback (most recent call last):
  File "/home2/an/project/DataShunt+/image_caption/a-PyTorch-Tutorial-to-Image-Captioning-master-2/eval_DS_PT.py", line 73, in <module>
    generated_text_GLM, history = model_GLM.chat(tokenizer, "你好", history=[])
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1042, in chat
    outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/transformers/generation/utils.py", line 2468, in sample
    outputs = self(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 941, in forward
    transformer_outputs = self.transformer(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 834, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 641, in forward
    layer_ret = layer(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 376, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
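
For what it's worth, CUBLAS_STATUS_NOT_INITIALIZED from cublasCreate(handle) frequently means a GPU was effectively out of memory when cuBLAS tried to allocate its workspace, which would fit two models being sharded onto the same cards. A minimal diagnostic sketch (assuming PyTorch with CUDA available; run it just before the failing chat call):

import torch

# Print free/total memory on every visible GPU; a card that is nearly
# full here is the likely culprit behind the cublasCreate failure.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")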

@zRzRzRzRzRzRzR
Member

Right, the normal approach is to load them separately; otherwise the device allocation can go wrong.
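
If both models really must live in one process, one possible workaround (a minimal sketch, not an official recommendation; the four-GPU layout and the memory limits are assumptions for 11 GB 2080 Ti cards) is to pin each model to disjoint devices so the automatic sharding never places them on the same card:

import torch
from transformers import AutoModel, AutoTokenizer, Blip2ForConditionalGeneration, Blip2Processor

# Pin BLIP-2 entirely to GPU 0.
processor = Blip2Processor.from_pretrained("./blip2-opt-2.7b")
model_large = Blip2ForConditionalGeneration.from_pretrained(
    "./blip2-opt-2.7b", torch_dtype=torch.float16, device_map={"": 0}
).eval()

# Shard ChatGLM3-6B across the remaining GPUs only, by declaring to
# accelerate's placement planner that GPU 0 has no memory available.
tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
model_GLM = AutoModel.from_pretrained(
    "./chatglm3-6b",
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "0GiB", 1: "10GiB", 2: "10GiB", 3: "10GiB"},
).eval()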
