Finetune error: RuntimeError: FIND was unable to find an engine to execute this computation #52

Closed
yt7589 opened this issue May 24, 2023 · 2 comments


yt7589 commented May 24, 2023

I downloaded the latest version of VisualGLM-6B and used the following commands to set up the development environment:

conda create -n glm python=3.9
conda activate glm
git clone https://github.com/THUDM/VisualGLM-6B.git
cd VisualGLM-6B
pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
# edit finetune/finetune_visualglm.sh to set NUM_GPUS_PER_WORKER=2, the number of GPUs in my server
unzip fewshot-data.zip
bash finetune/finetune_visualglm.sh

It reported errors as below:

Traceback (most recent call last):
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 188, in <module>
    training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 130, in training_main
    iteration, skipped = train(model, optimizer,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 274, in train
    lm_loss, skipped_iter, metrics = train_step(train_data_iterator,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 348, in train_step
    forward_ret = forward_step(data_iterator, model, args, timers, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 84, in forward_step
    logits = model(input_ids=tokens, image=image, pre_image=pre_image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1724, in forward
    loss = self.module(*inputs, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/chatglm_model.py", line 192, in forward
    return super().forward(input_ids=input_ids, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/visualglm.py", line 20, in word_embedding_forward
    image_emb = self.model(**kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 65, in forward
    enc = self.vit(image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 29, in forward
    return super().forward(input_ids=input_ids, position_ids=None, attention_mask=attention_mask, image=image)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/vit_model.py", line 55, in word_embedding_forward
    embeddings = self.proj(images)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: FIND was unable to find an engine to execute this computation

Please note that my PyTorch version is 2.0. Does VisualGLM-6B have a problem with PyTorch 2.0?
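For context: the "FIND was unable to find an engine" error raised from F.conv2d usually means cuDNN could not select a kernel, most often because the CUDA runtime bundled in the PyTorch wheel does not match the installed NVIDIA driver. A minimal sketch of the compatibility rule (a hypothetical pure-Python helper, not part of VisualGLM; drivers are backward compatible, so the driver's CUDA version must be at least the wheel's):

```python
# Sketch: decide whether a PyTorch wheel built for a given CUDA runtime
# can run under the installed NVIDIA driver. The rule of thumb is
# driver CUDA version >= wheel CUDA version.

def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '11.7' into (11, 7) for comparison."""
    return tuple(int(part) for part in v.split("."))

def cuda_compatible(wheel_cuda: str, driver_cuda: str) -> bool:
    """True if the driver can host the wheel's bundled CUDA runtime."""
    return parse_version(driver_cuda) >= parse_version(wheel_cuda)

# torch.version.cuda reports the wheel's CUDA; `nvidia-smi` reports the driver's.
print(cuda_compatible("11.7", "12.1"))  # True: a newer driver hosts an older runtime
print(cuda_compatible("11.8", "11.7"))  # False: the wheel needs a newer driver
```

Comparing these two version strings on the failing machine is a quick way to confirm the mismatch before reinstalling anything.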

@Sleepychord (Contributor) commented:
Could you try reinstalling PyTorch with a build that matches your CUDA version? I think 2.0 is okay, but 1.13.1 supports more CUDA versions.
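One way to do this (a sketch, not the only option: pick the cuXXX tag that matches what `nvidia-smi` reports on your machine, e.g. cu117 for CUDA 11.7) is to reinstall from the official PyTorch wheel index for that CUDA version:

```shell
# Remove the mismatched build, then install a wheel compiled against CUDA 11.7.
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
```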


yt7589 commented May 26, 2023

@Sleepychord Thanks. Installing the right CUDA version (11.7) solved my problem.

yt7589 closed this as completed May 26, 2023