
7B ONNX model (float16) uses more than 32 GB of GPU memory #19

Open
iamhere1 opened this issue Jun 13, 2023 · 0 comments

Comments


iamhere1 commented Jun 13, 2023

Machine configuration: GPU with 32 GB of memory

Experiment steps:

  1. Merge the Chinese-Alpaca-Plus-7B model from here (https://github.com/ymcui/Chinese-LLaMA-Alpaca#%E6%A8%A1%E5%9E%8B%E4%B8%8B%E8%BD%BD) with the original LLaMA weights to obtain a new model, new_chinese;
  2. Using the method described in this project's documentation (https://github.com/tpoisonooo/llama.onnx), convert new_chinese to an ONNX model, new_chinese_onnx;
  3. Using the method described in the same documentation (https://github.com/tpoisonooo/llama.onnx), convert new_chinese_onnx to an fp16 model, new_chinese_onnx_fp16 (see the sketch after this list);
  4. Run python3 demo_llama.py:
    On GPU: the program aborts with an out-of-memory error before the model finishes loading;
    On CPU: the model loads successfully.
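
For reference, step 3 boils down to something like the sketch below. This assumes the commonly used onnxconverter-common package; the project's own conversion script may differ, and the paths are placeholders:

```python
import onnx
from onnxconverter_common import float16

# Load the exported fp32 ONNX model (placeholder path).
model = onnx.load("new_chinese_onnx/decoder.onnx")

# Cast fp32 weights and activations to fp16; keep_io_types leaves the
# graph inputs/outputs in fp32 so the calling code needs no changes.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

# A 7B model exceeds protobuf's 2 GB limit, so the weights have to be
# stored as external data next to the .onnx file.
onnx.save_model(
    model_fp16,
    "new_chinese_onnx_fp16/decoder.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
)
```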

Question:
Before the ONNX conversion, the model could be loaded into GPU memory and inference ran successfully.
After the conversion, loading fails with an out-of-memory error regardless of whether the model is further converted to fp16.
Is there any way to run inference with the converted ONNX model on the current 32 GB GPU?
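
For example, would capping the ONNX Runtime CUDA memory arena help? A sketch of what I have in mind (session options only; the model path is a placeholder):

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Disable the memory-pattern optimization so ORT does not pre-allocate
# large activation buffers up front.
sess_options.enable_mem_pattern = False

providers = [
    ("CUDAExecutionProvider", {
        # Grow the arena only by what each request needs instead of
        # doubling it, and cap total GPU allocations (in bytes).
        "arena_extend_strategy": "kSameAsRequested",
        "gpu_mem_limit": 30 * 1024 * 1024 * 1024,
    }),
    "CPUExecutionProvider",  # fallback for ops the CUDA EP does not support
]

session = ort.InferenceSession(
    "new_chinese_onnx_fp16/decoder.onnx",
    sess_options=sess_options,
    providers=providers,
)
```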
