Made a GPTQ 4-bit version #107
Replies: 3 comments 3 replies
-
How do you use it?
-
Thanks for the generous share. Since it is not part of …
-
Thanks for sharing. Tested it myself and it works, using roughly 8 GB of VRAM. Attaching the GPTQ-for-LLaMa CUDA command
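The ~8 GB figure reported above is plausible from a back-of-the-envelope estimate. A minimal sketch, where the parameter count (13B), bit width, group size, and per-group fp16 scale/zero overhead are assumptions typical of GPTQ 128g packing, not details stated in the thread:

```python
def gptq_weight_bytes(n_params: float, wbits: int = 4, groupsize: int = 128) -> float:
    """Estimate bytes for GPTQ-packed weights: `wbits` per weight, plus one
    fp16 scale and one packed zero-point per group of `groupsize` weights.
    (Assumed packing scheme; actual checkpoints may differ slightly.)"""
    packed = n_params * wbits / 8             # packed low-bit weights
    scales = n_params / groupsize * 2         # fp16 scale per group
    zeros = n_params / groupsize * wbits / 8  # packed zero-point per group
    return packed + scales + zeros

# Hypothetical 13B-parameter model at 4-bit, group size 128:
gib = gptq_weight_bytes(13e9) / 2**30
print(f"packed weights ~ {gib:.1f} GiB")  # ~6.3 GiB
```

Packed weights alone come to roughly 6.3 GiB; the remaining gap to the observed ~8 GB is consistent with fp16 embeddings, activations, and the KV cache at inference time.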
-
https://huggingface.co/mrtoy/chinese-llama-13b-4bit-128g
It works reasonably well; the output is about the same as fp16.