
Suggestion: add an n_ctx parameter to the UI. The Qwen1.5-14B-Chat-GGUF model cannot run on low-VRAM GPUs by default #1038

Open
xbl916 opened this issue Feb 23, 2024 · 1 comment
xbl916 commented Feb 23, 2024

Describe the bug

The Qwen1.5-14B-Chat-GGUF model fails to run.
See the attached log: log.txt

My environment is Python 3.10 with the official 0.9.0 Docker image.
[screenshot: Python/Docker environment]
My GPU environment is:
[screenshot: GPU environment]
To rule out the Docker image, I also tried installing version 0.9.0 via pip install and got the same error.

@XprobeBot XprobeBot added this to the v0.9.1 milestone Feb 23, 2024
xbl916 commented Feb 24, 2024

What I wrote above was my earlier report. After further testing last night, I found the actual cause is insufficient GPU memory.
What made this hard to diagnose is that the default log file, logs/local_1708712683141/xinference.log, does not record LlamaCpp's detailed loading information; only the final error is logged. When starting with xinference-local --log-level DEBUG --host 0.0.0.0 --port 9997, the console output does include LlamaCpp's detailed loading information, similar to:
[screenshot: LlamaCpp loading log]
From that output I could see that loading Qwen1.5-14B-Chat-GGUF at its default 32K context length requires more than 24 GB of GPU memory; my T4 only has 16 GB, so loading failed. After reducing the context length to 8192, the model loads normally and uses about 15 GB.
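These memory numbers are consistent with a back-of-the-envelope KV-cache estimate. A minimal sketch, assuming Qwen1.5-14B has roughly 40 transformer layers with a hidden size of 5120 and an fp16 KV cache (these figures are assumptions, not stated in this issue):

```python
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 40,       # assumed for Qwen1.5-14B
                   hidden_size: int = 5120,  # assumed for Qwen1.5-14B
                   bytes_per_elem: int = 2   # fp16 K and V entries
                   ) -> int:
    """Estimate KV-cache size: one K and one V vector per layer per position."""
    return 2 * n_layers * n_ctx * hidden_size * bytes_per_elem

for n_ctx in (32768, 8192):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"n_ctx={n_ctx}: ~{gib:.2f} GiB KV cache")
```

Under these assumptions the 32K default needs about 25 GiB for the KV cache alone, before weights and compute buffers, while 8192 needs about 6.25 GiB — in line with the observed failure on a 16 GB T4 and the roughly 15 GB footprint at 8192.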
I would therefore suggest two changes:
1. The default log file (logs/local_1708712683141/xinference.log) should also record LlamaCpp's detailed loading information. Deployments typically run in the background via nohup, so the direct console output of xinference-local is usually not visible.
2. The web UI should support configuring the model's context length, so that devices with small GPUs remain usable. For now I launch with "n_ctx": 8192 by issuing the request manually from the browser's F12 console:
fetch("http://192.168.2.2:9998/v1/models", {
  "headers": {
    "accept": "*/*",
    "accept-language": "zh-CN,zh;q=0.9",
    "content-type": "application/json"
  },
  "referrer": "http://192.168.2.2:9998/ui/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": JSON.stringify({
    "model_uid": "chatglm3-6b",
    "model_name": "qwen1.5-chat",
    "model_format": "ggufv2",
    "model_size_in_billions": 14,
    "quantization": "q2_k",
    "n_gpu": "auto",
    "n_ctx": 8192,
    "replica": 1
  }),
  "method": "POST",
  "mode": "cors",
  "credentials": "include"
});
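The same launch request can also be issued from Python instead of the browser console. A minimal standard-library sketch: the endpoint URL and payload simply mirror the fetch call above, and passing n_ctx this way is an unofficial workaround until the UI exposes it:

```python
import json
import urllib.request

# Payload mirroring the browser fetch above; n_ctx caps the context length.
payload = {
    "model_uid": "chatglm3-6b",
    "model_name": "qwen1.5-chat",
    "model_format": "ggufv2",
    "model_size_in_billions": 14,
    "quantization": "q2_k",
    "n_gpu": "auto",
    "n_ctx": 8192,
    "replica": 1,
}

def launch_model(endpoint: str = "http://192.168.2.2:9998") -> bytes:
    """POST the launch request to the xinference REST endpoint."""
    req = urllib.request.Request(
        f"{endpoint}/v1/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

This avoids hand-editing a JSON string in the console: the payload is a plain dict, so changing n_ctx for a different GPU is a one-line edit.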

@xbl916 xbl916 changed the title from "BUG: the Qwen1.5-14B-Chat-GGUF model fails to run" to "Suggestion: add an n_ctx parameter to the UI. The Qwen1.5-14B-Chat-GGUF model cannot run on low-VRAM GPUs by default" Feb 24, 2024
@XprobeBot XprobeBot subsequently moved this issue through successive milestones, from v0.9.1 (Mar 1, 2024) to v0.12.4 (Jun 28, 2024)