
Suggestion: add an n_ctx parameter to the UI. The Qwen1.5-14B-Chat-GGUF model cannot run on low-VRAM GPUs by default #1038

Open
xbl916 opened this issue Feb 23, 2024 · 1 comment
xbl916 commented Feb 23, 2024

Describe the bug

The Qwen1.5-14B-Chat-GGUF model fails to run.
See the attached log: log.txt

My environment is Python 3.10 with the official 0.9.0 Docker image.
[screenshot: Python/Docker environment]
My GPU environment is:
[screenshot: GPU environment]
To rule out the Docker image, I also tried installing version 0.9.0 via pip install and got the same error.

@XprobeBot XprobeBot added this to the v0.9.1 milestone Feb 23, 2024
xbl916 commented Feb 24, 2024

What I wrote above was my earlier report. After further testing last night, I found the actual cause is insufficient GPU memory.
What made this hard to diagnose is that the default log file, logs/local_1708712683141/xinference.log, does not record LlamaCpp's detailed loading information; only the final error is logged. When starting with xinference-local --log-level DEBUG --host 0.0.0.0 --port 9997, the console output does include LlamaCpp's detailed loading information, similar to:
[screenshot: LlamaCpp loading log]
From that output I could see that loading Qwen1.5-14B-Chat-GGUF at its default 32K context length requires more than 24 GB of GPU memory; my T4 only has 16 GB, so loading failed. After reducing the context length to 8192, the model loads normally and uses about 15 GB.
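These memory numbers are consistent with a back-of-the-envelope KV-cache estimate. A minimal sketch, assuming Qwen1.5-14B has roughly 40 transformer layers with a hidden size of 5120 and an fp16 KV cache (these figures are assumptions, not stated in this issue):

```python
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 40,       # assumed for Qwen1.5-14B
                   hidden_size: int = 5120,  # assumed for Qwen1.5-14B
                   bytes_per_elem: int = 2   # fp16 K and V entries
                   ) -> int:
    """Estimate KV-cache size: one K and one V vector per layer per position."""
    return 2 * n_layers * n_ctx * hidden_size * bytes_per_elem

for n_ctx in (32768, 8192):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"n_ctx={n_ctx}: ~{gib:.2f} GiB KV cache")
```

Under these assumptions the 32K default needs about 25 GiB for the KV cache alone, before weights and compute buffers, while 8192 needs about 6.25 GiB — in line with the observed failure on a 16 GB T4 and the roughly 15 GB footprint at 8192.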
I would therefore suggest two changes:
1. The default log file (logs/local_1708712683141/xinference.log) should also record LlamaCpp's detailed loading information. Deployments typically run in the background via nohup, so the direct console output of xinference-local is usually not visible.
2. The web UI should support configuring the model's context length, so that devices with small GPUs remain usable. For now I launch with "n_ctx": 8192 by issuing the request manually from the browser's F12 console:
fetch("http://192.168.2.2:9998/v1/models", {
  "headers": {
    "accept": "*/*",
    "accept-language": "zh-CN,zh;q=0.9",
    "content-type": "application/json"
  },
  "referrer": "http://192.168.2.2:9998/ui/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": JSON.stringify({
    "model_uid": "chatglm3-6b",
    "model_name": "qwen1.5-chat",
    "model_format": "ggufv2",
    "model_size_in_billions": 14,
    "quantization": "q2_k",
    "n_gpu": "auto",
    "n_ctx": 8192,
    "replica": 1
  }),
  "method": "POST",
  "mode": "cors",
  "credentials": "include"
});
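The same launch request can also be issued from Python instead of the browser console. A minimal standard-library sketch: the endpoint URL and payload simply mirror the fetch call above, and passing n_ctx this way is an unofficial workaround until the UI exposes it:

```python
import json
import urllib.request

# Payload mirroring the browser fetch above; n_ctx caps the context length.
payload = {
    "model_uid": "chatglm3-6b",
    "model_name": "qwen1.5-chat",
    "model_format": "ggufv2",
    "model_size_in_billions": 14,
    "quantization": "q2_k",
    "n_gpu": "auto",
    "n_ctx": 8192,
    "replica": 1,
}

def launch_model(endpoint: str = "http://192.168.2.2:9998") -> bytes:
    """POST the launch request to the xinference REST endpoint."""
    req = urllib.request.Request(
        f"{endpoint}/v1/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

This avoids hand-editing a JSON string in the console: the payload is a plain dict, so changing n_ctx for a different GPU is a one-line edit.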

@xbl916 xbl916 changed the title from "BUG: the Qwen1.5-14B-Chat-GGUF model fails to run" to "Suggestion: add an n_ctx parameter to the UI. The Qwen1.5-14B-Chat-GGUF model cannot run on low-VRAM GPUs by default" Feb 24, 2024
@XprobeBot XprobeBot subsequently moved this issue through successive milestones, from v0.9.1 (Mar 1, 2024) to v0.12.4 (Jun 28, 2024)