
Error when recognizing images while running gemma3 with llama #3416

@lakako

Description


System Info

Official Docker image v1.5.1

Running Xinference with Docker?

  • docker
    pip install
    installation from source

Version info

Official Docker image, version 1.5.1

The command used to start Xinference

xinference-local -H 0.0.0.0

Reproduction

  1. Connect Dify to the Xinference model
  2. Send an image during the chat

The following error occurs:

supervisor-1  | Traceback (most recent call last):
supervisor-1  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/llama_cpp/core.py", line 308, in _handle_chat_completion
supervisor-1  |     self._llm.handle_chat_completions(
supervisor-1  |   File "xllamacpp.pyx", line 2073, in xllamacpp.xllamacpp.Server.handle_chat_completions
supervisor-1  | RuntimeError: Failed to parse messages: Unsupported content part type: "image_url"; messages = [
supervisor-1  |   {
supervisor-1  |     "role": "user",
supervisor-1  |     "content": "你好"
supervisor-1  |   },
supervisor-1  |   {
supervisor-1  |     "role": "assistant",
supervisor-1  |     "content": "你好!很高兴认识你。有什么我可以帮助你的吗? 😊"
supervisor-1  |   },
supervisor-1  |   {
supervisor-1  |     "role": "user",
supervisor-1  |     "content": "你是谁"
supervisor-1  |   },
supervisor-1  |   {
supervisor-1  |     "role": "assistant",
supervisor-1  |     "content": "我是一个大型语言模型,由 Google 训练。 \n\n简单来说,我是一个人工智能程序,可以理解和生成人类语言。 我可以:\n\n*   回答你的问题\n*   写不同类型的文本格式,例如诗歌、代码、脚本、音乐作品、电子邮件、信件等。\n*   翻译语言\n*   总结文本\n*   进行对话\n\n我还在不断学习和进步!\n\n你有什么想问我的吗?"
supervisor-1  |   },
supervisor-1  |   {
supervisor-1  |     "role": "user",
supervisor-1  |     "content": [
supervisor-1  |       {
supervisor-1  |         "type": "text",
supervisor-1  |         "text": "图里有什么"
supervisor-1  |       },
supervisor-1  |       {
supervisor-1  |         "type": "image_url",
supervisor-1  |         "image_url": {
supervisor-1  |           "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADkAAAAbCAIAAABJHrDvAAACwUlEQVRYCdWX3Y6aQBTHeSnvfQXvvSg3TXeTte1eaNImbbKT9JJ3MLEXXa/aRzAm+xQqg4yi8g2CMOC0MwjChhU2TSo1kzhzOMP5zeE/X5zt+v9L4TJQy/EaVTKwrEJZLcdbb1QI0Xwhzebw6mW+kCBE641qOV4Gars+pxk2faBsD0EQxzFpwC+O40MQrJUthEgz7AyXo23daABhCYKq6RAi094nuBxaKSVejTGt1puVskvEwPn+oTFgJSCe54tQTlLLRVEjNFqCyUwYR/OFZFiu5XjcS06p/QnwY5Q2CHkCbWF6bhI0vGu1Oy+Ugmeu0+uqsznUzcusE6FAwAuALzB1h0tCKGtSofHhuAueUpDliP9nrCzmFHRa/B1gWAlZq905wzWHlYICAbSFEUveFHS6w3HSbKX5q6+B+Hh03H2adfrv7vfx8Zi3lNYrNbAc8R0wIVSgTK/pt071OhFaBTuLclEDox8/+ZuBfzgtO3vPf/Ou//3xVylf3ljJypyfSTY3hzIZoOEdG1I1a4hxr//wtvc5CMIgCPnbQa//EEVRHqu0Xo+1tGvRWF8DhJAQ4w8D8H4Abu+/3n/6FmJcfFl5q5r1IgRbEJgGpiCRCgtzUQMJSIjxzccvvf5DTVBCSDXr8zFOhGw+5R4VF6YarISQwyGoD/oKVrYUsCUzYYXjbn7NSmfYib4ea26otarVeU00QOdNOsPy86nVpp++IIBsL2DjoftIYcOrhVXqVM1a2u0qxjNrw88uURTPF9LpPOB5/lWyVTMoPROK8ol1td7U7HYVN7RS5NXmdCYURVnV9KtwVAZVNZ3i6fbprL3dGaIoS0tk2w7G1TteZYC/dwhDbNuOtESiKG93RiIAeo/VTVfVLRkpoijPmnHnni0kUZRlpKi6pZvu+W5o2nuGa+80c6saDSk7zVR1Ow/6J6+/AVY5lnn59SJwAAAAAElFTkSuQmCC",
supervisor-1  |           "detail": "high"
supervisor-1  |         }
supervisor-1  |       }
supervisor-1  |     ]
supervisor-1  |   }
supervisor-1  | ]
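
The failure comes down to the shape of the last message in the log. The following is a minimal sketch for spotting such parts before a request is sent. It assumes only the OpenAI-style content-part layout visible in the traceback above. `find_unsupported_parts` is a hypothetical helper, not part of Xinference or xllamacpp, and the default of accepting only "text" parts is inferred from the error message rather than from the library source. The base64 payload is shortened to a placeholder.

```python
from typing import Any


def find_unsupported_parts(
    messages: list[dict[str, Any]],
    supported: frozenset[str] = frozenset({"text"}),  # assumption: text-only backend
) -> list[tuple[int, str]]:
    """Return (message index, part type) for every content part not in `supported`."""
    found = []
    for index, message in enumerate(messages):
        content = message.get("content")
        if content is None or isinstance(content, str):
            continue  # plain string content carries no typed parts
        for part in content:
            part_type = part.get("type", "<missing>")
            if part_type not in supported:
                found.append((index, part_type))
    return found


# Reduced version of the conversation from the log (image data elided).
messages = [
    {"role": "user", "content": "你好"},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "图里有什么"},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/png;base64,...", "detail": "high"},
            },
        ],
    },
]

print(find_unsupported_parts(messages))  # [(1, 'image_url')]
```

A check like this only reports which message triggers the parse error; it does not make the backend accept images.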

Expected behavior

The image content is recognized correctly.

Activity

added this to the v1.x milestone on May 9, 2025
qinxuye (Contributor) commented on May 9, 2025

@codingl2k1 Does llama.cpp support gemma-3 image input yet?

codingl2k1 (Contributor) commented on May 9, 2025

> @codingl2k1 Does llama.cpp support gemma-3 image input yet?

The llama server does not support multimodal input yet (WIP): https://github.com/ggml-org/llama.cpp/tree/master/tools/server
There is a libmtmd that provides multimodal functionality, but it is not a complete server (multimodal support in the server linked above is still under development).
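
Until the server accepts image parts, a client could fall back to sending text only. The sketch below is one possible client-side workaround, not something Xinference or llama.cpp provides: `to_text_only` is a hypothetical helper that drops every non-text content part and joins the remaining text, so the model never sees the image.

```python
from typing import Any


def to_text_only(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` with list-style content flattened to plain text."""
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            # Keep only the text parts; image_url and other parts are discarded.
            texts = [part["text"] for part in content if part.get("type") == "text"]
            content = "\n".join(texts)
        cleaned.append({**message, "content": content})
    return cleaned


example = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in the picture?"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        ],
    }
]

print(to_text_only(example))
```

This avoids the parse error but also discards the image, so it is only useful when a degraded text-only answer is acceptable.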

lakako (Author) commented on May 9, 2025

OK (facepalm), thanks for the explanation.

Participants: @qinxuye, @lakako, @XprobeBot, @codingl2k1

Error when recognizing images while running gemma3 with llama · Issue #3416 · xorbitsai/inference