
logprobs is not supported for models created with logits_all=False #1911

Open
sandro-qiang opened this issue Jul 21, 2024 · 3 comments

sandro-qiang commented Jul 21, 2024

System Info

NVIDIA-SMI 555.52.04 Driver Version: 555.52.04 CUDA Version: 12.5
Ubuntu 22.04

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

0.13.2

The command used to start Xinference

xinference-local -H '0.0.0.0'

Reproduction

gemma-2-it, GGUF format, q4-k-m

Expected behavior

I am using langflow's OpenAI node. I suspect the problem is that it sends logprobs with a non-None value by default, while Xinference creates the llama.cpp context with logits_all=False. I suggest validating the parameters passed by OpenAI-compatible clients for each model backend. The call stack is below.

Also, the llama.cpp engine does not honor system_prompt; setting it has no effect.

xinference  | 2024-07-21 07:12:16,092 xinference.api.restful_api 1 ERROR    Chat completion stream got an error: [address=0.0.0.0:41803, pid=54] logprobs is not supported for models created with logits_all=False
xinference  | Traceback (most recent call last):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1656, in stream_results
xinference  |     async for item in iterator:
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 340, in __anext__
xinference  |     return await self._actor_ref.__xoscar_next__(self._uid)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
xinference  |     return self._process_result_message(result)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference  |     raise message.as_instanceof_cause()
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 656, in send
xinference  |     result = await self._run_coro(message.message_id, coro)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
xinference  |     return await coro
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
xinference  |     return await super().__on_receive__(message)  # type: ignore
xinference  |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference  |     raise ex
xinference  |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference  |     async with self._lock:
xinference  |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference  |     with debug_async_timeout('actor_lock_timeout',
xinference  |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference  |     result = await result
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 431, in __xoscar_next__
xinference  |     raise e
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 417, in __xoscar_next__
xinference  |     r = await asyncio.to_thread(_wrapper, gen)
xinference  |   File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference  |     return await loop.run_in_executor(None, func_call)
xinference  |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference  |     result = self.fn(*self.args, **self.kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 402, in _wrapper
xinference  |     return next(_gen)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 318, in _to_generator
xinference  |     for v in gen:
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/utils.py", line 558, in _to_chat_completion_chunks
xinference  |     for i, chunk in enumerate(chunks):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/ggml/llamacpp.py", line 212, in generator_wrapper
xinference  |     for index, _completion_chunk in enumerate(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1106, in _create_completion
xinference  |     raise ValueError(
xinference  | ValueError: [address=0.0.0.0:41803, pid=54] logprobs is not supported for models created with logits_all=False
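
For reference, the same failure can be reproduced with llama-cpp-python directly. A minimal sketch (the model path is a placeholder), showing that requesting logprobs only works when the Llama object was created with logits_all=True:

from llama_cpp import Llama

# Default is logits_all=False, which matches how Xinference creates the context.
llm = Llama(model_path="./gemma-2-9b-it-Q4_K_M.gguf")

try:
    # Any non-None logprobs value triggers the ValueError seen in the traceback above.
    llm.create_completion("Hello", max_tokens=8, logprobs=5)
except ValueError as e:
    print(e)  # logprobs is not supported for models created with logits_all=False

# Creating the model with logits_all=True keeps logits for every token and allows
# logprobs, at the cost of extra memory and slower prompt processing.
llm_all = Llama(model_path="./gemma-2-9b-it-Q4_K_M.gguf", logits_all=True)
out = llm_all.create_completion("Hello", max_tokens=8, logprobs=5)
print(out["choices"][0]["logprobs"])
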
XprobeBot added the gpu label Jul 21, 2024
XprobeBot added this to the v0.13.2 milestone Jul 21, 2024
qinxuye (Contributor) commented Jul 23, 2024

Are you interested in submitting a code fix for this issue?

sandro-qiang (Author) commented

I'll find some time to look into it tomorrow; if I can fix it, I'll open a PR.

sandro-qiang (Author) commented

When launching from the web UI, adding logits_all set to true under "Additional parameters passed to the inference engine" is enough, so no code change is needed. Also, llama-cpp-python has marked logits_all as deprecated, so there is no point in touching it now.
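
The same workaround also works when launching from the Python client instead of the web UI. A sketch under the assumption that extra keyword arguments to launch_model are forwarded to the llama.cpp engine (the endpoint, model name, and quantization are placeholders for my setup):

from xinference.client import Client

client = Client("http://localhost:9997")

# Extra kwargs are forwarded to the inference engine, so logits_all=True
# ends up on the llama-cpp-python Llama constructor.
model_uid = client.launch_model(
    model_name="gemma-2-it",
    model_engine="llama.cpp",
    model_format="ggufv2",
    quantization="Q4_K_M",
    logits_all=True,
)
model = client.get_model(model_uid)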

The system prompt not taking effect is because gemma-2 itself does not support a system prompt, so that is not a bug.
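
For context, gemma-2's chat template only defines user and model turns and has no system role, so there is nowhere for a system prompt to go. Roughly (a sketch from memory, not copied from the GGUF metadata):

<start_of_turn>user
{user message}<end_of_turn>
<start_of_turn>model
{model reply}<end_of_turn>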

XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024