[Bug]: ChatCompletion prompt_logprobs does not work #3657

noamgat · 2024-03-27T08:41:55Z

Your current environment

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.107+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB

Nvidia driver version: 525.105.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          24
On-line CPU(s) list:             0-23
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family:                      6
Model:                           85
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
Stepping:                        7
BogoMIPS:                        4400.41
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       384 KiB (12 instances)
L1i cache:                       384 KiB (12 instances)
L2 cache:                        12 MiB (12 instances)
L3 cache:                        38.5 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-23
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV12    0-23            N/A
GPU1    NV12     X      0-23            N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

According to the documentation here:

https://github.com/vllm-project/vllm/blob/8f44facdddcf3c704f7d6a2719b6e85efc393449/vllm/entrypoints/openai/protocol.py#L97C1-L98C1

It should be possible to get prompt logprobs from the chat API with a request like this:

curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"messages": [
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The chicago bulls did! Dennis Rodman was named MVP."}
],
"echo": true,
"prompt_logprobs": true,
"top_logprobs": 1,
"max_tokens": 5
}'

(The blatantly wrong answer is on purpose.)
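For anyone reproducing this from a script rather than curl, here is a minimal Python sketch of the same request using only the standard library. The URL and payload are copied from the curl command above; the helper name `build_request` is hypothetical and only there for illustration. The actual send is left commented out because it needs a running server.

```python
import json
import urllib.request

# Server address taken from the curl command above (assumes a local vLLM server).
URL = "http://localhost:8000/v1/chat/completions"

# Same body as the curl command, expressed as a Python dict.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The chicago bulls did! Dennis Rodman was named MVP.",
        },
    ],
    "echo": True,
    "prompt_logprobs": True,
    "top_logprobs": 1,
    "max_tokens": 5,
}


def build_request(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request. Hypothetical helper."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(URL, payload)

# To actually send it (requires the server to be up):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```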

However, I do not get any logprobs in the response:

{"id":"cmpl-74746f2daa8542fa92383a0d2f180e2a","object":"chat.completion","created":823018,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":"The chicago bulls did! Dennis Rodman was named MVP.\n\nOh, wait"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":37,"total_tokens":42,"completion_tokens":5}}

How come? According to commit 70f3e8e, this should be in v0.3.3, which is what we are using.
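To make the symptom easy to check in a script, here is a small sketch that parses the response body quoted above and lists the choices whose `logprobs` field is null. The function name `choices_missing_logprobs` is made up for this example; only the field names that appear in the response are assumed.

```python
import json

# Response body exactly as returned in this report (raw strings keep the
# JSON "\n" escapes intact).
RESPONSE_BODY = (
    r'{"id":"cmpl-74746f2daa8542fa92383a0d2f180e2a","object":"chat.completion",'
    r'"created":823018,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1",'
    r'"choices":[{"index":0,"message":{"role":"assistant",'
    r'"content":"The chicago bulls did! Dennis Rodman was named MVP.\n\nOh, wait"},'
    r'"logprobs":null,"finish_reason":"length"}],'
    r'"usage":{"prompt_tokens":37,"total_tokens":42,"completion_tokens":5}}'
)


def choices_missing_logprobs(response: dict) -> list:
    """Return the indices of choices whose "logprobs" field is absent or null."""
    return [
        choice["index"]
        for choice in response.get("choices", [])
        if choice.get("logprobs") is None
    ]


response = json.loads(RESPONSE_BODY)
missing = choices_missing_logprobs(response)
print(missing)  # [0]
```

Here the only choice (index 0) comes back without logprobs, which is the behaviour being reported.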


DarkLight1337 · 2024-05-31T03:29:47Z

The latest vLLM version (0.4.3) should fix this issue, thanks to #5029.

Some-random · 2024-06-05T15:06:59Z

Sorry, I don't think I'm able to get a correct response with this code. The output I got is: {"object":"error","message":"[{'type': 'extra_forbidden', 'loc': ('body', 'prompt_logprobs'), 'msg': 'Extra inputs are not permitted', 'input': True}]","type":"BadRequestError","param":null,"code":400}
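The error above reads like a request validation failure: the `message` field holds a Python-style list of error dicts, and `loc` names the offending field. Here is a rough sketch for pulling the rejected field names out of such an error. It assumes the message keeps the format shown in this comment, which may differ between versions; `rejected_fields` is a hypothetical helper name.

```python
import ast

# Parsed form of the error body quoted above.
error_response = {
    "object": "error",
    "message": (
        "[{'type': 'extra_forbidden', 'loc': ('body', 'prompt_logprobs'), "
        "'msg': 'Extra inputs are not permitted', 'input': True}]"
    ),
    "type": "BadRequestError",
    "param": None,
    "code": 400,
}


def rejected_fields(error: dict) -> list:
    """List request fields rejected as unknown ("extra_forbidden").

    Assumes "message" is a Python-literal list of dicts, as in this thread.
    """
    details = ast.literal_eval(error["message"])
    return [
        str(detail["loc"][-1])
        for detail in details
        if detail.get("type") == "extra_forbidden"
    ]


print(rejected_fields(error_response))  # ['prompt_logprobs']
```

`ast.literal_eval` is used instead of `json.loads` because the message uses single quotes, a tuple, and `True`, none of which are valid JSON.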

Is prompt_logprobs supported in ChatCompletion or not?

DarkLight1337 · 2024-06-05T15:24:34Z

Sorry, I misread the OP. It's technically not a bug, since prompt_logprobs is not supported by the OpenAI API. Perhaps you can open a feature request (or convert your existing issue into one)?

noamgat added the bug Something isn't working label Mar 27, 2024

noamgat changed the title ~~[Bug]: ChatCompletion prompt tokens does not work~~ [Bug]: ChatCompletion prompt_logprobs does not work Mar 27, 2024

DarkLight1337 closed this as completed May 31, 2024

Some-random mentioned this issue Jun 5, 2024

[Bug]: prompt_logprobs doesn't work with openai compatible server #5264

Closed

DarkLight1337 reopened this Jun 5, 2024

DarkLight1337 closed this as completed Jun 27, 2024
