Description
**Describe the bug**
It seems that a custom property declared on a class derived from `OpenAIPromptExecutionSettings` is not being serialized into the final HTTP request sent to the vLLM OpenAI-compatible API server.
Here is my code (snippet) so far:

````csharp
string expectedResultRegex =
    """```json\s*\{\s*"expectation_met":\s*(true|false),\s*"bbox_2d":\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*\}\s*```""";

byte[] imageBytes = File.ReadAllBytes(imagePath);

var qwenLocalHttpClient = new HttpClient();
qwenLocalHttpClient.BaseAddress = new Uri("http://localhost:5011/v1");

IChatCompletionService chatCompletionService = new OpenAIChatCompletionService(
    modelId: "Qwen/Qwen2.5-VL-32B-Instruct",
    apiKey: "EMPTY",
    httpClient: qwenLocalHttpClient);

var prompt = """
    Determine if:
    1. The UI state matches the user's expectation
    2. Identify the relevant UI element

    Return in JSON format:
    ```json
    {
        "expectation_met": true/false,
        "bbox_2d": [x1, y1, x2, y2]
    }
    ```

    If element not visible or expectation can't be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!
    """;

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage(
[
    new TextContent(prompt),
    new ImageContent(imageBytes, "image/png")
]);

var reply = await chatCompletionService.GetChatMessageContentAsync(
    chatHistory,
    new VllmCustomExecutionSettings
    {
        Temperature = 0,
        GuidedRegex = expectedResultRegex,
    });

Console.WriteLine(reply);
````
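As a side note, the regex itself is not the problem: it accepts exactly the JSON shape the prompt asks for. A quick sanity check (the pattern below is my Python transcription of the C# raw string, and the sample output is made up for illustration):

````python
import re

# Transcription of expectedResultRegex from the C# snippet above.
pattern = (
    r'```json\s*\{\s*"expectation_met":\s*(true|false),'
    r'\s*"bbox_2d":\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*\}\s*```'
)

# A hypothetical model reply in the format requested by the prompt.
sample = '```json\n{ "expectation_met": true, "bbox_2d": [10, 20, 110, 220] }\n```'

match = re.fullmatch(pattern, sample)
print(match.group(1))      # -> true
print(match.groups()[1:])  # -> ('10', '20', '110', '220')
````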
And the custom class for prompt execution:

```csharp
public sealed class VllmCustomExecutionSettings : OpenAIPromptExecutionSettings
{
    [JsonPropertyName("guided_regex")]
    public string GuidedRegex { get; set; }
}
```
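For context, vLLM's OpenAI-compatible server applies guided decoding when `guided_regex` arrives as a top-level field of the request body. So the expectation is that the serialized request looks roughly like this (an illustrative, trimmed payload; the `messages` content is elided):

```json
{
    "model": "Qwen/Qwen2.5-VL-32B-Instruct",
    "temperature": 0,
    "messages": [ "<prompt text + image content>" ],
    "guided_regex": "<expectedResultRegex from the snippet above>"
}
```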
In my Docker container log I can see that guided decoding is not applied (`guided_decoding=None`):

```
Received request chatcmpl-e211e747fb98442b81a2e056a6b5911a: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nDetermine if:\r\n1. The UI state matches the user\'s expectation\r\n2. Identify the relevant UI element\r\n \r\nReturn in JSON format:\r\n```json\r\n{\r\n  "expectation_met": true/false,\r\n  "bbox_2d": [x1, y1, x2, y2]\r\n}\r\n```\r\n \r\nIf element not visible or expectation can\'t be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9874, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
```
This is how it should look, with `guided_decoding=GuidedDecodingParams(...)` populated (captured from an equivalent request made with OpenAI's Python package):

```
Received request chatcmpl-77f0db8eb863499f95b755d8e034b75c: prompt: "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n\nDetermine if, he UI state matches the user's expectation\nReturn in JSON format with bbox_2d key.\n<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9954, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json=None, regex='```json\\s*\\{\\s*\\s*"expectation_met":\\s*(true|false),\\s*"bbox_2d":\\s*\\[(\\d+),\\s*(\\d+),\\s*(\\d+),\\s*(\\d+)\\]\\s*\\}\\s*```', choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None. INFO 05-08 02:51:12 [async_llm.py:228] Added request chatcmpl-77f0db8eb863499f95b755d8e034b75c.
```
**Platform**
- Language: C#