
.Net: Bug: Custom PromptExecutionSettings with "guided_regex" property for vLLM OpenAI Api Server is not working #11961


Description


Describe the bug
It seems that a custom property on a derived execution settings class is not being serialized into the final HTTP request sent to the vLLM OpenAI API server.

Here is my code (snippet) so far:

string expectedResultRegex =
    """```json\s*\{\s*"expectation_met":\s*(true|false),\s*"bbox_2d":\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*\}\s*```""";

byte[] imageBytes = File.ReadAllBytes(imagePath);

var qwenLocalHttpClient = new HttpClient();
qwenLocalHttpClient.BaseAddress = new Uri("http://localhost:5011/v1");

IChatCompletionService chatCompletionService = new OpenAIChatCompletionService(
    modelId: "Qwen/Qwen2.5-VL-32B-Instruct",
    apiKey: "EMPTY",
    httpClient: qwenLocalHttpClient
);

var prompt = """
             Determine if:
             1. The UI state matches the user's expectation
             2. Identify the relevant UI element

             Return in JSON format:
             ```json
             {
                 "expectation_met": true/false,
                 "bbox_2d": [x1, y1, x2, y2]
             }
             ```

             If element not visible or expectation can't be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!
             """;

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage([
    new TextContent(prompt),
    new ImageContent(imageBytes, "image/png")
]);

var reply = await chatCompletionService.GetChatMessageContentAsync(chatHistory,
    new VllmCustomExecutionSettings
    {
        Temperature = 0,
        GuidedRegex = expectedResultRegex,
    });

Console.WriteLine(reply);

And here is the custom class for the prompt execution settings:

public sealed class VllmCustomExecutionSettings : OpenAIPromptExecutionSettings
{
    [JsonPropertyName("guided_regex")]
    public string GuidedRegex { get; set; }
}
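
As a side note, what actually goes over the wire can be checked on the client side with a logging message handler. This is only a minimal sketch; RequestLoggingHandler and its wiring are illustrative and not part of the repro above:

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch: dump every outgoing request body to see whether
// "guided_regex" is present in the serialized JSON payload.
public sealed class RequestLoggingHandler : DelegatingHandler
{
    public RequestLoggingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Content is not null)
        {
            string body = await request.Content.ReadAsStringAsync(cancellationToken);
            Console.WriteLine(body); // check whether "guided_regex" appears here
        }

        return await base.SendAsync(request, cancellationToken);
    }
}

// Hypothetical wiring, replacing the plain HttpClient from the snippet above:
// var qwenLocalHttpClient = new HttpClient(new RequestLoggingHandler(new HttpClientHandler()))
// {
//     BaseAddress = new Uri("http://localhost:5011/v1")
// };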

In my Docker container log I can see that the guided_regex decoding is not applied (guided_decoding=None):

Received request chatcmpl-e211e747fb98442b81a2e056a6b5911a: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nDetermine if:\r\n1. The UI state matches the user\'s expectation\r\n2. Identify the relevant UI element\r\n \r\nReturn in JSON format:\r\n```json\r\n{\r\n "expectation_met": true/false,\r\n "bbox_2d": [x1, y1, x2, y2]\r\n}\r\n```\r\n \r\nIf element not visible or expectation can\'t be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9874, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.

This is how it should look (using OpenAI's Python package):

Received request chatcmpl-77f0db8eb863499f95b755d8e034b75c: prompt: "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n\nDetermine if, he UI state matches the user's expectation\nReturn in JSON format with bbox_2d key.\n<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9954, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json=None, regex='```json\\s*\\{\\s*\\s*"expectation_met":\\s*(true|false),\\s*"bbox_2d":\\s*\\[(\\d+),\\s*(\\d+),\\s*(\\d+),\\s*(\\d+)\\]\\s*\\}\\s*```', choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None. INFO 05-08 02:51:12 [async_llm.py:228] Added request chatcmpl-77f0db8eb863499f95b755d8e034b75c.
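
One way to double-check from C# that the server itself honors the field is to post the request body by hand. This is a minimal sketch under the assumption that vLLM's OpenAI-compatible server reads "guided_regex" as a top-level field of the /v1/chat/completions body (the same field the Python client passes via extra_body); the image part is omitted for brevity, and expectedResultRegex/prompt are the variables from the snippet above:

using System;
using System.Net.Http;
using System.Net.Http.Json;

// Minimal sketch: send the chat completion request directly, with
// "guided_regex" as an extra top-level field (assumption based on the
// Python extra_body usage). Image content is omitted here.
var payload = new
{
    model = "Qwen/Qwen2.5-VL-32B-Instruct",
    temperature = 0,
    guided_regex = expectedResultRegex,
    messages = new[]
    {
        new { role = "user", content = prompt }
    }
};

using var http = new HttpClient();
HttpResponseMessage response =
    await http.PostAsJsonAsync("http://localhost:5011/v1/chat/completions", payload);
Console.WriteLine(await response.Content.ReadAsStringAsync());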

Platform

  • Language: C#

Labels

.NET, bug
