
[Frontend][OpenAI] Add support for OpenAI tools calling #4656

Closed
wants to merge 1 commit

Conversation


@Xwdit Xwdit commented May 7, 2024

This PR is updated from #3237; it supports the latest version of vLLM (v0.4.2), adds more flexibility, and fixes some issues.

vllm/entrypoints/openai/cli_args.py (review thread outdated, resolved)
vllm/entrypoints/openai/api_server.py (review thread outdated, resolved)
vllm/entrypoints/openai/api_server.py (review thread outdated, resolved)
@AaronFriel

@Xwdit thanks for taking my feedback!

@Xwdit Xwdit force-pushed the tool_squashed branch 2 times, most recently from efbcf5f to eb9bc07 on May 8, 2024 at 23:38
Comment on lines +153 to +162
if stream:
    text_message = ""
    for chunk in response:
        if chunk.choices[0].finish_reason is not None:
            if chunk.choices[0].finish_reason == "tool_calls":
                tool_calls += chunk.choices[0].delta.tool_calls
                # print("TEST : %s" % chunk.choices[0].delta.tool_calls)
            break
        if chunk.choices[0].delta.content is not None:
            text_message += chunk.choices[0].delta.content
Contributor

For what it's worth, probably the better way to handle this is to process the response stream one chunk or token at a time. If you get a token indicating a tool call (such as <tool_call>) at the start of the response, then you want to buffer the entire response from the LLM so that you can invoke the tool. If you get a non-meta or non-control token (e.g. a normal streaming text chat response), then you probably want to start showing the streaming tokens to the user immediately, avoiding the latency of waiting for the entire response. That said, this is also just an example, so I'm aware it doesn't need to be optimized.
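Concretely, a minimal sketch of that approach (assuming an OpenAI-style streaming response; the <tool_call> marker and the show_to_user / invoke_tool helpers are hypothetical placeholders, not part of this PR):

def handle_stream(response, show_to_user, invoke_tool):
    # Decide once, based on the first meaningful token, whether this is a
    # tool call (buffer it) or normal text (stream it to the user).
    buffering = None
    buffered = ""
    for chunk in response:
        delta = chunk.choices[0].delta
        token = delta.content or ""
        if buffering is None and token.strip():
            buffering = token.lstrip().startswith("<tool_call>")
        if buffering:
            buffered += token
        elif token:
            show_to_user(token)  # surface normal text immediately
        if chunk.choices[0].finish_reason is not None:
            break
    if buffering:
        invoke_tool(buffered)  # full tool-call payload collected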

@K-Mistele
Contributor

I have a couple ideas to add, that I can take a crack at this weekend if you would be open to it!

  • It might be worthwhile to allow using Jinja templating for defining the tool call system prompt, to simplify it and allow more flexibility and customization
  • It might be worthwhile to allow a user to specify somehow whether they want guided generation ("outlines") to be used for tool calls. Some models (e.g. the Hermes 2 Pro models by Nous Research, which this PR seems to default to in terms of the default templates) don't require guided generation for tool output, and guided generation can introduce overhead if the output choices (tool calls) are complex JSON objects. Notes on this from the vLLM Discord here and here
  • The Hermes 2 models also use system prompts for the tool calls - this PR does not allow you to request/enforce that, but simply adds the tool call content to the first message regardless of its role. It is probably a good idea to check whether there is a system-role message and, if so, add the tool call content to it; if not, add the tool call content as a new system-role message

Comment on lines +121 to +124
elif isinstance(request.messages,
List) and len(request.messages) >= 1:
request.messages[0].content = request.messages[
0].content.strip() + "\n\n" + text_inject
Contributor

This just appends the tool prompt to the first message. In almost every case, tool calls should go in a system-role message (except for chat formats that don't use system prompts, but those are a marginal case), so instead of blindly appending the tool call prompt to the first message, it might be better to do something like this:

if isinstance(request.messages, str):
    # suggested change: for string prompts, put the tool usage before the prompt 
    request.messages = text_inject + '\n\n' + request.messages.strip() 
elif isinstance(request.messages, List) and len(request.messages) >= 1:
    # suggested change: check to see if the first message is a system message. If so, edit it.
    # otherwise, add a system prompt
    if request.messages[0].role == 'system':
        request.messages[0].content = request.messages[0].content.strip() + '\n\n' + text_inject
    else:
        request.messages = [{'role': 'system', 'content': text_inject}] + request.messages

@Xwdit
Author

Xwdit commented May 11, 2024

I have a couple ideas to add, that I can take a crack at this weekend if you would be open to it!

  • It might be worthwhile to allow using Jinja templating for defining the tool call system prompt, to simplify it and allow more flexibility and customization
  • It might be worthwhile to allow a user to specify somehow whether they want guided generation ("outlines") to be used for tool calls. Some models (e.g. the Hermes 2 Pro models by Nous Research, which this PR seems to default to in terms of the default templates) don't require guided generation for tool output, and guided generation can introduce overhead if the output choices (tool calls) are complex JSON objects. Notes on this from the vLLM Discord here and here
  • The Hermes 2 models also use system prompts for the tool calls - this PR does not allow you to request/enforce that, but simply adds the tool call content to the first message regardless of its role. It is probably a good idea to check whether there is a system-role message and, if so, add the tool call content to it; if not, add the tool call content as a new system-role message

Hello, sorry I just saw this message; thank you very much for being willing to contribute to this PR and to vLLM. If you are willing, I would greatly appreciate it.

@K-Mistele
Contributor

What is the preferred way to contribute to a PR? Should I create a new PR with all the changes from this one plus my additions, or open a PR into your branch on your fork?

@Xwdit
Author

Xwdit commented May 13, 2024

What is the preferred way to contribute to a PR? Should I create a new PR with all the changes from this one plus my additions, or open a PR into your branch on your fork?

It's fine with me either way; do as you prefer.

@Xwdit Xwdit force-pushed the tool_squashed branch 3 times, most recently from 1a5acc4 to 664642a on May 13, 2024 at 01:51
@Xwdit
Author

Xwdit commented May 13, 2024

What is the preferred way to contribute to a PR? Should I create a new PR with all the changes from this one plus my additions, or open a PR into your branch on your fork?

I have rebased the PR onto the latest commit of vLLM; don't forget to sync before you start working ;)

@Xwdit Xwdit force-pushed the tool_squashed branch 2 times, most recently from 3a2e8ef to 24e4c51 on May 14, 2024 at 15:17
@K-Mistele
Contributor

I have been thinking about this a lot and I have a few more thoughts:

1. Ensuring that tools are in a system prompt

I think that this would be a good change to make. I added it in the branch that I'm working on, and will PR it into this branch on your fork @Xwdit; then if/when it's merged, it will be reflected in this PR.

2. Guided Generation / Outlines

After digging through the code some more, I realized that guided output is only forced if you use tool_choice to require a specific tool, and isn't forced if it's set to auto - so my performance concerns shouldn't be an issue.
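For reference, the two request styles against an OpenAI-compatible endpoint would look roughly like this (the base URL, model name, and get_weather tool below are placeholders, not part of this PR):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# tool_choice="auto": the model decides whether to call a tool, so guided
# generation is not forced.
resp = client.chat.completions.create(model="my-model", messages=messages,
                                      tools=tools, tool_choice="auto")

# Naming a specific tool forces a call to it; this is where guided
# (constrained) generation of the arguments applies.
resp = client.chat.completions.create(
    model="my-model", messages=messages, tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}})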

3. Tool Choice System Prompt Formatting

I was thinking more about how to handle tool usage and tool calling, and I think that it's probably easiest and best to let people provide a Jinja template for the tool usage system prompt, with a tools parameter that receives the list of tools specified in the OpenAI API request; their template can specify how to process it.
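As a rough illustration of what I mean (the template text and helper below are hypothetical, not something this PR ships), the server could render such a template with jinja2 and pass the request's tools list straight through:

from jinja2 import Template

# Hypothetical tool-usage system prompt template; a real deployment would
# load this from a user-supplied .jinja file, much like --chat-template.
TOOL_PROMPT_TEMPLATE = Template(
    "You have access to the following tools. To call one, emit a\n"
    "<tool_call>...</tool_call> block containing the call as JSON.\n"
    "{% for tool in tools %}"
    "- {{ tool.function.name }}: {{ tool.function.description }}\n"
    "  parameters: {{ tool.function.parameters | tojson }}\n"
    "{% endfor %}"
)

def render_tool_system_prompt(tools: list) -> str:
    # `tools` is the list taken directly from the OpenAI-style request body.
    return TOOL_PROMPT_TEMPLATE.render(tools=tools)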

Familiarity

Users are in many cases familiar with Jinja already, since it is commonly used for chat message formatting, including in this framework (for OpenAI API-compatible chat completions, see the --chat-template TEMPLATE CLI argument).

Additionally, many models on Hugging Face specify a chat prompt Jinja template in their tokenizer_config.json - if tool-using open models proliferate, we might expect them to do the same for tool usage, in which case it would be easy to support in vLLM with minimal refactoring.

Flexibility & Future-Proof

Allowing vLLM users to specify their own Jinja template for the tool usage system prompt, similar to allowing them to specify a Jinja template for the chat prompt formatting, is flexible and allows for a wide variety of models and templates.

Users would not be locked into the assumptions we currently make about the tool usage system prompt instruction, which are apparent in the structure of the tool_params argument that we currently use. It would be easy to add support for new open function-calling models that use a radically different format for the tool calling system prompt instruction by adding a new Jinja template, instead of having to refactor core code.

Version Control

Using a Jinja template for function-calling system prompt formatting makes it easier for people to contribute additional Jinja templates for their favorite models and have those tracked in version control for easy use, similar to how popular chat model templates are tracked as Jinja templates (e.g. examples/template_chatml.jinja).

Please let me know what you think @Xwdit, and whether you agree or disagree. I haven't finished the implementation yet, since it won't be a small amount of work, and I don't want to waste effort on it if you feel like Jinja templates for tool usage system prompts aren't the right path forward.

@K-Mistele
Contributor

If you agree that it's a good idea, I will be happy to take a crack at doing the jinja implementation myself :)

Co-Authored-By: FlorianJoncour <148003496+florianjoncour@users.noreply.github.com>
@Xwdit
Author

Xwdit commented May 17, 2024

If you agree that it's a good idea, I will be happy to take a crack at doing the jinja implementation myself :)

Sorry for the late response 😢; all these ideas look great. Thank you very much for taking the time to work on this; I have no problem with them 😉

@MaximillionMai

Any updates on this PR?

@K-Mistele
Contributor

Still a work in progress; I will try to knock this out in the next few days - got busy with some life stuff.

@K-Mistele
Contributor

WIP on implementing the changes in #5649

@K-Mistele
Contributor

K-Mistele commented Sep 25, 2024

Hi @Xwdit, I think this is closed by #5649, which has now been merged :)
