
OpenAI Tools / function calling v2 #3237

Closed
wants to merge 11 commits

Conversation

FlorianJoncour
Contributor

This PR follows #2488

The implementation has been updated to use the new guided generation.

If the user sets tool_choice to auto in a request, the server uses the template system from #2488.
However, if a specific function is requested, guided generation is used to generate only that function's parameters.

Everything is detailed in the openai_tools_calls.py example.
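A minimal sketch of such a request (not taken verbatim from openai_tools_calls.py), using the OpenAI Python client against a local vLLM server; the URL, model name, and weather function are illustrative:

```python
from openai import OpenAI

# Illustrative setup: a locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# With tool_choice="auto" the server falls back to the template approach from #2488;
# naming a specific function is the case where guided generation constrains the arguments.
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative model name
    messages=[{"role": "user", "content": "What is the weather like in Paris?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)
```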

@mn9891

mn9891 commented Mar 7, 2024

Testing the tools/function-calls feature with the provided example (openai_tools_calls.py), it works fine as-is. But when you continue the chat and ask another question that requires a function call, it never goes through tools: the assistant replies with text like 'call_get_current_weather_0 was called with arguments ...' instead of returning tool_calls.
See the gist here.
I tested other setups/functions and it is always the case: the second function call fails (tool_calls=None) and the assistant replies 'call_function_name_0 called with ...'.

@simon-mo
Collaborator

simon-mo commented Mar 8, 2024

Can we go further on reducing the templating to purely JSON schema? I believe it is possible by framing it as

{
  "tool_choice": "<one of the tool names>",
  "tool_params": "<schema constrained to the chosen tool's parameters>"
}
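As a rough sketch of that framing (not anything from the PR), a single JSON schema for guided decoding could pair each tool name with that tool's parameter schema:

```python
def build_tool_call_schema(tools: list[dict]) -> dict:
    """Hypothetical helper: one JSON schema covering every tool, where each
    oneOf branch ties a tool name to that tool's parameter schema."""
    return {
        "type": "object",
        "oneOf": [
            {
                "properties": {
                    "tool_choice": {"const": tool["function"]["name"]},
                    "tool_params": tool["function"]["parameters"],
                },
                "required": ["tool_choice", "tool_params"],
            }
            for tool in tools
        ],
    }
```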

@Uhao-P

Uhao-P commented Mar 13, 2024

Thank you for your work.
I successfully installed vLLM from your branch. When I start openai.api_server and call it, I get AttributeError: 'ActorHandle' object has no attribute 'tokenizer'; #3301 fixes that error.
Once openai.api_server is running, the /v1/chat/completions endpoint and function calling work normally. However, calling the /v1/completions endpoint raises AttributeError: 'CompletionRequest' object has no attribute 'tool_choice'.
Adding tools and tool_choice to class CompletionRequest in vllm/entrypoints/openai/protocol.py works around this.
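The described workaround amounts to something roughly like this hypothetical sketch (note FlorianJoncour's reply below that CompletionRequest is not meant to support tools):

```python
# Hypothetical sketch only, mirroring the comment above; the extra fields default
# to values that keep existing /v1/completions requests working unchanged.
from typing import List, Optional, Union
from pydantic import BaseModel

class CompletionRequest(BaseModel):
    model: str
    prompt: Union[str, List[str]]
    tools: Optional[List[dict]] = None
    tool_choice: Optional[Union[str, dict]] = "none"
```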

@FlorianJoncour
Contributor Author

FlorianJoncour commented Mar 16, 2024

@simon-mo: I have removed everything related to the Jinja template; it was not such a great idea on my side.
A template can now be passed as an argument in requests (tool_params).
If this parameter is None, a generic template defined in protocol.py is used instead.
See the dict TOOLS_TEMPLATE in the sample script openai_tools_calls.py.

@mn9891: I suspect the model is a poor fit for this use case. I started development with NeuralHermes-7B, but there is a recent Mistral-7B-based model from NousResearch that has been trained to call functions and works really well:
https://huggingface.co/FoxEngineAi/Hermes-2-Pro-Mistral-7B_AWQ-GEMM_4_128_True

@Uhao-P: CompletionRequest is not supposed to call functions.
Maybe some type checking is missing; I'll look into it soon.

FlorianJoncour added 2 commits March 16, 2024 16:01
@RonanKMcGovern
Contributor

@FlorianJoncour could you share a sample of a fully formatted prompt containing tools? Say, for a Mistral model?

@FlorianJoncour
Contributor Author

FlorianJoncour commented Mar 18, 2024

Thank you, there's an edge case I hadn't considered. It's possible that the model generates a list of function calls in JSON rather than making the calls one after the other.

I was also able to make the example script work with Mistral-7B-Instruct-v0.2, even though it is not trained for this.

Edit: I forgot something. The default chat template in the Mistral-7B model enforces an alternating user/assistant text pattern. Since there can be consecutive function calls, the template will raise an error.
Here is a working chat template:
{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ '\n[INST] ' + message['content'] + ' [/INST]' }}{% else %}{{ '\n' + message['content'] + eos_token}}{% endif %}{% endfor %}
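A quick way to sanity-check such a template outside the server, assuming the transformers library is available (model name illustrative):

```python
from transformers import AutoTokenizer

# Model name is illustrative; the point is only to render the template above.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.chat_template = (
    r"{{ bos_token }}{% for message in messages %}"
    r"{% if message['role'] == 'user' %}"
    r"{{ '\n[INST] ' + message['content'] + ' [/INST]' }}"
    r"{% else %}{{ '\n' + message['content'] + eos_token }}{% endif %}"
    r"{% endfor %}"
)

messages = [
    {"role": "user", "content": "What is the weather in Paris and in London?"},
    {"role": "assistant", "content": "call_get_current_weather_0(city='Paris')"},
    {"role": "assistant", "content": "call_get_current_weather_1(city='London')"},
]
# The default Mistral template rejects two consecutive assistant turns;
# this one renders them without error.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```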

@Uhao-P

Uhao-P commented Mar 24, 2024

When I use a Python script to make a function call, it succeeds whether stream is true or false. However, when I integrate with the frontend and stream is true, the frontend does not parse the received results well and the arguments are always truncated. Comparing ChatGPT with vLLM: ChatGPT returns the arguments in a streaming fashion, whereas vLLM returns the complete arguments in one piece.
I have successfully simulated ChatGPT's streaming response on the web, and it parses correctly. Is there any plan to change vLLM's streaming responses to match ChatGPT's format? Could I contribute a pull request to help accomplish this?
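For reference, this is roughly how an OpenAI-style client consumes tool-call arguments in streaming mode, with the arguments string arriving as many small deltas (reusing the illustrative client and tools from the sketch above):

```python
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative model name
    messages=[{"role": "user", "content": "What is the weather like in Paris?"}],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

arguments = ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # OpenAI sends the JSON arguments string as many small fragments,
        # e.g. '{"ci', 'ty": "Pa', 'ris"}', which the client concatenates.
        arguments += delta.tool_calls[0].function.arguments or ""
print(arguments)
```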

@jamestwhedbee
Contributor

I am eagerly awaiting this too. Is there any area where contributions would be welcomed to help merge this?

@simon-mo mentioned this pull request Apr 4, 2024
@Aakash-kaushik

Aakash-kaushik commented Apr 9, 2024

Hi, is this PR still being worked on?
I'm pretty excited for this to merge and ready to help however I can.

@esmeetu mentioned this pull request Apr 10, 2024
@weiminw

weiminw commented Apr 13, 2024

Hoping this feature is released as soon as possible.

@RonanKMcGovern
Contributor

> @FlorianJoncour could you share a sample of a fully formatted prompt containing tools? Say, for a Mistral model?

Is it possible to share a fully formatted prompt sample?

That would help anyone fine-tuning models a lot...

@AaronFriel left a comment

@FlorianJoncour, I left some comments, but at a more meta level than any individual line: could you describe how this PR enables responding with multiple function calls (like OpenAI's API does)?

OpenAI's API has a weakness in streaming mode: when it wraps calls in the "parallel.multi-tool-use" function, streaming behavior breaks. As a consumer of the tool API here, I want to make sure that:

  1. This implementation doesn't duplicate that behavior or break streaming.
  2. I understand what the response chunks look like when the model makes multiple calls - I didn't see that exactly, but I only skimmed the PR.

Could you point me to what multi-call output looks like?
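For context, a non-streaming OpenAI-style assistant message with parallel tool calls looks roughly like the following (illustrative values, not output from this PR); in streaming mode each entry additionally carries an index field so argument deltas can be routed to the right call:

```python
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_get_current_weather_0",
            "type": "function",
            "function": {"name": "get_current_weather",
                         "arguments": '{"city": "Paris"}'},
        },
        {
            "id": "call_get_current_weather_1",
            "type": "function",
            "function": {"name": "get_current_weather",
                         "arguments": '{"city": "London"}'},
        },
    ],
}
```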

@@ -125,6 +133,12 @@ def parse_args():
        type=str,
        default=None,
        help="The file path to the SSL cert file")
    parser.add_argument(
        "--privileged",


I think this engine arg is not sufficiently descriptive. To me, "privileged" evokes a notion of requiring root or extra capabilities (a la docker and containers, or user account control in Windows, or admin elevation in macOS.)

This is probably more aptly described as --enable-debug-reload-api?

@@ -163,6 +206,16 @@ async def health() -> Response:
    return Response(status_code=200)


if "--privileged" in sys.argv:

    @app.get("/privileged")


Likewise here, I don't think /privileged describes what this request route does.

And should this be a POST, not a GET, as it is an effectful operation?
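A minimal sketch of what the suggested renaming could look like (hypothetical route and handler names, not what the PR implements):

```python
from fastapi import FastAPI, Response

app = FastAPI()

# Hypothetical naming only: an effectful debug endpoint exposed as POST
# instead of GET, behind an explicit flag such as --enable-debug-reload-api.
@app.post("/debug/reload")
async def debug_reload() -> Response:
    # The reload/teardown logic from the PR would go here.
    return Response(status_code=200)
```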

logger.warning(
    "\n"
    "##########################################################################\n"
    "privileged mode enabled. This should only be used for development purpose.\n"


Here as well - "privileged" just doesn't describe to me what is happening.

@darkacorn

darkacorn commented Jun 18, 2024

Is the holdup the naming convention, or what am I missing?

@K-Mistele
Contributor

> Is the holdup the naming convention, or what am I missing?

I think it's more that there are other PRs open on this (e.g. #4656), but all of them have shortcomings and are fairly opinionated in one way or another. Tool choice for non-auto tool calling is now supported via guided decoding. "Auto" tool choice is much harder to get right: different models use different tool-calling prompt templates and different tokens to indicate tool calls, all of which have to be parsed according to the model-specific format and (ideally) streamed back to the client in an OpenAI API-compatible way. None of the current PRs fully support that.
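To make the parsing problem concrete, here is a rough sketch (not from any of the open PRs) that extracts Hermes 2 Pro-style <tool_call> blocks and reshapes them into OpenAI-style tool_calls entries; other models use entirely different markers and would need their own parsers:

```python
import json
import re

# Hermes 2 Pro wraps each tool call in <tool_call> ... </tool_call> tags around a
# JSON object with "name" and "arguments".
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(model_output: str) -> list[dict]:
    calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(model_output)):
        payload = json.loads(match.group(1))
        calls.append({
            "id": f"call_{payload['name']}_{i}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI clients expect the arguments as a JSON string.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls
```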

@darkacorn

https://github.com/mistralai/mistral-common/tree/main/src/mistral_common/protocol/instruct

should be a decent starting point, if no one can agree.

@K-Mistele
Contributor

> https://github.com/mistralai/mistral-common/tree/main/src/mistral_common/protocol/instruct
>
> should be a decent starting point, if no one can agree.

I'm working on a less opinionated implementation in #5649 that would work with Mistral 7B Instruct v0.3 as well as the Hermes 2 Pro models by Nous Research and other tool-calling-capable open models.
