OpenAI refactoring #2360
Conversation
I think this looks ok.
two minor points
return StreamingResponse(fake_stream_generator(),
generator = await openai_serving_completion.create_completion(
    request, raw_request)
logger.info("TYPE COMPLETION : %s" % str(type(generator)))
engine_model_config.tokenizer,
tokenizer_mode=engine_model_config.tokenizer_mode,
trust_remote_code=engine_model_config.trust_remote_code)
self._load_chat_template(self.chat_template)
The chat template is the responsibility of ChatCompletion only.
Will fix and merge this once #2355 is in.
Fine. I still made the changes ^^
@FlorianJoncour, merged! Thank you for the contribution, looking forward to the tool calling PR!
@FlorianJoncour Is there a new PR for function_call?
I'm working on it; it shouldn't take too long.
This was introduced with vllm-project#2360; it was not here before.
This is a reset of #2210.
The final goal is to implement function calls using the OpenAI API.
But since that was likely too much at once, it will be done in two parts.
This pull request is only a refactoring/relocation of code to separate the Uvicorn server, the chat, and the completions.
Chat and completions are now handled by separate classes.
The goal is to make the entire codebase clearer and more easily modifiable in the future, since the completions API should now be considered legacy.
The chat part has been split into several methods, while the completion code remains largely unchanged apart from being encapsulated in a class.
Chat and completions were tested with and without streaming mode.