
OpenAI refactoring #2360

Merged: 15 commits merged into vllm-project:main on Jan 17, 2024
Conversation

@FlorianJoncour (Contributor) commented:
This is a reset of #2210.

The final goal is to implement function calls using the OpenAI API, but since that was likely too much at once, we will do it in two parts.

This pull request is only a refactoring/relocation of code to separate the Uvicorn server, the chat endpoint, and the completions endpoint. Chat and completions now live in separate classes. The goal is to make the codebase clearer and easier to modify in the future, since completions should now be considered legacy.

The chat part has been split into several methods, while completions remained largely unchanged apart from being encapsulated in a class.

Tested chat and completions with and without stream mode.
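The separation described above can be sketched roughly as follows. All class and method names here are illustrative assumptions about the shape of the refactor, not the exact vLLM code:

```python
from typing import Optional

# Illustrative sketch: the Uvicorn/FastAPI server would only instantiate
# these classes and forward requests to them, instead of holding all of
# the request logic inline.

class OpenAIServing:
    """Shared state used by both endpoints (model bookkeeping, etc.)."""

    def __init__(self, model_name: str):
        self.model_name = model_name


class OpenAIServingChat(OpenAIServing):
    """Chat endpoint: split into small methods, owns the chat template."""

    def __init__(self, model_name: str, chat_template: Optional[str] = None):
        super().__init__(model_name)
        self.chat_template = chat_template

    def create_chat_completion(self, messages) -> dict:
        prompt = self._messages_to_prompt(messages)
        return {"object": "chat.completion",
                "model": self.model_name,
                "prompt": prompt}

    def _messages_to_prompt(self, messages) -> str:
        # One of the small helper methods the chat path was split into.
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


class OpenAIServingCompletion(OpenAIServing):
    """Legacy completions endpoint, unchanged apart from encapsulation."""

    def create_completion(self, prompt: str) -> dict:
        return {"object": "text_completion",
                "model": self.model_name,
                "prompt": prompt}
```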

@simon-mo self-assigned this on Jan 8, 2024
@NikolaBorisov (Contributor) left a comment:
I think this looks ok.

@simon-mo (Collaborator) left a comment:
Two minor points:

Diff context:

    return StreamingResponse(fake_stream_generator(),
    generator = await openai_serving_completion.create_completion(
        request, raw_request)
    logger.info("TYPE COMPLETION : %s" % str(type(generator)))

@simon-mo (Collaborator) suggested a change, removing the leftover debug line:

    logger.info("TYPE COMPLETION : %s" % str(type(generator)))
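For context, the code the reviewer is commenting on returns either an async generator (streaming) or a plain response object, and the server picks the response type accordingly. A minimal stdlib-only sketch of that dispatch pattern, with all names hypothetical:

```python
import asyncio
from collections.abc import AsyncIterator

async def _stream_chunks():
    # Hypothetical streaming path: yields response chunks.
    for chunk in ("Hel", "lo"):
        yield chunk

async def create_completion(stream: bool):
    # Returns an async generator when streaming, a plain dict otherwise,
    # mirroring the two shapes the endpoint has to handle.
    if stream:
        return _stream_chunks()
    return {"text": "Hello"}

async def handle_request(stream: bool) -> str:
    generator = await create_completion(stream)
    if isinstance(generator, AsyncIterator):
        # Streaming: a real server would wrap this in a StreamingResponse.
        return "".join([chunk async for chunk in generator])
    # Non-streaming: a real server would return a JSON response.
    return generator["text"]

print(asyncio.run(handle_request(True)))
print(asyncio.run(handle_request(False)))
```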

Diff context:

    engine_model_config.tokenizer,
    tokenizer_mode=engine_model_config.tokenizer_mode,
    trust_remote_code=engine_model_config.trust_remote_code)
    self._load_chat_template(self.chat_template)

@simon-mo (Collaborator): the chat template is the responsibility of ChatCompletion only.
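One way to follow this suggestion is to keep template loading entirely inside the chat-serving class so the completions path never touches it. A sketch, where the class name and the path-or-literal fallback behavior are assumptions:

```python
from typing import Optional

class ChatTemplateMixin:
    # Hypothetical sketch: chat-template loading lives only in the chat
    # class, so the completions endpoint carries none of this logic.
    def _load_chat_template(self, chat_template: Optional[str]) -> None:
        self.chat_template = None
        if chat_template is None:
            return
        try:
            # Accept a file path to a template...
            with open(chat_template) as f:
                self.chat_template = f.read()
        except OSError:
            # ...or fall back to treating the argument as a literal template.
            self.chat_template = chat_template
```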

@simon-mo (Collaborator):

Will fix and merge this once #2355 is in.

@FlorianJoncour (Contributor, Author):

Fine. I made the changes anyway ^^

@viktor-ferenczi mentioned this pull request on Jan 13, 2024
@simon-mo merged commit 14cc317 into vllm-project:main on Jan 17, 2024
15 checks passed
@simon-mo (Collaborator):
@FlorianJoncour, merged! Thank you for the contribution, looking forward to the tool calling PR!

@jessiewiswjc:
@FlorianJoncour Is there a new PR for function_call yet?

@FlorianJoncour (Contributor, Author):

I'm working on it; it shouldn't take too long.

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
joennlae added a commit to joennlae/vllm that referenced this pull request Jan 21, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
4 participants