
Conversation

nicobasile (Contributor) commented

Hi, I noticed that the create_chat_completion() function in the OpenAI entrypoint uses the check_length(prompt) function, which verifies that the prompt's token length doesn't exceed the model's maximum token length. If it does, the endpoint returns an HTTP error explaining so.

However, check_length(prompt) isn't called in create_completion(), so a request to that endpoint that exceeds the max token length doesn't get a relevant error message.

I added check_length(prompt) to create_completion(), and also modified the function to return the (already computed) token_ids so that we can reuse them later for engine.generate(), cutting out duplicate token encoding. In theory this gives a minor efficiency gain, but I haven't benchmarked it.
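For reference, here is a minimal sketch of the shape of the change, not the exact diff: check_length() tokenizes the prompt once, rejects over-long requests with an explanatory error, and hands the token ids back so create_completion() can forward them to engine.generate(). The request/tokenizer/engine objects, the error handling, and the caller below are simplified assumptions, not the actual vLLM code.

```python
# Illustrative sketch only -- the real entrypoint wires this into FastAPI and
# returns a proper HTTP error response; signatures here are simplified.
from typing import List, Optional, Tuple


async def check_length(
    request,                  # parsed CompletionRequest (assumed to have .max_tokens)
    prompt: str,
    tokenizer,                # HF-style tokenizer: tokenizer(prompt).input_ids
    max_model_len: int,
) -> Tuple[List[int], Optional[str]]:
    """Tokenize the prompt once and reject requests whose prompt plus
    requested completion would exceed the model's context window."""
    input_ids = tokenizer(prompt).input_ids
    token_num = len(input_ids)
    max_tokens = request.max_tokens or 0

    if token_num + max_tokens > max_model_len:
        error = (
            f"This model's maximum context length is {max_model_len} tokens, "
            f"but you requested {token_num + max_tokens} tokens "
            f"({token_num} in the prompt, {max_tokens} in the completion)."
        )
        return input_ids, error

    # Return the already-computed ids so the caller can skip a second
    # tokenization pass when submitting the request to the engine.
    return input_ids, None


async def create_completion_sketch(request, prompt, tokenizer, engine,
                                   sampling_params, request_id, max_model_len):
    """Illustrative caller: validate the length, then reuse the ids for generation."""
    token_ids, error = await check_length(request, prompt, tokenizer, max_model_len)
    if error is not None:
        return error  # the real endpoint returns an HTTP error response here
    # Passing the precomputed token ids avoids re-encoding the same prompt.
    return engine.generate(prompt, sampling_params, request_id,
                           prompt_token_ids=token_ids)
```

The key design point is that the validation step already has to tokenize the prompt, so returning those ids and passing them through to the engine means the prompt is encoded exactly once per request.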

Thanks!

@zhuohan123 (Member) left a comment

LGTM! Thank you for your contribution!

@zhuohan123 zhuohan123 merged commit 66c54aa into vllm-project:main Aug 9, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024