
[Bugfix] fix crash if max_tokens=None #2570

Merged on Jan 24, 2024 (4 commits)

Conversation

NikolaBorisov (Contributor)

Passing max_tokens=None is allowed in the API, but after recent changes it started crashing.
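
Judging from the traceback below, one minimal shape for the fix is a None guard in SamplingParams._verify_args; this is a sketch of that idea, not necessarily the patch as merged:

```python
# Sketch only; the merged patch may differ. In vllm/sampling_params.py,
# max_tokens is Optional, and None should mean "no explicit cap", so the
# range check has to skip None rather than evaluate None < 1.
def _verify_args(self) -> None:
    if self.max_tokens is not None and self.max_tokens < 1:
        raise ValueError(
            f"max_tokens must be at least 1, got {self.max_tokens}.")
```
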
@NikolaBorisov (Contributor, author)

Here is the exception without this patch:

  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/workspace/vllm/entrypoints/openai/api_server.py", line 143, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/workspace/vllm/entrypoints/openai/serving_completion.py", line 237, in create_completion
    sampling_params = request.to_sampling_params()
  File "/workspace/vllm/entrypoints/openai/protocol.py", line 134, in to_sampling_params
    return SamplingParams(
  File "/workspace/vllm/sampling_params.py", line 148, in __init__
    self._verify_args()
  File "/workspace/vllm/sampling_params.py", line 186, in _verify_args
    if self.max_tokens < 1:
TypeError: '<' not supported between instances of 'NoneType' and 'int'
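
Stripped of the FastAPI stack, the failure reduces to a single expression: Python 3 defines no ordering between None and int.

```python
>>> None < 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'
```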

NikolaBorisov changed the title from "fix crash if max_tokens=None" to "[Bugfix] fix crash if max_tokens=None" on Jan 24, 2024
@zhuohan123 (Collaborator)

If possible, can you add a small test in test_regression so that we can make sure this is correct in the future?

@NikolaBorisov (Contributor, author)

> If possible, can you add a small test in test_regression so that we can make sure this is correct in the future?

Just did. I also added a new test_sampler_parameters file, where I think this makes more sense. Let me know if you want to keep both or remove one. I don't know how to add this test to the buildkite...
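
For reference, a regression test along these lines would cover the previously crashing path (hypothetical test body and model choice; the committed tests may differ):

```python
# Hypothetical sketch; the tests committed in this PR may differ.
# Before the fix, SamplingParams(max_tokens=None) raised TypeError
# inside _verify_args, so merely constructing the params reproduced it.
from vllm import LLM, SamplingParams

def test_max_tokens_none():
    params = SamplingParams(temperature=0.0, max_tokens=None)  # crashed pre-fix
    llm = LLM(model="facebook/opt-125m")
    outputs = llm.generate(["Hello, my name is"], params)
    assert len(outputs) == 1
```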

@zhuohan123 (Collaborator) left a comment

LGTM! Thank you for your contribution!

zhuohan123 merged commit 3209b49 into vllm-project:main on Jan 24, 2024
16 checks passed
NikolaBorisov added a commit to deepinfra/vllm that referenced this pull request Jan 31, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024