
Drop to float16 if bfloat16 is not supported #1901

Closed
acebot712 wants to merge 5 commits

Conversation

@acebot712 commented Dec 3, 2023

Issue: #1157

Instead of throwing an error when the GPU's compute capability does not support bfloat16, vLLM should emit a warning and fall back to float16. This helps in Colab notebooks, where the T4 GPU has compute capability 7.5 rather than 8.0, so vLLM does not work out of the box.
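A rough sketch of the proposed fallback behavior (illustrative only, not the exact diff in this PR; the helper name resolve_dtype is hypothetical, while torch.cuda.get_device_capability is a real PyTorch call):

import logging
import torch

logger = logging.getLogger("vllm")

def resolve_dtype() -> torch.dtype:
    # Hypothetical helper: prefer bfloat16 when the GPU supports it
    # (compute capability >= 8.0); otherwise warn and fall back to float16.
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        return torch.bfloat16
    logger.warning(
        "bfloat16 is not supported on this GPU (compute capability < 8.0); "
        "falling back to float16.")
    return torch.float16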

@WoosukKwon (Collaborator) commented Dec 3, 2023

I personally prefer to ask users to explicitly specify dtype in this case, since otherwise it can affect the accuracy of the model silently. WDYT @zhuohan123 @simon-mo?

@zhuohan123 (Collaborator)

I personally prefer to ask users to explicitly specify dtype in this case, since otherwise it can affect the accuracy of the model silently. WDYT @zhuohan123 @simon-mo?

I think the fallback is fine if we explicitly print a warning?

@simon-mo (Collaborator) commented Dec 3, 2023

+1. In this case a warning is fine. The accuracy difference between bfloat16 and float16 should not be too crazy.

@acebot712 (Author)

I came across this issue while trying to use langchain's vLLM support on Colab with a T4 GPU. If we are not going through with this change, please let me know the best alternative way to handle the error shown in the screenshot below. Thanks a lot.
[Screenshot 2023-12-03 at 12:52:19 PM]

@WoosukKwon (Collaborator)

@acebot712 You can simply do this:

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1", dtype="half")
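For completeness, a minimal self-contained version of that workaround (the prompt and sampling settings below are just placeholders):

from vllm import LLM, SamplingParams

# dtype="half" forces float16, so the model loads on GPUs without bfloat16 support (e.g. a T4).
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1", dtype="half")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)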

@Yard1 (Collaborator) commented Dec 3, 2023

I agree with @WoosukKwon; I think in general we should avoid doing any "magic" that can change the outputs, even if slightly. I would suggest instead modifying the exception message to suggest that users set the dtype to float16 themselves.

@chenxu2048 (Contributor)

I would suggest instead modifying the exception message to suggest that users set the dtype to float16 themselves.

We can check the device capability outside vLLM and choose the dtype depending on the device. The code could be:

import torch
from vllm import LLM

# bfloat16 needs compute capability >= 8.0; a T4 reports (7, 5), so it gets float16.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    dtype="float16" if torch.cuda.get_device_capability()[0] < 8 else "bfloat16",
)

@WoosukKwon (Collaborator)

Hi @acebot712, thanks for bringing up this issue and submitting the PR! We've decided to keep the current behavior: to avoid silent accuracy changes, vLLM will ask users to set dtype="half" in such a case. Could you revert the change in this PR and add a warning that asks users to set dtype="half"? Thanks.

WoosukKwon closed this Dec 16, 2023
@chuanzhubin (Contributor)

As a beginner, I just ran into this little setback. Adding the parameter --dtype="half" did, of course, solve it.
But that only worked because I was lucky enough to find this issue. For the sake of future beginners, I suggest providing guidance in the error message, telling users to add --dtype="half" to specify the data type. @WoosukKwon
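As a rough illustration of the kind of guidance being suggested (a sketch only, not the actual vLLM code; the function name check_bfloat16_support is made up, while the torch calls are real):

import torch

def check_bfloat16_support() -> None:
    # Hypothetical check: replace the bare failure with a message that tells
    # the user exactly what to do.
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) < (8, 0):
        gpu_name = torch.cuda.get_device_name()
        raise ValueError(
            f"bfloat16 is only supported on GPUs with compute capability of at least 8.0. "
            f"Your {gpu_name} GPU has compute capability {major}.{minor}. "
            f"Please use float16 instead by setting dtype='half' (or --dtype half on the command line).")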

@simon-mo (Collaborator) commented Jan 8, 2024

@chuanzhubin A better error message is indeed a great idea. Would you be interested in submitting a PR?
