BF16 Support #17670
base: main
Conversation
Hi @joshua-j-hong, thank you so much for the great contributions! Could you please add a PR description so that the CI tests can be triggered?
@tvm-bot rerun
Failed to re-run CI in https://github.com/apache/tvm/actions/runs/13505622336 with response
@tvm-bot rerun
Force-pushed from e2ee869 to 5491f0c (…to jjhong_bf16)
LGTM, thanks for rehabilitating BF16!
@joshua-j-hong Just want to check these two points. Have we confirmed that they are in good shape now? We are good to merge if these are confirmed.
I've added the changes for quantization in MLC LLM (will put up the corresponding PR soon) and am currently testing with the original model, but am running into some local build issues. Hoping to resolve these tomorrow to close out this ticket!
Adds general BF16 support to TVM. This primarily involves comm_reducer changes, as well as skipping legalization for T.bfloat16 in the test file. A related PR in MLC-LLM adds BF16 support with quantization: mlc-ai/mlc-llm#3158. Tested with the original problematic model, Gemma 2 27B, using both of the added quantization configurations, q4bf16_0 and q4bf16_1.
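
As a rough illustration (not taken from this PR's diff), the kind of bfloat16 reduction this change is meant to support could be written in TVMScript as in the sketch below. The function and buffer names are hypothetical, and whether it compiles on a given target still depends on the BF16 legalization passes being applied where native bf16 arithmetic is unavailable.

```python
# Minimal sketch, assuming the TVMScript frontend accepts bfloat16 buffers and
# T.bfloat16 literals as described in the PR text; names are illustrative only.
import tvm
from tvm.script import tir as T


@T.prim_func
def bf16_sum(A: T.Buffer((128,), "bfloat16"), B: T.Buffer((1,), "bfloat16")):
    # Reduction whose init/update pair is lowered through a comm_reducer.
    for i in T.serial(128):
        with T.block("sum"):
            vi = T.axis.reduce(128, i)
            with T.init():
                B[0] = T.bfloat16(0)
            B[0] = B[0] + A[vi]
```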