[Misc] Reduce supported Punica dtypes #4304
Maybe we can add a file size check in the Dockerfile, since we already have the wheel inside the Docker image.
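A minimal sketch of such a size check, assuming it runs as a small Python script against the built wheel. The function names and the 100 MiB limit are hypothetical; the thread does not state a concrete threshold.

```python
import os

# Hypothetical limit for illustration; no threshold is given in the thread.
MAX_WHEEL_SIZE_MB = 100


def wheel_size_mb(path):
    """Return the size of the file at `path` in MiB."""
    return os.path.getsize(path) / (1024 * 1024)


def check_wheel_size(path, max_mb=MAX_WHEEL_SIZE_MB):
    """Return True if the wheel is within the limit, False otherwise."""
    size = wheel_size_mb(path)
    print(f"{os.path.basename(path)}: {size:.2f} MiB (limit {max_mb} MiB)")
    return size <= max_mb
```

A build step could call `check_wheel_size` on the wheel path and exit with a non-zero status when it returns `False`, which would fail the image build.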
Hugging Face is down again :( I guess we can't release today...
I am pretty sure we need the fp16/bf16->fp32 and fp32->fp16/bf16 ones, as fp32 is used in the intermediate buffer between shrink and expand calls.
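To illustrate the point above, here is a rough sketch that enumerates (input, output) dtype pairs and picks out the ones needed when the intermediate buffer is fp32. It is a simplification for illustration only: it ignores the weight dtype and is not taken from the Punica sources, and all names are hypothetical.

```python
from itertools import product

DTYPES = ("fp16", "bf16", "fp32")
HALF = ("fp16", "bf16")

# Every (input, output) pair a kernel could in principle be compiled for.
all_pairs = set(product(DTYPES, DTYPES))

# Shrink writes half-precision activations into the fp32 intermediate buffer.
shrink_pairs = {(h, "fp32") for h in HALF}
# Expand reads the fp32 buffer and writes half-precision outputs.
expand_pairs = {("fp32", h) for h in HALF}

required = shrink_pairs | expand_pairs

print(f"all pairs: {len(all_pairs)}, required for shrink/expand: {len(required)}")
for pair in sorted(required):
    print(" -> ".join(pair))
```

Under these assumptions, the mixed fp32 pairs cannot be dropped even if pure fp32 inputs and outputs are not otherwise needed.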
@Yard1 Yes, I just figured it out and fixed it. Could you please take another look?
Good call, +1 on measuring and clearly auditing what contributes to the size of the wheel.
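One possible way to audit this, sketched under the assumption that the wheel is inspected directly as a zip archive (which is what a wheel is). The helper name `largest_members` is hypothetical.

```python
import zipfile


def largest_members(wheel_path, top_n=10):
    """Return (uncompressed_size, compressed_size, name) for the largest files in a wheel."""
    with zipfile.ZipFile(wheel_path) as zf:
        # Sort archive members by uncompressed size, largest first.
        infos = sorted(zf.infolist(), key=lambda info: info.file_size, reverse=True)
    return [(info.file_size, info.compress_size, info.filename) for info in infos[:top_n]]
```

Printing the returned rows for a built wheel shows which files, such as compiled extension modules, dominate its size.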
I think we may be using fp32 for testing.
This PR reduces the supported dtype combinations in Punica to reduce the binary size.