[Misc] Reduce supported Punica dtypes #4304

Merged: 7 commits, Apr 24, 2024

Conversation

WoosukKwon
Collaborator

This PR reduces the supported dtype combinations in Punica to reduce the binary size.

@youkaichao
Member

Maybe we can add a file size check to the Dockerfile, since we already have the wheel inside the Docker image.
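A size check along these lines could be sketched as below. This is only an illustration: the helper names and the 100 MB limit are hypothetical, and the thread does not say which threshold or script the project would use.

```python
import os

# Hypothetical limit: the thread does not state an actual threshold.
MAX_WHEEL_SIZE_MB = 100


def wheel_size_mb(path: str) -> float:
    """Return the size of the file at `path` in mebibytes."""
    return os.path.getsize(path) / (1024 * 1024)


def check_wheel_size(path: str, max_mb: float = MAX_WHEEL_SIZE_MB) -> bool:
    """Print the wheel size and report whether it is within the limit."""
    size = wheel_size_mb(path)
    ok = size <= max_mb
    status = "OK" if ok else "TOO LARGE"
    print(f"{path}: {size:.1f} MB (limit {max_mb} MB) -> {status}")
    return ok
```

A build step could call `check_wheel_size` on the built wheel and fail the build when it returns `False`.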

@simon-mo
Collaborator

simon-mo commented Apr 23, 2024

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like facebook/opt-125m is not the path to a directory containing a file named config.json.

Huggingface down again :( I guess we can't release today...

@simon-mo simon-mo mentioned this pull request Apr 23, 2024
9 tasks
@Yard1
Collaborator

Yard1 commented Apr 23, 2024

I am pretty sure we need the fp16/bf16->fp32 and fp32->fp16/bf16 ones, as fp32 is used in the intermediate buffer between shrink and expand calls.
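To make the point above concrete, here is a rough sketch of how the required combinations might be enumerated. It assumes, for illustration only, that each kernel is instantiated per (input, output, weight) dtype triple and that the weights use the model dtype; the names are hypothetical and the resulting set is not necessarily what this PR kept.

```python
from itertools import product

DTYPES = ("fp16", "bf16", "fp32")


def all_combinations():
    """Every (input, output, weight) dtype triple: the full cross product."""
    return set(product(DTYPES, repeat=3))


def needed_combinations(model_dtypes=("fp16", "bf16")):
    """Triples needed when fp32 is the buffer between shrink and expand.

    Assumption: weights are stored in the model dtype.
    """
    needed = set()
    for dt in model_dtypes:
        needed.add((dt, "fp32", dt))  # shrink: model dtype -> fp32 buffer
        needed.add(("fp32", dt, dt))  # expand: fp32 buffer -> model dtype
    return needed


full = all_combinations()
kept = needed_combinations()
print(f"{len(kept)} of {len(full)} combinations needed under these assumptions")
```

Under these assumptions the mixed-precision triples are exactly the ones that cannot be pruned, which is why dropping every fp32 variant would break the shrink and expand path.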

@WoosukKwon
Collaborator Author

WoosukKwon commented Apr 23, 2024

BTW, this PR is not ready yet. I need to check if I pruned the right dtype combinations. Now it seems it passes the Punica test.

@WoosukKwon
Collaborator Author

@Yard1 Yes, I just figured it out and fixed it. Could you please take another look?

@WoosukKwon WoosukKwon requested a review from Yard1 April 23, 2024 20:39
@WoosukKwon WoosukKwon changed the title [WIP][Misc] Reduce supported Punica dtypes [Misc] Reduce supported Punica dtypes Apr 23, 2024
Collaborator

@mgoin mgoin left a comment

Good call, +1 on measuring and clearly auditing what is contributing to the size of the wheel

@Yard1
Collaborator

Yard1 commented Apr 23, 2024

I think we may be using fp32 for testing

@simon-mo simon-mo added the release-blocker This PR/issue blocks the next release, therefore deserves highest priority label Apr 23, 2024
@simon-mo simon-mo merged commit 468d761 into main Apr 24, 2024
47 checks passed
@WoosukKwon WoosukKwon deleted the reduce-punica branch April 24, 2024 06:44
xjpang pushed a commit to xjpang/vllm that referenced this pull request Apr 25, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 26, 2024
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request May 1, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request Jun 3, 2024
5 participants