Action to build wheels on ROCm 6.0 #421

Merged: 12 commits, Apr 27, 2024

Orion-zhen
Contributor

For AMD RX 7000 series GPUs (e.g. the 7900XTX with gfx1100), ROCm 5.6 and especially 5.7 are extremely unstable and routinely cause memory access faults during LLM inference. A good way to solve this problem is to install PyTorch 2.3.0+rocm6.0 and recompile all related wheels.

It would be convenient if a wheel based on ROCm 6.0 or later were released, so that users do not have to compile locally. That is why I created this action script and opened this PR.

I have run the action script and tested the generated wheel on my own RX 7900XTX.

BTW, GitHub is really stingy with disk space. I have been struggling to install all the dependencies within such a limited disk :-(

@Orion-zhen Orion-zhen closed this Apr 21, 2024
@Orion-zhen Orion-zhen reopened this Apr 21, 2024
@Orion-zhen
Contributor Author

The ROCm AIO build action failed its tests, so I deleted the corresponding yml file.

@LeoYelton

I think this merge is useful. I also built a wheel for 6.0 before, like this one. 6.0 is stable; time to update.

@Orion-zhen
Contributor Author

> I think this merge is useful. I also built a wheel for 6.0 before, like this one. 6.0 is stable; time to update.

Thank you for your comment. I have tested the stable version.

@turboderp
Owner

This looks good. Is there a reason for building on Ubuntu 22.04, though? I made that change before and had to revert it because a lot of server instances are still on 20.04 and the wheels won't be backwards compatible.
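
The comment above does not say why the wheels are not backwards compatible. A minimal sketch, assuming the cause is the usual one for compiled extensions, namely that a binary linked against a newer glibc generally will not load on a host with an older one. Ubuntu 20.04 ships glibc 2.31 and 22.04 ships glibc 2.35; the helper names `parse_version` and `wheel_can_load` are hypothetical and not part of this project.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.31' into (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def wheel_can_load(build_glibc, host_glibc):
    """Rule of thumb (an assumption, not a guarantee): a compiled wheel
    needs a host glibc at least as new as the one it was built against."""
    return parse_version(host_glibc) >= parse_version(build_glibc)


# glibc versions shipped by the two Ubuntu releases discussed in this thread
UBUNTU_GLIBC = {"20.04": "2.31", "22.04": "2.35"}


if __name__ == "__main__":
    # platform.libc_ver() returns ('', '') when the libc cannot be detected
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        for release, glibc in UBUNTU_GLIBC.items():
            verdict = "should load" if wheel_can_load(glibc, version) else "may fail to load"
            print(f"wheel built on Ubuntu {release} (glibc {glibc}): {verdict} here")
    else:
        print("Could not detect a glibc version on this host.")
```

Running the script on a server shows whether a wheel built on either Ubuntu release is likely to import there. Under this rule of thumb, building on the oldest supported release (20.04) gives the widest reach.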

@Orion-zhen
Contributor Author

> This looks good. Is there a reason for building on Ubuntu 22.04, though? I made that change before and had to revert it because a lot of server instances are still on 20.04 and the wheels won't be backwards compatible.

Well, the reason is simply that I prefer newer releases. For the sake of compatibility, it should be 20.04. Thank you for the reminder.

@turboderp
Owner

Thank you.

I've added it to the matrix and done some tests. It seems like it's building correctly now with Ubuntu 20.04 and Torch 2.3.0. If v0.0.20 also builds for the CUDA wheels there should be +rocm6.0 wheels in the release as well.

Lots of stuff breaking in Torch 2.3.0 sadly, and they dropped ROCm 5.6 support. It's all kind of a mess and the build actions reflect that so they need to be tidied up a lot, I think.

@Orion-zhen
Contributor Author

ROCm 5.6 is kind of old now. Would it be possible to separate the builds by PyTorch version? E.g. exllamav2+rocm5.6+torch2.2 and exllamav2+rocm6.0+torch2.3.

@turboderp
Owner

Yes, it'll build with Torch 2.2 for ROCm 5.6, as soon as I can get this stuff to work. Very close now.

@turboderp
Owner

Should be done now. Tested the wheel with Torch 2.3.0 and ROCm 6.0 on the latest Manjaro. It's integrated in the existing workflow, but I'll merge this anyway for completeness, then rework all the actions for the next release. Thanks for the input. :)

@turboderp turboderp merged commit 84be945 into turboderp:master Apr 27, 2024