
Combine multi-LoRA and quantization #2601

Open
Yard1 opened this issue Jan 25, 2024 · 3 comments

@Yard1 (Collaborator) commented Jan 25, 2024

There is no fundamental reason why multi-LoRA cannot work with quantized models. We will most likely want to keep the LoRAs unquantized and dequantize the base model output before applying the LoRAs with Punica kernels. That seems to be the pattern in other projects too.

Originally posted by @Yard1 in #1804 (comment)
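A minimal sketch of the pattern described above, assuming the quantized base layer (e.g. AWQ/GPTQ) already returns a dequantized fp16/bf16 output and the LoRA weights stay unquantized. The function name, argument names, and the per-adapter loop are illustrative only; a real implementation would dispatch to batched Punica-style kernels instead of looping in Python:

```python
import torch

def lora_forward_on_quantized_base(
    base_layer,        # quantized linear; its forward returns a dequantized fp16/bf16 output
    x,                 # [num_tokens, in_features], fp16/bf16 activations
    lora_a_stacked,    # [num_loras, rank, in_features], unquantized LoRA A matrices
    lora_b_stacked,    # [num_loras, out_features, rank], unquantized LoRA B matrices
    lora_indices,      # [num_tokens], which LoRA each token uses (-1 = no LoRA)
    scaling: float,
):
    # 1) Run the base matmul through the quantized kernel; the output it returns
    #    is already in full precision, so the LoRA math below stays unquantized.
    y = base_layer(x)

    # 2) Apply the unquantized LoRA deltas on top of the dequantized base output:
    #    y += scaling * (x @ A^T) @ B^T, per adapter. A batched (Punica-style)
    #    kernel would do this for all adapters at once; the loop just shows the math.
    for i in range(lora_a_stacked.shape[0]):
        mask = lora_indices == i
        if mask.any():
            xi = x[mask]
            y[mask] += scaling * (xi @ lora_a_stacked[i].T) @ lora_b_stacked[i].T
    return y
```

The key point is that quantization only touches the base weight matmul; the LoRA A/B projections and the final addition happen in the activation dtype, so accuracy of the adapters is unaffected by the base model's quantization scheme.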

@jacob-hansen

Has there been any progress on this? Or has anyone tested multi-LoRA with different quantization methods to see what works?

@thincal commented Mar 23, 2024

@Yard1 Is there any plan to support this? We really depend on this wonderful feature and also need to know its real-world effect. Thanks.

@whyiug (Contributor) commented Apr 28, 2024

@thincal #4012
