There is no fundamental reason why multi-LoRA cannot work with quantized models. We will most likely want to keep the LoRAs unquantized and dequantize the base model output before applying the LoRAs with punica kernels. That seems to be the pattern in other projects too.
Originally posted by @Yard1 in #1804 (comment)
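The pattern described above can be sketched roughly as follows. This is a minimal, illustrative numpy sketch (not vLLM's or punica's actual API): the base weight is held in int8 with a per-tensor scale, its output is dequantized to full precision, and only then is the unquantized LoRA delta added. All function and variable names here are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric int8 quantization (illustrative only).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def lora_quantized_forward(x, q_weight, scale, lora_a, lora_b, alpha=1.0):
    # Base path: matmul against the quantized weight, then dequantize
    # the output back to float before the LoRA delta is applied.
    base = (x @ q_weight.astype(np.float32).T) * scale
    # LoRA path: A and B stay in full precision (never quantized);
    # in a multi-LoRA server this is where a punica-style batched
    # kernel would apply a different adapter per request.
    delta = ((x @ lora_a.T) @ lora_b.T) * alpha
    return base + delta
```

The key point is that quantization error is confined to the base matmul; the low-rank correction is exact, so adapter quality does not degrade further with the base model's precision.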