Search before asking
- I have searched the Multimodal Maestro issues and found no similar feature requests.
Description
I was going through the maestro repo and found that neither the PaliGemma nor the Florence-2 model supports 4-bit quantization (i.e., fine-tuning with a QLoRA config).
Use case
With QLoRA, we could fine-tune vision-language models even on low-end devices without a significant loss in precision. As the models grow, implementing QLoRA will eventually be necessary to keep fine-tuning fast and feasible under memory constraints. A rough sketch of what this could look like is given below.
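
For reference, here is a minimal sketch of how 4-bit loading plus LoRA adapters could be wired up with Hugging Face transformers, peft, and bitsandbytes. This is not maestro's actual API; the model id, `target_modules`, and hyperparameters are illustrative assumptions:

```python
import torch
from transformers import PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the base model in 4-bit; model id is an example, not maestro's default
model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training (gradient checkpointing, casts)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections; target_modules is an assumption
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```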
Additional
I would like to hear your take on implementing quantization.
Are you willing to submit a PR?
- Yes, I'd like to help by submitting a PR!