Will this support 4bit quantisation? #2
Comments
Most probably yes: you can merge the LoRA weights into the base model and quantize the merged result.
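A minimal sketch of that "merge then quantize" route, assuming a LoRA adapter trained with peft on a LLaMA-style base model. The checkpoint names and output paths are placeholders, and `quantize_to_4bit` stands in for whatever 4-bit tool you use (e.g. a GPTQ implementation); it is not a real peft or transformers API.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the fp16 base model and attach the trained LoRA adapter (paths are examples).
base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the LoRA deltas (scaling * B @ A) into the base weights, leaving a plain
# fp16 model with no adapter modules left in the graph.
merged = model.merge_and_unload()
merged.save_pretrained("merged-fp16")

# quantize_to_4bit("merged-fp16", "merged-4bit")  # placeholder: run your 4-bit quantizer here
```

The merged checkpoint is then just an ordinary model, so any post-training 4-bit quantizer can be applied to it for inference.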
Training in 4-bit or just 4-bit inference? Training in 4-bit would be a game-changer.
I think technically we can do both: keep the original model weights in int4 (with fp16/fp32 compute) and the LoRA weights in fp32. The output dtype of a QuantLinear layer can be exactly the dtype of its input, so inserting a LoRA layer after every QuantLinear layer shouldn't be very difficult. I made an adapter for peft to support QuantLinear; the only thing left to do is to add support for the gradient backward pass through the 4-bit matmul. A sketch of the wrapper idea follows.
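A rough sketch of that adapter idea: wrap an existing 4-bit linear module and run a trainable LoRA branch alongside it. Here `quant_linear` is assumed to be any frozen module mapping `(batch, in_features)` to `(batch, out_features)` in the input's dtype; the class name, constructor arguments, and defaults are illustrative, not peft's actual API.

```python
import math
import torch
import torch.nn as nn

class LoraQuantLinear(nn.Module):
    def __init__(self, quant_linear, in_features, out_features, r=8, alpha=16, dropout=0.05):
        super().__init__()
        self.quant_linear = quant_linear              # frozen 4-bit layer
        for p in self.quant_linear.parameters():
            p.requires_grad = False
        # LoRA weights stay in full precision; only these are trained.
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B.weight)            # delta starts as a no-op
        self.scaling = alpha / r
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        base = self.quant_linear(x)                   # 4-bit dequantize-and-matmul path
        # Run the LoRA branch in its own dtype, then cast back to the base output dtype,
        # so the wrapper preserves the input/output dtype contract of QuantLinear.
        delta = self.lora_B(self.lora_A(self.dropout(x.to(self.lora_A.weight.dtype))))
        return base + self.scaling * delta.to(base.dtype)
```

Training through this wrapper only updates `lora_A` and `lora_B`; propagating gradients to earlier layers still requires the 4-bit matmul itself to define a backward pass, which is the missing piece noted above.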
Awesome project. Thanks so much for creating a FOSS Alpaca codebase.