fake and true quantization don't match #7
Thanks for pointing this out. This is because the fake and true quantization use different quantization implementations. Moreover, our method is not constrained to a specific quantization method: either a self-implemented one or the one in bitsandbytes can be used.
Thank you for the clarification. I understand that your method is not limited to any particular quantization function. However, you still use bitsandbytes as the backend for memory-efficient fine-tuning. If you use a custom quantization (such as a self-implemented NF4 quantization), doesn't that introduce a mismatch between the quantization functions used for fine-tuning and for the custom LoRA initialization? Say you obtain a perfect LoRA initialization W = Q + AB, where Q = self_implemented_nf4(W). When you then fine-tune with bitsandbytes, Q_new = bitsandbytes_nf4(Q), so W is no longer equal to Q_new + AB.
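A minimal sketch of how one could check this numerically. This is only an illustration, not code from the repository: the placeholder tensor and the direct use of bitsandbytes.functional are my assumptions.

```python
import torch
import bitsandbytes.functional as F

# Q: a weight that was already produced by a (self-implemented) NF4 quantizer.
# Here it is just a random placeholder tensor for illustration.
Q = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# Re-quantize Q with the bitsandbytes NF4 kernel used during fine-tuning.
q_packed, quant_state = F.quantize_4bit(Q, quant_type="nf4")
Q_new = F.dequantize_4bit(q_packed, quant_state, quant_type="nf4")

# If the two quantizers matched exactly, this difference would be ~0.
# Any nonzero gap is exactly the W != Q_new + AB mismatch described above.
print((Q - Q_new).abs().max())
```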
In addition, may I ask what the default T is for LLaMA?
Hi,
As a debugging step, I want to check whether the fake-quantized and true-quantized models' weights have the same values. Here is how I implement it:
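A minimal sketch of this loading step, assuming the fake-quantized LoftQ model is saved as an ordinary fp16 Hugging Face checkpoint (the checkpoint path below is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the fake-quantized LoftQ checkpoint in plain fp16 (no bitsandbytes involved),
# so state_dict() returns the stored weight values directly.
loftq_fp16 = AutoModelForCausalLM.from_pretrained(
    "path/to/loftq-llama-2-7b-fake-quantized",  # hypothetical local path
    torch_dtype=torch.float16,
)
```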
Then I print out some weight values as:
print(loftq_fp16.state_dict()['model.layers.0.self_attn.q_proj.weight'])
The output is:
For loftq_fp4, I do it this way:
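A minimal sketch of this step, assuming the same checkpoint is loaded through the bitsandbytes 4-bit backend and the weight is dequantized before printing (the path and the quantization settings are my assumptions):

```python
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the same checkpoint through bitsandbytes 4-bit quantization (true quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # or "fp4", depending on the setup
    bnb_4bit_compute_dtype=torch.float16,
)
loftq_fp4 = AutoModelForCausalLM.from_pretrained(
    "path/to/loftq-llama-2-7b-fake-quantized",  # hypothetical local path
    quantization_config=bnb_config,
    device_map="auto",
)

# The stored 4-bit weight is packed, so dequantize it before comparing values.
w4 = loftq_fp4.model.layers[0].self_attn.q_proj.weight
print(bnb.functional.dequantize_4bit(w4.data, w4.quant_state))
```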
The output is:
We can see they are quite different, which means the fake quantization doesn't truly reflect the true quantization performance.