
Why are base weights on HF LoftQ models in 16-bit? #26

Open
RonanKMcGovern opened this issue Apr 17, 2024 · 2 comments

Comments

@RonanKMcGovern

The script quantize_save_load.py generates a quantized model with LoRA adapters.

The base model is then saved and uploaded to LoftQ repos such as this one.

I'm puzzled that the base model weights are in 16 bits there, because that implies the base model is somehow upcast (dequantized) in the quantize_save_load.py script, but I don't see that anywhere.

My baseline expectation is that either:
a) The backbone would be stored in nf4, and then loaded with the 16-bit adapters on top, or
b) The backbone would be upcast to 16-bit, and then quantized to nf4 upon loading, with the 16-bit adapters on top. [But then there should be some upcasting code in quantize_save_load.py.]

Could someone clarify? Thanks.
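For concreteness, option (b) would look roughly like the sketch below, using the standard transformers/peft 4-bit loading path. The repo id and the adapter subfolder name are placeholders I'm assuming for illustration, not necessarily the exact LoftQ layout:

```python
# Sketch of option (b): re-quantize the saved 16-bit backbone to NF4 on load,
# then attach the LoftQ-initialized adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/Llama-2-7b-hf-4bit-64rank"  # example repo id (assumption)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # quantize the 16-bit weights on load
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(
    base,
    MODEL_ID,
    subfolder="loftq_init",  # adapter subfolder name is an assumption
    is_trainable=True,
)
```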

yxli2123 (Owner) commented Apr 18, 2024

Hi @RonanKMcGovern. Older versions of bitsandbytes did not allow saving weights in nf4 format. We avoided this issue by saving the weights in 16 bits on disk and converting them to 4 bits when loading them onto the GPU. However, since bitsandbytes was recently updated, it is now possible to save weights in nf4 format. We will update the code soon.
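Roughly, the direct nf4 save path would look like this sketch; the exact minimum library versions are an assumption from the bitsandbytes release notes, not something pinned in this repo:

```python
# Sketch: with a recent bitsandbytes (>= 0.41.3 or so; version bound is an
# assumption) and a matching transformers release, a 4-bit model can be
# serialized directly without being upcast to 16 bits.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.save_pretrained("llama-2-7b-nf4")  # weights stay in 4 bits on disk
```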

RonanKMcGovern (Author) commented Apr 19, 2024

OK, but are you even running the nf4 quantization then?

Or are you just directly saving the bf16 weights? If so, there is going to be an error when reloading the model, because the saved bf16 weights should be the dequantized weights, not the originals...

Something seems off to me: even one iteration of LoftQ should improve results, yet I see worsening results at one iteration and beyond (see this vid), as does kaitchup.substack.com.
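To make that concern concrete, here is a toy check, using a uniform round-to-nearest quantizer as a stand-in for NF4 (a simplification). After LoftQ-style alternation, the final backbone Q no longer equals quant(W), so saving the original W in 16 bits and re-quantizing on load silently discards the LoftQ refinement, whereas saving the dequantized Q round-trips exactly:

```python
import torch

torch.manual_seed(0)

def quant_dequant(w, levels=16):
    # Round-to-nearest onto a coarse uniform grid (stand-in for NF4).
    # Dequantized values land exactly on grid points, so the map is idempotent.
    scale = w.abs().max() / (levels // 2)
    return (w / scale).round().clamp(-(levels // 2), levels // 2) * scale

W = torch.randn(64, 64)
r = 8
A = torch.zeros(64, r)
B = torch.zeros(64, r)

# LoftQ-style alternation: quantize the residual, then refit a rank-r term.
for _ in range(5):
    Q = quant_dequant(W - A @ B.T)
    U, S, Vh = torch.linalg.svd(W - Q)
    A, B = U[:, :r] * S[:r], Vh[:r].T

print(torch.equal(quant_dequant(Q), Q))  # True: saving dequant(Q) round-trips
print(torch.equal(quant_dequant(W), Q))  # False: re-quantizing original W loses Q
print(((W - Q - A @ B.T).norm() / W.norm()).item())  # residual after LoftQ init
```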
