
Why are base weights on HF LoftQ models in 16-bit? #26

Open
RonanKMcGovern opened this issue Apr 17, 2024 · 2 comments

Comments

@RonanKMcGovern

The script quantize_save_load.py generates a quantized model with LoRA adapters.

The base model is then saved and uploaded to LoftQ repos such as this one.

I'm puzzled that the base model weights are in 16 bits there, because that implies the base model is somehow upcast (dequantized) in the quantize_save_load.py script, but I don't see that anywhere.

My baseline expectation is that either:
a) The backbone would be stored in nf4, and then loaded with the 16-bit adapters on top, or
b) The backbone would be upcast to 16-bit, and then quantized to nf4 upon loading, with the 16-bit adapters on top. [But then there should be some upcasting code in quantize_save_load.py.]

Could someone clarify? Thanks.
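For concreteness, option (b) would look roughly like the sketch below, using the standard transformers/peft 4-bit loading path. The repo id and the adapter subfolder name are placeholders I'm assuming for illustration, not necessarily the exact LoftQ layout:

```python
# Sketch of option (b): re-quantize the saved 16-bit backbone to NF4 on load,
# then attach the LoftQ-initialized adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/Llama-2-7b-hf-4bit-64rank"  # example repo id (assumption)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # quantize the 16-bit weights on load
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(
    base,
    MODEL_ID,
    subfolder="loftq_init",  # adapter subfolder name is an assumption
    is_trainable=True,
)
```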

yxli2123 (Owner) commented Apr 18, 2024

Hi @RonanKMcGovern. Older versions of bitsandbytes did not allow saving weights in nf4 format. We avoided this issue by saving the weights in 16 bits on disk and converting them to 4 bits when loading them onto the GPU. However, since bitsandbytes was recently updated, it is now possible to save weights in nf4 format. We will update the code soon.
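Roughly, the direct nf4 save path would look like this sketch; the exact minimum library versions are an assumption from the bitsandbytes release notes, not something pinned in this repo:

```python
# Sketch: with a recent bitsandbytes (>= 0.41.3 or so; version bound is an
# assumption) and a matching transformers release, a 4-bit model can be
# serialized directly without being upcast to 16 bits.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.save_pretrained("llama-2-7b-nf4")  # weights stay in 4 bits on disk
```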

RonanKMcGovern (Author) commented Apr 19, 2024

OK, but are you even running the nf4 quantization then?

Or are you just directly saving the bf16 weights? If so, there is going to be an error when reloading the model, because the saved bf16 weights should be the dequantized weights, not the originals...

Something seems off to me: even one iteration of LoftQ should improve results, yet I see worsening results at one iteration and beyond (see this vid), as does kaitchup.substack.com.
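To make that concern concrete, here is a toy check, using a uniform round-to-nearest quantizer as a stand-in for NF4 (a simplification). After LoftQ-style alternation, the final backbone Q no longer equals quant(W), so saving the original W in 16 bits and re-quantizing on load silently discards the LoftQ refinement, whereas saving the dequantized Q round-trips exactly:

```python
import torch

torch.manual_seed(0)

def quant_dequant(w, levels=16):
    # Round-to-nearest onto a coarse uniform grid (stand-in for NF4).
    # Dequantized values land exactly on grid points, so the map is idempotent.
    scale = w.abs().max() / (levels // 2)
    return (w / scale).round().clamp(-(levels // 2), levels // 2) * scale

W = torch.randn(64, 64)
r = 8
A = torch.zeros(64, r)
B = torch.zeros(64, r)

# LoftQ-style alternation: quantize the residual, then refit a rank-r term.
for _ in range(5):
    Q = quant_dequant(W - A @ B.T)
    U, S, Vh = torch.linalg.svd(W - Q)
    A, B = U[:, :r] * S[:r], Vh[:r].T

print(torch.equal(quant_dequant(Q), Q))  # True: saving dequant(Q) round-trips
print(torch.equal(quant_dequant(W), Q))  # False: re-quantizing original W loses Q
print(((W - Q - A @ B.T).norm() / W.norm()).item())  # residual after LoftQ init
```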
