
llama : make loras compatible with repacking #12593

Merged 3 commits into master on Mar 27, 2025

Conversation

ggerganov (Member)

fix #12587
ref #12181 (comment)

When a LoRA adapter would have to be loaded into an extra buffer type (such as a repacked buft), load it into the default CPU buffer type instead.
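
For context, a minimal C++ sketch of the fallback idea described above (this is not the actual diff; the helper buft_is_extra and the function select_lora_buft are hypothetical stand-ins):

#include <cstdio>
#include "ggml-backend.h"
#include "ggml-cpu.h"

// Hypothetical stand-in: treat any buffer type other than the default CPU
// one as an "extra" (e.g. repacked CPU_AARCH64) buffer type. The real code
// would consult the backend's list of extra buffer types instead.
static bool buft_is_extra(ggml_backend_buffer_type_t buft) {
    return buft != ggml_backend_cpu_buffer_type();
}

// Pick the buffer type for a LoRA tensor: if the corresponding model weight
// lives in an extra (repacked) buffer type, fall back to plain CPU, since
// the repacked layout cannot hold the LoRA matrices.
static ggml_backend_buffer_type_t select_lora_buft(
        ggml_backend_buffer_type_t model_buft, const char * name) {
    if (buft_is_extra(model_buft)) {
        fprintf(stderr, "lora for '%s' cannot use buft '%s'\n",
                name, ggml_backend_buft_name(model_buft));
        return ggml_backend_cpu_buffer_type();
    }
    return model_buft;
}

This matches the warnings shown later in the thread: the adapter still loads, but the affected tensors go to the plain CPU buffer type.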

@ggerganov (Member, Author)

@giladgd @amakropoulos Could you test if this change fixes the LoRA usage with Q4_K?

@amakropoulos (Contributor)

@ggerganov yes that fixed it!

I get these warnings, but otherwise it works as expected:

llama_adapter_lora_init_impl: loading lora adapter from '/home/benuix/.config/LLMUnity/models/Qwen2-0.5B-Instruct-ru-lora.gguf' ...
llama_adapter_lora_init_impl: lora for 'blk.11.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.12.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.14.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.15.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.17.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.18.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.2.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.20.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.22.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.23.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.4.ffn_down.weight' cannot use buft 'CPU_AARCH64'
llama_adapter_lora_init_impl: lora for 'blk.5.ffn_down.weight' cannot use buft 'CPU_AARCH64'

@ggerganov (Member, Author)

Yes, the warnings are expected.

ggerganov requested a review from slaren on Mar 26, 2025 at 16:30.
@giladgd (Contributor) commented Mar 26, 2025

@ggerganov I've run some tests and this seems to solve the issue.

@slaren (Member) left a review comment:

This should work for now. Ideally we would use the same process as the model weights to determine the buffer type. I think this could be done by extracting the relevant parts of the create_tensor lambda in llama_model::load_tensors into a function, and keeping a mapping of each weight to its LLM_TN_IMPL object in llama_model.
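
A rough sketch of that suggested refactor (hedged: the struct and member names below are illustrative, not llama.cpp code; LLM_TN_IMPL is a real internal llama.cpp type, stubbed here to keep the sketch self-contained):

#include <map>
#include <string>
#include "ggml-backend.h"

struct LLM_TN_IMPL { /* opaque stand-in for the real type in llama-arch.h */ };

// Hypothetical sketch: the model remembers, for each created weight, the
// LLM_TN_IMPL it was created from, so the LoRA loader can rerun the exact
// buffer-type selection used in llama_model::load_tensors.
struct llama_model_sketch {
    // populated by load_tensors as each weight tensor is created
    std::map<std::string, LLM_TN_IMPL> weight_tn;

    // the relevant parts of the create_tensor lambda, extracted into a
    // reusable function (body elided; it would mirror the weight logic)
    ggml_backend_buffer_type_t select_weight_buft(const LLM_TN_IMPL & tn) const;
};

With such a mapping, the LoRA loader could call select_weight_buft per tensor instead of special-casing extra buffer types.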

ggerganov merged commit f28bc4c into master on Mar 27, 2025 (1 check passed), and deleted the gg/lora-with-repack branch on Mar 27, 2025 at 06:24.
@amakropoulos (Contributor)

Thank you for the quick fix!

Closes: Misc. bug: Server crash with use of lora on CPU (#12587)