llama : make loras compatible with repacking #12593
Conversation
@giladgd @amakropoulos Could you test if this change fixes the LoRA usage with
@ggerganov yes that fixed it! I get these warnings but otherwise works as expected:
Yes, the warnings are expected.
ggml-ci
@ggerganov I've run some tests and this seems to solve the issue.
This should work for now. Ideally we would use the same process as the model weights to determine the buffer type. I think this could be done by extracting the relevant parts of the create_tensor lambda in llama_model::load_tensors into a function, and keeping a mapping of each weight to its LLM_TN_IMPL object in llama_model.
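A rough sketch of that suggestion follows. Only create_tensor, load_tensors, LLM_TN_IMPL, and llama_model come from the comment above; the type, member, and function names in the sketch are hypothetical and not the actual llama.cpp API.

```cpp
// Illustrative only: while llama_model::load_tensors() runs its create_tensor
// lambda, record which buffer type each weight was assigned, so that LoRA
// loading can look up the same decision instead of re-deriving it.
#include <string>
#include <unordered_map>

// stand-in forward declaration so the sketch is self-contained;
// in llama.cpp this handle comes from the ggml backend headers
struct ggml_backend_buffer_type;
using ggml_backend_buffer_type_t = ggml_backend_buffer_type *;

struct weight_buft_registry {
    // canonical tensor name (what LLM_TN_IMPL resolves to) -> chosen buffer type
    std::unordered_map<std::string, ggml_backend_buffer_type_t> bufts;

    void record(const std::string & name, ggml_backend_buffer_type_t buft) {
        bufts[name] = buft;
    }

    // nullptr means the weight was never registered; the caller can then
    // fall back to the default CPU buffer type
    ggml_backend_buffer_type_t lookup(const std::string & name) const {
        auto it = bufts.find(name);
        return it == bufts.end() ? nullptr : it->second;
    }
};
```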
Thank you for the quick fix!
fix #12587
ref #12181 (comment)
When a LoRA adapter would need to be loaded into an extra buffer type (such as repacked bufts), load it into the default CPU buffer type instead.
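A minimal sketch of that fallback, not the actual patch: is_extra_cpu_buft is a hypothetical helper standing in for whatever check identifies extra (repacked) CPU buffer types, while ggml_backend_cpu_buffer_type() is the existing ggml call for the default CPU buffer type.

```cpp
// Sketch of the fallback described above (hypothetical helper names).
struct ggml_backend_buffer_type;
typedef struct ggml_backend_buffer_type * ggml_backend_buffer_type_t;

// real ggml API: returns the default CPU buffer type
extern "C" ggml_backend_buffer_type_t ggml_backend_cpu_buffer_type(void);

// hypothetical helper: true for CPU "extra" buffer types such as repacked bufts
bool is_extra_cpu_buft(ggml_backend_buffer_type_t buft);

static ggml_backend_buffer_type_t select_lora_buft(ggml_backend_buffer_type_t buft) {
    if (is_extra_cpu_buft(buft)) {
        // per the description above: fall back to the default CPU buffer type
        // instead of the extra (repacked) one
        return ggml_backend_cpu_buffer_type();
    }
    return buft;
}
```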