Hi there,
I would like to know if we could use `-ngl` to load the **last** N layers to the GPU instead of the first N. If possible, can someone please point me to a place where I should modify the source code? `llama-bench`, for example.
Take MoE models as an example: I would like to load the **first M dense layers to the CPU**, then the **remaining N MoE layers to the GPU**.
My proposal: update `-ngl` to accept something like this:
https://github.com/foldl/chatllm.cpp/blob/master/docs/gpu.md#usage