
When running deepseek-r1-dynamic-1.58-bit, a KV cache question #11757

Closed
junyan-zg opened this issue Feb 8, 2025 · 4 comments
Comments

@junyan-zg

I have 8 × RTX 4090s. I want to put the model layers on 7 of the cards and reserve the remaining card for the KV cache. How do I set this up? Right now the KV cache sits on the CPU and it loads very slowly.

@akaikite

I'm no expert, but I think adjusting the `--tensor-split` setting should fix it. It should let you spread the model tensors across 7 cards and push the remaining tensors and the KV cache onto the eighth card.
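A minimal sketch of what such an invocation might look like, based on llama.cpp's documented multi-GPU flags (`-ngl`, `--split-mode`, `--tensor-split`, `--main-gpu`). The model filename, layer count, and the split ratios are illustrative assumptions, not tuned values; per llama.cpp's help text, with `--split-mode row` the main GPU holds the KV cache and intermediate results:

```shell
# Hypothetical example; verify flag behavior against your llama.cpp build's --help.
./llama-cli \
  -m DeepSeek-R1-UD-IQ1_S.gguf \
  -ngl 99 \
  --split-mode row \
  --tensor-split 1,1,1,1,1,1,1,0 \
  --main-gpu 7 \
  -c 8192 \
  -p "Hello"
```

Here `--tensor-split 1,1,1,1,1,1,1,0` asks for the weights to be distributed across GPUs 0–6 while leaving GPU 7's VRAM free, and `--main-gpu 7` directs the KV cache and scratch buffers to that free card. In practice you may need to adjust the ratios so each card's share actually fits.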

@nullnuller

Similar question. Is there any guide on this?

@leeetao

leeetao commented Feb 13, 2025

Can the latest llama.cpp now run the deepseek-r1-dynamic-1.58-bit model, assuming sufficient hardware memory?

github-actions bot added the stale label Mar 16, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

4 participants