
When running deepseek-r1-dynamic-1.58-bit, a KV cache question #11757

Closed
junyan-zg opened this issue Feb 8, 2025 · 4 comments
Comments

@junyan-zg

I have 8 × RTX 4090s. I want to put the model layers on 7 of the cards and reserve the remaining card for the KV cache. How do I set this up? Right now the KV cache sits on the CPU and it loads very slowly.

@akaikite

I'm no expert, but I think adjusting the `--tensor-split` setting should fix it. It should let you spread the model tensors across 7 cards and push the remaining tensors and the KV cache onto the eighth card.
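A minimal sketch of what such an invocation might look like, based on llama.cpp's documented multi-GPU flags (`-ngl`, `--split-mode`, `--tensor-split`, `--main-gpu`). The model filename, layer count, and the split ratios are illustrative assumptions, not tuned values; per llama.cpp's help text, with `--split-mode row` the main GPU holds the KV cache and intermediate results:

```shell
# Hypothetical example; verify flag behavior against your llama.cpp build's --help.
./llama-cli \
  -m DeepSeek-R1-UD-IQ1_S.gguf \
  -ngl 99 \
  --split-mode row \
  --tensor-split 1,1,1,1,1,1,1,0 \
  --main-gpu 7 \
  -c 8192 \
  -p "Hello"
```

Here `--tensor-split 1,1,1,1,1,1,1,0` asks for the weights to be distributed across GPUs 0–6 while leaving GPU 7's VRAM free, and `--main-gpu 7` directs the KV cache and scratch buffers to that free card. In practice you may need to adjust the ratios so each card's share actually fits.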

@nullnuller

Similar question. Is there any guide on this?

@leeetao

leeetao commented Feb 13, 2025

Can the latest llama.cpp now run the deepseek-r1-dynamic-1.58-bit model, assuming sufficient hardware memory?

github-actions bot added the stale label Mar 16, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

4 participants