I'm no expert, but I think adjusting the tensor_split setting should fix it. It seems like you should be able to pack most of the tensors onto 7 cards and push the remaining tensors and the KV cache onto the eighth card.
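To make the idea concrete, here is a rough sketch of how such a split could be worked out. The helper `build_tensor_split` and the 0.25 share are made up for illustration and are not part of any library. It only assumes that tensor_split takes one relative proportion per GPU. Whether the runtime then actually places the KV cache on the lightly loaded card is not something I can confirm.

```python
def build_tensor_split(num_gpus, light_gpu, light_share):
    """Return one relative proportion per GPU.

    Hypothetical helper: every card gets weight 1.0 except `light_gpu`,
    which gets `light_share` so that most of its memory stays free.
    """
    if not 0 <= light_gpu < num_gpus:
        raise ValueError("light_gpu must be a valid GPU index")
    if not 0.0 <= light_share <= 1.0:
        raise ValueError("light_share must be between 0 and 1")
    return [light_share if i == light_gpu else 1.0 for i in range(num_gpus)]


# 8 cards: the first 7 carry full shares, the last one only a quarter share.
# The 0.25 value is an arbitrary assumption; tune it against real memory use.
split = build_tensor_split(num_gpus=8, light_gpu=7, light_share=0.25)

# Many front ends accept the proportions as a comma-separated string.
split_arg = ",".join(f"{x:g}" for x in split)
print(split_arg)
```

Setting the share to 0 would keep weights off that card entirely, but whether anything else then ends up there depends on the backend, so a small nonzero share is probably the safer starting point.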
I have 8 × 4090 cards. I want to put the gpu_layers on 7 of the cards and the KV cache on the remaining free card. How do I set this up? Right now the KV cache sits on the CPU, and it is very slow.