
Can I run vLLM with a 5090 + 5070 Ti for Llama 70B Q4 inference (needs approximately 42 GB)? Or do I need identical GPUs? #14706

Unanswered
jayavanth asked this question in Q&A
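For reference, here is roughly the launch I have in mind, a minimal sketch assuming vLLM's standard tensor-parallel path; the model id and the `quantization` value are illustrative placeholders, not tied to a specific checkpoint:

```python
# Minimal sketch of a two-GPU vLLM launch with tensor parallelism.
# Assumptions: the model id and quantization scheme below are placeholders;
# swap in whichever Q4 checkpoint (e.g. an AWQ or GPTQ build) is actually used.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # assumed 70B checkpoint
    quantization="awq",           # assumed 4-bit scheme
    tensor_parallel_size=2,       # shard weights across both GPUs
    gpu_memory_utilization=0.90,  # fraction of each GPU vLLM may claim
)

out = llm.generate("Hello, world", SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

My rough memory arithmetic, assuming vLLM shards weights evenly across tensor-parallel ranks: the 5090 has 32 GB and the 5070 Ti has 16 GB, so the smaller card caps the per-rank budget, giving roughly 2 × 16 GB = 32 GB of usable capacity, which falls short of the ~42 GB above even though the cards total 48 GB. That is what makes me unsure whether mixed GPUs can work here.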


Replies: 0
