Replies: 3 comments
-
Mixtral is roughly 50 GB of model weights (it's 8x7B), while the quantized version of Mistral 7B is three ~5 GB files, and that fits fine into the 24 GB of an RTX 4090. GGUF files are quantized models that use less memory. If you search for "quantized" in the discussions, there seem to be lots of requests for this feature but no acknowledgement.
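For anyone who wants to try the GGUF route outside of vLLM, here is a minimal sketch of loading a quant on the GPU with llama-cpp-python; the file path, quant level, and context size are placeholders for whatever you actually downloaded:

```python
from llama_cpp import Llama

# Sketch only: the GGUF path below is a placeholder for whichever
# quantized file you downloaded (e.g. a Q4_K_M quant).
llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload every layer to the 24 GB GPU
    n_ctx=4096,        # keep the context modest to limit KV-cache memory
)

out = llm("Q: What is GGUF? A:", max_tokens=64)
print(out["choices"][0]["text"])
```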
-
Did you manage to run it? I'm trying to run quantized models (e.g. from TheBloke) on my 2x RTX 4090 machine and I get OOM errors all the time.
-
Yes, I switched from vLLM to tabbyAPI, which runs models with EXL2 quantization. With this I was able to run multiple high-end models like Mixtral on a single RTX 4090. It even runs on Windows.
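In case it helps, tabbyAPI exposes an OpenAI-compatible endpoint, so once an EXL2 model is loaded something like the following sketch should work; the host, port, API key, and model name are placeholders for your own setup:

```python
import requests

# Placeholder values: adjust the host/port and API key to match your
# tabbyAPI configuration, and the model name to whatever it has loaded.
BASE_URL = "http://127.0.0.1:5000/v1"
API_KEY = "your-tabby-api-key"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Mixtral-8x7B-exl2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```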
-
I'm able to run a Mixtral model with my RTX 4090 in the oobabooga text-generation-webui, using a GGUF Mixtral model (the Q4_K_M variant). With vLLM I'm trying to run different Mixtral models in vanilla and AWQ mode but always get different types of out-of-memory exceptions. Has anybody been able to run a Mixtral model in vLLM with an RTX 4090?
Thanks!
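For context, a minimal sketch of the kind of vLLM AWQ invocation described above; the repo name and the memory-related settings are illustrative assumptions, not a confirmed working configuration on a single RTX 4090:

```python
from vllm import LLM, SamplingParams

# Illustrative only: the AWQ repo and the memory-related settings are
# assumptions; whether this fits on a single 24 GB RTX 4090 is exactly
# the open question in this thread.
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
    quantization="awq",
    dtype="half",                # AWQ kernels run in fp16
    max_model_len=4096,          # shorter context -> smaller KV cache
    gpu_memory_utilization=0.95,
    enforce_eager=True,          # skip CUDA graph capture to save memory
)

result = llm.generate(["Hello, Mixtral!"], SamplingParams(max_tokens=32))
print(result[0].outputs[0].text)
```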