Run on PC #3
300B parameters, so I am not hopeful. I have 64 GB of RAM and doubt I could run this even using 16 GB of my VRAM, even quantized to something like 1-bit lmao. I would also like the older Grok-0, to at least have something to play with.
~630 GB of VRAM at FP16, maybe 700. It's a crapshoot whether it will run on 8 H100s, and I don't think you can run it on CPU until it gets converted to GGUF.
I doubt xAI will do it, but when the BitNet code comes out, maybe a 200B version with BitNet would be nice, or even 120B. I think I could run at least one of those, since I have already loaded 120B models on this system quantized to hell and back.
Would quantizing to .gguf and using a terabyte of RAM help? 🙃
You would need to wait for GGUF support to be added and merged. Once that is done, those with 256 GB of RAM might have a chance, MAYBE 128 GB, but I am doubtful. That is just my guess, though, from my experience with really bad 120B models created by merging two Llama 2 models by stacking their layers. The good news is that since the model is so big, performance should still hold up pretty well under quantization.
If TheBloke is still doing model quantization, then you can ask him. I'll eventually try to do this myself, but I'm not sure it will work out well.
GGUF support needs to be added first; without it, attempting this is a waste of time unless you feel like writing some C to make it work, which, by all means, please do if you can. This isn't meant to discourage that. The model architecture is unknown to GGUF, so it has no idea what to do with it. I just don't want you to waste your time.
Also, see #21: maybe we can get the older 33B model at least.
It's 314B int8 parameters, so you would need about 314 GB of memory just to load the weights, plus some more for things like the K/V cache.
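As a rough sketch of that K/V cache overhead, here is the usual back-of-envelope formula. The shape numbers are my reading of Grok-1's published config (64 layers, 8 K/V heads, head dim 128, 8192-token context), so verify them against the release before relying on this:

```python
# K/V cache size on top of the weights. Assumed Grok-1 shape
# (check the released config): 64 layers, 8 K/V heads,
# head dim 128, 8192-token context, fp16 cache entries.
def kv_cache_bytes(layers=64, kv_heads=8, head_dim=128,
                   ctx_len=8192, bytes_per_elem=2):
    # 2x for keys and values; one entry per layer/head/position.
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem

print(f"{kv_cache_bytes() / 2**30:.1f} GiB")
```

Under these assumptions the cache is around 2 GiB at full context, so it is small next to the 314 GB of weights but still needs headroom.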
I have a PC with 256G RAM, and I'm waiting for gguf. |
It’s time to start selecting and purchasing new large memory devices. :D |
I hope we will get exact answers here: #62
I have a PC with 16G RAM, and I'm waiting for gguf. |
Hey, if you want a small taste, there is now a smaller model fine-tuned on this one. It has the same personality as Grok, but it's not as smart of course :3
Wow, I will try, thanks!
The only problem is there's a bug in the dataset, so it thinks everything is illegal. Also, this model is a base model, not instruct-tuned.
GGUF has arrived!
https://huggingface.co/Arki05/Grok-1-GGUF

```
$ ./main -m ../gguf/grok-1/grok-1-IQ3_XS-split-00001-of-00009.gguf -s 12346 -n 100 -t 32 -p "I believe the meaning of life is"
llm_load_print_meta: model type = 314B
I believe the meaning of life is to be the best you can be and to make a positive difference in the world. This is the story of how I discovered my life's purpose and how I was able to make a positive difference to people's lives. I was born in 1959, and I have always been a very curious child. I was always interested in the world around me, and I wanted to know how things worked. My parents encouraged my curiosity, and they bought me a lot
$ numactl -H
```
My 256 GB of memory (8x 32 GB DDR3-1866 sticks) comes from decommissioned server parts. In total, they cost me only 640 RMB (about $88).
Grok's talk seems to be mixed with something strange:

> A department store entrusts a handling company to transport 1000 glass vases. The freight for each vase is 1.50 yuan. If one is broken, not only is no freight paid for it, but the handling company must also pay 9.50 yuan in compensation. The department store finally paid 1456 yuan. How many vases were broken during handling?
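For what it's worth, the word problem Grok wandered into has a clean answer; each breakage forfeits the 1.50 freight and adds a 9.50 penalty, an 11-yuan swing. A quick check in plain arithmetic:

```python
# Freight is 1.50 yuan per delivered vase; each broken vase earns
# no freight and costs the handler 9.50 yuan, so every breakage
# swings the total by 1.50 + 9.50 = 11 yuan.
def broken_vases(total=1000, freight=1.5, penalty=9.5, paid=1456.0):
    # Solve freight*(total - x) - penalty*x = paid for x.
    return (freight * total - paid) / (freight + penalty)

print(broken_vases())  # → 4.0
```

So 4 vases were broken; the strange part is only that this output appeared mid-generation.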
Maybe a stupid question, but how much RAM and VRAM, and what processor, do you need to run this? :D