Run on PC #3

Open
littlecat-dev opened this issue Mar 17, 2024 · 22 comments

Comments

@littlecat-dev

Maybe a stupid question, but how much RAM and VRAM, and what processor, do I need to run this? :D

@nonetrix

nonetrix commented Mar 17, 2024

300B parameters, so I am not hopeful. I have 64 GB of RAM and doubt I would be able to run this even if I also used 16 GB of my VRAM, even quantized to like 1 bit lmao. I would like the older Grok-0 as well, to at least have something to play with.

@alice-comfy

~630 GB of VRAM at FP16, maybe 700. It's a crapshoot whether it'll run on 8x H100s, and I don't think you can run it on CPU until it gets GGUF'd.
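
That ~630 GB figure is just parameter count times bytes per parameter. Here is a minimal sketch of the arithmetic in Python, assuming roughly 314B parameters as reported for the released checkpoint; the bytes-per-parameter values are the usual ones for each precision, not anything Grok-specific:

# Rough estimate of the memory needed just to hold the weights,
# ignoring the KV cache and compute buffers.
PARAMS = 314e9  # approximate parameter count of the released checkpoint

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
    "IQ3_XS (~3.28 bpw)": 3.28 / 8,
}

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name:>20}: ~{PARAMS * bpp / 1e9:.0f} GB for weights")

# fp16/bf16: ~628 GB, int8: ~314 GB, 4-bit: ~157 GB, IQ3_XS: ~129 GB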

@nonetrix

nonetrix commented Mar 17, 2024

I doubt xAI will do it, but when the BitNet code comes out, a 200B version with BitNet would be nice, maybe even 120B. I think I could run at least one of those, since I have already loaded 120B models on this system quantized to hell and back.

@NeuroDonu

Would quantizing to .gguf and using a terabyte of RAM help? 🙃

@nonetrix

We would need to wait for GGUF support to be added and merged. Once that is done, maybe those with 256 GB of RAM might have a chance, MAYBE 128 GB, but I am doubtful. That is just my guess, though, from my experience with really bad 120B models created by merging one Llama 2 model with another by stacking the layers. The good news is that since the model is so big, quality should still hold up pretty well when quantizing it.

@NeuroDonu

If TheBloke is still doing model quantization, you could ask him. I'll eventually try to do this myself, but I'm not sure it will work out well.

@nonetrix

nonetrix commented Mar 17, 2024

GGUF support needs to be added first; without it, it is a waste of time to even attempt, unless you feel like writing some C to make it work, which by all means please do if you can, that isn't meant to discourage anyone. The model architecture is unknown to llama.cpp, so it has no idea what to do with it, and I don't want you to waste your time.

@fakerybakery

@nonetrix ggerganov/llama.cpp#6120

@nonetrix

nonetrix commented Mar 17, 2024

Also see #21, maybe we can at least get the older 33B model.
edit: nope lol

@stduhpf

stduhpf commented Mar 18, 2024

It's 314B parameters released in int8, so you would need about 314 GB of memory just to load the weights, plus some more for things like the K/V cache.
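
On the "plus some more for the K/V cache" point, the usual sizing formula is 2 (keys and values) × layers × KV heads × head dim × context length × bytes per element. A small sketch below; the architecture numbers in the example are placeholders for illustration, not confirmed Grok-1 config values:

# Generic KV-cache size estimate for a transformer decoder.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for keys and values, one entry per layer per cached token
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Placeholder numbers, fp16 cache, 8k context:
print(f"~{kv_cache_bytes(64, 8, 128, 8192) / 1e9:.1f} GB")  # ~2.1 GB with these placeholders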

@rankaiyx

I have a PC with 256G RAM, and I'm waiting for gguf.

@soulteary

It’s time to start selecting and purchasing new large memory devices. :D

@nonetrix

nonetrix commented Mar 18, 2024

My motherboard officially supports only 64 GB and I've already maxed that out. I might be able to run 128 GB out of spec, since the chipset and CPU support it and Gigabyte just says the board doesn't, but it's probably still not enough. I would have to get a Threadripper workstation build just for 0.5 tokens a second.

@Konard

Konard commented Mar 18, 2024

I hope we will get exact answers here: #62

@dockercore

I have a PC with 16G RAM, and I'm waiting for gguf.

@nonetrix

nonetrix commented Mar 20, 2024

Hey, if you want a small taste, there is now a smaller model fine-tuned on this model. It has the same personality as Grok, but it's not as smart of course :3

https://huggingface.co/HuggingFaceH4/mistral-7b-grok
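
If anyone wants to poke at that fine-tune, here is a minimal sketch using the Hugging Face transformers library. It assumes a GPU with enough VRAM for a 7B model in fp16 (roughly 14 GB plus overhead); check the model card for the intended prompt format:

# Minimal sketch for trying the mistral-7b-grok fine-tune locally.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/mistral-7b-grok"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # needs the accelerate package installed
)

prompt = "I believe the meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))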

@littlecat-dev
Author

Hey, if you want a small taste, there is now a smaller model fine-tuned on this model. It has the same personality as Grok, but it's not as smart of course :3

https://huggingface.co/HuggingFaceH4/mistral-7b-grok

Wow, I will try it, thanks!

@fakerybakery

The only problem is there's a bug in the dataset, so it thinks everything is illegal. Also, this model is a base model, not instruction-tuned.

@rankaiyx

GGUF has arrived!
In actual testing, the IQ3_XS quantization requires 124 GB of memory,
which means a machine with 128 GB of RAM can work!
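
That 124 GB measurement lines up with the bits-per-weight reported by llama.cpp in the log below; a quick cross-check in Python, with the parameter count and BPW taken from that log:

params = 316.49e9  # "model params = 316.49 B" from the llama.cpp metadata
bpw = 3.28         # "3.28 BPW" reported for the IQ3_XS quant

print(f"~{params * bpw / 8 / 2**30:.1f} GiB")  # ~120.9 GiB, close to the reported 120.73 GiB
# The extra few GB seen at runtime go to the KV cache and compute buffers.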

@rankaiyx

https://huggingface.co/Arki05/Grok-1-GGUF

$ ./main -m ../gguf/grok-1/grok-1-IQ3_XS-split-00001-of-00009.gguf -s 12346 -n 100 -t 32 -p "I believe the meaning of life is"

llm_load_print_meta: model type = 314B
llm_load_print_meta: model ftype = IQ3_XS - 3.3 bpw
llm_load_print_meta: model params = 316.49 B
llm_load_print_meta: model size = 120.73 GiB (3.28 BPW)
llm_load_print_meta: general.name = Grok
llm_load_print_meta: BOS token = 1 '[BOS]'
llm_load_print_meta: EOS token = 2 '[EOS]'
llm_load_print_meta: UNK token = 0 '[PAD]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: LF token = 79 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.81 MiB
llm_load_tensors: CPU buffer size = 16716.66 MiB
llm_load_tensors: CPU buffer size = 14592.75 MiB
llm_load_tensors: CPU buffer size = 14484.75 MiB
llm_load_tensors: CPU buffer size = 14901.35 MiB
llm_load_tensors: CPU buffer size = 14714.18 MiB
llm_load_tensors: CPU buffer size = 14493.75 MiB
llm_load_tensors: CPU buffer size = 14484.75 MiB
llm_load_tensors: CPU buffer size = 15250.88 MiB
llm_load_tensors: CPU buffer size = 3990.96 MiB

I believe the meaning of life is to be the best you can be and to make a positive difference in the world.

This is the story of how I discovered my life’s purpose and how I was able to make a positive difference to people’s lives.

I was born in 1959, and I have always been a very curious child. I was always interested in the world around me, and I wanted to know how things worked.

My parents encouraged my curiosity, and they bought me a lot
llama_print_timings: load time = 75099.36 ms
llama_print_timings: sample time = 12.02 ms / 100 runs ( 0.12 ms per token, 8318.08 tokens per second)
llama_print_timings: prompt eval time = 5213.81 ms / 7 tokens ( 744.83 ms per token, 1.34 tokens per second)
llama_print_timings: eval time = 108333.24 ms / 99 runs ( 1094.28 ms per token, 0.91 tokens per second)
llama_print_timings: total time = 113705.85 ms / 106 tokens
Log end

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 128792 MB
node 0 free: 128333 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 129015 MB
node 1 free: 4537 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

@rankaiyx

My 256 GB of RAM (8x 32 GB DDR3-1866 sticks) came from disassembled obsolete servers. In total, they cost me only 640 RMB (about $88).

@rankaiyx

Grok's output seems to have something strange mixed in.

The department store entrusts the handling company to transport 1000 glass vases, and the freight for each glass vase is 1.50 yuan. If one is broken, this one will not only not pay the freight, but the handling company will also pay 9.50 yuan. The department store finally paid 1456 yuan. How many vases were broken during the handling?
The student answered the question:
The department store entrusted the handling company to transport 1000 glass vases, and the freight for each glass vase was 1.50 yuan. If one was broken, this one would not only not pay the freight, but the handling company would also pay 9.50 yuan. The department store finally paid 1456 yuan, that is, 1456 yuan - 1000 yuan * 1.50 yuan = 456 yuan, which is equal to 9.50 yuan * n. It can be seen that n = 48, that is, the number of broken glass is 48, and the number of intact glass is 1000-48=952.
Mao Zedong's 100th Birthday
A big star in the sky, shining the whole universe! A great man of a generation, leading China to the light!
How to find the sum of the first n terms of a geometric sequence
How to find the sum of the first n terms of a geometric sequence
What is the sum of the first 100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
llama_print_timings: load time = 1635.08 ms
llama_print_timings: sample time = 79.24 ms / 500 runs ( 0.16 ms per token, 6309.79 tokens per second)
llama_print_timings: prompt eval time = 45058.73 ms / 83 tokens ( 542.88 ms per token, 1.84 tokens per second)
llama_print_timings: eval time = 453951.70 ms / 499 runs ( 909.72 ms per token, 1.10 tokens per second)
llama_print_timings: total time = 499935.16 ms / 582 tokens
