OOM with A100 8*80G #125

Open

nanhexinyu opened this issue Mar 18, 2024 · 7 comments

Comments

@nanhexinyu

How can I run the demo case with random data?
I'm using 8×A100 80 GB GPUs and I still get an OOM error.
I think it's because I'm starting the case in fp16 or fp32; how do I use QW8Bit with random data?
Thanks!
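For a sense of scale, here is a back-of-envelope sketch (not from the repo; it assumes the commonly cited ~314B parameter count for Grok-1 and counts weights only, ignoring activations, the KV cache, and framework overhead):

```python
# Rough weight-memory estimate for a ~314B-parameter model sharded evenly
# across 8 GPUs, at different weight dtypes. Real usage is higher because
# activations, the KV cache, and JAX buffers are not counted here.
N_PARAMS = 314e9   # assumed parameter count
N_GPUS = 8

for dtype, nbytes in {"fp32": 4, "fp16/bf16": 2, "int8": 1}.items():
    total_gb = N_PARAMS * nbytes / 1e9
    per_gpu_gb = total_gb / N_GPUS
    print(f"{dtype:>9}: ~{total_gb:,.0f} GB total, ~{per_gpu_gb:.0f} GB per GPU")
```

Under these assumptions, fp32 weights alone need roughly 157 GB per GPU and fp16/bf16 about 79 GB per GPU, so neither fits on an 80 GB card once anything else is allocated; only 8-bit weights (~39 GB per GPU) leave headroom.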

@nanhexinyu (Author)

When I change float32 to int8, I hit another problem:

w = hk.get_parameter("w", [input_size, output_size], jnp.int8, init=hk.initializers.Constant(0))

raise TypeError(f"{name} argument does not appear valid. It should be a "
TypeError: params argument does not appear valid. It should be a mapping but is of type <class 'model.TrainingState'>. For reference the parameters for apply are `apply(params, rng, ...)` for `hk.transform` and `apply(params, state, rng, ...)` for `hk.transform_with_state`.

@jesst3r

jesst3r commented Mar 18, 2024

Silly me, thinking that I could run Grok on my two 3090TIs :)

@null0034

> Silly me, thinking that I could run Grok on my two 3090TIs :)

Clearly, that card's memory is nowhere near enough; the model is just too large!

@zRzRzRzRzRzRzR

It will use about 65 GB of GPU memory on each A100 80G.

@atgsmsg

atgsmsg commented Mar 19, 2024

H100 SXM5 NVLink GPU x 8
$34,000.00 each ($272,000.00)

AMD 100-000000802 EPYC 9124 Genoa 9004 Series 16-core 3 GHz Server Processor × 2
$1,111.00 each ($2,222.00)

24 x 64GB DDR5 4800 ECC Reg Server Compatible Memory Kit (1.5TB Total)
$8,280.00

Micron MTFDKCB960TFR-1BC1ZABYYR 7450 PRO 960 GB Solid State Drive - 2.5" Internal - U.3 (PCI Express NVMe 4.0 x4) - Read Intensive - TAA Compliant
$142.00 each

total $297,019.00 (without station/power units)

@surak

surak commented Mar 19, 2024

I can confirm that 512 GB of RAM and 4×A100 40 GB are not enough for it.

@xuyixun21

> Silly me, thinking that I could run Grok on my two 3090TIs :)

you're so funny!
