OOM with A100 8*80G #125

Open

nanhexinyu opened this issue Mar 18, 2024 · 7 comments

Comments

@nanhexinyu

How can I run the demo case with random data?
I'm using 8×A100 80 GB GPUs and I still get an OOM error.
I think it's because I'm starting the case in fp16 or fp32; how do I use QW8Bit with random data?
Thanks!
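For a sense of scale, here is a back-of-envelope sketch (not from the repo; it assumes the commonly cited ~314B parameter count for Grok-1 and counts weights only, ignoring activations, the KV cache, and framework overhead):

```python
# Rough weight-memory estimate for a ~314B-parameter model sharded evenly
# across 8 GPUs, at different weight dtypes. Real usage is higher because
# activations, the KV cache, and JAX buffers are not counted here.
N_PARAMS = 314e9   # assumed parameter count
N_GPUS = 8

for dtype, nbytes in {"fp32": 4, "fp16/bf16": 2, "int8": 1}.items():
    total_gb = N_PARAMS * nbytes / 1e9
    per_gpu_gb = total_gb / N_GPUS
    print(f"{dtype:>9}: ~{total_gb:,.0f} GB total, ~{per_gpu_gb:.0f} GB per GPU")
```

Under these assumptions, fp32 weights alone need roughly 157 GB per GPU and fp16/bf16 about 79 GB per GPU, so neither fits on an 80 GB card once anything else is allocated; only 8-bit weights (~39 GB per GPU) leave headroom.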

@nanhexinyu (Author)

When I change float32 to int8, I hit another problem:

w = hk.get_parameter("w", [input_size, output_size], jnp.int8, init=hk.initializers.Constant(0))

raise TypeError(f"{name} argument does not appear valid. It should be a "
TypeError: params argument does not appear valid. It should be a mapping but is of type <class 'model.TrainingState'>. For reference the parameters for apply are `apply(params, rng, ...)` for `hk.transform` and `apply(params, state, rng, ...)` for `hk.transform_with_state`.

@jesst3r

jesst3r commented Mar 18, 2024

Silly me, thinking that I could run Grok on my two 3090TIs :)

@null0034

> Silly me, thinking that I could run Grok on my two 3090TIs :)

Clearly, that card's memory is nowhere near enough; the model is just too large!

@zRzRzRzRzRzRzR

It will use about 65 GB of GPU memory on each A100 80G.

@atgsmsg

atgsmsg commented Mar 19, 2024

H100 SXM5 NVLink GPU x 8
$34,000.00 each ($272,000.00)

AMD 100-000000802 EPYC 9124 Genoa 9004 Series 16-core 3 GHz Server Processor × 2
$1,111.00 each ($2,222.00)

24 x 64GB DDR5 4800 ECC Reg Server Compatible Memory Kit (1.5TB Total)
$8,280.00

Micron MTFDKCB960TFR-1BC1ZABYYR 7450 PRO 960 GB Solid State Drive - 2.5" Internal - U.3 (PCI Express NVMe 4.0 x4) - Read Intensive - TAA Compliant
$142.00 each

total $297,019.00 (without station/power units)

@surak

surak commented Mar 19, 2024

I can confirm that 512 GB of RAM and 4×A100 40 GB are not enough for it.

@xuyixun21

> Silly me, thinking that I could run Grok on my two 3090TIs :)

you're so funny!
