Number of devices 1 must equal the product of mesh_shape (1, 8)? #198

Closed
vjsplus opened this issue Mar 19, 2024 · 2 comments

Comments

@vjsplus

vjsplus commented Mar 19, 2024

This may be a basic question: what hardware is needed to run this?

grok-1 % python run.py
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda': 
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/anaconda3/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file), '/usr/local/lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache)
INFO:rank:Initializing mesh for self.local_mesh_config=(1, 8) self.between_hosts_config=(1, 1)...
INFO:rank:Detected 1 devices in mesh
Traceback (most recent call last):
  File "/Users/vjs/grok-1/run.py", line 72, in <module>
    main()
  File "/Users/vjs/grok-1/run.py", line 63, in main
    inference_runner.initialize()
  File "/Users/vjs/grok-1/runners.py", line 282, in initialize
    runner.initialize(
  File "/Users/vjs/grok-1/runners.py", line 181, in initialize
    self.mesh = make_mesh(self.local_mesh_config, self.between_hosts_config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vjs/grok-1/runners.py", line 586, in make_mesh
    device_mesh = mesh_utils.create_hybrid_device_mesh(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/jax/experimental/mesh_utils.py", line 373, in create_hybrid_device_mesh
    per_granule_meshes = [create_device_mesh(mesh_shape, granule)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/jax/experimental/mesh_utils.py", line 373, in <listcomp>
    per_granule_meshes = [create_device_mesh(mesh_shape, granule)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/jax/experimental/mesh_utils.py", line 302, in create_device_mesh
    raise ValueError(f'Number of devices {len(devices)} must equal the product '
ValueError: Number of devices 1 must equal the product of mesh_shape (1, 8)
@bluevisor

#38 (comment)

@trholding

8 GPUs are required, with enough total VRAM to fit the model.
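
For context, the ValueError in the traceback comes from a device-count check: the log shows a local mesh config of (1, 8), whose product is 8, while only 1 device was detected. A minimal sketch of that arithmetic in plain Python follows. `check_mesh` is a hypothetical helper written for illustration, not a JAX or grok-1 function, and the message text is copied from the error above. On a real machine the device count would likely come from something like `jax.device_count()`.

```python
import math


def check_mesh(num_devices, mesh_shape):
    """Return the number of devices the mesh needs, or raise if the count differs.

    Hypothetical helper that mirrors the check reported in the traceback;
    it is not the library's implementation.
    """
    required = math.prod(mesh_shape)  # (1, 8) -> 8
    if num_devices != required:
        raise ValueError(
            f"Number of devices {num_devices} must equal the product "
            f"of mesh_shape {tuple(mesh_shape)}"
        )
    return required


# A machine exposing a single device cannot fill a (1, 8) mesh.
try:
    check_mesh(1, (1, 8))
except ValueError as err:
    message = str(err)
    print(message)

# With 8 devices the same check passes.
devices_needed = check_mesh(8, (1, 8))
```

Under these assumptions, the check only passes when the machine exposes exactly as many devices as the mesh shape multiplies out to, which is why a single-device laptop hits the error.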

Please see: #183

Please close this issue and move to:

https://github.com/xai-org/grok-1/discussions

Duplicate of / Related to:
#38

Reason:
User's shell issue / Not a real issue
#69
#108

@xSetech closed this as not planned Mar 19, 2024

4 participants