Grok implementation details #14
Could you please explain what the source of this information is?
@morozover You can find most of the values in TransformerConfig (Line 33 in e50578b).
Additionally, see the rotary embedding implementation in class RotaryEmbedding (Line 635 in e50578b).
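For readers who haven't seen rotary position embeddings before, here is a minimal NumPy sketch of the standard RoPE formulation: each pair of features is rotated by an angle proportional to the token's position. This is an illustrative sketch only; the function name `apply_rope` and all details here are assumptions, not the repo's actual `RotaryEmbedding` implementation.

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape [seq_len, dim] (dim even).

    Encodes absolute position by rotating each (even, odd) feature pair
    by a position-dependent angle.
    """
    seq_len, dim = x.shape
    # One rotation rate per feature pair: theta_i = base^(-2i/dim).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # [dim/2]
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # [seq_len, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # Standard 2D rotation applied independently to each feature pair.
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```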
Added an overview of the model as discussed in response to #14. Giving folks more info on the model specs before they download the checkpoints should help them confirm they have the resources needed to run Grok-1 effectively.
@andrewgcodes Since this was merged in #27, I think this issue can be closed.
The details I shared have been added to the README.
Not an issue, but it would be nice if this were in the README / model.py header (a hedged config sketch follows the list below):
314B parameters
Mixture of 8 Experts
2 experts used per token
64 layers
48 attention heads for queries
8 attention heads for keys/values
embedding size: 6,144
rotary embeddings (RoPE)
SentencePiece tokenizer; 131,072 tokens
Supports activation sharding and 8-bit quantization
Max seq length (context): 8,192 tokens
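As a quick reference, here is a sketch that collects the numbers above into a Python dataclass. The class and field names (`GrokSpecs`, `num_q_heads`, etc.) are hypothetical and chosen for readability; they do not necessarily match the actual TransformerConfig in model.py.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrokSpecs:
    """Published Grok-1 specs, collected for reference (names are illustrative)."""
    n_params: int = 314_000_000_000   # 314B total parameters (MoE)
    num_experts: int = 8              # mixture of 8 experts
    experts_per_token: int = 2        # top-2 expert routing per token
    num_layers: int = 64
    num_q_heads: int = 48             # attention heads for queries
    num_kv_heads: int = 8             # attention heads for keys/values
    emb_size: int = 6_144
    vocab_size: int = 131_072         # SentencePiece tokenizer
    max_seq_len: int = 8_192          # context window
```

Note that with 2 of 8 experts active per token, only a fraction of the 314B parameters participate in any single forward pass, though all checkpoint weights must still fit in memory (activation sharding and 8-bit quantization, which the repo supports, help reduce that footprint).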