Grok implementation details #14
Could you please explain what the source of this information is?
@morozover You can find most of the values in TransformerConfig (Line 33 in e50578b).
Additionally, see the rotary embedding implementation in class RotaryEmbedding (Line 635 in e50578b).
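For readers who haven't seen rotary position embeddings before, here is a minimal NumPy sketch of the standard RoPE formulation: each pair of features is rotated by an angle proportional to the token's position. This is an illustrative sketch only; the function name `apply_rope` and all details here are assumptions, not the repo's actual `RotaryEmbedding` implementation.

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape [seq_len, dim] (dim even).

    Encodes absolute position by rotating each (even, odd) feature pair
    by a position-dependent angle.
    """
    seq_len, dim = x.shape
    # One rotation rate per feature pair: theta_i = base^(-2i/dim).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # [dim/2]
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # [seq_len, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # Standard 2D rotation applied independently to each feature pair.
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```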
Added an overview of the model as discussed in response to #14. Giving folks more info on the model specs before they download the checkpoints should help them confirm they have the resources needed to run Grok-1 effectively.
@andrewgcodes Since this was merged in #27, I think this issue can be closed.
The details I shared have been added to the README.
Not an issue, but it would be nice if this were in the README / model.py header (a hedged config sketch follows the list below):
314B parameters
Mixture of 8 Experts
2 experts used per token
64 layers
48 attention heads for queries
8 attention heads for keys/values
embedding size: 6,144
rotary embeddings (RoPE)
SentencePiece tokenizer; 131,072 tokens
Supports activation sharding and 8-bit quantization
Max seq length (context): 8,192 tokens
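As a quick reference, here is a sketch that collects the numbers above into a Python dataclass. The class and field names (`GrokSpecs`, `num_q_heads`, etc.) are hypothetical and chosen for readability; they do not necessarily match the actual TransformerConfig in model.py.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrokSpecs:
    """Published Grok-1 specs, collected for reference (names are illustrative)."""
    n_params: int = 314_000_000_000   # 314B total parameters (MoE)
    num_experts: int = 8              # mixture of 8 experts
    experts_per_token: int = 2        # top-2 expert routing per token
    num_layers: int = 64
    num_q_heads: int = 48             # attention heads for queries
    num_kv_heads: int = 8             # attention heads for keys/values
    emb_size: int = 6_144
    vocab_size: int = 131_072         # SentencePiece tokenizer
    max_seq_len: int = 8_192          # context window
```

Note that with 2 of 8 experts active per token, only a fraction of the 314B parameters participate in any single forward pass, though all checkpoint weights must still fit in memory (activation sharding and 8-bit quantization, which the repo supports, help reduce that footprint).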