
Grok implementation details #14

Closed
andrewgcodes opened this issue Mar 17, 2024 · 4 comments

Comments

@andrewgcodes

Not an issue, but it would be nice if this were in the README / model.py header (also collected into a config sketch after the list):
314B parameters
Mixture of 8 experts (MoE)
2 experts used per token
64 layers
48 attention heads for queries
8 attention heads for keys/values
Embedding size: 6,144
Rotary position embeddings (RoPE)
SentencePiece tokenizer; 131,072 tokens
Supports activation sharding and 8-bit quantization
Max sequence length (context): 8,192 tokens
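
For quick reference, here is the list above gathered into a self-contained config sketch. The field names are illustrative assumptions, not the repo's actual API; the authoritative values live in TransformerConfig in run.py (quoted by @garethpaul below):

```python
# Illustrative summary of the listed Grok-1 hyperparameters.
# Field names are hypothetical; see TransformerConfig in run.py for the real ones.
from dataclasses import dataclass

@dataclass
class GrokConfigSketch:
    num_layers: int = 64
    num_q_heads: int = 48           # attention heads for queries
    num_kv_heads: int = 8           # attention heads for keys/values (grouped-query attention)
    emb_size: int = 6_144           # embedding width
    vocab_size: int = 131_072       # SentencePiece tokenizer vocabulary
    num_experts: int = 8            # mixture of experts
    experts_per_token: int = 2      # experts routed per token
    max_seq_len: int = 8_192        # maximum context length
    shard_activations: bool = True  # activation sharding supported
    use_8bit_quant: bool = True     # 8-bit quantization supported
```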

@morozover

Could you please explain what the source of this info is?

@garethpaul
Contributor

garethpaul commented Mar 18, 2024

@morozover you can find most of the values in TransformerConfig:

grok-1/run.py, line 33 at e50578b:

model=TransformerConfig(

Additionally, see the rotary embedding implementation in the RotaryEmbedding class:

grok-1/model.py, line 635 at e50578b:

class RotaryEmbedding(hk.Module):
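
For anyone unfamiliar with rotary embeddings, here is a generic RoPE sketch in JAX. It illustrates the standard technique (interleaved feature pairs rotated by position-dependent angles) and is not the repo's actual RotaryEmbedding code; apply_rope and its argument shapes are assumptions for the example:

```python
# Generic rotary position embedding (RoPE) sketch; illustrative only,
# not the RotaryEmbedding module from model.py.
import jax
import jax.numpy as jnp

def apply_rope(x: jnp.ndarray, positions: jnp.ndarray, base: float = 10_000.0) -> jnp.ndarray:
    """Rotate each (even, odd) feature pair of x by a position-dependent angle.

    x:         [seq_len, num_heads, head_dim] queries or keys
    positions: [seq_len] integer token positions
    """
    head_dim = x.shape[-1]
    # One geometric frequency per feature pair.
    freqs = base ** (-jnp.arange(0, head_dim, 2) / head_dim)  # [head_dim // 2]
    angles = positions[:, None] * freqs[None, :]              # [seq_len, head_dim // 2]
    cos = jnp.cos(angles)[:, None, :]                         # broadcast over heads
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Standard 2-D rotation applied to each (x1, x2) pair.
    rotated = jnp.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), axis=-1)
    return rotated.reshape(x.shape)

# With the specs listed above, queries would plausibly have this shape
# (48 query heads x 128 dims per head = 6,144; context up to 8,192 tokens):
q = jax.random.normal(jax.random.PRNGKey(0), (8_192, 48, 128))
q_rot = apply_rope(q, jnp.arange(8_192))
```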

ibab pushed a commit that referenced this issue Mar 18, 2024
Added an overview of the model as discussed in response to #14. 

Adding more info on the model specs before folks proceed to download
the checkpoints should help them ensure they have the necessary
resources to use Grok-1 effectively.
@Konard

Konard commented Mar 18, 2024

@andrewgcodes since it was merged in #27, I think this issue can be closed.

@andrewgcodes
Author

The details I shared have been added to the README.
