Is there a scientific paper? #23
Is there a scientific paper accompanying this release? I've searched but couldn't find one. I find it odd that the weights would be released but not the research.

Comments
It would be nice to have:
A model card would make sense. If there aren't any new techniques per se, I don't see the need for yet another paper. If there are, then sure!
I'm happy as long as the code is up to date and the science is released, even if not in an academic setting.
Agreed that a model card (what data Grok was trained on) is crucial for this being truly "open source". It looks like Grok had a model card from last November: https://x.ai/model-card/
I doubt we'll get an answer any more detailed than "the Internet" and "whatever synthetic data our employees made".
Because there's no research underlying it? There's nothing new or surprising in the model so far; it's just the same architecture as other MoE LLMs with different data and training compute.
Disagree. I think @Explosion-Scratch did a good job pointing out why a paper would be useful in this case.
That's a technical report at best, though.
Call it what you want; the issue is that it doesn't exist.
Absolutely needs some experiment details on μTransfer for an MoE model this large; as someone noticed, there are several 'weird' multipliers here (lines 31 to 47 at 7050ed2).