This is a big one. New attention block, new architecture scales, one-parameter scaling to 1.5B, and so much more.
(The Twitter thread will be linked here when possible.)