v0.3.0 beta (~136-140s)

@tysam-code tysam-code released this 26 Mar 13:10
· 4 commits to main since this release
c8485a0

Hiya there! In this release, we upgrade the MLP to use the SiGLU activation function (in place of the default GELU, which has no linear gating), convert the network to pure bfloat16 (from mixed precision), and apply various optimizations that bring our training time down by another 18-22 seconds or so (woop woop!). For more info, check out the Twitter thread detailing some of the tweaks in this patch: https://twitter.com/hi_tysam/status/1639975149951672321 <3 :D :)))) <3 🎆 🎇 🎇 🎆
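
For anyone curious what swapping a plain GELU for a gated activation looks like, here is a tiny pure-Python sketch. It is an illustration only, not the code from this repo: it assumes "SiGLU" means a SiLU-gated linear unit (the expanded projection is split into a value half and a gate half, and `silu(gate)` scales the value), and the function names and the split order are made up for the example.

```python
import math


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    # SiLU (also called swish): x * sigmoid(x).
    return x / (1.0 + math.exp(-x))


def plain_gelu_hidden(expanded: list[float]) -> list[float]:
    # Ungated path: GELU applied elementwise to the expanded projection.
    return [gelu(v) for v in expanded]


def silu_gated_hidden(expanded: list[float]) -> list[float]:
    # Gated path (ASSUMED reading of "SiGLU"): split the expanded projection
    # in two and let silu(gate) scale the linear "value" half elementwise.
    if len(expanded) % 2 != 0:
        raise ValueError("expanded projection must have an even length")
    half = len(expanded) // 2
    value, gate = expanded[:half], expanded[half:]
    return [v * silu(g) for v, g in zip(value, gate)]


if __name__ == "__main__":
    expanded = [1.0, -2.0, 0.5, 3.0]  # stand-in for one token's expanded MLP activations
    print(plain_gelu_hidden(expanded))  # same width as the input
    print(silu_gated_hidden(expanded))  # half the width of the input
```

Note that the gated version outputs half as many values as it takes in, so a gated MLP generally needs its projection sizes adjusted if you want to keep the hidden width the same.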