
Releases: tysam-code/hlb-gpt

v0.4.1 beta (<100s)

22 Mar 00:09
a9f158f


This is a big one. New attention block, new architecture scales, one-parameter scaling to 1.5B, and so much more.

(The Twitter thread will be linked here when possible.)

v0.3.0 beta (~136-140s)

26 Mar 13:10
c8485a0


Hiya there! In this release, we upgrade the MLP a bit to use the SiGLU activation function (over the default non-linearly-gated GELU function), convert the network over to pure bfloat16 (from a mixed-precision setup), and make various optimizations that bring our training time down by another 18-22 seconds or so (woop woop!). For more info, check out the Twitter thread detailing some of the tweaks in this release (https://twitter.com/hi_tysam/status/1639975149951672321)! <3 :D :)))) <3 🎆 🎇 🎇 🎆

v0.2.0 beta

22 Mar 02:39
4535aa0


Hi there! In this release, we add sequence length scheduling and make a few other tweaks! For more info on the sequence length scheduling (and the relevant supporting changes), please see the release tweet at https://twitter.com/hi_tysam/status/1637691454012153856?cxt=HHwWgICzgevsn7otAAAA
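To give a rough feel for what sequence length scheduling means, here is a minimal sketch, assuming a simple linear ramp from a short sequence length to the full one over the course of training. The function name, the default lengths, and the rounding multiple are all hypothetical and chosen for illustration; the actual schedule used in this release is described in the tweet above and may look quite different.

```python
def scheduled_sequence_length(step, total_steps, start_len=32, max_len=256, multiple=16):
    """Hypothetical linear sequence-length schedule (illustrative, not the repo's code).

    Grows the sequence length from start_len at step 0 to max_len at the
    final step, rounding down to a multiple of `multiple`.
    """
    if total_steps <= 1:
        return max_len
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    raw = start_len + progress * (max_len - start_len)
    # Round down to a hardware-friendly multiple, then clamp to the valid range.
    rounded = int(raw // multiple) * multiple
    return max(start_len, min(rounded, max_len))


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 99):
        print(step, scheduled_sequence_length(step, total_steps=100))
```

The idea behind this kind of schedule is that early steps train on short, cheap sequences and later steps train on longer ones, so the average cost per step drops.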

beta v0.1.0

20 Mar 00:34
0ce502e


Greetings. In this release (originally from 3/12/23), we add a few features that cut the training time nearly in half. This tag also includes a hotfix to restore backwards compatibility for people with torch versions earlier than 2.0.

For a more detailed summary of this release, please check out https://twitter.com/hi_tysam/status/1635123488674697218?cxt=HHwWhMDSpcqJkLEtAAAA
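As a rough illustration of the kind of version gate a backwards-compatibility fix like this might use, here is a minimal sketch that decides whether 2.0-only features (for example `torch.compile`, which was introduced in PyTorch 2.0) should be enabled. The helper names are hypothetical, and this is not the hotfix from this tag; the release notes do not say which feature caused the incompatibility.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.0.1+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_torch_2_features(version):
    """Return True if the version string is 2.0 or newer (hypothetical helper)."""
    return parse_major_minor(version) >= (2, 0)


if __name__ == "__main__":
    for v in ("1.13.1", "2.0.0+cu117", "2.1.0a0"):
        print(v, supports_torch_2_features(v))
```

In a training script, the string passed in would typically be `torch.__version__`, with the 2.0-only code path skipped when the check returns `False`.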

baseline 0.0.0

06 Mar 02:18


Hi hi hiya there! <3 :D Feel free to check out the README.md on this tag; it has the best summary of this release that I could probably give. (Also, so much typing and proofreading today, as always on release days I suppose. I am beat! :'D)