Merge pull request #1 from michuhu/minor-fixes
Pull request merge
jmaczan pushed 9 commits to main • c558af9…18b1c98 • on Feb 1
jmaczan pushed 1 commit to main • 88358d5…c558af9 • on Nov 20, 2024
jmaczan pushed 1 commit to main • 9a02eb6…88358d5 • on Oct 9, 2024
jmaczan pushed 1 commit to main • 983384f…9a02eb6 • on Oct 9, 2024
jmaczan pushed 1 commit to main • f2871bc…983384f • on Oct 9, 2024
Output linear transformation weights initialization fix and heads ite…
jmaczan pushed 1 commit to main • 57f3e26…f2871bc • on Jul 14, 2024
Increase default hparams a bit
jmaczan pushed 1 commit to main • 990376a…57f3e26 • on Jul 14, 2024
Minor updates, trying to align my code with Karpathy's, maybe it'll i…
jmaczan pushed 1 commit to main • 7d1a286…990376a • on Jul 14, 2024
Gradient clipping, detect anomaly, use pad tokens in data loader, shu…
jmaczan pushed 1 commit to main • 9dac6dc…7d1a286 • on Jul 11, 2024
Fixing dimensionalities issues
jmaczan pushed 1 commit to main • d38d86d…9dac6dc • on Jul 11, 2024
Watched Karpathy's GPT video and identified issues with attention com…
jmaczan pushed 1 commit to main • ba9647e…d38d86d • on Jul 11, 2024
Use masked_fill for causal masking
jmaczan pushed 1 commit to main • 5b4937e…ba9647e • on Jul 6, 2024
Add dropout to attention head and verify tokenizer
jmaczan pushed 1 commit to main • 74dc9cb…5b4937e • on Jul 5, 2024
DataLoader shuffle default to False
jmaczan pushed 1 commit to main • cab6a88…74dc9cb • on Jul 3, 2024
jmaczan pushed 2 commits to main • dd8e435…cab6a88 • on Jun 29, 2024
jmaczan pushed 2 commits to main • c1abc9d…dd8e435 • on Jun 24, 2024
jmaczan pushed 1 commit to main • b8e02a1…c1abc9d • on Jun 23, 2024
jmaczan pushed 1 commit to main • eaef1bc…b8e02a1 • on Jun 22, 2024
Show delimiter when running inference
jmaczan pushed 1 commit to main • 2c75855…eaef1bc • on Jun 22, 2024
Top k and temperature to prevent the same token repeating
jmaczan pushed 1 commit to main • 06583df…2c75855 • on Jun 22, 2024
jmaczan pushed 1 commit to main • 4254d6a…06583df • on Jun 22, 2024
Add inference code (not yet tested), improve default hyperparameters
jmaczan pushed 1 commit to main • 333117e…4254d6a • on Jun 22, 2024
jmaczan pushed 1 commit to main • bc0d873…333117e • on Jun 22, 2024
jmaczan pushed 1 commit to main • 6afc6c9…bc0d873 • on Jun 22, 2024
jmaczan pushed 1 commit to main • eda24a9…6afc6c9 • on Jun 22, 2024
Fixed dimensionality issues, seems to be ready for the training
jmaczan pushed 1 commit to main • 5c499a3…eda24a9 • on Jun 22, 2024
Working on training loop and debugging index issue in embeddings
jmaczan pushed 1 commit to main • 63b69bb…5c499a3 • on Jun 22, 2024
jmaczan pushed 1 commit to main • 9e86f44…63b69bb • on Jun 21, 2024
Fixing statically found issues
jmaczan pushed 1 commit to main • f654fd4…9e86f44 • on Jun 21, 2024
Rewrite forward() and model structure for GPT and TransformerBlock
jmaczan pushed 1 commit to main • e9a146b…f654fd4 • on Jun 21, 2024