Activity

Merge pull request #1 from michuhu/minor-fixes

Pull request merge
jmaczan pushed 9 commits to main • c558af9…18b1c98
on Feb 1

Update README.md

jmaczan pushed 1 commit to main • 88358d5…c558af9
on Nov 20, 2024

Update CITATION.cff

jmaczan pushed 1 commit to main • 9a02eb6…88358d5
on Oct 9, 2024

Create CITATION.cff

jmaczan pushed 1 commit to main • 983384f…9a02eb6
on Oct 9, 2024

Update README.md

jmaczan pushed 1 commit to main • f2871bc…983384f
on Oct 9, 2024

Output linear transformation weights initialization fix and heads ite…

jmaczan pushed 1 commit to main • 57f3e26…f2871bc
on Jul 14, 2024

Increase default hparams a bit

jmaczan pushed 1 commit to main • 990376a…57f3e26
on Jul 14, 2024

Minor updates, trying to align my code with Karpathy's, maybe it'll i…

jmaczan pushed 1 commit to main • 7d1a286…990376a
on Jul 14, 2024

Gradient clipping, detect anomaly, use pad tokens in data loader, shu…

jmaczan pushed 1 commit to main • 9dac6dc…7d1a286
on Jul 11, 2024
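The commit above mentions gradient clipping, but its message is truncated and the repository's exact call is not shown here. As a rough illustration of clipping by global norm, here is a dependency-free sketch on plain Python lists. It follows the same idea as PyTorch's `torch.nn.utils.clip_grad_norm_` (scale all gradients by `max_norm / (total_norm + eps)` when that factor is below 1). The function name `clip_by_global_norm` is hypothetical and not taken from the repository.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradient vectors so their combined L2 norm is at most max_norm.

    Hypothetical helper for illustration; similar in spirit to
    torch.nn.utils.clip_grad_norm_, but operating on lists of floats.
    Returns (clipped_grads, norm_before_clipping).
    """
    # Global norm across every parameter's gradient, not per-vector norms.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = max_norm / (total_norm + eps)
    # Only shrink; gradients already under the limit are returned unchanged.
    if scale < 1.0:
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm

if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
    print(norm, clipped)
```

Using one global norm keeps the relative direction of the full gradient intact, which is why it is usually preferred over clipping each tensor separately.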

Fixing dimensionalities issues

jmaczan pushed 1 commit to main • d38d86d…9dac6dc
on Jul 11, 2024

Watched Karpathy's GPT video and identified issues with attention com…

jmaczan pushed 1 commit to main • ba9647e…d38d86d
on Jul 11, 2024

Use masked_fill for causal masking

jmaczan pushed 1 commit to main • 5b4937e…ba9647e
on Jul 6, 2024
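The commit above switches causal masking to PyTorch's `masked_fill`. A common pattern is `scores.masked_fill(tril == 0, float("-inf"))` with `tril = torch.tril(torch.ones(T, T))`, though this log does not show the repository's exact code. To illustrate the effect without a PyTorch dependency, the sketch below fills every "future" position (key index greater than query index) with negative infinity and then applies a row-wise softmax, so each token attends only to itself and earlier tokens. All function names are hypothetical.

```python
import math

def causal_mask_fill(scores):
    """Return a copy of a T x T score matrix with future positions set to -inf.

    Mirrors the effect of scores.masked_fill(tril == 0, float("-inf")).
    """
    T = len(scores)
    return [
        [scores[i][j] if j <= i else float("-inf") for j in range(T)]
        for i in range(T)
    ]

def softmax_row(row):
    # Subtract the row max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    # Mask first, then normalize each query row into attention weights.
    return [softmax_row(r) for r in causal_mask_fill(scores)]

if __name__ == "__main__":
    scores = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    for row in causal_attention_weights(scores):
        print([round(w, 3) for w in row])
```

Filling with negative infinity before the softmax (rather than zeroing weights afterwards) keeps each row a proper probability distribution over the allowed positions.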

Add dropout to attention head and verify tokenizer

jmaczan pushed 1 commit to main • 74dc9cb…5b4937e
on Jul 5, 2024

DataLoader shuffle default to False

jmaczan pushed 1 commit to main • cab6a88…74dc9cb
on Jul 3, 2024

Code reformat

jmaczan pushed 2 commits to main • dd8e435…cab6a88
on Jun 29, 2024

Minor device changes

jmaczan pushed 2 commits to main • c1abc9d…dd8e435
on Jun 24, 2024

Update README.md

jmaczan pushed 1 commit to main • b8e02a1…c1abc9d
on Jun 23, 2024

Add train and run README

jmaczan pushed 1 commit to main • eaef1bc…b8e02a1
on Jun 22, 2024

Show delimiter when running inference

jmaczan pushed 1 commit to main • 2c75855…eaef1bc
on Jun 22, 2024

Top k and temperature to prevent the same token repeating

jmaczan pushed 1 commit to main • 06583df…2c75855
on Jun 22, 2024
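The commit above adds top-k sampling and temperature to stop inference from emitting the same token over and over. The repository's sampling code is not shown in this log, so the following is only a dependency-free sketch of the general technique, assuming the model produces one list of raw logits per step: divide the logits by the temperature, keep the `k` largest, and sample from the softmax over those survivors. The function name `sample_top_k` and its parameters are hypothetical.

```python
import math
import random

def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample one token index using temperature scaling and top-k filtering.

    Hypothetical helper for illustration; rng may be the random module or a
    random.Random instance (both provide .choices).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Indices of the k largest scaled logits; everything else is discarded.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax numerators over the survivors (max-subtracted for stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.1, 2.0, -1.0, 0.5]
    print([sample_top_k(logits, k=2, temperature=0.8, rng=rng) for _ in range(10)])
```

With `k=1` this reduces to greedy (argmax) decoding, which is the setting most prone to repetitive output; raising `k` or the temperature trades some likelihood for variety.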

Inference code

jmaczan pushed 1 commit to main • 4254d6a…06583df
on Jun 22, 2024

Add inference code (not yet tested), improve default hyperparameters

jmaczan pushed 1 commit to main • 333117e…4254d6a
on Jun 22, 2024

Use right device

jmaczan pushed 1 commit to main • bc0d873…333117e
on Jun 22, 2024

requirements.txt

jmaczan pushed 1 commit to main • 6afc6c9…bc0d873
on Jun 22, 2024

Add checkpointing

jmaczan pushed 1 commit to main • eda24a9…6afc6c9
on Jun 22, 2024

Fixed dimensionality issues, seems to be ready for the training

jmaczan pushed 1 commit to main • 5c499a3…eda24a9
on Jun 22, 2024

Working on training loop and debugging index issue in embeddings

jmaczan pushed 1 commit to main • 63b69bb…5c499a3
on Jun 22, 2024

PE fix

jmaczan pushed 1 commit to main • 9e86f44…63b69bb
on Jun 21, 2024

Fixing statically found issues

jmaczan pushed 1 commit to main • f654fd4…9e86f44
on Jun 21, 2024

Rewrite forward() and model structure for GPT and TransformerBlock

jmaczan pushed 1 commit to main • e9a146b…f654fd4
on Jun 21, 2024