Activity

Merge pull request #1 from michuhu/minor-fixes

Pull request merge

jmaczan pushed 9 commits to main • c558af9…18b1c98

on Feb 1

Update README.md

jmaczan pushed 1 commit to main • 88358d5…c558af9

on Nov 20, 2024

Update CITATION.cff

jmaczan pushed 1 commit to main • 9a02eb6…88358d5

on Oct 9, 2024

Create CITATION.cff

jmaczan pushed 1 commit to main • 983384f…9a02eb6

on Oct 9, 2024

Update README.md

jmaczan pushed 1 commit to main • f2871bc…983384f

on Oct 9, 2024

Output linear transformation weights initialization fix and heads ite…

jmaczan pushed 1 commit to main • 57f3e26…f2871bc

on Jul 14, 2024

Increase default hparams a bit

jmaczan pushed 1 commit to main • 990376a…57f3e26

on Jul 14, 2024

Minor updates, trying to align my code with Karpathy's, maybe it'll i…

jmaczan pushed 1 commit to main • 7d1a286…990376a

on Jul 14, 2024

Gradient clipping, detect anomaly, use pad tokens in data loader, shu…

jmaczan pushed 1 commit to main • 9dac6dc…7d1a286

on Jul 11, 2024
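Gradient clipping caps the combined L2 norm of all gradients before the optimizer step. This keeps a single bad batch from producing a huge update. In PyTorch, which this repository appears to use, the usual tool is `torch.nn.utils.clip_grad_norm_`.

The sketch below is a hypothetical plain-Python illustration of clipping by global norm, not the project's code. The helper name `clip_by_global_norm` and the list-of-lists gradient representation are both assumptions.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Scale every gradient so their joint L2 norm is at most max_norm.

    Hypothetical helper: each inner list stands in for one parameter's gradient.
    """
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")
    # Global L2 norm over every entry of every parameter's gradient
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy
        return [list(grad) for grad in grads]
    # One shared scale factor preserves the direction of the full gradient vector
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


grads = [[3.0, 0.0], [0.0, 4.0]]  # global norm is 5.0
print(clip_by_global_norm(grads, max_norm=1.0))
```

Using one scale factor for all parameters, rather than clipping each parameter separately, keeps the update pointing the same way and only shortens it.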

Fixing dimensionality issues

jmaczan pushed 1 commit to main • d38d86d…9dac6dc

on Jul 11, 2024

Watched Karpathy's GPT video and identified issues with attention com…

jmaczan pushed 1 commit to main • ba9647e…d38d86d

on Jul 11, 2024

Use masked_fill for causal masking

jmaczan pushed 1 commit to main • 5b4937e…ba9647e

on Jul 6, 2024
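Causal masking stops each position from attending to later positions. In PyTorch this is commonly written as `scores.masked_fill(mask == 0, float('-inf'))`, where `mask` is a lower-triangular matrix of ones, followed by a softmax.

The sketch below reproduces that effect in plain Python so it runs without dependencies. `causal_softmax` is a hypothetical name, and the repository's attention code may be structured differently.

```python
import math
from typing import List


def causal_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax of a square score matrix under a causal mask.

    Hypothetical helper: position i may only attend to positions j <= i.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Equivalent of masked_fill: future positions (j > i) become -inf
        row = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        # Stable softmax; the max is finite because position i is never masked
        m = max(row)
        exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0] * 3 for _ in range(3)]
for r in causal_softmax(scores):
    print(r)
```

Filling with `-inf` before the softmax, rather than zeroing weights afterwards, keeps each row a valid probability distribution over the visible positions.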

Add dropout to attention head and verify tokenizer

jmaczan pushed 1 commit to main • 74dc9cb…5b4937e

on Jul 5, 2024
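Dropout in an attention head is typically applied to the attention weights after the softmax. That is where the code in Karpathy's GPT lecture places it; whether this repository applies it at the same point is an assumption.

The plain-Python sketch below shows inverted dropout, the variant PyTorch's `nn.Dropout` implements. During training, each value is zeroed with probability `p` and the survivors are scaled by `1/(1-p)`. At inference, it is the identity. The `dropout` helper is hypothetical.

```python
import random
from typing import List, Optional


def dropout(values: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout over a flat list of values (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time dropout does nothing
        return list(values)
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    # Keep each value with probability 1 - p; scaling keeps the expected value unchanged
    return [v / keep_prob if rng.random() >= p else 0.0 for v in values]


# e.g. one row of attention weights after the causal softmax
row = [0.5, 0.3, 0.2]
print(dropout(row, p=0.1, rng=random.Random(42)))
```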

DataLoader shuffle default to False

jmaczan pushed 1 commit to main • cab6a88…74dc9cb

on Jul 3, 2024

Code reformat

jmaczan pushed 2 commits to main • dd8e435…cab6a88

on Jun 29, 2024

Minor device changes

jmaczan pushed 2 commits to main • c1abc9d…dd8e435

on Jun 24, 2024

Update README.md

jmaczan pushed 1 commit to main • b8e02a1…c1abc9d

on Jun 23, 2024

Add train and run README

jmaczan pushed 1 commit to main • eaef1bc…b8e02a1

on Jun 22, 2024

Show delimiter when running inference

jmaczan pushed 1 commit to main • 2c75855…eaef1bc

on Jun 22, 2024

Top k and temperature to prevent the same token repeating

jmaczan pushed 1 commit to main • 06583df…2c75855

on Jun 22, 2024
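The commit message gives the motivation: always taking the most likely token (greedy decoding) tends to get stuck repeating the same token. Temperature scaling divides the logits by `T` before the softmax. Top-k then keeps only the `k` most likely tokens before sampling.

The sketch below is a hypothetical plain-Python version. `sample_top_k` and its parameters are assumptions, not the repository's inference code.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token id using temperature scaling and top-k filtering (hypothetical)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids
    k = max(1, min(k, len(scaled)))
    top_ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # 3. Softmax over the survivors (subtract the max for numerical stability)
    m = max(scaled[i] for i in top_ids)
    exps = [math.exp(scaled[i] - m) for i in top_ids]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 4. Draw one token id according to these probabilities
    return rng.choices(top_ids, weights=probs, k=1)[0]


next_id = sample_top_k([2.0, 1.0, 0.5, -1.0], k=2, temperature=0.8, rng=random.Random(0))
print(next_id)  # either 0 or 1, the two highest logits
```

With `k=1` this reduces to greedy decoding. `k` and `temperature` therefore work together to trade determinism for variety.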

Inference code

jmaczan pushed 1 commit to main • 4254d6a…06583df

on Jun 22, 2024

Add inference code (not yet tested), improve default hyperparameters

jmaczan pushed 1 commit to main • 333117e…4254d6a

on Jun 22, 2024

Use right device

jmaczan pushed 1 commit to main • bc0d873…333117e

on Jun 22, 2024

requirements.txt

jmaczan pushed 1 commit to main • 6afc6c9…bc0d873

on Jun 22, 2024

Add checkpointing

jmaczan pushed 1 commit to main • eda24a9…6afc6c9

on Jun 22, 2024

Fixed dimensionality issues, seems to be ready for the training

jmaczan pushed 1 commit to main • 5c499a3…eda24a9

on Jun 22, 2024

Working on training loop and debugging index issue in embeddings

jmaczan pushed 1 commit to main • 63b69bb…5c499a3

on Jun 22, 2024

PE fix

jmaczan pushed 1 commit to main • 9e86f44…63b69bb

on Jun 21, 2024

Fixing statically found issues

jmaczan pushed 1 commit to main • f654fd4…9e86f44

on Jun 21, 2024

Rewrite forward() and model structure for GPT and TransformerBlock

jmaczan pushed 1 commit to main • e9a146b…f654fd4

on Jun 21, 2024
