xTimeCrystal/MiniModel

πŸ› οΈ Setup & Training

1. Install Dependencies

First, install the required packages:

pip install torchao liger_kernel pyarrow tensorboard

💡 Note: torchao and liger_kernel may require a recent version of PyTorch (≥2.3) and a CUDA-enabled environment for optimal performance.
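As a quick sanity check before installing (a minimal sketch, not part of the repository), you can confirm the PyTorch version and CUDA availability from Python:

# check_env.py -- quick sanity check, not part of this repository
import torch

print("PyTorch version:", torch.__version__)         # torchao / liger_kernel generally expect >= 2.3
print("CUDA available:", torch.cuda.is_available())  # the fused kernels need a CUDA device
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))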

2. Prepare Data

  1. Download all files from this repository.
  2. Place them in a single working directory.
  3. Inside this directory, create a subfolder named 128.
  4. Download the training data (Parquet files) into the 128/ folder (a hedged download sketch follows this list):
    🔗 TinyCorpus-v2
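If the TinyCorpus-v2 shards are hosted on the Hugging Face Hub, a sketch like the one below can pull them into 128/. The repo_id is a placeholder, not the verified dataset id; take the real id from the TinyCorpus-v2 link above.

# download_data.py -- hedged sketch; REPO_ID is a placeholder, use the id behind the TinyCorpus-v2 link
from huggingface_hub import snapshot_download

REPO_ID = "<user>/TinyCorpus-v2"   # placeholder dataset id

snapshot_download(
    repo_id=REPO_ID,
    repo_type="dataset",
    local_dir="128",               # put the Parquet shards where trainGPT-token.py expects them
    allow_patterns=["*.parquet"],  # only the shard files are needed
)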

3. File Structure

Your directory should look like this:

your-training-folder/
β”œβ”€β”€ trainGPT-token.py
β”œβ”€β”€ fast_self_attn_model.py
β”œβ”€β”€ data_utils.py
β”œβ”€β”€ dev_optim.py
└── 128/
    β”œβ”€β”€ tinycorpus-000-of-128.parquet
    β”œβ”€β”€ tinycorpus-001-of-128.parquet
    └── ...                            # all shard files

4. Start Training

Run the training script from inside your-training-folder:

python trainGPT-token.py

This will train a 12-layer replica of MiniModel; the original model used 24 layers. Change 'layers': 24 in trainGPT-token.py if you wish to replicate the original model.
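The exact layout of the configuration in trainGPT-token.py may differ; the snippet below only illustrates the change described above, with key names assumed from this README.

# Illustrative only -- the actual config in trainGPT-token.py may be named or structured differently.
model_config = {
    'layers': 24,        # 12 reproduces the smaller run; 24 matches the original MiniModel
    'input_dims': 768,   # hidden size (see the OOM section below before lowering this)
    # ... other hyperparameters from the script stay unchanged
}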

By default, the script logs training loss and other metrics to a directory called runs/ using PyTorch's SummaryWriter.
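For reference, the SummaryWriter logging pattern typically looks like the sketch below (illustrative, not copied from the script):

# Illustrative logging pattern -- not the actual code from trainGPT-token.py.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs")             # the script writes to runs/ by default
for step, loss in enumerate([2.31, 2.12, 1.98]):   # stand-in loss values
    writer.add_scalar("train/loss", loss, step)    # appears as a curve in TensorBoard
writer.close()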

5. Monitor Training with TensorBoard

While training is running (or after it finishes), launch TensorBoard to visualize the loss curve:

tensorboard --logdir=runs

Then open your browser and go to:
👉 http://localhost:6006

You'll see real-time plots of the training loss (refreshes every 30s).

6. Troubleshooting Out-of-Memory (OOM) Errors

If you encounter memory issues, open trainGPT-token.py and adjust one or both of the following:

  • Reduce model size:
    'input_dims': 512   # default 768
  • Reduce batch size:
    batch_size = 32     # default 64

Smaller values will lower VRAM usage at the cost of training speed or stability.
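Put together, an OOM-friendly setup might look like the sketch below; the variable names follow the snippets above and may differ slightly in the script.

# Hedged sketch of the combined OOM adjustments -- names follow the snippets above, not verified against the script.
model_config = {
    'layers': 12,
    'input_dims': 512,   # reduced from the default 768
}
batch_size = 32          # reduced from the default 64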
