First, install the required packages:
pip install torchao liger_kernel pyarrow tensorboard
💡 **Note:** `torchao` and `liger_kernel` may require a recent version of PyTorch (≥2.3) and a CUDA-enabled environment for optimal performance.
- Download all files from this repository.
- Place them in a single working directory.
- Inside this directory, create a subfolder named `128/`.
- Download the training data (Parquet files) into the `128/` folder: 🔗 TinyCorpus-v2
Your directory should look like this:
```
your-training-folder/
├── trainGPT-token.py
├── fast_self_attn_model.py
├── data_utils.py
├── dev_optim.py
└── 128/
    ├── tinycorpus-000-of-128.parquet
    ├── tinycorpus-001-of-128.parquet
    └── ...   # all shard files
```
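Before launching training, a small stdlib check like the sketch below can catch a missing script or an incomplete shard download early. The file names come from the tree above; the helper itself (`check_layout`) is not part of the repository.

```python
from pathlib import Path

# Scripts expected at the top level of the training folder.
EXPECTED_SCRIPTS = [
    "trainGPT-token.py",
    "fast_self_attn_model.py",
    "data_utils.py",
    "dev_optim.py",
]

def check_layout(root: str) -> list[str]:
    """Return a list of problems found in the training directory layout."""
    base = Path(root)
    problems = [f"missing {name}" for name in EXPECTED_SCRIPTS
                if not (base / name).is_file()]
    shard_dir = base / "128"
    if not shard_dir.is_dir():
        problems.append("missing 128/ subfolder")
    else:
        shards = sorted(shard_dir.glob("tinycorpus-*-of-128.parquet"))
        if len(shards) < 128:
            problems.append(f"only {len(shards)} of 128 shards present")
    return problems

# Example: print any problems before launching training
# for problem in check_layout("your-training-folder"):
#     print(problem)
```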
Run the training script from inside your-training-folder:
python trainGPT-token.py
This will replicate MiniModel with 12 layers; the original model used 24 layers. Change `'layers': 24` in `trainGPT-token.py` if you wish to replicate the original model.
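For orientation, the setting is a plain dictionary entry inside the script. This is a hypothetical excerpt; the exact structure and key names in `trainGPT-token.py` may differ:

```python
# Hypothetical excerpt from trainGPT-token.py; exact names may differ.
config = {
    'layers': 12,   # set to 24 to replicate the original MiniModel
    # ...
}
```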
By default, the script logs training loss and other metrics to a directory called `runs/` using PyTorch's `SummaryWriter`.
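If you want to log additional metrics of your own, `SummaryWriter` can be used directly. A minimal sketch, where the `runs/demo` subdirectory and the `train/loss` tag are arbitrary choices, not names used by the training script:

```python
from torch.utils.tensorboard import SummaryWriter

# Writes event files under runs/demo; TensorBoard picks them up automatically.
writer = SummaryWriter(log_dir="runs/demo")
for step in range(100):
    # add_scalar(tag, value, global_step) logs one point on a named curve.
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```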
While training is running (or after it finishes), launch TensorBoard to visualize the loss curve:
tensorboard --logdir=runs
Then open your browser and go to:
🔗 http://localhost:6006
You'll see real-time plots of the training loss (refreshes every 30s).
If you encounter memory issues, open `trainGPT-token.py` and adjust one or both of the following:
- Reduce model size:
'input_dims': 512 # default 768
- Reduce batch size:
batch_size = 32 # default 64
Smaller values will lower VRAM usage at the cost of training speed or stability.
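When tuning these values, it helps to see how much GPU memory is actually in use. A small helper sketch using PyTorch's built-in memory counters (this function is an illustration, not part of the repository):

```python
import torch

def vram_summary() -> str:
    """Report current and peak GPU memory, useful when tuning batch_size."""
    if not torch.cuda.is_available():
        return "CUDA not available; training will fall back to CPU"
    alloc = torch.cuda.memory_allocated() / 2**30   # bytes -> GiB
    peak = torch.cuda.max_memory_allocated() / 2**30
    return f"allocated {alloc:.2f} GiB, peak {peak:.2f} GiB"

# Call this between training steps to watch usage grow with batch size.
print(vram_summary())
```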