Tutorial: LLM basics from scratch, with a step-by-step explanation.
Download the file named `openwebtext.tar.xz` from Google Drive: https://drive.google.com/drive/folders/1IaD_SIIB-K3Sij_-JjWoPy_UrWqQRdjx

Then extract all the `.xz` files it contains into the folder `openWebTextCorpus`.
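If you prefer to script the extraction instead of using an archive tool, here is a minimal Python sketch. It assumes `openwebtext.tar.xz` is in the current directory and that you want every `.xz` member copied into one flat `openWebTextCorpus` folder. The helper name `extract_xz_members` is hypothetical and does not come from the repository.

```python
import shutil
import tarfile
from pathlib import Path


def extract_xz_members(archive_path, dest_dir="openWebTextCorpus"):
    """Copy every .xz file inside a .tar.xz archive into one flat folder.

    Returns the paths of the extracted files.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # mode "r:xz" reads a tar archive compressed with xz (LZMA)
    with tarfile.open(archive_path, mode="r:xz") as tar:
        for member in tar.getmembers():
            # skip directories and anything that is not an .xz file
            if not (member.isfile() and member.name.endswith(".xz")):
                continue
            # keep only the base name, so no path stored in the archive
            # can write outside dest
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)  # stream instead of reading it all into memory
            extracted.append(target)
    return extracted


# Usage (hypothetical): extract_xz_members("openwebtext.tar.xz")
```

Flattening to base names matches the "all files in one folder" layout described above and also sidesteps path-traversal issues with untrusted archives.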
The files should look like:
After you have downloaded and extracted the files, run the following in a terminal:
python convert_data.py
The program automatically converts all the `.xz` files you extracted into `openWebTextCorpus` and puts the converted `.txt` files in the folder `data`. Since we are using [neetbox][neetbox] for monitoring, open `localhost:20202` (neetbox's default port) in your browser to check the progress.
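To make the conversion step concrete, here is a hedged sketch of what such a conversion could look like. It is not the repository's `convert_data.py`. It assumes each `.xz` file in `openWebTextCorpus` is itself an xz-compressed tar archive of `.txt` documents (verify this against your copy), and it writes one `.txt` file per archive into `data`, separating documents with a blank line. The function name `convert_subsets` and the output naming scheme are assumptions.

```python
import tarfile
from pathlib import Path


def convert_subsets(src_dir="openWebTextCorpus", dst_dir="data"):
    """Turn each .xz subset archive into one .txt file of concatenated documents.

    Assumes every .xz file is an xz-compressed tar archive of .txt documents.
    Returns the paths of the written .txt files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(src.glob("*.xz")):
        out_path = dst / (archive.stem + ".txt")  # e.g. subset_a.xz -> subset_a.txt
        with tarfile.open(archive, mode="r:xz") as tar, \
                open(out_path, "w", encoding="utf-8") as out:
            for member in tar.getmembers():
                # only regular .txt members are treated as documents
                if not (member.isfile() and member.name.endswith(".txt")):
                    continue
                with tar.extractfile(member) as doc:
                    text = doc.read().decode("utf-8", errors="replace")
                out.write(text.strip() + "\n\n")  # blank line between documents
        written.append(out_path)
    return written
```

Writing one output file per subset keeps each file small enough to inspect and lets a later step shard or shuffle the corpus by file.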
python train.py --config gptv1_s.toml
Training progress is also monitored with neetbox: open `localhost:20202` (neetbox's default port) in your browser to check it.
python inference.py --config gptv1_s.toml
Then open `localhost:20202` (neetbox's default port) in your browser and feed text to your model via the action button.
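A GPT-style model typically continues a prompt one token at a time, feeding each new token back in. The repository's `inference.py` is not reproduced here and may sample tokens rather than always taking the top one. What follows is a generic greedy-decoding sketch in plain Python, where `next_token_scores` is a hypothetical stand-in for the trained model that returns one score per vocabulary entry, and `toy_model` is an invented 4-token example.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Repeatedly append the highest-scoring token to the growing sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break  # stop once the end-of-sequence token is produced
    return ids


def toy_model(ids: List[int]) -> List[float]:
    """Hypothetical 4-token 'model' that always prefers (last token + 1) mod 4."""
    vocab_size = 4
    preferred = (ids[-1] + 1) % vocab_size
    return [1.0 if i == preferred else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_model, [0], max_new_tokens=3))  # [0, 1, 2, 3]
print(greedy_generate(toy_model, [2], max_new_tokens=5, eos_id=0))  # [2, 3, 0]
```

Greedy decoding is deterministic and easy to debug; real chat-style inference usually swaps the argmax for temperature or top-k sampling to get more varied text.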
For more information, see LLM basics from scratch.