A simple GPT-based LLM implemented mainly in Rust. CUDA work goes through the cust, cublas, and cuda_std crates, while bespoke CUDA C++ kernels handle the heavy lifting.
This was done mostly for learning purposes, to explore LLMs. The resulting model is usable, but it wasn't benchmarked and isn't anything special. You can check out the model weights here.
The project targets the open-phi/textbooks corpus.
The repository now ships with a lightweight downloader that speaks directly to the Hugging Face datasets-server API and emits newline-delimited UTF-8:
```bash
cargo run --release -- download ./textbooks.txt
```
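For reference, here is a rough sketch of what such a fetch loop can look like, assuming the datasets-server `/rows` endpoint and the `reqwest` (with the `blocking` and `json` features) and `serde_json` crates. The config/split names and the `markdown` column are guesses at the dataset's schema; this is not the project's actual code:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

/// Illustrative fetch loop: pages through the datasets-server `/rows`
/// endpoint 100 rows at a time and writes one document per line.
fn download(dataset: &str, out_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let mut out = BufWriter::new(File::create(out_path)?);
    let mut offset = 0usize;
    loop {
        let url = format!(
            "https://datasets-server.huggingface.co/rows?dataset={dataset}&config=default&split=train&offset={offset}&length=100"
        );
        let page: serde_json::Value = reqwest::blocking::get(url)?.json()?;
        let rows = page["rows"].as_array().cloned().unwrap_or_default();
        if rows.is_empty() {
            // Stop once a page comes back empty.
            break;
        }
        for row in &rows {
            // Assumed column name; flatten newlines so each document stays on one line.
            if let Some(text) = row["row"]["markdown"].as_str() {
                writeln!(out, "{}", text.replace('\n', " "))?;
            }
        }
        offset += rows.len();
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    download("open-phi/textbooks", "./textbooks.txt")
}
```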
The binary performs momentum SGD across the entire transformer stack (token and position embeddings, every transformer block, the final layer norm, and the language-model head) using analytic CUDA-backed gradients. Provide the corpus path and optionally the number of epochs, and pass `--save <dir>` to export the trained weights:

```bash
cargo run --release -- ./textbooks.txt 3 --save ./checkpoints/baseline
```
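Conceptually, each parameter tensor follows the standard momentum SGD rule: `velocity = momentum * velocity + grad; param -= lr * velocity`. Below is a plain-Rust, CPU-side sketch of that update; the project applies the equivalent step in CUDA kernels, and the names and hyperparameter values here are illustrative:

```rust
/// One momentum SGD step over a flat parameter slice.
/// velocity = momentum * velocity + grad;  param -= lr * velocity
fn momentum_sgd_step(params: &mut [f32], grads: &[f32], velocity: &mut [f32], lr: f32, momentum: f32) {
    for ((p, &g), v) in params.iter_mut().zip(grads).zip(velocity.iter_mut()) {
        *v = momentum * *v + g;
        *p -= lr * *v;
    }
}

fn main() {
    // Illustrative values only.
    let mut weights = vec![0.5f32; 8];
    let grads = vec![0.1f32; 8];
    let mut velocity = vec![0.0f32; 8];
    for _ in 0..3 {
        momentum_sgd_step(&mut weights, &grads, &mut velocity, 0.01, 0.9);
    }
    println!("{:?}", weights);
}
```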
When `--save` is set, the binary ensures the directory exists and writes two files compatible with Hugging Face uploads:

- `config.json` – the serialized `GPTConfig`
- `model.safetensors` – parameter tensors in the safetensors format
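The weights file presumably goes through the safetensors crate; as an illustration of what `model.safetensors` contains, here is a minimal hand-rolled writer for the format (an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, then the raw tensor bytes) plus a `config.json` next to it. All tensor names, shapes, and config fields below are placeholders, not the project's actual ones:

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io::Write;

/// Hand-rolled writer for the safetensors layout:
/// [u64 little-endian header length][JSON header][raw tensor bytes].
fn save_safetensors(dir: &str, tensors: &BTreeMap<String, (Vec<usize>, Vec<f32>)>) -> std::io::Result<()> {
    fs::create_dir_all(dir)?; // ensure the target directory exists

    // Build the JSON header and the concatenated data buffer.
    let mut header = serde_json::Map::new();
    let mut data: Vec<u8> = Vec::new();
    for (name, (shape, values)) in tensors {
        let begin = data.len();
        for v in values {
            data.extend_from_slice(&v.to_le_bytes());
        }
        header.insert(
            name.clone(),
            serde_json::json!({
                "dtype": "F32",
                "shape": shape,
                "data_offsets": [begin, data.len()],
            }),
        );
    }
    let mut header_bytes = serde_json::Value::Object(header).to_string().into_bytes();
    // Pad the header with spaces to an 8-byte boundary, as the reference writer does.
    while header_bytes.len() % 8 != 0 {
        header_bytes.push(b' ');
    }

    let mut file = fs::File::create(format!("{dir}/model.safetensors"))?;
    file.write_all(&(header_bytes.len() as u64).to_le_bytes())?;
    file.write_all(&header_bytes)?;
    file.write_all(&data)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = "./checkpoints/baseline";

    // Placeholder tensor standing in for the real model parameters.
    let mut tensors = BTreeMap::new();
    tensors.insert("wte.weight".to_string(), (vec![4, 2], vec![0.0f32; 8]));
    save_safetensors(dir, &tensors)?;

    // config.json alongside the weights (field names are guesses).
    let config = serde_json::json!({ "vocab_size": 50257, "n_layer": 12, "n_head": 12, "n_embd": 768 });
    fs::write(format!("{dir}/config.json"), serde_json::to_string_pretty(&config).unwrap())
}
```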