Train a BPE tokenizer and a small transformer LLM — entirely in the browser, on WebGPU. Forward pass, backward pass, and AdamW all run on your GPU. No server, no cloud, no Python — just hand-written WGSL and a browser tab.
This is the engine behind llm.istanbul, extracted as a standalone, dependency-free, browser-first package.
Most ML stacks hide the GPU behind layers of framework. This one doesn't: every kernel — embedding, RMSNorm, tiled matmul, RoPE, attention (online softmax + GQA + KV cache), SwiGLU, cross-entropy, the full backward pass, and AdamW — is a small, readable WGSL compute shader. The whole training loop fits in your head.
Every kernel is documented line by line at llm.istanbul/learn/en.
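For readers new to the online-softmax trick named in the kernel list, a CPU reference can make the WGSL easier to follow. The sketch below is plain JavaScript, not the package's shader, and the function names are made up for illustration. It assumes a non-empty array of finite scores and computes a softmax-weighted sum in a single pass by keeping a running maximum and rescaling the partial sums whenever that maximum grows.

```javascript
// CPU reference for online softmax (illustrative; not the package's WGSL kernel).
// Computes sum_i softmax(scores)_i * values[i] in one pass over the scores.
// Assumes scores is non-empty and every score is finite.
function onlineSoftmaxWeightedSum(scores, values) {
  let runningMax = -Infinity; // largest score seen so far
  let denom = 0;              // sum of exp(score - runningMax)
  let acc = 0;                // sum of exp(score - runningMax) * value
  for (let i = 0; i < scores.length; i++) {
    const newMax = Math.max(runningMax, scores[i]);
    // Rescale the previous partial sums to the new maximum before adding the new term.
    const scale = Math.exp(runningMax - newMax);
    const w = Math.exp(scores[i] - newMax);
    denom = denom * scale + w;
    acc = acc * scale + w * values[i];
    runningMax = newMax;
  }
  return acc / denom;
}

// Conventional two-pass version, kept only to compare results.
function twoPassSoftmaxWeightedSum(scores, values) {
  const m = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.reduce((a, e, i) => a + (e / z) * values[i], 0);
}
```

Because every exponent is taken relative to the running maximum, large scores such as 1000 do not overflow, and the single pass never needs the whole score row at once.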
- A browser with WebGPU (Chrome / Edge 113+, recent Firefox). `navigator.gpu` is required.
- ESM only. Zero runtime dependencies. Shaders are inlined at build time with no runtime fetch, so it bundles cleanly (Vite / esbuild / webpack / Rollup).
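Since `navigator.gpu` is a hard requirement, it can be worth checking for it before initializing an engine. The helper below is a minimal sketch that uses only the standard WebGPU entry point `navigator.gpu.requestAdapter()`; the name `detectWebGPU` and the shape of its return value are assumptions for illustration and are not part of this package.

```javascript
// Hypothetical helper (not part of webgpu-llm): reports whether WebGPU is usable.
// `nav` defaults to the global navigator and can be replaced with a stub.
async function detectWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: 'navigator.gpu is not available' };
  }
  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: 'no suitable GPU adapter found' };
  }
  return { ok: true, reason: '' };
}
```

A page could call `detectWebGPU()` once at startup and show a fallback message when `ok` is false, instead of letting engine initialization fail.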
```sh
npm install webgpu-llm
```

```js
import { BPEEngine, BPETrainer, TrieTokenizer } from 'webgpu-llm/bpe';

const engine = await new BPEEngine().init(); // requires navigator.gpu
const trainer = new BPETrainer(engine);

const text = '… your corpus …';
await trainer.train(new TextEncoder().encode(text), { targetVocabSize: 8192 });
const vocabJson = trainer.exportVocab(); // save this: it pairs with the model

// tokenize with the trained vocab
const tok = TrieTokenizer.fromVocab(engine, trainer.vocab);
const ids = await tok.encodeBytes(new TextEncoder().encode('merhaba dünya'));
const back = tok.decode(ids); // Uint8Array; pass it to TextDecoder to read
tok.destroy();
```

```js
import { LLMEngine, Model } from 'webgpu-llm/llm';

const engine = await new LLMEngine().init();
const model = new Model(engine, {
  d_model: 512, n_heads: 8, n_kv_heads: 2, n_layers: 8,
  d_ff_mult: 4, activation: 'swiglu', vocab_size: 8192, seq_len: 512,
});
model.alloc();

// forward → backward → step (AdamW), looped over your tokenized corpus.
// The full loop, every parameter, and a verified recipe are walked through here:
// https://llm.istanbul/learn/en/guide
```

The training loop (sampling windows, gradient accumulation, LR schedule, validation, checkpointing) is involved on purpose. See the How to Use guide linked above for an end-to-end, copy-pasteable recipe.
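Two of the pieces named above, the LR schedule and window sampling, are generic enough to sketch without touching the package. The helpers below are plain JavaScript with hypothetical names (`lrAt`, `sampleWindow`) and option names. Linear warmup followed by cosine decay is a common choice, not necessarily the one the guide uses, and the guide's recipe remains the authoritative one.

```javascript
// Linear warmup to peakLr, then cosine decay to minLr (a common schedule; an assumption here).
function lrAt(step, { peakLr, minLr, warmupSteps, totalSteps }) {
  if (step < warmupSteps) return (peakLr * (step + 1)) / warmupSteps;
  const span = Math.max(1, totalSteps - warmupSteps);
  const progress = Math.min(1, (step - warmupSteps) / span);
  return minLr + 0.5 * (peakLr - minLr) * (1 + Math.cos(Math.PI * progress));
}

// Pick a random window of seqLen + 1 token ids from a typed array (e.g. Uint32Array).
// Inputs are the first seqLen ids; targets are the same window shifted by one,
// which is the usual next-token-prediction setup.
function sampleWindow(tokens, seqLen, rand = Math.random) {
  if (tokens.length < seqLen + 1) {
    throw new RangeError('corpus is shorter than seqLen + 1 tokens');
  }
  const start = Math.floor(rand() * (tokens.length - seqLen));
  return {
    inputs: tokens.subarray(start, start + seqLen),
    targets: tokens.subarray(start + 1, start + seqLen + 1),
  };
}
```

Gradient accumulation then amounts to running forward and backward on several sampled windows before a single optimizer step. The package's method names for those calls are covered in the guide and are deliberately not guessed here.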
For GPT-4-style, Turkish-aware pre-tokenization (better than byte-level), pass a pre-tokenizer into `train()`:

```js
import { PreTokenizer } from 'webgpu-llm/pretokenizer'; // ships a WASM Unicode core
```

| Import | What |
|---|---|
| `webgpu-llm/bpe` | `BPEEngine`, `BPETrainer`, `TrieTokenizer`, `Vocab`, trie helpers |
| `webgpu-llm/llm` | `LLMEngine`, `Model`, dispatch helpers |
| `webgpu-llm/pretokenizer` | WASM Unicode pre-tokenizer (optional) |
| `webgpu-llm/worker` | Web Worker entry for off-main-thread BPE training (browser only) |
| `webgpu-llm` | everything, flat (subpaths preferred for tree-shaking) |
Full, line-by-line walkthrough of every kernel and the end-to-end usage guide:
llm.istanbul/learn/en (also in Turkish: /learn/tr).
MIT © Uğur Toprakdeviren