toprakdeviren/webgpu-llm

webgpu-llm

Train a BPE tokenizer and a small transformer LLM — entirely in the browser, on WebGPU. Forward pass, backward pass, and AdamW all run on your GPU. No server, no cloud, no Python — just hand-written WGSL and a browser tab.

This is the engine behind llm.istanbul, extracted as a standalone, dependency-free, browser-first package.


Why

Most ML stacks hide the GPU behind layers of framework. This one doesn't: every kernel — embedding, RMSNorm, tiled matmul, RoPE, attention (online softmax + GQA + KV cache), SwiGLU, cross-entropy, the full backward pass, and AdamW — is a small, readable WGSL compute shader. The whole training loop fits in your head.
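
To make one item in that list concrete: online softmax computes the normalizer in a single pass by keeping a running maximum and rescaling the running sum whenever the maximum grows. The block below is a plain JavaScript CPU sketch of the general technique, not the repository's WGSL kernel; the function name is hypothetical and finite inputs are assumed.

```javascript
// CPU reference for online (single-pass) softmax.
// Hypothetical helper for illustration; not part of webgpu-llm's API.
// Assumes a non-empty array of finite numbers.
function onlineSoftmax(xs) {
  let runningMax = -Infinity;
  let runningSum = 0;
  for (const x of xs) {
    const newMax = Math.max(runningMax, x);
    // Rescale the sum accumulated so far to the new maximum, then add x.
    runningSum = runningSum * Math.exp(runningMax - newMax) + Math.exp(x - newMax);
    runningMax = newMax;
  }
  // Second pass only normalizes; max and sum were found in one sweep.
  return xs.map((x) => Math.exp(x - runningMax) / runningSum);
}
```

Because every exponent is taken relative to the running maximum, large logits such as `[1000, 1000]` do not overflow.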

Every kernel is documented line by line at llm.istanbul/learn/en.

Requirements

  • A browser with WebGPU (Chrome / Edge 113+, recent Firefox). navigator.gpu is required.
  • ESM only. Zero runtime dependencies. Shaders are inlined at build time — no runtime fetch, so it bundles cleanly (Vite / esbuild / webpack / Rollup).
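
Since `navigator.gpu` is required, it can help to check for it before constructing an engine. The sketch below uses only the standard WebGPU entry points (`navigator.gpu` and `requestAdapter()`); the helper name `hasWebGPU` and its `nav` parameter are hypothetical and not part of this package.

```javascript
// Returns true when a WebGPU adapter can be obtained.
// `nav` defaults to the global navigator; passing it in is only for testability.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false; // API not exposed by this browser
  try {
    const adapter = await nav.gpu.requestAdapter();
    // requestAdapter() resolves to null when no suitable GPU is available.
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

In a page, `if (!(await hasWebGPU())) { /* show a fallback message */ }` would run before any engine is created.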

Install

npm install webgpu-llm

Tokenizer — train + tokenize

import { BPEEngine, BPETrainer, TrieTokenizer } from 'webgpu-llm/bpe';

const engine = await new BPEEngine().init();         // requires navigator.gpu
const trainer = new BPETrainer(engine);

const text = '… your corpus …';
await trainer.train(new TextEncoder().encode(text), { targetVocabSize: 8192 });

const vocabJson = trainer.exportVocab();             // save this — it pairs with the model

// tokenize with the trained vocab
const tok = TrieTokenizer.fromVocab(engine, trainer.vocab);
const ids = await tok.encodeBytes(new TextEncoder().encode('merhaba dünya'));
const back = tok.decode(ids);                        // Uint8Array; read it with new TextDecoder().decode(back)
tok.destroy();
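
For readers new to BPE, training repeatedly finds the most frequent adjacent token pair and replaces it with a new token id. The following is a CPU-only sketch of a single merge step in plain JavaScript. Both helpers are hypothetical; the package runs training on the GPU, and its counting and tie-breaking rules may differ from this illustration.

```javascript
// One BPE merge step on the CPU, for illustration only.
// Hypothetical helpers; webgpu-llm's GPU trainer may count and tie-break differently.
function mostFrequentPair(ids) {
  const counts = new Map();
  let best = null;
  let bestCount = 0;
  for (let i = 0; i + 1 < ids.length; i++) {
    const key = ids[i] + ',' + ids[i + 1];
    const c = (counts.get(key) || 0) + 1;
    counts.set(key, c);
    // Ties go to the pair that reached the count first.
    if (c > bestCount) { bestCount = c; best = [ids[i], ids[i + 1]]; }
  }
  return best; // null when there are fewer than two tokens
}

function mergePair(ids, [a, b], newId) {
  const out = [];
  for (let i = 0; i < ids.length; i++) {
    if (i + 1 < ids.length && ids[i] === a && ids[i + 1] === b) {
      out.push(newId);
      i++; // skip the second half of the merged pair
    } else {
      out.push(ids[i]);
    }
  }
  return out;
}
```

Starting from raw bytes (ids 0 to 255), the first new token would get id 256, and the loop would repeat until the target vocabulary size is reached.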

Model — train a transformer

import { LLMEngine, Model } from 'webgpu-llm/llm';

const engine = await new LLMEngine().init();

const model = new Model(engine, {
  d_model: 512, n_heads: 8, n_kv_heads: 2, n_layers: 8,
  d_ff_mult: 4, activation: 'swiglu', vocab_size: 8192, seq_len: 512,
});
model.alloc();

// forward → backward → step (AdamW), looped over your tokenized corpus.
// The full loop, every parameter, and a verified recipe are walked through here:
//   https://llm.istanbul/learn/en/guide
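
The config above uses grouped-query attention: 8 query heads share 2 key/value heads. The sketch below derives the resulting shapes. The helper is hypothetical, and the contiguous query-to-KV head mapping is the common convention, assumed here rather than confirmed for this package.

```javascript
// Derived attention shapes for a grouped-query attention (GQA) config.
// Hypothetical helper; field names mirror the config above, the function is not part of the package.
function attentionShapes({ d_model, n_heads, n_kv_heads, n_layers, seq_len }) {
  if (d_model % n_heads !== 0) throw new Error('d_model must be divisible by n_heads');
  if (n_heads % n_kv_heads !== 0) throw new Error('n_heads must be divisible by n_kv_heads');
  const headDim = d_model / n_heads;
  const queriesPerKv = n_heads / n_kv_heads;
  // K and V each store n_kv_heads * headDim values per token per layer.
  const kvCacheValues = 2 * n_layers * seq_len * n_kv_heads * headDim;
  return {
    headDim,
    queriesPerKv,
    kvCacheValues,
    // Assumed mapping: consecutive query heads share one KV head.
    kvHeadFor: (queryHead) => Math.floor(queryHead / queriesPerKv),
  };
}
```

For the values shown (`d_model: 512, n_heads: 8, n_kv_heads: 2, n_layers: 8, seq_len: 512`) this gives a head dimension of 64, four query heads per KV head, and 1,048,576 cached values for a full context.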

The training loop (sampling windows, gradient accumulation, LR schedule, validation, checkpointing) is involved on purpose. For an end-to-end, copy-pasteable recipe, see the How to Use guide at https://llm.istanbul/learn/en/guide.
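
As one example of those pieces, an LR schedule is often a linear warmup followed by cosine decay. The sketch below shows that common shape in plain JavaScript. The function and its option names are hypothetical, and the schedule used in the guide may differ.

```javascript
// Linear warmup followed by cosine decay. All option names are hypothetical.
function lrAt(step, { peakLr, minLr, warmupSteps, totalSteps }) {
  if (step < warmupSteps) {
    return peakLr * (step + 1) / warmupSteps; // ramp up to peakLr
  }
  const decaySteps = Math.max(1, totalSteps - warmupSteps);
  const progress = Math.min(1, (step - warmupSteps) / decaySteps);
  const cosine = 0.5 * (1 + Math.cos(Math.PI * progress)); // goes from 1 to 0
  return minLr + (peakLr - minLr) * cosine;
}
```

The value returned for the current step would be handed to the optimizer before each AdamW update.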

Optional: Unicode-aware pre-tokenizer

For GPT-4-style, Turkish-aware pre-tokenization (better than byte-level), pass a pre-tokenizer into train(). It is imported from its own subpath:

import { PreTokenizer } from 'webgpu-llm/pretokenizer';   // ships a WASM Unicode core
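
Pre-tokenization splits text into chunks before BPE so that merges never cross chunk boundaries. The sketch below is a simplified Unicode-aware splitter built on JavaScript's `\p{…}` regex classes. It only illustrates the idea: it is not the pattern used by the package's WASM core, and `preTokenize` is a hypothetical name.

```javascript
// Simplified Unicode-aware splitter, loosely modelled on GPT-style pre-tokenization.
// Illustration only: this is NOT the regex used by webgpu-llm's WASM pre-tokenizer.
// Alternatives: optional space + letters | up to 3 digits | optional space + symbols | whitespace.
const PRETOKEN_RE = / ?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+|\s+/gu;

function preTokenize(text) {
  return text.match(PRETOKEN_RE) || [];
}

// preTokenize('merhaba dünya 2024')
// returns ['merhaba', ' dünya', ' ', '202', '4']
```

Every character falls into exactly one alternative, so joining the chunks reproduces the input; letters such as `ü` and `İ` stay inside their word because `\p{L}` matches any Unicode letter.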

Subpath exports

| Import | What |
| --- | --- |
| webgpu-llm/bpe | BPEEngine, BPETrainer, TrieTokenizer, Vocab, trie helpers |
| webgpu-llm/llm | LLMEngine, Model, dispatch helpers |
| webgpu-llm/pretokenizer | WASM Unicode pre-tokenizer (optional) |
| webgpu-llm/worker | Web Worker entry for off-main-thread BPE training (browser only) |
| webgpu-llm | everything, flat (subpaths preferred for tree-shaking) |

Docs

Full, line-by-line walkthrough of every kernel and the end-to-end usage guide: llm.istanbul/learn/en (also in Turkish: /learn/tr).

License

MIT © Uğur Toprakdeviren
