gpu-bpe

GPU-accelerated Byte Pair Encoding that runs entirely in the browser. Training and inference happen on WebGPU compute shaders — no server, no Python, no CUDA.

Demo: decoder.run/bpe

What it does

Drop a text file (or an entire folder) into the browser and train a BPE tokenizer on the GPU. The trained model can then encode arbitrary text in real time, also on the GPU.

The full pipeline:

Pre-tokenize — WASM-based Unicode 17.0 word boundary detection (GPT-4 style rules: contractions, space-prefix model, digit grouping)
Train — Batched merge loop on WebGPU compute shaders (128 merges per GPU roundtrip)
Compile — Flatten the merge table into a binary trie (8-byte nodes, 4-byte edges)
Tokenize — Chunked trie walk on the GPU with shared-memory root edge caching

Architecture

index.html ─── app.js
                 ├── bpe/engine.js          WebGPU device + pipeline compiler
                 ├── bpe/trainer.js         Training loop (GPU)
                 ├── bpe/tokenizer.js       Trie tokenizer (GPU)
                 ├── bpe/bpe.wgsl           All 25 compute kernels
                 ├── wasm/decoder.mjs       Unicode 17.0 API (Decoder WASM)
                 ├── wasm/pre_tokenizer.mjs Word boundary detection
                 └── ui/                    File handling, training UI, encoder

GPU Kernels (bpe.wgsl)

25 compute kernels in a single WGSL file, split at load time into per-kernel shader modules:

Stage	Kernels	Notes
Word boundaries	`bpe_word_boundary`	Byte-level heuristic fallback (normally bypassed by WASM pre-tokenizer)
Pair counting	`bpe_pair_count`, `bpe_pair_count_b`	Two-level open-addressing hash table (prime size 2,097,143)
Max-pair reduction	`bpe_find_max_pair`, `bpe_find_max_pair_final`	Two-pass parallel reduction
Merge	`bpe_merge`, `bpe_merge_b`, `bpe_setup_merge`	In-place symbol rewriting with word boundary preservation
Stream compaction	`bpe_prefix_sum_`, `bpe_compact_`, `bpe_fill_valid_*`	Blelloch prefix sum for gap removal
Batch control	`bpe_update_count`, `check_early_stop`, `vocab_merge`	GPU-driven iteration state
Tokenization	`trie_tokenizer_chunked`, `trie_tokenizer_compact`	Shared-memory trie walk

Pre-tokenization (WASM)

The WASM layer (Decoder) provides full Unicode 17.0 property tables. The PreTokenizer classifies codepoints into character classes (letter, digit, whitespace, punctuation, symbol, newline) and applies GPT-4 style boundary rules at the codepoint level — before the byte stream reaches the GPU. This solves the multi-byte punctuation merging problem where byte-level heuristics cannot distinguish continuation bytes of letters from continuation bytes of punctuation.

Usage

Development

Serve the project root with any static file server:

# Python
python3 -m http.server 8080

# Node
npx -y serve .

Open http://localhost:8080 in a WebGPU-capable browser (Chrome 113+, Edge 113+, or Firefox Nightly with dom.webgpu.enabled).

Training

Drop text files or select a folder
Pick a vocabulary size (1K to 64K)
Click Train

The log panel shows real-time progress: merge count, merges/sec, and ETA.

Encoding

After training, switch to the Tokenizer tab. Type or paste text to see the token breakdown with IDs.

Importing a vocabulary

You can also load a pre-trained vocabulary (JSON merge list) via the "Load Vocab" button without retraining.

No dependencies. No build step required.

Browser Requirements

WebGPU support (Chrome 113+, Edge 113+)
maxStorageBufferBindingSize >= 512 MB
maxBufferSize >= 512 MB

License

MIT

Author

Uğur Toprakdeviren

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
fonts		fonts
src		src
.gitignore		.gitignore
README.md		README.md
index.html		index.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

gpu-bpe

What it does

Architecture

GPU Kernels (bpe.wgsl)

Pre-tokenization (WASM)

Usage

Development

Training

Encoding

Importing a vocabulary

Browser Requirements

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

gpu-bpe

What it does

Architecture

GPU Kernels (bpe.wgsl)

Pre-tokenization (WASM)

Usage

Development

Training

Encoding

Importing a vocabulary

Browser Requirements

License

Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages