Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization. Compatible with SentencePiece, Tokenizers, Tiktoken and more.
Updated Aug 4, 2024 · Rust
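The listing names Unigram tokenization without explaining it. In a unigram language-model tokenizer, such as the Unigram model type in SentencePiece, each vocabulary piece carries a probability. Text is split into the sequence of pieces with the highest total log-probability, which can be found with Viterbi dynamic programming. The Rust sketch below illustrates only that decoding step. It is not this repository's API: the function name `viterbi_segment` and the toy log-probability table are hypothetical, and the sketch omits vocabulary training, normalization and byte fallback.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split `text` into the piece sequence with the highest
/// total log-probability under a unigram model (Viterbi decoding).
/// `vocab` maps each piece to its log-probability (an assumed toy model).
/// Returns `None` if the text cannot be covered by vocabulary pieces.
fn viterbi_segment<'a>(text: &'a str, vocab: &HashMap<&str, f64>) -> Option<Vec<&'a str>> {
    // Byte offsets of every char boundary, so multi-byte UTF-8 is sliced safely.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n = bounds.len() - 1; // number of characters

    // best[j]: best score for segmenting the first j characters.
    let mut best = vec![f64::NEG_INFINITY; n + 1];
    // back[j]: start (in chars) of the last piece in that best segmentation.
    let mut back = vec![0usize; n + 1];
    best[0] = 0.0;

    for end in 1..=n {
        for start in 0..end {
            if best[start] == f64::NEG_INFINITY {
                continue; // prefix not reachable with known pieces
            }
            let piece = &text[bounds[start]..bounds[end]];
            if let Some(&logp) = vocab.get(piece) {
                let score = best[start] + logp;
                if score > best[end] {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
    }

    if best[n] == f64::NEG_INFINITY {
        return None; // no full segmentation exists
    }

    // Follow back-pointers from the end to recover the chosen pieces.
    let mut pieces = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = back[end];
        pieces.push(&text[bounds[start]..bounds[end]]);
        end = start;
    }
    pieces.reverse();
    Some(pieces)
}
```

For example, with a vocabulary containing only `"uni"` and `"gram"` (log-probability -3.0 each) and `"unigram"` (-7.0), `viterbi_segment("unigram", &vocab)` returns `Some(vec!["uni", "gram"])`, because -3.0 + -3.0 = -6.0 beats -7.0. Working in log space turns the product of piece probabilities into a sum and avoids floating-point underflow.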