Release v0.12.0 · zurawiki/tiktoken-rs

Summary

This release backports OpenAI tiktoken 0.13.0 into tiktoken-rs. The main reason to upgrade is better alignment with upstream tokenization behavior, especially the upstream Rust core changes for large BPE pieces and error-aware encoding.

For most users who call only the high-level model and token-counting helpers, behavior should be unchanged apart from the new minimum Rust version (1.85). Users who call lower-level CoreBPE encoding methods directly should review the breaking changes below.

What Changed

Backported the vendored OpenAI tiktoken Rust core from 0.9.0 to 0.13.0.
Added the upstream large-piece BPE merge path. Functionally, this improves behavior for very large or repetitive inputs that previously stressed the merge algorithm.
Changed CoreBPE::encode to return Result<(Vec<Rank>, usize), EncodeError>, matching upstream. Regex/tokenization failures can now be reported instead of being hidden behind infallible APIs.
Updated encode_as and count to return Result because they call encode.
Re-exported EncodeError so callers can handle encode failures directly.
Aligned the vendored core with Rust 2024 and raised the crate MSRV to Rust 1.85.
Synced model-to-tokenizer mappings with upstream tiktoken 0.13.0 while keeping local extra prefixes isolated.
Hardened asset downloads with SHA-256 checks and a repo-root-aware asset path.
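
To make the large-piece item above more concrete, here is a toy sketch of a straightforward byte-pair merge loop. It is not the upstream implementation, and these notes do not describe how the new large-piece path works internally. The function name `naive_bpe_merge` and the tiny rank table are assumptions made for illustration. The point is only that a loop which rescans every adjacent pair after each merge does work that grows roughly quadratically with the length of a piece, which is the kind of input that can stress a merge algorithm.

```rust
use std::collections::HashMap;

type Rank = u32;

/// Toy byte-pair merge over a single piece (illustrative only, not the
/// vendored core). Repeatedly finds the adjacent pair with the lowest rank
/// and merges it. Every iteration rescans all pairs, so long pieces are slow.
fn naive_bpe_merge(piece: &[u8], ranks: &HashMap<Vec<u8>, Rank>) -> Vec<Vec<u8>> {
    // Start with one part per byte.
    let mut parts: Vec<Vec<u8>> = piece.iter().map(|b| vec![*b]).collect();
    loop {
        // Scan all adjacent pairs and remember the lowest-ranked one.
        let mut best: Option<(usize, Rank)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut pair = parts[i].clone();
            pair.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&pair) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            // Merge the winning pair into one part and rescan.
            Some((i, _)) => {
                let right = parts.remove(i + 1);
                parts[i].extend_from_slice(&right);
            }
            // No mergeable pair is left, so the piece is fully merged.
            None => return parts,
        }
    }
}
```

With a hypothetical rank table containing only `ab` (rank 0) and `abab` (rank 1), the piece `abab` first merges into two `ab` parts and then into a single `abab` part.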

Breaking Changes

If your code calls CoreBPE::encode, unwrap or propagate the result before using the tokens:

let allowed = bpe.special_tokens();
let (tokens, last_piece_token_len) = bpe.encode("hello <|endoftext|>", &allowed)?;

The generic helpers changed similarly:

let (tokens, last_piece_token_len) = bpe.encode_as::<usize>(text, &allowed)?;
let token_count = bpe.count(text, &allowed)?;

encode_ordinary, encode_ordinary_as, encode_with_special_tokens, and count_ordinary remain infallible.
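
The migration above comes down to two patterns: propagate the error with `?`, or handle it at the call site. The sketch below shows both without depending on the crate. `StubBpe`, its toy tokenization, and the local `EncodeError` struct are hypothetical stand-ins that only mirror the shape of the new `Result<(Vec<Rank>, usize), EncodeError>` return type; the real `CoreBPE` and `EncodeError` come from tiktoken-rs and may differ in detail.

```rust
use std::collections::HashSet;
use std::fmt;

type Rank = u32;

// Hypothetical stand-in for the re-exported error type.
#[derive(Debug)]
struct EncodeError(String);

impl fmt::Display for EncodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "encode failed: {}", self.0)
    }
}

impl std::error::Error for EncodeError {}

// Hypothetical stand-in for CoreBPE. It is not a real tokenizer.
struct StubBpe;

impl StubBpe {
    // Same return shape as the new encode signature. The "tokens" are just
    // the byte lengths of whitespace-separated words, and inputs containing
    // a NUL character are rejected so that the error path can be exercised.
    fn encode(
        &self,
        text: &str,
        _allowed_special: &HashSet<&str>,
    ) -> Result<(Vec<Rank>, usize), EncodeError> {
        if text.contains('\0') {
            return Err(EncodeError("NUL character in input".to_string()));
        }
        let tokens: Vec<Rank> = text.split_whitespace().map(|w| w.len() as Rank).collect();
        // Toy value: 1 if any token was produced, otherwise 0.
        let last_piece_token_len = usize::from(!tokens.is_empty());
        Ok((tokens, last_piece_token_len))
    }
}

// Pattern 1: propagate the failure to the caller with `?`.
fn count_tokens(bpe: &StubBpe, text: &str) -> Result<usize, EncodeError> {
    let allowed: HashSet<&str> = HashSet::new();
    let (tokens, _last_piece_token_len) = bpe.encode(text, &allowed)?;
    Ok(tokens.len())
}

// Pattern 2: handle the failure locally and fall back to a default.
fn count_tokens_or_zero(bpe: &StubBpe, text: &str) -> usize {
    match count_tokens(bpe, text) {
        Ok(n) => n,
        Err(err) => {
            eprintln!("tokenization failed: {err}");
            0
        }
    }
}
```

With these stand-ins, `count_tokens(&StubBpe, "hello world")` returns `Ok(2)`, while an input containing a NUL character returns an error from `count_tokens` and `0` from `count_tokens_or_zero`. Which pattern fits depends on whether a tokenization failure should abort the surrounding operation or be logged and skipped.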

Projects must now build with Rust 1.85 or newer.

Practical Impact

Applications that tokenize very large or highly repetitive text should see more robust behavior because of the new large-piece BPE merge path.
Code that only uses helpers like get_chat_completion_max_tokens, get_text_completion_max_tokens, bpe_for_model, or singleton tokenizer constructors should not need call-site changes.
Code using low-level CoreBPE::encode, encode_as, or count needs a small migration to handle Result.
