Fast, Java-native BPE tokenizer for LLMs. Optimized for Hinglish code-mixed text and source code (camelCase/snake_case). A native JVM alternative to Tiktoken.
java nlp tokenizer java-library tokenization bpe byte-pair-encoding hinglish llm-tools bpe-tokenizer generic-tokenizer
Updated May 21, 2026 - Java