Tokenizers for Language Models - Go API for HuggingFace Tokenizers

gomlx/tokenizers

Tokenizers for Go

Under Construction


This library is not functional yet. For Gemma/Gemini/T5 and other Google models, see https://github.com/eliben/go-sentencepiece/.

Highlights

Important

TODO: nothing implemented yet.

  • Allow customization for various LLMs, exposing most of the functionality of the HuggingFace Tokenizers library.
  • Provide a from_pretrained API that downloads the parameters of various known models, leveraging the HuggingFace Hub.

Installation

This library is a wrapper around HuggingFace's Rust implementation, and it requires the compiled Rust code to be available as a static library, libgomlx_tokenizers.a.

To make that easy, the project provides a prebuilt libgomlx_tokenizers.a in the git repository for the popular platforms. For many users nothing else is needed and the library can be imported like any other Go library. The only requirement is that CGO is enabled; when cross-compiling, set CGO_ENABLED=1.

If you want to build the underlying Rust wrapper and its dependencies yourself for any reason (for example, to add support for a different platform), the project uses the Mage build system, a Makefile-like build tool whose rules are written in Go.

If you create a new rule for a different platform, please consider contributing it back 😄

Important

TODO

Thank You

Questions

Why fork and not collaborate with an already existing tokenizers project?

I plan to revamp how the library is organized, align its "ergonomics" more closely with the GoMLX APIs, and add documentation. I will also expand the functionality to match HuggingFace's library as closely as I'm able to. All this will completely break the API of the original repositories, which I felt was too much to ask of the original authors.
