jolby/cl-tokenizers

cl-tokenizers

Common Lisp tokenizers for text encoding, embeddings, and LLMs.

Status

This project is at an early stage. Currently only the OpenAI Tiktoken BPE encoding is supported. Contributions of additional tokenizers are welcome!

Installation

cl-tokenizers is not in Quicklisp. Clone it into Quicklisp's local-projects directory so that Quicklisp can find it:

> cd $HOME/quicklisp/local-projects/
> git clone https://github.com/jolby/cl-tokenizers.git

Basic Usage at the REPL

CL-USER> (ql:quickload :tokenizers)
> (:TOKENIZERS)
CL-USER> (defparameter *cl100k-encoder* (tokenizers:get-encoder :tiktoken "cl100k_base"))
> *CL100K-ENCODER*
CL-USER> (tokenizers:encode *cl100k-encoder* "hello world")
> #(15339 1917)
CL-USER> (tokenizers:decode *cl100k-encoder* #(15339 1917))
> "hello world"
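
The session above can be wrapped in a couple of small helpers. The following is a minimal sketch, not part of the library: `count-tokens` and `round-trips-p` are hypothetical names introduced here for illustration. It uses only `tokenizers:encode` and `tokenizers:decode` as shown above, and assumes that `encode` returns a sequence (the session shows a vector) and that `decode` accepts whatever `encode` returns.

```lisp
;; Assumes the system is already loaded, as in the session above:
;;   (ql:quickload :tokenizers)

;; Hypothetical helper, not provided by cl-tokenizers.
(defun count-tokens (encoder text)
  "Return the number of tokens ENCODER produces for TEXT."
  ;; LENGTH works on any sequence; the session above shows ENCODE
  ;; returning a vector of token ids.
  (length (tokenizers:encode encoder text)))

;; Hypothetical helper, not provided by cl-tokenizers.
(defun round-trips-p (encoder text)
  "Return true if decoding the encoding of TEXT yields TEXT again."
  (string= text
           (tokenizers:decode encoder
                              (tokenizers:encode encoder text))))
```

With `*cl100k-encoder*` defined as above, `(count-tokens *cl100k-encoder* "hello world")` should return 2, matching the two-element vector in the session, and `(round-trips-p *cl100k-encoder* "hello world")` should return true.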
