Building applications with LLMs through composability, in Kotlin
Updated Oct 14, 2024 - Kotlin
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
This repository is part of a course on Elasticsearch in Python. It includes notebooks that demonstrate its usage, along with a YouTube series to guide you through the material.
Develop deep learning models using PyTorch and Hugging Face
The small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Python script for manipulating an existing tokenizer.
Use custom tokenizers in spacy-transformers
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
Package to align tokens from different tokenizations.
Use Hugging Face Transformers and Tokenizers as TensorFlow reusable SavedModels
[Unofficial] Simple .NET wrapper for the Hugging Face Tokenizers library
Small library that provides functions to tokenize a string into an array of words with or without punctuation
Visualize some important concepts related to LLM architectures.
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
A graphical user interface for the Elasticsearch Analyze API
A high-performance tokenizer built to rival GPT-4, trained on the C4 dataset.
Byte Pair Encoding (BPE) tokenizer tailored for the Turkish language