Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
Updated Jul 20, 2024 - Python
A small distributed language model toolkit: fine-tune state-of-the-art LLMs anywhere, rapidly.
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Use custom tokenizers in spacy-transformers
Use Hugging Face Transformers and Tokenizers as TensorFlow reusable SavedModels.
Megatron-LM/GPT-NeoX compatible text encoder with the 🤗 Transformers AutoTokenizer.
Python script for manipulating an existing tokenizer.
Package to align tokens from different tokenizations.
ML Model designed to learn compositional structure of LEGO assemblies
Create prompts with a given token length for testing LLMs and other transformer text models.
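The token-alignment idea mentioned above (mapping tokens between two different tokenizations of the same text) can be sketched with plain character offsets. This is a minimal, hypothetical illustration, not the listed package's actual API, and it assumes each token is an exact substring of the text (no normalization or special tokens):

```python
def char_spans(text, tokens):
    # Map each token to its (start, end) character span in text,
    # scanning left to right so repeated tokens resolve in order.
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans

def align(text, tokens_a, tokens_b):
    # For each token in tokens_a, collect the indices of tokens in
    # tokens_b whose character spans overlap it.
    spans_b = char_spans(text, tokens_b)
    return [
        [j for j, (b0, b1) in enumerate(spans_b) if b0 < a1 and a0 < b1]
        for (a0, a1) in char_spans(text, tokens_a)
    ]
```

For example, aligning `["token", "izers"]` against `["tok", "eniz", "ers"]` over the text `"tokenizers"` yields `[[0, 1], [1, 2]]`: the first tokenization's `"token"` overlaps `"tok"` and `"eniz"`, while `"izers"` overlaps `"eniz"` and `"ers"`.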