Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
Updated Jul 20, 2024 - Python
A small distributed language model toolkit: fine-tune state-of-the-art LLMs anywhere, rapidly.
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Use custom tokenizers in spacy-transformers
Use Hugging Face Transformers and Tokenizers as TensorFlow reusable SavedModels.
Megatron-LM/GPT-NeoX compatible text encoder with the 🤗 Transformers AutoTokenizer.
Python script for manipulating an existing tokenizer.
Package to align tokens from different tokenizations.
ML Model designed to learn compositional structure of LEGO assemblies
Create prompts with a given token length for testing LLMs and other transformer text models.
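The token-alignment idea mentioned above (mapping tokens between two different tokenizations of the same text) can be sketched with plain character offsets. This is a minimal, hypothetical illustration, not the listed package's actual API, and it assumes each token is an exact substring of the text (no normalization or special tokens):

```python
def char_spans(text, tokens):
    # Map each token to its (start, end) character span in text,
    # scanning left to right so repeated tokens resolve in order.
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans

def align(text, tokens_a, tokens_b):
    # For each token in tokens_a, collect the indices of tokens in
    # tokens_b whose character spans overlap it.
    spans_b = char_spans(text, tokens_b)
    return [
        [j for j, (b0, b1) in enumerate(spans_b) if b0 < a1 and a0 < b1]
        for (a0, a1) in char_spans(text, tokens_a)
    ]
```

For example, aligning `["token", "izers"]` against `["tok", "eniz", "ers"]` over the text `"tokenizers"` yields `[[0, 1], [1, 2]]`: the first tokenization's `"token"` overlaps `"tok"` and `"eniz"`, while `"izers"` overlaps `"eniz"` and `"ers"`.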