Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Fast bare-bones BPE for modern tokenizer training
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
Learning BPE embeddings by first learning a segmentation model and then training word2vec
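A minimal sketch of that two-step recipe, assuming the corpus has already been segmented into BPE subword units (the repository's own segmentation model is not shown) and using gensim's Word2Vec with gensim 4.x parameter names:

```python
# Sketch: train word2vec over BPE-segmented sentences (hypothetical toy corpus).
from gensim.models import Word2Vec

# Pre-segmented corpus: each sentence is a list of BPE symbols,
# with "@@" marking subwords that continue a word.
bpe_sentences = [
    ["the", "lo@@", "west", "price"],
    ["the", "high@@", "est", "price"],
]

model = Word2Vec(
    sentences=bpe_sentences,
    vector_size=100,   # dimensionality of the subword embeddings
    window=5,
    min_count=1,
    sg=1,              # skip-gram
)

# Each BPE symbol now has its own embedding vector.
print(model.wv["lo@@"].shape)  # (100,)
```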
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Byte Pair Encoding (BPE)
This repository provides a clear, educational implementation of Byte Pair Encoding (BPE) tokenization in plain Python. The focus is on algorithmic understanding, not raw performance.
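For orientation, a minimal sketch of the core BPE training loop in plain Python (not the repository's own code): count adjacent symbol pairs over a toy vocabulary and merge the most frequent pair, repeating for a fixed number of merges.

```python
# Sketch of the classic BPE merge loop over a toy vocabulary.
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    old = " ".join(pair)
    new = "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Words are stored as space-separated character sequences.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

num_merges = 10
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```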
A Python package to build a corpus vocabulary using the byte-pair methodology, plus a tokenizer that tokenizes input texts based on the built vocabulary.
An extremely simple and restricted tool/lib converting binary data into text that can be processed with unsupervised character-level natural language processing tools/libs
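One way such a conversion can work (the repository's actual scheme is not specified here) is to map every byte value to a distinct Unicode character, keeping the inverse table so the original bytes can be recovered after character-level processing. A small sketch:

```python
# Sketch: reversible byte-to-character mapping for character-level NLP tools.
def build_byte_to_char():
    # Offset all 256 byte values into a single Unicode range.
    return {b: chr(0x2400 + b) for b in range(256)}

BYTE_TO_CHAR = build_byte_to_char()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}

def encode(data: bytes) -> str:
    return "".join(BYTE_TO_CHAR[b] for b in data)

def decode(text: str) -> bytes:
    return bytes(CHAR_TO_BYTE[c] for c in text)

blob = bytes([0, 72, 101, 108, 108, 111, 255])
assert decode(encode(blob)) == blob
```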
A modified, secure version of the BPE algorithm
An educational project dedicated to text-to-image generation with neural networks. A VQ-VAE and BPE are used to learn representations of the images and text, respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of text and image tokens, and is used for generation.
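A minimal sketch of that token-level interface (all names and sizes are hypothetical, not the project's actual code): BPE text ids and VQ-VAE codebook indices are concatenated into one sequence, with the image ids shifted past the text vocabulary, and the transformer is trained on next-token prediction over that sequence.

```python
# Sketch: build the combined text+image token sequence for next-token training.
import torch

TEXT_VOCAB_SIZE = 16_384    # size of the BPE vocabulary (assumed)
IMAGE_VOCAB_SIZE = 1_024    # size of the VQ-VAE codebook (assumed)

def build_sequence(text_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
    # Shift image codes past the text vocabulary so both streams share
    # a single embedding table in the transformer.
    return torch.cat([text_tokens, image_tokens + TEXT_VOCAB_SIZE])

text = torch.tensor([12, 873, 45])         # BPE ids for the caption
image = torch.tensor([7, 512, 1023, 0])    # VQ-VAE indices for the image grid
seq = build_sequence(text, image)

# Training target: predict each next token of the combined sequence;
# at generation time the model is fed the text prefix and samples image tokens.
inputs, targets = seq[:-1], seq[1:]
print(inputs.tolist(), targets.tolist())
```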