Learning BPE embeddings by first learning a segmentation model and then training word2vec
Updated Dec 18, 2022 - Python
Byte Pair Encoding (BPE)
A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.