retro style tokenization for language models (Python, updated May 30, 2024)
A diacritization model for the Arabic language, built and trained on Tashkeela, the Arabic diacritization corpus on Kaggle.
Generates new sentences by learning character-level units with an RNN; a collection of Shakespeare's works was used as training data.
Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING-2020).
An implementation of "Character-level Convolutional Networks for Text Classification" in Tensorflow. See https://arxiv.org/pdf/1509.01626.pdf.
Text article generator using a character-level LSTM network.
Build a character-level language model to generate new dinosaur names.
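The repos above train RNN/LSTM models at the character level. As a rough illustration of the underlying idea (a minimal sketch, not the RNN approach these projects actually use), a character-level bigram sampler can already generate new name-like strings from a small training set:

```python
import random
from collections import defaultdict

def train_bigram_model(names):
    """Count character-to-character transitions, using '^' as a
    start-of-name marker and '$' as an end-of-name marker."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        chars = ['^'] + list(name.lower()) + ['$']
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=20):
    """Sample a new name one character at a time, weighting each
    next character by how often it followed the current one."""
    out = []
    ch = '^'
    while len(out) < max_len:
        options, weights = zip(*counts[ch].items())
        ch = rng.choices(options, weights=weights)[0]
        if ch == '$':
            break
        out.append(ch)
    return ''.join(out)

# Hypothetical toy training set, standing in for a real name corpus.
names = ['tyrannosaurus', 'stegosaurus', 'triceratops', 'velociraptor']
model = train_bigram_model(names)
print(generate(model, random.Random(0)))
```

An RNN or LSTM replaces the single-character context here with a learned hidden state, letting it capture much longer dependencies than one preceding character.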