The EveryVoice TTS Toolkit - Text To Speech for your language
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
This repository highlights the reasoning capabilities of LLMs (✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨) in Targeted Sentiment Analysis of mass-media texts in Russian and translated to English 📊
NLP pipelines for Tagalog using spaCy
GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly
This repository contains model training and BLEU calculation tools for a Polish-Kashubian translator.
Auto-generated stopwords for South African Bantu Languages
Repo associated with the forthcoming paper 'Instruct-global: aligning language models to follow instructions in low-resource languages'. Instruct-global automates the process of generating instruction datasets in low-resource languages (LRLs).
ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages
The Ede Python library automates the generation of instruction fine-tuning datasets in low-resource languages.
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Workflow for forced alignment between languages
Example dataset and prompt design of Korean Offensive language Machine Generation (K-OMG), published at IJCNLP-AACL 2023.
Code for AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"
Fine-tune LLM for early Middle English lemmatization with data from LAEME.
Speech synthesis (TTS) in low-resource languages by training from scratch with Fastpitch and fine-tuning with HifiGan
FilWordNet web portal: a language resource for Filipino and Philippine English built from text analysis, network science, and natural language processing