Auto-generated stopwords for South African Bantu Languages
Updated Jun 1, 2024 - Python
Investigating transfer learning in low-resourced languages, specifically in a named entity recognition (NER) task (IJCNLP-AACL 2023). http://arxiv.org/abs/2309.05311
A web application to test sentence-similarity models for the top 10 Indian languages
FilWordNet web portal: a language resource for Filipino and Philippine English built using text analysis, network science, and natural language processing
[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly
Example dataset and prompt design for Korean Offensive language Machine Generation (K-OMG), published at IJCNLP-AACL 2023.
The following repository contains model training and BLEU calculation tools for a Polish-Kashubian translator.
Implementation for the paper "Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding", IEEE Access, 2023
ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages
Finetuning BERT models on a powerset of different linguistic domains
Fine-tune LLM for early Middle English lemmatization with data from LAEME.
Repo associated with the forthcoming paper 'Instruct-global: aligning language models to follow instructions in low-resource languages'. Instruct-global automates the process of generating instruction datasets in low-resource languages (LRLs).
GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
Official implementation of "CONCRETE: Improving Cross-lingual Fact Checking with Cross-lingual Retrieval" (COLING'22)
Code for AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"
Training a POS Tagger and Dependency Parser for a Low-Resource Language (Tagalog)
The Ede Python library automates the generation of instruction fine-tuning datasets in low-resource languages.
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
Segmentation of Polysynthetic Languages
Download Guarani-dominant tweets