Auto-generated stopwords for South African Bantu Languages
Updated Jun 1, 2024 · Python
FilWordNet web portal: a language resource for Filipino and Philippine English built using text analysis, network science, and natural language processing
A web application for testing sentence-similarity models for the top 10 Indian languages
The Ede Python library automates the generation of instruction fine-tuning datasets in low-resource languages.
This repository highlights the reasoning capabilities of LLMs ✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨ in targeted sentiment analysis of Russian mass-media texts and their English translations 📊
The Kinyarwanda and Kirundi Languages Toolkit (KKLTK) is a Python package for processing Kinyarwanda and Kirundi. KKLTK currently provides stopword lists for both languages; other preprocessing tools, such as Kinyarwanda and Kirundi tokenizers, will be added soon. KKLTK requires Python 3.0, 3.5, 3.6, 3.7, or 3.8.
Fine-tuning an LLM for early Middle English lemmatization using data from LAEME.
Repository accompanying the forthcoming paper 'Instruct-global: aligning language models to follow instructions in low-resource languages'. Instruct-global automates the process of generating instruction datasets in low-resource languages (LRLs).
ZaBantu is a fleet of lightweight masked language models for Southern Bantu languages
Segmentation of Polysynthetic Languages
Download Guarani-dominant tweets
Train traditional machine learning models on the JOSA (Jopara Sentiment Analysis) corpus.
Code accompanying research on weakly supervised learning of speech features for low-resource languages
Open-Ended Text Generation in isiZulu: Decoding Strategies for a Morphologically Rich Low-Resource Language
This repository contains model-training and BLEU-scoring tools for a Polish-Kashubian translator.
Implementation of the paper "Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding", IEEE Access, 2023
Code for the AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"
Training a POS Tagger and Dependency Parser for a Low-Resource Language (Tagalog)
Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"
This work adapts the CNN+Transformer architecture to train text recognition models for Yorùbá and Igbo