low-resource-languages

Here are 50 public repositories matching this topic...

ndamulelonemakh / our-stopwords

Auto-generated stopwords for South African Bantu Languages

nlp natural-language-processing dataset stopwords african-languages tshivenda low-resource-languages africanlp

Updated Jun 1, 2024
Python

dlsucomet / filwordnet-portal

FilWordNet web portal: a language resource for Filipino and Philippine English built using text analysis, network science, and natural language processing

nlp natural-language-processing clustering community-detection network-science wordnet word-sense-induction community-detection-algorithm leiden low-resource-languages roberta synset multiplex-network diachronic-network-analysis

Updated Nov 8, 2023
Python

AJgthb2002 / Sentence-similarity-playground

A web application for testing sentence-similarity models for the top 10 Indian languages

nlp web-application sentence-similarity sentence-embeddings indian-languages low-resource-languages streamlit sentence-bert

Updated Apr 28, 2023
Python

generalpurposelab / ede-data

The Ede Python library automates the generation of instruction fine-tuning datasets in low-resource languages.

fine-tuning low-resource-languages llms

Updated Apr 4, 2024
Python

nicolay-r / RuSentNE-LLM-Benchmark

This repository highlights the reasoning capabilities of LLMs ✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨ in Targeted Sentiment Analysis of Russian mass-media texts and their English translations 📊

sentiment-analysis leaderboard prompt openai gemma zero-shot mistral reasoning fine-tuning low-resource-languages transformers-library low-resource-nlp gpt4 llm llms chain-of-thought llama3 gpt4o

Updated Jun 30, 2024
Python

Andrews2017 / kkltk

The Kinyarwanda and Kirundi Languages Toolkit (KKLTK) is a Python package for processing the Kinyarwanda and Kirundi languages. KKLTK currently provides stopword sets for both languages; other preprocessing tools, such as Kinyarwanda and Kirundi tokenizers, will be added soon. KKLTK requires Python 3.0, 3.5, 3.6, 3.7, or 3.8.

text-processing low-resource-languages text-preprocessing scientific-machine-learning low-resource-languages-toolkit

Updated Oct 27, 2020
Python

mdm-code / manx

Fine-tune an LLM for early Middle English lemmatization with data from LAEME.

nlp deep-learning parsing neural-network lemmatizer nlp-machine-learning lemmatization low-resource-languages middle-english low-resource-nlp low-resource-machine-learning

Updated Jan 25, 2024
Python

generalpurposelab / instruct-global

Repo associated with the forthcoming paper 'Instruct-global: aligning language models to follow instructions in low-resource languages'. Instruct-global automates the process of generating instruction datasets in low-resource languages (LRLs).

fine-tuning low-resource-languages llms

Updated May 16, 2024
Python

ndamulelonemakh / zabantu-beta

ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages

nlp zulu tshivenda low-resource-languages roberta sotho xlm-roberta tsonga

Updated Apr 15, 2024
Python

adoxography / SPieL

Segmentation of Polysynthetic Languages

machine-learning natural-language-processing linguistics low-resource-languages

Updated Jun 17, 2024
Python

mmaguero / guarani-tweets

Download Guarani-dominant tweets

text-mining twitter-api clustering text-similarity scraping language-detection levenshtein-distance social-network-analysis word-lookup guarani low-resource-languages snscrape jopara

Updated May 11, 2021
Python

mmaguero / textcat-josa

Train JOSA (Jopara Sentiment Analysis) corpus with traditional machine learning algorithms.

sentiment-analysis text-classification support-vector-machines sequence-classification text-categorization guarani unbalanced-data low-resource-languages jopara complement-naive-bayes

Updated May 11, 2021
Python

ogunlao / low_res_speech_project

Accompanying code for research work on Weakly Supervised Learning of Speech features for Low resource languages

speech-to-text weakly-supervised-learning low-resource-languages

Updated Jun 21, 2021
Python

victoriapedlar / isizulu-text-generation

Open-Ended Text Generation in isiZulu: Decoding Strategies for a Morphologically Rich Low-Resource Language

nlp deep-learning text-generation pytorch language-model low-resource-languages isizulu

Updated May 16, 2023
Python

kashubian-translator / pl-csb-model

This repository contains model training and BLEU calculation tools for a Polish-Kashubian translator.

translator machine-translation polish nlp-machine-learning low-resource-languages kashubian

Updated Jun 6, 2024
Python

fokhruli / CM-seti-anlysis

Implementation for the paper "Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding", IEEE Access, 2023

natural-language-processing sentiment-analysis low-resource-languages code-mixed

Updated Sep 7, 2023
Python

GGLAB-KU / turkish-plu

Code for AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"

deep-learning retrieval text-generation procedural classification language-model low-resource-languages procedural-text low-resource-nlp procedural-language-understanding

Updated Feb 4, 2024
Python

ljvmiranda921 / ud-tagalog-spacy

Training a POS Tagger and Dependency Parser for a Low-Resource Language (Tagalog)

nlp machine-learning spacy tagalog low-resource-languages

Updated Apr 23, 2022
Python

uds-lsv / transfer-distant-transformer-african

Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"

african-languages ner topic-classification low-resource low-resource-languages transformer-models

Updated Dec 16, 2021
Python

ToluClassics / LowResourceOCR

This work adapts a CNN+Transformer architecture for training text recognition models for the Yorùbá and Igbo languages

ocr transformers low-resource-languages

Updated Oct 14, 2022
Python
