chschroeder / self-training-for-sample-efficient-active-learning

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models (EMNLP 2024)

active-learning low-resource-nlp llms active-learning-in-nlp

Updated Nov 2, 2024
Python

cisnlp / GlotLID

Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp glotcc glotlid

Updated Oct 30, 2024
Python

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation", published in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16-20, 2020.

machine-translation neural-machine-translation parallel-corpus parallel-corpora bangla-nlp low-resource-languages bangla-machine-translation bangla-dataset-machine-translation emnlp-2020 low-resource-nlp low-resource-machine-translation

Updated Oct 23, 2024
Python

adbar / simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

nlp tokenizer language-detection wordlist lemmatizer morphological-analysis lemmatiser tokenization lemmatization corpus-tools language-identification low-resource-nlp

Updated Oct 9, 2024
Python

nicolay-r / RuSentNE-LLM-Benchmark

This repository highlights the reasoning capabilities of LLMs ✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨ in Targeted Sentiment Analysis of Russian mass-media texts, both in the original and translated into English 📊

sentiment-analysis leaderboard prompt openai gemma zero-shot mistral reasoning fine-tuning low-resource-languages transformers-library low-resource-nlp gpt4 llm llms chain-of-thought llama3 gpt4o

Updated Oct 8, 2024
Python

galax19ksh / Advancements-in-Manipuri-NLP

A comprehensive overview of research on Natural Language Processing (NLP) for the Manipuri language.

nlp manipuri low-resource-nlp meiteilon manipurinlp

Updated Sep 26, 2024

KennethEnevoldsen / scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

nlp benchmark natural-language-processing low-resource-nlp scandinavian

Updated Oct 2, 2024
Python

ljvmiranda921 / calamanCy

NLP pipelines for Tagalog using spaCy

nlp machine-learning natural-language-processing spacy computational-linguistics ner low-resource-languages low-resource-nlp

Updated Aug 12, 2024
Python

devrimcavusoglu / nonwestlit

NONWESTLIT Project Codebase

multilingual dataset low-resource-languages low-resource-nlp

Updated Jul 23, 2024
Python

Lhtie / Bio-Domain-Transfer

Implementation of the NAACL 2024 main conference paper "Named Entity Recognition Under Domain Shift via Metric Learning for Life Science"

chemical pytorch information-extraction named-entity-recognition nltk biomedical knowledge-transfer few-shot contrastive-learning low-resource-nlp doamin-adaptation transformers-bert

Updated Jun 19, 2024
Python

pnborchert / MultiRep

Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning. NAACL 2024.

information-extraction relation-extraction few-shot fewrel contrastive-learning low-resource-nlp

Updated Jun 18, 2024
Python

luciusssss / mc2_corpus

[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

multilingual natural-language-processing corpus mongolian tibetan tibetan-nlp uyghur kazakh low-resource-languages low-resource-nlp

Updated Jun 15, 2024
Python

luciusssss / ZhuangBench

[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly

low-resource-languages zhuang low-resource-nlp large-language-models llm

Updated Jun 12, 2024
Python

kasunw22 / sinhala-word-embedding-alignment

English-Sinhala multilingual word embedding alignment resources

sinhala procrustes-alignment english-sinhala procrustes-analysis labse low-resource-nlp bilingual-lexicon-induction word-embedding-alignment rcsls-alignment supervised-embedding-alignment unsupervised-embedding-alignment low-resource-word-embedding-alignment sinhala-word-embeddings fasttext-sinhala-word-embedding-alignment vecmap multilingual-embeddings

Updated Jun 8, 2024
Python

Mufassir-Chowdhury / BnPC

This is the official repository of the paper titled "BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation", accepted at the 17th Workshop on Building and Using Comparable Corpora (BUCC 2024), co-located with LREC-COLING 2024. It contains the code and the dataset.

nlp dataset paraphrase-identification bangla-dataset bangla-nlp paraphrase-detection low-resource-nlp bangla-paraphrase bnpc

Updated May 26, 2024
Jupyter Notebook

StefanHeng / ProgGen

Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"

natural-language-processing named-entity-recognition data-generation few-shot-learning training-data-generation low-resource-nlp large-language-models efficient-nlp

Updated Mar 29, 2024
Python

nicolay-r / awesome-sentiment-attitude-extraction

A curated list of awesome sentiment analysis studies, in which an attitude is the position conveyed in a text by a Subject towards another Object mentioned in that text, such as entities, events, etc.