Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models (EMNLP 2024)
Updated Nov 2, 2024 - Python
Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
This repository highlights the reasoning capabilities of LLMs (✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨) in Targeted Sentiment Analysis of Russian mass-media texts and their English translations 📊
A comprehensive overview of research on Natural Language Processing (NLP) for the Manipuri language.
A Scandinavian Benchmark for sentence embeddings
NLP pipelines for Tagalog using spaCy
NONWESTLIT Project Codebase
Implementation of NAACL 2024 main conference paper: Named Entity Recognition Under Domain Shift via Metric Learning for Life Science
Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning. NAACL 2024.
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly
English-Sinhala multilingual word embedding alignment resources
This is the official repository of the paper titled "BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation", accepted at the 17th Workshop on Building and Using Comparable Corpora (BUCC 2024), co-located with LREC-COLING 2024. It contains the code and the dataset.
Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"
A curated list of awesome sentiment analysis studies in which attitude is the position that a Subject conveys in the text towards an Object mentioned in it, such as an entity or an event.
Official implementation of the EACL Findings 2024 paper: Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction
Children's storybooks for 180 languages.
Pashto Natural Language Processing Toolkit