A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi (Python; updated Sep 25, 2024).
TunBERT is the first pre-trained BERT model for the Tunisian dialect, trained on a Tunisian dataset built from Common Crawl. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).
This repository contains the Arabic sarcasm dataset (ArSarcasm).
Dialect identification using a Siamese network
The first Dialectal Arabic Code-Switching (DACS) corpus of broadcast speech, annotated at the token level using both linguistic and acoustic cues. This dataset is a potential benchmark for DCS in spontaneous speech.
ArSarcasm-v2 is an extension of the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, part of WANLP 2021.
Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)
Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek
VarDial19 shared task: Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT)
A tool that predicts which dialect of English an SMS message is written in, using recurrent neural networks supplemented with data from Google Trends.
A program that statistically classifies Irish-language (Gaeilge) texts by dialect.
Arabic_Dialect_Identification_NLP-AIM-Task
Uses AraBERT to classify Arabic dialects; ranked fourth at the WANLP 2020 workshop.
Twitter Dialect Datasets and Classifiers (GULF Arabic Corpus)
Twitter Dialect Datasets and Classifiers (EG + GULF Arabic Corpus)
An Arabic Tweet Dialect Classifier
Twitter Dialect Datasets and Classifiers (EG Arabic Corpus)
Web interface for the far-speech demo to be presented at INTERSPEECH 2019
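Several of the projects above frame dialect identification as text classification, for example the statistical Irish dialect classifier and the English SMS dialect predictor. The following is a minimal, hypothetical sketch, not taken from any listed repository: a multinomial naive Bayes classifier over character trigrams with add-one smoothing. The class name `NgramDialectClassifier`, the `uk`/`us` labels, and the toy training sentences are illustrative assumptions only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramDialectClassifier:
    """Hypothetical multinomial naive Bayes over character n-grams (Laplace smoothing)."""

    def __init__(self, n: int = 3):
        self.n = n
        self.ngram_counts: dict[str, Counter] = defaultdict(Counter)  # per-label n-gram counts
        self.doc_counts: Counter = Counter()  # number of training texts per label
        self.vocab: set[str] = set()  # all n-grams seen in training

    def fit(self, texts: list[str], labels: list[str]) -> "NgramDialectClassifier":
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            total = sum(counts.values())
            # Log prior for the label plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy, made-up spelling-variant data; real dialect ID needs far larger corpora.
    texts = [
        "the colour of my favourite neighbour", "her behaviour was an honour",
        "analyse the centre of the theatre",
        "the color of my favorite neighbor", "her behavior was an honor",
        "analyze the center of the theater",
    ]
    labels = ["uk", "uk", "uk", "us", "us", "us"]
    clf = NgramDialectClassifier().fit(texts, labels)
    print(clf.predict("favourite colour"))
    print(clf.predict("favorite color"))
```

Character n-grams are a common choice for closely related varieties because they capture spelling and morphological differences without a tokenizer; the neural and BERT-based projects listed here replace this with learned representations.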