Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)
Updated Oct 17, 2022 - Python
Fine-tune BERT models to classify Arabic text by different dialects.
Arabic_Dialect_Identification_NLP-AIM-Task
Arabic dialect identification across 18 country-level Arabic dialects, using the QADI dataset and the pretrained language model AraBERT
Simple script to undo Farasa segmentation, compatible with AraBERT pre-segmentation
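To illustrate what undoing segmentation involves, here is a minimal sketch. It assumes the segmented text marks prefixes with a trailing "+" and suffixes with a leading "+" (for example "ال+ جامع +ة"); the function name undo_segmentation is hypothetical and is not taken from the listed repository. A real desegmenter may also need to restore letters that were altered during segmentation, which this sketch does not attempt.

```python
import re

# Basic Arabic letter range (hamza through yeh); an assumption that covers
# ordinary text but not every extended Arabic character.
ARABIC = "\u0621-\u064A"


def undo_segmentation(text: str) -> str:
    """Rejoin prefix ("xx+ word") and suffix ("word +xx") segments."""
    # A prefix ends with '+': drop the '+' and the whitespace after it,
    # but only when the '+' directly follows an Arabic letter.
    text = re.sub(rf"(?<=[{ARABIC}])\+\s+", "", text)
    # A suffix starts with '+': drop the whitespace before it and the '+',
    # but only when the '+' is directly followed by an Arabic letter.
    text = re.sub(rf"\s+\+(?=[{ARABIC}])", "", text)
    return text


if __name__ == "__main__":
    print(undo_segmentation("ال+ جامع +ة ال+ عربي +ة"))
```

The lookarounds keep a free-standing plus sign, as in "1 + 2", untouched.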
Easy to use extractive text summarization with AraBERT
Mental health diagnosis tool using NLP and ML for Arabic inputs, with a Laravel web application interface
After collecting and preprocessing 40 thousand tweets, I used word embeddings from AraBERT and TF-IDF features along with two neural network architectures and five machine learning algorithms. Because the dataset was so large, I chose Amazon SageMaker to train the models
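As a rough illustration of the TF-IDF features mentioned above, the sketch below computes the plain textbook weighting (term frequency times the log of inverse document frequency) over whitespace-tokenized documents. The function name tfidf and the sample tweets are hypothetical; the project itself may well have used a library implementation with smoothing or normalization, which would give different numbers.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    tweets = ["الجو جميل اليوم", "الجو حار اليوم", "مباراة جميلة"]
    for vector in tfidf(tweets):
        print(vector)
```

A term that occurs in every document gets weight zero under this variant, which is why library implementations often add smoothing.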
Arabic Dialect Sentiment Analysis
This is an experiment for the Qur'an QA shared task at the OSACT workshop
Diacritics are marks that represent short vowels, which are pronounced with a constant length. The same Arabic word can have different meanings and pronunciations depending on how it is diacritized. In this project, we implement a pipeline that predicts the diacritic of each character in an Arabic text using natural language processing techniques.
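Predicting a diacritic for each character can be framed as sequence labeling. The sketch below shows one way training labels might be derived from already diacritized text: it pairs every base character with the diacritic marks that follow it. The function name split_diacritics and the choice of the code point range U+064B to U+0652 are assumptions made for illustration, not details from the project.

```python
# Tanween, short vowels, shadda and sukun (U+064B through U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return (base_character, diacritics) pairs for a diacritized string."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS and pairs:
            # Attach the mark to the preceding base character.
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            # A base character (or a stray leading mark) starts a new pair.
            pairs.append((ch, ""))
    return pairs


if __name__ == "__main__":
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, fully diacritized
    pairs = split_diacritics(word)
    undiacritized = "".join(base for base, _ in pairs)
    labels = [marks for _, marks in pairs]
    print(undiacritized, labels)
```

A model would then take the undiacritized characters as input and learn to predict the corresponding label for each position.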
Emotion Prediction in Arabic Text
Sentiment analysis of the Tunisian dialect with AraBERT
Many countries speak Arabic, but each country has its own dialect. The aim of this project is to build a model that predicts the dialect of a given text.
Fine-tuning / pre-training AraELECTRA on a specific domain for a QA system.
Disambiguation Study for Arabic Applied on Text Classification
Dialectal Arabic Sentiment Analysis
An advanced Arabic fake news detection model using LSTM and AraBERT. This project leverages the Arabic Fake News Dataset (AFND) to classify news articles as credible, not credible, or undecided. Includes preprocessing steps, model building, and evaluation using TensorFlow.
Used the "aubmindlab/bert-base-arabertv2" model from AUB MIND Lab's AraBERT to create a simple Arabic text tokenizer.