Analysis of the evolution of advanced transformer-based language models: experiments on opinion mining

Abstract

Opinion mining, also known as sentiment analysis, is a subfield of natural language processing (NLP) that focuses on identifying and extracting subjective information from textual material. This can include determining the overall sentiment of a piece of text (e.g., positive or negative) as well as identifying the specific emotions or opinions it expresses, tasks that rely on advanced machine learning and deep learning techniques. Recently, transformer-based language models have made human emotion analysis easier to tackle, thanks to the attention mechanism and parallel computation. These advantages make such models very powerful on linguistic tasks, unlike recurrent neural networks, which process text sequentially; this makes them slow and prone to failure on long texts. Our paper studies the behaviour of cutting-edge transformer-based language models on opinion mining and provides a high-level comparison between them to highlight their key particularities. Additionally, our comparative study points production engineers toward the approaches worth focusing on and offers researchers guidelines for future research.

Tech Stack

PyTorch
Hugging Face
Transformers

Repo Structure

 ...   
   ├── assets
   ├── data
   ├── notebooks
   ├── LICENSE
   └── README.md

Dataset

For our experiments, we used the IMDb movie reviews dataset because of its accessibility, size, class balance, relevance to opinion mining, and ease of preprocessing.
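The README does not list the cleaning steps behind the "Data Cleaned" ablation. As a minimal sketch, assuming cleaning means stripping the HTML line breaks (`<br />`) that appear in raw IMDb reviews, decoding HTML entities, and collapsing whitespace, a helper could look like this (`clean_review` is a hypothetical name, not taken from the repository):

```python
import html
import re

# Hypothetical cleaning step: the repository does not document its exact rules.
TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags such as <br />
SPACE_RE = re.compile(r"\s+")     # runs of whitespace


def clean_review(text: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # remove tags first so a decoded "&lt;" survives as text
    text = html.unescape(text)     # e.g. "&amp;" -> "&"
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "A great film!<br /><br />Acting &amp; music were superb."
    print(clean_review(raw))  # A great film! Acting & music were superb.
```

Whether such cleaning helps is an empirical question: in the ablation results, BERT scored lower with cleaned data (92.1 F1) than without it (94.1 F1).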

Model Results

Model Architectures:

Architecture diagrams for the Encoder, Decoder, and Encoder-Decoder families (images not reproduced).
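The three families differ mainly in which positions a token may attend to. The sketch below uses the standard textbook formulation and is not taken from the authors' notebooks: an encoder attends bidirectionally, a decoder applies a causal mask so each token sees only itself and earlier positions, and an encoder-decoder adds cross-attention from every decoder position to every encoder position. Padding masks are ignored for simplicity.

```python
# mask[i][j] == 1 means position i may attend to position j.


def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder (autoencoding, e.g. BERT): every token sees every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder (autoregressive, e.g. GPT): token i sees only tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def cross_mask(n_target: int, n_source: int) -> list[list[int]]:
    """Encoder-decoder (e.g. BART, T5): in cross-attention, each decoder
    position sees all encoder positions; the decoder's own self-attention
    still uses causal_mask."""
    return [[1] * n_source for _ in range(n_target)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
```

Because the masks are fixed matrices, every position can be computed in parallel, which is the property the abstract contrasts with step-by-step recurrent networks.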

Main Results:

Model	Objective	Recall (%)	Precision (%)	F1 (%)	Accuracy (%)
BERT	Autoencoding	93.9	94.3	94.1	94.0
GPT	Autoregressive	92.4	51.8	66.4	53.2
GPT-2	Autoregressive	51.1	54.8	52.9	54.5
ALBERT	Autoencoding	94.1	91.9	93.0	93.0
RoBERTa	Autoencoding	96.0	94.6	95.3	95.3
XLNet	Autoregressive	94.7	95.1	94.9	94.8
DistilBERT	Autoencoding	94.3	92.7	93.5	93.4
XLM-RoBERTa	Autoencoding	83.1	71.7	77.0	75.2
BART	Encoder-Decoder	96.0	93.3	94.6	94.6
ConvBERT	Autoencoding	95.5	93.7	94.6	94.5
DeBERTa	Autoencoding	95.2	95.0	95.1	95.1
ELECTRA	Generative-Discriminative	95.8	95.4	95.6	95.6
Longformer	Autoregressive	95.9	94.3	95.1	95.0
Reformer	Autoregressive	54.6	52.1	53.3	52.2
T5	Encoder-Decoder	94.8	93.4	94.0	93.9
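All four columns are standard binary-classification scores. The sketch below computes them from confusion-matrix counts, assuming "positive review" is the positive class (the README does not say which class is positive); the counts in the usage example are made up. `f1_from` is a hypothetical helper that checks whether a reported F1 is the harmonic mean of the reported precision and recall.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1, and accuracy from confusion-matrix counts.

    Assumes non-zero denominators (at least one predicted and one actual positive).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works on percentages too)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(metrics(tp=90, fp=10, fn=5, tn=95))   # hypothetical counts
    print(round(f1_from(94.3, 93.9), 1))        # BERT row: 94.1
    print(round(f1_from(51.8, 92.4), 1))        # GPT row: 66.4
```

The GPT row shows why F1 is reported alongside accuracy: a recall of 92.4 paired with a near-chance precision of 51.8 still yields an F1 of only 66.4.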

Ablation Results:

Model	Max Len (tokens)	Data Cleaned	Recall (%)	Precision (%)	F1 (%)	Accuracy (%)
BERT	64	No	86.8	84.7	85.8	85.6
BERT	384	No	93.9	94.3	94.1	94.0
BERT	384	Yes	92.6	91.6	92.1	92.2
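Max Len is the maximum input length in tokens: longer reviews are truncated and shorter ones are padded. The sketch below illustrates this with whitespace tokens and BERT-style `[CLS]`/`[SEP]`/`[PAD]` markers. The experiments themselves use each model's subword tokenizer, so `encode` is a hypothetical helper, not the notebooks' code.

```python
# Hypothetical helper: whitespace tokens stand in for a subword tokenizer.
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def encode(text: str, max_len: int) -> tuple[list[str], list[int]]:
    """Truncate or pad a review to exactly max_len tokens.

    Returns the token sequence and an attention mask (1 = real token, 0 = padding).
    """
    tokens = text.split()[: max_len - 2]   # keep room for [CLS] and [SEP]
    seq = [CLS] + tokens + [SEP]
    pad = max_len - len(seq)
    return seq + [PAD] * pad, [1] * len(seq) + [0] * pad


if __name__ == "__main__":
    review = " ".join(f"w{i}" for i in range(300))  # a 300-token review
    for max_len in (64, 384):
        seq, mask = encode(review, max_len)
        print(max_len, "kept", sum(mask) - 2, "of 300 tokens")
```

With a 64-token limit, most of a long review never reaches the model, which is one plausible reason the 64-token BERT run scores lower than the 384-token run.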

Best vs. Worst Model:

Model	Performance
ELECTRA	Best
GPT-2	Worst

Accuracy, loss, and confusion-matrix plots for each model (images not reproduced).
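The best/worst labels appear to follow a ranking by F1 score: by accuracy alone, Reformer (52.2) would rank below GPT-2 (54.5). The snippet below copies the reported F1 and accuracy values from the main results table and sorts the models by each metric; `RESULTS` and `rank` are illustrative names.

```python
# Reported scores (%) from the main results table: (F1, accuracy).
RESULTS = {
    "BERT": (94.1, 94.0),
    "GPT": (66.4, 53.2),
    "GPT-2": (52.9, 54.5),
    "ALBERT": (93.0, 93.0),
    "RoBERTa": (95.3, 95.3),
    "XLNet": (94.9, 94.8),
    "DistilBERT": (93.5, 93.4),
    "XLM-RoBERTa": (77.0, 75.2),
    "BART": (94.6, 94.6),
    "ConvBERT": (94.6, 94.5),
    "DeBERTa": (95.1, 95.1),
    "ELECTRA": (95.6, 95.6),
    "Longformer": (95.1, 95.0),
    "Reformer": (53.3, 52.2),
    "T5": (94.0, 93.9),
}


def rank(metric_index: int) -> list[str]:
    """Model names sorted from best to worst on one metric (0 = F1, 1 = accuracy)."""
    return sorted(RESULTS, key=lambda name: RESULTS[name][metric_index], reverse=True)


by_f1 = rank(0)
by_acc = rank(1)
print("best:", by_f1[0], "worst:", by_f1[-1])   # best: ELECTRA worst: GPT-2
print("lowest accuracy:", by_acc[-1])           # lowest accuracy: Reformer
```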

Citation

@article{Zekaoui_2023,
  title = {Analysis of the evolution of advanced transformer-based language models: experiments on opinion mining},
  author = {Nour Eddine Zekaoui and Siham Yousfi and Maryem Rhanoui and Mounia Mikram},
  journal = {{IAES} International Journal of Artificial Intelligence ({IJ}-{AI})},
  volume = {12},
  number = {4},
  pages = {1995--2010},
  month = {Dec},
  year = {2023},
  doi = {10.11591/ijai.v12.i4.pp1995-2010},
  ISSN = {2252-8938},
  url = {https://doi.org/10.11591/ijai.v12.i4.pp1995-2010},
}

Contact Info

For help or issues using the paper's code, please submit a GitHub issue. For personal communication related to the paper, please contact: {nour-eddine.zekaoui, syousfi, mrhanoui, mmikram}@esi.ac.ma.

GitHub repository: zekaouinoureddine/Opinion-Transformers