
BERT End-to-End Aspect-based Sentiment Analysis of Russian Reviews


russian-reviews-bert-e2e-absa

Open In Colab

Exploiting BERT End-to-End Aspect-Based Sentiment Analysis

Architecture

  • Pre-trained embedding layer: Conversational RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters) huggingface, docs.
  • Task-specific layer:
    • Linear
    • Recurrent Neural Networks (GRU)
    • Self-Attention Networks (SAN, TFM)
    • Conditional Random Fields (CRF)
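As a rough illustration of the architecture above, here is a minimal PyTorch sketch of the GRU variant: a per-token tagging head (bidirectional GRU plus a linear classifier) on top of a pretrained encoder such as RuBERT. The class name, the encoder-agnostic constructor, and the default label count are assumptions for illustration, not code from this repository:

```python
import torch
import torch.nn as nn

class BertGRUTagger(nn.Module):
    """Minimal sketch: pretrained encoder + GRU tagging head
    producing per-token logits."""

    def __init__(self, bert, hidden_size=768, num_labels=13):
        super().__init__()
        self.bert = bert  # any encoder returning .last_hidden_state, e.g. RuBERT
        self.gru = nn.GRU(hidden_size, hidden_size // 2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.gru(hidden)     # task-specific recurrent layer
        return self.classifier(hidden)   # (batch, seq_len, num_labels)
```

With BIEOS tagging over three sentiment classes there are 4 × 3 + 1 = 13 token labels, hence the default `num_labels`.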

Task

You have a sentence like this:

Средняя продолжительность жизни в рэфии 63 года у мужиков пенсия в 65 ... .травят едой из монеток и ENT соевой и пальмовой .рэфия занимает первое место среди стран третьего мира по смертности от сердечно сосудистых заболеваний .ни вывозит сердечко говноеды из эрзац продуктов отложение бляшек в виде холестерина херакс и тромб

(Rough English translation: "Average life expectancy in the RF is 63 for men, while the pension age is 65 ... they poison people with food from discount stores and with soy-and-palm ENT. The RF ranks first among third-world countries in mortality from cardiovascular diseases. The hearts of those eating ersatz products can't take it: cholesterol plaques build up and then, bam, a thrombus.")

The sentence contains a special token, ENT. This token masks our point of interest: an aspect whose sentiment we should determine using the other tokens as context. For this particular example, you can clearly tell that ENT is used in a negative sense (that's obvious: the whole sentence is depressingly negative).
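The masking itself is straightforward. A hypothetical helper (illustrative only, not from this repository) might look like:

```python
def mask_entity(text, entity, token="ENT"):
    """Replace the entity of interest with the ENT placeholder token."""
    return text.replace(entity, token)

masked = mask_entity("Сервис в этом банке ужасен", "банке")
# The model then predicts the sentiment expressed toward ENT.
```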

Dataset

You can find the original dataset here and preprocessed dataset here.

This dataset consists of scraped Russian news articles, comments, reviews, marketplace item descriptions, and so on.

Each text item has a sentiment label (1 = positive, 0 = neutral, -1 = negative).

Each entity of interest is masked with the ENT tag.
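A minimal sketch of how the numeric sentiment labels might map onto tag suffixes for the tagger (the names and dictionary shape are hypothetical, not taken from the dataset files):

```python
# Hypothetical mapping from the dataset's numeric labels to tag suffixes.
SENTIMENT_LABELS = {1: "POS", 0: "NEU", -1: "NEG"}

def to_label(item):
    """Map an item's numeric sentiment to a tag suffix, e.g. -1 -> "NEG"."""
    return SENTIMENT_LABELS[item["sentiment"]]
```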

Quick Start

  • The valid tagging strategies/schemes (i.e., the ways of representing text or entity spans) in this project are BIEOS (also called BIOES or BMES), BIO (also called IOB2) and OT (also called IO). If you are not familiar with these terms, I strongly recommend reading the following materials before running the program:

    1. Inside–outside–beginning (tagging).

    2. Representing Text Chunks.

    3. The paper associated with this project.

  • Train the model on other ABSA dataset:

    1. Place the data files in the directory ./data/[YOUR_DATASET_NAME]. Note that you need to reorganize your data files so that they can be directly consumed by this project; following the input format of ./data/train.txt should be OK.
    2. Set TASK_NAME in train.sh to [YOUR_DATASET_NAME].
    3. Train the model: sh train.sh
  • Perform pure inference/direct transfer over test/unseen data using the trained ABSA model:

    1. Place the data file in the directory ./data/[YOUR_EVAL_DATASET_NAME].
    2. Set TASK_NAME in work.sh to [YOUR_EVAL_DATASET_NAME].
    3. Set ABSA_HOME in work.sh to [HOME_DIRECTORY_OF_YOUR_ABSA_MODEL].
    4. Run: sh work.sh
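For intuition about the tagging schemes mentioned above, here is a hedged sketch of converting BIO tags to BIEOS. The function name is my own, and it deliberately ignores edge cases such as type mismatches between adjacent tags; it is not code from this repository:

```python
def bio_to_bieos(tags):
    """Convert BIO tags to the BIEOS scheme (minimal sketch)."""
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append("O")
        elif tag.startswith("B-"):
            # A B- tag with no continuation is a single-token span -> S-
            out.append(tag if nxt.startswith("I-") else "S-" + tag[2:])
        else:  # I- tag
            # An I- tag with no continuation closes the span -> E-
            out.append(tag if nxt.startswith("I-") else "E-" + tag[2:])
    return out

bio_to_bieos(["B-POS", "I-POS", "O", "B-NEG"])
# -> ["B-POS", "E-POS", "O", "S-NEG"]
```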

Results

| Dataset | Precision | Recall | F1 (macro) | F1 (micro) |
|---------|-----------|--------|------------|------------|
| Test    | 76.43     | 76.35  | 75.82      | 76.34      |

References

  1. Li, X., Bing, L., Zhang, W., & Lam, W. (2019). Exploiting BERT for end-to-end aspect-based sentiment analysis. arXiv preprint arXiv:1910.00883.
