# Exploiting BERT for End-to-End Aspect-Based Sentiment Analysis
- Pre-trained embedding layer: Conversational RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters): huggingface, docs.
- Task-specific layer:
- Linear
- Recurrent Neural Networks (GRU)
- Self-Attention Networks (SAN, TFM)
- Conditional Random Fields (CRF)
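As a sketch of how one of these task-specific layers could sit on top of the encoder, here is a minimal PyTorch module for the GRU variant. The names are hypothetical (not the project's code), `num_tags=13` assumes a BIEOS tag set crossed with three sentiments plus `O`, and the random tensor stands in for RuBERT's 768-dimensional hidden states:

```python
import torch
import torch.nn as nn

class GRUTaggingHead(nn.Module):
    """Hypothetical task-specific layer on top of the BERT encoder:
    a bidirectional GRU followed by a per-token linear classifier."""

    def __init__(self, hidden_size: int = 768, num_tags: int = 13):
        super().__init__()
        # Bidirectional GRU uses half the hidden size per direction so the
        # concatenated output matches the encoder's hidden size again.
        self.gru = nn.GRU(hidden_size, hidden_size // 2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden_size, num_tags)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, seq_len, hidden_size) from the encoder
        recurrent_states, _ = self.gru(encoder_states)
        return self.classifier(recurrent_states)  # (batch, seq_len, num_tags)

# Dummy tensor standing in for Conversational RuBERT activations.
states = torch.randn(2, 16, 768)
logits = GRUTaggingHead()(states)
print(logits.shape)  # torch.Size([2, 16, 13])
```

The same slot could instead hold a plain linear layer, a self-attention block, or a CRF decoder, as listed above.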
You have a sentence like this:
Средняя продолжительность жизни в рэфии 63 года у мужиков пенсия в 65 ... .травят едой из монеток и ENT соевой и пальмовой .рэфия занимает первое место среди стран третьего мира по смертности от сердечно сосудистых заболеваний .ни вывозит сердечко говноеды из эрзац продуктов отложение бляшек в виде холестерина херакс и тромб

(Rough English translation: "Average life expectancy in Russia is 63 for men, while the pension age is 65 ... they poison people with cheap food and ENT, soy and palm oil ... Russia holds first place among third-world countries in mortality from cardiovascular diseases ... the heart can't take it, eaters of ersatz products get cholesterol plaque deposits, bam, and a blood clot.")
The sentence contains a special token, ENT. This token masks our point of interest: an aspect whose sentiment we must determine from the surrounding tokens as context. For this particular example you can clearly see that ENT is used in a negative sense, which is obvious since the whole sentence is depressingly negative.
You can find the original dataset here and the preprocessed dataset here.
This dataset consists of scraped Russian news articles, comments, reviews, marketplace item descriptions, etc.
Each text item has a specified sentiment (1 = positive, 0 = neutral, -1 = negative).
Each entity of interest is masked with the ENT tag.
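A minimal sketch of how such records could be produced, assuming the layout described above; `mask_entity` and `LABEL_NAMES` are hypothetical helpers, not part of the project:

```python
# Hypothetical helpers for the preprocessed dataset described above:
# each record pairs a text whose entity of interest is masked with "ENT"
# and a sentiment label (1 = positive, 0 = neutral, -1 = negative).
LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}

def mask_entity(text: str, entity: str, mask: str = "ENT") -> str:
    """Replace every occurrence of the entity of interest with the mask token."""
    return text.replace(entity, mask)

record = (mask_entity("the food from the shop is awful", "the shop"), -1)
print(record)                  # ('the food from ENT is awful', -1)
print(LABEL_NAMES[record[1]])  # negative
```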
- The valid tagging strategies/schemes (i.e., the ways of representing text or entity spans) in this project are BIEOS (also called BIOES or BMES), BIO (also called IOB2) and OT (also called IO). If you are not familiar with these terms, I strongly recommend reading the following materials before running the program:
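To illustrate how the schemes relate, here is a small, illustrative converter from BIO to BIEOS tags (not taken from the project's code):

```python
def bio_to_bieos(tags):
    """Convert a BIO tag sequence to BIEOS (a.k.a. BIOES/BMES).

    A single-token span B-X becomes S-X; the last token of a longer
    span is relabeled from I-X to E-X.
    """
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append("O")
        elif tag.startswith("B-"):
            # A span of length one becomes S; otherwise B is kept.
            out.append("S-" + tag[2:] if not nxt.startswith("I-") else tag)
        elif tag.startswith("I-"):
            # The last token of a span becomes E; otherwise I is kept.
            out.append("E-" + tag[2:] if not nxt.startswith("I-") else tag)
        else:
            out.append(tag)
    return out

print(bio_to_bieos(["B-NEG", "O", "B-POS", "I-POS", "I-POS"]))
# ['S-NEG', 'O', 'B-POS', 'I-POS', 'E-POS']
```

OT/IO is coarser still: it keeps only the aspect label per token and cannot distinguish adjacent spans.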
- The paper associated with the project.
- Train the model on another ABSA dataset:
  1. Place the data files in the directory `./data/[YOUR_DATASET_NAME]` (note that you need to re-organize your data files so that they can be directly adapted to this project; following the input format of `./data/train.txt` should be OK).
  2. Set `TASK_NAME` in `train.sh` as `[YOUR_DATASET_NAME]`.
  3. Train the model: `sh train.sh`
- Perform pure inference/direct transfer over test/unseen data using the trained ABSA model:
  1. Place the data file in the directory `./data/[YOUR_EVAL_DATASET_NAME]`.
  2. Set `TASK_NAME` in `work.sh` as `[YOUR_EVAL_DATASET_NAME]`.
  3. Set `ABSA_HOME` in `work.sh` as `[HOME_DIRECTORY_OF_YOUR_ABSA_MODEL]`.
  4. Run: `sh work.sh`
| Dataset | Precision | Recall | F1 (macro) | F1 (micro) |
|---|---|---|---|---|
| Test | 76.43 | 76.35 | 75.82 | 76.34 |
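To clarify the difference between the two F1 columns: macro-F1 averages the per-class F1 scores (every class weighs equally), while micro-F1 pools the counts first (frequent classes dominate). A toy computation with made-up per-class counts, not the real evaluation data:

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# Hypothetical (tp, fp, fn) confusion counts for the three sentiment classes.
counts = {"positive": (80, 20, 10), "neutral": (50, 10, 30), "negative": (90, 5, 15)}

# Macro F1: average of per-class F1 scores.
macro = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro F1: pool the counts across classes, then compute a single F1.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = f1(tp, fp, fn)
print(round(macro, 4), round(micro, 4))  # macro ≈ 0.8188, micro ≈ 0.8302
```

Here the under-performing "neutral" class drags the macro score below the micro score; on balanced, uniformly-scored classes the two coincide.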
- Li, X., Bing, L., Zhang, W., & Lam, W. (2019). Exploiting BERT for end-to-end aspect-based sentiment analysis. arXiv preprint arXiv:1910.00883.