Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

The code and data used for our EMNLP paper Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding.

Requirements

Datasets

We collected an in-domain corpus for embedding training. For evaluation, we use the Restaurant and Laptop datasets from SemEval 2015 and SemEval 2016. The preprocessed datasets are included in this repository.

Run the Code

Using the same datasets as ours

bash run_jasen.sh

This step runs the whole pipeline, from embedding training to neural network distillation and model evaluation. The --dataset option in the script specifies which prepared dataset (restaurant or laptop) to use. The generated embedding file is stored under ${dataset}. Prediction results for each dataset are written to /datasets/${dataset}/prediction.txt.

Preparing your own dataset

Create a new folder under /datasets for your new dataset. The in-domain unlabeled training corpus, train.txt, is used for joint topic embedding training and contains one document per line. The test set, test.txt, is used for evaluation and has the following tab-separated format:

line_id	aspect_label_id	sentiment_label_id	text
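
As an illustration of the test.txt layout above, here is a minimal Python sketch that parses one line into its four fields. It is not part of this repository: the names TestExample and parse_test_line are hypothetical, and the sketch assumes the fields are tab-separated and that both label ids are integers.

```python
from typing import NamedTuple


class TestExample(NamedTuple):
    """One row of test.txt (hypothetical helper type, not from the repo)."""
    line_id: str
    aspect_label_id: int
    sentiment_label_id: int
    text: str


def parse_test_line(line: str) -> TestExample:
    """Parse a single tab-separated line of test.txt.

    Assumption: label ids are integers. line_id is kept as a string
    because its exact type is not specified.
    """
    # Split on the first three tabs only, so any tab inside the text survives.
    parts = line.rstrip("\n").split("\t", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(parts)}")
    line_id, aspect, sentiment, text = parts
    return TestExample(line_id, int(aspect), int(sentiment), text)
```

Running such a check over every line of a new test.txt before starting the pipeline is a cheap way to catch rows with missing fields or non-numeric label ids.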

The keywords for each aspect and sentiment should be listed in aspect_w_kw.txt and senti_w_kw.txt, respectively. Each line corresponds to one aspect or sentiment category, and the line order should be consistent with the order of the aspect and sentiment label ids. Examples can be found in the prepared dataset folders.
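
Because the keyword files and the label ids must line up, a quick consistency check can help when preparing a new dataset. The sketch below is a hypothetical helper, not code from this repository. It assumes label ids are zero-based indices into the keyword-file lines and that blank lines do not count as categories; check both assumptions against the prepared dataset folders.

```python
from typing import Iterable, List


def count_categories(lines: Iterable[str]) -> int:
    """Count keyword-file lines that define a category (blank lines skipped)."""
    return sum(1 for line in lines if line.strip())


def out_of_range_ids(label_ids: Iterable[int], num_categories: int) -> List[int]:
    """Return the distinct label ids that have no matching keyword line.

    Assumption: ids are zero-based, so valid ids are 0 .. num_categories - 1.
    """
    return sorted({i for i in label_ids if not 0 <= i < num_categories})
```

For example, passing the lines of aspect_w_kw.txt to count_categories and the aspect label ids from test.txt to out_of_range_ids should yield an empty list when the two files agree under these assumptions.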