Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
This repository contains the code and data used for our EMNLP paper, Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding.
- GCC compiler (used to compile the C source file): see the guide for installing GCC.
Run the Code
Using the same datasets as ours
This step runs the whole pipeline, from embedding training through neural network distillation to model evaluation. The
--dataset argument in the script specifies which prepared dataset (restaurant or laptop) to use. The generated embedding file is stored under
Prediction results for each dataset are generated at
Preparing your own dataset
Create a new folder under
/datasets for your new dataset. The in-domain unlabeled training corpus
train.txt, used for joint topic embedding training, contains one document per line. The test set
test.txt, used for evaluation, is in the following format:
line_id aspect_label_id sentiment_label_id text
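As a quick sanity check before running the pipeline, a line in this format could be parsed roughly as follows. This is a minimal sketch, not part of the released code: the names TestExample and parse_test_line are hypothetical, and it assumes the fields are whitespace-separated, that the three ids are integers, and that everything after the third field is the text.

```python
from typing import NamedTuple


class TestExample(NamedTuple):
    """One parsed line of test.txt (hypothetical helper type)."""
    line_id: int
    aspect_label_id: int
    sentiment_label_id: int
    text: str


def parse_test_line(line: str) -> TestExample:
    # Assumption: fields are separated by whitespace and the ids are integers.
    # maxsplit=3 keeps the review text (which itself contains spaces) in one piece.
    parts = line.strip().split(maxsplit=3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    line_id, aspect_id, sentiment_id, text = parts
    return TestExample(int(line_id), int(aspect_id), int(sentiment_id), text)
```

Calling parse_test_line on every line of a new test.txt and catching ValueError is one way to spot malformed lines early. If your file uses a different separator, adjust the split call accordingly.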
The keywords for each aspect and sentiment should be listed in
senti_w_kw.txt. Each line corresponds to one aspect/sentiment category, and the line order must match the order of the aspect and sentiment label ids. Examples can be found in the prepared dataset folders.
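Because the line order of a keyword file has to agree with the label ids, it can help to check that agreement before training. The sketch below is only illustrative: load_keywords and out_of_range_labels are hypothetical helpers, and it assumes keywords on a line are whitespace-separated and that label ids are 0-based indices into the keyword lines. Compare against the prepared dataset folders to confirm both assumptions.

```python
from typing import Iterable, List


def load_keywords(lines: Iterable[str]) -> List[List[str]]:
    """Return one keyword list per non-empty line, in file order.

    Assumption: keywords on a line are separated by whitespace.
    """
    return [line.split() for line in lines if line.strip()]


def out_of_range_labels(label_ids: Iterable[int], num_categories: int) -> List[int]:
    """Return the sorted label ids that have no matching keyword line.

    Assumption: label ids are 0-based indices into the keyword lines.
    """
    return sorted({i for i in label_ids if not 0 <= i < num_categories})
```

For example, with a keyword file opened as f, load_keywords(f) gives the categories in order, and passing the sentiment label ids from test.txt together with the number of categories to out_of_range_labels returns any ids that point past the last keyword line.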