Skip to content
main
Switch branches/tags
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
src
 
 
 
 
 
 
 
 
 
 

Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

The code and data used for our EMNLP paper Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding.

Requirements

Datasets

We collect in-domain corpus for embedding training. For evaluation, we use Restaurant and Laptop datasets in Sem-Eval 2015 and Sem-Eval 2016. We preprocessed these datasets in this repository.

Run the Code

Using the same datasets as ours

bash run_jasen.sh

This step runs the whole pipeline from embedding training, to neural network distillation and model evaluation. The --dataset in the script is used to specify which prepared dataset (restaurant or laptop) to use. Generated embedding file is stored under ${dataset}. Prediction results for each dataset are generated at /datasets/${dataset}/prediction.txt.

Preparing your own dataset

Create a new folder under /datasets for your new dataset. The in-domain unlabeled training corpus train.txt used for joint topic embedding training has the format of each line being a document. The test set test.txt used for evaluation is in following format:

line_id	aspect_label_id	sentiment_label_id	text

The keywords for each aspect and sentiment should be listed in aspect_w_kw.txt and senti_w_kw.txt. Each line refers to one aspect/sentiment category. The line order should be consistent with the order of aspect and sentiment label ids. Examples can be found in prepared dataset folders.

About

Code and Data for our EMNLP-2020 paper Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding.

Resources

Releases

No releases published

Packages

No packages published