Domain Agnostic Real-Valued Specificity Prediction

Code for Wei-Jen Ko, Greg Durrett and Junyi Jessy Li, "Domain Agnostic Real-Valued Specificity Prediction", The AAAI Conference on Artificial Intelligence (AAAI), 2019

This is a text specificity predictor for any domain.


  author    = {Ko, Wei-Jen and Durrett, Greg and Li, Junyi Jessy},
  title     = {Domain Agnostic Real-Valued Specificity Prediction},
  booktitle = {AAAI},
  year      = {2019},


- PyTorch (tested on 1.0.0; known to produce incorrect results on 1.7.1)


Data and resources

The GloVe vector file (840B.300d) is required. Download it and set the glove path in and


The twitter, yelp, and movies data and annotations used in the paper are in dataset/data

Data format:

twitters.txt contains the sentences.

twitterv.txt contains the average specificity assigned by Turkers to each sentence, listed in the same order as the sentences.

twitteru.txt contains the unlabeled target domain data used in the paper.

twitterl.txt contains the binary specificity labels, which are not used.

For other domains, substitute the domain name in the file names.
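As a rough illustration of how the sentence and label files line up, the sketch below pairs each sentence with its average specificity score. It is not part of the repository: the function name load_labeled_domain is hypothetical, and it assumes one sentence per line, one numeric score per line, and no header line in either file.

```python
from pathlib import Path


def load_labeled_domain(data_dir, domain):
    """Pair each sentence with its average specificity score.

    Assumptions (not confirmed by the repository): <domain>s.txt holds one
    sentence per line, <domain>v.txt holds one numeric score per line, and
    neither file has a header line.
    """
    data_dir = Path(data_dir)
    sentences = (data_dir / f"{domain}s.txt").read_text(encoding="utf-8").splitlines()
    scores = [
        float(token)
        for token in (data_dir / f"{domain}v.txt").read_text(encoding="utf-8").split()
    ]
    # The two files are aligned by position, so their lengths must match.
    if len(sentences) != len(scores):
        raise ValueError(
            f"{len(sentences)} sentences but {len(scores)} scores for domain {domain!r}"
        )
    return list(zip(sentences, scores))
```

For example, load_labeled_domain("dataset/data", "yelp") would read yelps.txt and yelpv.txt under the same assumptions.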


Training command:

python --gpu_id 0 --test_data twitter

Testing command:

python --gpu_id 0 --test_data twitter

Using it on a new domain

To use it on a new domain, unlabeled sentences from the new domain are required.

When training, change s1['unlab']['path'] in and the path of xsu in and to the unlabeled data file.

When testing, change s1['test']['path'] in and the path of xst in to the test sentences file. (Also make sure that s1['unlab']['path'] in and the path of xsu in refer to the same file.)

The unlabeled data can be the same as test data.

The first line in the testing data is ignored.
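Because the first line is skipped, a test file built from raw sentences would silently lose its first sentence. One possible workaround, sketched below, is to write a placeholder line first. The helper name write_test_file and the placeholder text are hypothetical, and the sketch assumes the test file holds one sentence per line.

```python
def write_test_file(sentences, out_path, header="placeholder"):
    """Write sentences one per line, preceded by a throwaway first line.

    Assumption: the test file format is one sentence per line. The header
    text is arbitrary because the first line is ignored at test time.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(header + "\n")
        for sentence in sentences:
            # Collapse internal whitespace so each sentence stays on one line.
            f.write(" ".join(sentence.split()) + "\n")
```

The resulting path can then be used as the test sentences file described above.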

For users without GPU/CUDA

Please replace the,, with the files in