Domain Agnostic Real-Valued Specificity Prediction

Code for Wei-Jen Ko, Greg Durrett and Junyi Jessy Li, "Domain Agnostic Real-Valued Specificity Prediction", The AAAI Conference on Artificial Intelligence (AAAI), 2019

This is a text specificity predictor for any domain.

Citation:

@InProceedings{ko2019domain,
  author    = {Ko, Wei-Jen and Durrett, Greg and Li, Junyi Jessy},
  title     = {Domain Agnostic Real-Valued Specificity Prediction},
  booktitle = {AAAI},
  year      = {2019},
}

Dependencies

- PyTorch

- NumPy

Data and resources

The GloVe vector file (glove.840B.300d) is required. Download it and set the GloVe path in train.py and test.py.

wget https://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
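To sanity-check the download before setting the path, the vectors can be read with a few lines of Python. This is a minimal sketch, not the loader used in train.py or test.py: the function name load_glove and its vocab parameter are hypothetical, and it assumes the standard GloVe text layout of one token followed by 300 space-separated floats per line.

```python
def load_glove(path, dim=300, vocab=None):
    """Read a GloVe text file into a dict mapping token -> list of floats.

    Hypothetical helper for inspection only; it is not part of this repository.
    If `vocab` is given, only tokens in that set are kept, which saves memory.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) <= dim:
                continue  # skip empty or malformed lines
            # A few tokens in glove.840B.300d contain spaces, so treat the
            # last `dim` fields as the vector and everything before as the token.
            word = " ".join(parts[:-dim])
            if vocab is not None and word not in vocab:
                continue
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

For example, load_glove("glove.840B.300d.txt", vocab={"the", "movie"}) would keep only those two tokens, assuming the archive unpacks to that file name. A vector can be converted with numpy.asarray if an array is needed.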

The Twitter, Yelp, and movies data and annotations used in the paper are in dataset/pdtb.

Running

Training command:

python train.py --gpu_id 0 --test_data twitter

Testing command:

python test.py --gpu_id 0 --test_data twitter
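The two commands can also be run back to back from a small Python wrapper. The sketch below is only an illustration: build_command and train_then_test are hypothetical helpers, and the only flags used are the --gpu_id and --test_data flags shown above. Whether values other than twitter are accepted by --test_data is not stated here, so treat any other value as an assumption.

```python
import subprocess
import sys


def build_command(script, test_data, gpu_id=0):
    """Return the argument list for train.py or test.py (hypothetical helper)."""
    return [sys.executable, script, "--gpu_id", str(gpu_id), "--test_data", test_data]


def train_then_test(test_data, gpu_id=0):
    """Run train.py, then test.py, stopping if either exits with an error."""
    for script in ("train.py", "test.py"):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(script, test_data, gpu_id), check=True)
```

Calling train_then_test("twitter") from the repository root would be equivalent to running the two commands above in order.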

Using it on a new domain

To use it on a new domain, unlabeled sentences from the new domain are required.

Set s1['unlab']['path'] in data2.py and the path of xsu in train.py and test.py to the unlabeled data file.

Also set s1['test']['path'] in data2.py and the path of xst in test.py to the test sentences file.

The first line of the test data file is ignored, so the first test sentence should start on the second line.
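Because the first line is skipped, one way to avoid losing a sentence is to write a placeholder line at the top of the test file. The following is a minimal sketch under stated assumptions: write_sentence_file is a hypothetical helper, the header text is arbitrary, and the one-sentence-per-line layout is an assumption about the expected file format rather than something documented here.

```python
def write_sentence_file(path, sentences, header="sentence"):
    """Write a placeholder header on line 1, then one sentence per line.

    Hypothetical helper; one sentence per line is an assumed file layout.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(header + "\n")
        for sentence in sentences:
            # Collapse internal newlines and repeated whitespace so that
            # each sentence occupies exactly one line.
            f.write(" ".join(sentence.split()) + "\n")
```

The resulting path is what s1['test']['path'] in data2.py and the xst path in test.py would point to.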