Denoising methods for distantly-labeled entity typing data

Learning to Denoise Distantly-Labeled Data for Entity Typing

This is a PyTorch implementation of the fine-grained entity typing system presented in the NAACL 2019 paper Learning to Denoise Distantly-Labeled Data for Entity Typing.

Dependencies

The code was developed with Python 3.6 and PyTorch 0.4.0. We use spaCy to preprocess data.

Data

The ultra-fine entity typing dataset is available here. Download the data folder from here. Modify ./resources/constant.py accordingly to make sure that all paths point to the right directories.

Preprocessing

Our models require mention headwords. See ./data_tools/add_tree.py for how to add headwords to the original data. ./data/crowd contains the preprocessed, manually annotated data.
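As a rough illustration of what a mention headword is, the sketch below picks the head token of a mention span from a dependency parse. It is not taken from ./data_tools/add_tree.py, which may work differently. The function name mention_headword, the heads list, and the toy sentence are all hypothetical. The only convention assumed is that the root token points to itself, which matches how spaCy exposes token.head.

```python
from typing import List


def mention_headword(heads: List[int], start: int, end: int) -> int:
    """Return the index of the head token of the span [start, end).

    heads[i] is the index of token i's syntactic parent. The sentence root
    points to itself (assumed convention, as with spaCy's token.head.i).
    """
    if not 0 <= start < end <= len(heads):
        raise ValueError("span must be non-empty and inside the sentence")
    for i in range(start, end):
        parent = heads[i]
        # The first token whose parent is outside the span (or which is the
        # sentence root) is treated as the head of the span.
        if parent == i or not (start <= parent < end):
            return i
    # Not reachable for a well-formed tree; fall back to the last token.
    return end - 1


# Hand-written toy parse, for illustration only.
tokens = ["The", "former", "president", "of", "France", "spoke"]
heads = [2, 2, 5, 2, 3, 5]
idx = mention_headword(heads, 0, 5)
print(tokens[idx])
```

With spaCy, a heads list of this shape could be built as [t.head.i for t in doc], assuming a pipeline with a dependency parser is loaded.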

Training Models

Ultra-Fine Entity Typing

Entity Typing Model:

python3 main.py et_model -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type et_model -remove_el -remove_open

Relabeling Model:

python3 main.py labeler -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type labeler -remove_el -remove_open -mode train_labeler

Filtering Model:

python3 main.py filter -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type filter -remove_el -remove_open -mode train_labeler

BERT:

python3 main.py bert_uncased_small -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type bert_uncase_small -remove_el -remove_open
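The training commands above share most of their flags, so a small helper can assemble them when launching several runs from a script. This is only a convenience sketch: the helper name training_command and the SHARED_FLAGS constant are hypothetical, and the flags themselves are copied verbatim from the commands in this README.

```python
from typing import List, Optional

# Flags that appear in every training command in this README.
SHARED_FLAGS = [
    "-enhanced_mention", "-data_setup", "joint", "-add_crowd", "-multitask",
    "-mention_lstm", "-add_headword_emb",
]


def training_command(model_id: str, model_type: str,
                     mode: Optional[str] = None) -> List[str]:
    """Build the argument list for one training run (hypothetical helper)."""
    cmd = ["python3", "main.py", model_id]
    cmd += SHARED_FLAGS
    cmd += ["-model_type", model_type, "-remove_el", "-remove_open"]
    if mode is not None:
        # The relabeling and filtering models also pass -mode train_labeler.
        cmd += ["-mode", mode]
    return cmd


et_cmd = training_command("et_model", "et_model")
labeler_cmd = training_command("labeler", "labeler", mode="train_labeler")
print(" ".join(et_cmd))
print(" ".join(labeler_cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root.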

Ontonotes

Coming soon...

Evaluating Models

Once you have trained an entity typing model, you can evaluate it on the dev/test set with the command below. [MODEL NAME] is the model file (without suffix).

Entity Typing Model:

python3 main.py et_model_eval -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type et_model -mode test -reload_model_name [MODEL NAME] -eval_data crowd/dev_tree.json -load

Ontonotes

Coming soon...

Denoising Data

Once the filtering and relabeling models are trained, you can run them on the dataset of your choice. [MODEL NAME] is the model file (without suffix). [DATA FILE NAME] is the data file that you want to denoise.

Filtering Model:

python3 -u main.py filter_eval -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type filter -mode test_labeler -reload_model_name [MODEL NAME] -eval_data [DATA FILE NAME] -load

After running this command, filter_eval.json will be saved in the current directory. The model predictions are stored with the pred key.
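A minimal sketch for reading the predictions back is shown below. The README does not specify the layout of filter_eval.json, so the loader accepts either a JSON array or one JSON object per line. Both layouts are assumptions, as are the helper name load_records and the made-up demo records, where pred is shown as a list of type strings.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a JSON array file or a JSON-lines file (assumed layouts)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


# Toy demonstration with made-up records (not real model output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "filter_eval.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"pred": ["person"]}\n{"pred": ["location", "city"]}\n')
    records = load_records(demo_path)

# Collect the values stored under the pred key.
predictions = [r["pred"] for r in records if "pred" in r]
print(len(predictions))
```

If the real file turns out to be a single object keyed by example ID, the loader would return it as one record and the comprehension would need adjusting.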

Relabeling Model:

python3 -u main.py labeler_eval -enhanced_mention -data_setup joint -add_crowd -multitask -mention_lstm -add_headword_emb -model_type labeler -mode test_labeler -reload_model_name [MODEL NAME] -eval_data [DATA FILE NAME] -load

After running this command, labeler_eval.json will be saved in the current directory. The model predictions are stored with the cls_pred key (1 if the example is classified as a bad example, 0 otherwise).
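One possible way to use the cls_pred flag is to drop every example marked 1 before retraining. The sketch below assumes each record is a dictionary with an integer cls_pred field. The helper name drop_flagged, the id field, and the toy records are hypothetical and not part of this repository.

```python
from typing import Any, Dict, List


def drop_flagged(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only records whose cls_pred is not 1 (1 marks a bad example)."""
    return [r for r in records if r.get("cls_pred") != 1]


# Made-up records for illustration; the "id" field is hypothetical.
toy = [
    {"id": "a", "cls_pred": 0},
    {"id": "b", "cls_pred": 1},
    {"id": "c", "cls_pred": 0},
]
kept = drop_flagged(toy)
print([r["id"] for r in kept])
```

Records without a cls_pred key are kept by this sketch. Whether that is the right default depends on how the output file is actually structured.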

Questions

Contact us at yasumasa@cs.utexas.edu if you have any questions!

Acknowledgements

Our code is largely borrowed from Eunsol Choi's implementation.

GitHub: https://github.com/uwnlp/open_type

Paper: https://homes.cs.washington.edu/~eunsol/papers/acl_18.pdf
