This repository contains the code and datasets for the paper "De-biasing Distantly Supervised Named Entity Recognition via Causal Intervention". Our work builds on BOND and PU-Learning, so our code is adapted from theirs.
We use the data from BOND. We built several sub-dictionaries by sampling entities from the global dictionary, where the probability of each entity being sampled corresponds to its utterance frequency. As a result, each dataset directory under DSCAU/dataset contains several train*.json files, each generated from a single sub-dictionary.
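The sampling step described above could be sketched roughly as follows. This is only an illustration, not the script used to build the released train*.json files: the function name `sample_sub_dictionary`, the toy entity counts, and the choice of Efraimidis-Spirakis weighted sampling without replacement are all assumptions made for this example.

```python
import random


def sample_sub_dictionary(freqs, size, seed=0):
    """Draw up to `size` distinct entities, favoring frequent ones.

    Hypothetical helper: uses Efraimidis-Spirakis weighted sampling
    without replacement, where each entity gets the key u ** (1 / weight)
    and the entities with the largest keys are kept.
    """
    rng = random.Random(seed)
    keyed = []
    for entity, freq in freqs.items():
        if freq <= 0:
            continue  # entities with no observed frequency are never sampled
        keyed.append((rng.random() ** (1.0 / freq), entity))
    keyed.sort(reverse=True)
    return {entity for _, entity in keyed[:size]}


# Toy global dictionary with made-up frequencies (illustration only).
global_dictionary = {"London": 50, "Paris": 30, "Reuters": 15, "Zagreb": 4, "Oslo": 1}

# One sub-dictionary per seed; each would label one train*.json file.
sub_dictionaries = [
    sample_sub_dictionary(global_dictionary, size=3, seed=s) for s in range(4)
]
```

Different seeds give different sub-dictionaries, and frequent entities tend to appear in more of them than rare ones.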
The code requires Python 3.7. DSCAU/requirements_bond.txt lists the environment for BOND, and DSCAU/requirements_pul.txt lists the environment for PU-Learning.
To train BOND:
cd DSCAU/BOND/
./scripts/train_conll2003.sh
./scripts/train_twitter.sh
./scripts/train_webpage.sh
./scripts/train_wikigold.sh
To train PU-Learning:
First download glove.6B.100d.txt and move it to the directory DSCAU/PUL/data_bond/
cd DSCAU/PUL/
./scripts/train_conll2003.sh
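Before starting PU-Learning training, it may be worth checking that the GloVe file is in place and well formed. The sketch below is not part of the repository: `check_glove_file` is a hypothetical helper, and it assumes the standard GloVe text layout of one token followed by 100 space-separated floats per line. The demo runs on a small synthetic file; for the real file, pass DSCAU/PUL/data_bond/glove.6B.100d.txt instead.

```python
import os
import tempfile


def check_glove_file(path, expected_dim=100, max_lines=1000):
    """Check the first `max_lines` lines of a GloVe text file.

    Hypothetical helper: each line should hold a token followed by
    `expected_dim` floats separated by single spaces.
    Returns the number of lines checked; raises ValueError otherwise.
    """
    checked = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if checked >= max_lines:
                break
            parts = line.rstrip().split(" ")
            values = parts[1:]  # parts[0] is the token itself
            if len(values) != expected_dim:
                raise ValueError(
                    f"line {line_number}: expected {expected_dim} values, "
                    f"got {len(values)}"
                )
            for value in values:
                float(value)  # raises ValueError if the value is not numeric
            checked += 1
    return checked


# Demo on a tiny synthetic file with the same layout (not real embeddings).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "glove.demo.100d.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("the " + " ".join(["0.1"] * 100) + "\n")
        handle.write("entity " + " ".join(["-0.2"] * 100) + "\n")
    checked_lines = check_glove_file(demo_path)
```

Only the first 1000 lines are checked by default, so the check stays fast on the full file.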
To evaluate BOND:
cd DSCAU/BOND/
./scripts/eval_conll2003.sh
./scripts/eval_twitter.sh
./scripts/eval_webpage.sh
./scripts/eval_wikigold.sh
To evaluate PU-Learning:
cd DSCAU/PUL/
./scripts/eval_conll2003.sh
You can download the following trained models and set SAVED_DIR in eval_*.sh to the path of the downloaded model to obtain the results.
Model | CoNLL03 | Twitter | Webpage | Wikigold |
---|---|---|---|---|
BOND | Download | Download | Download | Download |
PU-Learning | Download | - | - | - |
Please cite our ACL 2021 paper:
@inproceedings{zhang-etal-2021-de,
title = "De-biasing Distantly Supervised Named Entity Recognition via Causal Intervention",
author = "Zhang, Wenkai and
Lin, Hongyu and
Han, Xianpei and
Sun, Le",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
year = "2021",
publisher = "Association for Computational Linguistics",
pages = "4803--4813"
}