Code for DiffusionNER: Boundary Diffusion for Named Entity Recognition, Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang, ACL 2023.
During training, DiffusionNER gradually adds noise to the gold entity boundaries through a fixed forward diffusion process and learns a reverse diffusion process that recovers them. At inference time, DiffusionNER first samples noisy spans from a standard Gaussian distribution and then generates named entities by denoising those spans with the learned reverse diffusion process.
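The forward noising step and the Gaussian starting point for inference can be illustrated with a small NumPy sketch. This is not the code in this repository: the cosine noise schedule, the rescaling of token indices to [-1, 1], and the names `cosine_alpha_bar` and `forward_diffuse` are all assumptions chosen for illustration, following the generic DDPM formulation q(x_t | x_0) = N(sqrt(a_t) x_0, (1 - a_t) I).

```python
import numpy as np


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal rate a_t for t = 0..num_steps (assumed cosine schedule)."""
    t = np.linspace(0, num_steps, num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]


def forward_diffuse(spans: np.ndarray, t: int, alpha_bar: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) * x_0, (1 - a_t) * I)."""
    noise = rng.standard_normal(spans.shape)
    return np.sqrt(alpha_bar[t]) * spans + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
alpha_bar = cosine_alpha_bar(num_steps=1000)

# Two hypothetical gold spans in a 10-token sentence, given as (start, end)
# token indices and rescaled to [-1, 1] (an assumed normalization).
sentence_len = 10
gold = np.array([[1, 3], [6, 9]], dtype=float)
x0 = (gold / (sentence_len - 1)) * 2.0 - 1.0

# Training side: boundaries at a small t stay close to the gold spans,
# while boundaries at the final step are essentially pure noise.
x_early = forward_diffuse(x0, t=10, alpha_bar=alpha_bar, rng=rng)
x_late = forward_diffuse(x0, t=1000, alpha_bar=alpha_bar, rng=rng)

# Inference side: start from spans drawn from a standard Gaussian; a learned
# reverse process (not shown here) would denoise them step by step.
num_proposals = 5
x_T = rng.standard_normal((num_proposals, 2))
```

The reverse process is deliberately left out, since it depends on the trained model.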
To run our code, create a conda environment and install the dependencies:
conda create -n diffusionner python=3.8
conda activate diffusionner
pip install -r requirements.txt
Nested NER:
- ACE04: https://catalog.ldc.upenn.edu/LDC2005T09
- ACE05: https://catalog.ldc.upenn.edu/LDC2006T06
- GENIA: http://www.geniaproject.org/genia-corpus
Flat NER:
- OntoNotes: https://github.com/yhcc/OntoNotes-5.0-NER
- CoNLL03: https://data.deepai.org/conll2003.zip
- MSRA: https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/master/ner2mrc/download.md
We provide the preprocessed datasets at these links: ACE2004, GENIA, CoNLL03, MSRA. Please download them and put them into the data/datasets folder.
If you need other datasets, please contact me by email (syl@zju.edu.cn). Note that you need to state your identity and prove that you have obtained the license.
Take the ACE2004 dataset as a demo and run:
CUBLAS_WORKSPACE_CONFIG=:4096:8 python diffusionner.py train --config configs/ace2004.conf
We also provide the pre-trained checkpoints at these links: ACE2004, GENIA, CoNLL03, MSRA. Please download them and put them into the data/checkpoints folder.
Set model_path in configs/eval.conf to the path of the model checkpoint and run:
CUBLAS_WORKSPACE_CONFIG=:4096:8 python diffusionner.py eval --config configs/eval.conf
If you find this repository useful, please cite our paper:
@article{Shen2023DiffusionNERBD,
title={DiffusionNER: Boundary Diffusion for Named Entity Recognition},
author={Yongliang Shen and Kaitao Song and Xu Tan and Dongsheng Li and Weiming Lu and Yueting Zhuang},
journal={ArXiv},
year={2023},
volume={abs/2305.13298}
}