Chinese Named Entity Recognition Using Neural Network

zerohd4869/Chinese-NER

Chinese NER Using Neural Network

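Task Overview

Named Entity Recognition (NER) involves determining entity boundaries and identifying entity categories. It is a foundational task in natural language processing (NLP).

This project targets the Chinese NER task and reproduces several models, including BiLSTM-CRF, Lattice LSTM, LR-CNN, and WC-LSTM.

In addition, the source code for the graph-based model LGN is available on GitHub, and the source code for the sequence-based model SLK-NER is also available on GitHub.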

Running the Project

Requirements

PyTorch v0.4.0
Python v3.6.2
numpy
tqdm

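Data Preparation

The Resume dataset is an open-source dataset of resumes collected from Sina Finance by Yue et al. It consists mainly of resumes of senior executives at companies listed on the Chinese stock market. It can be obtained from [Yang et al., 2018] and should be placed in the directory ./data/resume.

Dataset Statistics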

| Type | Train | Dev | Test |
| --- | --- | --- | --- |
| Sentence | 3.8k | 0.46k | 0.48k |
| Char | 124.1k | 13.9k | 15.1k |

Tagging scheme: BMEO

Separator: '\t' (for example, 吴 \t B-NAME)
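
The tab-separated layout above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: the function name `read_tagged_sentences` is hypothetical, and it assumes that sentences are separated by blank lines, which the README does not state. Apart from the first line, the sample rows are illustrative.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, tag) pairs


def read_tagged_sentences(lines: Iterable[str], sep: str = "\t") -> List[Sentence]:
    """Group 'char<sep>tag' lines into sentences.

    Assumption: a blank line marks the end of a sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        char, tag = line.split(sep, 1)
        current.append((char.strip(), tag.strip()))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample in the documented format (character, tab, tag).
sample = ["吴\tB-NAME\n", "某\tM-NAME\n", "某\tE-NAME\n", "\n", "中\tO\n"]
parsed = read_tagged_sentences(sample)
print(parsed)
```

To read a real file, pass an open file handle, for example `read_tagged_sentences(open(path, encoding="utf-8"))`, where `path` points to a file under ./data/resume.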

Entity types:

The dataset was manually annotated with 8 types of named entities using the YEDDA system [Yang et al., 2018].

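| Tag | Meaning | Train | Dev | Test |
| --- | --- | --- | --- | --- |
| CONT | Country | 260 | 33 | 28 |
| EDU | Educational Institution | 858 | 106 | 112 |
| LOC | Location | 47 | 2 | 6 |
| NAME | Personal Name | 952 | 110 | 112 |
| ORG | Organization | 4611 | 523 | 553 |
| PRO | Profession | 287 | 18 | 33 |
| RACE | Ethnic Background | 115 | 15 | 14 |
| TITLE | Job Title | 6308 | 690 | 772 |
| Total Entity | --- | 13438 | 1497 | 1630 |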

See the directory data/resume for details.
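
Entity counts like those in the table are obtained by turning per-character tags back into entity spans. The sketch below shows one way to do that for B-/M-/E- prefixed tags. It is an illustration rather than the repository's code: the function name `extract_entities` is hypothetical, and the handling of an `S-` prefix for single-character entities is an assumption, since the README only names the BMEO scheme.

```python
from collections import Counter
from typing import List, Optional, Tuple


def extract_entities(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_inclusive, type) spans from a tag sequence."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Assumption: S- marks a single-character entity.
            spans.append((i, i, label))
            start = None
        elif prefix == "B":
            # Begin a new entity; any unfinished one is discarded.
            start, etype = i, label
        elif prefix == "M" and start is not None and label == etype:
            continue  # still inside the current entity
        elif prefix == "E" and start is not None and label == etype:
            spans.append((start, i, label))
            start = None
        else:
            # "O" or an inconsistent tag: drop any open entity.
            start = None
    return spans


tags = ["B-NAME", "M-NAME", "E-NAME", "O", "B-TITLE", "E-TITLE"]
entities = extract_entities(tags)
counts = Counter(label for _, _, label in entities)
print(entities, dict(counts))
```

Summing such counts over every sentence in a split gives per-type totals that can be compared against the table.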

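Loading Pretrained Embeddings

The pretrained embeddings are taken from the baseline of the word segmentor RichWordSegmentor [Yang et al., 2017a].

Model Training

The parameter configuration files are ./*.conf. Example run:

python main.py --conf_path ./wclstm_ner.conf # conf_path: path to the configuration file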

Model Evaluation and Prediction

Set the status parameter to test in the configuration file ./*.conf, then run, for example:

python main.py --conf_path ./wclstm_ner.conf
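
For readers who want a picture of how a `--conf_path` flag and a `status` parameter could fit together, here is a rough sketch. It does not reproduce the project's main.py. The README does not describe the .conf file format, so the `key=value` layout with `#` comments is purely an assumption, and `parse_conf` and `build_parser` are hypothetical helper names.

```python
import argparse
from typing import Dict


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines (an assumed format), ignoring '#' comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chinese NER runner (sketch)")
    parser.add_argument("--conf_path", required=True,
                        help="path to the configuration file")
    return parser


# Simulate: python main.py --conf_path ./wclstm_ner.conf
args = build_parser().parse_args(["--conf_path", "./wclstm_ner.conf"])

# Hypothetical file content; a real run would read args.conf_path instead.
conf = parse_conf("status=test  # switch to evaluation\n")

if conf.get("status") == "test":
    print("would evaluate using", args.conf_path)
else:
    print("would train using", args.conf_path)
```

Keeping the mode in the configuration file rather than on the command line means the same command serves both training and evaluation, which matches the two identical invocations shown in this README.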

Experimental Results

Results on the Resume dataset are shown in the table below:

| Models | P | R | F1 |
| --- | --- | --- | --- |
| BiLSTM-CRF [Lample et al., 2016] | 93.7 | 93.3 | 93.5 |
| BiLSTM-CRF + bichar [Yang et al., 2017a] | 93.9 | 94.1 | 94.0 |
| CAN [Zhu et al., 2019] | 95.1 | 94.8 | 94.9 |
| BERT [Devlin et al., 2019] | 94.2 | 95.8 | 95.0 |
| Lattice LSTM [Yang et al., 2018] | 94.8 | 94.1 | 94.5 |
| LR-CNN [Gui et al., 2019] | 95.4 | 94.8 | 95.1 |
| WC-LSTM [Liu et al., 2019] | 95.3 | 95.2 | 95.2 |
| LGN [Gui et al., 2019] | 95.3 | 95.5 | 95.4 |
| SLK-NER [Hu et al., 2020] | 95.2 | 96.4 | 95.8 |
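
The P, R, and F1 columns appear to follow the usual relationship in which F1 is the harmonic mean of precision and recall. The README does not state this, so treat it as an assumption. The short sketch below recomputes F1 from two rows of the table and prints it next to the reported value; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the results table.
reported = {
    "WC-LSTM": (95.3, 95.2, 95.2),
    "SLK-NER": (95.2, 96.4, 95.8),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed {f1_score(p, r):.2f}, reported {f1}")
```

Because the table lists rounded P and R values, the recomputed figure can differ from the reported F1 in the last digit.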

References

[1] Jie Yang, Yue Zhang, Linwei Li, and Xingxuan Li. 2018. Yedda: A lightweight collaborative text span annotation tool. In ACL. Demonstration.

[2] Jie Yang, Zhiyang Teng, Meishan Zhang, and Yue Zhang. 2016. Combining discrete and neural features for sequence labeling. In CICLing.

[3] Ma, Xuezhe, and Eduard Hovy. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. Strubell, E., Verga, P., Belanger, D., & McCallum, A. (2017). Fast and accurate entity recognition with iterated dilated convolutions.

[4] Lample, Guillaume, et al. Neural Architectures for Named Entity Recognition. Proceedings of NAACL-HLT. 2016.

[5] Yang, Jie, Yue Zhang, and Fei Dong. Neural Word Segmentation with Rich Pretraining. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017.

[6] Yuying Zhu and Guoxin Wang. Can-ner: Convolutional attention network for chinese named entity recognition. In NAACL, pages 3384–3393, 2019.

[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186, Minneapolis, June 2019.

[8] Zhang, Yue, and Jie Yang. Chinese NER Using Lattice LSTM. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.

[9] Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, and Xuanjing Huang. 2019. CNN-Based Chinese NER with Lexicon Rethinking. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019), August 10-16.

[10] Liu, Wei, et al. An Encoding Strategy Based Word-Character LSTM for Chinese NER. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

[11] Tao Gui, Yicheng Zou, Qi Zhang, Minlong Peng, Jinlan Fu, Zhongyu Wei, and Xuan-Jing Huang. A lexicon-based graph neural network for chinese ner. In EMNLP-IJCNLP, pages 1039–1049, 2019.

[12] Dou Hu and Lingwei Wei. "SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER." The 32nd International Conference on Software & Knowledge Engineering. 2020.
