Named Entity Recognition (NER) involves determining entity boundaries and classifying entities into predefined categories, and is a fundamental task in natural language processing (NLP).
This project targets Chinese NER and reproduces the BiLSTM-CRF, Lattice LSTM, LR-CNN, and WC-LSTM models.
In addition, the source code of the graph-based model LGN and the sequence-based model SLK-NER is available on GitHub.
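As a point of reference for the reproduced baselines, the sketch below shows the general shape of a character-level BiLSTM-CRF tagger. It is a minimal illustration, not this repository's implementation: it assumes the third-party pytorch-crf package (which targets newer PyTorch releases than the v0.4.0 pinned below), and the class and parameter names are made up for the example.

```python
# Minimal character-level BiLSTM-CRF sketch (illustrative, NOT this repo's code).
# Assumes the third-party `pytorch-crf` package: pip install pytorch-crf
import torch.nn as nn
from torchcrf import CRF

class CharBiLSTMCRF(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(hidden_dim, num_tags)   # per-char emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, chars, tags, mask):
        emissions = self.proj(self.lstm(self.embed(chars))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, chars, mask):
        emissions = self.proj(self.lstm(self.embed(chars))[0])
        return self.crf.decode(emissions, mask=mask)  # best tag paths (Viterbi)
```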
- PyTorch v0.4.0
- Python v3.6.2
- numpy
- tqdm
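A typical environment setup might look as follows; the exact command for PyTorch v0.4.0 depends on platform and CUDA version, so the official PyTorch installation instructions may be needed instead:

```bash
# Illustrative setup; PyTorch v0.4.0 wheels are platform- and CUDA-specific.
pip install torch==0.4.0 numpy tqdm
```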
The open Resume dataset was collected by Zhang and Yang from Sina Finance; it consists mainly of résumés of senior executives of companies listed on Chinese stock markets. It can be obtained from [Zhang and Yang, 2018]; place it under the ./data/resume directory.
Dataset statistics:
Type | Train | Dev | Test |
---|---|---|---|
Sentence | 3.8k | 0.46k | 0.48k |
Char | 124.1k | 13.9k | 15.1k |
Tagging scheme: BMEO
Separator: '\t' (e.g., 吴\tB-NAME)
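Given this one-character-per-line, tab-separated format with blank lines between sentences, a minimal reader could look like the sketch below; the function name and file path are illustrative, not part of this repository:

```python
# Minimal reader for the char<TAB>tag format (illustrative, not this repo's loader).
def load_sentences(path):
    """Return a list of (chars, tags) pairs, one per sentence."""
    sentences, chars, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                    # blank line ends a sentence
                if chars:
                    sentences.append((chars, tags))
                    chars, tags = [], []
                continue
            char, tag = line.split("\t")    # e.g. "吴\tB-NAME"
            chars.append(char)
            tags.append(tag)
    if chars:                               # file may lack a trailing blank line
        sentences.append((chars, tags))
    return sentences

# Usage (hypothetical file name):
# train = load_sentences("./data/resume/train.char")
```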
Entity types: the dataset was manually annotated with 8 types of named entities using the YEDDA system [Yang et al., 2018].
Tag | Meaning | Train | Dev | Test |
---|---|---|---|---|
CONT | Country | 260 | 33 | 28 |
EDU | Educational Institution | 858 | 106 | 112 |
LOC | Location | 47 | 2 | 6 |
NAME | Personal Name | 952 | 110 | 112 |
ORG | Organization | 4611 | 523 | 553 |
PRO | Profession | 287 | 18 | 33 |
RACE | Ethnic Background | 115 | 15 | 14 |
TITLE | Job Title | 6308 | 690 | 772 |
Total | | 13438 | 1497 | 1630 |
See the ./data/resume directory for details.
The pretrained embeddings use the baseline of the word segmenter RichWordSegmentor [Yang et al., 2017a].
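Assuming the common text format for pretrained vectors (one token followed by its vector values per line), an embedding matrix can be built roughly as follows; the function name, file layout, and initialization scale are assumptions for illustration:

```python
import numpy as np

# Sketch: build an embedding matrix from a text-format vector file, i.e. one
# "token v1 v2 ... vd" entry per line (illustrative, not this repo's code).
def load_pretrained(path, vocab, dim=50, scale=0.25):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != dim + 1:       # skip headers / malformed lines
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    # Tokens missing from the file get small random vectors.
    matrix = np.random.uniform(-scale, scale, (len(vocab), dim)).astype(np.float32)
    for token, idx in vocab.items():
        if token in vectors:
            matrix[idx] = vectors[token]
    return matrix
```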
Hyperparameters are set in the ./*.conf configuration files. Example run:
python main.py --conf_path ./wclstm_ner.conf # conf_path: path to the configuration file
To test, set the parameter status to test in the configuration file ./*.conf, then run:
python main.py --conf_path ./wclstm_ner.conf
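For orientation, a configuration file might look roughly like the excerpt below; only the status parameter is documented here, so everything else is a hypothetical placeholder:

```
# Hypothetical excerpt of a *.conf file. Only `status` is documented above;
# data paths, embedding paths, and hyperparameters are repo-specific keys.
status=test
```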
Results on the Resume dataset:
Models | P | R | F1 |
---|---|---|---|
BiLSTM-CRF [Lample et al., 2016] | 93.7 | 93.3 | 93.5 |
BiLSTM-CRF + bichar [Yang et al., 2017a] | 93.9 | 94.1 | 94.0 |
CAN [Zhu et al., 2019] | 95.1 | 94.8 | 94.9 |
BERT [Devlin et al., 2019] | 94.2 | 95.8 | 95.0 |
Lattice LSTM [Zhang and Yang, 2018] | 94.8 | 94.1 | 94.5 |
LR-CNN [Gui et al., 2019] | 95.4 | 94.8 | 95.1 |
WC-LSTM [Liu et al., 2019] | 95.3 | 95.2 | 95.2 |
LGN [Gui et al., 2019] | 95.3 | 95.5 | 95.4 |
SLK-NER [Hu et al., 2020] | 95.2 | 96.4 | 95.8 |
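The scores above are entity-level micro-averaged precision, recall, and F1: a predicted entity counts as correct only if both its span and its type match the gold annotation. A minimal computation over BMEO tag sequences might look like this sketch (illustrative, not the repository's evaluation code):

```python
# Sketch: entity-level P/R/F1 over BMEO tag sequences (illustrative).
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in one sentence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("M-"):
            if etype != tag[2:]:            # inconsistent entity, discard it
                start, etype = None, None
        elif tag.startswith("E-"):
            if start is not None and etype == tag[2:]:
                spans.add((start, i, etype))
            start, etype = None, None
        else:                               # "O" (an S- tag in BMES-style data
            start, etype = None, None       # would need one more case)
    return spans

def micro_prf1(gold_seqs, pred_seqs):
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```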
References
[1] Jie Yang, Yue Zhang, Linwei Li, and Xingxuan Li. 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. In ACL (System Demonstrations).
[2] Jie Yang, Zhiyang Teng, Meishan Zhang, and Yue Zhang. 2016. Combining Discrete and Neural Features for Sequence Labeling. In CICLing.
[3] Xuezhe Ma and Eduard Hovy. 2016. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. In ACL.
[4] Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. 2017. Fast and Accurate Entity Recognition with Iterated Dilated Convolutions. In EMNLP.
[5] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. In NAACL-HLT.
[6] Jie Yang, Yue Zhang, and Fei Dong. 2017. Neural Word Segmentation with Rich Pretraining. In ACL.
[7] Yuying Zhu and Guoxin Wang. 2019. CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition. In NAACL.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.
[9] Yue Zhang and Jie Yang. 2018. Chinese NER Using Lattice LSTM. In ACL.
[10] Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, and Xuanjing Huang. 2019. CNN-Based Chinese NER with Lexicon Rethinking. In IJCAI.
[11] Wei Liu et al. 2019. An Encoding Strategy Based Word-Character LSTM for Chinese NER. In NAACL-HLT.
[12] Tao Gui, Yicheng Zou, Qi Zhang, Minlong Peng, Jinlan Fu, Zhongyu Wei, and Xuanjing Huang. 2019. A Lexicon-Based Graph Neural Network for Chinese NER. In EMNLP-IJCNLP.
[13] Dou Hu and Lingwei Wei. 2020. SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER. In the 32nd International Conference on Software Engineering and Knowledge Engineering (SEKE).