GPT-NER: Named Entity Recognition via Large Language Models

Introduction

This repo contains code for the paper GPT-NER: Named Entity Recognition via Large Language Models.

@article{wang2023gpt,
  title={GPT-NER: Named Entity Recognition via Large Language Models},
  author={Wang, Shuhe and Sun, Xiaofei and Li, Xiaoya and Ouyang, Rongbin and Wu, Fei and Zhang, Tianwei and Li, Jiwei and Wang, Guoyin},
  journal={arXiv preprint arXiv:2304.10428},
  year={2023}
}

Usage

Requirements

  • python>=3.7.3
  • openai==0.27.2
  • simcse==0.4

This repo mainly uses two additional packages: SimCSE and OpenAI. If you want to know more about the arguments used in the code, please refer to the corresponding documentation.

Proposed Dataset

For the full NER datasets, we follow MRC-NER for preprocessing, and you can directly download the preprocessed data here.

For the sampled 100-piece datasets, we have put them on Google Drive.

Few-shot Demonstrations Retrieval

For sentence-level embeddings, run openai_access/extract_mrc_knn.py.

Note that you should change the paths of the input/output files and the SimCSE model to use. In this repo, sup-simcse-roberta-large is used as the SimCSE model, and you can find it here.
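As a rough illustration of this retrieval step, the sketch below indexes a pool of training sentences with the SimCSE tool and retrieves the most similar ones for a test sentence. The example sentences and the top_k/threshold values are placeholders, and the actual input/output handling lives in openai_access/extract_mrc_knn.py.

```python
# Minimal sketch of sentence-level kNN demonstration retrieval with SimCSE.
from simcse import SimCSE

# sup-simcse-roberta-large is the checkpoint used in this repo.
model = SimCSE("princeton-nlp/sup-simcse-roberta-large")

# Candidate pool: sentences from the training set (placeholders here).
train_sentences = [
    "EU rejects German call to boycott British lamb .",
    "Japan coach Shu Kamo said : ' The Syrian own goal proved lucky for us . '",
    "Peter Blackburn",
]
test_sentence = "Japan began the defence of their Asian Cup title ."  # query

# Index the pool once, then retrieve the k most similar training sentences
# to use as few-shot demonstrations for the query sentence.
model.build_index(train_sentences)
neighbours = model.search(test_sentence, top_k=2, threshold=0.0)
for sentence, score in neighbours:  # [(sentence, similarity), ...]
    print(f"{score:.3f}\t{sentence}")
```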

OpenAI Access

We follow the official steps to access the GPT-* models, and the documentation can be found here. Before running our scripts, you need to add OPENAI_API_KEY, which you can find in your account profile, to the environment variables with the command export OPENAI_API_KEY="YOUR_KEY".
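For reference, the sketch below shows a single completion request with openai==0.27.2 once the key is set. The prompt and the text-davinci-003 model name are illustrative assumptions; the repo's actual prompts and request arguments are constructed in the scripts described next.

```python
# Minimal sketch of a GPT-3 completion call with the legacy openai==0.27.2 API.
import os
import openai

# The key is read from the OPENAI_API_KEY environment variable set above.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Illustrative NER-style prompt; the real prompts also include the
# retrieved few-shot demonstrations.
prompt = (
    "I am an excellent linguist. The task is to label location entities "
    "in the given sentence.\n"
    "Input: Japan began the defence of their Asian Cup title .\n"
    "Output:"
)

response = openai.Completion.create(
    model="text-davinci-003",  # assumed GPT-3 completion model
    prompt=prompt,
    max_tokens=64,
    temperature=0.0,
)
print(response["choices"][0]["text"].strip())
```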

To get predictions, please run openai_access/scripts/access_ai.sh; the arguments used are listed in openai_access/get_results_mrc_knn.py.

For self-verification, please run openai_access/scripts/verify.sh; the arguments used are listed in openai_access/verify_results.py.

Note that accessing GPT-3 is very expensive, so we strongly advise you to start from our sampled 100-piece datasets.

Evaluate

We use span-level precision, recall, and F1-score for evaluation; to compute them, please run the script openai_access/scripts/compute_f1.sh.
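For clarity on what "span-level" means here, the sketch below computes precision, recall, and F1 over exact (start, end, label) matches. It is only an illustration under that assumption; the repo's own scorer is invoked through compute_f1.sh.

```python
# Minimal sketch of span-level precision/recall/F1 over exact matches.
def span_f1(gold_spans, pred_spans):
    """gold_spans / pred_spans: one set of (start, end, label) tuples per sentence."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_spans, pred_spans):
        tp += len(gold & pred)  # spans with correct boundaries and label
        fp += len(pred - gold)  # predicted spans not in the gold annotation
        fn += len(gold - pred)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: two gold entities, one predicted correctly, one with a wrong boundary.
gold = [{(0, 1, "LOC"), (5, 7, "PER")}]
pred = [{(0, 1, "LOC"), (5, 6, "PER")}]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```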

Results

Table 1: Results of sampled 100 pieces of data for two Flat NER datasets: CoNLL2003 and OntoNotes5.0.

English CoNLL2003 (Sampled 100) English OntoNotes5.0 (Sampled 100)
Model Precision Recall F1 Precision Recall F1
Baselines (Supervised Model)
ACE+document-context 97.8 98.28 98.04 (SOTA) - - -
BERT-MRC+DSC - - - 93.81 93.95 93.88 (SOTA)
GPT-NER
GPT-3 + random retrieval 88.18 78.54 83.08 64.21 65.51 64.86
GPT-3 + sentence-level embedding 90.47 95 92.68 76.08 83.06 79.57
GPT-3 + entity-level embedding 94.06 96.54 95.3 78.38 83.9 81.14
Self-verification (zero-shot)
GPT-3 + random retrieval 88.95 79.73 84.34 64.94 65.90 65.42
GPT-3 + sentence-level embedding 91.77 96.36 94.01 77.33 83.29 80.31
GPT-3 + entity-level embedding 94.15 96.77 95.46 79.05 83.71 81.38
Self-verification (few-shot)
GPT-3 + random retrieval 90.04 80.14 85.09 65.21 66.25 65.73
GPT-3 + sentence-level embedding 92.92 95.45 94.17 77.64 83.22 80.43
GPT-3 + entity-level embedding 94.73 96.97 95.85 79.25 83.73 81.49

Table 2: Results of full data for two Flat NER datasets: CoNLL2003 and OntoNotes5.0.

English CoNLL2003 (FULL) English OntoNotes5.0 (FULL)
Model Precision Recall F1 Precision Recall F1
Baselines (Supervised Model)
BERT-Tagger - - 92.8 90.01 88.35 89.16
BERT-MRC 92.33 94.61 93.04 92.98 89.95 91.11
GNN-SL 93.02 93.40 93.2 91.48 91.29 91.39
ACE+document-context - - 94.6 (SOTA) - - -
BERT-MRC+DSC 93.41 93.25 93.33 91.59 92.56 92.07 (SOTA)
GPT-NER
GPT-3 + random retrieval 77.04 68.69 72.62 53.8 59.36 56.58
GPT-3 + sentence-level embedding 81.04 88.00 84.36 66.87 73.77 70.32
GPT-3 + entity-level embedding 88.54 91.4 89.97 74.17 79.29 76.73
Self-verification (zero-shot)
GPT-3 + random retrieval 77.13 69.23 73.18 54.14 59.44 56.79
GPT-3 + sentence-level embedding 83.31 88.11 85.71 67.29 73.81 70.55
GPT-3 + entity-level embedding 89.47 91.77 90.62 74.64 79.52 77.08
Self-verification (few-shot)
GPT-3 + random retrieval 77.50 69.38 73.44 54.23 59.65 56.94
GPT-3 + sentence-level embedding 83.73 88.07 85.9 67.35 73.79 70.57
GPT-3 + entity-level embedding 89.76 92.06 90.91 74.89 79.51 77.20

Table 3: Results of full data for three Nested NER datasets: ACE2004, ACE2005 and GENIA.

English ACE2004 (FULL) English ACE2005 (FULL) English GENIA (FULL)
Model Precision Recall F1 Precision Recall F1 Precision Recall F1
Baselines (Supervised Model)
BERT-MRC 85.05 86.32 85.98 87.16 86.59 86.88 85.18 81.12 83.75 (SOTA)
Triaffine+BERT 87.13 87.68 87.40 86.70 86.94 86.82 80.42 82.06 81.23
Triaffine+ALBERT 88.88 88.24 88.56 87.39 90.31 88.83 - - -
BINDER 88.3 89.1 88.7 (SOTA) 89.1 89.8 89.5 (SOTA) - - -
GPT-NER
GPT-3 + random retrieval 55.04 41.76 48.4 44.5 46.24 45.37 44.1 38.64 41.37
GPT-3 + sentence-level embedding 65.31 53.67 60.68 58.04 58.97 58.50 63.43 44.17 51.68
GPT-3 + entity-level embedding 72.23 75.01 73.62 71.72 74.2 73.96 61.38 66.74 64.06
Self-verification (zero-shot)
GPT-3 + random retrieval 55.44 42.22 48.83 45.06 46.62 45.84 44.31 38.79 41.55
GPT-3 + sentence-level embedding 69.64 54.98 62.31 59.49 60.17 59.83 59.54 44.26 51.9
GPT-3 + entity-level embedding 73.58 74.74 74.16 72.63 75.39 73.46 61.77 66.81 64.29
Self-verification (few-shot)
GPT-3 + random retrieval 55.63 42.49 49.06 45.49 46.73 46.11 44.68 38.98 41.83
GPT-3 + sentence-level embedding 70.17 54.87 62.52 59.69 60.35 60.02 59.87 44.39 52.13
GPT-3 + entity-level embedding 73.29 75.11 74.2 72.77 75.51 73.59 61.89 66.95 64.42

Contact

If you have any issues or questions about this repo, feel free to contact wangshuhe@stu.pku.edu.cn.
