What is named entity recognition?

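Named entity recognition (NER) is the process of detecting and classifying key pieces of information (entities) in text. It is used in applications such as information retrieval and content classification, and it serves as a preliminary step for many more complex NLP tasks.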

In [1]:
import spacy 
nlp = spacy.load('en_core_web_sm') 

In [2]:
text = "Apple is looking at buying U.K. startup for $1 billion" 
doc = nlp(text)

In [3]:
for entity in doc.ents:
    print(entity.text, entity.label_)

Apple ORG
U.K. GPE
$1 billion MONEY
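
The (text, label) pairs printed above are the kind of input that downstream uses such as content classification or retrieval typically consume. As a minimal sketch, assuming the pairs have already been extracted (they are hard-coded here from the output above, and `group_by_label` is a hypothetical helper, not part of spaCy), the entities can be grouped by label:

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (entity_text, label) pairs into a dict keyed by label."""
    grouped = defaultdict(list)
    for entity_text, label in pairs:
        grouped[label].append(entity_text)
    return dict(grouped)


# Pairs copied from the output above; in the notebook they could instead be
# built with [(ent.text, ent.label_) for ent in doc.ents].
pairs = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]
by_label = group_by_label(pairs)
print(by_label)
```

Keeping the grouping separate from the spaCy call means the same helper would work on entities from any source.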


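Customizing the NER system

Although spaCy's default NER model is robust, you may sometimes need to customize it to meet specific needs, especially when working with domain-specific text.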

In [7]:
from spacy.training import Example

# Training data: (text, annotations) pairs with character-offset entity spans
TRAIN_DATA = [
    ("Walmart is a leading e-commerce company", {"entities": [(0, 7, "ORG")]})
]

# Keep the pretrained weights and get an optimizer for further updates
optimizer = nlp.resume_training()

# Update only the NER component so the other pipes are left unchanged
with nlp.select_pipes(enable="ner"):
    for text, annotations in TRAIN_DATA:
        # Tokenize only; running the full pipeline here would attach the
        # model's own predictions to the doc
        doc = nlp.make_doc(text)

        # The gold-standard entities are taken from the annotations dict
        example = Example.from_dict(doc, annotations)

        nlp.update([example], drop=0.5, sgd=optimizer)

# Saving the updated model
nlp.to_disk("D:/xuexi/model")
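
The entity annotations in `TRAIN_DATA` are character offsets into the text with an exclusive end index, so `(0, 7, "ORG")` covers `text[0:7]`. Off-by-one offsets are easy to introduce when annotating by hand, so it can help to check them before training. A minimal sketch of such a check using plain Python slicing (the helper name `show_entity_spans` is hypothetical, not a spaCy function):

```python
def show_entity_spans(text, annotations):
    """Return (substring, label) for each (start, end, label) annotation."""
    spans = []
    for start, end, label in annotations["entities"]:
        # Reject offsets that are empty, reversed, or outside the text
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"offsets ({start}, {end}) fall outside the text")
        spans.append((text[start:end], label))
    return spans


text = "Walmart is a leading e-commerce company"
annotations = {"entities": [(0, 7, "ORG")]}
print(show_entity_spans(text, annotations))
```

If the printed substring is not exactly the intended entity, the offsets need adjusting before the example is used for training.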