**Named entity disambiguation (NED)** refers to the NLP task of achieving exactly this: assigning a unique identity to each entity mentioned in the text. It’s also the first step toward more sophisticated tasks, such as identifying relationships between entities, that address the scenario mentioned above.

NER and NED together are known as **named entity linking (NEL)**.
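
To make the NED step concrete, here is a minimal sketch that picks, for an ambiguous mention, the candidate whose description shares the most words with the surrounding context. The knowledge base entries, the IDs, and the `disambiguate` helper are all hypothetical and exist only for illustration; production systems rely on much richer features and far larger resources.

```python
import re

# Hypothetical toy knowledge base: a surface form maps to candidate entities.
TOY_KB = {
    "lincoln": [
        {"id": "person/abraham_lincoln",
         "description": "american president civil war politician"},
        {"id": "company/lincoln_motor",
         "description": "american luxury car brand vehicle ford"},
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention, context, kb=TOY_KB):
    """Return the ID of the candidate with the largest word overlap, or None."""
    candidates = kb.get(mention.lower(), [])
    if not candidates:
        return None
    context_words = tokenize(context)
    best = max(
        candidates,
        key=lambda c: len(context_words & tokenize(c["description"])),
    )
    return best["id"]


print(disambiguate("Lincoln", "Lincoln unveiled a new luxury car this year"))
print(disambiguate("Lincoln", "Lincoln was president during the civil war"))
```

The same mention resolves to a different entry depending on its context, which is the essence of disambiguation; a mention that is absent from the toy knowledge base simply yields `None`.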

Some other NLP applications that would need NEL include question answering and constructing large knowledge bases of connected events and entities, such as the Google Knowledge Graph.

Training an NEL model requires a large annotated dataset as well as some kind of encyclopedic resource to link to.

For NEL, it’s more common to use off-the-shelf, pay-as-you-use services offered by big providers such as IBM (Watson) and Microsoft (Azure) than to develop an in-house system.

In [6]:
import requests
import pprint

# Placeholder only: replace 'xxxx' with your own API key before running.
my_api_key = 'xxxx'

In [2]:
def print_entities(text):
    """Send the text to the entities endpoint and return the parsed JSON response.

    Despite its name, this function does not print anything.
    """
    url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/entities"
    documents = {'documents': [{'id': '1', 'language': 'en', 'text': text}]}
    headers = {'Ocp-Apim-Subscription-Key': my_api_key}
    response = requests.post(url, headers=headers, json=documents)
    return response.json()

In [5]:
# Read the article with an explicit encoding. Relying on the platform default raised
# UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 780.
# UTF-8 is an assumption about how the file was saved.
with open("Data/myarticle.txt", encoding="utf-8") as f:
    mytext = f.read()
entities = print_entities(mytext)
# Pretty-print every entity for each document; the output is verbose.
for document in entities["documents"]:
    pprint.pprint(document["entities"])

In [7]:
# Print only people, locations, and organizations, with a Wikipedia link when present.
# This cell needs `entities` from the previous cell; if that cell failed, it raises
# NameError: name 'entities' is not defined.
for document in entities['documents']:
    print("Entities in this document: ")
    for entity in document['entities']:
        if entity['type'] in ["Person", "Location", "Organization"]:
            print(entity['name'], "\t", entity['type'])
            if 'wikipediaUrl' in entity:
                print(entity['wikipediaUrl'])
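
Without an API key, the filtering logic can still be tried offline. The sketch below runs against a hand-written sample that mimics only the fields used above (`name`, `type`, `wikipediaUrl`); it is not real service output, and the `linked_entities` helper is a hypothetical name. It also tolerates a response with no `documents` key, on the assumption that a failed request may return a different shape.

```python
# Hand-written sample mimicking only the fields used above; not real service output.
sample_response = {
    "documents": [
        {
            "id": "1",
            "entities": [
                {"name": "Seattle", "type": "Location",
                 "wikipediaUrl": "https://en.wikipedia.org/wiki/Seattle"},
                {"name": "last week", "type": "DateTime"},
                {"name": "Jane Doe", "type": "Person"},
            ],
        }
    ]
}


def linked_entities(response, wanted_types=("Person", "Location", "Organization")):
    """Return (name, type, url_or_None) tuples for the entity types of interest."""
    rows = []
    # .get() keeps the loop safe if the response lacks the expected keys.
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            if entity.get("type") in wanted_types:
                rows.append(
                    (entity.get("name"), entity.get("type"), entity.get("wikipediaUrl"))
                )
    return rows


for name, entity_type, url in linked_entities(sample_response):
    print(name, "\t", entity_type, "\t", url or "(no link)")
```

Returning tuples instead of printing directly makes it easy to inspect which entities were linked to a Wikipedia page and which were only recognized.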

# Practical Advice
- Existing NEL approaches are not perfect, and they’re unlikely to fare well with new names or domain-specific terms. Because NEL also requires further linguistic processing, including syntactic parsing, its accuracy depends on how well each of those processing steps performs.
- As with other IE tasks, the first step in any NLP pipeline (text extraction and cleanup) affects the output of NEL as well. When we use third-party services, we have little control over adapting them to our domain, if needed, or understanding their internal workings well enough to modify them for our needs.
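
Since cleanup happens before the text reaches any NEL service, it is one of the few steps that stays under our control. Here is a minimal sketch of such a cleanup pass using only the standard library; the `clean_text` helper and the particular rules it applies (Unicode normalization, straightening curly quotes, dropping tag-like remnants, collapsing whitespace) are illustrative assumptions, and the right rules depend on where the text came from.

```python
import re
import unicodedata

# Map curly quotes to their plain ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


def clean_text(raw):
    """Apply a few conservative cleanup rules before sending text to an NEL service."""
    # Normalize compatibility characters such as ligatures and full-width forms.
    text = unicodedata.normalize("NFKC", raw)
    # Straighten curly quotes.
    text = text.translate(QUOTE_MAP)
    # Replace tag-like remnants left over from HTML extraction with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("<p>\u201cHello\u201d   world</p>\n"))
```

Keeping the rules in one small function makes it easy to compare the entities a service returns before and after cleanup.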