## Named Entities:

Named Entity Recognition (NER) seeks to locate named entities mentioned in unstructured text and classify them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.

Our goal is to take raw text and annotate it with additional information, such as the named-entity label of the corresponding words. For example:

"James bought an iPhone from the Apple store in St. Louis"

James -> Person,
Apple -> Organization,
St. Louis -> Location
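
Before bringing in any library, it can help to see what such an annotation looks like as data. The snippet below is only an illustrative sketch in plain Python: the helper `to_spans` and the `(start, end, label)` tuple layout are assumptions made for this example, not part of spaCy. It records each entity from the example sentence as a character range plus a label.

```python
text = "James bought an iPhone from the Apple store in St. Louis"

# Hand-written annotations for the example sentence (not model output).
annotations = [("James", "Person"), ("Apple", "Organization"), ("St. Louis", "Location")]


def to_spans(text, annotations):
    """Turn (phrase, label) pairs into (start_char, end_char, label) tuples.

    Hypothetical helper: uses the first occurrence of each phrase and
    skips phrases that do not appear in the text.
    """
    spans = []
    for phrase, label in annotations:
        start = text.find(phrase)
        if start == -1:
            continue
        spans.append((start, start + len(phrase), label))
    return spans


spans = to_spans(text, annotations)
for start, end, label in spans:
    print(f"{text[start:end]!r} [{start}:{end}] -> {label}")
```

An NER system does the hard part that is hard-coded here: deciding which ranges are entities and which label each one gets.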

Let's explore NER with spaCy, and also see how to add our own custom entities!

In [1]:
import spacy 

nlp = spacy.load('en_core_web_sm')

In [2]:
def show_ents(doc):
    """Print the text, label, and label description of every entity in doc."""
    if doc.ents:  # check if the doc has named entities
        for ent in doc.ents:
            print(ent.text + ' -- ' + ent.label_ + ' -- ' + str(spacy.explain(ent.label_)))
    else:
        print('No entities found!')
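
The helper wraps `spacy.explain` in `str(...)` because a label might not have a description. As a small sketch of that behaviour, the snippet below looks up two labels that appear in this notebook's outputs and one made-up label (`NOT_A_REAL_LABEL` is purely hypothetical). Depending on the spaCy version, the unknown label may also trigger a warning.

```python
import spacy

# Two labels seen in this notebook plus one invented label for contrast.
labels = ["ORG", "MONEY", "NOT_A_REAL_LABEL"]

# spacy.explain returns a description string, or None for unknown terms.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label} -- {description}")
```

No trained model is needed for this lookup; the descriptions come from spaCy's built-in glossary.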

In [3]:
doc = nlp(u"Hi! How are you?")

show_ents(doc) # let's see if our doc has any named entities!

No entities found!


In [4]:
# Let's try another one!

doc2 = nlp(u"The quick brown fox jumped over the lazy dog!")

show_ents(doc2)

No entities found!


In [5]:
doc3 = nlp(u"I bought and iPhone recently from Apple store")

show_ents(doc3)
# There we go: this sentence finally contains some named entities!
# Note that the small model labels 'iPhone' as ORG here, even though it is a product.

iPhone -- ORG -- Companies, agencies, institutions, etc.
Apple -- ORG -- Companies, agencies, institutions, etc.


In [6]:
doc3 = nlp(u"can I have a 500 dollars of Microsoft stock")

show_ents(doc3)

500 dollars -- MONEY -- Monetary values, including unit
Microsoft -- ORG -- Companies, agencies, institutions, etc.


In [8]:
# Notice above that spaCy is smart enough to recognize that the number 500 and the word 'dollars' belong together as a single MONEY entity!
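
To see how a multi-token entity such as "500 dollars" is stored, it may help to look at the per-token tags. The sketch below makes one loud assumption: it uses a blank English pipeline (tokenizer only, no trained NER) and assigns the two entities by hand so that they mirror the output above. With `en_core_web_sm` loaded, the model would set them itself.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only; no trained NER component
doc = nlp("can I have a 500 dollars of Microsoft stock")

# Hand-assigned entities that mirror the model output shown above (assumption).
doc.ents = [Span(doc, 4, 6, label="MONEY"), Span(doc, 7, 8, label="ORG")]

# ent_iob_ is "B" for the first token of an entity and "I" for tokens inside it.
for token in doc:
    print(token.i, token.text, token.ent_iob_, token.ent_type_)

money = doc.ents[0]
print(money.text, money.start, money.end)  # token range is [start, end)
```

The two tokens share one entity: the first is tagged as the beginning (`B`) and the second as inside (`I`), which is how "500" and "dollars" end up reported together.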

In [23]:
doc = nlp(u"Tesla to built a U.K factory for $6 millions")

show_ents(doc)

U.K -- ORG -- Companies, agencies, institutions, etc.
$6 millions -- MONEY -- Monetary values, including unit


## Adding Named Entities to a Span:

In [24]:
from spacy.tokens import Span

In [25]:
ORG = doc.vocab.strings[u"ORG"]

In [26]:
ORG # the integer ID that the vocab's string store uses for the string 'ORG' (the exact value may differ between spaCy versions)

381
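
The mapping between a label string and its integer ID works in both directions. The following is a minimal sketch, assuming only that spaCy is installed; a blank pipeline is used because nothing beyond the vocab is needed. The printed ID is deliberately not shown here, since it can vary across versions.

```python
import spacy

nlp = spacy.blank("en")  # a blank pipeline is enough; only the vocab is used
strings = nlp.vocab.strings

org_id = strings.add("ORG")      # returns the integer ID for the string
print(org_id)                    # exact value may depend on the spaCy version
print(strings[org_id])           # looking up the ID gives back the string
print(strings["ORG"] == org_id)  # looking up the string gives back the ID
```

This round trip is why `ent.label` (an integer) and `ent.label_` (a string) always refer to the same label.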

In [32]:
# let's now create a span for the new entity

new_ent = Span(doc, 0, 1, label = ORG) # tokens [0, 1) of doc, i.e. just 'Tesla', with label = ORG (the id we looked up above)
# this assigns our own label to the entity we want to add to the doc!

In [28]:
# adding the entity to the existing Doc object!

doc.ents = list(doc.ents) + [new_ent]

In [29]:
doc.ents

(Tesla, U.K, $6 millions)

In [31]:
show_ents(doc) # Now you can see that Tesla has been added to the doc as an ORG (organization) entity!

Tesla -- ORG -- Companies, agencies, institutions, etc.
U.K -- ORG -- Companies, agencies, institutions, etc.
$6 millions -- MONEY -- Monetary values, including unit
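
As a compact variant of the steps above, recent spaCy versions also accept the label as a plain string, so the manual hash lookup can be skipped. The sketch below assumes such a version is installed and, to stay runnable without a downloaded model, uses a blank English pipeline; that is why the doc starts with no entities and ends with only the one added by hand. The sentence is a tidied version of the one above.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: no trained model, so doc.ents starts empty
doc = nlp("Tesla to build a U.K. factory for $6 million")

# Token range [0, 1) covers just "Tesla"; the label is passed as a string.
tesla = Span(doc, 0, 1, label="ORG")

# Keep whatever entities already exist and append the new one.
doc.ents = list(doc.ents) + [tesla]

for ent in doc.ents:
    print(ent.text, ent.label_, ent.start, ent.end)
```

With a trained pipeline such as `en_core_web_sm`, the same two lines (`Span(...)` and the `doc.ents` assignment) would add Tesla alongside the entities the model already found.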
