## Named Entity Recognition:

Named Entity Recognition (NER) locates named entities mentioned in unstructured text and classifies them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.

Our goal is to take raw text and annotate it with additional information, such as the named-entity label of each relevant word or phrase. For example:

"James bought an iPhone from the Apple store in St. Louis"

James -> Person,
Apple -> Organization,
St. Louis -> Location
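
Before turning to a library, it can help to see what such an annotation looks like as plain data. The following is a minimal sketch in pure Python, not spaCy's internal representation: the helper `annotate` and the label names are hypothetical, and each entity is stored as a `(start, end, label)` character-offset triple.

```python
sentence = "James bought an iPhone from the Apple store in St. Louis"

def annotate(text, mentions):
    """Return (start, end, label) character offsets for each (mention, label) pair.

    Hypothetical helper for illustration: it only finds the first occurrence
    of each mention and does no real entity recognition.
    """
    annotations = []
    for mention, label in mentions:
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"{mention!r} not found in text")
        annotations.append((start, start + len(mention), label))
    return annotations

mentions = [("James", "Person"), ("Apple", "Organization"), ("St. Louis", "Location")]
annotations = annotate(sentence, mentions)

for start, end, label in annotations:
    # Slicing the text with the offsets recovers the entity text.
    print(sentence[start:end], "->", label)
```

An NER system produces this kind of output automatically; the offsets tell us where each entity sits in the text, and the label tells us what kind of entity it is.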

Let's explore NER with spaCy, and also see how to add our own custom entities!

In [1]:
import spacy 

nlp = spacy.load('en_core_web_sm')

In [2]:
def show_ents(doc):
    """Print each named entity in doc with its label and the label's description."""
    if doc.ents: # check whether the doc has any named entities
        for ent in doc.ents:
            print(ent.text + ' -- ' + ent.label_ + ' -- ' + str(spacy.explain(ent.label_)))
    else:
        print('No entities found!')

In [3]:
doc = nlp(u"Hi! How are you?")

show_ents(doc) # let's see whether our doc has any named entities

No entities found!


In [4]:
# Let's try another one!

doc2 = nlp(u"The quick brown fox jumped over the lazy dog!")

show_ents(doc2)

No entities found!


In [5]:
doc3 = nlp(u"I bought and iPhone recently from Apple store")

show_ents(doc3)
# There you go, we finally have some named entities in our sentence!
# Note that the model labels 'iPhone' as ORG here; statistical models can mislabel entities.

iPhone -- ORG -- Companies, agencies, institutions, etc.
Apple -- ORG -- Companies, agencies, institutions, etc.


In [6]:
doc3 = nlp(u"can I have a 500 dollars of Microsoft stock")

show_ents(doc3)

500 dollars -- MONEY -- Monetary values, including unit
Microsoft -- ORG -- Companies, agencies, institutions, etc.


In [8]:
# Notice above that spaCy is smart enough to recognize that the number 500 and the word 'dollars' belong together!

In [23]:
doc = nlp(u"Tesla to built a U.K factory for $6 millions")

show_ents(doc)

U.K -- ORG -- Companies, agencies, institutions, etc.
$6 millions -- MONEY -- Monetary values, including unit


## Adding a Named Entity with a Span:

In [24]:
from spacy.tokens import Span

In [25]:
ORG = doc.vocab.strings[u"ORG"]

In [26]:
ORG # the integer ID (hash) that the vocabulary assigns to the string "ORG"

381

In [32]:
# Let's now create a span for the new entity

new_ent = Span(doc, 0, 1, label = ORG) # tokens 0 up to (but not including) 1, i.e. "Tesla", labeled ORG (defined above)
# This assigns our own label to the entity we want to add to the doc

In [28]:
# Add the entity to the existing Doc object

doc.ents = list(doc.ents) + [new_ent]

In [29]:
doc.ents

(Tesla, U.K, $6 millions)

In [31]:
show_ents(doc) # Now Tesla appears as an ORG (organization) entity in our doc!

Tesla -- ORG -- Companies, agencies, institutions, etc.
U.K -- ORG -- Companies, agencies, institutions, etc.
$6 millions -- MONEY -- Monetary values, including unit
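
The steps above can be condensed into one short sketch. It makes two assumptions that differ from the cells above: it uses `spacy.blank("en")` (tokenizer only, no trained model), so `doc.ents` starts out empty, and it passes the label as the string `"ORG"`, which `Span` accepts in spaCy v2.1 and later, instead of looking up the hash first. The sentence is a lightly corrected variant of the one used above.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank English pipeline, so no entities are predicted for us.
nlp = spacy.blank("en")
doc = nlp("Tesla to build a U.K. factory for $6 million")

# Token 0 up to (but not including) token 1 is "Tesla".
tesla = Span(doc, 0, 1, label="ORG")

# doc.ents is a tuple, so build a new list and assign it back.
doc.ents = list(doc.ents) + [tesla]

ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)  # [('Tesla', 'ORG')]
```

With a trained pipeline such as `en_core_web_sm`, the same assignment keeps whatever entities the model already found and appends the new one, as in the cells above.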


## Adding Multiple Named Entities:

We've seen how to add a single term as a named entity (for example, Tesla as an organization). But what if we have several terms to add? Let's see how to add multiple phrases as named entities.

For instance, if we're working with a vacuum cleaner company, we can add both "__vacuum-cleaner__" and "__vacuum cleaner__" as PRODUCT entities.

Let's find out how to do this.

In [72]:
doc = nlp(u"Our company created a brand new vacuum cleaner. "
         u"This new vacuum-clearner is the best show.")
# Note: the second sentence spells the phrase "vacuum-clearner", so it will not match our phrase list

show_ents(doc) # Just to make sure "vacuum cleaner" is not an entity right now

# Let's go ahead and add it now!

No entities found!


In [73]:
from spacy.matcher import PhraseMatcher

In [80]:
matcher = PhraseMatcher(nlp.vocab) # create a matcher object

In [81]:
phrase_list = ['vacuum cleaner', 'vacuum-cleaner'] # create a phrase_list to be added

In [82]:
phrase_patterns = [nlp(text) for text in phrase_list] # turn each phrase into a Doc object to use as a pattern

In [83]:
matcher.add('newproduct', None, *phrase_patterns) # register all patterns under the key 'newproduct' (spaCy v2 signature; in spaCy v3 use matcher.add('newproduct', phrase_patterns))

In [84]:
found_matches = matcher(doc) # list of (match_id, start, end) tuples, with start and end as token indices

In [86]:
print(found_matches)

[(2689272359382549672, 6, 8)]
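
Each tuple printed above is `(match_id, start, end)`: the hash of the pattern key plus the start and end token indices. The sketch below shows one way to decode such tuples into readable values. It assumes a blank English pipeline (`spacy.blank("en")`), since phrase matching only needs the tokenizer, and the list-style `matcher.add` signature used by spaCy v3; the single-sentence text is a shortened version of the example above.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: tokenizer-only pipeline; no trained model is required here.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# make_doc only tokenizes, which is all a phrase pattern needs.
patterns = [nlp.make_doc(text) for text in ["vacuum cleaner", "vacuum-cleaner"]]
matcher.add("newproduct", patterns)  # list-style signature (spaCy v3)

doc = nlp("Our company created a brand new vacuum cleaner.")

decoded = [
    # Look the hash up in the vocabulary and slice the doc to get the text.
    (nlp.vocab.strings[match_id], start, end, doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(decoded)  # [('newproduct', 6, 8, 'vacuum cleaner')]
```

Looking the hash up in `nlp.vocab.strings` returns the key the patterns were registered under, and slicing the doc with `start:end` returns the matched tokens.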


In [87]:
from spacy.tokens import Span

In [88]:
# Now let's define the category of our named entity

PROD = doc.vocab.strings[u"PRODUCT"]

In [89]:
PROD

384

In [91]:
# found_matches

In [95]:
new_ents = [Span(doc, match[1], match[2], label = PROD) for match in found_matches]
# Each Span is built from the doc, the match's start and end token indices, and our label

# Now we'll add these new entities to doc.ents

In [96]:
print(new_ents)

[vacuum cleaner]


In [97]:
doc.ents = list(doc.ents) + new_ents

In [99]:
# Now let's check whether we have successfully added the named entity

show_ents(doc)

vacuum cleaner -- PRODUCT -- Objects, vehicles, foods, etc. (not services)
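
One caveat when extending `doc.ents` this way: the entities in a doc must not overlap, and assigning overlapping spans raises an error. spaCy provides `spacy.util.filter_spans` for this case, which prefers the longest span. The following is a plain-Python sketch of that same idea, not spaCy's implementation; the function `filter_overlaps` and the candidate offsets are hypothetical.

```python
def filter_overlaps(spans):
    """Keep the longest spans first and drop any span that overlaps a kept one.

    `spans` is a list of (start, end, label) token offsets, with end exclusive.
    """
    kept = []
    taken = set()  # token indices already covered by a kept span
    # Longest span first; ties go to the span that starts earlier.
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if not any(i in taken for i in range(start, end)):
            kept.append((start, end, label))
            taken.update(range(start, end))
    return sorted(kept)  # restore document order

# Hypothetical candidates: (7, 8) lies inside (6, 8), so it is dropped.
candidates = [(6, 8, "PRODUCT"), (7, 8, "PRODUCT"), (11, 14, "PRODUCT")]
filtered = filter_overlaps(candidates)
print(filtered)  # [(6, 8, 'PRODUCT'), (11, 14, 'PRODUCT')]
```

In practice, this kind of filtering would be applied to the combined list of existing and new spans before assigning it to `doc.ents`.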


In [104]:
# Now let's say we want to know how many entities of a particular type appear in our document
# Let's say we're looking for money!

doc = nlp(u"Originally I paid $29.15 for the car toy, now it has marked down to 10 dollars")

In [106]:
[ent for ent in doc.ents if ent.label_ == 'MONEY'] # returns the entities whose label is "MONEY"

[29.15, 10 dollars]

In [110]:
# And if we want the number of such entities,
# we can simply take the length of that list

len([ent for ent in doc.ents if ent.label_ == 'MONEY'])

# There you go!

2
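
To count every label at once rather than one label at a time, `collections.Counter` is a natural fit. This is a small sketch under one assumption: the `Ent` namedtuple is a hypothetical stand-in for spaCy's entity spans so that the example runs without a model. The function `count_labels` only needs objects with a `.label_` attribute, so `doc.ents` could be passed in directly.

```python
from collections import Counter, namedtuple

def count_labels(ents):
    """Count entities per label; works on any objects exposing `.label_`."""
    return Counter(ent.label_ for ent in ents)

# Hypothetical stand-in for spaCy entity spans.
Ent = namedtuple("Ent", ["text", "label_"])
ents = [Ent("29.15", "MONEY"), Ent("10 dollars", "MONEY"), Ent("Apple", "ORG")]

counts = count_labels(ents)
print(counts["MONEY"])   # 2
print(counts["PERSON"])  # 0 (a Counter returns 0 for missing keys)
```

With a real doc, `count_labels(doc.ents)["MONEY"]` would give the same number as the `len(...)` expression above.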

## Visualizing Named Entities:

Let's visualize named entities with spaCy's displaCy visualizer.

In [111]:
from spacy import displacy

In [118]:
doc = nlp(u"Over the last quarter, Apple sold nearly 20 thousand iPods, with a profit of $4 million. "
         u"In contrast, Sony only sold 8 thousand Walkman music players")

In [117]:
# print(doc)

In [119]:
displacy.render(doc, style = 'ent', jupyter = True)

In [120]:
# We can pass an 'options' argument to render() for customization, as we have done before.
# Most importantly, we can choose which entity types to show and which to hide.

# For instance, to see only the entities that are PRODUCTs,
# we do the following

options = {'ents':['PRODUCT']}

In [122]:
displacy.render(doc, style = 'ent', jupyter = True, options = options)

# Now only the entities labeled PRODUCT are highlighted!

In [124]:
options = {'ents': ['PRODUCT', 'ORG']}

displacy.render(doc, style = 'ent', jupyter = True, options = options)

# perfect!

In [129]:
# Let's do something cooler

colors = {'ORG':'red'}
options = {'ents':['ORG'], 'colors': colors}

displacy.render(doc, style = 'ent', jupyter = True, options = options)

# ORG entities now get a red background

In [133]:
colors = {'ORG':'radial-gradient(yellow, green)', 'PRODUCT':'radial-gradient(yellow, red)'}
options = {'ents': ['ORG','PRODUCT'], 'colors': colors}

displacy.render(doc, style = 'ent', jupyter = True, options = options)

# You can do a lot of customization of the color scheme using 'options' and 'colors'
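
Outside a notebook, the same `options` can be used to get the markup as a string by passing `jupyter=False`. The sketch below also uses displaCy's `manual=True` mode, which renders hand-written annotations instead of a parsed doc. The text, the helper `ent`, and the annotations are hypothetical, chosen so the example runs without a trained model.

```python
from spacy import displacy

text = "Apple sold nearly 20 thousand iPods."

def ent(phrase, label):
    """Build a displaCy entity dict from the first occurrence of `phrase` (hypothetical helper)."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}

# Hand-written annotations, not model output; entities must be in text order.
example = {
    "text": text,
    "ents": [ent("Apple", "ORG"), ent("iPods", "PRODUCT")],
    "title": None,
}

# Show only ORG entities, with a red background.
options = {"ents": ["ORG"], "colors": {"ORG": "red"}}

# jupyter=False returns the HTML as a string instead of displaying it.
html = displacy.render(example, style="ent", manual=True, jupyter=False, options=options)
print(html[:80])
```

The returned string can be written to an `.html` file or embedded in a web page; only the ORG entity is highlighted, because PRODUCT is not listed under `'ents'`.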