<a href="https://colab.research.google.com/github/yeesem/Natural-Laguage-Processing/blob/main/Named_Entity_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [141]:
import spacy

In [142]:
nlp = spacy.load('en_core_web_sm')

# **Named Entity Recognition**

In [143]:
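def show_ents(doc):
  # Print the raw entity tuple, then one "text - LABEL description" line per entity
  print(doc.ents , '\n')
  if doc.ents:
    for ent in doc.ents:
      # The f-string prints the same line and does not fail if spacy.explain returns None
      print(f"{ent.text} - {ent.label_} {spacy.explain(ent.label_)}")
  else:
    print("No entities found")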

In [144]:
doc = nlp(u"Hi how are you?")

In [145]:
show_ents(doc)

() 

No entities found


In [146]:
doc = nlp(u"May i go to Washington, DC next May to see the Washington Monument")

In [147]:
show_ents(doc)

(Washington, DC, next May, the Washington Monument) 

Washington, DC - GPE Countries, cities, states
next May - DATE Absolute or relative dates or periods
the Washington Monument - ORG Companies, agencies, institutions, etc.


In [148]:
doc = nlp(u"Can I please have 500 dollars of Microsoft stock")

In [149]:
show_ents(doc)

(500 dollars, Microsoft) 

500 dollars - MONEY Monetary values, including unit
Microsoft - ORG Companies, agencies, institutions, etc.


In [150]:
doc = nlp(u"Tesla to build a Malaysia factory for $6 million")

In [151]:
show_ents(doc)

(Malaysia, $6 million) 

Malaysia - GPE Countries, cities, states
$6 million - MONEY Monetary values, including unit


In [152]:
from spacy.tokens import Span

In [153]:
ORG = doc.vocab.strings[u'ORG']

In [154]:
ORG

383

In [155]:
new_ent = Span(doc,0,1,label=ORG)

In [156]:
doc.ents = list(doc.ents) + [new_ent]

In [157]:
show_ents(doc)

(Tesla, Malaysia, $6 million) 

Tesla - ORG Companies, agencies, institutions, etc.
Malaysia - GPE Countries, cities, states
$6 million - MONEY Monetary values, including unit


In [158]:
doc2 = nlp(u"Tesla is going to build a factory in PB")

In [159]:
GPE = doc2.vocab.strings[u'GPE']

In [160]:
GPE

384

In [161]:
new_ent = Span(doc2,8,9,label = GPE)

In [162]:
doc2.ents = list(doc2.ents) + [new_ent]

In [163]:
show_ents(doc2)

(Tesla, PB) 

Tesla - ORG Companies, agencies, institutions, etc.
PB - GPE Countries, cities, states
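
The `Span(doc, 0, 1, ...)` and `Span(doc2, 8, 9, ...)` calls above rely on token indices counted by hand. As a minimal sketch, the hypothetical helper `find_token_span` below locates a phrase in a list of token strings and returns the `(start, end)` pair that `Span` expects, with `end` exclusive. The token list is typed by hand so the example runs without a model; with a real document it would presumably come from `[t.text for t in doc2]`.

```python
def find_token_span(tokens, phrase_tokens):
    """Return (start, end) of the first occurrence of phrase_tokens, or None."""
    n = len(phrase_tokens)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        # Compare a window of n tokens against the phrase
        if tokens[start:start + n] == phrase_tokens:
            return start, start + n
    return None

# Assumption: this hand-written list mirrors the tokenization of doc2
tokens = ["Tesla", "is", "going", "to", "build", "a", "factory", "in", "PB"]

print(find_token_span(tokens, ["PB"]))       # (8, 9)
print(find_token_span(tokens, ["Tesla"]))    # (0, 1)
print(find_token_span(tokens, ["Penang"]))   # None
```

The returned pair can be passed straight to `Span(doc2, start, end, label=GPE)`, which avoids off-by-one mistakes when the sentence changes.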


In [164]:
doc = nlp(u"Our company created a brand new vacuum cleaner. "
          u"This new vacuum-cleaner is the best in show")

In [165]:
show_ents(doc)

() 

No entities found


In [166]:
from spacy.matcher import PhraseMatcher

In [167]:
matcher = PhraseMatcher(nlp.vocab)

In [168]:
phrase_list = ['vacuum cleaner','vacuum-cleaner']

In [169]:
phrase_patterns = [nlp(text) for text in phrase_list]

In [170]:
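# spaCy v3 style: pass the patterns as a list
# (older releases used matcher.add('newproduct', None, *phrase_patterns))
matcher.add('newproduct', phrase_patterns)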

In [171]:
found_matches = matcher(doc)

In [172]:
found_matches

[(2689272359382549672, 6, 8), (2689272359382549672, 11, 14)]

In [174]:
PROD = doc.vocab.strings[u"PRODUCT"]

In [175]:
# Each match is a (match_id, start, end) tuple; only the token offsets are needed here
new_ents = [Span(doc,start,end,label = PROD) for _, start, end in found_matches]

In [176]:
doc.ents = list(doc.ents) + new_ents

In [177]:
show_ents(doc)

(vacuum cleaner, vacuum-cleaner) 

vacuum cleaner - PRODUCT Objects, vehicles, foods, etc. (not services)
vacuum-cleaner - PRODUCT Objects, vehicles, foods, etc. (not services)
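
Assigning to `doc.ents` raises a `ValueError` when two spans share a token, which can happen if a matcher result collides with an entity the model already found. The sketch below shows one way to resolve such collisions on plain `(start, end, label)` triples by keeping the longest span first. `drop_overlaps` is a hypothetical helper and the `(12, 14, "ORG")` candidate is invented for illustration; on real `Span` objects, `spacy.util.filter_spans` applies a similar longest-first rule.

```python
def drop_overlaps(spans):
    """Keep the longest non-overlapping (start, end, label) triples."""
    # Longest spans first; ties are broken by the earlier start index
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept, used = [], set()
    for start, end, label in ordered:
        # Skip any span that reuses a token index already claimed
        if not any(i in used for i in range(start, end)):
            kept.append((start, end, label))
            used.update(range(start, end))
    return sorted(kept)

# The first two triples mirror found_matches above; the third is a made-up conflict
candidates = [(6, 8, "PRODUCT"), (11, 14, "PRODUCT"), (12, 14, "ORG")]
print(drop_overlaps(candidates))  # [(6, 8, 'PRODUCT'), (11, 14, 'PRODUCT')]
```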


In [178]:
doc = nlp(u"Originally I paid $29.95 for this car toy, but now it is marked down by 10 dollars.")

In [179]:
[ent for ent in doc.ents if ent.label_ == 'MONEY']

[29.95, 10 dollars]

In [180]:
len([ent for ent in doc.ents if ent.label_ == 'MONEY'])

2
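
Filtering with a list comprehension works for one label at a time. To tally every label in a single pass, `collections.Counter` is a possible alternative. This is a sketch on hand-copied `(text, label)` pairs taken from the Tesla output earlier; with a real document the pairs would presumably be built as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import Counter

# Assumption: pairs copied by hand from the show_ents output for the Tesla sentence
ents = [("Tesla", "ORG"), ("Malaysia", "GPE"), ("$6 million", "MONEY")]

# Count how many entities carry each label
label_counts = Counter(label for _, label in ents)

print(label_counts["MONEY"])   # 1
print(label_counts["PERSON"])  # 0, a Counter returns 0 for missing keys
```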

# **Visualizing Named Entity Recognition**

In [181]:
from spacy import displacy

In [182]:
doc = nlp(u"Over the last quarter Apple sold nearly 20 thousand Iphone for a $6 million profit.")

In [183]:
displacy.render(doc,style = 'ent',jupyter = True)

In [184]:
doc = nlp(u"Over the last quarter Apple sold nearly 20 thousand Iphone 13 for a $6 million profit. "
          u"By contrast, Sony only sold 8 thousand music players.")

In [185]:
displacy.render(doc,style = 'ent',jupyter = True)

In [186]:
for sent in doc.sents:
  displacy.render(nlp(sent.text),style = 'ent',jupyter = True)
  print('\n')









In [187]:
# Customise the visualizer: show only the listed labels and override their colours
# colors = {'ORG':'red'}
colors = {'ORG':'linear-gradient(purple,red,yellow)'}
# colors = {'ORG':'radial-gradient(yellow,green)'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}

In [188]:
displacy.render(doc,style = 'ent',jupyter = True,options = options)

In [189]:
displacy.serve(doc,style='ent',options=options)


Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


In [198]:
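PROD = doc.vocab.strings[u"PRODUCT"]
# Tokens 9-10 are "Iphone 13" and tokens 26-27 are "music players".
# Collect both spans in a list so the first one is not overwritten by the second.
new_ents = [Span(doc,9,11,label = PROD),
            Span(doc,26,28,label = PROD)]
doc.ents = list(doc.ents) + new_ents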

show_ents(doc)

(the last quarter, Apple, nearly 20 thousand, Iphone 13, $6 million, Sony, 8 thousand, music players) 

the last quarter - DATE Absolute or relative dates or periods
Apple - ORG Companies, agencies, institutions, etc.
nearly 20 thousand - CARDINAL Numerals that do not fall under another type
Iphone 13 - PRODUCT Objects, vehicles, foods, etc. (not services)
$6 million - MONEY Monetary values, including unit
Sony - ORG Companies, agencies, institutions, etc.
8 thousand - CARDINAL Numerals that do not fall under another type
music players - PRODUCT Objects, vehicles, foods, etc. (not services)


In [199]:
displacy.render(doc,style = 'ent',jupyter = True,options = options)

In [202]:
for sent in doc.sents:
    displacy.render(sent,style = 'ent',jupyter = True)
    print('\n')







