<a href="https://colab.research.google.com/github/thiago2608santana/Natural_Language_Processing_with_Python/blob/main/Named_Entity_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import spacy

In [2]:
nlp = spacy.load('en_core_web_sm')

In [3]:
def show_ents(doc):
    """Print each entity in doc with its label and spaCy's explanation of that label."""
    if doc.ents:
        for ent in doc.ents:
            print(f'{ent.text} - {ent.label_} - {spacy.explain(ent.label_)}')
    else:
        print('No entities found')

In [4]:
doc = nlp('Hi, how are you?')

In [5]:
show_ents(doc)

No entities found


In [6]:
doc2 = nlp('May I go to Washington, DC next May to see the Washington Monument?')

In [7]:
show_ents(doc2)

Washington, DC - GPE - Countries, cities, states
next May - DATE - Absolute or relative dates or periods
the Washington Monument - ORG - Companies, agencies, institutions, etc.


In [8]:
doc3 = nlp('Can I please have 500 dollars of Microsoft stocks?')

In [9]:
show_ents(doc3)

500 dollars - MONEY - Monetary values, including unit
Microsoft - ORG - Companies, agencies, institutions, etc.


In [10]:
doc4 = nlp('Tesla to build a U.K. factory for 6$ million.')

In [11]:
show_ents(doc4)

U.K. - GPE - Countries, cities, states
6$ million - MONEY - Monetary values, including unit


In [12]:
from spacy.tokens import Span

In [13]:
ORG = doc4.vocab.strings['ORG']

In [14]:
ORG

383

In [15]:
# Token indices 0 (inclusive) to 1 (exclusive) cover the first token, 'Tesla'
new_ent = Span(doc4, 0, 1, label=ORG)

In [16]:
doc4.ents = list(doc4.ents) + [new_ent]

In [17]:
show_ents(doc4)

Tesla - ORG - Companies, agencies, institutions, etc.
U.K. - GPE - Countries, cities, states
6$ million - MONEY - Monetary values, including unit
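
The `Span(doc4, 0, 1, label=ORG)` call above takes token indices with an exclusive end, so `0, 1` covers only the first token. As a rough illustration of how such indices could be located rather than counted by hand, here is a plain-Python sketch. `find_token_span` is a hypothetical helper, not part of spaCy, and the token list is a hand-written stand-in for `[t.text for t in doc4]`, not actual tokenizer output.

```python
from typing import List, Optional, Tuple


def find_token_span(tokens: List[str], phrase: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the first occurrence of phrase.

    The end index is exclusive, matching the convention Span expects.
    Hypothetical helper for illustration; not a spaCy function.
    """
    n = len(phrase)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase:
            return start, start + n
    return None


# Hand-written stand-in for the first tokens of doc4 (an assumption, not tokenizer output).
tokens = ["Tesla", "to", "build", "a", "U.K.", "factory"]

tesla_span = find_token_span(tokens, ["Tesla"])
uk_factory_span = find_token_span(tokens, ["U.K.", "factory"])
missing_span = find_token_span(tokens, ["Microsoft"])
```

With a real document, the returned pair could be passed on as `Span(doc4, start, end, label=ORG)`.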


# Adding multiple occurrences of entities

In [18]:
doc5 = nlp('Our company created a brand new vacuum cleaner. This new vacuum-cleaner is the best in show.')

In [19]:
show_ents(doc5)

No entities found


In [20]:
from spacy.matcher import PhraseMatcher

In [21]:
matcher = PhraseMatcher(nlp.vocab)

In [22]:
phrase_list = ['vacuum cleaner', 'vacuum-cleaner']

In [23]:
# Phrase patterns only need tokenization, so make_doc skips the rest of the pipeline
phrase_patterns = [nlp.make_doc(text) for text in phrase_list]

In [25]:
matcher.add('newproduct', phrase_patterns)

In [26]:
found_matches = matcher(doc5)

In [27]:
found_matches

[(2689272359382549672, 6, 8), (2689272359382549672, 11, 14)]

In [30]:
# Look up the label ID on doc5, the document being annotated (the vocab is shared across docs)
PROD = doc5.vocab.strings['PRODUCT']

In [31]:
new_ents = [Span(doc5, match[1], match[2], label=PROD) for match in found_matches]

In [35]:
doc5.ents = list(doc5.ents) + new_ents

In [36]:
show_ents(doc5)

vacuum cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)
vacuum-cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)
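
spaCy rejects overlapping spans when assigning to `doc.ents`, so matcher output is usually filtered before it is turned into entities. The sketch below shows one way to do that on the raw `(match_id, start, end)` tuples. `drop_overlaps` is a hypothetical helper written for this example, and the extra `(1, 5, 8)` match is made up to force an overlap; spaCy's own `spacy.util.filter_spans` does a similar job on `Span` objects.

```python
from typing import List, Tuple

Match = Tuple[int, int, int]  # (match_id, start, end), the shape PhraseMatcher returns


def drop_overlaps(matches: List[Match]) -> List[Match]:
    """Keep the longest matches first and discard any match that overlaps a kept one.

    Hypothetical helper for illustration; not a spaCy function.
    """
    # Longest first; ties are broken by the earlier start index.
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    kept: List[Match] = []
    taken = set()  # token indices already covered by a kept match
    for match_id, start, end in ordered:
        covered = set(range(start, end))
        if taken.isdisjoint(covered):
            kept.append((match_id, start, end))
            taken |= covered
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# The two matches found in doc5 do not overlap, so both survive.
found = [(2689272359382549672, 6, 8), (2689272359382549672, 11, 14)]
clean = drop_overlaps(found)

# A made-up extra match covering tokens 5..8 overlaps the first one and is longer.
with_overlap = drop_overlaps(found + [(1, 5, 8)])
```

Preferring the longest match is a design choice, not the only option; keeping the earliest match instead would only require changing the sort key.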


In [37]:
doc6 = nlp('Originally I paid $29.95 for this car toy, but now it is marked down by 10 dollars.')

In [41]:
# Count the MONEY entities in doc6 (drop len() to list the entities themselves)
len([ent for ent in doc6.ents if ent.label_ == 'MONEY'])

2
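
The count above handles a single label. A possible generalization, sketched here under the assumption that each entity exposes a `.label_` attribute as spaCy spans do, tallies every label at once with `collections.Counter`. `count_labels` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mirror the doc4 entities printed earlier so the example runs without a model.

```python
from collections import Counter
from types import SimpleNamespace


def count_labels(ents):
    """Count entities per label; needs only a .label_ attribute on each item."""
    return Counter(ent.label_ for ent in ents)


# Stand-in objects mirroring the doc4 entities printed earlier (not real spaCy spans).
fake_ents = [
    SimpleNamespace(text="Tesla", label_="ORG"),
    SimpleNamespace(text="U.K.", label_="GPE"),
    SimpleNamespace(text="6$ million", label_="MONEY"),
]

label_counts = count_labels(fake_ents)
```

In the notebook itself, the same function could be called as `count_labels(doc6.ents)`, and a label that never occurs simply counts as 0.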