<a href="https://colab.research.google.com/github/victor-roris/mediumseries/blob/master/NLP/SpacyWordnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SpaCy Wordnet

WordNet provides a lexical database for English – in other words, it's a computable thesaurus.

There's a spaCy integration for WordNet called spacy-wordnet by Daniel Vila Suero, an expert in natural language and knowledge graph work.

GitHub: https://github.com/recognai/spacy-wordnet


## Installation

In [3]:
!pip install spacy-wordnet



Download the WordNet data through NLTK:

In [2]:
import nltk
nltk.download("wordnet")

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

Download the spaCy English model:

In [1]:
! python -m spacy download en_core_web_sm

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


## Example of usage

In [0]:
import spacy
nlp = spacy.load("en_core_web_sm")

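We add the WordNet annotator as a component in the spaCy pipeline, right after the tagger. The call below uses the spaCy 2.x `add_pipe` API, which accepts the component object itself; spaCy 3.x expects the name of a registered component instead.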

In [3]:
from spacy_wordnet.wordnet_annotator import WordnetAnnotator

print("before", nlp.pipe_names)

if "WordnetAnnotator" not in nlp.pipe_names:
    nlp.add_pipe(WordnetAnnotator(nlp.lang), after="tagger")
    
print("after", nlp.pipe_names)

before ['tagger', 'parser', 'ner']
after ['tagger', 'WordnetAnnotator', 'parser', 'ner']


Get a word's synsets (the sets of synonyms it belongs to)

In [4]:
token = nlp("withdraw")[0]
token._.wordnet.synsets()

[Synset('withdraw.v.01'),
 Synset('retire.v.02'),
 Synset('disengage.v.01'),
 Synset('recall.v.07'),
 Synset('swallow.v.05'),
 Synset('seclude.v.01'),
 Synset('adjourn.v.02'),
 Synset('bow_out.v.02'),
 Synset('withdraw.v.09'),
 Synset('retire.v.08'),
 Synset('retreat.v.04'),
 Synset('remove.v.01')]

In [7]:
for s in token._.wordnet.synsets():
  print(s.lemma_names())

['withdraw', 'retreat', 'pull_away', 'draw_back', 'recede', 'pull_back', 'retire', 'move_back']
['retire', 'withdraw']
['disengage', 'withdraw']
['recall', 'call_in', 'call_back', 'withdraw']
['swallow', 'take_back', 'unsay', 'withdraw']
['seclude', 'sequester', 'sequestrate', 'withdraw']
['adjourn', 'withdraw', 'retire']
['bow_out', 'withdraw']
['withdraw', 'draw', 'take_out', 'draw_off']
['retire', 'withdraw']
['retreat', 'pull_back', 'back_out', 'back_away', 'crawfish', 'crawfish_out', "pull_in_one's_horns", 'withdraw']
['remove', 'take', 'take_away', 'withdraw']
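The per-synset lists above repeat several names and always include the word itself. As a minimal sketch, assuming a flat list of distinct synonyms is what you need, the lists can be merged in plain Python. The helper name `unique_synonyms` is hypothetical and not part of spacy-wordnet; the input data is copied from the first three rows of the output above.

```python
# Lemma names copied from the first three synsets printed above.
synset_lemma_names = [
    ['withdraw', 'retreat', 'pull_away', 'draw_back', 'recede',
     'pull_back', 'retire', 'move_back'],
    ['retire', 'withdraw'],
    ['disengage', 'withdraw'],
]


def unique_synonyms(word, lemma_lists):
    """Flatten the lemma lists, drop the word itself, keep first-seen order."""
    seen = {}
    for names in lemma_lists:
        for name in names:
            if name != word:
                # WordNet joins multiword lemmas with underscores.
                seen.setdefault(name.replace("_", " "), None)
    return list(seen)


synonyms = unique_synonyms("withdraw", synset_lemma_names)
print(synonyms)
```

In the notebook, the same helper could be fed with `[s.lemma_names() for s in token._.wordnet.synsets()]`.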


Get lemmas

In [14]:
token._.wordnet.lemmas()[0:10]

[Lemma('withdraw.v.01.withdraw'),
 Lemma('withdraw.v.01.retreat'),
 Lemma('withdraw.v.01.pull_away'),
 Lemma('withdraw.v.01.draw_back'),
 Lemma('withdraw.v.01.recede'),
 Lemma('withdraw.v.01.pull_back'),
 Lemma('withdraw.v.01.retire'),
 Lemma('withdraw.v.01.move_back'),
 Lemma('retire.v.02.retire'),
 Lemma('retire.v.02.withdraw')]

In [15]:
for lemma in token._.wordnet.lemmas()[0:10]:
  print(lemma.name())

withdraw
retreat
pull_away
draw_back
recede
pull_back
retire
move_back
retire
withdraw


Get the WordNet domains associated with the word

In [16]:
token._.wordnet.wordnet_domains()[0:10]

['astronomy',
 'school',
 'telegraphy',
 'industry',
 'psychology',
 'ethnology',
 'ethnology',
 'administration',
 'school',
 'finance']
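The domain list above contains repeated entries (`'school'` and `'ethnology'` each appear twice), presumably because more than one synset maps to the same domain. A small sketch of how the list could be deduplicated and counted with the standard library follows; the data is copied from the output above, and the variable names are illustrative only.

```python
from collections import Counter

# Domains copied from the output above.
domains = ['astronomy', 'school', 'telegraphy', 'industry', 'psychology',
           'ethnology', 'ethnology', 'administration', 'school', 'finance']

# dict.fromkeys removes duplicates while preserving first-seen order.
unique_domains = list(dict.fromkeys(domains))

# Counter tallies how often each domain occurs.
domain_counts = Counter(domains)

print(unique_domains)
print(domain_counts.most_common(3))
```

Counting the domains this way can help when picking which ones to pass to `wordnet_synsets_for_domain`.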

### Example of applicability

Enrich a sentence by replacing each word with its synonyms, restricted to synsets that belong to a chosen set of domains.



In [17]:
domains = ["finance", "banking"]
sentence = nlp("I want to withdraw 5,000 euros.")

enriched_sent = []

for token in sentence:
    # get synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(domains)
    
    if synsets:
        lemmas_for_synset = []
        
        for s in synsets:
            # collect the lemma names of every matching synset
            lemmas_for_synset.extend(s.lemma_names())
        # append one group of alternatives per token, once all synsets are collected
        enriched_sent.append("({})".format("|".join(set(lemmas_for_synset))))
    else:
        enriched_sent.append(token.text)

print(" ".join(enriched_sent))

I (need|want|require) to (draw_off|take_out|draw|withdraw) 5,000 euros .
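The loop above joins a `set` of strings, so the order of the alternatives inside each group is not guaranteed and can differ between runs. As a sketch, assuming a reproducible order is preferred, the names can be sorted before joining. The helper `format_variants` is hypothetical; the inputs are taken from the output above.

```python
def format_variants(lemma_names):
    """Join the unique lemma names as "(a|b|c)" in alphabetical order."""
    return "({})".format("|".join(sorted(set(lemma_names))))


# Lemma names taken from the enriched sentence above.
print(format_variants(["need", "want", "require"]))
print(format_variants(["withdraw", "draw", "take_out", "draw_off"]))
```

Sorting trades the arbitrary set order for a stable one, which makes the enriched sentence easier to compare across runs or to check in tests.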
