# Word Sense Disambiguation using Natural Language Processing

## Import Libraries

In [1]:
import nltk

In [2]:
from nltk.stem.lancaster import LancasterStemmer    # Lancaster stemming algorithm
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn

Let us now take a sample text to experiment with.

We will use the Lancaster stemmer to reduce every word to its stem (root). Stemming collapses the inflected forms of a word, such as the past tense, past participle, and continuous forms, into a single stem, so those variations can be ignored.
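
To make the idea of collapsing inflected forms concrete, here is a minimal sketch of suffix stripping. It is a toy illustration only, not the Lancaster algorithm; the function name `toy_stem` and its three-suffix rule list are assumptions made up for this example.

```python
def toy_stem(word):
    """Strip one common English suffix (toy rule set, NOT the Lancaster rules)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain,
        # so short words such as "sing" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


forms = ["walk", "walks", "walked", "walking"]
print([toy_stem(w) for w in forms])
```

All four forms map to the same stem, which is the behaviour we rely on when we stem the sample text. A real stemmer such as `LancasterStemmer` applies a much larger rule set and can be more aggressive.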

## Tokenize

In [17]:
text = "Sing in a lower tone, along with the bass."

st = LancasterStemmer()
stemmedWords = [st.stem(word) for word in word_tokenize(text)]

print(stemmedWords)

['sing', 'in', 'a', 'low', 'ton', ',', 'along', 'with', 'the', 'bass', '.']


In [18]:
print(nltk.pos_tag(word_tokenize(text)))

[('Sing', 'VBG'), ('in', 'IN'), ('a', 'DT'), ('lower', 'JJR'), ('tone', 'NN'), (',', ','), ('along', 'IN'), ('with', 'IN'), ('the', 'DT'), ('bass', 'NN'), ('.', '.')]


Now suppose we want to find out in which sense the word "bass" is used in the text. Before that, let us check how many senses WordNet lists for "bass" in English.

In [19]:
for ss in wn.synsets('bass'):
    print(ss, ss.definition())

(Synset('bass.n.01'), u'the lowest part of the musical range')
(Synset('bass.n.02'), u'the lowest part in polyphonic music')
(Synset('bass.n.03'), u'an adult male singer with the lowest voice')
(Synset('sea_bass.n.01'), u'the lean flesh of a saltwater fish of the family Serranidae')
(Synset('freshwater_bass.n.01'), u'any of various North American freshwater fish with lean flesh (especially of the genus Micropterus)')
(Synset('bass.n.06'), u'the lowest adult male singing voice')
(Synset('bass.n.07'), u'the member with the lowest range of a family of musical instruments')
(Synset('bass.n.08'), u'nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes')
(Synset('bass.s.01'), u'having or denoting a low vocal or instrumental range')


As the output above shows, WordNet lists 9 senses of the word "bass". In our text, however, "bass" is used in a musical context. Let us see whether the Lesk algorithm can predict that.

## Prediction

In [20]:
sensel = lesk(word_tokenize(text), "bass")
print(sensel, sensel.definition())

(Synset('bass.n.07'), u'the member with the lowest range of a family of musical instruments')


Indeed, the algorithm picks a musical sense rather than one of the fish senses, which matches the context of the sentence. The same technique can be applied in many fields and has good practical value.
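
For intuition about how a Lesk-style prediction is made, here is a minimal pure-Python sketch of the simplified Lesk idea: score each sense by how many context tokens also appear in its definition, and keep the highest-scoring sense. It is an illustration under stated assumptions, not NLTK's implementation. The glosses are copied from the WordNet output above; `tokenize` and `simplified_lesk` are hypothetical helper names, and the regex tokenizer is only a stand-in for `word_tokenize`.

```python
import re

# Glosses copied from the WordNet output shown earlier in this notebook.
GLOSSES = {
    "bass.n.01": "the lowest part of the musical range",
    "bass.n.02": "the lowest part in polyphonic music",
    "bass.n.03": "an adult male singer with the lowest voice",
    "sea_bass.n.01": "the lean flesh of a saltwater fish of the family Serranidae",
    "freshwater_bass.n.01": "any of various North American freshwater fish with "
                            "lean flesh (especially of the genus Micropterus)",
    "bass.n.06": "the lowest adult male singing voice",
    "bass.n.07": "the member with the lowest range of a family of musical instruments",
    "bass.n.08": "nontechnical name for any of numerous edible marine and "
                 "freshwater spiny-finned fishes",
    "bass.s.01": "having or denoting a low vocal or instrumental range",
}


def tokenize(text):
    # Hypothetical stand-in for word_tokenize: split into words and punctuation.
    return re.findall(r"\w+|[^\w\s]", text)


def simplified_lesk(context_tokens, glosses):
    """Return (sense, overlap) for the gloss sharing the most tokens with the context."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


text = "Sing in a lower tone, along with the bass."
sense, overlap = simplified_lesk(tokenize(text), GLOSSES)
print(sense, overlap)  # bass.n.07 3
```

In this sketch the three overlapping tokens are "the", "with", and "a", so the winning sense is decided by function words rather than by musical vocabulary. Removing stop words or stemming both the context and the glosses before counting overlaps is a common way to make this kind of scoring more meaningful.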