## Short introduction to natural language processing

Here are two examples of one of the most common tasks in natural language processing: reducing a word to its base form for further analysis. This process is called _stemming_ or _lemmatizing_, depending on the approach used: a stemmer strips word endings by rule, while a lemmatizer looks the word up in a vocabulary (here WordNet). In the second example we also _tag_ the words. Tags identify the (assumed) part of speech each word has in the sentence and can be used, for example, to guide the lemmatizer.

In [8]:
import nltk
import nltk.data

lemma = nltk.stem.wordnet.WordNetLemmatizer()
stemma = nltk.stem.lancaster.LancasterStemmer()

words = nltk.word_tokenize("Does a cat have for legs and two ears? Yes, indeed - a cat has legs and ears.")
for word in words:
    # Columns: word, WordNet lemma (treated as a noun by default), Lancaster stem
    print(word, lemma.lemmatize(word), stemma.stem(word))

Does Does doe
a a a
cat cat cat
have have hav
for for for
legs leg leg
and and and
two two two
ears ear ear
? ? ?
Yes Yes ye
, , ,
indeed indeed indee
- - -
a a a
cat cat cat
has ha has
legs leg leg
and and and
ears ear ear
. . .
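
The two output columns differ because the tools work differently: the stemmer applies suffix rules and can produce non-words such as `indee`, while the lemmatizer only returns forms it finds in its vocabulary. The following is a toy sketch of that contrast, not how NLTK implements either tool; `SUFFIXES`, `LEMMA_TABLE`, `toy_stem` and `toy_lemmatize` are hypothetical names made up for illustration.

```python
# Toy illustration only: real stemmers and lemmatizers are far more elaborate.
SUFFIXES = ("ing", "ed", "s")  # hypothetical rule list
LEMMA_TABLE = {"has": "have", "legs": "leg", "ears": "ear"}  # hypothetical vocabulary


def toy_stem(word):
    """Strip the first matching suffix without checking the result is a word."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so very short words are left alone.
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 1:
            return lowered[: -len(suffix)]
    return lowered


def toy_lemmatize(word):
    """Look the word up in the vocabulary; return it unchanged if unknown."""
    return LEMMA_TABLE.get(word.lower(), word)


for word in ["has", "legs", "indeed", "cat"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

The rule-based function happily cuts `indeed` and `has` into fragments, whereas the lookup-based one leaves any word it does not know untouched.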


In [15]:
import nltk
import nltk.data

lemma = nltk.stem.wordnet.WordNetLemmatizer()
stemma = nltk.stem.lancaster.LancasterStemmer()

from nltk.corpus import wordnet

def get_wordnet_pos(treebank_tag):
    """Map a Penn Treebank POS tag to the matching WordNet POS constant."""
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV

    # Default to noun, which is also what lemmatize() assumes when no POS is given.
    return wordnet.NOUN
    
words = nltk.word_tokenize("Does a cat have for legs and two ears? Yes, indeed - a cat has legs and ears.")
for word, tag in nltk.pos_tag(words):
    # Columns: word, POS-aware lemma, default (noun) lemma, Lancaster stem
    print(word, lemma.lemmatize(word, get_wordnet_pos(tag)), lemma.lemmatize(word), stemma.stem(word))

Does Does Does doe
a a a a
cat cat cat cat
have have have hav
for for for for
legs leg leg leg
and and and and
two two two two
ears ear ear ear
? ? ? ?
Yes Yes Yes ye
, , , ,
indeed indeed indeed indee
- - - -
a a a a
cat cat cat cat
has have ha has
legs legs leg leg
and and and and
ears ear ear ear
. . . .
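
The `has` row shows why tagging matters: with the tag taken into account the lemma is `have`, presumably because the tagger marked the word as a verb, while the untagged call treats it as a noun and returns `ha`. The sketch below isolates the tag-mapping step so it can be tried without downloading any corpus. It assumes that the single-letter strings mirror the values of `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV`; `PREFIX_TO_POS` and `treebank_to_wordnet` are hypothetical names, not part of NLTK.

```python
# Stand-ins for wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV
# (an assumption made so the sketch runs without the WordNet corpus).
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

# First letter of a Penn Treebank tag -> WordNet part of speech.
PREFIX_TO_POS = {"J": ADJ, "V": VERB, "N": NOUN, "R": ADV}


def treebank_to_wordnet(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS, defaulting to noun."""
    return PREFIX_TO_POS.get(treebank_tag[:1], NOUN)


# A few sample tags: verb, plural noun, adjective, adverb, determiner, punctuation.
for tag in ["VBZ", "NNS", "JJ", "RB", "DT", "."]:
    print(tag, treebank_to_wordnet(tag))
```

A dictionary lookup on the first letter behaves like the `if`/`elif` chain above, and tags outside the four WordNet categories (determiners, punctuation) fall back to noun.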
