# POS-Lemma

> Specifying the `Part-of-Speech` (POS) of a word to the `WordNetLemmatizer` makes it more accurate: without a POS, the lemmatizer treats every word as a noun. Run the code below to see the difference.

In [1]:
# Requires the WordNet data: nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# lemmatize() assumes pos="n" (noun) unless told otherwise
print(f"Without POS tag : {lemmatizer.lemmatize('loving')}")
print(f"With POS tag : {lemmatizer.lemmatize('loving', pos='v')}")

Without POS tag : loving
With POS tag : love


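Understanding `pos_tag` from `nltk`: it returns a list of `(word, tag)` tuples, and the first letter of each tag indicates the broad word class.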

Run the following cells:

In [2]:
from nltk import pos_tag

In [3]:
noun = "love"
adjective = "big"
adverb = "lovely"  # tagged as an adverb when seen in isolation, although "lovely" is usually an adjective
verb = "loving"

In [6]:
import nltk
nltk.download('averaged_perceptron_tagger')
pos_tag([noun])[0][1][0].upper() # --> N for noun

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/aygul_unix/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


'N'

In [7]:
pos_tag([adjective])[0][1][0].upper() # --> J for adjective

'J'

In [8]:
pos_tag([adverb])[0][1][0].upper() # --> R for adverb

'R'

In [9]:
pos_tag([verb])[0][1][0].upper() # --> V for verb

'V'
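
The chained indexing `[0][1][0]` used in the cells above may be easier to follow when unpacked step by step. The sketch below uses a hard-coded list that only mimics the shape of what `pos_tag` returns (a list of `(word, tag)` tuples); the sample tag is an illustrative assumption, not real tagger output.

```python
# Hard-coded sample in the shape pos_tag returns: a list of (word, tag) tuples.
# The tag is an illustrative Penn Treebank tag, not output from the tagger.
tagged = [("love", "NN")]

first_pair = tagged[0]             # first (word, tag) tuple: ("love", "NN")
full_tag = first_pair[1]           # the tag itself: "NN"
coarse_tag = full_tag[0].upper()   # its first letter: "N"

print(coarse_tag)
```

Keeping only the first letter collapses fine-grained tags into one broad class per word type, which is all the lemmatizer needs.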

❓ **Question** ❓

Create a function that lemmatizes your text, taking into account the associated POS tags. 

💡 Hint: The `WordNetLemmatizer` requires the POS tags to be specified in a certain form, different from the tags output by `nltk.pos_tag`. You will need to map them to the correct form.
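
As a rough illustration of the mapping this hint refers to: the lemmatizer's `pos` argument takes single lowercase letters (`"n"`, `"v"`, `"a"`, `"r"`), which are the values behind `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ` and `wordnet.ADV`. The helper name `penn_to_wordnet` below is hypothetical, and the letters are written as plain strings so that the sketch runs without any NLTK data.

```python
# Hypothetical helper (not part of nltk): maps a tag such as "VBG" to the
# single-letter POS code that WordNetLemmatizer.lemmatize accepts.
PREFIX_TO_WORDNET = {
    "J": "a",  # adjective (wordnet.ADJ)
    "N": "n",  # noun      (wordnet.NOUN)
    "V": "v",  # verb      (wordnet.VERB)
    "R": "r",  # adverb    (wordnet.ADV)
}

def penn_to_wordnet(tag, default="n"):
    """Return the WordNet POS letter for a tag, falling back to noun."""
    if not tag:
        return default
    return PREFIX_TO_WORDNET.get(tag[0].upper(), default)

for tag in ["NN", "VBG", "JJ", "RB", "PRP"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Falling back to noun for unmapped tags (pronouns, determiners, punctuation) matches the lemmatizer's own default.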

In [10]:
# ------
# Map a POS tag to a format WordNetLemmatizer accepts:
# ------

from nltk.corpus import wordnet

def get_wordnet_pos(word):
    '''returns the POS tag in a format understood
    by the WordNetLemmatizer'''

    tag = pos_tag([word])[0][1][0].upper()

    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}

    return tag_dict.get(tag, wordnet.NOUN)


# ------
# Lemmatize
# ------

from nltk.tokenize import word_tokenize

def pos_lemma(text):

    lemmatized = [lemmatizer.lemmatize(w, get_wordnet_pos(w))
                  for w in word_tokenize(text)]
    return lemmatized
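
Note that `get_wordnet_pos` tags each word on its own, so the tagger never sees the neighbouring words. One possible variant, sketched below, tags the whole token list once and then lemmatizes each `(word, tag)` pair. The function `lemmatize_tagged` and the toy lemmatizer are hypothetical stand-ins so that the example runs without NLTK data; the tagged sentence is hard-coded for illustration.

```python
def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs, e.g. the result of tagging a full sentence.

    lemmatize is any callable taking (word, pos) and returning a lemma,
    such as the lemmatize method of a WordNetLemmatizer instance.
    """
    prefix_to_pos = {"J": "a", "N": "n", "V": "v", "R": "r"}
    return [
        lemmatize(word, prefix_to_pos.get(tag[:1].upper(), "n"))
        for word, tag in tagged_tokens
    ]

# Toy stand-in for a real lemmatizer (assumption, for illustration only).
def toy_lemmatize(word, pos):
    table = {("am", "v"): "be", ("loving", "v"): "love"}
    return table.get((word, pos), word)

# Hard-coded tags in the shape pos_tag returns; not real tagger output.
tagged = [("I", "PRP"), ("am", "VBP"), ("loving", "VBG"), ("Paris", "NNP")]
print(lemmatize_tagged(tagged, toy_lemmatize))
```

With NLTK available, the same function could be called as `lemmatize_tagged(pos_tag(word_tokenize(text)), lemmatizer.lemmatize)`.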


👇 Try your function:

In [11]:
sentence = "I am loving Paris"

In [12]:
pos_lemma(sentence)

['I', 'be', 'love', 'Paris']

🏁 Congratulations. With this mini-challenge, you've learned how to find the root (lemma) of a word, whether it is a noun, an adjective, an adverb or a verb.

