#### Lemmatization is closely related to stemming. Lemmatization returns the lemma of a word, that is, its base (dictionary) form.

Lemmatization in NLTK can be done using the WordNetLemmatizer class, which looks words up in WordNet.

WordNet is a lexical database of English.

In [1]:
import nltk
nltk.download('wordnet')  # one-time download of the WordNet corpus used by the lemmatizer

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [2]:
# Lemmatization depends on the part of speech (POS) of the word.
# Signature: lemmatize(word, pos='n')
# The default POS is 'n' (noun); other values can be passed explicitly:
# noun = n, verb = v, adjective = a, adverb = r

# Default POS (noun): the verb forms are returned unchanged
print(lemmatizer.lemmatize('is')) # output: is
print(lemmatizer.lemmatize('are')) # output: are

# Treated as verbs, both forms reduce to the lemma "be"
print(lemmatizer.lemmatize('is', pos='v')) # output: be
print(lemmatizer.lemmatize('are', pos='v')) # output: be

# The same word can have different lemmas depending on the POS
print(lemmatizer.lemmatize('working', pos='n')) # output: working
print(lemmatizer.lemmatize('working', pos='v')) # output: work


is
are
be
be
working
work
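
When the part of speech of a word is not known in advance, one common heuristic is to try each tag in turn and keep the first result that differs from the input. The sketch below is an assumption layered on top of this notebook, not part of NLTK: `first_changed_lemma` is a hypothetical helper, and it receives the lemmatizing function as an argument so that it can be used with `lemmatizer.lemmatize` from the cell above.

```python
def first_changed_lemma(word, lemmatize, pos_order=("n", "v", "a", "r")):
    """Try each POS tag in order and return the first lemma that differs from `word`.

    `lemmatize` is any callable with the shape lemmatize(word, pos=...),
    for example WordNetLemmatizer().lemmatize (hypothetical usage).
    """
    for pos in pos_order:
        lemma = lemmatize(word, pos=pos)
        if lemma != word:
            return lemma
    # No tag changed the word, so return it unchanged
    return word
```

Based on the outputs shown above, `first_changed_lemma('are', lemmatizer.lemmatize)` should fall through the noun tag and pick up the verb lemma. This is only a rough fallback: it can pick the wrong lemma for words that are valid under several tags, so an explicit POS is preferable when one is available.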


#### Lemmatizing a text document

We first need to split the text into word tokens.

After that, we can lemmatize each token in the list.

In [3]:
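# word_tokenize needs NLTK's Punkt tokenizer data:
# nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead)
from nltk.tokenize import word_tokenize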
from nltk.stem import WordNetLemmatizer

In [4]:
text = "A quick brown fox jumps over the lazy dog."

In [5]:
# Normalize text
# NLTK treats uppercase and lowercase letters as distinct,
# so "Fox" and "fox" would be handled as two different words.
# Hence, we convert the whole text to lowercase.
text = text.lower()

In [6]:
# Tokenize the text into words and punctuation marks
words = word_tokenize(text)

print(words)

['a', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']


In [7]:
lemmatizer = WordNetLemmatizer()

In [8]:
# Lemmatize every token using the default POS (noun)
words_lemma = [lemmatizer.lemmatize(word) for word in words]

# The list comprehension above is a shorter version of the following loop:
# words_lemma = []
# for word in words:
#     words_lemma.append(lemmatizer.lemmatize(word))

print(words_lemma)

['a', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']


* In the output above, the word jumps has been converted to its base form jump.
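
The list comprehension above passes no part of speech, so every token is lemmatized as a noun; jumps is reduced only because it is also the plural of the noun jump. A possible refinement is to tag each token first and pass a matching POS to the lemmatizer. The sketch below assumes tokens tagged with Penn Treebank tags (the tag set produced by `nltk.pos_tag`); `penn_to_wordnet_pos` and `lemmatize_tagged` are hypothetical helper names, not NLTK functions.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag (e.g. 'VBZ') to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # everything else falls back to noun, the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs, passing each word's mapped POS.

    `lemmatize` is any callable with the shape lemmatize(word, pos=...).
    """
    return [
        lemmatize(word, pos=penn_to_wordnet_pos(tag))
        for word, tag in tagged_tokens
    ]
```

With the objects from this notebook, this could be called as `lemmatize_tagged(nltk.pos_tag(words), lemmatizer.lemmatize)`. Note that `nltk.pos_tag` needs its own tagger model to be downloaded first, and the resource name varies by NLTK version.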