# 2.7 Lemmatization

Where stemming strips the last few characters of a word by rule, lemmatization reduces the word to its dictionary base form (its lemma), so the result is a real word that keeps its meaning. To do this, the lemmatizer looks the word up in a predefined dictionary (WordNet, in NLTK's case) and uses the word's part of speech to choose the right base form.
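To make the contrast concrete, here is a toy sketch of both ideas. It is not how `PorterStemmer` or `WordNetLemmatizer` work internally: the suffix list, the lookup table, and the names `toy_stem` and `toy_lemmatize` are all made up for illustration.

```python
# Toy illustration only: real stemmers and lemmatizers are far more elaborate.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list

# Hypothetical mini-dictionary: (word, part of speech) -> lemma
LEMMA_TABLE = {
    ("better", "a"): "good",
    ("worse", "a"): "bad",
    ("learned", "v"): "learn",
    ("likes", "v"): "like",
}


def toy_stem(word):
    """Strip the first matching suffix, without checking the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos="n"):
    """Look the word up for the given part of speech; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)


for w, pos in [("better", "a"), ("learned", "v"), ("likes", "v")]:
    print(w, " : ", toy_stem(w), " | ", toy_lemmatize(w, pos))
```

The rule-based version can cut a word down to something that is not a word, while the lookup version only returns entries from its dictionary and needs the part of speech to find them.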

In [4]:
connect_tokens = ['connecting', 'connected', 'connectivity', 'connect', 'connects']

In [5]:
learn_tokens = ['learned', 'learning', 'learn', 'learns', 'learner', 'learners']

In [6]:
likes_tokens = ['likes', 'better', 'worse']

## Stemming

In [7]:
from nltk.stem import PorterStemmer

In [8]:
# create stemmer
ps = PorterStemmer()

In [9]:
for t in connect_tokens:
    print(t, " : ", ps.stem(t))

connecting  :  connect
connected  :  connect
connectivity  :  connect
connect  :  connect
connects  :  connect


In [10]:
for t in learn_tokens:
    print(t, " : ", ps.stem(t))

learned  :  learn
learning  :  learn
learn  :  learn
learns  :  learn
learner  :  learner
learners  :  learner


In [11]:
for t in likes_tokens:
    print(t, " : ", ps.stem(t))

likes  :  like
better  :  better
worse  :  wors


## Lemmatization

In [None]:
import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

In [13]:
# create lemmatizer 
lemmatizer = WordNetLemmatizer()

In [14]:
for t in connect_tokens:
    print(t, " : ", lemmatizer.lemmatize(t))

connecting  :  connecting
connected  :  connected
connectivity  :  connectivity
connect  :  connect
connects  :  connects


In [15]:
for t in learn_tokens:
    print(t, " : ", lemmatizer.lemmatize(t))

learned  :  learned
learning  :  learning
learn  :  learn
learns  :  learns
learner  :  learner
learners  :  learner


In [16]:
for t in likes_tokens:
    print(t, " : ", lemmatizer.lemmatize(t))

likes  :  like
better  :  better
worse  :  worse


## Another example

In [18]:
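# default POS is noun, so "learned" is looked up as a noun and returned unchanged
print("learned (noun):", lemmatizer.lemmatize("learned"))
# correct POS: tagged as a verb, "learned" reduces to its lemma
print("learned (verb):", lemmatizer.lemmatize("learned", pos='v'))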

learned (noun): learned
learned (verb): learn


## What I Learned

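- Lemmatization uses **vocabulary + grammar** → it returns real dictionary words, where stemming can produce non-words such as `wors`.

- It needs the correct POS tag for best results: without one, `WordNetLemmatizer` treats every word as a noun, which is why `learned` and `connecting` came back unchanged.

- In a pipeline, get the tags from `nltk.pos_tag()`, or use spaCy, which tags and lemmatizes in one pass.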
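
As a rough sketch of the last point: `nltk.pos_tag()` returns Penn Treebank tags (`VBD`, `JJR`, ...), while `WordNetLemmatizer.lemmatize()` expects a single-letter WordNet POS (`'n'`, `'v'`, `'a'`, `'r'`), so a small mapping is needed in between. The helper names `penn_to_wordnet` and `lemmatize_tokens` are my own, and the mapping is a common simplification that treats every other tag as a noun.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # everything else is treated as a noun (the lemmatizer's default)


def lemmatize_tokens(tokens):
    """Tag the tokens, then lemmatize each one with its mapped POS."""
    # Imported here so the mapping above can be used without NLTK installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in nltk.pos_tag(tokens)
    ]
```

Calling `lemmatize_tokens(['she', 'learned', 'quickly'])` needs the `wordnet` corpus plus NLTK's POS tagger model, which has to be fetched with `nltk.download()`; the tagger's resource name depends on the NLTK version.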
