# Speech and Language Processing
### An introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st. ed.). Prentice Hall PTR, USA.

### Lemmatization

Lemmatization is a technique in Natural Language Processing (NLP) used to reduce words to their base or dictionary form, known as the lemma. Because it maps the inflected forms of a word to a single lemma, it reduces the number of distinct word types a system has to handle, which can make text processing both more accurate and more efficient. Lemmatization is particularly useful in applications such as information retrieval or text classification, where the underlying meaning of a word matters more than its exact surface form.
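The "fewer unique words" effect can be made concrete by counting distinct tokens before and after mapping each token to its lemma. This is a minimal sketch: the `LEMMAS` table and the `lemmatize_tokens` helper are hypothetical, hand-written stand-ins for a real lemmatizer.

```python
# Hypothetical hand-written lemma table standing in for a real lemmatizer.
LEMMAS = {
    "ran": "run", "runs": "run", "running": "run",
    "dogs": "dog", "was": "be", "is": "be", "are": "be",
}

def lemmatize_tokens(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]

tokens = "the dog runs the dogs ran the dog is running".split()
lemmas = lemmatize_tokens(tokens)

print(len(set(tokens)), "unique tokens before")  # 7 unique tokens before
print(len(set(lemmas)), "unique lemmas after")   # 4 unique lemmas after
```

Seven distinct surface forms collapse to four lemmas (`the`, `dog`, `run`, `be`), so any downstream vocabulary or index shrinks accordingly.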

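Lemmatization works by looking words up in a dictionary (lexicon), applying morphological analysis, or both, to obtain the base form of a word. For example, the lemma of the words "ran", "runs", and "running" is "run". Because the same surface form can belong to different parts of speech (e.g., "running" as a verb or as a noun), the lemmatizer also needs the word's part of speech, which is usually inferred from context, so that the resulting lemma is a valid dictionary word.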
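The dictionary-plus-rules approach can be sketched as a toy lemmatizer loosely modeled on WordNet's morphy: irregular forms are looked up in an exception table, regular forms go through suffix rules that depend on the part of speech, and a candidate is accepted only if it appears in the lexicon. The tables `LEXICON`, `EXCEPTIONS`, and `SUFFIX_RULES` are tiny hypothetical examples, not a real resource.

```python
# Toy, WordNet-style lemmatizer. All tables are hypothetical examples.
LEXICON = {
    "n": {"dog", "box", "city", "running"},
    "v": {"run", "change", "study", "be"},
}

# Irregular forms (and doubled-consonant forms such as "running")
# are listed explicitly instead of being derived by rules.
EXCEPTIONS = {
    "n": {},
    "v": {"ran": "run", "running": "run", "was": "be", "were": "be", "is": "be"},
}

# (suffix, replacement) pairs tried in order for each part of speech.
SUFFIX_RULES = {
    "n": [("ies", "y"), ("es", ""), ("s", "")],
    "v": [("ies", "y"), ("es", "e"), ("es", ""), ("s", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}

def lemmatize(word, pos="n"):
    """Return the lemma of `word` for part of speech `pos` ('n' or 'v').

    Falls back to the word itself when no valid lemma is found.
    """
    word = word.lower()
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:  # only accept real dictionary words
                return candidate
    return word

print(lemmatize("running", "v"))  # run
print(lemmatize("running", "n"))  # running
print(lemmatize("cities", "n"))   # city
```

The same string `running` yields different lemmas depending on the part of speech supplied, which is why lemmatizers are usually paired with a part-of-speech tagger.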

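One advantage of lemmatization over stemming is that it produces valid words, whereas stemming chops off suffixes heuristically and can yield non-words (e.g., the Porter stemmer reduces "studies" to "studi"). Real words are easier to interpret and can improve the accuracy of text processing and analysis. However, lemmatization is usually more computationally expensive than stemming, as it requires access to a dictionary or morphological analyzer and, typically, a part-of-speech tag for each word.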
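The difference can be illustrated by comparing a deliberately crude suffix stripper (a hypothetical `crude_stem`, not the Porter algorithm) with a dictionary lookup. The `LEMMAS` table is a hand-written stand-in for a real lemmatizer.

```python
# Deliberately crude suffix-stripping stemmer (NOT the Porter algorithm):
# it removes a known suffix without checking that the result is a word.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid over-stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table standing in for a dictionary-backed lemmatizer.
LEMMAS = {"studies": "study", "changing": "change", "ran": "run"}

for w in ["studies", "changing", "ran"]:
    print(f"{w:10} stem={crude_stem(w):8} lemma={LEMMAS.get(w, w)}")
```

The stemmer yields `stud` and `chang`, which are not English words, and cannot handle the irregular `ran`, while the lookup returns `study`, `change`, and `run`. The lookup is also the costlier step in practice, since it needs a lexicon rather than a handful of string rules.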

Lemmatization is commonly used in applications such as information retrieval, text classification, and sentiment analysis. It is often combined with part-of-speech tagging, which supplies the part of speech each word needs for an accurate lemma.
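When a tagger produces Penn Treebank tags, a common glue step is to map them onto the single-letter WordNet part-of-speech codes (`n`, `v`, `a`, `r`) before lemmatizing. The sketch below assumes the tagged sentence is already available (the `tagged` list is hypothetical tagger output); each `(word, pos)` pair could then be passed to a lemmatizer such as NLTK's `WordNetLemmatizer.lemmatize(word, pos)`.

```python
# Map Penn Treebank tags to WordNet POS codes: 'n', 'v', 'a', 'r'.
def penn_to_wordnet(tag):
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("R"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # default to noun, as WordNetLemmatizer does

# Hypothetical tagger output; in practice this comes from a POS tagger.
tagged = [("The", "DT"), ("children", "NNS"), ("were", "VBD"),
          ("running", "VBG"), ("quickly", "RB")]

pairs = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(pairs)
# [('The', 'n'), ('children', 'n'), ('were', 'v'), ('running', 'v'), ('quickly', 'r')]
```

Falling back to noun for closed-class tags such as `DT` is a simplification; those words are usually left as-is by the lemmatizer anyway.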

In summary, lemmatization reduces words to their dictionary forms using a lexicon or morphological analysis together with the word's part of speech, which shrinks the number of unique words that need to be processed. It is more computationally expensive than stemming, but because its output consists of valid words, it can improve the accuracy of text processing and analysis.


<img src="https://d2mk45aasx86xg.cloudfront.net/difference_between_Stemming_and_lemmatization_8_11zon_452539721d.webp" alt="Difference between stemming and lemmatization">

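```python
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

words = ['change', 'changing', 'changes', 'changed', 'changer']

lemmatizer = WordNetLemmatizer()

# The part of speech must be passed to lemmatize(); the default is noun,
# so wordnet.VERB is needed for these forms to be lemmatized as verbs.
lemmas = [lemmatizer.lemmatize(token, wordnet.VERB) for token in words]
print(lemmas)
```

```
['change', 'change', 'change', 'change', 'changer']
```

The noun "changer" is derived from "change" rather than being an inflected form of the verb, so it is returned unchanged.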