# Lemmatization

**Lemmatization** is a text-normalization technique that reduces each word to its **lemma** (its dictionary or base form) using linguistic rules and a vocabulary.

## Why it’s used
- **Group word variants**: treat *connect*, *connecting*, *connected* as the same underlying term.
- **Improve consistency** for tasks like search, topic modeling, and feature engineering.

## Lemmatization vs. Stemming
- **Stemming**: chops off endings (often produces non-words).  
  Example: *studies → studi*
- **Lemmatization**: returns a valid base word using language knowledge.  
  Example: *studies → study*
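The contrast can be sketched with two toy functions. This is only an illustration of the idea, not NLTK's algorithm: `toy_stem`, `toy_lemmatize`, and the `TOY_LEMMAS` mini-vocabulary are hypothetical names invented for this example.

```python
# Toy illustration only: neither function reproduces a real stemmer or lemmatizer.

def toy_stem(word: str) -> str:
    """Crude stemmer: strip the first matching suffix, with no vocabulary check."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-vocabulary mapping inflected forms to their lemma.
TOY_LEMMAS = {"studies": "study", "studying": "study", "studied": "study"}


def toy_lemmatize(word: str) -> str:
    """Vocabulary lookup: return the known lemma, else the word unchanged."""
    return TOY_LEMMAS.get(word, word)


for w in ("studies", "studying", "studied"):
    print(f"{w}: stem={toy_stem(w)!r}, lemma={toy_lemmatize(w)!r}")
```

The suffix stripper turns *studies* into the non-word *studi*, while the lookup returns *study*. The lookup only works for forms it knows about, which is why a real lemmatizer depends on a large vocabulary such as WordNet.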

## Part-of-speech (POS) matters
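The lemma a lemmatizer returns depends on the POS it is told to assume:
- Treated as a **noun**: *connections → connection* (but *connecting* stays *connecting*)
- Treated as a **verb**: *connecting → connect* (but *connections* stays *connections*)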

> In NLTK, `WordNetLemmatizer.lemmatize` defaults to `pos="n"` (noun), so verbs, adjectives, and adverbs usually need an explicit POS (`"v"`, `"a"`, `"r"`) to produce the expected lemma.
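In practice the POS usually comes from a tagger rather than being typed by hand. One common pattern is to map Penn Treebank tags onto the four WordNet letters. The helper below is a minimal sketch of that mapping. `penn_to_wordnet` is a hypothetical name, and the tagged pairs are written by hand here to stand in for tagger output.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("NN"):
        return "n"  # noun
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"  # fall back to noun, the same default lemmatize() uses


# Hand-written (word, tag) pairs standing in for real tagger output.
tagged = [("connections", "NNS"), ("connecting", "VBG"), ("quickly", "RB")]
for word, tag in tagged:
    print(f"{word} ({tag}) -> pos={penn_to_wordnet(tag)!r}")
```

With NLTK, the tags would typically come from `nltk.pos_tag(tokens)` (which needs its own tagger data downloaded), and each word would then be lemmatized with `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.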


In [9]:
import nltk
nltk.download("wordnet")  # WordNet data required by WordNetLemmatizer; skip if already downloaded

In [10]:
words = [
    "connect", "connected", "connecting", "connection", "connections",
    "compute", "computer", "computers", "computing", "computation",
    "analyze", "analyzing", "analyzed", "analysis", "analyst", "analytics"
]

In [11]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [17]:
lemmatizer.lemmatize("connecting", pos="v")

'connect'

In [19]:
# Lemmatize every word as a noun: only the plural nouns change.
for word in words:
    print(f"{word}--->{lemmatizer.lemmatize(word, pos='n')}")

connect--->connect
connected--->connected
connecting--->connecting
connection--->connection
connections--->connection
compute--->compute
computer--->computer
computers--->computer
computing--->computing
computation--->computation
analyze--->analyze
analyzing--->analyzing
analyzed--->analyzed
analysis--->analysis
analyst--->analyst
analytics--->analytics


In [18]:
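# Lemmatize every word as a verb: inflected verb forms reduce to their base form, nouns are left as-is.
for word in words:
    print(f"{word}--->{lemmatizer.lemmatize(word, pos='v')}")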

connect--->connect
connected--->connect
connecting--->connect
connection--->connection
connections--->connections
compute--->compute
computer--->computer
computers--->computers
computing--->compute
computation--->computation
analyze--->analyze
analyzing--->analyze
analyzed--->analyze
analysis--->analysis
analyst--->analyst
analytics--->analytics
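Neither run on its own normalizes the whole list: the noun pass misses *connecting*, and the verb pass misses *connections*. When no POS tags are available, one possible heuristic is to try several POS values and keep the first lemma that differs from the input. The sketch below assumes this heuristic. `lemmatize_any` and `stub_lemmatize` are hypothetical names, and the stub only replays a few results from the two runs above so that the example runs without WordNet data.

```python
def lemmatize_any(word, lemmatize, pos_order=("v", "n")):
    """Try each POS in order and return the first lemma that differs from `word`."""
    for pos in pos_order:
        lemma = lemmatize(word, pos)
        if lemma != word:
            return lemma
    return word  # no POS changed the word


# Stand-in for lemmatizer.lemmatize, replaying results printed in the runs above.
KNOWN = {
    ("connections", "n"): "connection",
    ("computers", "n"): "computer",
    ("connected", "v"): "connect",
    ("connecting", "v"): "connect",
    ("computing", "v"): "compute",
    ("analyzing", "v"): "analyze",
    ("analyzed", "v"): "analyze",
}


def stub_lemmatize(word, pos):
    return KNOWN.get((word, pos), word)


for w in ("connections", "connecting", "computers", "analysis"):
    print(f"{w}--->{lemmatize_any(w, stub_lemmatize)}")
```

With the real lemmatizer, the call would be `lemmatize_any(word, lemmatizer.lemmatize)`. The heuristic can over-reduce words that are valid as both a noun and a verb form, so POS tagging remains the more reliable option when context is available.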
