# **Stemming**

Stemming is the process of reducing a word to its stem by stripping affixes (prefixes and suffixes). Unlike a lemma, the resulting stem does not have to be a valid word. Stemming is a common preprocessing step in natural language understanding (NLU) and natural language processing (NLP).
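
To make the idea concrete, here is a deliberately naive suffix stripper in plain Python. It is only an illustrative sketch: the suffix list and the `naive_stem` name are invented for this example and do not reflect how any NLTK stemmer works internally.

```python
# Hypothetical suffix list, ordered longest first so "ing" is tried before "s".
SUFFIXES = ("ization", "ation", "ing", "ed", "es", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for w in ["eating", "eats", "programs", "writing", "history"]:
    print(w, "->", naive_stem(w))
```

Note that "writing" becomes "writ", which is not a dictionary word. Real stemmers apply far more careful rules, but they share this property.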

# **Porter Stemmer**

In [5]:
from nltk.stem import PorterStemmer

In [6]:
stemming = PorterStemmer()

In [7]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]
for word in words:
  print(word+"----->" + stemming.stem(word))

eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalized----->final


In [8]:
stemming.stem('congratulations')


'congratul'

In [None]:
stemming.stem('sitting')

'sit'

# **Lancaster Stemming**

- A more aggressive stemming algorithm than Porter Stemmer.
- Often results in very short word stems, which may not always be useful.

Example:
```
running → run
happiness → happy
```




In [9]:
### Lancaster Stemming algorithm
from nltk.stem import LancasterStemmer

In [10]:
lancaster = LancasterStemmer()

In [11]:
for word in words:
  print(word + "---->" + lancaster.stem(word))

eating---->eat
eats---->eat
eaten---->eat
writing---->writ
writes---->writ
programming---->program
programs---->program
history---->hist
finally---->fin
finalized---->fin
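
One rough way to see the difference in aggressiveness is to compare stem lengths. The sketch below hard-codes the Porter and Lancaster outputs printed in this notebook instead of calling NLTK, so it runs without any dependencies. The variable and function names are illustrative only.

```python
sample_words = ["eating", "eats", "eaten", "writing", "writes",
                "programming", "programs", "history", "finally", "finalized"]
# Stems copied from the Porter and Lancaster outputs shown in this notebook.
porter_stems = ["eat", "eat", "eaten", "write", "write",
                "program", "program", "histori", "final", "final"]
lancaster_stems = ["eat", "eat", "eat", "writ", "writ",
                   "program", "program", "hist", "fin", "fin"]

def average_length(stems):
    """Mean number of characters per stem."""
    return sum(len(s) for s in stems) / len(stems)

# Words for which Lancaster produced a strictly shorter stem than Porter.
shorter_with_lancaster = [
    word
    for word, p_stem, l_stem in zip(sample_words, porter_stems, lancaster_stems)
    if len(l_stem) < len(p_stem)
]

print(f"Porter average stem length:    {average_length(porter_stems):.1f}")
print(f"Lancaster average stem length: {average_length(lancaster_stems):.1f}")
print("Shorter with Lancaster:", shorter_with_lancaster)
```

Shorter stems merge more word forms together, which can help recall but also conflates unrelated words.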


# **RegexpStemmer Class**
NLTK provides the `RegexpStemmer` class for building simple regular-expression stemmers. It takes a single regular expression and removes every substring that matches it, so anchor the pattern with `^` or `$` if only prefixes or suffixes should be stripped. Words shorter than `min` are returned unchanged. Let us see an example:

In [None]:
from nltk.stem import RegexpStemmer

In [None]:
# 'ing' is unanchored here, so it is removed anywhere in the word; use 'ing$' to strip it only at the end.
reg_stemmer = RegexpStemmer('ing|s$|e$|able$', min=4)

In [None]:
reg_stemmer.stem("eating")

'eat'

In [None]:
reg_stemmer.stem("ingplaying")

'play'
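
The behaviour above can be approximated with the standard `re` module. This is a minimal sketch, assuming the stemmer simply deletes every match of the pattern and skips words shorter than the minimum length. `regexp_stem` is a hypothetical stand-in written for this example, not NLTK's own code.

```python
import re

def regexp_stem(word, pattern, min_len=0):
    """Delete every match of pattern from word, unless word is shorter than min_len."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

unanchored = "ing|s$|e$|able$"   # 'ing' may match anywhere in the word
anchored = "ing$|s$|e$|able$"    # every alternative must match at the end

print(regexp_stem("eating", unanchored, min_len=4))
print(regexp_stem("ingplaying", unanchored, min_len=4))
print(regexp_stem("ingplaying", anchored, min_len=4))
print(regexp_stem("was", unanchored, min_len=4))
```

Comparing the two patterns on "ingplaying" shows why anchoring matters: the unanchored pattern also strips the leading "ing", while the anchored one only touches the suffix.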

# **Snowball Stemmer**
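- An improved version of the Porter stemmer (its English variant is also known as Porter2).
- Generally regarded as slightly more accurate than the original Porter algorithm; for example, it handles adverbs ending in "-ly" better.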
- Supports multiple languages.

Example:
```
running → run
happiness → happi
```




In [None]:
from nltk.stem import SnowballStemmer

In [None]:
# Use a separate name so the SnowballStemmer class is not overwritten by its instance.
snowball = SnowballStemmer('english', ignore_stopwords=False)

In [None]:
for word in words:
  print(word + "--->" + snowball.stem(word))

eating--->eat
eats--->eat
eaten--->eaten
writing--->write
writes--->write
programming--->program
programs--->program
history--->histori
finally--->final
finalized--->final


In [None]:
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [None]:
snowball.stem("fairly"), snowball.stem("sportingly")

('fair', 'sport')

# **WordNet Lemmatizer**
Lemmatization is similar to stemming, but its output, called a lemma, is a valid dictionary word (the root word) rather than a truncated stem. The lemma keeps the meaning of the original word.

NLTK provides the `WordNetLemmatizer` class, a thin wrapper around the WordNet corpus. It uses the `morphy()` function of the WordNet `CorpusReader` class to find a lemma. Let us understand it with an example.

In [None]:
import nltk
nltk.download('wordnet')

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()


[nltk_data] Downloading package wordnet to /root/nltk_data...


In [None]:
for word in words:
  print(word + "--->" + lemmatizer.lemmatize(word))

# Signature of lemmatize (pos defaults to noun):
#   def lemmatize(self, word: str, pos: str = "n")
#
# pos values:
#   n - noun
#   v - verb
#   a - adjective
#   r - adverb


eating--->eating
eats--->eats
eaten--->eaten
writing--->writing
writes--->writes
programming--->programming
programs--->program
history--->history
finally--->finally
finalized--->finalized


In [None]:
for word in words:
  print(word + "--->" + lemmatizer.lemmatize(word, pos='v'))

eating--->eat
eats--->eat
eaten--->eat
writing--->write
writes--->write
programming--->program
programs--->program
history--->history
finally--->finally
finalized--->finalize


In [None]:
for word in words:
  print(word + "--->" + lemmatizer.lemmatize(word, pos='a'))

eating--->eating
eats--->eats
eaten--->eaten
writing--->writing
writes--->writes
programming--->programming
programs--->programs
history--->history
finally--->finally
finalized--->finalized


In [None]:
lemmatizer.lemmatize("good", pos='v')

'good'
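
Because the result depends so heavily on `pos`, a common pattern is to choose the tag per word instead of fixing it for the whole list. The helper below is a sketch of that idea. `penn_to_wordnet` is a hypothetical name, and it assumes the input tags follow the Penn Treebank convention (such as those produced by a POS tagger), which this notebook does not otherwise cover. The tags in the example are written by hand.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to one of the pos codes n, v, a, r."""
    if tag.startswith("JJ"):   # adjectives: JJ, JJR, JJS
        return "a"
    if tag.startswith("VB"):   # verbs: VB, VBD, VBG, VBN, VBP, VBZ
        return "v"
    if tag.startswith("RB"):   # adverbs: RB, RBR, RBS
        return "r"
    return "n"                 # fall back to noun, the lemmatizer's default

# Hand-written (word, tag) pairs, assumed for illustration only.
tagged = [("eating", "VBG"), ("programs", "NNS"), ("finally", "RB"), ("good", "JJ")]
for word, tag in tagged:
    print(word, tag, "->", penn_to_wordnet(tag))
```

The returned code can then be passed along as `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.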

In [None]:
### Rule of thumb:
### sentiment analysis ---> stemming is usually sufficient
### chatbot ---> lemmatization, because valid words are needed