## Stemming
Stemming is the process of reducing a word to its stem, the base form that remains after affixes (prefixes and suffixes) are removed. Unlike a lemma, the stem does not have to be a valid dictionary word. Stemming is a common preprocessing step in natural language processing (NLP) and natural language understanding (NLU).

In [2]:
import nltk

# Sample words covering verb inflections, plurals, and derived forms
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

### PorterStemmer in NLTK

`PorterStemmer` is NLTK's implementation of the Porter stemming algorithm. It reduces words to a **root or base form** by applying a fixed sequence of suffix-stripping rules. Related words end up with a common base, but that base is not always a meaningful word.

#### Example

In [3]:
from nltk.stem import PorterStemmer
stemming=PorterStemmer()

In [4]:
for word in words:
    print(word+"=="+stemming.stem(word))

eating==eat
eats==eat
eaten==eaten
writing==write
writes==write
programming==program
programs==program
history==histori
finally==final
finalized==final


## Key Points

- **Normalization:** Converts different forms of the same word (e.g., "playing", "plays") to a common base ("play").  
- **Efficiency:** Helps group related words, improving search engine and NLP model performance.  
- **Limitation:** Results may not always be meaningful words (e.g., "history" → "histori").  

PorterStemmer is useful for fast text normalization in NLP pipelines. The example below shows the limitation: "congratulations" is cut to the non-word "congratul", while "sitting" reduces cleanly to "sit".

In [17]:
stemming.stem('congratulations'),stemming.stem("sitting")


('congratul', 'sit')
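
The normalization and efficiency points listed under Key Points can be illustrated with a small sketch that groups surface forms by their stem. The names `group_by_stem` and `toy_stem` are hypothetical and introduced only for this illustration. `toy_stem` is a deliberately crude stand-in, not the Porter algorithm; it is there so the block runs without NLTK installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer that strips a few common suffixes.

    This is NOT the Porter algorithm; it only keeps the example runnable
    without any third-party dependency.
    """
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words: Iterable[str], stem: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each stem to the list of surface forms that reduce to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[stem(word.lower())].append(word)
    return dict(groups)


groups = group_by_stem(["playing", "plays", "played", "play", "eats", "eating"], toy_stem)
print(groups)
```

With NLTK available, passing `PorterStemmer().stem` as the `stem` argument should group the words by their Porter stems instead. A search index built on such groups can match a query for "play" against documents that contain "playing" or "played".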

### RegexpStemmer class
NLTK's `RegexpStemmer` class implements a regular-expression stemmer. It takes a single regular expression and removes every part of a word that matches it, so anchors such as `$` (end of word) or `^` (start of word) determine whether suffixes or prefixes are stripped. For example:

In [9]:
from nltk.stem import RegexpStemmer
reg_stemmer=RegexpStemmer('ing$|s$|e$|able$', min=4)

The `min` parameter sets the minimum length a word must have before the regular expression is applied. A word shorter than `min` (here, 4 characters) is returned unchanged, even if it matches the expression.



In [10]:
reg_stemmer.stem('running')

'runn'

In [11]:
reg_stemmer.stem('ingeating')

'ingeat'
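
The behaviour seen in the two outputs above can be approximated in a few lines of standard-library Python. This is a simplified sketch of the idea, not NLTK's actual source; the function name `regexp_stem` and its `min_length` parameter are assumptions made for the illustration.

```python
import re


def regexp_stem(word: str, pattern: str, min_length: int = 0) -> str:
    """Simplified approximation of a regular-expression stemmer.

    Words shorter than min_length are returned unchanged; otherwise every
    match of pattern is deleted from the word.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


# Same pattern as the RegexpStemmer example: each alternative is anchored
# to the end of the word with `$`, so only suffixes are removed.
PATTERN = r"ing$|s$|e$|able$"

for word in ["running", "ingeating", "was", "cars"]:
    print(word, "->", regexp_stem(word, PATTERN, min_length=4))
```

Because every alternative ends in `$`, the leading "ing" in "ingeating" is left alone and only the trailing one is removed. "was" has three characters, which is below the minimum length of 4, so it passes through untouched.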

### Snowball Stemmer
The Snowball stemmer, also known as the Porter2 stemming algorithm, is a revised version of the Porter stemmer that fixes some of its known issues.

In [12]:
from nltk.stem import SnowballStemmer
snowballsstemmer=SnowballStemmer('english')


In [13]:
for word in words:
    print(word+"=="+snowballsstemmer.stem(word))

eating==eat
eats==eat
eaten==eaten
writing==write
writes==write
programming==program
programs==program
history==histori
finally==final
finalized==final


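### PorterStemmer vs SnowballStemmer

On the two adverbs below, Porter only rewrites the final "y" to "i" ("fairli", "sportingli"), while Snowball strips the endings down to "fair" and "sport".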

In [14]:
stemming.stem("fairly"),stemming.stem("sportingly")

#porterstemmer

('fairli', 'sportingli')

In [15]:
snowballsstemmer.stem("fairly"), snowballsstemmer.stem("sportingly")

#snowballstemmer

('fair', 'sport')