### Stemming
Stemming is the process of reducing words to their root form (the stem), typically by stripping suffixes. It helps normalize text for tasks like search and text analysis.

|          | Words                                                  |
|----------|--------------------------------------------------------|
| Original | ['running', 'jumps', 'happily', 'running', 'happily']  |
| Stemmed  | ['run', 'jump', 'happili', 'run', 'happili']           |

### Types of Stemmers in NLTK
1) Porter's Stemmer
2) Regexp Stemmer
3) Snowball Stemmer (generally the preferred choice)
4) Others, such as the Lancaster Stemmer



### 1. Porter's Stemmer
Related forms of a word are mapped onto the same stem, and the resulting stem is not necessarily a meaningful word (for example, "happily" becomes "happili").

In [1]:
## 1. Porter's Stemmer

from nltk.stem import PorterStemmer

# Create a Porter Stemmer instance
porter_stemmer = PorterStemmer()

# Example words for stemming
words = ["running", "jumps", "happily", "running", "happily"]

# Apply stemming to each word
stemmed_words = [porter_stemmer.stem(word) for word in words]

# Print the results
print("Original words:", words)
print("Stemmed words:", stemmed_words)

Original words: ['running', 'jumps', 'happily', 'running', 'happily']
Stemmed words: ['run', 'jump', 'happili', 'run', 'happili']
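
To make the idea of mapping a group of word forms onto one stem more concrete, here is a minimal sketch. The word list is an illustrative assumption and is not part of the original example; it only uses `PorterStemmer.stem`, as in the cell above.

```python
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

# A family of related word forms (illustrative list, not from the original notebook)
word_family = ["connect", "connected", "connecting", "connection", "connections"]

# Map each form to its stem, then collect the distinct stems
stems = {word: porter_stemmer.stem(word) for word in word_family}
unique_stems = set(stems.values())

print(stems)
print("Distinct stems:", unique_stems)
```

If every form collapses to a single stem, a search for any one of them can match all the others.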


### 2. Regexp Stemmer
The Regexp Stemmer, or Regular Expression Stemmer, is a stemming algorithm that uses regular expressions to identify and remove suffixes from words.

In [9]:
from nltk.stem import RegexpStemmer

# Create a Regexp Stemmer with a custom rule
# The trailing $ anchors the pattern to the end of the word.
# Without it, every occurrence of 'ing' in the word would be removed.
custom_rule = r'ing$'
regexp_stemmer = RegexpStemmer(custom_rule)

# Apply the stemmer to a word
word = 'running'
stemmed_word = regexp_stemmer.stem(word)

print(f'Original Word: {word}')
print(f'Stemmed Word: {stemmed_word}')

Original Word: running
Stemmed Word: runn
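
The effect of the `$` anchor can be seen in isolation with a small sketch. It uses Python's built-in `re` module directly rather than `RegexpStemmer`, on the assumption that the stemmer simply deletes whatever the pattern matches. The word "ringing" is an illustrative choice.

```python
import re

word = "ringing"

# Anchored pattern: only an "ing" at the very end of the word is removed
anchored = re.sub(r"ing$", "", word)

# Unanchored pattern: every "ing" in the word is removed
unanchored = re.sub(r"ing", "", word)

print("With $:   ", anchored)    # ring
print("Without $:", unanchored)  # r
```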


### 3. Snowball Stemmer
Unlike the Porter Stemmer, which targets English only, the Snowball Stemmer is multilingual and supports a range of languages. It takes its name from "Snowball", a small string-processing language designed for writing stemming algorithms.

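The English Snowball stemmer is a revised version of the Porter Stemmer (sometimes called Porter2). It is often described as slightly more aggressive, although the two produce the same stem for many words.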

In [10]:
from nltk.stem import SnowballStemmer

# Choose a language for stemming, for example, English
stemmer = SnowballStemmer(language='english')

# Example words to stem
words_to_stem = ['running', 'jumped', 'happily', 'quickly', 'foxes']

# Apply Snowball Stemmer
stemmed_words = [stemmer.stem(word) for word in words_to_stem]

# Print the results
print("Original words:", words_to_stem)
print("Stemmed words:", stemmed_words)

Original words: ['running', 'jumped', 'happily', 'quickly', 'foxes']
Stemmed words: ['run', 'jump', 'happili', 'quick', 'fox']
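
As a rough sketch of the multilingual support and of how the two stemmers compare, the snippet below lists the languages exposed by `SnowballStemmer.languages` and stems a few words with both stemmers. The word list is an illustrative assumption, and the exact stems may vary between NLTK versions, so inspect the printed output rather than relying on fixed values.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

# Languages supported by the Snowball stemmer in this NLTK installation
print(SnowballStemmer.languages)

porter = PorterStemmer()
snowball = SnowballStemmer(language="english")

# Illustrative word list (an assumption, not from the original notebook)
sample_words = ["generously", "fairly", "running"]

# Collect (word, Porter stem, Snowball stem) triples for side-by-side comparison
comparison = [(w, porter.stem(w), snowball.stem(w)) for w in sample_words]
for word, porter_stem, snowball_stem in comparison:
    print(f"{word}: porter={porter_stem}, snowball={snowball_stem}")
```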


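### Disadvantage of Stemming
Stemming can lose the meaning of some words: the stem may not be a real word (for example, "happili"), and words with different meanings can be reduced to the same stem.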
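
A short sketch of this loss of meaning is shown below. The word pair is an illustrative assumption and is not taken from the original examples; it checks whether two unrelated words end up with the same Porter stem.

```python
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

# Two words with different meanings (illustrative pair, not from the original notebook)
word_pair = ["universe", "university"]
pair_stems = [porter_stemmer.stem(word) for word in word_pair]

print(pair_stems)
# If both stems are equal, anything built on stems cannot tell the two words apart
print("Same stem:", pair_stems[0] == pair_stems[1])
```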

