# Stemming
Stemming reduces a word to a base form, called a stem, by stripping affixes such as prefixes and suffixes. The stem approximates the word's root but, unlike a lemma (the dictionary form), it does not have to be a valid word: the Porter stemmer, for example, maps "history" to "histori". Stemming is a common preprocessing step in natural language processing (NLP) and natural language understanding (NLU).

- Stem: the core part of a word that remains once affixes are removed.
- Affixes: elements attached to the base word.
- Suffixes: affixes added to the end (e.g., -ing, -ed).
- Prefixes: affixes added to the start (e.g., un-, re-).
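
To make the idea of affix removal concrete, here is a minimal sketch of a suffix stripper in plain Python. The function name `naive_stem`, its `min_stem_len` parameter, and the suffix list are hypothetical, chosen only for illustration; real stemmers such as Porter's apply many more rules and conditions.

```python
# Hypothetical suffix list, for illustration only.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for w in ["eating", "played", "programs", "sing"]:
    print(w, "---->", naive_stem(w))
```

Crude rules like these break easily (for instance, this sketch turns "writing" into "writ"), which is why the stemmers below rely on more careful rule sets.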


In [1]:
## Classification problem:
## decide whether a product review is positive or negative.
## Reviews ----> [eating, eat, eaten] ---> root word is "eat"; [going, gone, goes] ---> root word is "go"

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

## PorterStemmer

In [2]:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()

for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [7]:
stemming.stem('congratulations')

'congratul'

In [8]:
stemming.stem("sitting")

'sit'
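
Since the motivation given earlier is to map variants such as "eating" and "eats" to one root before classification, the following sketch groups the sample words by their Porter stem. It uses only `PorterStemmer` from NLTK and `defaultdict` from the standard library; the variable names `stemmer` and `groups` are illustrative.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

# Map each stem to the list of original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, "<----", members)
```

Words that share a stem collapse into a single feature, which shrinks the vocabulary a classifier has to learn. Note that "eaten" stays in its own group, because the Porter stemmer does not reduce it to "eat".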

## RegexpStemmer class
RegexpStemmer is a class in the Natural Language Toolkit (NLTK) Python library that removes prefixes or suffixes from words using regular expression (regex) patterns. It deletes whatever part of the word matches the pattern you supply, and it uses no dictionary or built-in linguistic knowledge, so the quality of the stems depends entirely on the pattern.

In [28]:
from nltk.stem import RegexpStemmer
reg_stemmer=RegexpStemmer('^ing|s$|e$|able$', min=4)

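# ^ing  → removes "ing" at the start of a word
# s$    → removes "s" at the end of a word
# e$    → removes "e" at the end of a word
# able$ → removes "able" at the end of a word
# min is the minimum length a word must have for the pattern to be applied; shorter words are returned unchanged.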

In [29]:
reg_stemmer.stem('eating')
reg_stemmer.stem('ingeating')
reg_stemmer.stem('playings')
reg_stemmer.stem('in')
reg_stemmer.stem('understable')

'underst'
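
Only the result of the last call is displayed in the cell above. To see what happens to each word, the same behavior can be approximated with Python's `re` module alone. This is a sketch of the idea, not NLTK's implementation; the helper name `regexp_stem` and its `min_len` parameter are hypothetical. It assumes that words shorter than the minimum length are returned unchanged and that every match of the pattern is otherwise deleted.

```python
import re

# Same pattern as the RegexpStemmer cell above.
PATTERN = re.compile(r"^ing|s$|e$|able$")

def regexp_stem(word, min_len=4):
    # Words shorter than min_len are returned untouched.
    if len(word) < min_len:
        return word
    # Delete every substring that matches the pattern.
    return PATTERN.sub("", word)

for w in ["eating", "ingeating", "playings", "in", "understable"]:
    print(w, "---->", regexp_stem(w))
```

Under these assumptions, "eating" comes back unchanged because `^ing` only matches at the start of a word, while "ingeating" loses its leading "ing" and "in" is too short to be processed at all.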

## Snowball Stemmer
The Snowball stemmer, also known as the Porter2 stemming algorithm, is an improved version of the Porter stemmer that fixes some of its issues.

In [32]:
from nltk.stem import SnowballStemmer
snowballsstemmer=SnowballStemmer('english')

for word in words:
    print(word+"---->"+snowballsstemmer.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [28]:
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [31]:
snowballsstemmer.stem("fairly"),snowballsstemmer.stem("sportingly")

('fair', 'sport')

In [33]:
snowballsstemmer.stem('goes')

'goe'

In [34]:
stemming.stem('goes')

'goe'
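
The comparisons above can be collected into one small helper that reports only the words on which the two stemmers disagree. This is a minimal sketch; the function name `disagreements` is hypothetical, and it relies only on the `PorterStemmer` and `SnowballStemmer` classes already used in this notebook.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

def disagreements(words):
    """Return {word: (porter_stem, snowball_stem)} for words where the stems differ."""
    result = {}
    for word in words:
        p, s = porter.stem(word), snowball.stem(word)
        if p != s:
            result[word] = (p, s)
    return result

print(disagreements(["fairly", "sportingly", "goes", "eating"]))
```

Running a helper like this over a sample of your own corpus is a quick way to judge whether switching stemmers would change your features. Neither stemmer handles irregular forms such as "goes", which both reduce to "goe".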
