# STEMMING

### Stemming is the process of reducing morphological variants of a word to a common root/base form, called the stem. Programs that do this are commonly referred to as stemming algorithms or stemmers. Ideally, a stemming algorithm reduces the words “chocolates” and “chocolatey” to the root word “chocolate”, and “retrieval”, “retrieved”, “retrieves” to the stem “retrieve”. In practice, stemmers strip suffixes by rule, so the stem they return is not always a dictionary word.

In [1]:
words = ["eating","eats","eaten","eat","writing","writes","programming","programs","history","finally","finalize"]
words

['eating',
 'eats',
 'eaten',
 'eat',
 'writing',
 'writes',
 'programming',
 'programs',
 'history',
 'finally',
 'finalize']

### Porter Stemmer

In [3]:
from nltk.stem import PorterStemmer

In [4]:
stemming=PorterStemmer()

In [5]:
for word in words:
    print(word+'---->'+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
eat---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalize---->final
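
In the output above, several different words map to the same stem. The sketch below groups the word list by Porter stem to make those collisions easy to see. The helper name `group_by_stem` is hypothetical and not part of NLTK; the word list is copied from the first cell.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words, stem_fn):
    """Map each stem to the list of input words that reduce to it (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        groups[stem_fn(word)].append(word)
    return dict(groups)


words = ["eating", "eats", "eaten", "eat", "writing", "writes",
         "programming", "programs", "history", "finally", "finalize"]

groups = group_by_stem(words, PorterStemmer().stem)

# Print each stem followed by the words that collapsed into it.
for stem, members in groups.items():
    print(stem + '---->' + ', '.join(members))
```

Words with different meanings, such as "finally" and "finalize", end up in the same group, which is worth keeping in mind when stems are used as search or feature keys.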


##### A major disadvantage of stemming is that both the form and the meaning of a word may change, and the resulting stem is not always a valid word

In [6]:
stemming.stem("Congratulations")

'congratul'

In [7]:
stemming.stem("sitting")

'sit'

##### This issue can be addressed using lemmatization, which maps each word to its dictionary form

### RegexpStemmer class

In [9]:
from nltk.stem import RegexpStemmer

In [10]:
reg_stemmer=RegexpStemmer('ing$|s$|e$|able$',min=4)

In [11]:
reg_stemmer.stem('eating')

'eat'
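
To show what the pattern and `min` arguments do, here is a minimal standard-library sketch that approximates the behaviour of `RegexpStemmer`: words shorter than the minimum length are returned unchanged, and otherwise every match of the pattern is removed. The function `regexp_stem` and its parameter names are assumptions made for illustration, not NLTK's API.

```python
import re


def regexp_stem(word, pattern=r"ing$|s$|e$|able$", min_length=4):
    """Approximate RegexpStemmer: strip pattern matches from words of at least min_length characters."""
    if len(word) < min_length:
        return word  # too short to stem, returned as-is
    return re.sub(pattern, "", word)


# Each alternative is anchored with `$`, so only a trailing suffix is removed.
for w in ["eating", "cars", "readable", "was", "ingeating"]:
    print(w + '--->' + regexp_stem(w))
```

Because the pattern is anchored at the end of the word, the leading "ing" in "ingeating" is left alone, and "was" is untouched because it is shorter than four characters.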

### Snowball Stemmer

In [12]:
from nltk.stem import SnowballStemmer

In [13]:
snowballstemmer=SnowballStemmer('english')

In [14]:
for word in words:
    print(word+'--->'+snowballstemmer.stem(word))

eating--->eat
eats--->eat
eaten--->eaten
eat--->eat
writing--->write
writes--->write
programming--->program
programs--->program
history--->histori
finally--->final
finalize--->final


##### How Snowball (also known as Porter2) improves on PorterStemmer(): it handles some suffixes, such as "-ly", more cleanly

In [15]:
stemming.stem("fairly"),snowballstemmer.stem("fairly")

('fairli', 'fair')
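
To find such cases in a larger word list, one option is to run both stemmers and keep only the words on which they disagree. This is a small sketch; the helper name `stem_differences` is hypothetical, and the sample words are taken from the cells above.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def stem_differences(words):
    """Return {word: (porter_stem, snowball_stem)} for words where the two stemmers disagree."""
    differences = {}
    for word in words:
        porter_stem = porter.stem(word)
        snowball_stem = snowball.stem(word)
        if porter_stem != snowball_stem:
            differences[word] = (porter_stem, snowball_stem)
    return differences


sample = ["fairly", "eating", "history", "programs"]
diffs = stem_differences(sample)
print(diffs)
```

Of these four sample words, only "fairly" is reported, since the two stemmers agree on the other three, as the earlier outputs show.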