Spam Mail Prediction Using Machine Learning

Author   :  Ushasri Buddha

Dataset  :  https://drive.google.com/file/d/16vUtptPhYN1SWiWsuMrMAXy26zwhueWu/view?usp=sharing

Importing Basic Libraries

In [None]:
import numpy as np
import pandas as pd

from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

%matplotlib inline

Loading the Data Set

In [None]:
# The file is not valid UTF-8, so it is read as ISO-8859-1 (Latin-1).
spam = pd.read_csv('spam.csv', encoding='ISO-8859-1')
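As a possible alternative to dropping columns later, `read_csv` can be asked to load only the label and message columns. The sketch below uses a tiny in-memory stand-in for `spam.csv` (the two rows and the name `sample` are illustrative assumptions); with the real file the same `usecols` argument would be combined with the `encoding` argument used above.

```python
import io
import pandas as pd

# Tiny stand-in for spam.csv (hypothetical rows, same column layout as the real file).
sample_csv = io.StringIO(
    "v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4\n"
    "ham,Ok lar... Joking wif u oni...,,,\n"
    "spam,Free entry in 2 a wkly comp,,,\n"
)

# usecols keeps only the label and message columns, so the mostly empty
# trailing columns never enter the DataFrame.
sample = pd.read_csv(sample_csv, usecols=["v1", "v2"])
print(sample.columns.tolist())
print(sample.shape)
```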

In [None]:
spam.head()

,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,


In [None]:
spam.tail()

,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,
5571,ham,Rofl. Its true to its name,,,


In [None]:
spam.shape

(5572, 5)

In [None]:
spam.describe()

,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
count,5572,5572,50,12,6
unique,2,5169,43,10,5
top,ham,"Sorry, I'll call later","bt not his girlfrnd... G o o d n i g h t . . .@""","MK17 92H. 450Ppw 16""","GNT:-)"""
freq,4825,30,3,2,2


In [None]:
spam.isnull().sum()

v1               0
v2               0
Unnamed: 2    5522
Unnamed: 3    5560
Unnamed: 4    5566
dtype: int64

In [None]:
# These three columns are almost entirely null (see the counts above), so drop them together.
spam = spam.drop(columns=["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"])

In [None]:
spam.describe()

,v1,v2
count,5572,5572
unique,2,5169
top,ham,"Sorry, I'll call later"
freq,4825,30
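The `describe()` output lists `ham` as the most frequent label, with 4825 of the 5572 rows. A short calculation from those two numbers gives the spam count and the accuracy that a trivial model would reach by always predicting `ham`, which is a useful reference point for any accuracy reported later. The variable names are illustrative.

```python
# Counts taken from the describe() output above.
total_messages = 5572
ham_messages = 4825
spam_messages = total_messages - ham_messages

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = ham_messages / total_messages

print(spam_messages)
print(round(majority_baseline, 3))
```

A classifier is only adding value on this data set if it scores clearly above this baseline.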


In [None]:
spam.isnull().sum()

v1    0
v2    0
dtype: int64

In [None]:
spam

,v1,v2
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...
5571,ham,Rofl. Its true to its name


Model Training

In [None]:
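# In this notebook x holds the labels and y holds the message text
# (the reverse of the usual convention, where X is the feature data).
x = spam["v1"]  # labels: "ham" or "spam"
y = spam["v2"]  # message text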

In [None]:
print(x)

0        ham
1        ham
2       spam
3        ham
4        ham
        ... 
5567    spam
5568     ham
5569     ham
5570     ham
5571     ham
Name: v1, Length: 5572, dtype: object


In [None]:
print(y)

0       Go until jurong point, crazy.. Available only ...
1                           Ok lar... Joking wif u oni...
2       Free entry in 2 a wkly comp to win FA Cup fina...
3       U dun say so early hor... U c already then say...
4       Nah I don't think he goes to usf, he lives aro...
                              ...                        
5567    This is the 2nd time we have tried 2 contact u...
5568                Will Ì_ b going to esplanade fr home?
5569    Pity, * was in mood for that. So...any other s...
5570    The guy did some bitching but I acted like i'd...
5571                           Rofl. Its true to its name
Name: v2, Length: 5572, dtype: object


In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
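The split above is random, so each run yields different rows, and the share of spam in the two parts can drift. A minimal sketch of a reproducible, class-balanced split follows, assuming toy data and illustrative names (`messages`, `labels`); `random_state` and `stratify` are standard `train_test_split` parameters, but they are not used in this notebook.

```python
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 8 ham and 2 spam messages.
messages = [f"message {i}" for i in range(10)]
labels = ["ham"] * 8 + ["spam"] * 2

# random_state fixes the shuffle so the split is repeatable;
# stratify keeps the ham/spam ratio the same in both parts.
msg_train, msg_test, lab_train, lab_test = train_test_split(
    messages, labels, test_size=0.5, random_state=0, stratify=labels
)
print(lab_train.count("spam"), lab_test.count("spam"))
```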

In [None]:
print(x_train)

2626     ham
1912     ham
5264     ham
4945     ham
3740    spam
        ... 
4344     ham
161      ham
5445     ham
1331     ham
309      ham
Name: v1, Length: 4457, dtype: object


In [None]:
print(x_test)

5339     ham
2419    spam
5319     ham
3088     ham
2932     ham
        ... 
199      ham
4484     ham
4522     ham
5517     ham
1422    spam
Name: v1, Length: 1115, dtype: object


In [None]:
print(y_train)

2626        Unni thank you dear for the recharge..Rakhesh
1912    For real tho this sucks. I can't even cook my ...
5264    Storming msg: Wen u lift d phne, u say \HELLO\...
4945             I'm already back home so no probably not
3740                                        2/2 146tf150p
                              ...                        
4344                                  Enjoy urself tmr...
161     New car and house for my parents.:)i have only...
5445    And that's fine, I got enough bud to last most...
1331                         Good Morning plz call me sir
309     Where are the garage keys? They aren't on the ...
Name: v2, Length: 4457, dtype: object


In [None]:
print(y_test)

5339                  You'd like that wouldn't you? Jerk!
2419    SMS SERVICES For your inclusive text credits p...
5319                         Kothi print out marandratha.
3088    What Today-sunday..sunday is holiday..so no wo...
2932    Yo do you know anyone  &lt;#&gt;  or otherwise...
                              ...                        
199              Found it, ENC  &lt;#&gt; , where you at?
4484                             What not under standing.
4522    Actually I decided I was too hungry so I haven...
5517    Miles and smiles r made frm same letters but d...
1422    Congratulations ur awarded either å£500 of CD ...
Name: v2, Length: 1115, dtype: object


In [None]:
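# x holds the labels (v1) and y holds the message text (v2),
# so the text to vectorize is y_train / y_test.
vectorizer = CountVectorizer()
text_train = vectorizer.fit_transform(y_train)  # learn the vocabulary on training text only
text_test = vectorizer.transform(y_test)        # reuse that vocabulary for the test text

In [None]:
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
# Features are the word counts; targets are the ham/spam labels.
model.fit(text_train, x_train)

In [None]:
from sklearn.metrics import accuracy_score

# Fitting with the labels as features and the message text as targets (the two
# swapped) gave 0.004484304932735426 in an earlier run of this cell.
prediction = model.predict(text_test)
print(accuracy_score(x_test, prediction))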
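Because most messages in this data set are `ham`, a single accuracy number can hide how well spam itself is detected. The sketch below, on a handful of made-up messages, chains the vectorizer and classifier with `make_pipeline` and prints a confusion matrix; the toy data and the names `toy_model`, `train_text`, and `test_text` are assumptions for illustration, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages and labels.
train_text = [
    "win a free prize now",
    "free cash offer",
    "see you at lunch",
    "call me tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]

# The pipeline vectorizes the text and then fits the classifier,
# so text is always the input and the label is always the target.
toy_model = make_pipeline(CountVectorizer(), MultinomialNB())
toy_model.fit(train_text, train_labels)

test_text = ["free prize offer", "lunch tonight"]
test_labels = ["spam", "ham"]
predicted = toy_model.predict(test_text)

# Rows are true labels and columns are predicted labels, in the order given.
print(confusion_matrix(test_labels, predicted, labels=["ham", "spam"]))
```

Passing raw text straight into the pipeline also makes it harder to mix up features and labels, since the vectorizer would fail on anything that is not text.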
