### Naive Bayes Classifier

- Supervised machine learning algorithm
- Based on Bayes' theorem
- Used for classification
- Fast to train and to predict
- Probabilistic classifier
- Common uses: spam filtering, sentiment analysis, classifying articles
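
For reference, Bayes' theorem combined with the "naive" assumption that features are conditionally independent given the class leads to the rule below. The symbols $y$ (class label) and $x_1, \dots, x_n$ (feature values) are notation chosen here for illustration, not taken from the notebook.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator does not depend on $y$, so it can be dropped when comparing classes.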

### Types of Naive Bayes 

- Gaussian
- Multinomial
- Bernoulli
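
The three variants differ mainly in the form assumed for the per-feature likelihood $P(x_i \mid y)$. The following is a minimal standard-library sketch of those three forms; the function names are hypothetical and are not scikit-learn APIs.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian NB: continuous feature, normal density per class."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def bernoulli_likelihood(x, p):
    """Bernoulli NB: binary feature (present / absent), p = P(x = 1 | class)."""
    return p if x == 1 else 1 - p

def multinomial_log_likelihood(counts, probs):
    """Multinomial NB: count features (e.g. word counts).

    The multinomial coefficient is omitted because it is the same
    for every class and does not affect which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, probs))
```

Word counts from text, as used later in this notebook, fit the multinomial form.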

In [167]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

In [168]:
# Load dataset
df = pd.read_csv("spam.csv", encoding='latin1')
df.head(2)

,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,


In [169]:
# Drop unwanted columns
df.drop(['Unnamed: 2','Unnamed: 3','Unnamed: 4'], axis=1, inplace=True)

In [170]:
# Rename the columns
df.columns = ['label', 'text']
df.head(2)

,label,text
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...


In [171]:
# Convert string labels to numerical
df["label"] = np.where(df['label'] == 'spam', 1, 0)
df.head(3)

,label,text
0,0,"Go until jurong point, crazy.. Available only ..."
1,0,Ok lar... Joking wif u oni...
2,1,Free entry in 2 a wkly comp to win FA Cup fina...


In [173]:
# Split the dataset into training (80%) and test (20%) sets.
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)

In [174]:
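# Convert the text into word-count vectors using scikit-learn's CountVectorizer.
# max_df=0.75 ignores words that appear in more than 75% of the training messages.
# The vectorizer is fitted on the training set only, then applied to the test set.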

from sklearn.feature_extraction.text import CountVectorizer
vc = CountVectorizer(max_df=0.75)

X_train = vc.fit_transform(X_train)
X_test  = vc.transform(X_test)
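
To illustrate what the vectorization step does, here is a rough standard-library sketch of bag-of-words counting with a `max_df` cutoff. It is a simplification under stated assumptions, not scikit-learn's implementation: the helper names (`tokenize`, `fit_vocabulary`, `transform`) and the toy messages are made up for this example, and it returns dense lists where `CountVectorizer` returns a sparse matrix.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

def fit_vocabulary(docs, max_df=0.75):
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    limit = max_df * len(docs)
    # Drop terms that occur in more than max_df of the documents.
    kept = sorted(t for t, df in doc_freq.items() if df <= limit)
    return {term: i for i, term in enumerate(kept)}

def transform(docs, vocab):
    # One row per document, one column per vocabulary term.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(term, 0) for term in vocab])
    return rows

# Hypothetical toy messages, not from spam.csv.
docs = [
    "win a free prize now",
    "are you free now",
    "call me now",
    "free free entry now",
]
vocab = fit_vocabulary(docs)      # "now" is in 4/4 documents, so it is dropped
matrix = transform(docs, vocab)
```

Words that appear in almost every message carry little information about the class, which is the motivation for a document-frequency cutoff.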

In [175]:
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(X_train, y_train)

MultinomialNB()
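
As a rough picture of what fitting a multinomial Naive Bayes model involves, the sketch below estimates class priors and smoothed word probabilities from count vectors, then scores a new vector in log space. It assumes additive (Laplace) smoothing with `alpha=1.0`, which matches the default of `MultinomialNB`; the function names and toy data are hypothetical, and this is not the library's actual code.

```python
import math

def fit_multinomial_nb(X, y, alpha=1.0):
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Class prior: fraction of training rows with this label.
        log_prior[c] = math.log(len(rows) / len(X))
        # Smoothed total count of each feature within the class.
        totals = [sum(col) + alpha for col in zip(*rows)]
        denom = sum(totals)
        log_likelihood[c] = [math.log(t / denom) for t in totals]
    return log_prior, log_likelihood

def predict_one(x, log_prior, log_likelihood):
    # Score = log prior + sum over features of count * log P(feature | class).
    scores = {
        c: log_prior[c] + sum(n * lp for n, lp in zip(x, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy count vectors over three made-up words: ["free", "meeting", "now"].
X = [[2, 0, 1], [1, 0, 0], [0, 2, 1], [0, 1, 0]]
y = [1, 1, 0, 0]  # 1 = spam, 0 = ham

log_prior, log_likelihood = fit_multinomial_nb(X, y)
spam_like = predict_one([1, 0, 0], log_prior, log_likelihood)
ham_like = predict_one([0, 1, 0], log_prior, log_likelihood)
```

Smoothing keeps a word that never appeared in one class from forcing that class's probability to zero.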

In [176]:
y_pred = clf.predict(X_test)

In [178]:
from sklearn.metrics import accuracy_score, precision_score

a_s = accuracy_score(y_test, y_pred)
ps = precision_score(y_test, y_pred)

In [180]:
print(f"Accuracy : {a_s} Precision : {ps}")

Accuracy : 0.9838565022421525 Precision : 0.9852941176470589
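
For readers unfamiliar with the two metrics, the sketch below computes them by hand for binary labels. Accuracy is the share of all predictions that are correct; precision is the share of messages predicted as spam that really are spam. The helper names and the toy labels are illustrative assumptions, not values from this notebook.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    # With no positive predictions, precision is undefined; return 0.0 here.
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels: 1 = spam, 0 = ham.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]

toy_accuracy = accuracy(toy_true, toy_pred)    # 4 of 6 correct
toy_precision = precision(toy_true, toy_pred)  # 2 true positives, 1 false positive
```

In spam filtering, high precision means few legitimate messages are flagged as spam.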
