# Naive Bayes - GlobalAIHub

📌 The main aim of classification is to find the class that corresponds to a given set of features. The **Naive Bayes** classifier is based on a famous theorem called **“Bayes’ theorem”**, which is centered on conditional probability. Conditional probability is the probability of an event ‘A’ happening given that another event ‘B’ has already happened. For example, let event ‘A’ be “having a fever” and event ‘B’ be “being infected with Covid-19”. With conditional probability, we can ask: what is the chance of having a fever given that you have been infected with Covid-19?
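
📌 To make the definition concrete, conditional probability can be computed from simple counts as P(A | B) = P(A and B) / P(B). The sketch below uses made-up numbers; they are illustrative assumptions, not real Covid-19 statistics.

```python
# Hypothetical counts for a group of 1,000 people (illustrative only).
total_people = 1000
infected = 100              # event B: infected with Covid-19
infected_with_fever = 80    # events A and B together: infected AND has a fever

p_b = infected / total_people                    # P(B)
p_a_and_b = infected_with_fever / total_people   # P(A and B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(f"P(fever | infected) = {p_a_given_b:.2f}")
```

With these assumed counts, 80 of the 100 infected people have a fever, so the conditional probability is 0.80.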

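📌 Bayes’ theorem is an extension of conditional probability. In a sense, it allows us to reason in reverse: if we know the probability of ‘A’ given ‘B’, we can work out the probability of ‘B’ given ‘A’. The Naive Bayes algorithm does the same for a class and its features. Instead of directly estimating the probability of a class given the features, it estimates how likely the features are within each class and then reverses the direction with Bayes’ theorem. It is called “naive” because it assumes that the features are independent of each other within a class.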
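
📌 The reverse reasoning can be sketched with the fever example. Bayes’ theorem states that P(B | A) = P(A | B) · P(B) / P(A). All probabilities below are hypothetical values chosen for illustration, not real statistics.

```python
# Hypothetical probabilities (illustrative assumptions only).
p_infected = 0.10                 # P(B): prior chance of being infected
p_fever_given_infected = 0.80     # P(A | B)
p_fever_given_healthy = 0.10      # P(A | not B)

# Total probability of a fever, P(A), over both groups.
p_fever = (p_fever_given_infected * p_infected
           + p_fever_given_healthy * (1 - p_infected))

# Bayes' theorem: P(B | A) = P(A | B) * P(B) / P(A)
p_infected_given_fever = p_fever_given_infected * p_infected / p_fever
print(f"P(infected | fever) = {p_infected_given_fever:.2f}")
```

Naive Bayes applies the same idea with the class in place of ‘B’ and the features in place of ‘A’. It scores each class by its prior multiplied by the likelihood of every feature, and then picks the class with the highest score.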

📌 Three commonly used Naive Bayes classifiers in sklearn are **Bernoulli Naive Bayes**, **Multinomial Naive Bayes** and **Gaussian Naive Bayes**. We use Bernoulli Naive Bayes when our features are binary, like true or false, or yes or no. We use Multinomial Naive Bayes when our features are discrete counts, such as the number of family members or the number of pages in a book. We use Gaussian Naive Bayes when all of our features are continuous variables, like temperature or height. Let’s take the dataset on tumors from our classification session, which only has continuous features, so we use the Gaussian Naive Bayes algorithm. As always, we start by importing the Pandas library to read our data file. Then we can read and display the first few rows of the dataset.

In [None]:
import pandas as pd

In [None]:
dataset = pd.read_csv("breast-cancer.csv")
dataset.head()

| Unnamed: 0 | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 1.097064 | -2.073335 | 1.269934 | 0.984375 | 1.568466 | 3.283515 | 2.652874 | 2.532475 | 2.217515 | ... | 1.88669 | -1.359293 | 2.303601 | 2.001237 | 1.307686 | 2.616665 | 2.109526 | 2.296076 | 2.750622 | 1.937015 |
| 1 | M | 1.829821 | -0.353632 | 1.685955 | 1.908708 | -0.826962 | -0.487072 | -0.023846 | 0.548144 | 0.001392 | ... | 1.805927 | -0.369203 | 1.535126 | 1.890489 | -0.375612 | -0.430444 | -0.146749 | 1.087084 | -0.24389 | 0.28119 |
| 2 | M | 1.579888 | 0.456187 | 1.566503 | 1.558884 | 0.94221 | 1.052926 | 1.363478 | 2.037231 | 0.939685 | ... | 1.51187 | -0.023974 | 1.347475 | 1.456285 | 0.527407 | 1.082932 | 0.854974 | 1.955 | 1.152255 | 0.201391 |
| 3 | M | -0.768909 | 0.253732 | -0.592687 | -0.764464 | 3.283553 | 3.402909 | 1.915897 | 1.451707 | 2.867383 | ... | -0.281464 | 0.133984 | -0.249939 | -0.550021 | 3.394275 | 3.893397 | 1.989588 | 2.175786 | 6.046041 | 4.93501 |
| 4 | M | 1.750297 | -1.151816 | 1.776573 | 1.826229 | 0.280372 | 0.53934 | 1.371011 | 1.428493 | -0.00956 | ... | 1.298575 | -1.46677 | 1.338539 | 1.220724 | 0.220556 | -0.313395 | 0.613179 | 0.729259 | -0.868353 | -0.3971 |
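
📌 As a side note on the three variants described earlier, the sketch below fits each one on a tiny made-up array of the matching feature type. The arrays and labels are hypothetical and serve only to show which class goes with which kind of data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y_toy = np.array([0, 0, 1, 1])  # made-up labels

# Binary features (yes/no answers) -> BernoulliNB
X_binary = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
# Count features (e.g. number of family members) -> MultinomialNB
X_counts = np.array([[2, 0], [3, 1], [0, 4], [1, 5]])
# Continuous features (e.g. temperature, height) -> GaussianNB
X_continuous = np.array([[36.6, 1.70], [36.8, 1.65], [38.9, 1.80], [39.2, 1.75]])

toy_models = {
    "BernoulliNB": BernoulliNB().fit(X_binary, y_toy),
    "MultinomialNB": MultinomialNB().fit(X_counts, y_toy),
    "GaussianNB": GaussianNB().fit(X_continuous, y_toy),
}

for name, toy_model in toy_models.items():
    print(name, "learned classes:", toy_model.classes_)
```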


📌 We need to convert our target variable from categorical to numerical type using label encoding. LabelEncoder assigns integer codes to the labels in sorted order. Then, we can continue with defining our features and a target.

In [None]:
from sklearn.preprocessing import LabelEncoder

# Replace the text labels in "diagnosis" with integer codes
labelencoder = LabelEncoder()
dataset["diagnosis"] = labelencoder.fit_transform(dataset["diagnosis"])
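
📌 To see which number stands for which label, we can inspect the encoder. The sketch below uses a short made-up list and assumes the labels are "B" and "M"; only "M" appears in the preview above, so the second label is an assumption.

```python
from sklearn.preprocessing import LabelEncoder

toy_labels = ["M", "B", "B", "M"]  # made-up sample of diagnosis labels

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_labels)

# classes_ holds the labels in sorted order; the position is the encoded value.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}

print(mapping)
print(encoded)
```

Because the labels are sorted before they are numbered, "B" becomes 0 and "M" becomes 1 in this sketch.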

In [None]:
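# Features: every column except the target and the leftover index column "Unnamed: 0"
X = dataset.drop(columns=["diagnosis", "Unnamed: 0"])
# Target: the encoded diagnosis
y = dataset["diagnosis"]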

📌 After we define the features and the target, we can split them into training and test sets. Here, 30% of the rows are held out for testing.

In [None]:
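from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing. No random_state is set,
# so the split and the scores below change from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)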
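
📌 If a repeatable split is needed, train_test_split accepts the random_state and stratify parameters. The sketch below shows them on made-up data; the seed value 42 and the toy arrays are arbitrary choices, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, 2 features, 14 samples of class 0 and 6 of class 1.
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 14 + [1] * 6)

def split(seed):
    # stratify keeps the class ratio similar in both parts;
    # random_state makes the shuffle repeatable.
    return train_test_split(
        X_toy, y_toy, test_size=0.3, random_state=seed, stratify=y_toy
    )

X_tr, X_te, y_tr, y_te = split(seed=42)
X_tr2, X_te2, y_tr2, y_te2 = split(seed=42)

print("train rows:", len(X_tr), "test rows:", len(X_te))
print("same test rows on both calls:", np.array_equal(X_te, X_te2))
```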

📌 Now, the most exciting part! Let’s create our Gaussian Naive Bayes model, train it on the training data so that it learns the patterns in it, and finally use it to make predictions.

In [None]:
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)

GaussianNB()
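
📌 To get a feeling for what happens during fitting and prediction, here is a minimal NumPy sketch of the Gaussian Naive Bayes idea. For each class it estimates a prior plus a mean and a variance per feature, and it predicts the class with the highest log posterior. It is a simplified illustration on made-up data, not sklearn's actual implementation. The helper names and the small constant eps are assumptions of this sketch; eps plays a role similar in spirit to sklearn's var_smoothing parameter.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Toy data: two well-separated classes with three continuous features each.
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
                   rng.normal(5.0, 1.0, size=(50, 3))])
y_toy = np.array([0] * 50 + [1] * 50)

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log posterior for each row of X."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Log of the normal density, summed over features (the "naive" assumption).
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]

params = fit_gaussian_nb(X_toy, y_toy)
manual_pred = predict_gaussian_nb(X_toy, *params)
sklearn_pred = GaussianNB().fit(X_toy, y_toy).predict(X_toy)

print("same predictions as sklearn:", np.array_equal(manual_pred, sklearn_pred))
```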

📌 Now, we can make predictions on the test set and check how good they are.

In [None]:
predictions = model.predict(X_test)
predictions

array([0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

In [None]:
from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, predictions)

array([[102,   8],
       [  3,  58]])

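📌 Looking at the confusion matrix above, 160 of the 171 test samples are classified correctly, which is an accuracy of about 94%. Precision is 0.97 for class 0 and 0.88 for class 1, and recall is above 0.90 for both classes, which can be considered a good result. The classification report gives the full breakdown.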

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.97      0.93      0.95       110
           1       0.88      0.95      0.91        61

    accuracy                           0.94       171
   macro avg       0.93      0.94      0.93       171
weighted avg       0.94      0.94      0.94       171
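
📌 The values in the report can be recomputed by hand from the confusion matrix shown earlier. In sklearn's confusion matrix, rows are the true classes and columns are the predicted classes. The sketch below uses only the four numbers from that matrix.

```python
import numpy as np

# Confusion matrix from above: rows = true class, columns = predicted class.
cm = np.array([[102, 8],
               [3, 58]])

correct = np.diag(cm)                    # correctly classified samples per class
accuracy = correct.sum() / cm.sum()
precision = correct / cm.sum(axis=0)     # divide by column sums (predicted counts)
recall = correct / cm.sum(axis=1)        # divide by row sums (true counts)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy : {accuracy:.2f}")
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1-score :", np.round(f1, 2))
```

The rounded results match the report: precision 0.97 and 0.88, recall 0.93 and 0.95, f1-score 0.95 and 0.91, and an accuracy of 0.94.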

