# Classification using Bayesian Classifier 

### Bayesian Classifiers
Bayesian classifiers are probabilistic classifiers based on Bayes' theorem. They are highly scalable and are often used when the dimensionality of the inputs is high.

### Types
1. Naïve Bayes
2. Bayesian Belief Network
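
For reference, the rule behind these classifiers can be written out as follows. The symbols are chosen here for illustration: $C$ is the class and $x_1, \dots, x_n$ are the features. The first line is Bayes' theorem; the second is the simplification that Naïve Bayes makes by assuming the features are conditionally independent given the class.

```latex
P(C \mid x_1, \dots, x_n) = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}

P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
```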

### Problem Statement

UCI dataset: Skin Segmentation Data Set (https://archive.ics.uci.edu/ml/machine-learning-databases/00229/).
The Skin Segmentation dataset is constructed over the B, G, R color space. The skin and non-skin samples were generated from skin textures in face images of people of diverse age, gender, and race. The task is to decide whether a given B, G, R combination is a skin color or not.


In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

The following code reads a text file into a DataFrame. Modify the code below so that the B, G, R values and the class label are split into separate columns. Each column must be named as specified below:

```
Column No.   Expected Column Name
1            BLUE
2            GREEN
3            RED
4            RESULT     
```

In [6]:
df = pd.read_csv(
    "data/Skin_NonSkin.txt",
    sep=",",
    names=["BLUE", "GREEN", "RED", "RESULT"],  # column names from the table above
)
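
As a small self-contained illustration of how `names` assigns the column labels, the sketch below parses a few rows (the first rows of the dataset as displayed in this notebook) from an in-memory string. It assumes a tab-delimited layout, hence `sep="\t"`; that is only an assumption about how some copies of the file are stored, so check your copy and use the matching separator. `sample` and `sample_df` are illustrative names, not part of the notebook.

```python
import io

import pandas as pd

# A few rows of the dataset, written as tab-separated text.
# The tab delimiter is an assumption; use sep="," for a comma-delimited copy.
sample = "74\t85\t123\t1\n73\t84\t122\t1\n72\t83\t121\t1\n"

columns = ["BLUE", "GREEN", "RED", "RESULT"]

# header=None states explicitly that the file has no header row,
# so every line is data and `names` supplies the column labels.
sample_df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=columns)

print(sample_df.shape)
print(sample_df.dtypes)
```

If the separator does not match the file, pandas typically places the whole line in the first column and fills the rest with missing values, so checking `shape` and `dtypes` right after loading is a cheap sanity test.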

In [7]:
df.head(10)

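| | BLUE | GREEN | RED | RESULT |
|---|---|---|---|---|
| 0 | 74 | 85 | 123 | 1 |
| 1 | 73 | 84 | 122 | 1 |
| 2 | 72 | 83 | 121 | 1 |
| 3 | 70 | 81 | 119 | 1 |
| 4 | 70 | 81 | 119 | 1 |
| 5 | 69 | 80 | 118 | 1 |
| 6 | 70 | 81 | 119 | 1 |
| 7 | 70 | 81 | 119 | 1 |
| 8 | 76 | 87 | 125 | 1 |
| 9 | 76 | 87 | 125 | 1 |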


Define X as the DataFrame holding the B, G and R components and y as the DataFrame holding the class label. Both are then split into training and test sets. The held-out test set is used to estimate the accuracy of the classification model.

In [8]:
X = df[["BLUE", "GREEN", "RED"]]
y = df[["RESULT"]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
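
The split above is purely random. When one class is much rarer than the other, an optional variation is to pass `stratify` so that both parts keep the same class proportions. The sketch below demonstrates this on synthetic data; `X_demo`, `y_demo` and the 200/800 class sizes are made up for illustration and are not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for B, G, R features with an imbalanced label (20% class 1).
X_demo = rng.integers(0, 256, size=(1000, 3))
y_demo = np.array([1] * 200 + [2] * 800)

# stratify=y_demo keeps the class ratio the same in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print((y_tr == 1).mean(), (y_te == 1).mean())
```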

The code in the next cell classifies the test data in the following steps:
1. Create a Gaussian Naïve Bayes classifier (`GaussianNB`, imported in the first cell).
2. Fit the model on the training data (X_train: attributes, y_train: labels).
3. Use the trained model to predict the labels of the test data (X_test).
4. Calculate the accuracy score from the actual labels (y_test) and the predicted labels (y_pred).

In [9]:
from sklearn.metrics import accuracy_score
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train.values.ravel()).predict(X_test)
print('Accuracy Score :-')
print(accuracy_score(y_test, y_pred))

Accuracy Score :-
0.9239206724883702
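
To make the fitted model less of a black box, the sketch below re-implements the Gaussian Naïve Bayes decision rule with NumPy and compares its predictions with `GaussianNB`. It runs on synthetic data, and the helper `gaussian_nb_log_scores` is a hypothetical name introduced here. scikit-learn also adds a small smoothing term to the variances (`var_smoothing`), so borderline points could in principle be classified differently.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for the B, G, R features (not the notebook's data).
X_a = rng.normal(loc=[80, 120, 180], scale=15, size=(200, 3))
X_b = rng.normal(loc=[120, 120, 120], scale=40, size=(300, 3))
X_demo = np.vstack([X_a, X_b])
y_demo = np.array([1] * 200 + [2] * 300)


def gaussian_nb_log_scores(X, y, X_new):
    """Return the classes and log(prior) + sum of per-feature Gaussian log-densities."""
    classes = np.unique(y)
    scores = []
    for c in classes:
        Xc = X[y == c]
        prior = len(Xc) / len(X)   # class prior P(C)
        mean = Xc.mean(axis=0)     # per-feature mean for this class
        var = Xc.var(axis=0)       # per-feature (population) variance
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var, axis=1
        )
        scores.append(np.log(prior) + log_lik)
    return classes, np.column_stack(scores)


classes, scores = gaussian_nb_log_scores(X_demo, y_demo, X_demo)
manual_pred = classes[np.argmax(scores, axis=1)]  # pick the highest-scoring class

sk_pred = GaussianNB().fit(X_demo, y_demo).predict(X_demo)
agreement = (manual_pred == sk_pred).mean()
print(agreement)
```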


Calculate the precision, recall and F-score of the predictions made by the GaussianNB model above.


In [10]:
from sklearn.metrics import f1_score, precision_score, recall_score
print("Precision :-")
print(precision_score(y_test, y_pred))
print("Recall :-")
print(recall_score(y_test, y_pred))
print("F-score :-")
print(f1_score(y_test, y_pred))

Precision :-
0.8738865447726207
Recall :-
0.7375751820196265
F-score :-
0.7999656667095834
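
These three scores can also be derived by hand from the confusion matrix, which makes explicit which label counts as positive. The sketch below uses a tiny hand-made example (not the notebook's predictions) and treats label 1 as the positive class, matching the default `pos_label=1` of the scikit-learn metric functions.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Tiny hand-made example; label 1 is treated as the positive class.
y_true = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y_hat = [1, 1, 1, 2, 1, 2, 2, 2, 2, 2]

# labels=[1, 2] fixes the order: rows are true labels, columns are predictions.
(tp, fn), (fp, tn) = confusion_matrix(y_true, y_hat, labels=[1, 2])

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(precision_score(y_true, y_hat, pos_label=1),
      recall_score(y_true, y_hat, pos_label=1),
      f1_score(y_true, y_hat, pos_label=1))
```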


Train a Multinomial Naive Bayes or Bernoulli Naive Bayes classifier on X_train and y_train, then use it to classify X_test. Calculate the accuracy, precision, recall and F1-score of the trained model. Use the scikit-learn library for this task.

In [14]:
from sklearn.naive_bayes import MultinomialNB   #Multinomial Naive Bayes Classifier
mnb = MultinomialNB()
y_pred_new = mnb.fit(X_train, y_train.values.ravel()).predict(X_test)

In [15]:
from sklearn.metrics import f1_score, precision_score, recall_score
print('Accuracy Score :-')
print(accuracy_score(y_test, y_pred_new))
print("Precision :-")
print(precision_score(y_test, y_pred_new))
print("Recall :-")
print(recall_score(y_test, y_pred_new))
print("F-score :-")
print(f1_score(y_test, y_pred_new))

Accuracy Score :-
0.9345792867052967
Precision :-
0.7593795093795094
Recall :-
0.9995251661918328
F-score :-
0.8630586305863058
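
The task also allows the Bernoulli variant. A minimal sketch of that option follows, run on synthetic data rather than the notebook's. `BernoulliNB` models binary features, so its `binarize` parameter is used to threshold each channel. The threshold of 128 and the way `X_demo` is constructed are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Synthetic stand-in for B, G, R pixels (not the notebook's data):
# class 1 has a high third channel, class 2 a low one.
X_demo = rng.integers(0, 256, size=(600, 3))
X_demo[:300, 2] = rng.integers(150, 256, size=300)
X_demo[300:, 2] = rng.integers(0, 100, size=300)
y_demo = np.array([1] * 300 + [2] * 300)

# binarize=128.0 maps each channel to 1 if it is above 128, else 0.
bnb = BernoulliNB(binarize=128.0)
y_demo_pred = bnb.fit(X_demo, y_demo).predict(X_demo)

demo_accuracy = accuracy_score(y_demo, y_demo_pred)
print(demo_accuracy)
```

Thresholding discards most of the intensity information, so on real pixel data the result may depend strongly on the chosen threshold.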
