# **Assignment - 6**

Ques1. (Gaussian Naïve Bayes Classifier) Implement a Gaussian Naïve Bayes classifier on the Iris dataset from sklearn.datasets using:  
 &emsp; &emsp; &emsp; (i) Step-by-step implementation  
 &emsp; &emsp; &emsp; (ii) In-built function


In [13]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [14]:
# (i) Step-by-step implementation


data = load_iris()
X = data.data
Y = data.target

X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.3, random_state=1)

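def get_stats(X, y):
    # Per-class mean and standard deviation of every feature
    cls = np.unique(y)
    mean_vals = {}
    std_vals = {}
    for c in cls:
        Xc = X[y == c]  # rows belonging to class c
        mean_vals[c] = Xc.mean(axis=0)
        std_vals[c] = Xc.std(axis=0)
    return mean_vals, std_vals

def gauss(x, mean, std):
    # Gaussian density per feature; 1e-8 keeps the exponent's denominator non-zero
    return (1/(np.sqrt(2*np.pi)*std)) * np.exp(-((x-mean)**2)/(2*std**2 + 1e-8))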

mean_vals, std_vals = get_stats(X_tr, y_tr)

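def predict_nb(X):
    # Class priors are not included, so all classes are treated as equally likely
    classes = list(mean_vals)
    preds = []
    for row in X:
        scores = []
        for c in classes:
            # Naive Bayes assumption: multiply the per-feature likelihoods
            p = np.prod(gauss(row, mean_vals[c], std_vals[c]))
            scores.append(p)
        # Map the best-scoring position back to its class label
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

pred1 = predict_nb(X_te)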
print("Manual Gaussian NB Accuracy:", accuracy_score(y_te, pred1))


Manual Gaussian NB Accuracy: 0.9555555555555556
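
The manual version above multiplies raw densities and ignores class priors. As a hedged alternative, here is a minimal sketch of the same idea in log space with empirical class priors included. The names `fit_gnb`, `predict_gnb`, and the `eps` smoothing value are illustrative assumptions, not part of the original assignment, and the accuracy it prints may differ from the figure above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

def fit_gnb(X, y, eps=1e-9):
    # Hypothetical helper: per-class means, variances and log priors.
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # eps is an assumed small constant that avoids division by zero
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, variances, log_priors

def predict_gnb(X, classes, means, variances, log_priors):
    # Broadcast to shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # Sum of log-likelihoods over features plus the log prior of each class
    scores = log_priors[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(scores, axis=1)]

params = fit_gnb(X_tr, y_tr)
pred_log = predict_gnb(X_te, *params)
acc_log = accuracy_score(y_te, pred_log)
print("Log-space Gaussian NB Accuracy:", acc_log)
```

Summing log-likelihoods instead of multiplying densities avoids numerical underflow when there are many features, and adding the log prior matters whenever the training classes are not perfectly balanced.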


In [15]:
# (ii) In-built function

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_tr, y_tr)

pred_builtin = model.predict(X_te)
print("In-built GaussianNB Accuracy:", accuracy_score(y_te, pred_builtin))

In-built GaussianNB Accuracy: 0.9333333333333333
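
The built-in and manual accuracies differ slightly. One likely reason is that `GaussianNB` uses class priors estimated from the training data and adds a small smoothing term to the variances, whereas the manual version does neither. The sketch below inspects the fitted attributes `theta_`, `class_prior_`, and `epsilon_` to make that visible. It refits the model so that it runs on its own, and the `manual_*` names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

model = GaussianNB().fit(X_tr, y_tr)

# Recompute the per-class means and priors by hand for comparison
manual_means = np.array([X_tr[y_tr == c].mean(axis=0) for c in model.classes_])
manual_priors = np.array([(y_tr == c).mean() for c in model.classes_])

print("Means match:", np.allclose(model.theta_, manual_means))
print("Class priors:", model.class_prior_)
print("Variance smoothing term:", model.epsilon_)
```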


---

Ques2. Explore the GridSearchCV tool in scikit-learn. This tool is often used for tuning the hyperparameters of machine learning models.   
&emsp; &emsp; &emsp;Use this tool to find the best value of K for a K-NN classifier using any dataset.

In [16]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data = load_iris()
X = data.data
y = data.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

param_grid = {
    'n_neighbors': list(range(1, 21))   
}

knn = KNeighborsClassifier()

grid = GridSearchCV(knn, param_grid, cv=5)
grid.fit(X_tr, y_tr)

print("Best K:", grid.best_params_['n_neighbors'])
print("Best CV Score:", grid.best_score_)

best_knn = KNeighborsClassifier(n_neighbors=grid.best_params_['n_neighbors'])
best_knn.fit(X_tr, y_tr)

pred = best_knn.predict(X_te)
print("Test Accuracy:", accuracy_score(y_te, pred))


Best K: 5
Best CV Score: 0.961904761904762
Test Accuracy: 0.9777777777777777
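
With the default `refit=True`, `GridSearchCV` already retrains the best configuration on the full training set and exposes it as `best_estimator_`, so a separate `KNeighborsClassifier` does not strictly need to be fitted. The sketch below, which repeats the same setup so that it runs on its own, also lists the mean cross-validation score for every K via `cv_results_`. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': list(range(1, 21))}, cv=5)
grid.fit(X_tr, y_tr)

# Mean 5-fold CV accuracy for each candidate K
ks = np.array([p['n_neighbors'] for p in grid.cv_results_['params']])
mean_scores = grid.cv_results_['mean_test_score']
for k, s in zip(ks, mean_scores):
    print(f"K={k:2d}  mean CV accuracy={s:.4f}")

# The refitted best model can be used directly on the held-out test set
best_model = grid.best_estimator_
test_acc = best_model.score(X_te, y_te)
print("Test Accuracy (best_estimator_):", test_acc)
```

Looking at the full score table is useful because several values of K often score almost identically, in which case the reported best K is only one of several reasonable choices.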


---