## __Applying K-Nearest Neighbors__
Let's examine how to create a KNN classifier model.

## Step 1: Import Required Libraries and Read the Dataset

- Import pandas, NumPy, matplotlib.pyplot, and Seaborn libraries
- Configure matplotlib settings
- Read the dataset and display the first five rows


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = pd.read_csv('Social_Network_Ads.csv')

In [None]:
df.head()

__Observations__
- The output above shows the first five rows of the dataset.
- The dataset has columns for user ID, gender, age, estimated salary, and whether the user purchased.

Let us check the column types and non-null counts.

In [None]:
df.info()

__Observation__
- There are no null values.

## Step 2: Check How Many People Have Purchased

In [None]:
df['Purchased'].value_counts()

__Observation__
- The output above indicates that 143 people purchased while 257 people didn't.
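
The two counts above mean the classes are imbalanced. As a quick sanity check, the sketch below uses only those two reported counts to compute the share of buyers and the accuracy of a trivial model that always predicts the majority class. The 0/1 encoding of `Purchased` is an assumption made for illustration.

```python
# Counts taken from the value_counts() output reported above.
# Assumption: 0 = did not purchase, 1 = purchased.
counts = {0: 257, 1: 143}

total = sum(counts.values())
purchase_rate = counts[1] / total
# Accuracy of a trivial model that always predicts the most frequent class.
majority_baseline = max(counts.values()) / total

print(f"total rows: {total}")
print(f"purchase rate: {purchase_rate:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

Any classifier trained later should beat the majority-class baseline to be considered useful.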

Let us create a dummy variable for gender.

In [None]:
Gender = pd.get_dummies(df['Gender'],drop_first=True)

In [None]:
df = pd.concat([df,Gender],axis=1)

Drop the original `Gender` column now that it has been replaced by the dummy variable.

In [None]:
df.drop(['Gender'],axis=1,inplace=True)
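
To make the effect of `drop_first=True` concrete, here is a minimal plain-Python sketch of the same idea on made-up values. The helper `one_hot_drop_first` is hypothetical and only mimics the behavior for a single column: categories are sorted, and the first one is dropped because it is implied when all remaining indicators are 0.

```python
def one_hot_drop_first(values):
    """One-hot encode `values`, dropping the first category in sorted order."""
    categories = sorted(set(values))
    kept = categories[1:]  # drop_first=True removes the first category
    return [{cat: int(v == cat) for cat in kept} for v in values]

genders = ["Male", "Female", "Female", "Male"]  # hypothetical sample values
encoded = one_hot_drop_first(genders)
print(encoded)
```

With two categories, only a single `Male` indicator remains, which is why the feature matrix in the next step uses a column named `Male`.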

## Step 3: Define Features and Target Variable

- Define the feature matrix X and target variable y

In [None]:
X = df[['Age','EstimatedSalary','Male']]
y = df['Purchased']

## Step 4: Standardize the Features

- Import StandardScaler and scale the features


In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

In [None]:
scaler.fit(X)

Now, let's transform the features.

In [None]:
scaled_features = scaler.transform(X)
scaled_features

Now, let's create a DataFrame.

In [None]:
df_feat = pd.DataFrame(scaled_features,columns=X.columns)
df_feat.head()

__Observation__
- After scaling, each feature has a mean of 0 and a standard deviation of 1.
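
The transformation applied to each column is the z-score, (x - mean) / standard deviation. The sketch below reproduces it in plain Python for one made-up column; the `standardize` helper and the sample ages are illustrative assumptions, not part of the dataset.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)  # population std (ddof=0), as StandardScaler uses
    return [(x - mu) / sigma for x in column]

ages = [19, 35, 26, 27, 58]  # hypothetical ages, not taken from the dataset
scaled = standardize(ages)
print([round(z, 3) for z in scaled])
# The scaled column has mean ~0 and standard deviation ~1.
print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```

Scaling matters for KNN because distances would otherwise be dominated by the feature with the largest numeric range, such as salary.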

## Step 5: Split the Data into Training and Testing Sets

- Import train_test_split and split the data


In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(scaled_features,y,
                                                    test_size=0.20)
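
Note that the scaler above was fitted on all rows before the split, so the test rows influenced the mean and standard deviation. One common alternative, sketched here in plain Python on made-up numbers, is to split first, compute the statistics on the training rows only, and reuse them for the test rows. The `split` helper, the seed, and the salary values are hypothetical.

```python
import random
from statistics import fmean, pstdev

def split(rows, test_size, seed):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Hypothetical salary values, not taken from the dataset.
salaries = [19000, 20000, 43000, 57000, 76000, 58000, 84000, 150000, 33000, 65000]
train, test = split(salaries, test_size=0.20, seed=0)

# Fit: statistics come from the training rows only.
mu, sigma = fmean(train), pstdev(train)
# Transform: the same statistics are applied to both subsets.
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

print(len(train), len(test))
```

Fixing the seed also makes the split repeatable, which helps when comparing results between runs.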

Before training KNN, fit a logistic regression model on the same split to serve as a reference point.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_curve, roc_auc_score

# Create and train the logistic regression model.
# The target is binary, so no multi-class strategy needs to be specified.
lr = LogisticRegression(solver='liblinear')
model = lr.fit(X_train, y_train)

# Predict the labels for the test set
y_pred = model.predict(X_test)

# Classification Report
report = classification_report(y_test, y_pred)
print("Classification Report:")
print(report)

# ROC Curve
proba = model.predict_proba(X_test)
proba_class1 = proba[:, 1]  # Probability of positive class
fpr, tpr, thresholds = roc_curve(y_test, proba_class1)
roc_auc = roc_auc_score(y_test, proba_class1)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.show()

__Observation__
- The output shows the classification report and the ROC curve for the logistic regression model.
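
The AUC value in the plot legend can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; the function name, labels, and scores are illustrative assumptions.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 1, 1]            # hypothetical labels
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted probabilities
auc = auc_by_ranking(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to the dashed diagonal in the plot (random ranking), and 1.0 corresponds to a perfect ranking.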

## Step 6: Train the KNN Model and Make Predictions

- Import KNeighborsClassifier and train the model
- Make predictions using the model
- Print confusion matrix and classification report
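
Before using the library class, it may help to see what a KNN prediction does. The sketch below is a minimal plain-Python version, assuming Euclidean distance and a simple majority vote; the `knn_predict` helper and the sample points are hypothetical, and it does not replicate scikit-learn's tie-breaking or indexing structures.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled (age, salary) points with purchase labels.
points = [(-1.0, -1.2), (-0.8, -0.9), (0.9, 1.1), (1.2, 0.8), (1.0, 1.0)]
labels = [0, 0, 1, 1, 1]

pred_k1 = knn_predict(points, labels, query=(-0.9, -1.0), k=1)
pred_k3 = knn_predict(points, labels, query=(0.5, 0.5), k=3)
print(pred_k1, pred_k3)
```

With `k=1`, as in the model trained in this step, the prediction is simply the label of the single closest training point.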


In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
knn = KNeighborsClassifier(n_neighbors=1)

In [None]:
knn.fit(X_train,y_train)

In [None]:
pred = knn.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

In [None]:
print(confusion_matrix(y_test,pred))

__Observation__
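- The confusion matrix lists actual classes in rows and predicted classes in columns, so the diagonal holds correct predictions and the off-diagonal entries hold misclassifications. In this run, 4 non-purchased cases are misclassified; the exact counts change between runs because the split is random.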
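
To show how such a matrix is built, the sketch below counts the four outcomes for a small set of made-up binary labels, using the same layout that scikit-learn prints: rows are actual classes and columns are predicted classes. The helper name and the sample labels are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Hypothetical labels, not taken from the notebook's test set.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]
matrix = confusion_counts(y_true_demo, y_pred_demo)
print(matrix)  # [[4, 1], [1, 2]]
```

Here the diagonal (4 and 2) holds the correct predictions, and the off-diagonal entries hold one false positive and one false negative.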

In [None]:
print(classification_report(y_test,pred))

__Observations__

- The KNN model reaches an accuracy of 82%.
- For the purchased class, precision is 77%, recall is 90%, and the F1-score is 77%.
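
The metrics in the classification report follow directly from the confusion-matrix counts. The sketch below shows the formulas for one class in plain Python; the counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for the positive (purchased) class.
precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean, it always lies between precision and recall and is pulled toward the lower of the two.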


## Step 7: Tune Hyperparameters with GridSearchCV

In [None]:
36 * 5  # 36 parameter combinations (3 x 3 x 4) x 5 CV folds = 180 model fits

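The `scoring` argument used below accepts any of the metric names listed in the scikit-learn documentation: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter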

In [None]:
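from sklearn.model_selection import GridSearchCV

# Candidate values: 3 neighbor counts x 3 search algorithms x 4 Minkowski powers
param_grid = {"n_neighbors": [6, 7, 8],
              "algorithm": ['ball_tree', 'kd_tree', 'brute'],
              'p': [1, 2, 3, 4]}
knn = KNeighborsClassifier()
# Evaluate every combination with 5-fold cross-validation, ranked by F1-score
grid = GridSearchCV(estimator=knn, param_grid=param_grid, cv=5, verbose=2, scoring='f1')
grid.fit(X_train, y_train)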
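
The number of models a grid search trains is the product of the option counts times the number of folds. The sketch below enumerates the same grid as the cell above with `itertools.product` to confirm the count; it is a plain-Python illustration and does not call scikit-learn.

```python
from itertools import product

# Same grid as in the cell above.
param_grid = {
    "n_neighbors": [6, 7, 8],
    "algorithm": ["ball_tree", "kd_tree", "brute"],
    "p": [1, 2, 3, 4],
}
cv_folds = 5

# Every combination of one value per parameter.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
total_fits = len(combinations) * cv_folds

print(len(combinations), total_fits)  # 36 180
print(combinations[0])
```

With the default `refit=True`, the search also trains the best combination once more on the full training set after cross-validation.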

In [None]:
grid.best_estimator_

In [None]:
grid.best_score_

In [None]:
pd.DataFrame(grid.cv_results_)
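
When reading the `param_p` column of the results table, it helps to recall what `p` controls: it is the power of the Minkowski distance, where `p=1` is Manhattan distance and `p=2` is Euclidean distance. The sketch below evaluates the formula for the four values in the grid; the helper name and the two points are hypothetical.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0.0, 0.0), (3.0, 4.0)  # hypothetical points
distances = {p: minkowski(a, b, p) for p in (1, 2, 3, 4)}
for p, d in distances.items():
    print(p, round(d, 4))
```

For a fixed pair of points, the distance shrinks as `p` grows, so the choice of `p` can change which neighbors count as nearest.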

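RandomizedSearchCV evaluates only `n_iter` randomly sampled combinations (here 11 of the 36) instead of the full grid.

In [None]:
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {"n_neighbors": [6, 7, 8],
                       "algorithm": ['ball_tree', 'kd_tree', 'brute'],
                       'p': [1, 2, 3, 4]}
knn = KNeighborsClassifier()
# Named random_search to avoid shadowing Python's standard-library random module
random_search = RandomizedSearchCV(estimator=knn, param_distributions=param_distributions,
                                   cv=5, verbose=2, scoring='f1', n_iter=11)
random_search.fit(X_train, y_train)

In [None]:
random_search.best_estimator_

In [None]:
random_search.best_score_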
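
The difference from a full grid search can be sketched in plain Python: instead of evaluating all 36 combinations, a randomized search draws a fixed number of them. When every parameter is given as a list, as here, the draw is made without replacement. The seed below is an arbitrary assumption used only to make the sample repeatable.

```python
import random
from itertools import product

param_distributions = {
    "n_neighbors": [6, 7, 8],
    "algorithm": ["ball_tree", "kd_tree", "brute"],
    "p": [1, 2, 3, 4],
}
all_combinations = list(product(*param_distributions.values()))

rng = random.Random(0)  # fixed seed so the sample is repeatable (assumption)
sampled = rng.sample(all_combinations, k=11)  # n_iter=11, drawn without replacement

print(len(all_combinations), len(sampled))
print(len(sampled) * 5, "cross-validation fits instead of", len(all_combinations) * 5)
```

The trade-off is speed against coverage: fewer fits are needed, but the best combination in the full grid may not be among those sampled.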