# K-Nearest Neighbour Intuition (K-NN)
### NOTE: This version is for CSV file imports!

---

Created By: Xavier De Carvalho  
Created On: 06/07/2021  
Updated By: N/A  
Updated On: N/A  
Version: knn0.0.01

### Requirements

---

##### Required Data Format
- File Type: CSV
- File Shape: 3 Columns (2 feature columns followed by 1 label column), (n) Rows

##### Required Python Packages
- NumPy
- Matplotlib
    - pyplot
    - colors.ListedColormap
- pandas
- scikit-learn
    - model_selection.train_test_split
    - preprocessing.StandardScaler
    - neighbors.KNeighborsClassifier
    - metrics.confusion_matrix
    - metrics.accuracy_score

### Description

---

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

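The KNN algorithm assumes that similar data points lie close to each other in feature space. A new point can therefore be classified, or have its value estimated, from the labels of the training points nearest to it.

One common application is recommender systems, which find items close to ones a user already liked. KNN keeps the whole training set and compares every new point against it, so prediction can become slow on large datasets.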
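
The example below uses scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` to illustrate the two problem types mentioned above. It is a minimal sketch on a tiny, made-up one-feature dataset, and `X_toy`, `labels_toy` and `targets_toy` are hypothetical. The classifier takes a majority vote of the K nearest labels. The regressor averages the K nearest numeric targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical toy data: one feature, two well-separated groups
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
labels_toy = np.array([0, 0, 0, 1, 1, 1])                   # class labels (classification)
targets_toy = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # numeric targets (regression)

# Classification: majority vote among the 3 nearest neighbours
clf_toy = KNeighborsClassifier(n_neighbors=3).fit(X_toy, labels_toy)
print(clf_toy.predict([[2.5], [10.5]]))  # → [0 1]

# Regression: mean target of the 3 nearest neighbours (1, 2 and 3)
reg_toy = KNeighborsRegressor(n_neighbors=3).fit(X_toy, targets_toy)
print(reg_toy.predict([[2.5]]))  # → [2.]
```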

### Steps

---

- **Step 1** Choose the number K of neighbours
- **Step 2** Take the K nearest neighbours of the new data point, according to the Euclidean distance
- **Step 3** Among these K neighbours, count the number of data points in each category
- **Step 4** Assign the new data point to the category where you counted the most neighbours
- **Step 5** Your model is ready
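
The steps above can be written from scratch in NumPy. This is a minimal illustration and not how scikit-learn implements KNN internally. The function `knn_predict` and the toy data are hypothetical. Step 1 corresponds to the `k` argument. Step 5 holds trivially, because this version does no work beyond keeping the training data.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Hypothetical helper: classify one point by majority vote of its k nearest training points."""
    # Step 2: Euclidean distance from x_new to every training point...
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # ...then keep the indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: count how many of those neighbours fall in each category
    votes = Counter(y_train[nearest])
    # Step 4: return the category with the most votes
    return votes.most_common(1)[0][0]

# Hypothetical 2-D example with two categories, "A" and "B"
X_toy = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_toy = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_toy, y_toy, np.array([2, 2]), k=3))  # → A
```

With two categories, an odd `k` avoids tied votes. If a tie does occur, `Counter.most_common` returns the category it encountered first. Here that is the category holding the nearest neighbour.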

### Install Dependencies If Needed

---

NOTE: Cloud-hosted notebook environments often come with these packages preinstalled. If yours does, you can skip or delete the cell below.

In [None]:
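# Import sys so pip runs under this notebook's Python interpreter
import sys
# Install dependencies. scikit-learn is imported as `sklearn`, but the old
# `sklearn` name on PyPI is deprecated and fails to install.
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn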
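
A small cell like the following can confirm that the installs worked. It imports each package and prints its version, and it is an addition rather than part of the original notebook. Note that scikit-learn is imported under the name `sklearn`.

```python
# Print the installed version of each package used in this notebook
import numpy, matplotlib, pandas, sklearn

for module in (numpy, matplotlib, pandas, sklearn):
    print(f"{module.__name__}: {module.__version__}")
```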

### Import Packages

---

In [None]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
# Confirm packages have been imported
print("Packages imported!")

### Import the dataset

---

In [None]:
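# Read data from CSV file (replace 'YOUR_CSV' with the path to your file)
dataset = pd.read_csv('YOUR_CSV')
# Split into features X (every column except the last) and labels y (the last column)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values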
# Confirm data was imported
print("Data imported from CSV!")
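
This sketch shows the CSV layout the cell above expects. It reads a tiny, made-up CSV from a string (via `io.StringIO`) and applies the same `iloc` split. The column names `Age`, `Salary` and `Purchased` are hypothetical. The variable names are chosen so they don't overwrite `dataset`, `X` or `y`.

```python
import io
import pandas as pd

# Hypothetical CSV contents: two feature columns, then the label column last
csv_text = """Age,Salary,Purchased
19,19000,0
35,20000,0
47,25000,1
26,43000,0
"""
demo = pd.read_csv(io.StringIO(csv_text))

X_demo = demo.iloc[:, :-1].values  # every column except the last -> features
y_demo = demo.iloc[:, -1].values   # last column -> labels
print(X_demo.shape, y_demo.shape)  # → (4, 2) (4,)
```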

### Create Training set and Test set
---

In [None]:
# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Confirm the training and test sets were created
print("Training and test sets created!")
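
`test_size = 0.25` holds out a quarter of the rows for testing, and `random_state = 0` makes the shuffle reproducible. The sketch below confirms the resulting sizes on 100 made-up rows. The `_toy` names are hypothetical and avoid overwriting the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 2 features, binary labels
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0, 1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)
print(len(X_tr), len(X_te))  # → 75 25
```

If one class is much rarer than the other, passing `stratify=y` keeps the class proportions the same in both sets.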

In [None]:
print(X_train)

In [None]:
print(y_train)

In [None]:
print(X_test)

In [None]:
print(y_test)

### Feature scaling
---

In [None]:
# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Confirm feature scaling complete
print('Feature scaling complete!')
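
Scaling matters for KNN because neighbours are ranked by distance. Without it, a feature measured in thousands (such as a salary) drowns out one measured in tens (such as an age). The sketch below uses four made-up `[age, salary]` rows, and `nearest_to_first` is a hypothetical helper. The nearest neighbour of the first row changes once the columns are standardised. The cell above calls `fit_transform` on the training set only and `transform` on the test set, so no information from the test set leaks into the scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [age, salary]
X_toy = np.array([[30, 40000],
                  [31, 41000],
                  [55, 40200],
                  [45, 60000]], dtype=float)

def nearest_to_first(data):
    """Index of the row closest (Euclidean) to row 0, excluding row 0 itself."""
    distances = np.linalg.norm(data[1:] - data[0], axis=1)
    return 1 + int(np.argmin(distances))

# Unscaled: the salary column (thousands) swamps the age column (tens)
print(nearest_to_first(X_toy))  # → 2 (the 55-year-old, because salaries are close)

# Scaled: each column now has mean 0 and unit variance, so both features count
X_toy_scaled = StandardScaler().fit_transform(X_toy)
print(nearest_to_first(X_toy_scaled))  # → 1 (the 31-year-old)
```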

In [None]:
print(X_train)

In [None]:
print(X_test)

### Train the model using the training set
---

In [None]:
# Train the model using the training set
# n_neighbors=5 is K; metric='minkowski' with p=2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p=2)
classifier.fit(X_train, y_train)
# Confirm model was trained
print('Model training complete!')
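
With `metric = 'minkowski'`, scikit-learn measures distance as `(sum of |a_i - b_i|^p)^(1/p)`. Setting `p = 2` gives the Euclidean distance from Step 2, and `p = 1` gives the Manhattan distance. Below is a short NumPy check on two made-up points. The `minkowski` function is a hypothetical helper.

```python
import numpy as np

def minkowski(a, b, p):
    """Hypothetical helper: Minkowski distance (sum |a_i - b_i|^p)^(1/p)."""
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

point_a = np.array([0.0, 0.0])
point_b = np.array([3.0, 4.0])

print(minkowski(point_a, point_b, p=2))   # → 5.0 (Euclidean)
print(np.linalg.norm(point_a - point_b))  # → 5.0
print(minkowski(point_a, point_b, p=1))   # → 7.0 (Manhattan)
```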

### Predict a new result
---

In [None]:
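# Predict the class of one new observation (feature values in CSV column order).
# It must be scaled with the same fitted scaler as the training data.
print(classifier.predict(sc.transform([[30,87000]])))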
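
`KNeighborsClassifier.kneighbors` shows which training points drive a prediction. It returns the distances to, and the indices of, the nearest training samples, sorted from closest to farthest. The sketch below is self-contained and runs on made-up, already-scaled data. The names `X_demo`, `y_demo` and `knn_demo` are hypothetical, so the notebook's `classifier` is left untouched.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical (already scaled) training data
X_demo = np.array([[-1.0, -1.0], [-1.2, -0.8], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y_demo = np.array([0, 0, 1, 1, 1])

knn_demo = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
knn_demo.fit(X_demo, y_demo)

new_point = np.array([[0.8, 0.8]])
distances, indices = knn_demo.kneighbors(new_point)
print(indices)                          # → [[2 3 4]]
print(y_demo[indices])                  # → [[1 1 1]]
print(knn_demo.predict_proba(new_point))  # share of neighbour votes per class → [[0. 1.]]
```

In the notebook itself, you would pass `sc.transform([[30,87000]])` to `classifier.kneighbors` rather than a raw point.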

### Predict the Test set results
---

In [None]:
# Predict results for test set
y_pred = classifier.predict(X_test)
# Print predictions (left column) next to the actual labels (right column)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
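
The stacked array above is compact but easy to misread. An alternative is to put the predictions and actual labels in a labelled pandas DataFrame. The sketch below does this with made-up arrays (`y_pred_demo` and `y_test_demo` are hypothetical). In the notebook you would pass `y_pred` and `y_test` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions and true labels
y_pred_demo = np.array([0, 1, 1, 0, 1])
y_test_demo = np.array([0, 1, 0, 0, 1])

comparison = pd.DataFrame({"Predicted": y_pred_demo, "Actual": y_test_demo})
comparison["Correct"] = comparison["Predicted"] == comparison["Actual"]
print(comparison)
print(f"Misclassified: {(~comparison['Correct']).sum()} of {len(comparison)}")  # → Misclassified: 1 of 5
```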

### Build the confusion matrix
---

In [None]:
# Build confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Display the confusion matrix (rows: actual class, columns: predicted class)
print(cm)
# Calculate accuracy score
accuracy_score(y_test, y_pred)
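
For labels 0 and 1, `confusion_matrix(y_test, y_pred)` returns `[[TN, FP], [FN, TP]]`, with actual classes on the rows and predicted classes on the columns. Correct predictions therefore sit on the diagonal. Accuracy is the diagonal sum divided by the total count: (TN + TP) / (TN + FP + FN + TP). The sketch below checks this against `accuracy_score` using made-up labels (the `_demo` names are hypothetical).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels
y_true_demo = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 1, 1, 1, 0, 0, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)
# → [[3 1]
#    [1 3]]

# Correct predictions lie on the diagonal
manual_accuracy = np.trace(cm_demo) / cm_demo.sum()
print(manual_accuracy, accuracy_score(y_true_demo, y_pred_demo))  # → 0.75 0.75
```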

### Visualise the Training set results
---

In [None]:
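# Undo the feature scaling so the axes show the original units
X_set, y_set = sc.inverse_transform(X_train), y_train
# Build a grid covering the data. A step of 1 on a wide-ranging feature (e.g. a
# salary) makes a very large grid, so increase the step if plotting is slow.
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 1),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 1))
# Colour each grid point by the class the model predicts for it
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the training points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)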
plt.title('K-NN (Training Set)')
# Replace the text marked with '@' with your own text.
# Don't forget to remove the '@' character!
plt.xlabel('@YOUR_X_LABEL')
plt.ylabel('@YOUR_Y_LABEL')
plt.legend()
plt.show()
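
The same plot is often wanted for the test set. A helper function avoids copy-pasting the cell. `plot_decision_boundary` below is a hypothetical sketch that repeats the cell above with the data, title, labels and grid steps as parameters. It assumes two numeric features and two numeric classes, as the rest of the notebook does. The default steps are an assumption chosen to keep the grid small.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_boundary(classifier, scaler, X_scaled, y, title,
                           steps=(0.25, 100), pads=(10, 1000),
                           xlabel='Feature 1', ylabel='Feature 2'):
    """Hypothetical helper: shade each region by the class the model predicts there."""
    X_set = scaler.inverse_transform(X_scaled)  # back to the original units
    X1, X2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - pads[0], X_set[:, 0].max() + pads[0], steps[0]),
        np.arange(X_set[:, 1].min() - pads[1], X_set[:, 1].max() + pads[1], steps[1]))
    grid = np.array([X1.ravel(), X2.ravel()]).T
    # Predict on scaled grid points, then reshape back to the grid's shape
    Z = classifier.predict(scaler.transform(grid)).reshape(X1.shape)
    cmap = ListedColormap(('red', 'green'))
    plt.contourf(X1, X2, Z, alpha=0.75, cmap=cmap)
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    # Overlay the data points, one colour per class
    for i, label in enumerate(np.unique(y)):
        plt.scatter(X_set[y == label, 0], X_set[y == label, 1], color=cmap(i), label=label)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    return Z
```

In the notebook, a test-set plot could then be drawn with `plot_decision_boundary(classifier, sc, X_test, y_test, 'K-NN (Test Set)')` followed by `plt.show()`.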