# Chapter 6 - Other Popular Machine Learning Models
## Segment 3 - Instance-based learning w/ k-Nearest Neighbor
#### Setting up for classification analysis

### K-Nearest Neighbor (KNN) Classification 
**A supervised classifier that memorizes the observations in a labeled training set and uses them to predict classification labels for new, unlabeled observations.**

A supervised machine learning method that you can use to classify instances based on the distance (similarity) between their feature values and those of the observations in a labeled dataset.

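**KNN makes predictions based on how similar the training observations are to a new, incoming observation.** The new observation receives the label held by the majority of its k closest training observations, so the more similar two observations' feature values are, the more likely they are to be classified with the same label.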
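To make the "nearest neighbors vote" idea concrete, here is a minimal from-scratch sketch. It assumes Euclidean distance, an unweighted majority vote, and non-negative integer labels; `knn_predict` is a hypothetical helper written for illustration (not scikit-learn's implementation), and the toy data are made up.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Predict one label by majority vote of the k nearest training observations."""
    # Euclidean distance from x_new to every stored training observation
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels; on a tie, bincount/argmax
    # picks the smallest label
    return np.bincount(y_train[nearest]).argmax()

# Made-up toy data: two well-separated groups of observations
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # → 0
print(knn_predict(X_train, y_train, np.array([5.0, 4.8]), k=3))  # → 1
```

Note that "training" here is nothing more than keeping `X_train` and `y_train` around; all of the work happens at prediction time.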

### K-Nearest Neighbor Use Cases
- **Stock Price Prediction**
- **Credit Risk Analysis**
- **Predictive Trip Planning**
- **Recommendation Systems**

### K-Nearest Neighbor Assumptions
- Dataset has little noise
- Dataset is labeled
- Dataset contains only relevant features
- Dataset has distinguishable subgroups

***Avoid using K-Nearest Neighbor on very large datasets: because KNN compares each new observation against the stored training observations at prediction time, predictions can become very slow as the dataset grows.***

```python
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib import rcParams

from sklearn import neighbors
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import metrics
```

```python
from sklearn.neighbors import KNeighborsClassifier
```

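```python
np.set_printoptions(precision=4, suppress=True)
%matplotlib inline
rcParams['figure.figsize'] = 7, 4
# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6;
# use the old name on earlier Matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid')
```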

## Importing your data

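```python
# Path to your local copy of mtcars.csv; adjust it for your machine
address = 'C:/Users/Lillian/Desktop/ExerciseFiles/Data/mtcars.csv'

cars = pd.read_csv(address)
cars.columns = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

# Features: miles per gallon, displacement, horsepower, weight
X_prime = cars[['mpg', 'disp', 'hp', 'wt']].values
# Target: column 9 is 'am' (transmission: 0 = automatic, 1 = manual)
y = cars.iloc[:, 9].values
```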

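```python
# Standardize each feature to mean 0 and unit variance so that no single
# feature (e.g. disp, which is in the hundreds) dominates the distance calculation
X = preprocessing.scale(X_prime)
```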
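Scaling matters for KNN because distances are computed directly on feature values: unscaled, a feature such as `disp` (hundreds) would swamp one such as `mpg` (tens). The sketch below, using a small illustrative matrix `X_demo` (an assumption, not the notebook's data), checks that `preprocessing.scale` is the column-wise z-score: subtract each column's mean and divide by its population standard deviation.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative matrix: two features on very different scales (mpg-like, disp-like)
X_demo = np.array([[21.0, 160.0],
                   [22.8, 108.0],
                   [18.7, 360.0],
                   [14.3, 360.0]])

X_scaled = preprocessing.scale(X_demo)
# Manual z-score per column (np.std defaults to the population std, ddof=0)
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # → True
# Each scaled column now has mean ~0 and standard deviation 1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```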

```python
# Hold out 20% of the cars for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=17)
```

## Building and training your model with training data

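```python
# Default settings: 5 neighbors, uniform vote weights,
# Minkowski metric with p=2 (i.e. Euclidean distance)
clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)
print(clf)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
```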
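The printed defaults `metric='minkowski'` and `p=2` mean the classifier measures similarity with the Minkowski distance $\left(\sum_i |a_i - b_i|^p\right)^{1/p}$, which for `p=2` is ordinary Euclidean distance and for `p=1` is Manhattan distance. A small sketch, where `minkowski` is a hypothetical helper written for illustration and SciPy's `distance.euclidean` is used as a cross-check:

```python
import numpy as np
from scipy.spatial import distance

def minkowski(a, b, p):
    # (sum_i |a_i - b_i|^p)^(1/p)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

print(minkowski(a, b, p=2))  # → 5.0 (Euclidean)
print(minkowski(a, b, p=1))  # → 7.0 (Manhattan)
print(np.isclose(minkowski(a, b, 2), distance.euclidean(a, b)))  # → True
```

Passing `p=1` to `KNeighborsClassifier` would switch the model to Manhattan distance. `weights='uniform'` means each of the 5 neighbors gets an equal vote.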


## Evaluating your model's predictions

```python
y_pred = clf.predict(X_test)
y_expect = y_test

print(metrics.classification_report(y_expect, y_pred))
```

```
              precision    recall  f1-score   support

           0       0.80      1.00      0.89         4
           1       1.00      0.67      0.80         3

    accuracy                           0.86         7
   macro avg       0.90      0.83      0.84         7
weighted avg       0.89      0.86      0.85         7
```
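The report's numbers pin down the underlying confusion matrix: 4 true 0s all recovered (recall 1.00), and 3 true 1s of which 2 were recovered (recall 0.67), so one label-1 car was predicted as 0. The sketch below rebuilds that matrix from hypothetical label vectors chosen to match those counts. The order of the entries is illustrative; these are not the notebook's actual `y_test`/`y_pred` arrays.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels matching the report's counts (order is illustrative)
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
y_hat  = np.array([0, 0, 0, 0, 0, 1, 1])

# Rows = true label, columns = predicted label
cm = metrics.confusion_matrix(y_true, y_hat)
print(cm)
# [[4 0]
#  [1 2]]
print(metrics.classification_report(y_true, y_hat))
```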



## Interpreting Results

### Recall: a measure of your model's completeness
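- Of all the test points whose true label is 1, the model correctly identified only 67% (2 of 3); the third was predicted as 0.
- The macro-average recall of 83% is the unweighted mean of the two per-class recalls, (1 + 2/3) / 2 ≈ 0.83. The weighted average (86%) weights each class by its support and equals the overall accuracy.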
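The per-class, macro, and weighted recall values can be recomputed by hand from the confusion matrix implied by the report (4 true 0s all found; 2 of 3 true 1s found). A short arithmetic sketch:

```python
import numpy as np

# Confusion matrix implied by the report (rows = true label, columns = predicted)
cm = np.array([[4, 0],
               [1, 2]])

support = cm.sum(axis=1)                   # true count per label: [4, 3]
recall_per_class = np.diag(cm) / support   # [1.0, 0.6667]
macro_recall = recall_per_class.mean()     # unweighted mean: 0.8333
# Support-weighted mean: (4*1.0 + 3*0.6667) / 7 = 6/7 ≈ 0.8571, the accuracy
weighted_recall = (recall_per_class * support).sum() / support.sum()

print(recall_per_class, macro_recall, weighted_recall)
```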

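### High precision + low recall
**Few results are returned, but most of the label predictions that are returned are correct.** For label 1 (manual transmission) above, precision is 1.00: every car the model labeled 1 really was 1. Recall is only 0.67 because one of the three label-1 cars was missed.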
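Treating label 1 as the "positive" class, the counts implied by the report are 2 true positives, 0 false positives, and 1 false negative. Plugging them into the precision and recall formulas shows the high-precision, low-recall pattern:

```python
# Counts for label 1 implied by the report (label 1 treated as "positive")
tp = 2  # label-1 cars correctly predicted as 1
fp = 0  # label-0 cars wrongly predicted as 1
fn = 1  # label-1 cars wrongly predicted as 0

precision = tp / (tp + fp)  # 2 / 2 = 1.0: every "1" the model returned was right
recall = tp / (tp + fn)     # 2 / 3 ≈ 0.67: but it missed one of the three true 1s

print(f"precision={precision:.2f}, recall={recall:.2f}")  # → precision=1.00, recall=0.67
```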

***Thank you for going through this project.
Your comments are more than welcome at ybezginova2021@gmail.com***

***Best wishes,***

***Yulia***