<a href="https://colab.research.google.com/github/zhouchun0105/Sensor-Data-to-Predict-Human-Status/blob/main/KNN_Algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **KNN**



*   Assumption: similar data points lie close to each other in feature space.
*   For each query point, the Euclidean distance to every point in the training set is calculated.
*   For classification, the prediction is the mode of the labels of the K nearest neighbors.
*   Features should be standardized so that features with large numeric ranges do not dominate the distance.
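
The standardization mentioned in the last bullet can be sketched as follows. This is a minimal sketch, assuming z-score scaling with statistics computed from the training rows only. The helpers `fit_scaler` and `transform` and the sample rows are hypothetical and not part of the notebook.

```python
import statistics

def fit_scaler(rows):
    """Return a (mean, std) pair for each column of a list of feature rows."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]

def transform(rows, scaler):
    """Z-score every value; assumes no column has zero standard deviation."""
    return [[(x - mean) / std for x, (mean, std) in zip(row, scaler)]
            for row in rows]

# Illustrative feature rows: [sound, light, vibration, hour].
train_rows = [[40, 340, 35, 9], [36, 20, 23, 4], [56, 39, 15, 13]]
test_rows = [[32, 27, 17, 3]]

scaler = fit_scaler(train_rows)             # statistics come from training data only
train_scaled = transform(train_rows, scaler)
test_scaled = transform(test_rows, scaler)  # reuse the training statistics
print(train_scaled[0])
```

After scaling, every training column has mean 0 and standard deviation 1, so each feature contributes to the distance on a comparable scale.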



### Advantage

* No training step: there is no model to fit, few parameters to tune (mainly K), and no additional assumptions about the data distribution
* The algorithm is versatile and can be used for classification and regression

### Disadvantage

* The algorithm gets significantly slower as the number of samples and features increases, because every prediction computes a distance to every training sample

# **Import Dataset**

In [2]:
import pandas as pd

In [4]:
train = pd.read_csv("/content/drive/MyDrive/Datasets/train.csv")

In [7]:
test = pd.read_csv("/content/drive/MyDrive/Datasets/test.csv")

In [9]:
train_feature = train.drop(columns="Status")

In [12]:
test_feature = test.drop(columns="Status")

In [32]:
train.head(3)

| Unnamed: 0 | Sound (dB) | Light (Lux) | Vibration (db) | Time of day (H) | Status |
|---|---|---|---|---|---|
| 0 | 40 | 340 | 35 | 9 | Home |
| 1 | 36 | 20 | 23 | 4 | Sleeping |
| 2 | 56 | 39 | 15 | 13 | Not Home |


In [33]:
test.head(3)

| Unnamed: 0 | Sound (dB) | Light (Lux) | Vibration (db) | Time of day (H) | Status |
|---|---|---|---|---|---|
| 0 | 32 | 27 | 17 | 3 | |
| 1 | 54 | 20 | 27 | 6 | |
| 2 | 43 | 200 | 23 | 10 | |


# **KNN Algorithm**

Define Euclidean Distance Function

In [24]:
import math
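def eudis(point1, point2):
  # Both points must hold features only (no label), in the same order.
  distance = 0
  # Iterate over every feature; the caller has already removed the label.
  for i in range(len(point1)):
    distance += (point1[i] - point2[i])**2
  return math.sqrt(distance)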
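
As a quick check of the distance computation, the sketch below compares the first two training rows shown by `train.head(3)`. It uses the standard-library `math.dist` (Python 3.8+) in place of `eudis`, and the variable names are illustrative.

```python
import math

# Feature vectors without the index column or the Status label:
# [sound, light, vibration, hour].
home = [40, 340, 35, 9]
sleeping = [36, 20, 23, 4]

distance = math.dist(home, sleeping)

# Share of the squared distance contributed by each feature.
squared = [(a - b) ** 2 for a, b in zip(home, sleeping)]
shares = [s / sum(squared) for s in squared]

print(round(distance, 2))
print([round(s, 4) for s in shares])
```

The light reading accounts for more than 99% of the squared distance between these two rows, which illustrates why unscaled features with large ranges dominate the result.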

Locate Nearest Neighbors

In [49]:
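def get_neighbors(train_list, test_point, k):
  # Pair every training row with its distance to the test point.
  distances = []
  for train_point in train_list:
    # The last element of a training row is the label, so it is excluded.
    dist = eudis(train_point[:-1], test_point)
    distances.append((train_point, dist))
  # Sort by distance and keep the k closest rows.
  distances.sort(key=lambda tup: tup[1])
  return [train_point for train_point, _ in distances[:k]]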
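
Sorting every distance is more work than needed when only the k smallest are used. A possible alternative, shown here as a sketch and not as the notebook's method, is `heapq.nsmallest` from the standard library. The function `k_nearest` and the sample rows are hypothetical, and `math.dist` stands in for `eudis`.

```python
import heapq
import math

def k_nearest(train_rows, query, k):
    """Return the k rows closest to query, nearest first.

    Each training row is a list of features followed by its label.
    """
    return heapq.nsmallest(
        k, train_rows, key=lambda row: math.dist(row[:-1], query)
    )

# Hypothetical rows: [sound, light, vibration, hour, status].
toy_train = [
    [40, 340, 35, 9, "Home"],
    [36, 20, 23, 4, "Sleeping"],
    [34, 25, 20, 3, "Sleeping"],
    [56, 39, 15, 13, "Not Home"],
]

nearest = k_nearest(toy_train, [32, 27, 17, 3], k=2)
print([row[-1] for row in nearest])
```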

Make Predictions

In [54]:
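from collections import Counter

def prediction(train_list, test_point, k):
  neighbors = get_neighbors(train_list, test_point, k)
  # The label is the last element of each neighbor row.
  status = [neighbor[-1] for neighbor in neighbors]
  # Majority vote; on a tie the label seen first (the nearest neighbor's) wins.
  return Counter(status).most_common(1)[0][0]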
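
The three steps above (distance, neighbors, vote) can be tried end to end on a few rows. The following is a self-contained sketch under stated assumptions: `knn_predict` is a hypothetical compact equivalent that uses `math.dist` and `collections.Counter`, and the rows are made-up values shaped like the sensor data.

```python
import math
from collections import Counter

def knn_predict(train_rows, query, k):
    """Majority vote over the k rows closest to query.

    Each training row is a list of features followed by its label.
    """
    ranked = sorted(train_rows, key=lambda row: math.dist(row[:-1], query))
    labels = [row[-1] for row in ranked[:k]]
    # most_common keeps first-seen order among equal counts, so a tie
    # goes to the label of the nearer neighbor.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical rows: [sound, light, vibration, hour, status].
toy_train = [
    [40, 340, 35, 9, "Home"],
    [42, 300, 30, 10, "Home"],
    [36, 20, 23, 4, "Sleeping"],
    [34, 25, 20, 3, "Sleeping"],
    [56, 39, 15, 13, "Not Home"],
]

print(knn_predict(toy_train, [32, 27, 17, 3], k=3))
```

With k=3 the three closest rows are two `Sleeping` rows and one `Not Home` row, so the vote returns `Sleeping`.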

# **Application on Dataset**

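Convert both datasets to lists of rows so that the Euclidean distance between each test point and every training point can be calculated

In [50]:
# Drop the saved index column so it is not treated as a feature.
train_list = train.drop(columns="Unnamed: 0").values.tolist()

In [57]:
test_list = test_feature.drop(columns="Unnamed: 0").values.tolist()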

Show results for different values of k

In [73]:
def show_pred(k):
  status = []
  for test_point in test_list:
    pred = prediction(train_list,test_point,k)
    status.append(pred)
  return status

k=3

In [74]:
show_pred(3)

['Sleeping', 'Sleeping', 'Not Home', 'Home']

k=5

In [75]:
show_pred(5)

['Sleeping', 'Sleeping', 'Not Home', 'Home']

k=7

In [76]:
show_pred(7)

['Sleeping', 'Sleeping', 'Not Home', 'Home']

Conclusion: the predictions are identical for every k tried, so the choice of k does not affect the result on this test set.
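
Identical predictions do not show which k is most accurate, because the test rows shown earlier have no `Status` value to compare against. One way to compare values of k is leave-one-out accuracy on the labeled training rows. This is a minimal sketch, assuming rows shaped like `train_list` without the index column; `knn_predict`, `loo_accuracy`, and the sample rows are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, query, k):
    """Majority vote over the k rows closest to query (label is last)."""
    ranked = sorted(train_rows, key=lambda row: math.dist(row[:-1], query))
    votes = Counter(row[-1] for row in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(rows, k):
    """Fraction of rows predicted correctly when each is held out in turn."""
    hits = 0
    for i, row in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]  # every row except the held-out one
        hits += knn_predict(rest, row[:-1], k) == row[-1]
    return hits / len(rows)

# Hypothetical labeled rows: [sound, light, vibration, hour, status].
labeled = [
    [40, 340, 35, 9, "Home"],
    [42, 300, 30, 10, "Home"],
    [36, 20, 23, 4, "Sleeping"],
    [34, 25, 20, 3, "Sleeping"],
    [56, 39, 15, 13, "Not Home"],
    [58, 45, 12, 14, "Not Home"],
]

for k in (1, 3, 5):
    print(k, loo_accuracy(labeled, k))
```

On a real dataset the same loop would run over `train_list`, and the k with the highest held-out accuracy would be a reasonable choice.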