## KNN
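K-Nearest Neighbors (KNN) is an instance-based method in the supervised learning family. The idea is simple: given a training set and a new input, KNN computes the distance from the input to every training instance, picks the K nearest neighbors, and predicts from them, using their majority class for classification or their values (typically the mean) for regression. KNN is simple and has no explicit training phase; the cost is paid at prediction time, when the distances to the training set are computed. It is widely used in areas such as pattern recognition, recommender systems, and image classification.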
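
The regression case mentioned above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `knn_regress` and the toy data are hypothetical, and it assumes Euclidean distance with an unweighted mean over the neighbors.

```python
import numpy as np

def knn_regress(train_x, train_y, query, k):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training point
    dist = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k closest training points (stable sort keeps row order on ties)
    nearest = np.argsort(dist, kind="stable")[:k]
    # Regression: average the neighbors' values instead of voting on a class
    return train_y[nearest].mean()

# Hypothetical toy data in which y is 2 * x
train_x = [[1.0], [2.0], [3.0], [10.0]]
train_y = [2.0, 4.0, 6.0, 20.0]
prediction = knn_regress(train_x, train_y, [2.5], k=2)
print(prediction)  # 5.0: the two nearest points are x=2 and x=3
```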


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt




<center>Euclidean distance:</center>
$$d(x,y)=\sqrt{\sum_{i=1}^n (x_{i}-y_{i})^2}$$

In [2]:
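def EuclideanDistance(x,y):
    """
    Calculate the Euclidean distance between two points, or row-wise between arrays.
    :param x: point 1, 1D or 2D (one point per row)
    :param y: point 2, same shape as x or broadcastable to it
    :return: the Euclidean distance (one value per row for 2D input)
    """
    x=np.array(x)
    y=np.array(y)
    # Reduce over the last (feature) axis so that 1D points work as well as 2D arrays
    return np.sqrt(np.sum((x-y)**2,axis=-1))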
    

<center>Manhattan distance:</center>
$$d(x,y)=\sum_{i=1}^n \lvert x_{i}-y_{i} \rvert$$

In [3]:
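def ManhattanDistance(x, y):
    """
    Calculate the Manhattan distance between two points, or row-wise between arrays.
    :param x: point 1, 1D or 2D (one point per row)
    :param y: point 2, same shape as x or broadcastable to it
    :return: the Manhattan distance (one value per row for 2D input)
    """
    x=np.array(x)
    y=np.array(y)
    # Reduce over the last (feature) axis so that 1D points work as well as 2D arrays
    return np.sum(np.abs(x-y), axis=-1)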

In [4]:
#Test
x=[[1,2,3,4,5]]
y=[[2,3,3,4,5]]
EuclideanDistance(x,y),ManhattanDistance(x,y)


(array([1.41421356]), array([2]))
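
Both distance functions reduce over the feature axis, so a single point can be compared against many rows at once through NumPy broadcasting. The sketch below shows that usage and cross-checks the results against `np.linalg.norm`. The variable names and sample values are chosen only for illustration.

```python
import numpy as np

point = np.array([4, 5, 5])              # one query point, shape (3,)
rows = np.array([[1, 1, 1], [5, 5, 5]])  # two training rows, shape (2, 3)

# Broadcasting subtracts the point from every row, giving shape (2, 3)
diff = point - rows

# Same formulas as above, reduced over the feature axis
euclid = np.sqrt((diff ** 2).sum(axis=-1))
manhattan = np.abs(diff).sum(axis=-1)

# Cross-check against NumPy's built-in vector norms (ord=2 is the default)
assert np.allclose(euclid, np.linalg.norm(diff, axis=-1))
assert np.allclose(manhattan, np.linalg.norm(diff, ord=1, axis=-1))

print(euclid, manhattan)
```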

In [5]:
def _Knn(Data:pd.DataFrame, Test, k, metric=EuclideanDistance):
    """
    Classify each test point by majority vote among its k nearest neighbors.
    :param Data: DataFrame whose last column holds the class label
    :param Test: test points, 2D (one point per row)
    :param k: number of neighbors to consult
    :param metric: distance function, called as metric(point, features)
    :return: list of predicted classes, one per test point
    """
    Test=np.array(Test)
    features=Data.iloc[:, :-1].to_numpy()
    labels=Data.iloc[:, -1].to_numpy()
    res=[]
    for point in Test:
        # Distance from this test point to every training row
        distance=metric(point, features)
        # Indices of the k smallest distances (stable sort keeps row order on ties)
        nearest=np.argsort(distance, kind="stable")[:k]
        # Count the class labels among the k neighbors
        classCount={}
        for label in labels[nearest]:
            classCount[label]=classCount.get(label, 0)+1
        # The most frequent class wins; on a tie, the class seen first is kept
        sortedClassCount=sorted(classCount.items(),key=lambda x:x[1],reverse=True)
        res.append(sortedClassCount[0][0])
    return res

In [6]:
from func.Classfy import Knn

In [7]:
data=pd.DataFrame({"feature_1":[1,1,1,5,5],"feature_2":[1,1,1,5,5],"feature_3":[1,1,1,5,5],"class": ["A","A","A","B","C"]})
test=[[4,5,5],[3,5,5]]
data

   feature_1  feature_2  feature_3  class
0          1          1          1      A
1          1          1          1      A
2          1          1          1      A
3          5          5          5      B
4          5          5          5      C


In [8]:
model=Knn(k=5)
model.fit(data)

In [9]:
model.predict(test)

['A', 'A']

In [10]:
_Knn(data,test,5)

['A', 'A']
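
With `k=5` on a five-row dataset every row gets a vote, so the result is the overall majority class `A` even though both test points lie much closer to the rows at `[5,5,5]`. The sketch below makes that visible by comparing `k=1` with `k=5` on the same toy data. It is a self-contained, vectorized variant written for illustration only: `knn_predict` is a hypothetical name, not the `_Knn` function or the `Knn` class used in this notebook.

```python
from collections import Counter
import numpy as np

def knn_predict(features, labels, queries, k):
    """Majority-vote KNN with Euclidean distance (illustrative sketch)."""
    features = np.asarray(features, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise differences via broadcasting, shape (n_queries, n_train, n_features)
    diff = queries[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Stable sort so that equidistant neighbors keep their training order
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # most_common(1) returns the label with the highest count among the k neighbors
    return [Counter(labels[i] for i in row).most_common(1)[0][0] for row in nearest]

features = [[1, 1, 1], [1, 1, 1], [1, 1, 1], [5, 5, 5], [5, 5, 5]]
labels = ["A", "A", "A", "B", "C"]
queries = [[4, 5, 5], [3, 5, 5]]

pred_k1 = knn_predict(features, labels, queries, k=1)
pred_k5 = knn_predict(features, labels, queries, k=5)
print(pred_k1)  # ['B', 'B']: the nearest row is [5,5,5]; B precedes C in row order
print(pred_k5)  # ['A', 'A']: all five rows vote, and A holds three of them
```

Rows 3 and 4 are identical points with different labels, so the `k=1` answer depends on how ties are broken; here the stable sort keeps the earlier row.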

In [11]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

iris=load_iris()
iris_df=pd.DataFrame(iris.data,columns=["sepal-length","sepal-width","petal-length","petal-width"])
iris_df["target"]=iris["target"]

In [12]:
iris_df

     sepal-length  sepal-width  petal-length  petal-width  target
0             5.1          3.5           1.4          0.2       0
1             4.9          3.0           1.4          0.2       0
2             4.7          3.2           1.3          0.2       0
3             4.6          3.1           1.5          0.2       0
4             5.0          3.6           1.4          0.2       0
...           ...          ...           ...          ...     ...
145           6.7          3.0           5.2          2.3       2
146           6.3          2.5           5.0          1.9       2
147           6.5          3.0           5.2          2.0       2
148           6.2          3.4           5.4          2.3       2


In [13]:
knn_model=KNeighborsClassifier(n_neighbors=50)
x_test=np.array([[5,4,2,0.1],[6.7,3,5.2,2.3]])

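# Fit on a plain array: x_test is an array too, which avoids scikit-learn's feature-name warning
knn_model.fit(iris_df.iloc[:, :-1].to_numpy(),iris_df["target"])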
knn_model.predict(x_test),x_test



(array([0, 2]),
 array([[5. , 4. , 2. , 0.1],
        [6.7, 3. , 5.2, 2.3]]))
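
To see how decisive the vote behind each prediction is, `KNeighborsClassifier.predict_proba` returns the share of neighbors per class (with the default uniform weights). The sketch below refits the same `n_neighbors=50` model on the raw iris arrays. The names `clf`, `x_new`, and `vote_share` are illustrative, and no output is shown because it was not run as part of this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = KNeighborsClassifier(n_neighbors=50)
clf.fit(iris.data, iris.target)

x_new = np.array([[5, 4, 2, 0.1], [6.7, 3, 5.2, 2.3]])

# One row per query point, one column per class; with uniform weights each
# entry is the fraction of the 50 neighbors that belong to that class
vote_share = clf.predict_proba(x_new)
print(vote_share)
```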

In [14]:
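# Knn is the class imported above; its constructor does not accept the data and
# test points, so Knn(iris_df,x_test,50) raises a TypeError.
# The function defined earlier takes (Data, Test, k):
_Knn(iris_df,x_test,50)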