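k-Nearest Neighbors (KNN) classification is a supervised classification method.
A model is first fitted to labeled data and then used to make predictions for new data points.
The prediction is the label of the new data point, that is, the class it belongs to.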
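
To make the "fit, then predict" idea concrete, here is a minimal from-scratch sketch of a KNN prediction. It assumes Euclidean distance and an unweighted majority vote; the helper `knn_predict` and the toy arrays `x_toy` and `y_toy` are made up for illustration and are not part of scikit-learn or of this notebook.

```python
from collections import Counter

import numpy as np


def knn_predict(x_known, y_known, x_new, k=3):
    """Predict one label by majority vote among the k nearest labeled points."""
    # Euclidean distance from the new point to every labeled point
    distances = np.linalg.norm(x_known - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbors
    votes = Counter(y_known[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters with labels 0 and 1
x_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_toy, y_toy, np.array([0.3, 0.3])))
print(knn_predict(x_toy, y_toy, np.array([4.8, 5.0])))
```

"Training" here amounts to storing the labeled points; all of the work happens at prediction time, which is why KNN is often described as a lazy learner.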

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets,neighbors
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV,RandomizedSearchCV
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
data = datasets.load_iris()  # Iris dataset
# Hold out 20% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

In [2]:
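# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to both the training and test sets
st = StandardScaler()
st.fit(x_train)
x_train = st.transform(x_train)
x_test = st.transform(x_test)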

In [3]:
knn = neighbors.KNeighborsClassifier(n_neighbors=6)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
# Both lines below report test-set accuracy, so they print the same value
print(accuracy_score(y_test, y_pred))
print(knn.score(x_test, y_test))


0.9666666666666667
0.9666666666666667


In [4]:
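# Grid search over the number of neighbors and the weighting scheme, with 10-fold cross-validation
param_grid = {"n_neighbors": range(1, 25), "weights": ["uniform", "distance"]}
kn = neighbors.KNeighborsClassifier(n_neighbors=5)
grid = GridSearchCV(kn, param_grid, cv=10, scoring="accuracy")
# Note: the search runs on the full, unscaled data (data.data), not on the standardized x_train
grid.fit(data.data, data.target)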

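print('Grid search - best score:', grid.best_score_)  # best mean cross-validated score
print('Grid search - best parameters:', grid.best_params_)  # parameter values that achieved the best score (a dict)
print('Grid search - best model:', grid.best_estimator_)  # classifier refitted with the best parameters

Grid search - best score: 0.9800000000000001
Grid search - best parameters: {'n_neighbors': 13, 'weights': 'uniform'}
Grid search - best model: KNeighborsClassifier(n_neighbors=13)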
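
The grid search above is fitted on the raw features, while the earlier cells standardize them first. One possible way to combine the two, sketched here as an assumption rather than as this notebook's method, is to put the scaler and the classifier into a `Pipeline` so the scaler is refitted on the training part of each fold. The names `iris`, `pipe`, `pipe_grid` and `pipe_search` are introduced only for this sketch.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Scaling and classification as one estimator, so each CV fold
# fits the scaler only on that fold's training part
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", neighbors.KNeighborsClassifier()),
])

# Pipeline parameters are addressed as "<step name>__<parameter name>"
pipe_grid = {
    "knn__n_neighbors": list(range(1, 25)),
    "knn__weights": ["uniform", "distance"],
}

pipe_search = GridSearchCV(pipe, pipe_grid, cv=10, scoring="accuracy")
pipe_search.fit(iris.data, iris.target)

print(pipe_search.best_params_)
print(pipe_search.best_score_)
```

The best parameters found this way need not match the ones printed above, because the distances are computed on standardized features.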


In [5]:
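# Also search over the distance metric; with the default p=2, 'minkowski' is the same as 'euclidean'
param_grid = {"n_neighbors": range(1, 25), "weights": ["uniform", "distance"], "metric": ["euclidean", "manhattan", "chebyshev", "minkowski"]}
kn = neighbors.KNeighborsClassifier(n_neighbors=5)
# RandomizedSearchCV tries n_iter=10 sampled combinations by default; no random_state is set, so results can differ between runs
grid = RandomizedSearchCV(kn, param_grid, cv=10, scoring="accuracy")
grid.fit(x_train, y_train)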

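print('Random search - best score:', grid.best_score_)  # best mean cross-validated score
print('Random search - best parameters:', grid.best_params_)  # parameter values that achieved the best score (a dict)
print('Random search - best model:', grid.best_estimator_)  # classifier refitted with the best parameters

Random search - best score: 0.9666666666666666
Random search - best parameters: {'weights': 'distance', 'n_neighbors': 17, 'metric': 'euclidean'}
Random search - best model: KNeighborsClassifier(metric='euclidean', n_neighbors=17, weights='distance')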
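
Because the random search samples only a subset of the parameter combinations, its result can change from run to run. The sketch below shows one way to make it repeatable by passing `random_state` and stating `n_iter` explicitly. It is an illustration under assumptions, not the cell that produced the output above: it uses the unscaled Iris data, drops `'minkowski'` (identical to `'euclidean'` at the default `p=2`), and the helper `run_search` and the seed value are hypothetical.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()

param_dist = {
    "n_neighbors": list(range(1, 25)),
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "chebyshev"],
}


def run_search(seed):
    """Run a random search whose sampled candidates are fixed by the seed."""
    search = RandomizedSearchCV(
        neighbors.KNeighborsClassifier(),
        param_dist,
        n_iter=10,          # number of sampled parameter combinations
        cv=10,
        scoring="accuracy",
        random_state=seed,  # fixes which combinations are sampled
    )
    search.fit(iris.data, iris.target)
    return search


first = run_search(0)
second = run_search(0)

# The same seed yields the same sampled candidates and the same winner
print(first.best_params_ == second.best_params_)
print(first.best_params_, first.best_score_)
```

Raising `n_iter` covers more of the 144 possible combinations at the cost of more fits; each candidate is fitted 10 times, once per fold.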
