<a href="https://colab.research.google.com/github/yiruchen1993/nvidia_gtc_dli_rapids_2020/blob/section_notebooks%2Fmachine_learning/2_06_knn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# KNN

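In this notebook, you will use GPU-accelerated k-nearest neighbors to identify the road nodes closest to hospitals.

## Objectives

By the time you complete this notebook, you will be able to:

- Use GPU-accelerated k-nearest neighbors on a single GPU

## Imports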

In [None]:
import cudf
import cuml

## Load Data

### Road Nodes

We begin by reading the road node data.

In [None]:
road_nodes = cudf.read_csv('./data/road_nodes_2-06.csv', dtype=['str', 'float32', 'float32', 'str'])

In [None]:
road_nodes.dtypes

node_id     object
east       float32
north      float32
type        object
dtype: object

In [None]:
road_nodes.shape

(3121148, 4)

In [None]:
road_nodes.head()

| | node_id | east | north | type |
|---|---|---|---|---|
| 0 | id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 | 320608.09375 | 870994.0 | junction |
| 1 | id634D65C1-C38B-4868-9080-2E1E47F0935C | 320628.5 | 871103.8125 | road end |
| 2 | idDC14D4D1-774E-487D-8EDE-60B129E5482C | 320635.46875 | 870983.875 | junction |
| 3 | id51555819-1A39-4B41-B0C9-C6D2086D9921 | 320648.6875 | 871083.5625 | junction |
| 4 | id9E362428-79D7-4EE3-B015-0CE3F6A78A69 | 320658.1875 | 871162.375 | junction |


### Hospitals

Next, we load the hospital data.

In [None]:
hospitals = cudf.read_csv('./data/hospitals_2-06.csv')

In [None]:
hospitals.dtypes

﻿OrganisationID         int64
OrganisationCode       object
OrganisationType       object
SubType                object
Sector                 object
OrganisationStatus     object
IsPimsManaged          object
OrganisationName       object
Address1               object
Address2               object
Address3               object
City                   object
County                 object
Postcode               object
Latitude              float64
Longitude             float64
ParentODSCode          object
ParentName             object
Phone                  object
Email                  object
Website                object
Fax                    object
northing              float64
easting               float64
dtype: object

In [None]:
hospitals.shape

(1226, 24)

In [None]:
hospitals.head()

| | OrganisationID | OrganisationCode | OrganisationType | SubType | Sector | OrganisationStatus | IsPimsManaged | OrganisationName | Address1 | Address2 | ... | Latitude | Longitude | ParentODSCode | ParentName | Phone | Email | Website | Fax | northing | easting |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17970 | NDA07 | Hospital | Hospital | Independent Sector | Visible | True | Walton Community Hospital - Virgin Care Servic... | | Rodney Road | ... | 51.379997 | -0.406042 | NDA | Virgin Care Services Ltd | 01932 414205 | | | 01932 253674 | 165810.4688 | 510917.5313 |
| 1 | 17981 | NDA18 | Hospital | Hospital | Independent Sector | Visible | True | Woking Community Hospital (Virgin Care) | | Heathside Road | ... | 51.315132 | -0.556289 | NDA | Virgin Care Services Ltd | 01483 715911 | | | | 158381.3438 | 500604.8438 |
| 2 | 18102 | NLT02 | Hospital | Hospital | NHS Sector | Visible | True | North Somerset Community Hospital | North Somerset Community Hospital | Old Street | ... | 51.437195 | -2.847193 | NLT | North Somerset Community Partnership Community... | 01275 872212 | | http://www.nscphealth.co.uk | | 171305.7813 | 341119.375 |
| 3 | 18138 | NMP01 | Hospital | Hospital | Independent Sector | Visible | False | Bridgewater Hospital | 120 Princess Road | | ... | 53.459743 | -2.245469 | NMP | Bridgewater Hospital (Manchester) Ltd | 0161 2270000 | | www.bridgewaterhospital.com | | 395944.5625 | 383703.5938 |
| 4 | 18142 | NMV01 | Hospital | Hospital | Independent Sector | Visible | True | Kneesworth House | Old North Road | Bassingbourn | ... | 52.078121 | -0.030604 | NMV | Partnerships In Care Ltd | 01763 255 700 | reception_kneesworthhouse@partnershipsincare.c... | www.partnershipsincare.co.uk | | 244071.7031 | 534945.1875 |


## K-Nearest Neighbors

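We will use the [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to find the *k* road nodes nearest to each hospital. We need to fit a KNN model on the road data, then give the fitted model the hospital locations so that it can return the nearest road nodes.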
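To make the idea concrete, here is a minimal CPU sketch of a brute-force k-nearest neighbors search with a Euclidean metric, written in plain Python. The function name `k_nearest` and the toy coordinates are hypothetical and exist only for illustration; this is not how cuML performs the search on the GPU.

```python
import math

def k_nearest(fit_points, query, k):
    """Return (distances, indices) of the k fitted points closest to query."""
    # Compute the Euclidean distance from the query to every fitted point.
    scored = [(math.dist(p, query), i) for i, p in enumerate(fit_points)]
    # Sort by distance (ties broken by position) and keep the k closest.
    scored.sort()
    nearest = scored[:k]
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices

# Hypothetical (east, north) coordinates standing in for road nodes.
toy_road_locs = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (1.0, 2.0)]
# Hypothetical (easting, northing) location standing in for one hospital.
toy_hospital = (0.0, 1.0)

toy_distances, toy_indices = k_nearest(toy_road_locs, toy_hospital, k=3)
print(toy_indices)    # positions of the 3 closest toy road nodes
print(toy_distances)  # their distances, in ascending order
```

The brute-force approach compares the query against every fitted point, which is simple but costs time proportional to the number of fitted points per query.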

## Exercise: Prepare the KNN Model

Create a k-nearest neighbors model `knn` using the `cuml.NearestNeighbors` constructor, setting the named argument `n_neighbors` to 3.

#### Solution

In [None]:
# %load solutions/prep_knn
knn = cuml.NearestNeighbors(n_neighbors=3)


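## Exercise: Fit the KNN Model

Create a new dataframe `road_locs` from the `east` and `north` columns of `road_nodes`. The column order itself does not matter, but it must stay consistent across operations, so use the order `['east', 'north']`.

Use the `knn.fit` method to fit the `knn` model to `road_locs`.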
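The following plain-Python sketch illustrates why the column order must be consistent between fitting and querying. The helper `nearest_index` and all coordinates are hypothetical; the point is only that swapping the two coordinates of a query can change which fitted point is reported as nearest.

```python
import math

# Hypothetical fitted points, stored as (east, north).
toy_road_locs = [(100.0, 900.0), (900.0, 100.0)]

def nearest_index(fit_points, query):
    """Return the position of the fitted point closest to query."""
    return min(range(len(fit_points)),
               key=lambda i: math.dist(fit_points[i], query))

# A hypothetical hospital at easting=110, northing=890.
easting, northing = 110.0, 890.0

# Query in the same order as the fitted data: (east, north).
correct = nearest_index(toy_road_locs, (easting, northing))
# Query with the order accidentally reversed: (north, east).
swapped = nearest_index(toy_road_locs, (northing, easting))

print(correct, swapped)
```

With the order reversed, the search still returns a result without any error, which is why this kind of mistake is easy to miss.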

In [None]:
road_nodes.columns

Index(['node_id', 'east', 'north', 'type'], dtype='object')

#### Solution

In [None]:
# %load solutions/fit_knn
road_locs = road_nodes[['east', 'north']]
knn.fit(road_locs)


NearestNeighbors(n_neighbors=3, verbose=False, handle=<cuml.common.handle.Handle object at 0x7fea5b769d80>, algorithm='brute', metric='euclidean')

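## Exercise: Road Nodes Closest to Each Hospital

Use the `knn.kneighbors` method to find the 3 road nodes closest to each hospital. `knn.kneighbors` takes 2 arguments: `X`, for which you should pass the `easting` and `northing` columns of `hospitals` (remember to keep the same column order you used when fitting the `knn` model above), and `n_neighbors`, the number of neighbors to search for, in this case 3.

`knn.kneighbors` will return 2 cudf dataframes, which you should name `distances` and `indices` respectively.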
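As a rough illustration of how the two results line up, the sketch below uses plain Python lists in place of cudf dataframes. All values and names (`toy_indices`, `toy_distances`, `toy_node_ids`) are hypothetical; the assumption is only that row *r* of each result belongs to query row *r*, and that each entry of the indices result is a row position in the fitted data.

```python
# Hypothetical results for 2 query rows with 3 neighbors each.
toy_indices = [[4, 0, 2], [1, 3, 4]]
toy_distances = [[0.5, 1.2, 3.0], [0.0, 2.5, 2.5]]

# Hypothetical attribute of the fitted rows, indexed by row position.
toy_node_ids = ["idA", "idB", "idC", "idD", "idE"]

nearest_ids = []
for query_row, (dist_row, idx_row) in enumerate(zip(toy_distances, toy_indices)):
    # Each index is a position in the fitted data, so it can be used
    # to look up any other attribute of the matching fitted row.
    ids = [toy_node_ids[i] for i in idx_row]
    nearest_ids.append(ids)
    print(query_row, list(zip(ids, dist_row)))
```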

#### Solution

In [None]:
# %load solutions/k_closest_nodes
distances, indices = knn.kneighbors(hospitals[['easting', 'northing']], 3) # order has to match the knn fit order (east, north)


In [None]:
distances

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.0 | 0.0 | 181.019333 |
| 1 | 0.0 | 0.0 | 0.000000 |
| 2 | 0.0 | 128.0 | 128.000000 |
| 3 | 0.0 | 0.0 | 0.000000 |
| 4 | 256.0 | 256.0 | 362.038666 |
| ... | ... | ... | ... |
| 1221 | 0.0 | 128.0 | 128.000000 |
| 1222 | 0.0 | 128.0 | 128.000000 |
| 1223 | 0.0 | 0.0 | 0.000000 |
| 1224 | 0.0 | 0.0 | 0.000000 |


In [None]:
indices

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 2133560 | 2133614 | 2133567 |
| 1 | 2145301 | 2145288 | 2145299 |
| 2 | 1649517 | 1649696 | 1649525 |
| 3 | 1339548 | 1339744 | 1339755 |
| 4 | 751990 | 751995 | 751988 |
| ... | ... | ... | ... |
| 1221 | 2781755 | 2781759 | 2781757 |
| 1222 | 2781755 | 2781759 | 2781757 |
| 1223 | 966490 | 966491 | 966496 |
| 1224 | 2111705 | 2111708 | 2111704 |


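## Viewing the Results for a Specific Hospital

Now we can use `indices`, `hospitals`, and `road_nodes` to derive information specific to a given hospital. Here we examine the hospital at index `10`. First, we view the hospital's grid coordinates: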

In [None]:
SELECTED_RESULT = 10
print('hospital coordinates:\n', hospitals.loc[SELECTED_RESULT, ['easting', 'northing']], sep='')

hospital coordinates:
easting     260713.17190
northing     56303.21875
Name: 10, dtype: float64


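Now we look at the 3 nearest road nodes. The values in `indices` are row positions in `road_nodes` (the code below labels them `node_id`), not the string values of the `node_id` column: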

In [None]:
nearest_road_nodes = indices.iloc[SELECTED_RESULT, 0:3]
print('node_id:\n', nearest_road_nodes, sep='')

node_id:
0    118559
1    118560
2    118678
Name: 10, dtype: int64


Finally, we can confirm the grid coordinates of the 3 road nodes nearest to the hospital:

In [None]:
print('road_node coordinates:\n', road_nodes.loc[nearest_road_nodes, ['east', 'north']], sep='')

road_node coordinates:
                 east         north
118559  260697.859375  56322.710938
118560  260722.812500  56207.925781
118678  260540.000000  56105.000000
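As a sanity check, the straight-line distances can be recomputed on the CPU from the coordinates printed above. This is a sketch in plain Python using double-precision arithmetic; the variable names are hypothetical, and it is not a reproduction of the values cuML reports in `distances`.

```python
import math

# Coordinates copied from the printed output above.
hospital = (260713.17190, 56303.21875)       # (easting, northing) of hospital 10
nearest_nodes = {
    118559: (260697.859375, 56322.710938),   # (east, north)
    118560: (260722.812500, 56207.925781),
    118678: (260540.000000, 56105.000000),
}

# Euclidean distance from the hospital to each node, in grid units.
recomputed = {idx: math.dist(hospital, loc) for idx, loc in nearest_nodes.items()}

for idx, dist in recomputed.items():
    print(idx, round(dist, 2))
```

The recomputed distances increase from the first node to the third, which agrees with the nearest-first ordering of the columns in `indices`.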


<br>
<div align="center"><h2>Please Restart the Kernel</h2></div>

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Next

In the next notebook, you will return to the K-means algorithm, this time using a Dask version that scales to multiple nodes and multiple GPUs.