
# Week 2: Identify the Nearest Medical Facilities

<span style="color:red">
**Update**

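Thank you for your analysis. Despite the warnings we have issued so far, the virus is still spreading rapidly. We want infected people to receive treatment as soon as possible, so we need your help working out which hospital or clinic is closest to each known infected person in the population.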
</span>

Your next goal is to identify the nearest hospital or clinic for each infected person.
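
Stated as a formula (a sketch: $(N, E)$ denotes the grid northing and easting computed later in this notebook, and $\mathcal{F}$ is the combined set of hospitals and clinics), the nearest facility for infected person $p$ is:

$$
f^*(p) = \underset{f \in \mathcal{F}}{\arg\min} \sqrt{(N_p - N_f)^2 + (E_p - E_f)^2}
$$

Because the grid coordinates are in metres, this is approximately the straight-line ground distance.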

## Import Packages

In [None]:
import cudf
import cuml
import cupy as cp

## Load the Population Data

Load the `lat`, `long`, and `infected` columns from `'./data/week2.csv'` into a cuDF DataFrame named `gdf`.
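
`usecols` makes `read_csv` return only the named columns. The plain-Python sketch below shows the same idea with the standard `csv` module; the helper `read_columns` and the sample rows (including the `age` column) are hypothetical, and unlike cuDF it keeps every value as a string instead of inferring numeric dtypes.

```python
import csv
import io

def read_columns(csv_text, usecols):
    """Return {column name: list of string values} for the requested columns only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in usecols}
    for row in reader:
        for name in usecols:
            columns[name].append(row[name])  # other columns in the row are ignored
    return columns

# Hypothetical sample; the real week2.csv may contain other columns as well
sample = "age,lat,long,infected\n30,53.70,-2.43,1\n41,51.38,-0.41,0\n"
cols = read_columns(sample, ["lat", "long", "infected"])
```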

In [None]:
gdf = cudf.read_csv('./data/week2.csv', usecols=['lat', 'long', 'infected'])

## Load the Hospital and Clinic Data

In this step, your goal is to create an `all_med` cuDF DataFrame containing the latitude and longitude of the hospitals (data in `'./data/hospitals.csv'`) and the clinics (data in `'./data/clinics.csv'`).

In [None]:
hospitals = cudf.read_csv('./data/hospitals.csv')
clinics = cudf.read_csv('./data/clinics.csv')

Because we will use the coordinates of these facilities, keep only the rows where both `Latitude` and `Longitude` are non-null.
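
`dropna(subset=[...])` drops a row when any of the listed columns is null and ignores nulls in the other columns. A plain-Python sketch of that rule (the helper name and the sample facilities are hypothetical):

```python
import math

def drop_missing_coords(rows, subset=("Latitude", "Longitude")):
    """Keep only rows whose values in every `subset` column are present (not None or NaN)."""
    def present(value):
        return value is not None and not (isinstance(value, float) and math.isnan(value))
    return [row for row in rows if all(present(row.get(col)) for col in subset)]

facilities = [
    {"Name": "A", "Latitude": 51.38, "Longitude": -0.41},
    {"Name": "B", "Latitude": None, "Longitude": -2.25},          # missing latitude
    {"Name": "C", "Latitude": 53.46, "Longitude": float("nan")},  # missing longitude
]
kept = drop_missing_coords(facilities)
```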

In [None]:
print(hospitals.shape)
print(clinics.shape)

(1229, 22)
(19082, 19)


In [None]:
hospitals = hospitals.dropna(subset=['Latitude', 'Longitude'])
hospitals.shape

(1226, 22)

In [None]:
clinics = clinics.dropna(subset=['Latitude', 'Longitude'])
clinics.shape

(19075, 19)

In [None]:
all_med = cudf.concat([hospitals[['Latitude', 'Longitude']], clinics[['Latitude', 'Longitude']]])
all_med.shape

(20301, 2)
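
The 20,301 rows are the 1,226 hospitals plus the 19,075 clinics that survived `dropna`. `cudf.concat` keeps each input's original row labels by default, so after concatenation labels repeat; that is why the notebook later calls `all_med.reset_index()`, which renumbers the rows 0 to n-1 and moves the old labels into an `index` column (visible in the `all_med.head()` output). Passing `ignore_index=True` to `cudf.concat` is another way to get fresh labels. The sketch below models a frame as a list of `(label, row)` pairs; it is illustrative only and the helper names are hypothetical.

```python
def concat_with_labels(*frames):
    """Stack (label, row) pairs from several frames, keeping each frame's original labels."""
    combined = []
    for frame in frames:
        combined.extend(frame)
    return combined

def reset_labels(frame):
    """Relabel rows 0..n-1 and keep the old label in an 'index' field, like reset_index()."""
    return [(i, {"index": old_label, **row}) for i, (old_label, row) in enumerate(frame)]

hospitals = [(0, {"Latitude": 51.38}), (1, {"Latitude": 51.32})]
clinics = [(0, {"Latitude": 53.46}), (2, {"Latitude": 52.08})]  # label 1 was dropped earlier

stacked = concat_with_labels(hospitals, clinics)
labels = [label for label, _ in stacked]  # label 0 now appears twice
relabelled = reset_labels(stacked)
```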

## Create Grid Coordinates for the Medical Facilities

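The next cell provides the latitude/longitude-to-grid-coordinate converter you will use (a Transverse Mercator projection onto the Ordnance Survey National Grid, following the guide linked in the cell). Use it to compute grid coordinates for `all_med`, the DataFrame created in the previous step, and store them in new `northing` and `easting` columns.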
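
For checking individual values on the CPU, here is a plain-Python, one-point-at-a-time port of the same formulas. It is a sketch: the name `latlong_to_osgb_grid` is ours, and like the cell below it assumes inputs in degrees.

```python
import math

def latlong_to_osgb_grid(lat_deg, long_deg):
    """Scalar port of latlong2osgbgrid_cupy: (lat, long) in degrees -> (northing, easting) in metres."""
    lat = math.radians(lat_deg)
    long = math.radians(long_deg)

    a, b = 6377563.396, 6356256.909          # ellipsoid semi-major / semi-minor axes (m)
    e2 = (a**2 - b**2) / a**2                 # eccentricity squared
    N0, E0 = -100000.0, 400000.0              # northing / easting of the true origin
    F0 = 0.9996012717                         # scale factor on the central meridian
    phi0, lambda0 = math.radians(49), math.radians(-2)  # true origin

    sinlat, coslat, tanlat = math.sin(lat), math.cos(lat), math.tan(lat)
    latdiff, longdiff = lat - phi0, long - lambda0

    n = (a - b) / (a + b)
    nu = a * F0 * (1 - e2 * sinlat**2) ** -0.5
    rho = a * F0 * (1 - e2) * (1 - e2 * sinlat**2) ** -1.5
    eta2 = nu / rho - 1
    M = b * F0 * ((1 + n + 5/4 * (n**2 + n**3)) * latdiff
                  - (3 * (n + n**2) + 21/8 * n**3) * math.sin(latdiff) * math.cos(lat + phi0)
                  + 15/8 * (n**2 + n**3) * math.sin(2 * latdiff) * math.cos(2 * (lat + phi0))
                  - 35/24 * n**3 * math.sin(3 * latdiff) * math.cos(3 * (lat + phi0)))

    I = M + N0
    II = nu / 2 * sinlat * coslat
    III = nu / 24 * sinlat * coslat**3 * (5 - tanlat**2 + 9 * eta2)
    IIIA = nu / 720 * sinlat * coslat**5 * (61 - 58 * tanlat**2 + tanlat**4)
    IV = nu * coslat
    V = nu / 6 * coslat**3 * (nu / rho - tanlat**2)
    VI = nu / 120 * coslat**5 * (5 - 18 * tanlat**2 + tanlat**4 + 14 * eta2 - 58 * tanlat**2 * eta2)

    northing = I + II * longdiff**2 + III * longdiff**4 + IIIA * longdiff**6
    easting = E0 + IV * longdiff + V * longdiff**3 + VI * longdiff**5
    return northing, easting
```

At the true origin (49°N, 2°W) it returns exactly (N0, E0) = (-100000, 400000), and for the first facility in `all_med` (51.379997, -0.406042) it returns roughly (165810, 510918), in line with the float32 values in the `all_med.head()` output.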


In [None]:
# https://www.ordnancesurvey.co.uk/docs/support/guide-coordinate-systems-great-britain.pdf

def latlong2osgbgrid_cupy(lat, long, input_degrees=True):
    '''
    Converts latitude and longitude (ellipsoidal) coordinates into northing and easting (grid) coordinates, using a Transverse Mercator projection.
    
    Inputs:
    lat: latitude coordinate (N)
    long: longitude coordinate (E)
    input_degrees: if True (default), interprets the coordinates as degrees; otherwise, interprets coordinates as radians
    
    Output:
    (northing, easting)
    '''
    
    if input_degrees:
        lat = lat * cp.pi/180
        long = long * cp.pi/180

    a = 6377563.396
    b = 6356256.909
    e2 = (a**2 - b**2) / a**2

    N0 = -100000 # northing of true origin
    E0 = 400000 # easting of true origin
    F0 = .9996012717 # scale factor on central meridian
    phi0 = 49 * cp.pi / 180 # latitude of true origin
    lambda0 = -2 * cp.pi / 180 # longitude of true origin and central meridian
    
    sinlat = cp.sin(lat)
    coslat = cp.cos(lat)
    tanlat = cp.tan(lat)
    
    latdiff = lat-phi0
    longdiff = long-lambda0

    n = (a-b) / (a+b)
    nu = a * F0 * (1 - e2 * sinlat ** 2) ** -.5
    rho = a * F0 * (1 - e2) * (1 - e2 * sinlat ** 2) ** -1.5
    eta2 = nu / rho - 1
    M = b * F0 * ((1 + n + 5/4 * (n**2 + n**3)) * latdiff - 
                  (3*(n+n**2) + 21/8 * n**3) * cp.sin(latdiff) * cp.cos(lat+phi0) +
                  15/8 * (n**2 + n**3) * cp.sin(2*(latdiff)) * cp.cos(2*(lat+phi0)) - 
                  35/24 * n**3 * cp.sin(3*(latdiff)) * cp.cos(3*(lat+phi0)))
    I = M + N0
    II = nu/2 * sinlat * coslat
    III = nu/24 * sinlat * coslat ** 3 * (5 - tanlat ** 2 + 9 * eta2)
    IIIA = nu/720 * sinlat * coslat ** 5 * (61-58 * tanlat**2 + tanlat**4)
    IV = nu * coslat
    V = nu / 6 * coslat**3 * (nu/rho - cp.tan(lat)**2)
    VI = nu / 120 * coslat ** 5 * (5 - 18 * tanlat**2 + tanlat**4 + 14 * eta2 - 58 * tanlat**2 * eta2)

    northing = I + II * longdiff**2 + III * longdiff**4 + IIIA * longdiff**6
    easting = E0 + IV * longdiff + V * longdiff**3 + VI * longdiff**5

    return(northing, easting)

In [None]:
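# cudf.concat kept each source's original row labels, so labels repeat; renumber rows 0..n-1
# (the old labels move into an `index` column) so the columns added below line up by row
all_med = all_med.reset_index()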

In [None]:
cupy_lat = cp.asarray(all_med['Latitude'])
cupy_long = cp.asarray(all_med['Longitude'])
n_cupy_array, e_cupy_array = latlong2osgbgrid_cupy(cupy_lat, cupy_long)
all_med['northing'] = cudf.Series(n_cupy_array).astype('float32')
all_med['easting'] = cudf.Series(e_cupy_array).astype('float32')

In [None]:
all_med.head()

|   | index | Latitude | Longitude | northing | easting |
|---|---|---|---|---|---|
| 0 | 0 | 51.379997 | -0.406042 | 165810.46875 | 510917.53125 |
| 1 | 1 | 51.315132 | -0.556289 | 158381.34375 | 500604.84375 |
| 2 | 2 | 51.437195 | -2.847193 | 171305.78125 | 341119.375 |
| 3 | 3 | 53.459743 | -2.245469 | 395944.5625 | 383703.59375 |
| 4 | 4 | 52.078121 | -0.030604 | 244071.703125 | 534945.1875 |


## Find the Nearest Hospital or Clinic for Each Infected Person

Set the `n_neighbors` parameter to `1`, fit `cuml.NearestNeighbors` to the `northing` and `easting` values of `all_med`, and save the model as `knn`.
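
With `algorithm='brute'` and `metric='euclidean'` (the values shown in the model's repr below), the search is conceptually an exhaustive scan: compare each query point with every fitted facility and keep the smallest distance. cuML does this in bulk on the GPU; the plain-Python sketch below (hypothetical helper name, O(queries × facilities) loop) only illustrates the logic and would be far too slow for the real data. The first two facilities are taken from the `all_med.head()` output; the query point is made up.

```python
import math

def brute_force_nearest(facilities, queries):
    """For each query point, scan every facility and return (distance, index) of the closest one.

    Points are (northing, easting) tuples in metres; distance is plain Euclidean distance.
    """
    results = []
    for q in queries:
        best_index, best_dist = -1, math.inf
        for i, f in enumerate(facilities):
            d = math.dist(q, f)
            if d < best_dist:  # strict '<' keeps the first facility on ties
                best_index, best_dist = i, d
        results.append((best_dist, best_index))
    return results

facilities = [(165810.5, 510917.5), (158381.3, 500604.8), (395944.6, 383703.6)]
result = brute_force_nearest(facilities, [(160000.0, 502000.0)])
```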

In [None]:
knn = cuml.NearestNeighbors(n_neighbors=1)
all_med_locs = all_med[['northing', 'easting']]
knn.fit(all_med_locs)

NearestNeighbors(n_neighbors=1, verbose=False, handle=<cuml.common.handle.Handle object at 0x7f9378017138>, algorithm='brute', metric='euclidean')

Save every infected member of `gdf` to a new DataFrame named `infected_gdf`.

In [None]:
infected_gdf = gdf.loc[gdf.infected == 1, :]

Create `northing` and `easting` values for `infected_gdf`.

In [None]:
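# Filtering kept the original row labels (e.g. 1346586); renumber 0..n-1 so the new columns line up by row
infected_gdf = infected_gdf.reset_index()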
cupy_lat = cp.asarray(infected_gdf['lat'])
cupy_long = cp.asarray(infected_gdf['long'])
n_cupy_array, e_cupy_array = latlong2osgbgrid_cupy(cupy_lat, cupy_long)
infected_gdf['northing'] = cudf.Series(n_cupy_array).astype('float32')
infected_gdf['easting'] = cudf.Series(e_cupy_array).astype('float32')
infected_gdf.head()

|   | index | lat | long | infected | northing | easting |
|---|---|---|---|---|---|---|
| 0 | 1346586 | 53.715826 | -2.430079 | 1.0 | 424489.78125 | 371619.6875 |
| 1 | 1350932 | 53.664881 | -2.425673 | 1.0 | 418820.6875 | 371876.5 |
| 2 | 1352085 | 53.696765 | -2.48894 | 1.0 | 422394.40625 | 367721.0 |
| 3 | 1352799 | 53.696966 | -2.488897 | 1.0 | 422416.8125 | 367723.96875 |
| 4 | 1357529 | 53.727804 | -2.392959 | 1.0 | 425808.125 | 374076.5625 |


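Call `knn.kneighbors` with `n_neighbors=1` on the `northing` and `easting` values of `infected_gdf`, passing the columns in the same order used to fit `knn`. Save the return values in `distances` and `indices`.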
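
Nearest-neighbour search compares coordinates position by position, so the query columns must be in the same order as the columns passed to `knn.fit` (`northing`, then `easting`). If the order is swapped, each person's easting is compared with the facilities' northings and vice versa, and the reported "nearest" facility can be far away. A small sketch with hypothetical coordinates and a hypothetical helper:

```python
import math

def nearest_index(facilities, query):
    """Index of the facility closest (Euclidean) to `query`; both must use the same coordinate order."""
    return min(range(len(facilities)), key=lambda i: math.dist(facilities[i], query))

# Facilities stored as (northing, easting), like all_med[['northing', 'easting']]
facilities = [(300000.0, 420000.0), (420000.0, 300000.0)]
person = (419500.0, 301000.0)          # (northing, easting): matches the fit order
swapped = (person[1], person[0])       # (easting, northing): wrong order

right = nearest_index(facilities, person)   # about 1.1 km away
wrong = nearest_index(facilities, swapped)  # picks a facility about 170 km from the person
```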


In [None]:
# Query columns must be in the same order as in knn.fit: northing, then easting
distances, indices = knn.kneighbors(infected_gdf[['northing', 'easting']], n_neighbors=1)

### Check Your Solution

The `indices` returned by `knn.kneighbors` above should map each person's row index to the row index of their nearest clinic or hospital:
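
Each entry of `indices` is a row position into the data passed to `knn.fit`, i.e. into `all_med`, which is why the cells below look facilities up with `all_med.iloc[...]`. A plain-Python sketch of that lookup, followed by recomputing each person's straight-line distance to the matched facility (all coordinates and the helper name are illustrative):

```python
import math

def gather_nearest(indices, facilities):
    """Look up each person's nearest facility by row position, like all_med.iloc[i]."""
    return [facilities[i] for i in indices]

facilities = [(165810.5, 510917.5), (158381.3, 500604.8)]  # (northing, easting) in metres
people = [(160000.0, 502000.0), (166000.0, 511000.0)]
indices = [1, 0]  # one nearest-facility position per person

nearest = gather_nearest(indices, facilities)
distances_m = [math.dist(p, f) for p, f in zip(people, nearest)]
```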

In [None]:
indices.head()

|   | 0 |
|---|---|
| 0 | 16696 |
| 1 | 686 |
| 2 | 11757 |
| 3 | 11757 |
| 4 | 16696 |


Here you can look up an infected person's coordinates in `infected_gdf`:

In [None]:
infected_gdf.iloc[0] # get the coords of an infected individual (in this case, individual 0)

index       1.346586e+06
lat         5.371583e+01
long       -2.430079e+00
infected    1.000000e+00
northing    4.244898e+05
easting     3.716197e+05
Name: 0, dtype: float64

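Using the mapped index of the nearest facility, look up that facility and confirm that its `northing` and `easting` are close to the individual's: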
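
A quick numeric version of this check (a sketch; the helper name is ours and the coordinates are copied from the outputs printed in this section) is to compute the grid distance between the person and the facility. For the values shown here it comes to about 75 km, whereas pairing the person's easting with the facility's northing (and vice versa) gives about 1.2 km. That pattern is what a column-order mismatch between `fit` and `kneighbors` would produce, so it is worth re-checking the column order if your numbers look like this.

```python
import math

def grid_distance_km(a, b):
    """Straight-line distance in km between two (northing, easting) points given in metres."""
    return math.dist(a, b) / 1000

person_0 = (424489.78125, 371619.6875)   # infected_gdf.iloc[0]: (northing, easting)
facility = (372224.5, 425500.4375)       # all_med.iloc[16696]: (northing, easting)

matched_km = grid_distance_km(person_0, facility)
# Same facility, but with the person's coordinates read in (easting, northing) order
crossed_km = grid_distance_km((person_0[1], person_0[0]), facility)
```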

In [None]:
all_med.iloc[16696] # print the entry for facility 16696 (replace with the index that `indices` reports as closest to the individual)

index         15473.000000
Latitude         53.246147
Longitude        -1.617808
northing     372224.500000
easting      425500.437500
Name: 16696, dtype: float64

<div align="center"><h2>Please Restart the Kernel</h2></div>

...before moving on to the next notebook.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)