## Capstone Project - The Battle of Neighborhoods (Week 1) (Data section)  

#### This project uses a dataset of latitude and longitude information for Taiwan's townships.

#### To query Foursquare, this dataset is merged with a Chinese-English township name mapping.

#### The rows for Taipei City and New Taipei City are then passed to the Foursquare API to obtain venue category information for the two cities.

#### The Foursquare Venue Category Hierarchy is also retrieved through the Foursquare API.

#### Combining the venue category information with the Foursquare Venue Category Hierarchy yields the dataset to be analyzed.

### 0. Import packages

In [1]:
import numpy as np 
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests 
from pandas import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

### 1. Latitude and longitude of townships in Taiwan

- 3-digit postal code and administrative district center latitude/longitude table  
    - Source [National Development Council] https://data.gov.tw/dataset/25489  
    - Download [Taiwan_postal.csv] https://quality.data.gov.tw/dq_download_csv.php?nid=25489&md5_url=ab48007db9f630e51fec0cb608e32d61  
- County and township Chinese-English Excel file (Hanyu Pinyin)
    - Source [Chunghwa Post Co., Ltd.] https://www.post.gov.tw/post/internet/Download/index.jsp?ID=220306  
    - Download [county_mapping.xls] https://www.post.gov.tw/post/download/county_h_10706.xls  

#### 1-1 Loading Taiwan latitude and longitude data

In [2]:
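ll_data = pd.read_csv('Taiwan_postal.csv')
# rename the postal code, center longitude, and center latitude columns to English
ll_data.rename(columns={"_x0033_碼郵遞區號": "Postal code", "中心點經度": "Longitude", "中心點緯度": "Latitude"}, inplace=True)
# the TGOS URL column is not needed
ll_data.drop(['TGOS_URL'], axis=1, inplace=True)
ll_data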

Unnamed: 0,行政區名,Postal code,Longitude,Latitude
0,臺北市中正區,100,121.519884,25.032405
1,臺北市大同區,103,121.513042,25.063424
2,臺北市中山區,104,121.538160,25.069699
3,臺北市松山區,105,121.557588,25.059991
4,臺北市大安區,106,121.543445,25.026770
...,...,...,...,...
366,花蓮縣瑞穗鄉,978,121.407347,23.515612
367,花蓮縣萬榮鄉,979,121.318953,23.727726
368,花蓮縣玉里鎮,981,121.360448,23.371436
369,花蓮縣卓溪鄉,982,121.180422,23.390629


#### 1-2 Loading township Chinese-English data

In [3]:
map_data = pd.read_excel('county_mapping.xls', header=None)
# columns: postal code, district name in Chinese (行政區名), district name in English
map_data.columns = ['Postal code', '行政區名', 'District name']
map_data

Unnamed: 0,Postal code,行政區名,District name
0,100,臺北市中正區,"Zhongzheng Dist., Taipei City"
1,103,臺北市大同區,"Datong Dist., Taipei City"
2,104,臺北市中山區,"Zhongshan Dist., Taipei City"
3,105,臺北市松山區,"Songshan Dist., Taipei City"
4,106,臺北市大安區,"Da’an Dist., Taipei City"
...,...,...,...
366,978,花蓮縣瑞穗鄉,"Ruisui Township, Hualien County"
367,979,花蓮縣萬榮鄉,"Wanrong Township, Hualien County"
368,981,花蓮縣玉里鎮,"Yuli Township, Hualien County"
369,982,花蓮縣卓溪鄉,"Zhuoxi Township, Hualien County"


#### **Notice: Three administrative districts (the Diaoyutai, Dongsha, and Nansha islands) are named differently in the two datasets. These places are not habitable, so they can be ignored.**

In [4]:
diff = pd.concat([ll_data['行政區名'],map_data['行政區名']]).drop_duplicates(keep=False)
diff

64     宜蘭縣釣魚臺列嶼
270    南海諸島東沙群島
271    南海諸島南沙群島
64          釣魚台
270     高雄市東沙群島
271     高雄市南沙群島
Name: 行政區名, dtype: object
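
The idiom in the cell above works because `drop_duplicates(keep=False)` discards every value that occurs more than once, so only names present in exactly one of the two datasets survive. A minimal sketch on made-up district names (the toy values and variable names are illustrative, not taken from the datasets):

```python
import pandas as pd

# Toy stand-ins for ll_data['行政區名'] and map_data['行政區名'] (illustrative values only).
left_names = pd.Series(["A District", "B District", "C Islands"])
right_names = pd.Series(["A District", "B District", "C Isles"])

# Names found in both series appear twice after concatenation, and keep=False
# drops all of their occurrences; names unique to one side remain.
only_in_one = pd.concat([left_names, right_names]).drop_duplicates(keep=False)
print(only_in_one.tolist())
```

This only behaves like a symmetric difference when neither input contains repeated names of its own; a name duplicated within one dataset would be dropped as well.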

#### 1-3 Merge the two datasets

In [5]:
taiwan_data = pd.merge(map_data, ll_data,on=['Postal code', '行政區名'])
taiwan_data

Unnamed: 0,Postal code,行政區名,District name,Longitude,Latitude
0,100,臺北市中正區,"Zhongzheng Dist., Taipei City",121.519884,25.032405
1,103,臺北市大同區,"Datong Dist., Taipei City",121.513042,25.063424
2,104,臺北市中山區,"Zhongshan Dist., Taipei City",121.538160,25.069699
3,105,臺北市松山區,"Songshan Dist., Taipei City",121.557588,25.059991
4,106,臺北市大安區,"Da’an Dist., Taipei City",121.543445,25.026770
...,...,...,...,...,...
363,978,花蓮縣瑞穗鄉,"Ruisui Township, Hualien County",121.407347,23.515612
364,979,花蓮縣萬榮鄉,"Wanrong Township, Hualien County",121.318953,23.727726
365,981,花蓮縣玉里鎮,"Yuli Township, Hualien County",121.360448,23.371436
366,982,花蓮縣卓溪鄉,"Zhuoxi Township, Hualien County",121.180422,23.390629
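
Because `pd.merge` defaults to an inner join, rows whose key pair appears in only one dataset are silently dropped (each input has 370 rows, the result has 367). One way to make such rows visible, sketched here on toy frames with invented values, is an outer join with `indicator=True`:

```python
import pandas as pd

# Toy frames that mimic the two datasets; every value is made up for illustration.
names = pd.DataFrame({"Postal code": [100, 103, 290],
                      "行政區名": ["甲區", "乙區", "丙島"],
                      "District name": ["A Dist.", "B Dist.", "C Islands"]})
coords = pd.DataFrame({"Postal code": [100, 103, 290],
                       "行政區名": ["甲區", "乙區", "丙群島"],
                       "Longitude": [121.5, 121.6, 123.5],
                       "Latitude": [25.0, 25.1, 25.7]})

# The outer join keeps every row; the _merge column records which side it came from.
check = pd.merge(names, coords, on=["Postal code", "行政區名"],
                 how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Postal code", "行政區名", "_merge"]])
```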


### 2. Venue categories in Taipei City and New Taipei City 

#### 2-1 Select the Taipei City and New Taipei City data

In [6]:
city_data = taiwan_data[taiwan_data['District name'].str.contains("Taipei City")]
city_data

Unnamed: 0,Postal code,行政區名,District name,Longitude,Latitude
0,100,臺北市中正區,"Zhongzheng Dist., Taipei City",121.519884,25.032405
1,103,臺北市大同區,"Datong Dist., Taipei City",121.513042,25.063424
2,104,臺北市中山區,"Zhongshan Dist., Taipei City",121.53816,25.069699
3,105,臺北市松山區,"Songshan Dist., Taipei City",121.557588,25.059991
4,106,臺北市大安區,"Da’an Dist., Taipei City",121.543445,25.02677
5,108,臺北市萬華區,"Wanhua Dist., Taipei City",121.497986,25.02859
6,110,臺北市信義區,"Xinyi Dist., Taipei City",121.57167,25.030621
7,111,臺北市士林區,"Shilin Dist., Taipei City",121.550847,25.125467
8,112,臺北市北投區,"Beitou Dist., Taipei City",121.517799,25.148068
9,114,臺北市內湖區,"Neihu Dist., Taipei City",121.592383,25.083706
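
Note that the substring `"Taipei City"` also matches `"New Taipei City"`, which is why a single filter returns the districts of both cities. If the two cities need to be told apart later, one option is to derive a city column from the text after the last comma. A sketch on a few sample names; the `City` column is a hypothetical addition, not part of the original data:

```python
import pandas as pd

sample = pd.DataFrame({"District name": [
    "Zhongzheng Dist., Taipei City",
    "Banqiao Dist., New Taipei City",
    "Ruisui Township, Hualien County",
]})

# "Taipei City" is a substring of "New Taipei City", so the first two rows match.
both = sample[sample["District name"].str.contains("Taipei City")].copy()

# Hypothetical helper column: the part after the last ", " is the city or county.
both["City"] = both["District name"].str.rsplit(", ", n=1).str[-1]
print(both)
```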


#### 2-2 Foursquare Settings

In [7]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
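LIMIT = 100 # maximum number of venues requested per call
radius = 5000 # note: not passed to getNearbyVenues below, which runs with its default radius of 500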
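
Hard-coding API credentials in a notebook makes them easy to leak when the notebook is shared. A common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this example, not something Foursquare prescribes:

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment, or raise a clear error."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} first") from None

# Example with a plain dict standing in for os.environ (dummy values).
demo_env = {"FOURSQUARE_CLIENT_ID": "dummy-id",
            "FOURSQUARE_CLIENT_SECRET": "dummy-secret"}
demo_id, demo_secret = load_credentials(demo_env)
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would then pick up the real values from the shell that started Jupyter.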

#### 2-3 Function for getting venue information

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` meters of each district center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the request for the Foursquare explore endpoint
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep only the list of returned items
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-district lists into a single DataFrame
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['District',
                 'District Latitude',
                 'District Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
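
The list comprehension above assumes every venue has at least one category; `v['venue']['categories'][0]` raises `IndexError` otherwise. A defensive variant can be sketched as a separate helper that works on the `items` list alone. The name `extract_venue_rows` and the sample payload are hypothetical, shaped only after the fields the function already reads:

```python
def extract_venue_rows(name, lat, lng, items):
    """Build one tuple per venue, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Made-up items mimicking the structure read by getNearbyVenues.
sample_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 25.03, "lng": 121.52},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabeled B", "location": {"lat": 25.04, "lng": 121.53},
               "categories": []}},
]
rows = extract_venue_rows("Demo Dist.", 25.0, 121.5, sample_items)
```

Rows with a `None` category would simply fail to match in a later merge on the category name, rather than aborting the whole download.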

#### 2-4 Get information on venues in Taipei and New Taipei City

In [9]:
taipei_categories = getNearbyVenues(names=city_data['District name'],
                                   latitudes=city_data['Latitude'],
                                   longitudes=city_data['Longitude']
                                  )
taipei_categories

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Kinfen Braised Pork Rice (金峰魯肉飯),25.032194,121.518534,Taiwanese Restaurant
1,"Zhongzheng Dist., Taipei City",25.032405,121.519884,臻味赤肉胡椒餅 烤地瓜,25.033022,121.518246,Bakery
2,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Chiang Kai-Shek Memorial Hall (中正紀念堂),25.034555,121.521835,Monument / Landmark
3,"Zhongzheng Dist., Taipei City",25.032405,121.519884,虎記商行,25.031744,121.519284,Café
4,"Zhongzheng Dist., Taipei City",25.032405,121.519884,National Theater (國家戲劇院),25.035197,121.518188,Theater
...,...,...,...,...,...,...,...
332,"Luzhou Dist., New Taipei City",25.089272,121.471246,星巴克 STARBUCKS 蘆洲三民門市,25.087397,121.471177,Coffee Shop
333,"Luzhou Dist., New Taipei City",25.089272,121.471246,肯德基 KFC（蘆洲三民餐廳）,25.087043,121.471592,Fried Chicken Joint
334,"Luzhou Dist., New Taipei City",25.089272,121.471246,FamilyMart_蘆洲民族店,25.091190,121.474040,Deli / Bodega
335,"Luzhou Dist., New Taipei City",25.089272,121.471246,捷運三民高中站 MRT Sanmin Senior High School Station,25.085741,121.473080,Metro Station


#### 2-5 See how many venues are in each area

In [10]:
taipei_categories['District'].value_counts()

Songshan Dist., Taipei City         57
Da’an Dist., Taipei City            40
Banqiao Dist., New Taipei City      39
Zhongzheng Dist., Taipei City       33
Wenshan Dist., Taipei City          23
Neihu Dist., Taipei City            23
Datong Dist., Taipei City           21
Yonghe Dist., New Taipei City       20
Xinyi Dist., Taipei City            19
Zhongshan Dist., Taipei City        13
Xinzhuang Dist., New Taipei City     8
Zhonghe Dist., New Taipei City       7
Nangang Dist., Taipei City           5
Wanhua Dist., Taipei City            5
Luzhou Dist., New Taipei City        4
Sanchong Dist., New Taipei City      4
Shilin Dist., Taipei City            3
Jinshan Dist., New Taipei City       3
Tucheng Dist., New Taipei City       2
Shenkeng Dist., New Taipei City      2
Sanzhi Dist., New Taipei City        1
Wanli Dist., New Taipei City         1
Shuangxi Dist., New Taipei City      1
Xizhi Dist., New Taipei City         1
Yingge Dist., New Taipei City        1
Shiding Dist., New Taipei

#### 2-6 Remove districts with fewer than 10 venues, since they have too little data to analyze

In [11]:
taipei_categories = taipei_categories.groupby('District').filter(lambda x : len(x)>=10)
taipei_categories

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Kinfen Braised Pork Rice (金峰魯肉飯),25.032194,121.518534,Taiwanese Restaurant
1,"Zhongzheng Dist., Taipei City",25.032405,121.519884,臻味赤肉胡椒餅 烤地瓜,25.033022,121.518246,Bakery
2,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Chiang Kai-Shek Memorial Hall (中正紀念堂),25.034555,121.521835,Monument / Landmark
3,"Zhongzheng Dist., Taipei City",25.032405,121.519884,虎記商行,25.031744,121.519284,Café
4,"Zhongzheng Dist., Taipei City",25.032405,121.519884,National Theater (國家戲劇院),25.035197,121.518188,Theater
...,...,...,...,...,...,...,...
305,"Yonghe Dist., New Taipei City",25.008102,121.516745,50嵐樂華店,25.007965,121.513975,Bubble Tea Shop
306,"Yonghe Dist., New Taipei City",25.008102,121.516745,阿里小廚,25.008220,121.513679,Steakhouse
307,"Yonghe Dist., New Taipei City",25.008102,121.516745,台南蔡虱目魚,25.004513,121.517750,Taiwanese Restaurant
308,"Yonghe Dist., New Taipei City",25.008102,121.516745,龜叟什錦麵,25.006000,121.513205,Noodle House
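
`groupby(...).filter(...)` keeps or drops whole groups: the lambda receives all rows of one district, and the district survives only if the lambda returns `True`. A toy illustration with a threshold of 2 instead of 10 (data invented for the example):

```python
import pandas as pd

toy = pd.DataFrame({
    "District": ["A", "A", "A", "B", "C", "C"],
    "Venue": ["a1", "a2", "a3", "b1", "c1", "c2"],
})

# Keep only districts with at least 2 venues; district B (1 venue) is removed.
kept = toy.groupby("District").filter(lambda g: len(g) >= 2)
print(kept["District"].value_counts().to_dict())
```

The surviving rows keep their original row labels, so the index of the filtered frame is no longer contiguous unless `reset_index(drop=True)` is applied.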


### 3. Foursquare category data

#### 3-1 Get the JSON data

In [12]:
url = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            VERSION)           
# make the GET request
json_categories = requests.get(url).json()

#### 3-2 Flatten the JSON data and get the top-level parent category of every category

In [13]:
def flatten_categories(nodes, top_name):
    """Return one row per category in `nodes` and all of their descendants,
    each labeled with the name of its top-level parent."""
    rows = []
    for node in nodes:
        rows.append({'name': node['name'], '1st_categories': top_name})
        # recurse into the sub-categories, if any
        rows.extend(flatten_categories(node.get('categories', []), top_name))
    return rows

rows = []
for top in json_categories['response']['categories']:
    # top-level categories themselves keep an empty parent label
    rows.append({'name': top['name'], '1st_categories': ''})
    rows.extend(flatten_categories(top['categories'], top['name']))

# one row per category: its name and the name of its top-level parent
categories = pd.DataFrame(rows)

#### 3-3 Merge the two datasets

In [14]:
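# inner join: attach the top-level category to each venue via its category name
taipei_merge = pd.merge(taipei_categories, categories, left_on = 'Venue Category', right_on = 'name')
# display without the redundant 'name' column (taipei_merge itself is not modified)
taipei_merge.drop(['name'],axis=1)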

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,1st_categories
0,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Kinfen Braised Pork Rice (金峰魯肉飯),25.032194,121.518534,Taiwanese Restaurant,Food
1,"Datong Dist., Taipei City",25.063424,121.513042,大橋頭老牌筒仔米糕,25.064753,121.511127,Taiwanese Restaurant,Food
2,"Datong Dist., Taipei City",25.063424,121.513042,旗魚新竹米粉,25.067305,121.511002,Taiwanese Restaurant,Food
3,"Datong Dist., Taipei City",25.063424,121.513042,昌吉豬血湯,25.065864,121.516489,Taiwanese Restaurant,Food
4,"Datong Dist., Taipei City",25.063424,121.513042,珠記大橋頭油飯,25.062899,121.514732,Taiwanese Restaurant,Food
...,...,...,...,...,...,...,...,...
283,"Banqiao Dist., New Taipei City",25.011865,121.457968,HiMall麗寶百貨廣場,25.012792,121.462615,Shopping Mall,Shop & Service
284,"Banqiao Dist., New Taipei City",25.011865,121.457968,Bon Appétit (朋派),25.011638,121.462537,Buffet,Food
285,"Banqiao Dist., New Taipei City",25.011865,121.457968,Hilton Executive Lounge,25.011340,121.462566,Lounge,Nightlife Spot
286,"Yonghe Dist., New Taipei City",25.008102,121.516745,太原養生會館,25.006819,121.513094,Massage Studio,Shop & Service
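
Because the merge is an inner join, any venue whose category name has no exact match in the hierarchy table is dropped from the result. A quick way to list such names before merging is an `isin` check. The frames below are toy stand-ins for `taipei_categories` and `categories`, with invented values:

```python
import pandas as pd

venues = pd.DataFrame({"Venue": ["v1", "v2", "v3"],
                       "Venue Category": ["Café", "Bakery", "Mystery Spot"]})
hierarchy = pd.DataFrame({"name": ["Café", "Bakery"],
                          "1st_categories": ["Food", "Food"]})

# Category names present in the venue data but absent from the hierarchy.
is_known = venues["Venue Category"].isin(hierarchy["name"])
missing = venues.loc[~is_known, "Venue Category"]
print(missing.unique().tolist())
```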


#### 3-4 See the number of venues in each top-level category

In [15]:
taipei_merge['1st_categories'].value_counts()

Food                           196
Shop & Service                  48
Outdoors & Recreation           20
Arts & Entertainment            11
Nightlife Spot                   8
Travel & Transport               3
Professional & Other Places      2
Name: 1st_categories, dtype: int64

#### 3-5 Remove top-level categories with fewer than 10 venues, since they have too little data to analyze

In [16]:
taipei_merge = taipei_merge.groupby('1st_categories').filter(lambda x : len(x)>=10)
taipei_merge

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,name,1st_categories
0,"Zhongzheng Dist., Taipei City",25.032405,121.519884,Kinfen Braised Pork Rice (金峰魯肉飯),25.032194,121.518534,Taiwanese Restaurant,Taiwanese Restaurant,Food
1,"Datong Dist., Taipei City",25.063424,121.513042,大橋頭老牌筒仔米糕,25.064753,121.511127,Taiwanese Restaurant,Taiwanese Restaurant,Food
2,"Datong Dist., Taipei City",25.063424,121.513042,旗魚新竹米粉,25.067305,121.511002,Taiwanese Restaurant,Taiwanese Restaurant,Food
3,"Datong Dist., Taipei City",25.063424,121.513042,昌吉豬血湯,25.065864,121.516489,Taiwanese Restaurant,Taiwanese Restaurant,Food
4,"Datong Dist., Taipei City",25.063424,121.513042,珠記大橋頭油飯,25.062899,121.514732,Taiwanese Restaurant,Taiwanese Restaurant,Food
...,...,...,...,...,...,...,...,...,...
282,"Banqiao Dist., New Taipei City",25.011865,121.457968,AWHILE. 外兒小館,25.014028,121.460790,Deli / Bodega,Deli / Bodega,Food
283,"Banqiao Dist., New Taipei City",25.011865,121.457968,HiMall麗寶百貨廣場,25.012792,121.462615,Shopping Mall,Shopping Mall,Shop & Service
284,"Banqiao Dist., New Taipei City",25.011865,121.457968,Bon Appétit (朋派),25.011638,121.462537,Buffet,Buffet,Food
286,"Yonghe Dist., New Taipei City",25.008102,121.516745,太原養生會館,25.006819,121.513094,Massage Studio,Massage Studio,Shop & Service


In [17]:
taipei_merge.to_csv('taipei_geo.csv')
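
By default `to_csv` writes the row index as a first column with an empty header, which shows up as `Unnamed: 0` when the file is read back. A sketch of two ways to avoid that, using an in-memory buffer and a toy frame instead of the real file:

```python
import io
import pandas as pd

df = pd.DataFrame({"District": ["A", "B"], "Venue": ["a1", "b1"]}, index=[3, 7])

# Default behavior: the index is written as an unnamed first column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
naive = pd.read_csv(buf)                  # index comes back as 'Unnamed: 0'

# Option 1: tell read_csv that the first column is the index.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Option 2: do not write the index at all.
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
no_index = pd.read_csv(buf2)
```

Which option fits depends on whether the original row labels are still needed after the filtering steps.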