## Introduction/Business Problem

Tokyo is the most populous city in Japan, and many people there are eager to try different kinds of restaurants. The goal of this project is to find the restaurants that people like most. We aim to provide useful information from our investigation of multiple neighborhoods and to help select the most appropriate areas for promoting the restaurant industry.

## Data

The data sources we need are listed below:
1. Wikipedia: Special wards of Tokyo
2. Foursquare API: information on restaurants in the neighborhoods of Tokyo
3. Geopy: geographic coordinates of locations in Tokyo

## Methodology

First, we use the Special wards of Tokyo article on Wikipedia to create a dataframe that includes the names, major districts, and areas of Tokyo's wards.

In [47]:
import numpy as np 
import pandas as pd 
from geopy.geocoders import Nominatim 

In [23]:
df=pd.read_html('https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards')[3]
df.head()

Unnamed: 0,No.,Flag,Name,Kanji,Population(as of October 2016,Density(/km2),Area(km2),Major districts
0,1,,Chiyoda,千代田区,59441,5100,11.66,"Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,..."
1,2,,Chūō,中央区,147620,14460,10.21,"Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb..."
2,3,,Minato,港区,248071,12180,20.37,"Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong..."
3,4,,Shinjuku,新宿区,339211,18620,18.22,"Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich..."
4,5,,Bunkyō,文京区,223389,19790,11.29,"Hongō, Yayoi, Hakusan"


Next, we clean the data: we drop the columns we do not need, rename the remaining ones, and remove the summary row.

In [24]:
df=df.drop(columns=['Flag', 'Major districts'] )
df.head()

Unnamed: 0,No.,Name,Kanji,Population(as of October 2016,Density(/km2),Area(km2)
0,1,Chiyoda,千代田区,59441,5100,11.66
1,2,Chūō,中央区,147620,14460,10.21
2,3,Minato,港区,248071,12180,20.37
3,4,Shinjuku,新宿区,339211,18620,18.22
4,5,Bunkyō,文京区,223389,19790,11.29


In [44]:
df=df.rename(columns={df.columns[3]:'Population', 'Density(/km2)':'Density', 'Area(km2)':'Area'})
df=df.drop([23])
df

Unnamed: 0,No.,Name,Kanji,Population,Density,Area
0,1,Chiyoda,千代田区,59441,5100,11.66
1,2,Chūō,中央区,147620,14460,10.21
2,3,Minato,港区,248071,12180,20.37
3,4,Shinjuku,新宿区,339211,18620,18.22
4,5,Bunkyō,文京区,223389,19790,11.29
5,6,Taitō,台東区,200486,19830,10.11
6,7,Sumida,墨田区,260358,18910,13.77
7,8,Kōtō,江東区,502579,12510,40.16
8,9,Shinagawa,品川区,392492,17180,22.84
9,10,Meguro,目黒区,280283,19110,14.67


In [45]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



Now we use geopy to get geospatial data.

In [49]:
geolocator = Nominatim(user_agent="Tokyo_explorer")

df['Major_Dist_Coord']= df['Kanji'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df[['Latitude', 'Longitude']] = df['Major_Dist_Coord'].apply(pd.Series)
df=df.drop(['Major_Dist_Coord'], axis=1)
df

Unnamed: 0,No.,Name,Kanji,Population,Density,Area,Latitude,Longitude
0,1,Chiyoda,千代田区,59441,5100,11.66,35.69381,139.753216
1,2,Chūō,中央区,147620,14460,10.21,35.666255,139.775565
2,3,Minato,港区,248071,12180,20.37,35.643227,139.740055
3,4,Shinjuku,新宿区,339211,18620,18.22,35.693763,139.703632
4,5,Bunkyō,文京区,223389,19790,11.29,35.71881,139.744732
5,6,Taitō,台東区,200486,19830,10.11,35.71745,139.790859
6,7,Sumida,墨田区,260358,18910,13.77,35.700429,139.805017
7,8,Kōtō,江東区,502579,12510,40.16,35.649154,139.81279
8,9,Shinagawa,品川区,392492,17180,22.84,35.599252,139.73891
9,10,Meguro,目黒区,280283,19110,14.67,35.62125,139.688014
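
The geocoding cell above assumes every lookup succeeds. geopy's `geocode` returns `None` when nothing matches, and in that case `x.latitude` raises an `AttributeError`. The following is a minimal sketch of a defensive wrapper. The names `coords_or_none` and `fake_geocode` are hypothetical, and the fake geocoder only stands in for `geolocator.geocode` so that the example runs offline.

```python
from collections import namedtuple

# Stand-in for the object returned by geolocator.geocode (assumption: only
# the .latitude and .longitude attributes are needed here).
Location = namedtuple("Location", ["latitude", "longitude"])


def coords_or_none(geocode, query):
    """Return (latitude, longitude) for query, or (None, None) if no match."""
    location = geocode(query)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


def fake_geocode(query):
    """Hypothetical offline geocoder that knows a single ward."""
    known = {"千代田区": Location(35.6938097, 139.7532163)}
    return known.get(query)


print(coords_or_none(fake_geocode, "千代田区"))
print(coords_or_none(fake_geocode, "unknown ward"))
```

With the real geocoder, the same helper could be applied as `df['Kanji'].apply(lambda k: coords_or_none(geolocator.geocode, k))`, leaving missing coordinates visible instead of stopping the notebook.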


Next, we use the Python folium library to visualize Tokyo and its 23 special wards.

In [50]:
import json
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium if it is not already available
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [51]:
address = 'Tokyo'
geolocator = Nominatim(user_agent="Tokyo_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tokyo are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Tokyo are 35.6828387, 139.7594549.


In [52]:
# create map of Tokyo using latitude and longitude values
map_tokyo = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tokyo) 
    
map_tokyo

Now we use the Foursquare API to explore the neighborhoods.

In [53]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


We use Chiyoda, the first ward in the dataframe, as an example.

In [56]:
neighborhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Name'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Chiyoda are 35.6938097, 139.7532163.


In [57]:
LIMIT = 50
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
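
String formatting works for this request, but the query string can also be built with the standard library, which escapes each value. The sketch below uses the same parameter names as the URL above; `build_explore_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with the same parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search centre
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; no request is sent in this example.
url = build_explore_url("ID_PLACEHOLDER", "SECRET_PLACEHOLDER", "20180604",
                        35.6938097, 139.7532163, 500, 50)
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

An alternative with the same effect is to pass the dictionary directly, as in `requests.get(EXPLORE_ENDPOINT, params=params)`.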

In [58]:
results = requests.get(url).json()

In [59]:
def get_category_type(row):
    # the category list is stored under 'categories' or 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [60]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Jimbocho Kurosu (神保町 黒須),Ramen Restaurant,35.695539,139.754851
1,Kanda Tendonya (神田天丼家),Tempura Restaurant,35.695765,139.754682
2,Kitanomaru Park (北の丸公園),Park,35.691653,139.751201
3,Nippon Budokan (日本武道館),Stadium,35.693356,139.749865
4,Mori no Butchers (森のブッチャーズ),Gastropub,35.69477,139.75598
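
Each request only covers venues within `radius` metres of the single point geocoded for a ward, so it samples a small part of that ward. As a rough check, the sketch below computes the great-circle (haversine) distance between the Chiyoda point and the first venue in the table above. The function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions made for this illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


chiyoda = (35.6938097, 139.7532163)   # ward coordinates printed earlier
ramen_shop = (35.695539, 139.754851)  # first venue in the table above
distance = haversine_m(*chiyoda, *ramen_shop)
print(round(distance), distance <= 500)
```

The venue falls comfortably inside the 500 m radius, which is consistent with it being returned by the query.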


In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    
    return(nearby_venues)
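
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which raises an `IndexError` for any venue that has an empty category list. The sketch below shows one way to guard against that case. `venue_rows` is a hypothetical helper, and `sample_items` is made-up data that only mimics the fields used above.

```python
def venue_rows(name, lat, lng, items):
    """Flatten API items into row tuples, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Made-up items shaped like the 'items' list of an explore response.
sample_items = [
    {"venue": {"name": "Venue A",
               "location": {"lat": 35.6955, "lng": 139.7548},
               "categories": [{"name": "Ramen Restaurant"}]}},
    {"venue": {"name": "Venue B",
               "location": {"lat": 35.6916, "lng": 139.7512},
               "categories": []}},
]
rows = venue_rows("Chiyoda", 35.6938097, 139.7532163, sample_items)
print(rows[0][-1], rows[1][-1])
```

If rows with a `None` category are kept, a later filter such as `str.contains('Restaurant')` would need `na=False` so that those rows are treated as non-matches.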

In [62]:
Tokyo_venues = getNearbyVenues(names=df['Name'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Chiyoda
Chūō
Minato
Shinjuku
Bunkyō
Taitō
Sumida
Kōtō
Shinagawa
Meguro
Ōta
Setagaya
Shibuya
Nakano
Suginami
Toshima
Kita
Arakawa
Itabashi
Nerima
Adachi
Katsushika
Edogawa


In [65]:
Tokyo_restaurant = Tokyo_venues[Tokyo_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop=True)
Tokyo_restaurant.index = np.arange(1, len(Tokyo_restaurant)+1) # 1-based index over the filtered restaurants

In [67]:
print (Tokyo_restaurant['Venue Category'].value_counts())

Ramen Restaurant             41
Japanese Restaurant          33
Chinese Restaurant           31
Sushi Restaurant             26
Italian Restaurant           16
Soba Restaurant              13
Donburi Restaurant            9
Tonkatsu Restaurant           9
Unagi Restaurant              7
Indian Restaurant             6
Restaurant                    5
Yakitori Restaurant           5
Seafood Restaurant            4
Yoshoku Restaurant            4
Thai Restaurant               3
Japanese Curry Restaurant     3
Korean Restaurant             3
Dumpling Restaurant           3
Tempura Restaurant            3
Szechuan Restaurant           2
Sukiyaki Restaurant           2
Asian Restaurant              2
Nabe Restaurant               2
Kaiseki Restaurant            1
Fast Food Restaurant          1
Kushikatsu Restaurant         1
Hotpot Restaurant             1
Middle Eastern Restaurant     1
Vietnamese Restaurant         1
French Restaurant             1
Kosher Restaurant             1
Spanish 

In [76]:
# create a dataframe of top 10 categories
Tokyo_Top10 = Tokyo_restaurant['Venue Category'].value_counts()[0:10].to_frame(name='frequency')
Tokyo_Top10=Tokyo_Top10.reset_index()
Tokyo_Top10

Tokyo_Top10.rename(index=str, columns={"index": "Venue_Category", "frequency": "Frequency"}, inplace=True)
Tokyo_Top10

Unnamed: 0,Venue_Category,Frequency
0,Ramen Restaurant,41
1,Japanese Restaurant,33
2,Chinese Restaurant,31
3,Sushi Restaurant,26
4,Italian Restaurant,16
5,Soba Restaurant,13
6,Donburi Restaurant,9
7,Tonkatsu Restaurant,9
8,Unagi Restaurant,7
9,Indian Restaurant,6


In [79]:
Tokyo_Venues_restaurant = Tokyo_restaurant.groupby(['Neighborhood'])['Venue Category'].apply(lambda x: x[x.str.contains('Restaurant')].count())

In [80]:
Tokyo_Venues_restaurant

Neighborhood
Adachi         2
Arakawa        9
Bunkyō         3
Chiyoda       20
Chūō          39
Edogawa        2
Itabashi       3
Katsushika     4
Kita          14
Kōtō           3
Meguro         6
Minato        11
Nakano         8
Nerima         1
Setagaya       7
Shibuya       13
Shinagawa      7
Shinjuku      15
Suginami      12
Sumida         8
Taitō         16
Toshima       20
Ōta           28
Name: Venue Category, dtype: int64

In [83]:
Tokyo_Venues_restaurant_df = Tokyo_Venues_restaurant.to_frame().reset_index()
Tokyo_Venues_restaurant_df.columns = ['Neighborhood', 'Number of Restaurant']
Tokyo_Venues_restaurant_df.index = np.arange(1, len(Tokyo_Venues_restaurant_df)+1)
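list_rest_no = Tokyo_Venues_restaurant_df['Number of Restaurant'].to_list() # restaurant counts, used later to size the map markers
list_dist = Tokyo_Venues_restaurant_df['Neighborhood'].to_list() # ward names in the same order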
Tokyo_Venues_restaurant_df

Unnamed: 0,Neighborhood,Number of Restaurant
1,Adachi,2
2,Arakawa,9
3,Bunkyō,3
4,Chiyoda,20
5,Chūō,39
6,Edogawa,2
7,Itabashi,3
8,Katsushika,4
9,Kita,14
10,Kōtō,3


In [97]:
Tokyo_onehot = pd.get_dummies(Tokyo_restaurant[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Tokyo_onehot['Neighborhood'] = Tokyo_restaurant['Neighborhood'] 
Tokyo_onehot.head()
Tokyo_onehot.shape

(251, 43)

In [93]:
onehot_df=Tokyo_onehot.groupby('Neighborhood').mean().reset_index()
onehot_df

Unnamed: 0,Neighborhood,African Restaurant,Asian Restaurant,Brazilian Restaurant,Chinese Restaurant,Donburi Restaurant,Dongbei Restaurant,Dumpling Restaurant,Fast Food Restaurant,French Restaurant,...,Sushi Restaurant,Szechuan Restaurant,Tempura Restaurant,Thai Restaurant,Tonkatsu Restaurant,Udon Restaurant,Unagi Restaurant,Vietnamese Restaurant,Yakitori Restaurant,Yoshoku Restaurant
0,Adachi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arakawa,0.0,0.0,0.0,0.222222,0.111111,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bunkyō,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chiyoda,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,...,0.1,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05
4,Chūō,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,...,0.435897,0.0,0.025641,0.0,0.025641,0.0,0.051282,0.0,0.025641,0.0
5,Edogawa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Itabashi,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Katsushika,0.0,0.0,0.0,0.0,0.5,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Kita,0.0,0.0,0.0,0.071429,0.071429,0.0,0.071429,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Kōtō,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
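
Taking the mean of the one-hot columns per neighborhood is equivalent to dividing each category's count by the neighborhood's total number of restaurants. The pure-Python sketch below makes that explicit. `category_shares` is a hypothetical helper, and the list is illustrative: it is built to match Arakawa's 9 restaurants with 2 Chinese and 1 Donburi, while the remaining six entries are placeholders.

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share of the venues in one neighborhood."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Illustrative list: 2 Chinese + 1 Donburi out of 9; the rest are placeholders.
arakawa = (["Chinese Restaurant"] * 2
           + ["Donburi Restaurant"]
           + ["Other (placeholder)"] * 6)
shares = category_shares(arakawa)
print(round(shares["Chinese Restaurant"], 6), round(shares["Donburi Restaurant"], 6))
```

The two printed shares correspond to the 0.222222 and 0.111111 entries in the Arakawa row above.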


In [102]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [105]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
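
The loop above relies on an exception to fall back to the `th` suffix once the index runs past `'rd'`. That is fine for ranks 1 to 10, but it would produce labels such as `21th` for longer lists. A small hypothetical helper, `ordinal`, sketches a version that also handles those cases:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # covers 11th, 12th and 13th, plus the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], "|", columns[4], "|", columns[10])
```

For the ten ranks used here it yields the same column names as the loop above.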


In [110]:
neighborhoods_venues = pd.DataFrame(columns=columns)
neighborhoods_venues['Neighborhood'] = onehot_df['Neighborhood']

for ind in np.arange(onehot_df.shape[0]):
    neighborhoods_venues.iloc[ind, 1:] = return_most_common_venues(onehot_df.iloc[ind, :], num_top_venues)

neighborhoods_venues.head(23)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adachi,Restaurant,Japanese Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Curry Restaurant,Italian Restaurant,Indian Restaurant
1,Arakawa,Ramen Restaurant,Chinese Restaurant,Indian Restaurant,Donburi Restaurant,Japanese Restaurant,Italian Restaurant,Yoshoku Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant
2,Bunkyō,Chinese Restaurant,Japanese Restaurant,Szechuan Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Curry Restaurant,Italian Restaurant
3,Chiyoda,Ramen Restaurant,Chinese Restaurant,Japanese Curry Restaurant,Sushi Restaurant,Italian Restaurant,Indian Restaurant,Yoshoku Restaurant,Tempura Restaurant,Hotpot Restaurant,Kebab Restaurant
4,Chūō,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Soba Restaurant,Unagi Restaurant,German Restaurant,Tonkatsu Restaurant,Yakitori Restaurant,Tempura Restaurant,Donburi Restaurant
5,Edogawa,Ramen Restaurant,Italian Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Indian Restaurant
6,Itabashi,Chinese Restaurant,Restaurant,Italian Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Restaurant,Japanese Curry Restaurant
7,Katsushika,Donburi Restaurant,Ramen Restaurant,Dumpling Restaurant,Yoshoku Restaurant,Indian Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Restaurant,Japanese Curry Restaurant
8,Kita,Ramen Restaurant,Japanese Restaurant,Italian Restaurant,Chinese Restaurant,Donburi Restaurant,Dumpling Restaurant,Kushikatsu Restaurant,Yoshoku Restaurant,Korean Restaurant,Kebab Restaurant
9,Kōtō,Chinese Restaurant,Restaurant,Indian Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Restaurant,Japanese Curry Restaurant


### Clustering

In [111]:
kclusters = 5

onehot_df_clustering = onehot_df.drop('Neighborhood', axis=1) # keep only the category share columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(onehot_df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 4, 1, 4, 1, 0, 1, 1, 0], dtype=int32)
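
To show what the clustering step does with the share vectors, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic iteration behind k-means. It is not scikit-learn's implementation, which by default chooses its starting centroids differently. The function names, the fixed starting centroids, and the two-column data (ramen share, sushi share) are all assumptions made for this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def lloyd_kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return labels, centroids


# Made-up share vectors: two ramen-heavy wards and two sushi-heavy wards.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = lloyd_kmeans(points, initial_centroids=[points[0], points[2]])
print(labels)
```

Wards with similar category shares end up with the same label, which is the property the map and the results below rely on.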

In [112]:
neighborhoods_venues.insert(0, 'Cluster Labels', kmeans.labels_)

# rename on a copy so the original df keeps its 'Name' column
Tokyo_merged = df.rename(columns={'Name':'Neighborhood'})

Tokyo_merged = Tokyo_merged.join(neighborhoods_venues.set_index('Neighborhood'), on='Neighborhood')

Tokyo_merged.head()

Unnamed: 0,No.,Neighborhood,Kanji,Population,Density,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Chiyoda,千代田区,59441,5100,11.66,35.69381,139.753216,1,Ramen Restaurant,Chinese Restaurant,Japanese Curry Restaurant,Sushi Restaurant,Italian Restaurant,Indian Restaurant,Yoshoku Restaurant,Tempura Restaurant,Hotpot Restaurant,Kebab Restaurant
1,2,Chūō,中央区,147620,14460,10.21,35.666255,139.775565,4,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Soba Restaurant,Unagi Restaurant,German Restaurant,Tonkatsu Restaurant,Yakitori Restaurant,Tempura Restaurant,Donburi Restaurant
2,3,Minato,港区,248071,12180,20.37,35.643227,139.740055,4,Soba Restaurant,Yakitori Restaurant,Indian Restaurant,Kosher Restaurant,Korean Restaurant,Chinese Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Restaurant,French Restaurant
3,4,Shinjuku,新宿区,339211,18620,18.22,35.693763,139.703632,4,Japanese Restaurant,Thai Restaurant,Yakitori Restaurant,Yoshoku Restaurant,Unagi Restaurant,Tonkatsu Restaurant,Chinese Restaurant,Sushi Restaurant,Hotpot Restaurant,Soba Restaurant
4,5,Bunkyō,文京区,223389,19790,11.29,35.71881,139.744732,4,Chinese Restaurant,Japanese Restaurant,Szechuan Restaurant,Yoshoku Restaurant,Hotpot Restaurant,Korean Restaurant,Kebab Restaurant,Kaiseki Restaurant,Japanese Curry Restaurant,Italian Restaurant


In [114]:

# create map
map_restaurants = folium.Map(location=[latitude,longitude], tiles='cartodbpositron', 
                               attr="<a href=https://github.com/python-visualization/folium/>Folium</a>")

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
#rainbow = ['#00ff00', '#ff00ff','#0000ff','#ffa500' ,'#ff0000']
#Districts = ['Nagatacho', 'Nihonbashi', 'Shinjuku', 'Shinagawa', 'Shibuya']

# add markers to the map
for lat, lon, poi, cluster in zip(Tokyo_merged['Latitude'], 
                                  Tokyo_merged['Longitude'], 
                                  Tokyo_merged['Neighborhood'], 
                                  Tokyo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=list_rest_no[list_dist.index(poi)]*0.5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_restaurants)
       
map_restaurants
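
The marker radius in the map cell is half the restaurant count of each ward, looked up with `list_rest_no[list_dist.index(poi)]`. A dictionary lookup expresses the same idea more directly. In the sketch below, the counts are taken from the per-ward table earlier in this notebook, while `marker_radius` and its `minimum` clamp are hypothetical additions so that wards with very few restaurants remain visible.

```python
# Restaurant counts for a few wards, copied from the per-ward table above.
restaurant_counts = {"Chūō": 39, "Ōta": 28, "Chiyoda": 20, "Nerima": 1}


def marker_radius(ward, counts, scale=0.5, minimum=1.0):
    """Scale a ward's restaurant count to a marker radius, never below minimum."""
    return max(minimum, counts.get(ward, 0) * scale)


for ward in ("Chūō", "Nerima"):
    print(ward, marker_radius(ward, restaurant_counts))
```

Setting `minimum=0` reproduces the plain `count * 0.5` scaling used in the map cell.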

## Results

From the analysis above, we draw the following results:
1. Chūō (39) and Ōta (28) are the two wards with the most restaurants in the sampled venues.
2. Most restaurants in Tokyo fall into two of the five clusters.
3. Ramen restaurants are the most common restaurant category in Tokyo (41 venues).


## Discussion

Based on the observations above, most restaurants are in cluster 1 and cluster 4, which suggests that people are most likely to go to restaurants in these areas. However, our analysis has shortcomings, since it considers only restaurant categories, such as ramen restaurants and Chinese restaurants. Other factors, such as restaurant prices and people's salaries, are also important for this kind of analysis.

## Conclusion


In this example, we used geospatial data for Tokyo to cluster neighborhoods by their most common restaurant categories. The results can help people choose restaurants more easily. As this shows, many real-life questions can be approached with data analysis. Besides the frequency of the restaurant categories we chose, however, many other factors should be taken into account to reach a more comprehensive result.