# Segmenting and Clustering Neighborhoods in Tokyo
### Capstone Project - The Battle of Neighborhoods
##### IBM DATA SCIENCE PROFESSIONAL CERTIFICATE - # 9 Applied Data Science Capstone - WEEK 5 | COURSERA
##### *Jiahe Yu*

### **INTRODUCTION**

***Is Kichijoji the Only Place to Live?*** (Japanese: 吉祥寺だけが住みたい街ですか?; Romaji: Kichijoji dake ga Sumitai Machi Desu ka?), a Japanese TV series adapted from a manga by Hirochi Maki, tells how the Shigeta twins from an apartment rental agency in Kichijoji - the most sought-after neighborhood in Tokyo - show their customers some great under-the-radar neighborhoods other than Kichijoji in Tokyo.

The shigeta twins can always meet the needs of the clients because the twins are extremely familiar with the neighborhoods in Tokyo and they stand in the customers' shoes. What if an apartment rental agency without the Shigeta twins wants to make recommendations as good as they did? A machine learning approach is to segement and cluster the neighborhoods in Tokyo based on their features, so that it can help apartment rental agencies - our target audience - to make smart recommendations at high proficiency and at low cost.

This project applies K-means clustering on the neighborhoods in Tokyo and the results will help the apartment rental agencies in Tokyo identify the neighborhoods that match the needs of customers in an efficient and economical way. 

### **DATA**

Tokyo neighborhood data and Foursquare location data will be used together to explore and cluster the neighborhoods in Tokyo.

Tokyo neighborhood data can be accessed from this website: *http://japanzipcodes.blogspot.com/2013/07/the-complete-zip-codes-of-tokyo-japan.html*. This web page lists out zip codes of all neighborhoods in Tokyo. This web page will be scraped and the data will be wrangled, cleaned, and read into a pandas dataframe as my first step.

In [1]:
# import the libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from pandas.compat import StringIO

In [2]:
# scrap data from page
website_url = requests.get('http://japanzipcodes.blogspot.com/2013/07/the-complete-zip-codes-of-tokyo-japan.html').text
soup = BeautifulSoup(website_url, 'html.parser') # can also use lxml in the quotes
text = soup.find('div', class_='post-body entry-content').text

In [3]:
# transform the data to a dataframe
table = pd.read_csv(StringIO(text), sep='\n', header = None)
new = table[0].str.split(" ", n = 19, expand = True)
new2 = new[19].str.split(",", n = 2, expand = True)
new3 = new2[2].str.split(" ", n = 4, expand = True)

In [4]:
# check the created dataframe
print(new.shape)
new.head() # 16 is neightborhood; 18 is borough

(3754, 20)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,The,Zip,code,of,,,,,,,,,,,,,,,Adachiku,", Tokyo , Japan is 120-0000 ."
1,The,Zip,code,of,,,,,,,,,,,,,Adachi,",",Adachiku,", Tokyo , Japan is 120-0015 ."
2,The,Zip,code,of,,,,,,,,,,,,,Aoi(1-3Chome),",",Adachiku,", Tokyo , Japan is 120-0012 ."
3,The,Zip,code,of,,,,,,,,,,,,,Aoi(4-6Chome),",",Adachiku,", Tokyo , Japan is 121-0012 ."
4,The,Zip,code,of,,,,,,,,,,,,,Ayase,",",Adachiku,", Tokyo , Japan is 120-0005 ."


In [5]:
# checked the created dataframe
print(new3.shape)
new3.head() # 3 is zip code

(3754, 5)


Unnamed: 0,0,1,2,3,4
0,,Japan,is,120-0000,.
1,,Japan,is,120-0015,.
2,,Japan,is,120-0012,.
3,,Japan,is,121-0012,.
4,,Japan,is,120-0005,.


In [6]:
# drop the unnecessary columns in the dataframes
cols = [0,1,2,3,4,5,6,7,8,8,9,10,11,12,13,14,15,17,19]
new.drop(new.columns[cols],axis=1,inplace=True)
cols3 = [0,1,2,4]
new3.drop(new3.columns[cols3], axis =1,inplace = True)

In [7]:
# combine the dataframe of neighborhoods and boroughs with the dataframe of zipcode
df = pd.concat([new, new3], axis=1)
# rename columns
df.columns = ['neighborhood', 'borough', 'zipcode']

In [8]:
print(df.shape)
df.head()

(3754, 3)


Unnamed: 0,neighborhood,borough,zipcode
0,,Adachiku,120-0000
1,Adachi,Adachiku,120-0015
2,Aoi(1-3Chome),Adachiku,120-0012
3,Aoi(4-6Chome),Adachiku,121-0012
4,Ayase,Adachiku,120-0005


In [9]:
# remove the rows that the same neighborhood name

# sort by neighborhood name
df2 = df.sort_values("neighborhood") 

# dropping duplicate neighborhoods
df2.drop_duplicates(subset ="neighborhood", inplace = True)

In [10]:
# check the trimmed dataframe
print(df2.shape)
print(df2.head())
# check if any duplicates in zipcode column
print(df2.drop_duplicates(subset ="zipcode").shape) # no duplicated zip codes

(1458, 3)
     neighborhood     borough   zipcode
0                    Adachiku  120-0000
92       Aburadai     Akiruno  197-0827
1          Adachi    Adachiku  120-0015
2829     Agebacho  Shinjukuku  162-0824
1501  Aiharamachi     Machida  194-0211
(1458, 3)


In [11]:
# drop the rows that has NA

# drop the first row which does not contain neighborhood information
df2.drop(df2.index[0], inplace=True)
print(df2.shape)
print(df2.head())

# drop the last row that has "none" values
df2.dropna(axis = 0, inplace = True)
print(df2.shape)
print(df2.head())
print(df2.tail())

(1457, 3)
     neighborhood     borough   zipcode
92       Aburadai     Akiruno  197-0827
1          Adachi    Adachiku  120-0015
2829     Agebacho  Shinjukuku  162-0824
1501  Aiharamachi     Machida  194-0211
1204      Aioicho  Itabashiku  174-0044
(1456, 3)
     neighborhood     borough   zipcode
92       Aburadai     Akiruno  197-0827
1          Adachi    Adachiku  120-0015
2829     Agebacho  Shinjukuku  162-0824
1501  Aiharamachi     Machida  194-0211
1204      Aioicho  Itabashiku  174-0044
     neighborhood          borough   zipcode
2351      Zambori  Musashimurayama  208-0034
3567    Zempukuji       Suginamiku  167-0041
3752    Zoshigaya        Toshimaku  171-0032
1160      Zoshiki    Higashiyamato  207-0032
1549   Zushimachi          Machida  194-0203


In [12]:
# sort the dataset back by borough and neighborhood
df2 = df2.sort_values(by = ["borough","neighborhood"]) 

# reset the index
df2.reset_index(drop=True, inplace = True)

In [13]:
# check the cleaned dataset
print(df2.shape)
df2.head()

(1456, 3)


Unnamed: 0,neighborhood,borough,zipcode
0,Adachi,Adachiku,120-0015
1,Aoi(1-3Chome),Adachiku,120-0012
2,Aoi(4-6Chome),Adachiku,121-0012
3,Ayase,Adachiku,120-0005
4,Chuohoncho(1-2Chome),Adachiku,120-0011


As you can see from the above cleaned dataset, there are two columns: neighborhood and borough. Tokyo is often referred to as a city but it is officially known as "Tokyo-to" - Tokyo Metropolis or the Greater Tokyo Area. It contains 23 special wards, 26 cities, 5 towns, and 8 villages, each of which has a local government. The Tokyo Metropolitan Government administers the whole metropolis including the special wards, cities, towns, and villages. In Japan, a ward/city/town/village as an administrative unit of a metropolis is closely equivalent to a London borough or a New York borough. Therefore in this cleaned dataset, the second column which contains the names of wards/cities/towns/villages in Tokyo was named as "borough" so that it is easier to understand.

Now we have built the dataframe combining postcodes, neighborhoods, and boroughs, I will obtain the latitude and longitude coordinates using the *pgeocode* package for each neighborhood in order to utilize the Foursquare location data.

In [14]:
# install the package for accessing geographic coordinates
!pip install pgeocode
import pgeocode



In [15]:
# tell package that we want geographic coordinates of Japan
nomi = pgeocode.Nominatim('jp')

In [16]:
# test by using the first five neighborhoods 
print(df2.head())
nomi.query_postal_code(["120-0015", "120-0012", "121-0012", "120-0005","120-0011"])

           neighborhood   borough   zipcode
0                Adachi  Adachiku  120-0015
1         Aoi(1-3Chome)  Adachiku  120-0012
2         Aoi(4-6Chome)  Adachiku  121-0012
3                 Ayase  Adachiku  120-0005
4  Chuohoncho(1-2Chome)  Adachiku  120-0011


Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,120-0015,JP,Adachi,Tokyo To,40,Adachi Ku,1865750.0,,,35.7632,139.8076,4
1,120-0012,JP,Aoi(1-3-Chome),Tokyo To,40,Adachi Ku,1865750.0,,,35.7651,139.8129,1
2,121-0012,JP,Aoi(4-6-Chome),Tokyo To,40,Adachi Ku,1865750.0,,,35.7874,139.8195,1
3,120-0005,JP,Ayase,Tokyo To,40,Adachi Ku,1865750.0,,,35.7691,139.8264,4
4,120-0011,JP,Chuohoncho(1.2-Chome),Tokyo To,40,Adachi Ku,1865750.0,,,35.7651,139.8129,1


In [17]:
# convert the zipcode column in the dataframe into a list named as zipcode, 
# so that it can be used to access the geographic coordinates
zipcode = df2['zipcode'].values.tolist()

In [18]:
# access the geographic coordinates based on the zipcode of neighborhoods in Tokyo and save it as a dataframe
df_geo = nomi.query_postal_code(zipcode)

In [19]:
# check the dataframe
print(df_geo.shape) 
df_geo.head(2)

(1456, 12)


Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,120-0015,JP,Adachi,Tokyo To,40,Adachi Ku,1865750.0,,,35.7632,139.8076,4
1,120-0012,JP,Aoi(1-3-Chome),Tokyo To,40,Adachi Ku,1865750.0,,,35.7651,139.8129,1


In [20]:
# check our cleaned dataset
print(df2.shape) # same shape
df2.head(2)

(1456, 3)


Unnamed: 0,neighborhood,borough,zipcode
0,Adachi,Adachiku,120-0015
1,Aoi(1-3Chome),Adachiku,120-0012


In [21]:
# merge df_geo and our cleaned dataset
df3 = pd.merge(df2, df_geo, left_on="zipcode", right_on="postal_code")
# drop the unnecessary columns after merging
df3.drop(['postal_code','country code','place_name','state_name','state_code','county_name','county_code','community_name','community_code','accuracy'], axis=1, inplace = True)

In [22]:
# checking the final dataset: df3
print(df3.shape)
df3.head()

(1456, 5)


Unnamed: 0,neighborhood,borough,zipcode,latitude,longitude
0,Adachi,Adachiku,120-0015,35.7632,139.8076
1,Aoi(1-3Chome),Adachiku,120-0012,35.7651,139.8129
2,Aoi(4-6Chome),Adachiku,121-0012,35.7874,139.8195
3,Ayase,Adachiku,120-0005,35.7691,139.8264
4,Chuohoncho(1-2Chome),Adachiku,120-0011,35.7651,139.8129


Now we have a cleaned dataset containing the geography information of Tokyo neighborhoods that we need in this project. The next step is to access venue information of those neighborhoods by using Foursquare location data.

I already obtained Foursquare credentials by setting up an account on Foursquare Developer API. Having both the coordinates of the neighborhoods and the Foursquare credentials enables me to access Foursquare location data. Foursquare location data offers comprehensive and accurate information about venues of given locations, for examples, restaurants, entertainment, hotels, stores, and others. The massive dataset of location data built by Foursquare also powers third-party apps, including Evernote, Uber, Flickr and Jawbone.

I will explore the neighborhoods and conduct cluster analysis by leveraging the Foursquare location data in combination with Tokyo neighborhood data. Visualizations and recommendations based on results of the cluster analysis will be made to help the apartment rental agencies in Tokyo identify the neighborhoods that match the needs of customers.

How I accessed and utilized Foursquare location data and how I conducted cluster analysis on Tokyo neighborhoods will be further explained in details in the methodology part.

### **METHODOLOGY**

To get started, I will create a map of Tokyo with neighborhoods superimposed on top

In [23]:
# import the libraries that we may need

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [24]:
# get the latitude and longitude values of Tokyo

address = 'Tokyoto, JP'

geolocator = Nominatim(user_agent="tokyo_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Tokyo are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Tokyo are 35.7033139, 139.7604984.


In [25]:
# create map of Toronto using latitude and longitude values
map_tokyo = folium.Map(location=[latitude, longitude], zoom_start=10)

In [26]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df3['latitude'], df3['longitude'], df3['borough'], df3['neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='cadetblue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tokyo)

In [27]:
# display the map of toronto
map_tokyo

You can zoom in and zoom out to check the neighborhoods in Tokyo on the above interactive map.

Then I will explore the first neighoborhood as both an example and a practice for exploring all the neighborhoods later.

In [28]:
# define foursquare credentials and version

CLIENT_ID = 'TTLEAE2OSKHU540HSUGRBWFJOF11SNV5FUVTDUQLYMKF1DZA' # your Foursquare ID
CLIENT_SECRET = '0TCLYFNQ01KYBIXQULKX1M0INDBVUJRXQPIJ0XY4BSSPKJ1N' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TTLEAE2OSKHU540HSUGRBWFJOF11SNV5FUVTDUQLYMKF1DZA
CLIENT_SECRET:0TCLYFNQ01KYBIXQULKX1M0INDBVUJRXQPIJ0XY4BSSPKJ1N


In [29]:
# Get the neighborhood's name.
df3.loc[0, 'neighborhood']

'Adachi'

In [30]:
# Get the neighborhood's latitude and longitude values
n_lat = df3.loc[0, 'latitude'] # neighborhood latitude value
n_lng = df3.loc[0, 'longitude'] # neighborhood longitude value
n_name = df3.loc[0, 'neighborhood'] # neighborhood name
print('Latitude and longitude values of {} are {}, {}.'.format(n_name, 
                                                               n_lat, 
                                                               n_lng))

Latitude and longitude values of Adachi are 35.7632, 139.8076.


Now we want to get the top 100 venues that are in this neighbourhood within a radius of 500 meters

In [31]:
# create the GET request URL
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    n_lat, 
    n_lng, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=TTLEAE2OSKHU540HSUGRBWFJOF11SNV5FUVTDUQLYMKF1DZA&client_secret=0TCLYFNQ01KYBIXQULKX1M0INDBVUJRXQPIJ0XY4BSSPKJ1N&v=20180605&ll=35.7632,139.8076&radius=500&limit=100'

In [32]:
# Send the GET request and examine the resutls
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c7b0a024c1f67636dcfe038'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Adachi',
  'headerFullLocation': 'Adachi, Tokyo',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 20,
  'suggestedBounds': {'ne': {'lat': 35.7677000045, 'lng': 139.81313535199772},
   'sw': {'lat': 35.758699995499995, 'lng': 139.8020646480023}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e192b11e4cd49a7e3f157e4',
       'name': 'カラツケグレ 五反野店',
       'location': {'address': '足立4-37-9',
        'lat': 35.765637991153326,
        'lng': 139.80814389359088,
        'labeledLatLngs': [{'label': 'display',
          'lat': 35.765637991153326,
          '

In [33]:
# borrow the get_category_type function from the Foursquare lab
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [34]:
# clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,カラツケグレ 五反野店,Noodle House,35.765638,139.808144
1,サミットストア 五反野店,Supermarket,35.767233,139.809015
2,7-Eleven (セブンイレブン 足立一丁目店),Convenience Store,35.764449,139.807945
3,サーティワン アイスクリーム 五反野駅前店,Ice Cream Shop,35.766368,139.808839
4,Gotanno Station (TS11) (五反野駅),Train Station,35.766107,139.809383


In [35]:
# number of venues returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

20 venues were returned by Foursquare.


As shown in the table above, we have obtained all the venue information of the first neighborhood in Tokyo. Now we are going to explore all the neighborhoods in Tokyo.

In [36]:
# create a function to repeat the same process to all the neighborhoods in Toronto

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [37]:
# run the above function on each neighborhood and create a new dataframe called tokyo_venues

tokyo_venues = getNearbyVenues(names=df3['neighborhood'],
                                   latitudes=df3['latitude'],
                                   longitudes=df3['longitude'])

Adachi
Aoi(1-3Chome)
Aoi(4-6Chome)
Ayase
Chuohoncho(1-2Chome)
Chuohoncho(3-5Chome)
Hanahata
Higashiayase
Higashihokima
Higashiiko
Higashirokugatsucho
Hinodecho
Hirano
Hitotsuya
Hokima
Hozukacho
Iko
Ikohoncho
Iriya
Iriyamachi
Kaga
Kahei
Kitakaheicho
Kodo
Kohoku
Kojiya
Kojiyahoncho
Kurihara
Minamihanahata
Miyagi
Motoki
Motokihigashimachi
Motokikitamachi
Motokiminamimachi
Motokinishimachi
Mutsuki
Nakagawa
Nishiarai
Nishiaraihoncho
Nishiaraisakaecho
Nishiayase
Nishihokima
Nishiiko
Nishiikocho
Nishikahei
Nishitakenotsuka
Odai
Ogi
Okino
Oyata
Rokucho
Rokugatsu
Sano
Saranuma
Sekibara
Senju
Senjuakebonocho
Senjuasahicho
Senjuazuma
Senjuhashidocho
Senjukawaracho
Senjukotobukicho
Senjumidoricho
Senjumiyamotocho
Senjumotomachi
Senjunakacho
Senjunakaicho
Senjuokawacho
Senjusakuragi
Senjusekiyacho
Senjutatsutacho
Senjuyanagicho
Shikahama
Shimane
Shinden
Shinmei
Shinmeiminami
Takenotsuka
Tatsunuma
Toneri
Tonerikoen
Tonerimachi
Towa
Tsubaki
Umeda
Umejima
Yanagihara
Yanaka
Yazaike
Aburadai
Ajiro
Akiga

In [38]:
# check the size of the resulting dataframe
print(tokyo_venues.shape)
tokyo_venues.head()

(50004, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adachi,35.7632,139.8076,カラツケグレ 五反野店,35.765638,139.808144,Noodle House
1,Adachi,35.7632,139.8076,サミットストア 五反野店,35.767233,139.809015,Supermarket
2,Adachi,35.7632,139.8076,7-Eleven (セブンイレブン 足立一丁目店),35.764449,139.807945,Convenience Store
3,Adachi,35.7632,139.8076,サーティワン アイスクリーム 五反野駅前店,35.766368,139.808839,Ice Cream Shop
4,Adachi,35.7632,139.8076,Gotanno Station (TS11) (五反野駅),35.766107,139.809383,Train Station


In [39]:
# the number of venues that were returned for each neighborhood
tokyo_venues['Neighborhood'] = tokyo_venues['Neighborhood'].astype(str) # import to convert it to str! 
tokyo_venues_count = tokyo_venues.groupby('Neighborhood').count()
print(tokyo_venues_count.shape)
tokyo_venues_count.head()

(1441, 6)


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aburadai,9,9,9,9,9,9
Adachi,20,20,20,20,20,20
Agebacho,100,100,100,100,100,100
Aiharamachi,15,15,15,15,15,15
Aioicho,30,30,30,30,30,30


In [40]:
# number of unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(tokyo_venues['Venue Category'].unique())))

There are 440 uniques categories.


In [41]:
# one hot encoding
tokyo_onehot = pd.get_dummies(tokyo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tokyo_onehot['Neighborhood'] = tokyo_venues['Neighborhood'] 

print(tokyo_onehot.shape)

(50004, 440)


In [42]:
# check where is the neighborhood column in the dataframe
tokyo_onehot.columns.get_loc('Neighborhood')

273

In [43]:
# move neighborhood column to the first column
fixed_columns = [tokyo_onehot.columns[264]] + list(tokyo_onehot.columns[:264]) + list(tokyo_onehot.columns[265:])
tokyo_onehot = tokyo_onehot[fixed_columns]

In [44]:
print(tokyo_onehot.shape)
tokyo_onehot.head()

(50004, 440)


Unnamed: 0,Movie Theater,ATM,Acai House,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,...,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yakitori Restaurant,Yoga Studio,Yoshoku Restaurant,Yunnan Restaurant,Zoo,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [45]:
tokyo_grouped = tokyo_onehot.groupby('Neighborhood').mean().reset_index()

In [46]:
# confirm the size
tokyo_grouped.shape

(1441, 440)

In [47]:
# print each neighborhood along with the top 5 most common venues

num_top_venues = 5

tokyo_grouped['Neighborhood'] = tokyo_grouped['Neighborhood'].apply(str)

for hood in tokyo_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tokyo_grouped[tokyo_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aburadai----
               venue  freq
0  Convenience Store  0.33
1  Food & Drink Shop  0.11
2       Intersection  0.11
3       Concert Hall  0.11
4               Park  0.11


----Adachi----
                venue  freq
0   Convenience Store  0.15
1  Donburi Restaurant  0.10
2         Supermarket  0.05
3      Ice Cream Shop  0.05
4        Dessert Shop  0.05


----Agebacho----
                 venue  freq
0   Italian Restaurant  0.12
1  Japanese Restaurant  0.10
2             Sake Bar  0.06
3    French Restaurant  0.05
4           Steakhouse  0.03


----Aiharamachi----
                       venue  freq
0          Convenience Store  0.20
1                   Pharmacy  0.13
2         Donburi Restaurant  0.13
3                 Steakhouse  0.07
4  Japanese Curry Restaurant  0.07


----Aioicho----
                venue  freq
0   Convenience Store  0.27
1        Intersection  0.13
2         Bus Station  0.07
3            Bus Stop  0.07
4  Chinese Restaurant  0.07


----Aizumicho----
     

Now let's put that into a *pandas* dataframe.

In [48]:
# write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [49]:
# create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tokyo_grouped['Neighborhood']

for ind in np.arange(tokyo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tokyo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aburadai,Convenience Store,Intersection,Park,Indian Restaurant,Concert Hall,Food & Drink Shop,Udon Restaurant,Zoo Exhibit,Farm,Fast Food Restaurant
1,Adachi,Convenience Store,Donburi Restaurant,Restaurant,Bakery,Ice Cream Shop,Noodle House,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Dessert Shop
2,Agebacho,Italian Restaurant,Japanese Restaurant,Sake Bar,French Restaurant,Soba Restaurant,Bar,Yakitori Restaurant,Ramen Restaurant,Coffee Shop,Kaiseki Restaurant
3,Aiharamachi,Convenience Store,Pharmacy,Donburi Restaurant,Intersection,Video Store,Food & Drink Shop,Bus Stop,Park,Auto Garage,Steakhouse
4,Aioicho,Convenience Store,Intersection,Chinese Restaurant,Bus Stop,Bus Station,Auto Garage,Hobby Shop,Liquor Store,Grocery Store,Golf Driving Range


In [50]:
# confirm the size
print(neighborhoods_venues_sorted.shape)
print(tokyo_grouped.shape)

(1441, 11)
(1441, 440)


Now we have the dataframe ready for cluster analysis. I will run K-means clustering analysis to segment the neighborhoods into 10 clusters. 

In [51]:
# set number of clusters
kclusters = 10

tokyo_grouped_clustering = tokyo_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tokyo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 5, 1, 7, 7, 1, 5, 1, 4, 0], dtype=int32)

In [52]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# make the wording consistent: change neighborhood to Neighborhood, otherwise they cannot be successfully merged
tokyo_merged = df3
tokyo_merged.columns = ['Neighborhood', 'Borough', 'Zipcode', 'Latitude', 'Longitude']
tokyo_merged['Neighborhood'] = tokyo_merged['Neighborhood'].apply(str) # import to convert to str before merging!

# merge tokyo_grouped with tokyo_data to add latitude/longitude for each neighborhood
tokyo_merged = tokyo_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop the rows where the cluster labels are NaN so that it can be visuallized later
tokyo_merged = tokyo_merged.dropna(how='any')

# convert the cluster labels from float to integer so that it can be visualized later
tokyo_merged['Cluster Labels'] = tokyo_merged['Cluster Labels'].astype(int)

# check the final dataframe
tokyo_merged.head()

Unnamed: 0,Neighborhood,Borough,Zipcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adachi,Adachiku,120-0015,35.7632,139.8076,5,Convenience Store,Donburi Restaurant,Restaurant,Bakery,Ice Cream Shop,Noodle House,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Dessert Shop
1,Aoi(1-3Chome),Adachiku,120-0012,35.7651,139.8129,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery
2,Aoi(4-6Chome),Adachiku,121-0012,35.7874,139.8195,0,Convenience Store,Bus Stop,Japanese Restaurant,Intersection,Furniture / Home Store,Café,Fast Food Restaurant,Motorcycle Shop,Bakery,Supermarket
3,Ayase,Adachiku,120-0005,35.7691,139.8264,0,Convenience Store,Dessert Shop,Sushi Restaurant,Okonomiyaki Restaurant,Motel,Park,Gym,Baseball Field,BBQ Joint,Video Store
4,Chuohoncho(1-2Chome),Adachiku,120-0011,35.7651,139.8129,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery


So far we have conducted clustering analysis on Tokyo neighborhoods and segemented them into 10 clusters based on their features. Neighborhoods with close features are grouped together. The results can help rental agencies identify and recommend suitable neighborhoods based on clients' needs. The results will be displayed in Results section.

### **RESULTS AND DISCUSSION**

First let's visualize the resulting clusters and check how the ten clusters are distributed on the map.

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tokyo_merged['Latitude'], tokyo_merged['Longitude'], tokyo_merged['Neighborhood'], tokyo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

# display the map
map_clusters

In [54]:
tokyo_merged.head()

Unnamed: 0,Neighborhood,Borough,Zipcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adachi,Adachiku,120-0015,35.7632,139.8076,5,Convenience Store,Donburi Restaurant,Restaurant,Bakery,Ice Cream Shop,Noodle House,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Dessert Shop
1,Aoi(1-3Chome),Adachiku,120-0012,35.7651,139.8129,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery
2,Aoi(4-6Chome),Adachiku,121-0012,35.7874,139.8195,0,Convenience Store,Bus Stop,Japanese Restaurant,Intersection,Furniture / Home Store,Café,Fast Food Restaurant,Motorcycle Shop,Bakery,Supermarket
3,Ayase,Adachiku,120-0005,35.7691,139.8264,0,Convenience Store,Dessert Shop,Sushi Restaurant,Okonomiyaki Restaurant,Motel,Park,Gym,Baseball Field,BBQ Joint,Video Store
4,Chuohoncho(1-2Chome),Adachiku,120-0011,35.7651,139.8129,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery


Now let's examine the 10 clusters of neighborhoods one by one and make recommendations based on the features of each cluster.

This is the first cluster. The neighborhoods in this cluster are a good choice for people who especially like bakery and Donburi restaurants. People who like to live near convenience stores will also like this cluster.

In [55]:
# cluster 1
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     0, tokyo_merged.columns[[0]+ [1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Aoi(4-6Chome),Adachiku,0,Convenience Store,Bus Stop,Japanese Restaurant,Intersection,Furniture / Home Store,Café,Fast Food Restaurant,Motorcycle Shop,Bakery,Supermarket
3,Ayase,Adachiku,0,Convenience Store,Dessert Shop,Sushi Restaurant,Okonomiyaki Restaurant,Motel,Park,Gym,Baseball Field,BBQ Joint,Video Store
5,Chuohoncho(3-5Chome),Adachiku,0,Convenience Store,Bus Stop,Japanese Restaurant,Intersection,Furniture / Home Store,Café,Fast Food Restaurant,Motorcycle Shop,Bakery,Supermarket
16,Iko,Adachiku,0,Convenience Store,Japanese Restaurant,Okonomiyaki Restaurant,Pharmacy,Supermarket,Italian Restaurant,Plaza,Grocery Store,Mobile Phone Shop,Deli / Bodega
26,Kojiyahoncho,Adachiku,0,Convenience Store,Bookstore,Deli / Bodega,Zoo Exhibit,Fish Market,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market
28,Minamihanahata,Adachiku,0,Convenience Store,Japanese Restaurant,Climbing Gym,Hobby Shop,Soba Restaurant,Ramen Restaurant,Supermarket,Discount Store,Fabric Shop,Exhibit
31,Motokihigashimachi,Adachiku,0,Convenience Store,Ramen Restaurant,BBQ Joint,Pharmacy,Grocery Store,Bus Stop,Noodle House,Factory,Falafel Restaurant,Filipino Restaurant
49,Oyata,Adachiku,0,Convenience Store,Restaurant,Supermarket,Food & Drink Shop,Furniture / Home Store,Kids Store,Deli / Bodega,Farm,Field,Fast Food Restaurant
50,Rokucho,Adachiku,0,Convenience Store,Bus Stop,Bakery,Supermarket,Fast Food Restaurant,Café,Train Station,Czech Restaurant,Fish Market,Falafel Restaurant
52,Sano,Adachiku,0,Convenience Store,Restaurant,Japanese Restaurant,Grocery Store,Park,Ramen Restaurant,Wagashi Place,Japanese Curry Restaurant,Supermarket,Field


The neighborhoods in the second cluster will be loved by people who enjoy Japanese food, including Ramen, Sushi, Sake, Okonomiyaki, and others. People who like to live near convenience stores will also like this cluster.

In [56]:
# cluster 2
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     1, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Kodo,Adachiku,1,Supermarket,Donburi Restaurant,Noodle House,BBQ Joint,Grocery Store,Park,Dessert Shop,Discount Store,Factory,Falafel Restaurant
65,Senjunakacho,Adachiku,1,Convenience Store,Park,Japanese Restaurant,Café,Sake Bar,Ramen Restaurant,Platform,Vietnamese Restaurant,Intersection,Cantonese Restaurant
69,Senjusekiyacho,Adachiku,1,Intersection,Train Station,Convenience Store,Shopping Plaza,Event Space,Park,Donburi Restaurant,Soccer Field,Coffee Shop,Athletics & Sports
86,Yanagihara,Adachiku,1,Intersection,Convenience Store,Park,Ramen Restaurant,Japanese Restaurant,Train Station,Indian Restaurant,Spa,Café,Sake Bar
114,Otsu,Akiruno,1,River,Café,Campground,Diner,Food Court,Food & Drink Shop,Exhibit,Fabric Shop,Factory,Food Truck
144,Tsutsujigaoka,Akishima,1,BBQ Joint,Food Court,Udon Restaurant,Electronics Store,Fast Food Restaurant,Toy / Game Store,Sushi Restaurant,Sandwich Place,Pet Store,Donburi Restaurant
161,Mejirodai,Bunkyoku,1,Japanese Restaurant,French Restaurant,Bus Stop,Convenience Store,Café,Chinese Restaurant,Playground,Sushi Restaurant,Supermarket,Garden
163,Nezu,Bunkyoku,1,Café,Japanese Restaurant,Soba Restaurant,Sake Bar,Udon Restaurant,Hotel,Wagashi Place,Miscellaneous Shop,Tree,Fish Market
168,Suido,Bunkyoku,1,Intersection,Café,Japanese Restaurant,Unagi Restaurant,Convenience Store,Sake Bar,Museum,Soba Restaurant,Grocery Store,Dumpling Restaurant
169,Yushima,Bunkyoku,1,Café,Ramen Restaurant,Convenience Store,Bar,Hotel,Wagashi Place,Noodle House,Sake Bar,Art Gallery,Udon Restaurant


The neighborhoods in the third cluster are suitable for people who like cooking by themselves since there are many supermarkets and grocery stores nearby.

In [57]:
# cluster 3
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     2, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Kitakaheicho,Adachiku,2,Bus Stop,Park,Zoo Exhibit,Fish & Chips Shop,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
29,Miyagi,Adachiku,2,Convenience Store,Bus Stop,Intersection,Chinese Restaurant,Grocery Store,Park,Film Studio,Fabric Shop,Factory,Falafel Restaurant
33,Motokiminamimachi,Adachiku,2,Convenience Store,Intersection,Chinese Restaurant,Bus Stop,Film Studio,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm
35,Mutsuki,Adachiku,2,Convenience Store,Park,Bus Stop,Donburi Restaurant,Zoo Exhibit,Film Studio,Fabric Shop,Factory,Falafel Restaurant,Farm
36,Nakagawa,Adachiku,2,Steakhouse,Convenience Store,Bakery,Ramen Restaurant,Park,Bus Stop,Zoo Exhibit,Film Studio,Fabric Shop,Factory
38,Nishiaraihoncho,Adachiku,2,Convenience Store,Bus Stop,Tonkatsu Restaurant,Shopping Mall,Grocery Store,Ramen Restaurant,Furniture / Home Store,Drugstore,Park,Donburi Restaurant
46,Odai,Adachiku,2,Bus Stop,Convenience Store,Boat or Ferry,Chinese Restaurant,Grocery Store,Café,History Museum,Baseball Stadium,Flower Shop,Food
54,Sekibara,Adachiku,2,Convenience Store,Noodle House,Bus Stop,Korean Restaurant,Pharmacy,Discount Store,Steakhouse,Factory,Falafel Restaurant,Film Studio
130,Gochicho,Akishima,2,Bus Station,Convenience Store,Bus Stop,Udon Restaurant,History Museum,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm
139,Ogamicho,Akishima,2,Pharmacy,Convenience Store,Historic Site,Bus Stop,Zoo Exhibit,Film Studio,Exhibit,Fabric Shop,Factory,Falafel Restaurant


The neighborhoods in the fourth cluster are perfect for people who love beach, zoo, farmers market and fish market.

In [58]:
# cluster 4
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     3, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
262,Nihombashi,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
264,Nihombashihakozakicho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
266,Nihombashihisamatsucho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
270,Nihombashikabutocho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
273,Nihombashikoamicho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
274,Nihombashikobunacho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
275,Nihombashikodenmacho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
277,Nihombashinakasu,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
279,Nihombashiodenmacho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
280,Nihombashitomizawacho,Chuoku,3,Beach,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


The fifth cluster is a good fit for these groups of people: 1) people who like and use electronics frequently, 2) coffee lovers since there are many cafes and coffee stores, and 3) people who enjoy different kinds of cuisines such as Italian and French.

In [59]:
# cluster 5
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     4, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Hanahata,Adachiku,4,Park,Bus Stop,Convenience Store,Discount Store,Supermarket,Sushi Restaurant,Shopping Mall,Drugstore,Pharmacy,Italian Restaurant
25,Kojiya,Adachiku,4,Plaza,Park,Dog Run,Clothing Store,Tennis Court,Bus Station,Field,Light Rail Station,Fishing Store,Szechuan Restaurant
62,Senjumidoricho,Adachiku,4,Park,Intersection,Convenience Store,Liquor Store,Soccer Field,Zoo Exhibit,Filipino Restaurant,Exhibit,Fabric Shop,Factory
74,Shinden,Adachiku,4,Convenience Store,Park,Shipping Store,Golf Course,Bus Stop,Zoo Exhibit,Filipino Restaurant,Exhibit,Fabric Shop,Factory
135,Mihoricho,Akishima,4,Convenience Store,Food Truck,Park,Hot Spring,Zoo Exhibit,Film Studio,Exhibit,Fabric Shop,Factory,Falafel Restaurant
243,Nomizu,Chofu,4,Park,Bus Station,Snack Place,Soba Restaurant,Garden,Plaza,Historic Site,Tennis Court,Field,Exhibit
330,Rinkaicho,Edogawaku,4,Bus Station,Convenience Store,Park,Bus Stop,Train Station,Toll Booth,Theme Park Ride / Attraction,Golf Driving Range,Shopping Plaza,Market
331,Seishincho,Edogawaku,4,Park,Bus Station,Convenience Store,Playground,Plaza,Gym / Fitness Center,Harbor / Marina,Fast Food Restaurant,Bus Stop,Track Stadium
347,Koyanagicho,Fuchu,4,Intersection,Convenience Store,Park,Sushi Restaurant,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm
370,Kitadenen,Fussa,4,Grocery Store,Brewery,Intersection,Park,Sushi Restaurant,Rock Club,Convenience Store,Indian Restaurant,Bus Station,Factory


The following groups of people can be recommended with the neighborhoods in the sixth cluster: 1) People who take bus a lot, 2) people who like Korean and Chinese food, 3) people who like parks, museums, or art gallaries.

In [60]:
# cluster 6
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     5, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adachi,Adachiku,5,Convenience Store,Donburi Restaurant,Restaurant,Bakery,Ice Cream Shop,Noodle House,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Dessert Shop
1,Aoi(1-3Chome),Adachiku,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery
4,Chuohoncho(1-2Chome),Adachiku,5,Donburi Restaurant,Convenience Store,Noodle House,Discount Store,Park,Pharmacy,Café,Chinese Restaurant,Train Station,Bakery
7,Higashiayase,Adachiku,5,Convenience Store,Sushi Restaurant,Ramen Restaurant,Music Store,Park,Playground,Steakhouse,Supermarket,Bus Stop,Golf Driving Range
8,Higashihokima,Adachiku,5,Convenience Store,Japanese Restaurant,Drugstore,Supermarket,Food Truck,Discount Store,Shoe Store,Tennis Stadium,Fast Food Restaurant,Café
9,Higashiiko,Adachiku,5,Convenience Store,Pharmacy,Sushi Restaurant,Baseball Field,Pet Store,Tennis Court,Clothing Store,Garden Center,Bookstore,Deli / Bodega
10,Higashirokugatsucho,Adachiku,5,Ramen Restaurant,Convenience Store,Intersection,Bus Stop,Discount Store,Tonkatsu Restaurant,Café,Chinese Restaurant,Dumpling Restaurant,BBQ Joint
11,Hinodecho,Adachiku,5,Convenience Store,Coffee Shop,Soba Restaurant,Ramen Restaurant,Restaurant,Supermarket,Bakery,BBQ Joint,Café,Spa
15,Hozukacho,Adachiku,5,Convenience Store,Ramen Restaurant,Dumpling Restaurant,Korean Restaurant,Motorcycle Shop,Auto Garage,Donburi Restaurant,Asian Restaurant,Café,Discount Store
19,Iriyamachi,Adachiku,5,Convenience Store,Ramen Restaurant,Restaurant,Sushi Restaurant,Udon Restaurant,Discount Store,Pool,Coffee Shop,Arcade,Toll Booth


Cluster 7 is for people who want to live near everything and wish to live close to public transportation.

In [61]:
# cluster 7
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     6, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
379,Asahicho,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
381,Bessho,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
384,Hachimancho,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
388,Higashinakano,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
416,Matsugaya,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
431,Nakacho,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
447,Otsuka,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
460,Tairamachi,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
465,Tamachi,Hachioji,6,Pharmacy,Shoe Store,Shipping Store,Chinese Restaurant,Kids Store,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1017,Honcho,Ome,6,Chinese Restaurant,Zoo Exhibit,Entertainment Service,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market


Sports lovers will like the neighborhoods in the eighth cluster since there are many parks, gold courts, and gyms in those neighborhoods.

In [62]:
# cluster 8
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     7, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Hirano,Adachiku,7,Convenience Store,Intersection,Bus Stop,Motorcycle Shop,Ramen Restaurant,Japanese Restaurant,Furniture / Home Store,Fishing Store,Electronics Store,Donburi Restaurant
13,Hitotsuya,Adachiku,7,Convenience Store,Intersection,Ramen Restaurant,Arcade,Thrift / Vintage Store,Supermarket,Bus Stop,Donburi Restaurant,Auto Garage,Soup Place
14,Hokima,Adachiku,7,Convenience Store,Intersection,Park,Men's Store,Sushi Restaurant,Bus Stop,Food Truck,Supermarket,Dessert Shop,BBQ Joint
17,Ikohoncho,Adachiku,7,Convenience Store,Intersection,BBQ Joint,Bus Stop,Bus Station,Baseball Field,Park,Pharmacy,Camera Store,Udon Restaurant
18,Iriya,Adachiku,7,Convenience Store,Ramen Restaurant,BBQ Joint,Intersection,Park,Pharmacy,Ice Cream Shop,Golf Driving Range,Café,Farmers Market
24,Kohoku,Adachiku,7,Convenience Store,Intersection,Restaurant,Soba Restaurant,Liquor Store,Motorcycle Shop,Sushi Restaurant,Supermarket,Coffee Shop,Donburi Restaurant
34,Motokinishimachi,Adachiku,7,Intersection,Spa,Convenience Store,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market
40,Nishiayase,Adachiku,7,Convenience Store,Restaurant,Dessert Shop,Grocery Store,Donburi Restaurant,Park,Historic Site,Pharmacy,Intersection,Floating Market
41,Nishihokima,Adachiku,7,Convenience Store,Intersection,Plaza,Grocery Store,Donburi Restaurant,Pet Store,Pool,Japanese Restaurant,Sushi Restaurant,Pastry Shop
43,Nishiikocho,Adachiku,7,Intersection,Donburi Restaurant,Campground,Chinese Restaurant,Clothing Store,Bus Station,Convenience Store,Food & Drink Shop,Food,Fabric Shop


People who have kids can be recommended with neighborhoods in the ninth cluster. There are many kids stores, playgrounds, supermarkets in those neighborhoods.

In [63]:
# cluster 9
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     8, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
101,Ina,Akiruno,8,Golf Course,Zoo Exhibit,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field
124,Uenodai,Akiruno,8,Golf Course,Zoo Exhibit,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field
126,Yamada,Akiruno,8,Golf Course,Zoo Exhibit,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field
127,Yokosawa,Akiruno,8,Golf Course,Zoo Exhibit,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field


Neighborhoods in the last cluster are perfect for golf players, and people who enjoy nature and suburban life.

In [64]:
# cluster 10
tokyo_merged.loc[tokyo_merged['Cluster Labels'] == 
                     9, tokyo_merged.columns[[0]+[1] + list(range(5, tokyo_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Motoki,Adachiku,9,Convenience Store,Noodle House,Ramen Restaurant,Zoo Exhibit,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm
452,Sandamachi,Hachioji,9,Convenience Store,Ramen Restaurant,Zoo Exhibit,Fish Market,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market
493,Hanenishi,Hamura,9,Convenience Store,Noodle House,Restaurant,Tonkatsu Restaurant,Steakhouse,Supermarket,Intersection,African Restaurant,Ethiopian Restaurant,Exhibit
542,Hino,Hino,9,Convenience Store,Supermarket,Noodle House,History Museum,Film Studio,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant
544,Hinohonmachi,Hino,9,Convenience Store,Supermarket,Noodle House,History Museum,Film Studio,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant
547,Kamida,Hino,9,Convenience Store,Supermarket,Noodle House,History Museum,Film Studio,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant
548,Kawabehorinouchi,Hino,9,Convenience Store,Supermarket,Noodle House,History Museum,Film Studio,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant
552,Miya,Hino,9,Convenience Store,Supermarket,Noodle House,History Museum,Film Studio,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant
797,Hirobakama,Machida,9,Convenience Store,Noodle House,Ramen Restaurant,Café,Park,Zoo Exhibit,Film Studio,Fabric Shop,Factory,Falafel Restaurant
798,Hirohakamamachi,Machida,9,Convenience Store,Noodle House,Ramen Restaurant,Café,Park,Zoo Exhibit,Film Studio,Fabric Shop,Factory,Falafel Restaurant


### **CONCLUSION**

This projects segments and clusters the neighborhoods in Tokyo into 10 clusters based on their features utilizing Tokyo neighborhood data and Foursquare location data which provides us with the venue information of the neighborhoods in Tokyo. The results can help apartment rental agencies make recommendations of neighborhoods that match the needs of customers.