# Capstone Final Project: Battle of the Neighborhoods, New York City vs Toronto

## Introduction

When trying to find the best places to live, it's always a good idea to compare cities and, if possible, neighborhoods. After all, when you buy a car, a house, or any other big-ticket item, you usually try out a few models or visit a few homes before you decide. The same tactic applies to finding the best place to live.

There are many ways to assess different factors when comparing two cities. Some of them are listed below:
1. **Overall Comparison**: This compares the same factors for each city to give a general overview of the two. Popular factors include population, cost of living, average rent, crime rate, tax rates, and air quality.

2. **Crime Rates**: This compares the crime rates of the two cities and measures both against national statistics.

3. **Cost of Living and Salary Comparison**: This compares salaries and cost of living between cities. Factors include statistics on food, housing, utilities, transportation, and more. It is a useful way to find out whether your salary will measure up in the new city.

4. **Compare Schools**: This helps find the best school in an area by comparing different places. It mostly considers test scores, student-teacher ratios, and teacher experience for the schools listed in the city of your choice.

5. **Neighborhood Comparison**: This compares neighborhoods and helps you choose the best place to live within a given city. Neighborhood comparison sites let you see some interesting facts about the various communities.

# Problem

New York City and Toronto are both among the world's major cities and are diverse in many ways. In this project, we want to explore how similar or dissimilar they are from a tourist's point of view, in terms of food, accommodation, and other aspects.

To do this, we will use the Foursquare API to explore the two cities by neighborhood. The data includes venues such as restaurants, coffee shops, and theaters for each neighborhood. To keep the analysis focused, we will select one borough from each city: Manhattan from New York City and Downtown Toronto from the city of Toronto. We will then apply a clustering algorithm to segment the neighborhoods into groups with similar characteristics.


# Methodology

In both cases, we will arrange the dataset to suit our requirements by applying data-processing steps such as eliminating "NA" values, combining neighborhoods that have the same geographical coordinates, and filtering by the targeted borough. For data verification and further exploration, we use geopy's Nominatim geocoder to get the coordinates of Downtown Toronto and Manhattan, and the Foursquare API to explore their neighborhoods. The neighborhoods are then characterized by their venues and venue categories.
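
The preprocessing steps described above can be sketched on a small toy table. This is only an illustration under assumed data: the rows and the names `toy`, `cleaned`, `combined`, and `target` are made up, while the column names follow the Toronto table used later in the notebook.

```python
import pandas as pd

# Toy rows (made up for illustration); columns mirror the Toronto postcode table.
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M5A', 'M5A', 'M6A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto',
                'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Harbourfront',
                      'Regent Park', 'Lawrence Heights'],
})

# 1. Eliminate rows whose borough is not assigned.
cleaned = toy[toy['Borough'] != 'Not assigned']

# 2. Combine neighbourhoods that share a postcode (and therefore coordinates).
combined = (cleaned.groupby(['Postcode', 'Borough'])['Neighbourhood']
                   .apply(', '.join)
                   .reset_index())

# 3. Keep only the targeted borough.
target = combined[combined['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
print(target)
```

Whether to combine neighbourhoods into one row or keep them as separate rows is a design choice; the sketch shows the combined variant named in the text.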

# Data Preprocessing for Downtown Toronto

## Import necessary libraries

In [27]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot
import seaborn as sns

import json
%pip install geopy
from geopy.geocoders import Nominatim
import requests
from urllib.request import urlopen
from pandas.io.json import json_normalize 


import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium 

from bs4 import BeautifulSoup
import ssl
import csv

print('Libraries imported.')


Note: you may need to restart the kernel to use updated packages.
Libraries imported.


In [7]:
conda install -c anaconda beautifulsoup4


In [29]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
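# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(source, 'html.parser')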

In [146]:
table= soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')
data = []
for row in table_rows:
    data.append([t.text.strip() for t in row.find_all('td')])

Neighbourhood = pd.DataFrame(data, columns=['Postcode', 'Borough', 'Neighbourhood'])
df = Neighbourhood[~Neighbourhood['Postcode'].isnull()]
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned
10,M8A,Not assigned,Not assigned


In [239]:
# keep only rows with an assigned borough, then renumber the index
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [148]:
df1= pd.read_csv('https://cocl.us/Geospatial_data')

In [154]:
df1.rename(columns={'Postal Code':'Postcode'},inplace=True)

In [155]:
frames=[df,df1]
frames=pd.concat(frames, axis=1, sort=False)

In [157]:
# join each neighbourhood to its coordinates on the shared 'Postcode' column
merge_columns = pd.merge(df, df1, on='Postcode')
merge_columns.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
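
The default inner join silently drops any postcode that has no match in the coordinates table. A minimal sketch of how that could be checked, using toy tables with made-up values (`hoods`, `coords`, and the postcode `M9Z` are hypothetical):

```python
import pandas as pd

# Toy stand-ins for the neighbourhood table and the coordinates table (made-up values).
hoods = pd.DataFrame({'Postcode': ['M3A', 'M5A', 'M5A', 'M9Z'],
                      'Neighbourhood': ['Parkwoods', 'Harbourfront',
                                        'Regent Park', 'Nowhere']})
coords = pd.DataFrame({'Postcode': ['M3A', 'M5A'],
                       'Latitude': [43.75, 43.65],
                       'Longitude': [-79.33, -79.36]})

# A left join keeps every neighbourhood; `indicator` records where each row came from,
# and `validate` raises if a postcode appears more than once in the coordinates table.
checked = pd.merge(hoods, coords, on='Postcode', how='left',
                   indicator=True, validate='many_to_one')

# Postcodes that found no coordinates.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that the inner join used in the notebook loses no rows.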


In [289]:
# keep only the rows for the Downtown Toronto borough
downtown_toronto_data = merge_columns[merge_columns['Borough'] == 'Downtown Toronto'].reset_index(drop=True)

In [290]:
# eliminate 'Postcode' column
downtown_toronto_data=downtown_toronto_data.drop(['Postcode'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,Downtown Toronto,Regent Park,43.65426,-79.360636
2,Downtown Toronto,Ryerson,43.657162,-79.378937
3,Downtown Toronto,Garden District,43.657162,-79.378937
4,Downtown Toronto,St. James Town,43.651494,-79.375418


# Data Preprocessing for Manhattan NYC

In [80]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [81]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [87]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
df2 = pd.DataFrame(columns=column_names)

In [88]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df2 = pd.DataFrame(rows, columns=column_names)

In [89]:
df2.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
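
The flattening of each GeoJSON feature into a table row can also be written as a small helper. The sketch below uses a toy structure with made-up values that has the same nesting as the `features` list; `feature_to_row` and `toy_features` are hypothetical names.

```python
# A toy GeoJSON-like list (values are made up) with the same nesting as
# the `features` list read from newyork_data.json.
toy_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'coordinates': [-73.85, 40.89]}},
    {'properties': {'borough': 'Manhattan', 'name': 'Chinatown'},
     'geometry': {'coordinates': [-73.99, 40.72]}},
]

def feature_to_row(feature):
    """Flatten one feature into a dict; GeoJSON stores coordinates as [lon, lat]."""
    lon, lat = feature['geometry']['coordinates'][:2]
    return {'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon}

rows = [feature_to_row(f) for f in toy_features]
print(rows[1])
```

A list of such dicts can be passed to `pd.DataFrame(rows)` to build the table in one step.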


In [90]:
manhattan_data = df2[df2['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


## Foursquare API

In [92]:
CLIENT_ID = 'YOUR_CLIENT_ID'          # Foursquare client ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # Foursquare client secret (redacted)

In [93]:
VERSION = '20180604'  # Foursquare API version
limit = 20            # maximum number of venues returned per request
print('Credentials loaded.')

Credentials loaded.


In [94]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

# Nominatim expects an identifying user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto", "latitude", latitude_downtown_toronto, "& longitude", longitude_downtown_toronto)

Downtown Toronto latitude 43.655115 & longitude -79.380219


In [95]:
# get the geographical coordinates of Manhattan, NYC
address = 'Manhattan, NY'

# Nominatim expects an identifying user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


# Visualization before clustering

## Downtown Toronto

In [159]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto

In [162]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a marker cluster object to group the neighbourhood markers
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

## Manhattan NYC

In [164]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [165]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_manhattan)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_manhattan

# Toronto Venues Analysis

In [171]:
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)

        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
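
The response handling inside `getNearbyVenues` can be isolated into a small pure function, which makes it testable without calling the API. This is a sketch that assumes the same response shape the function above indexes into (`response`, then `groups[0]`, then `items`); `extract_venues` and the toy payload are hypothetical and not part of any Foursquare client.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict.

    Missing groups or categories are tolerated instead of raising.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Toy payload with made-up values, shaped like the structure indexed in getNearbyVenues.
toy_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unknown B', 'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

venues = extract_venues(toy_payload)
print(venues)
```

Returning `None` for a venue without categories is one possible choice; dropping such venues would be another.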

In [173]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighbourhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Harbourfront
Regent Park
Ryerson
Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide
King
Richmond
Harbourfront East
Toronto Islands
Union Station
Design Exchange
Toronto Dominion Centre
Commerce Court
Victoria Hotel
Harbord
University of Toronto
Chinatown
Grange Park
Kensington Market
CN Tower
Bathurst Quay
Island airport
Harbourfront West
King and Spadina
Railway Lands
South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown
St. James Town
First Canadian Place
Underground city
Church and Wellesley


In [240]:
# Let's preview the first rows of the resulting dataframe
downtown_toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [175]:
downtown_toronto_venues.shape

(686, 7)

In [180]:
downtown_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adelaide,20,20,20,20,20,20
Bathurst Quay,15,15,15,15,15,15
Berczy Park,20,20,20,20,20,20
CN Tower,15,15,15,15,15,15
Cabbagetown,20,20,20,20,20,20
Central Bay Street,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Christie,16,16,16,16,16,16
Church and Wellesley,20,20,20,20,20,20
Commerce Court,20,20,20,20,20,20


In [181]:
print('There are {} unique categories in Downtown Toronto.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 114 unique categories in Downtown Toronto.


In [182]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

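# move neighborhood column to the first column
# (this takes the last column, so it assumes 'Neighborhood' was appended at the end;
# if a venue category is itself named 'Neighborhood', the assignment above overwrites
# that column in place and a different column is moved instead)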
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Wine Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,...,Taco Place,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
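
Averaging the one-hot rows per neighborhood, as the next cell does with `groupby('Neighborhood').mean()`, gives the share of each venue category in that neighborhood. A minimal sketch on toy data (the neighborhoods `A` and `B` and their venues are made up):

```python
import pandas as pd

# Toy venue list: four venues in neighborhood A, two in neighborhood B.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Bar', 'Park', 'Park'],
})

# One column per category, with 1 marking the category of each venue.
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of a 0/1 column is the fraction of venues in that category.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with different venue counts remain comparable.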


In [233]:
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                  venue  freq
0      Asian Restaurant  0.10
1            Steakhouse  0.10
2                 Plaza  0.05
3                   Bar  0.05
4  Gym / Fitness Center  0.05


----Bathurst Quay----
                venue  freq
0     Airport Service  0.20
1    Airport Terminal  0.13
2             Airport  0.07
3  Airport Food Court  0.07
4        Airport Gate  0.07


----Berczy Park----
               venue  freq
0     Farmers Market  0.10
1  French Restaurant  0.05
2             Bakery  0.05
3        Coffee Shop  0.05
4       Cocktail Bar  0.05


----CN Tower----
                venue  freq
0     Airport Service  0.20
1    Airport Terminal  0.13
2             Airport  0.07
3  Airport Food Court  0.07
4        Airport Gate  0.07


----Cabbagetown----
           venue  freq
0     Restaurant  0.10
1           Café  0.10
2      Gastropub  0.05
3        Butcher  0.05
4  Deli / Bodega  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  

In [237]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Asian Restaurant,Steakhouse,Seafood Restaurant,Plaza,Bar,Pizza Place,Coffee Shop,Hotel,Speakeasy,Lounge
1,Bathurst Quay,Airport Service,Airport Terminal,Boutique,Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina
2,Berczy Park,Farmers Market,Park,Seafood Restaurant,Fountain,Vegetarian / Vegan Restaurant,Concert Hall,Jazz Club,Coffee Shop,Liquor Store,Cocktail Bar
3,CN Tower,Airport Service,Airport Terminal,Boutique,Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina
4,Cabbagetown,Café,Restaurant,Butcher,Bakery,Caribbean Restaurant,Park,Pet Store,Pub,Jewelry Store,Japanese Restaurant
5,Central Bay Street,Coffee Shop,Bubble Tea Shop,Italian Restaurant,Seafood Restaurant,Japanese Restaurant,Ramen Restaurant,Spa,Sushi Restaurant,Park,Tea Room
6,Chinatown,Café,Vietnamese Restaurant,Caribbean Restaurant,Mexican Restaurant,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Cheese Shop,Cocktail Bar
7,Christie,Grocery Store,Café,Park,Convenience Store,Diner,Athletics & Sports,Italian Restaurant,Baby Store,Nightclub,Coffee Shop
8,Church and Wellesley,Mexican Restaurant,Pub,Restaurant,Coffee Shop,Salon / Barbershop,Japanese Restaurant,Bookstore,Pizza Place,Breakfast Spot,Hobby Shop
9,Commerce Court,Café,Gastropub,Deli / Bodega,Coffee Shop,Gym,Gym / Fitness Center,American Restaurant,Pub,Museum,Bakery
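
The column names above rely on a `try`/`except` around the `indicators` list to pick the suffix. As an alternative sketch, a small helper can compute the ordinal suffix directly; `ordinal` and `num_top` are hypothetical names introduced for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top = 10  # same value as num_top_venues in the cell above
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top + 1)]
print(columns[:4])
```

Unlike the three-element list, this also handles values above 20 (for example `21st`), although that is not needed for ten columns.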


## Clustering Toronto

In [301]:

# drop the label column so only the numeric frequency features are clustered
downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=7, random_state=0).fit(downtown_toronto_grouped_clustering)
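
The notebook fixes `n_clusters=7` without showing how that value was chosen. One common heuristic, which is not necessarily what was used here, is to compare the within-cluster sum of squares (inertia) over a range of k and look for an "elbow". The sketch below uses a made-up toy matrix `X` in place of the real frequency table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency-like vectors (made up): three visibly separate groups of two rows each.
X = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
              [0.0, 0.9, 0.1], [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]])

# Within-cluster sum of squares (inertia) for several candidate values of k.
inertia = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertia[k] = model.inertia_

# Cluster labels for the value of k where the inertia stops dropping sharply.
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)
print(inertia)
print(labels)
```

On real data the elbow is often less clear than in this toy case, so the result is a guide rather than a rule.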


In [302]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# .copy() keeps neighborhoods_venues_sorted unchanged when the label column is added
downtown_toronto_merged = neighborhoods_venues_sorted.copy()
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_
downtown_toronto_merged.head(40)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Adelaide,Asian Restaurant,Steakhouse,Seafood Restaurant,Plaza,Bar,Pizza Place,Coffee Shop,Hotel,Speakeasy,Lounge,0
1,Bathurst Quay,Airport Service,Airport Terminal,Boutique,Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
2,Berczy Park,Farmers Market,Park,Seafood Restaurant,Fountain,Vegetarian / Vegan Restaurant,Concert Hall,Jazz Club,Coffee Shop,Liquor Store,Cocktail Bar,0
3,CN Tower,Airport Service,Airport Terminal,Boutique,Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
4,Cabbagetown,Café,Restaurant,Butcher,Bakery,Caribbean Restaurant,Park,Pet Store,Pub,Jewelry Store,Japanese Restaurant,6
5,Central Bay Street,Coffee Shop,Bubble Tea Shop,Italian Restaurant,Seafood Restaurant,Japanese Restaurant,Ramen Restaurant,Spa,Sushi Restaurant,Park,Tea Room,5
6,Chinatown,Café,Vietnamese Restaurant,Caribbean Restaurant,Mexican Restaurant,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Cheese Shop,Cocktail Bar,6
7,Christie,Grocery Store,Café,Park,Convenience Store,Diner,Athletics & Sports,Italian Restaurant,Baby Store,Nightclub,Coffee Shop,6
8,Church and Wellesley,Mexican Restaurant,Pub,Restaurant,Coffee Shop,Salon / Barbershop,Japanese Restaurant,Bookstore,Pizza Place,Breakfast Spot,Hobby Shop,6
9,Commerce Court,Café,Gastropub,Deli / Bodega,Coffee Shop,Gym,Gym / Fitness Center,American Restaurant,Pub,Museum,Bakery,2
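
The cluster headings that follow are named after the venues that dominate each cluster. A minimal sketch of deriving such a name automatically, assuming the name is taken from the most frequent "1st Most Common Venue" in each cluster; the `rows` pairs are made up and do not reproduce the table above.

```python
from collections import Counter

# Toy (cluster label, 1st most common venue) pairs; values are made up.
rows = [(0, 'Asian Restaurant'), (0, 'Café'), (0, 'Asian Restaurant'),
        (1, 'Airport Service'), (1, 'Airport Service'),
        (2, 'Coffee Shop'), (2, 'Café'), (2, 'Coffee Shop')]

# Count how often each venue type is the top venue within each cluster.
by_cluster = {}
for label, venue in rows:
    by_cluster.setdefault(label, Counter())[venue] += 1

# Name each cluster after its most frequent top venue.
names = {label: counts.most_common(1)[0][0] for label, counts in by_cluster.items()}
print(names)
```

A name based on a single venue type hides the rest of the cluster's profile, which is why several headings below list more than one category.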


## Toronto Cluster 1 - Asian Restaurant & Farmers Market & Café

In [303]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Asian Restaurant,Bar,Pizza Place,Coffee Shop,Hotel,Speakeasy,Lounge,0
2,Farmers Market,Vegetarian / Vegan Restaurant,Concert Hall,Jazz Club,Coffee Shop,Liquor Store,Cocktail Bar,0
12,Café,Burrito Place,Burger Joint,Clothing Store,Pizza Place,Plaza,Coffee Shop,0
20,Asian Restaurant,Bar,Pizza Place,Coffee Shop,Hotel,Speakeasy,Lounge,0
24,Asian Restaurant,Bar,Pizza Place,Coffee Shop,Hotel,Speakeasy,Lounge,0
26,Café,Burrito Place,Burger Joint,Clothing Store,Pizza Place,Plaza,Coffee Shop,0
29,Cocktail Bar,Fountain,Food Truck,Vegetarian / Vegan Restaurant,Farmers Market,Concert Hall,Jazz Club,0


## Toronto Cluster 2 - Airport Service

In [304]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
1,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
3,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
17,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
18,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
21,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
22,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1
27,Airport Service,Airport,Airport Food Court,Airport Gate,Airport Lounge,Coffee Shop,Harbor / Marina,1


## Toronto Cluster 3 - Coffee Shop & Café

In [306]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
9,Café,Gym,Gym / Fitness Center,American Restaurant,Pub,Museum,Bakery,2
10,Coffee Shop,Gym,Gym / Fitness Center,Hotel,Japanese Restaurant,Gastropub,Pub,2
11,Coffee Shop,Seafood Restaurant,Bar,Gym / Fitness Center,Gastropub,American Restaurant,Pub,2
30,Coffee Shop,Gym,Gym / Fitness Center,Hotel,Japanese Restaurant,Gastropub,Pub,2
32,Coffee Shop,Seafood Restaurant,Bar,Gym / Fitness Center,Gastropub,American Restaurant,Pub,2
35,Café,Gym,Gym / Fitness Center,American Restaurant,Pub,Museum,Bakery,2


## Toronto Cluster 4 - Café

In [308]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
16,Café,New American Restaurant,Performing Arts Venue,Bubble Tea Shop,Lake,Salad Place,Skating Rink,3
31,Café,New American Restaurant,Performing Arts Venue,Bubble Tea Shop,Lake,Salad Place,Skating Rink,3
33,Café,New American Restaurant,Performing Arts Venue,Bubble Tea Shop,Lake,Salad Place,Skating Rink,3


## Toronto Cluster 5 - Park

In [310]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
25,Park,Convenience Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,4


## Toronto Cluster 6 - Coffee Shop

In [312]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 5, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
5,Coffee Shop,Japanese Restaurant,Ramen Restaurant,Spa,Sushi Restaurant,Park,Tea Room,5
15,Coffee Shop,Park,Pub,Restaurant,Chocolate Shop,Performing Arts Venue,Farmers Market,5
23,Coffee Shop,Park,Pub,Restaurant,Chocolate Shop,Performing Arts Venue,Farmers Market,5


## Toronto Cluster 7 - Café & Bookstore & Gastropub

In [314]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 6, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
4,Café,Caribbean Restaurant,Park,Pet Store,Pub,Jewelry Store,Japanese Restaurant,6
6,Café,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Cheese Shop,Cocktail Bar,6
7,Grocery Store,Diner,Athletics & Sports,Italian Restaurant,Baby Store,Nightclub,Coffee Shop,6
8,Mexican Restaurant,Salon / Barbershop,Japanese Restaurant,Bookstore,Pizza Place,Breakfast Spot,Hobby Shop,6
13,Café,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Cheese Shop,Cocktail Bar,6
14,Bookstore,Sushi Restaurant,Beer Bar,Sandwich Place,Chinese Restaurant,Italian Restaurant,Comfort Food Restaurant,6
19,Café,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Cheese Shop,Cocktail Bar,6
28,Gastropub,Coffee Shop,Italian Restaurant,Hotel,Caribbean Restaurant,Food Truck,Butcher,6
34,Bookstore,Sushi Restaurant,Beer Bar,Sandwich Place,Chinese Restaurant,Italian Restaurant,Comfort Food Restaurant,6


# Manhattan NYC Venues Analysis

In [317]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [318]:
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [319]:
print('There are {} unique categories in Manhattan NYC.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 209 unique categories in Manhattan NYC.


In [320]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Art Museum,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [321]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
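
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues falling in each category, which is what the `freq` values printed below represent. A minimal sketch on made-up data (the `toy_*` names and the three sample categories are illustrative assumptions, not part of the notebook):

```python
import pandas as pd

# Hypothetical toy data: three venues in neighborhood "A", two in "B"
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Coffee Shop", "Park", "Bar", "Coffee Shop"],
})

# One indicator column per category (dtype=int keeps 0/1 instead of booleans)
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each row of the result sums to 1, so neighborhoods with different venue counts stay comparable when they are clustered.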

In [322]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
           venue  freq
0           Park  0.15
1     Food Court  0.10
2  Memorial Site  0.10
3      BBQ Joint  0.05
4     Smoke Shop  0.05


----Carnegie Hill----
                  venue  freq
0                   Spa  0.10
1    Italian Restaurant  0.10
2  Gym / Fitness Center  0.10
3           Coffee Shop  0.10
4            Bagel Shop  0.05


----Central Harlem----
                 venue  freq
0   African Restaurant  0.10
1  American Restaurant  0.10
2    French Restaurant  0.10
3            Jazz Club  0.05
4                  Bar  0.05


----Chelsea----
                venue  freq
0             Theater  0.10
1  Italian Restaurant  0.10
2           Nightclub  0.10
3        Cupcake Shop  0.05
4           Speakeasy  0.05


----Chinatown----
                venue  freq
0        Noodle House  0.10
1  Chinese Restaurant  0.10
2      Sandwich Place  0.10
3         Pizza Place  0.05
4              Bakery  0.05


----Civic Center----
                   venue  freq
0    

In [323]:
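def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in a row."""
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]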

In [324]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Food Court,Memorial Site,Gym,Smoke Shop,Shopping Mall,Burrito Place,Boat or Ferry,Sandwich Place,Coffee Shop
1,Carnegie Hill,Coffee Shop,Spa,Gym / Fitness Center,Italian Restaurant,Community Center,Shoe Store,Café,Gourmet Shop,Bagel Shop,Gym
2,Central Harlem,French Restaurant,African Restaurant,American Restaurant,Ethiopian Restaurant,Music Venue,Beer Bar,Bar,Juice Bar,Gym / Fitness Center,Bagel Shop
3,Chelsea,Theater,Italian Restaurant,Nightclub,Indian Restaurant,Ice Cream Shop,French Restaurant,Cupcake Shop,Coffee Shop,Chinese Restaurant,Café
4,Chinatown,Sandwich Place,Chinese Restaurant,Noodle House,Pizza Place,Bakery,Spa,English Restaurant,Museum,Bike Shop,Garden Center
5,Civic Center,Spa,Falafel Restaurant,Yoga Studio,Gym / Fitness Center,French Restaurant,General Entertainment,Sushi Restaurant,Molecular Gastronomy Restaurant,Coffee Shop,Bakery
6,Clinton,Theater,Gym / Fitness Center,Pie Shop,Pizza Place,Comedy Club,Movie Theater,Café,Building,French Restaurant,Sporting Goods Shop
7,East Harlem,Mexican Restaurant,Spanish Restaurant,Thai Restaurant,Pet Store,Pharmacy,Park,Dance Studio,Coffee Shop,Clothing Store,Sandwich Place
8,East Village,Pizza Place,Scandinavian Restaurant,Beer Store,Korean Restaurant,Dog Run,Coffee Shop,Bar,Cheese Shop,Bagel Shop,Japanese Restaurant
9,Financial District,Gym / Fitness Center,Jewelry Store,Coffee Shop,Salad Place,Steakhouse,Doctor's Office,Monument / Landmark,Café,Gym,New American Restaurant


## Clustering Manhattan NYC

In [327]:
# keep only the numeric frequency columns for clustering
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=5, random_state=0).fit(manhattan_grouped_clustering)

In [328]:
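# work on a copy so manhattan_data itself is not modified
manhattan_merged = manhattan_data.copy()
# note: kmeans.labels_ follows the row order of manhattan_grouped and is assigned here by position
manhattan_merged['Cluster Labels'] = kmeans.labels_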
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,4,Discount Store,Coffee Shop,Yoga Studio,Tennis Stadium,Ice Cream Shop,Gym,Donut Shop,Diner,Department Store,Pharmacy
1,Manhattan,Chinatown,40.715618,-73.994279,0,Sandwich Place,Chinese Restaurant,Noodle House,Pizza Place,Bakery,Spa,English Restaurant,Museum,Bike Shop,Garden Center
2,Manhattan,Washington Heights,40.851903,-73.9369,2,Wine Shop,Park,Café,Ramen Restaurant,Market,Restaurant,Tapas Restaurant,Liquor Store,Burger Joint,Bakery
3,Manhattan,Inwood,40.867684,-73.92121,1,Café,Park,Wine Bar,Yoga Studio,Bakery,Deli / Bodega,Diner,Farmers Market,Mexican Restaurant,Spanish Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Yoga Studio,Mexican Restaurant,Caribbean Restaurant,Cocktail Bar,Pub,Mediterranean Restaurant,Café,Bar,Food Truck,Bakery
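
The cell above attaches `kmeans.labels_` by position. Those labels come out in the row order of `manhattan_grouped`, which `groupby` sorts alphabetically, so they only line up with `manhattan_data` if it shares that order. A minimal sketch of a name-based alternative, using small stand-in frames (`data`, `grouped`, and `labels` are hypothetical placeholders for `manhattan_data`, `manhattan_grouped`, and `kmeans.labels_`):

```python
import pandas as pd

# Hypothetical stand-ins: `data` keeps the original order,
# `grouped` is sorted alphabetically as groupby would leave it
data = pd.DataFrame({"Neighborhood": ["Marble Hill", "Chinatown", "Inwood"]})
grouped = pd.DataFrame({"Neighborhood": ["Chinatown", "Inwood", "Marble Hill"]})
labels = [0, 1, 2]  # stand-in for kmeans.labels_, one per row of `grouped`

# Attach each label to the neighborhood name it was computed for
labelled = grouped[["Neighborhood"]].assign(**{"Cluster Labels": labels})

# Merge on the name so the row order of the two frames no longer matters
merged = data.merge(labelled, on="Neighborhood", how="left")
print(merged)
```

With this approach each neighborhood receives the label computed from its own venue frequencies, whatever order the two tables happen to be in.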


In [329]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Manhattan Cluster 1 - Commercial/Restaurants

In [331]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Sandwich Place,Chinese Restaurant,Noodle House,Pizza Place,Bakery,Spa,English Restaurant,Museum,Bike Shop,Garden Center
8,Upper East Side,Hotel,Jazz Club,Optical Shop,Coffee Shop,Chocolate Shop,French Restaurant,Burrito Place,Spa,Bookstore,Gym / Fitness Center
11,Roosevelt Island,Sandwich Place,Coffee Shop,Noodle House,Greek Restaurant,Farmers Market,Bus Line,School,Liquor Store,Kosher Restaurant,Residential Building (Apartment / Condo)
22,Little Italy,Café,Wine Bar,Sandwich Place,Ice Cream Shop,Spanish Restaurant,Coffee Shop,Salon / Barbershop,Salad Place,Newsstand,Clothing Store


## Manhattan Cluster 2 - Residential/Restaurants

In [332]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Inwood,Café,Park,Wine Bar,Yoga Studio,Bakery,Deli / Bodega,Diner,Farmers Market,Mexican Restaurant,Spanish Restaurant
4,Hamilton Heights,Yoga Studio,Mexican Restaurant,Caribbean Restaurant,Cocktail Bar,Pub,Mediterranean Restaurant,Café,Bar,Food Truck,Bakery
19,East Village,Pizza Place,Scandinavian Restaurant,Beer Store,Korean Restaurant,Dog Run,Coffee Shop,Bar,Cheese Shop,Bagel Shop,Japanese Restaurant
23,Soho,Women's Store,Men's Store,Salon / Barbershop,Clothing Store,Arts & Crafts Store,Boutique,Miscellaneous Shop,Cycle Studio,Tea Room,Optical Shop
24,West Village,Cosmetics Shop,Italian Restaurant,Gourmet Shop,Coffee Shop,Cocktail Bar,Chinese Restaurant,Bakery,Gastropub,Mediterranean Restaurant,Candy Store
26,Morningside Heights,American Restaurant,Bookstore,Food Truck,Park,Farmers Market,Coffee Shop,Salad Place,Greek Restaurant,Café,Pub
27,Gramercy,Pizza Place,Coffee Shop,Beer Bar,Ice Cream Shop,Italian Restaurant,Gourmet Shop,Liquor Store,Mexican Restaurant,Food Truck,Playground
29,Financial District,Gym / Fitness Center,Jewelry Store,Coffee Shop,Salad Place,Steakhouse,Doctor's Office,Monument / Landmark,Café,Gym,New American Restaurant
35,Turtle Bay,Sushi Restaurant,Karaoke Bar,Martial Arts Dojo,Greek Restaurant,Café,Museum,Lounge,Duty-free Shop,Residential Building (Apartment / Condo),Ramen Restaurant
38,Flatiron,Japanese Restaurant,Gym / Fitness Center,Cycle Studio,Furniture / Home Store,Cheese Shop,Café,Donut Shop,Miscellaneous Shop,Thai Restaurant,Sports Club


## Manhattan Cluster 3 - Commercial/Tourism/City Center

In [334]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Wine Shop,Park,Café,Ramen Restaurant,Market,Restaurant,Tapas Restaurant,Liquor Store,Burger Joint,Bakery
7,East Harlem,Mexican Restaurant,Spanish Restaurant,Thai Restaurant,Pet Store,Pharmacy,Park,Dance Studio,Coffee Shop,Clothing Store,Sandwich Place
12,Upper West Side,Italian Restaurant,Peruvian Restaurant,Bakery,Burger Joint,Southern / Soul Food Restaurant,Mediterranean Restaurant,Bookstore,Chinese Restaurant,Movie Theater,Gift Shop
13,Lincoln Square,Indie Movie Theater,Theater,Concert Hall,Performing Arts Venue,Opera House,Fountain,Gift Shop,School,Gym / Fitness Center,Indie Theater
14,Clinton,Theater,Gym / Fitness Center,Pie Shop,Pizza Place,Comedy Club,Movie Theater,Café,Building,French Restaurant,Sporting Goods Shop
15,Midtown,Hotel,Smoke Shop,French Restaurant,Salad Place,Mediterranean Restaurant,Chinese Restaurant,Sporting Goods Shop,Clothing Store,Cuban Restaurant,Coffee Shop
20,Lower East Side,Art Gallery,Japanese Restaurant,Yoga Studio,Juice Bar,Filipino Restaurant,French Restaurant,Pet Café,Café,Mediterranean Restaurant,Chinese Restaurant
21,Tribeca,Park,Wine Shop,Italian Restaurant,Cycle Studio,Coffee Shop,Poke Place,Dog Run,Salad Place,Café,Spa
25,Manhattan Valley,Bar,Deli / Bodega,Yoga Studio,Italian Restaurant,Hawaiian Restaurant,Grocery Store,Korean Restaurant,Furniture / Home Store,Fried Chicken Joint,Mexican Restaurant
31,Noho,Coffee Shop,Cocktail Bar,Rock Club,French Restaurant,Italian Restaurant,Hotel,Gym,Deli / Bodega,Sandwich Place,Southern / Soul Food Restaurant


## Manhattan Cluster 4 - Restaurants

In [336]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Harlem,French Restaurant,African Restaurant,American Restaurant,Ethiopian Restaurant,Music Venue,Beer Bar,Bar,Juice Bar,Gym / Fitness Center,Bagel Shop
17,Chelsea,Theater,Italian Restaurant,Nightclub,Indian Restaurant,Ice Cream Shop,French Restaurant,Cupcake Shop,Coffee Shop,Chinese Restaurant,Café


## Manhattan Cluster 5 - Residential

In [339]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Discount Store,Coffee Shop,Yoga Studio,Tennis Stadium,Ice Cream Shop,Gym,Donut Shop,Diner,Department Store,Pharmacy
5,Manhattanville,Italian Restaurant,Indian Restaurant,Coffee Shop,Café,Seafood Restaurant,Lounge,Bike Trail,Supermarket,Climbing Gym,Beer Garden
9,Yorkville,Wine Shop,Coffee Shop,Italian Restaurant,Hobby Shop,Gym,Beer Store,Sandwich Place,Dog Run,Liquor Store,Bagel Shop
10,Lenox Hill,Gym,Thai Restaurant,Pizza Place,Middle Eastern Restaurant,Salad Place,Liquor Store,Taco Place,Gift Shop,Gourmet Shop,College Academic Building
16,Murray Hill,Hotel,Cocktail Bar,Jazz Club,Sandwich Place,Speakeasy,Museum,Lounge,Restaurant,Chinese Restaurant,Ramen Restaurant
18,Greenwich Village,Café,French Restaurant,Italian Restaurant,Yoga Studio,Bagel Shop,Caribbean Restaurant,Sushi Restaurant,Beer Bar,Gourmet Shop,Snack Place
28,Battery Park City,Park,Food Court,Memorial Site,Gym,Smoke Shop,Shopping Mall,Burrito Place,Boat or Ferry,Sandwich Place,Coffee Shop
30,Carnegie Hill,Coffee Shop,Spa,Gym / Fitness Center,Italian Restaurant,Community Center,Shoe Store,Café,Gourmet Shop,Bagel Shop,Gym
33,Midtown South,Korean Restaurant,Cosmetics Shop,Coffee Shop,Lingerie Store,Food Truck,Boutique,Golf Course,Clothing Store,Grocery Store,Hotel
39,Hudson Yards,American Restaurant,Hotel,Gym / Fitness Center,Music School,Furniture / Home Store,Supermarket,Residential Building (Apartment / Condo),Cocktail Bar,Public Art,Coffee Shop


# Conclusion

Applying k-means clustering to downtown Toronto and Manhattan, NYC shows that the two cities share many features. Both have a large share of commercial venues such as restaurants, cafés, coffee shops, and theaters, as most big cities around the world do.


The main difference is that venues in Manhattan appear more densely located and more diverse than those in downtown Toronto. Because NYC is home to many different ethnic groups, restaurants in Manhattan, for example, offer many options from different cultures. Manhattan also seems to have more tourist spots than downtown Toronto does.
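
One way to put a number on the diversity comparison is to count distinct venue categories in each city and check how dominant the most common category is. The sketch below runs on made-up data; the `*_toy` frames and the `category_summary` helper are hypothetical and would need to be replaced with the venue tables built earlier in the notebook:

```python
import pandas as pd

# Hypothetical stand-ins for the two cities' venue tables
manhattan_toy = pd.DataFrame({"Venue Category": ["Park", "Bar", "Theater", "Bar"]})
toronto_toy = pd.DataFrame({"Venue Category": ["Park", "Bar", "Park"]})

def category_summary(venues):
    """Return (number of distinct categories, share of the most common one)."""
    counts = venues["Venue Category"].value_counts()
    return counts.size, counts.iloc[0] / counts.sum()

summary = pd.DataFrame(
    [category_summary(manhattan_toy), category_summary(toronto_toy)],
    index=["Manhattan", "Downtown Toronto"],
    columns=["Unique categories", "Top category share"],
)
print(summary)
```

More distinct categories together with a lower top-category share would be consistent with a more diverse venue mix, though the comparison is only fair when both cities are sampled with the same venue limit and search radius.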
