# BATTLE OF NEIGHBORHOODS - CAPSTONE PROJECT
## (Report + Notebook)

## PROBLEM AND BACKGROUND

As a resident and an avid traveler in North America, one would want to explore major cities of the two main countries. Toronto and New York are famous for tourism and are very diverse. The goal of this project is to compare the neighborhoods of the two cities and determine how similar or dissimilar they are.

Target audience: While every city is unique in its own way, this exploration will help any tourist compare two (or more) cities and pick the city that suits best for them to visit.

## DATA DESCRIPTION

Source: Data from Foursquare API will be used to perform analysis on Manhattan Borough. Whereas, for Downtown Toronto, a table of Toronto’s Boroughs from Wikipedia page has been extracted. 

Data: The data includes information about restaurants, hotels, parks, art galleries, theatres, museums etc. in different neighborhoods of the cities. 

Boroughs: For this particular project, one Borough from each city has been been selected. Manhattan from New York and Downtown Toronto from Toronto.



## METHODOLOGY

Core: A Machine learning technique called "Clustering" will be used to segment neighborhoods with similar features. These features will be given more weightage based on the foot traffic, helping locate tourist spots and hubs. Using this we can judge the similarity/dissimilarity between the two cities. 

Analysis: The data exploration, visualization and analysis is done for both the boroughs in the same way, but separately.

## PREPROCESSING

In [1]:
#importing necessary libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Visualization
import matplotlib.pyplot
import seaborn as sns
# Too see full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

import os
os.getcwd()
#print(os.listdir("C:\\Users\\monam/input"))

# Any results you write to the current directory are saved as output.

Libraries imported.


'C:\\Users\\monam'

### Toronto Data

In [2]:
# Link To Extract 
path='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# Read File
df_wiki=pd.read_html(path)
#Check the type
type(df_wiki)
# Call the position where the table is stored
neighborhood=df_wiki[0]
# Rename the Columns
neighborhood.rename(columns={0:'Postcode', 1: 'Borough', 2: 'Neighborhood'}, inplace=True)
# Eliminate the first row
neighborhood=neighborhood.drop([0])
# Eliminate "Not assigned", categorical values from "Borough" Column
neighborhood=neighborhood[neighborhood.Borough !='Not assigned']
# Making DataFrame
neighborhood=pd.DataFrame(neighborhood)
# Merging rows with same Postcode
neighborhood.set_index(['Postal Code','Borough'],inplace=True)
merge_result = neighborhood.groupby(level=['Postal Code','Borough'], sort=False).agg( ','.join)
# Setting the index
serial_wise=merge_result.reset_index()
# Assign the 'Borough' column value to 'Neighborhood' where 'Not assigned' occurs
serial_wise.loc[4, 'Neighborhood']='Queen\'s Park'
# Saving the file for future use!
serial_wise.to_excel('wikipedia_table.xls')
# Showing the Data Frame
df=pd.DataFrame(serial_wise)
df.rename(columns={'Postal Code':'Postcode'},inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Neighborhood
0,M3A,North York,Parkwoods,
1,M4A,North York,Victoria Village,
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",
3,M6A,North York,"Lawrence Manor, Lawrence Heights",
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",Queen's Park


In [3]:
# Geographical Coordinates
df1=pd.read_csv("Geospatial_Coordinates.csv")
# Change the Postal Code to Postcode
df1.rename(columns={'Postal Code':'Postcode'},inplace=True)
#Cancatenation
frames=[df,df1]
frames=pd.concat(frames, axis=1, sort=False)
# Merging the two columns on 'Postcode'
merge_columns=pd.merge(df, df1, left_on='Postcode', right_on='Postcode')
# Save the Data Frame
merge_columns.to_csv('neigbors_geographical.csv')
merge_columns.head()


Unnamed: 0,Postcode,Borough,Neighbourhood,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,,43.753259,-79.329656
1,M4A,North York,Victoria Village,,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",,43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",,43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",Queen's Park,43.662301,-79.389494


In [4]:
# Sorting
# set index for only Downtown Toronto
downtown_toronto_data = merge_columns[merge_columns['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
# eliminate 'Postcode' column
downtown_toronto_data=downtown_toronto_data.drop(['Postcode','Neighborhood'], axis=1)
downtown_toronto_data.rename(columns={'Neighbourhood':'Neighborhood'},inplace=True)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


### Now we will move towards New York Boroughs. We select "Manhattan" as a Borough and anaylze its neighborhoods later
Neighborhood has a total of 5 boroughs and 306 neighborhoods. In order to segement the neighborhoods and explore them, we will essentially need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood. 

Luckily, this dataset exists for free on the web. The link to the dataset: [https://geo.nyu.edu/catalog/nyu_2451_34572](https://geo.nyu.edu/catalog/nyu_2451_34572?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)

#I have downloaded and preprocessed the data for another lab. Will be using the pre-processed csv file to further analyse the data.

In [5]:
# Loading data

neighborhoods=pd.read_csv("ny_neighborhoods.csv", index_col=0)
# And make sure that the dataset has all 5 boroughs and 306 neighborhoods.
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

neighborhoods.head()

The dataframe has 5 boroughs and 306 neighborhoods.


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [6]:
# Creating new Dataframe manhattan_data
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


## Foursquare API 

In [7]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'BSY5VRV4STUKHCNQQJEI4IMVF2EKLLEBAQ5IVNWITQXEZXX3' # your Foursquare ID
CLIENT_SECRET = 'PWYH23B25UMOSER2CW0W4FAIFPC2HVQKQAKJ0HUW4GJCAC1M' # your Foursquare Secret
VERSION = '20180604'
limit = 20
print('Your credentails:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentails:
CLIENT_ID:BSY5VRV4STUKHCNQQJEI4IMVF2EKLLEBAQ5IVNWITQXEZXX3
CLIENT_SECRET:PWYH23B25UMOSER2CW0W4FAIFPC2HVQKQAKJ0HUW4GJCAC1M


In [8]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

  after removing the cwd from sys.path.


Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [9]:
# Let's get the geographical coordinates of Manhattan.
address = 'Manhattan, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

  after removing the cwd from sys.path.


The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


# VISUALIZATION 

### We visualize the data many times at different stages. In the beginning, we visualize the selected borough neighborhoods so that we can get an idea or confirmation regarding the coordinates of that Borough. The second time after clustered the neighborhoods, we visualize the clusters to name them. Assigning the names are very important because it can identify the areas or specific places in each cluster.

## (Before Clustering)

## Downtown Toronto

In [10]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto

In [11]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a mark cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

## Manhattan

In [12]:
# let's visualizat Manhattan the neighborhoods in it.
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [13]:
# create map of Manhattan using latitude and longitude values
from folium import plugins
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_manhattan)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_manhattan

# ANALYSIS

### We analyze boroughs neighborhoods through one hot encoding (giving ‘1’ if a venue category is there, and ‘0’ in case of venue category is not there). On the basis of one hot encoding, we calculate mean of the frequency of occurrence of each category and picked top ten venues on that basis for each neighborhood. It means the top venues are showing the foot traffic or the more visited places.

## Exploring Neighborhoods in Downtown Toronto

In [14]:
# Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [15]:
# Write the code to run the above function on each neighborhood and create a new dataframe called toronto_venues.
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [16]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(356, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [17]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,20,20,20,20,20,20
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,20,20,20,20,20,20
Christie,16,16,16,16,16,16
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"First Canadian Place, Underground city",20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
"Harbourfront East, Union Station, Toronto Islands",20,20,20,20,20,20
"Kensington Market, Chinatown, Grange Park",20,20,20,20,20,20


In [18]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 128 uniques categories.


## Analyzing Each Neighborhood

In [19]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bank,Bar,Basketball Stadium,Beer Bar,Belgian Restaurant,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Distribution Center,Farmers Market,Fish Market,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hobby Shop,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Liquor Store,Lounge,Martial Arts School,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [20]:
# Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [21]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0  Seafood Restaurant  0.10
1            Beer Bar  0.10
2      Farmers Market  0.10
3        Liquor Store  0.05
4              Museum  0.05


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0       Airport Lounge  0.12
1      Airport Service  0.12
2      Harbor / Marina  0.06
3  Rental Car Location  0.06
4     Sculpture Garden  0.06


----Central Bay Street----
                        venue  freq
0                 Coffee Shop  0.20
1          Italian Restaurant  0.10
2  Modern European Restaurant  0.05
3            Ramen Restaurant  0.05
4                        Café  0.05


----Christie----
                venue  freq
0       Grocery Store  0.25
1                Café  0.19
2                Park  0.12
3          Baby Store  0.06
4  Italian Restaurant  0.06


----Church and Wellesley----
            venue  freq
0             Pub  0.05
1  Breakfa

In [22]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Beer Bar,Farmers Market,Concert Hall,Park,Jazz Club,Basketball Stadium,Restaurant,Coffee Shop,Museum
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Harbor / Marina,Bar,Plane,Rental Car Location,Boutique,Boat or Ferry,Sculpture Garden,Coffee Shop
2,Central Bay Street,Coffee Shop,Italian Restaurant,Gastropub,Bubble Tea Shop,Ramen Restaurant,Spa,Poke Place,Sandwich Place,Sushi Restaurant,Art Museum
3,Christie,Grocery Store,Café,Park,Coffee Shop,Baby Store,Athletics & Sports,Restaurant,Italian Restaurant,Nightclub,Candy Store
4,Church and Wellesley,Pizza Place,Juice Bar,Park,Coffee Shop,Mexican Restaurant,Men's Store,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub
5,"Commerce Court, Victoria Hotel",Café,Gastropub,Restaurant,Coffee Shop,Museum,Bakery,Gym,Gym / Fitness Center,Hotel,American Restaurant
6,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,Bakery,Gym,Gym / Fitness Center,Seafood Restaurant,Gastropub,Pizza Place,Gluten-free Restaurant
7,"Garden District, Ryerson",Café,Clothing Store,Pizza Place,Comic Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Coffee Shop,Ramen Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Park,Plaza,IT Services,Sporting Goods Shop,Lake,Bubble Tea Shop,New American Restaurant,Skating Rink,Japanese Restaurant,Deli / Bodega
9,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Wine Bar,Bakery,Fish Market,Farmers Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop


## Clustering Neighborhoods

In [24]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 4, 0, 1, 1, 1, 2, 1])

In [25]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
downtown_toronto_merged = downtown_toronto_data

# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Breakfast Spot,Bakery,Park,Pub,Restaurant,Performing Arts Venue,Spa,Dessert Shop,Distribution Center
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Yoga Studio,Distribution Center,Portuguese Restaurant,Mexican Restaurant,Beer Bar,Creperie,Bank,Park,Smoothie Shop
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Café,Clothing Store,Pizza Place,Comic Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Coffee Shop,Ramen Restaurant
3,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Gastropub,Café,Coffee Shop,Japanese Restaurant,Food Truck,Restaurant,Middle Eastern Restaurant,BBQ Joint,Italian Restaurant,Cosmetics Shop
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Seafood Restaurant,Beer Bar,Farmers Market,Concert Hall,Park,Jazz Club,Basketball Stadium,Restaurant,Coffee Shop,Museum


In [26]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

## Cluster 1 (Airport Lounge, Coffee Shop, Cafe, Restaurants & Grocery Store)

In [27]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government",Coffee Shop,Yoga Studio,Distribution Center,Portuguese Restaurant,Mexican Restaurant,Beer Bar,Creperie,Bank,Park,Smoothie Shop
2,"Garden District, Ryerson",Café,Clothing Store,Pizza Place,Comic Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Coffee Shop,Ramen Restaurant
4,Berczy Park,Seafood Restaurant,Beer Bar,Farmers Market,Concert Hall,Park,Jazz Club,Basketball Stadium,Restaurant,Coffee Shop,Museum
10,"Commerce Court, Victoria Hotel",Café,Gastropub,Restaurant,Coffee Shop,Museum,Bakery,Gym,Gym / Fitness Center,Hotel,American Restaurant
11,"University of Toronto, Harbord",Café,Bakery,Japanese Restaurant,Bookstore,French Restaurant,Italian Restaurant,Dessert Shop,Comfort Food Restaurant,College Gym,Restaurant


## Cluster 2 (Gastropubs)

In [28]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Bay Street,Coffee Shop,Italian Restaurant,Gastropub,Bubble Tea Shop,Ramen Restaurant,Spa,Poke Place,Sandwich Place,Sushi Restaurant,Art Museum
6,Christie,Grocery Store,Café,Park,Coffee Shop,Baby Store,Athletics & Sports,Restaurant,Italian Restaurant,Nightclub,Candy Store
7,"Richmond, Adelaide, King",Café,Pizza Place,Hotel,Coffee Shop,Plaza,Lounge,Colombian Restaurant,Restaurant,Concert Hall,Smoke Shop
9,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Restaurant,Café,American Restaurant,Beer Bar,Pub,Bakery,Japanese Restaurant,Hotel,Steakhouse
12,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Wine Bar,Bakery,Fish Market,Farmers Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop
14,Rosedale,Park,Playground,Trail,College Rec Center,Cosmetics Shop,Concert Hall,Comic Shop,Comfort Food Restaurant,Colombian Restaurant,Coffee Shop
15,Stn A PO Boxes,Farmers Market,Beer Bar,Cocktail Bar,Hotel,Restaurant,Comfort Food Restaurant,Park,Jazz Club,Museum,Bakery
17,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,Bakery,Gym,Gym / Fitness Center,Seafood Restaurant,Gastropub,Pizza Place,Gluten-free Restaurant
18,Church and Wellesley,Pizza Place,Juice Bar,Park,Coffee Shop,Mexican Restaurant,Men's Store,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub


## Cluster 3 (Cafes)

In [29]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Breakfast Spot,Bakery,Park,Pub,Restaurant,Performing Arts Venue,Spa,Dessert Shop,Distribution Center
8,"Harbourfront East, Union Station, Toronto Islands",Park,Plaza,IT Services,Sporting Goods Shop,Lake,Bubble Tea Shop,New American Restaurant,Skating Rink,Japanese Restaurant,Deli / Bodega
16,"St. James Town, Cabbagetown",Café,Restaurant,Caribbean Restaurant,Bakery,Gastropub,Diner,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store


## Cluster 4 (Coffee Shop, Cafe, Park & Japanese Restaurant)

In [30]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Harbor / Marina,Bar,Plane,Rental Car Location,Boutique,Boat or Ferry,Sculpture Garden,Coffee Shop


## Cluster 5 (Seafood, steakhouse, Hotel & Cafe)

In [31]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Café,Coffee Shop,Japanese Restaurant,Food Truck,Restaurant,Middle Eastern Restaurant,BBQ Joint,Italian Restaurant,Cosmetics Shop


## Exploring Neighborhoods in Manhattan

In [32]:
# Let's create a function to repeat the same process to all the neighborhoods in Manhattan
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [33]:
# Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [34]:
# Let's check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [35]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 210 uniques categories.


## Analyzing the Neighborhoods

In [36]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auditorium,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Duty-free Shop,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health Food Store,Heliport,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Leather Goods Store,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Noodle House,Opera House,Outdoor Sculpture,Outdoors & Recreation,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [37]:
# Set Index
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()

In [38]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    

----Battery Park City----
           venue  freq
0  Memorial Site  0.15
1           Park  0.15
2     Food Court  0.10
3    Coffee Shop  0.05
4       Building  0.05


----Carnegie Hill----
                  venue  freq
0    Italian Restaurant  0.10
1           Coffee Shop  0.10
2                   Gym  0.10
3  Gym / Fitness Center  0.10
4            Shoe Store  0.05


----Central Harlem----
                  venue  freq
0     French Restaurant  0.10
1    African Restaurant  0.10
2   American Restaurant  0.10
3          Cycle Studio  0.05
4  Ethiopian Restaurant  0.05


----Chelsea----
                venue  freq
0             Theater  0.10
1  Chinese Restaurant  0.05
2    Tapas Restaurant  0.05
3         Fish Market  0.05
4         Coffee Shop  0.05


----Chinatown----
                venue  freq
0  Chinese Restaurant  0.15
1      Sandwich Place  0.10
2                 Spa  0.10
3        Dessert Shop  0.05
4              Bakery  0.05


----Civic Center----
                  venue  freq


In [39]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Memorial Site,Park,Food Court,Monument / Landmark,Scenic Lookout,Shopping Mall,Sandwich Place,Building,Coffee Shop,Gym
1,Carnegie Hill,Coffee Shop,Gym,Gym / Fitness Center,Italian Restaurant,Shoe Store,Gourmet Shop,French Restaurant,Bagel Shop,Bookstore,Ramen Restaurant
2,Central Harlem,African Restaurant,American Restaurant,French Restaurant,Juice Bar,Boutique,Ethiopian Restaurant,Library,Gym / Fitness Center,Beer Bar,Café
3,Chelsea,Theater,Hotel,Coffee Shop,New American Restaurant,Chinese Restaurant,Café,Fish Market,French Restaurant,Speakeasy,Sushi Restaurant
4,Chinatown,Chinese Restaurant,Sandwich Place,Spa,Hotel,Cocktail Bar,Bike Shop,New American Restaurant,Noodle House,Salon / Barbershop,Bakery
5,Civic Center,Spa,Gym / Fitness Center,Yoga Studio,Gym,Dance Studio,Falafel Restaurant,Burrito Place,Monument / Landmark,Molecular Gastronomy Restaurant,French Restaurant
6,Clinton,Gym / Fitness Center,Theater,Indie Theater,Lounge,French Restaurant,Mediterranean Restaurant,Peruvian Restaurant,Pie Shop,Pizza Place,Comedy Club
7,East Harlem,Mexican Restaurant,Thai Restaurant,Latin American Restaurant,Steakhouse,New American Restaurant,Beer Bar,Donut Shop,Bakery,Sandwich Place,French Restaurant
8,East Village,Vietnamese Restaurant,Pizza Place,Coffee Shop,Dog Run,Beer Store,Scandinavian Restaurant,Beer Bar,Korean Restaurant,Bagel Shop,Dessert Shop
9,Financial District,Coffee Shop,Gym / Fitness Center,Falafel Restaurant,New American Restaurant,Doctor's Office,French Restaurant,Monument / Landmark,Café,Gym,Salad Place


## CLUSTERING NEIGHBORHOODS (MANHATTAN)

### Now we applied Machine Learning Technique “Clustering” to segment the neighborhoods in similar objects cluster. This will help to analyze from Tourist perspective and we can easily extract the Tourist places which are present on one of the clusters.

In [41]:
## Manhattan

In [42]:
# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 4, 1, 1, 1, 4, 3, 0, 0])

In [43]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
manhattan_merged = manhattan_data

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Gym,Discount Store,Coffee Shop,Sandwich Place,Yoga Studio,Tennis Stadium,Donut Shop,Diner,Department Store,Pharmacy
1,Manhattan,Chinatown,40.715618,-73.994279,2,Chinese Restaurant,Sandwich Place,Spa,Hotel,Cocktail Bar,Bike Shop,New American Restaurant,Noodle House,Salon / Barbershop,Bakery
2,Manhattan,Washington Heights,40.851903,-73.9369,4,Café,Park,Wine Shop,Bakery,Breakfast Spot,Market,New American Restaurant,Coffee Shop,Restaurant,Ramen Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,1,Bakery,Wine Bar,Park,Yoga Studio,Pet Store,Deli / Bodega,Diner,Café,Farmers Market,Mexican Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Yoga Studio,Cocktail Bar,Mexican Restaurant,Wine Bar,Bakery,Smoke Shop,Caribbean Restaurant,Mediterranean Restaurant,School,Coffee Shop


In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## EXAMINE CLUSTERS (MANHATTAN)

### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

In [45]:
#Residential
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,Hotel,Italian Restaurant,Hotel Bar,Pet Store,Sandwich Place,Bookstore,Coffee Shop,Burrito Place,Bar,Bakery
9,Yorkville,Wine Shop,Italian Restaurant,Deli / Bodega,Coffee Shop,Liquor Store,Sushi Restaurant,Dog Run,Bagel Shop,Gym,Diner
11,Roosevelt Island,Deli / Bodega,Farmers Market,Liquor Store,Playground,Coffee Shop,Residential Building (Apartment / Condo),Restaurant,Park,Outdoors & Recreation,Sandwich Place
18,Greenwich Village,Italian Restaurant,Café,Sushi Restaurant,Yoga Studio,Coffee Shop,Snack Place,French Restaurant,Caribbean Restaurant,Gourmet Shop,Beer Bar
19,East Village,Vietnamese Restaurant,Pizza Place,Coffee Shop,Dog Run,Beer Store,Scandinavian Restaurant,Beer Bar,Korean Restaurant,Bagel Shop,Dessert Shop
20,Lower East Side,Chinese Restaurant,Art Gallery,Japanese Restaurant,Yoga Studio,Juice Bar,Filipino Restaurant,Café,French Restaurant,Mediterranean Restaurant,Caribbean Restaurant
24,West Village,Italian Restaurant,Cocktail Bar,Gourmet Shop,Coffee Shop,Accessories Store,Austrian Restaurant,Boutique,French Restaurant,Candy Store,Mediterranean Restaurant
36,Tudor City,Park,Yoga Studio,Thai Restaurant,Pizza Place,Deli / Bodega,Salad Place,Café,Seafood Restaurant,Bridge,Boxing Gym


In [46]:
#Commercial Spaces
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Inwood,Bakery,Wine Bar,Park,Yoga Studio,Pet Store,Deli / Bodega,Diner,Café,Farmers Market,Mexican Restaurant
4,Hamilton Heights,Yoga Studio,Cocktail Bar,Mexican Restaurant,Wine Bar,Bakery,Smoke Shop,Caribbean Restaurant,Mediterranean Restaurant,School,Coffee Shop
5,Manhattanville,Coffee Shop,Italian Restaurant,Café,Lounge,Climbing Gym,Gastropub,Dumpling Restaurant,Supermarket,Bar,Juice Bar
23,Soho,Clothing Store,Men's Store,Sporting Goods Shop,Yoga Studio,Gift Shop,Cycle Studio,Dance Studio,Salon / Barbershop,Shoe Store,Miscellaneous Shop
29,Financial District,Coffee Shop,Gym / Fitness Center,Falafel Restaurant,New American Restaurant,Doctor's Office,French Restaurant,Monument / Landmark,Café,Gym,Salad Place
31,Noho,Wine Shop,French Restaurant,Rock Club,Coffee Shop,Boutique,Deli / Bodega,Sandwich Place,Gourmet Shop,Gym,Cocktail Bar
35,Turtle Bay,Karaoke Bar,Tennis Court,Grocery Store,Boxing Gym,Café,Duty-free Shop,Seafood Restaurant,Museum,French Restaurant,Cheese Shop


In [47]:
#Tourist Areas & Hubs
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Sandwich Place,Spa,Hotel,Cocktail Bar,Bike Shop,New American Restaurant,Noodle House,Salon / Barbershop,Bakery
12,Upper West Side,American Restaurant,Italian Restaurant,Bakery,Tiki Bar,Nail Salon,Movie Theater,Juice Bar,Bar,Chinese Restaurant,Bagel Shop
13,Lincoln Square,Theater,Performing Arts Venue,Indie Movie Theater,Concert Hall,Fountain,Opera House,Circus,Library,Gym / Fitness Center,Plaza
21,Tribeca,Park,American Restaurant,Indie Theater,Sushi Restaurant,Italian Restaurant,Men's Store,Cycle Studio,Playground,Poke Place,Coffee Shop
27,Gramercy,Pizza Place,Coffee Shop,Yoga Studio,Beer Bar,Irish Pub,Gourmet Shop,Liquor Store,Mexican Restaurant,Playground,Comedy Club
38,Flatiron,Japanese Restaurant,Furniture / Home Store,Cycle Studio,Coffee Shop,Donut Shop,Salon / Barbershop,Salad Place,Thai Restaurant,Gym,Gym / Fitness Center


In [48]:
#Center Activity
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Gym,Discount Store,Coffee Shop,Sandwich Place,Yoga Studio,Tennis Stadium,Donut Shop,Diner,Department Store,Pharmacy
7,East Harlem,Mexican Restaurant,Thai Restaurant,Latin American Restaurant,Steakhouse,New American Restaurant,Beer Bar,Donut Shop,Bakery,Sandwich Place,French Restaurant
10,Lenox Hill,Thai Restaurant,Cocktail Bar,Restaurant,Smoke Shop,Middle Eastern Restaurant,Dessert Shop,Taco Place,Liquor Store,Salad Place,Gift Shop
15,Midtown,Hotel,Plaza,French Restaurant,Cycle Studio,Park,Clothing Store,Salad Place,Chinese Restaurant,Smoke Shop,Miscellaneous Shop
16,Murray Hill,Burger Joint,Japanese Restaurant,Coffee Shop,Tea Room,Restaurant,Grocery Store,Gym,Jewish Restaurant,Sandwich Place,Museum
22,Little Italy,Sandwich Place,Wine Bar,Ice Cream Shop,Café,Gourmet Shop,Salad Place,Chinese Restaurant,Thai Restaurant,Cocktail Bar,Coffee Shop
25,Manhattan Valley,Bar,Yoga Studio,Korean Restaurant,Cosmetics Shop,Park,Coffee Shop,Ethiopian Restaurant,Mexican Restaurant,Fried Chicken Joint,Bike Shop
26,Morningside Heights,Bookstore,Park,American Restaurant,Greek Restaurant,Salad Place,Mexican Restaurant,Café,Food Truck,Outdoor Sculpture,Farmers Market
28,Battery Park City,Memorial Site,Park,Food Court,Monument / Landmark,Scenic Lookout,Shopping Mall,Sandwich Place,Building,Coffee Shop,Gym
30,Carnegie Hill,Coffee Shop,Gym,Gym / Fitness Center,Italian Restaurant,Shoe Store,Gourmet Shop,French Restaurant,Bagel Shop,Bookstore,Ramen Restaurant


In [49]:
#Cultural & Going out Places
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Park,Wine Shop,Bakery,Breakfast Spot,Market,New American Restaurant,Coffee Shop,Restaurant,Ramen Restaurant
6,Central Harlem,African Restaurant,American Restaurant,French Restaurant,Juice Bar,Boutique,Ethiopian Restaurant,Library,Gym / Fitness Center,Beer Bar,Café
14,Clinton,Gym / Fitness Center,Theater,Indie Theater,Lounge,French Restaurant,Mediterranean Restaurant,Peruvian Restaurant,Pie Shop,Pizza Place,Comedy Club
17,Chelsea,Theater,Hotel,Coffee Shop,New American Restaurant,Chinese Restaurant,Café,Fish Market,French Restaurant,Speakeasy,Sushi Restaurant


## RESULTS

Using the clustering method, the data of the respective neighborhoods/boroughs (as expected) shows that there are multiple venues that are attractive and explored by tourists.
The neighborhoods are much similar in features like Theaters, opera houses, food places, clubs, museums, parks etc. As far as concern to dissimilarity, it differs in terms of some unique places like historical places and monuments.

# Observations & Discussion

When we compare the tourist places, we observe that the historical place is only situated in Downtown Toronto and the Monument or landmark venue is in Manhattan neighborhoods. Similarly, Airport facility, Harbor, Sculpture garden and Boat or ferry services are also available in Downtown Toronto while venues like Nightlife, Climbing gym and Museums are present in Manhattan.

As a tourist, I would consider Downtown Toronto Neighborhoods first to visit, and I recommend other tourists to do so as well. One key reason for this recommendation is: The tourists have an easily travelling access due to Airport facility, which not only saves time but also helps to save money. This saved money can be utilized to explore more, the attracting venues.

# Conclusion

Both downtown Toronto and Manhattan neighborhoods have some dissimilarities, but not to a large extent. These famous cities are indeed similar based on the venues, facilites, venues and attractions.