# Capstone Project - Similar neighbourhoods in New York and Toronto



## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)




## Introduction: Business Problem <a name="introduction"></a>

Manhattan is the smallest and most densely populated borough of New York City. With 72,033 people per square mile in 2015, it is more densely populated than any individual American city (1).

I currently live in New York and work for a large IT consultancy. I have lived in Manhattan for the last 5 years and moved a few times before I found Harlem. I love everything about this neighbourhood, from its restaurants to its supermarkets and gyms; this is home for me.

My consultancy won a big project in Old Town Toronto. I have been lucky to be assigned to this project, which will run for at least 24 months, and I was asked to choose a place to rent in Old Town Toronto. Rent is not a concern for the company, so I want to move to a neighbourhood similar to Harlem.

Both Manhattan and Old Toronto are boroughs of major cities in the US and Canada. They are difficult to compare, not only because their demographics differ but also because each borough divides its neighbourhoods differently.
Old Town Toronto has only 64 neighbourhoods, less than half as many as Manhattan (2)(3). Its population density is 3,169 people per square mile (4), under 5% of Manhattan's!
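The density comparison above is a single division. The snippet below reproduces it from the two figures quoted in this section (72,033 and 3,169 people per square mile); no other inputs are assumed.

```python
# Population densities quoted above, in people per square mile
MANHATTAN_DENSITY = 72_033    # Manhattan, 2015 (reference 1)
OLD_TORONTO_DENSITY = 3_169   # Old Town Toronto (reference 4)

# Old Town Toronto's density as a fraction of Manhattan's
ratio = OLD_TORONTO_DENSITY / MANHATTAN_DENSITY
print(f"Old Town Toronto density is {ratio:.1%} of Manhattan's")
# prints: Old Town Toronto density is 4.4% of Manhattan's
```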


## Data <a name="data"></a>

Comparing neighbourhoods is not an easy task, and comparing neighbourhoods in two different cities is even more complicated. **How can I compare Harlem to the neighbourhoods of Old Town Toronto and find the most similar place to live?**

This problem affects many people around the globe who need to relocate to a new city for a new job. In this case the origin is Harlem, but it could be any other place.



### Approach:

One way of comparing and segmenting neighbourhoods is to use Foursquare data to rank the types of venues in each neighbourhood. I can then cluster the neighbourhoods by their venue profiles and identify the Toronto neighbourhoods that fall in the same segment as Harlem.

1) List of neighbourhoods in Old Town Toronto
* I need the list of all Old Town Toronto neighbourhoods, along with their latitude and longitude, so that I can use Foursquare to obtain the categories and frequency of the surrounding venues.
* My data sources for this exercise are the list of Toronto boroughs and neighbourhoods at https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and the latitude and longitude of each Toronto postal code at https://cocl.us/Geospatial_data. I will scrape the Wikipedia table and load only the Old Town Toronto neighbourhoods into a dataframe.

2) Harlem geolocation
* I need the latitude and longitude of Harlem so that I can use Foursquare to obtain the categories and frequency of the surrounding venues.
* My data source is https://www.gps-latitude-longitude.com/gps-coordinates-of-harlem, which gives a latitude of 40.8115504 and a longitude of -73.9464769.

3) Get venues in each Neighbourhood using Foursquare
* I will call the Foursquare API to obtain the venues within a 750-metre radius of each neighbourhood's coordinates.
* I will clean the data, create a dataframe with the frequency of each venue category per neighbourhood, and list the top 10 categories.

4) Segment the neighbourhoods to find Harlem like neighbourhoods
* I will use K-means clustering to group the neighbourhoods, after choosing a suitable number of clusters.
* Once the neighbourhoods are clustered, I will identify which cluster contains Harlem. The Toronto neighbourhoods in that cluster will be my short-list of candidates.
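Step 4 mentions finding the best number of clusters. The notebook below fixes it at 8. One common heuristic, which the original analysis does not use, is to fit K-means for several values of k and keep the one with the highest silhouette score.

The sketch below does this with scikit-learn's `KMeans` and `silhouette_score`. A few assumptions apply:
* `best_k_by_silhouette` is a hypothetical helper.
* The data is synthetic: three well-separated 2-D blobs rather than the real venue frequencies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Fit K-means for each k and return (best k, {k: silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three tight blobs of 20 points each (not Foursquare data)
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centres])

best_k, scores = best_k_by_silhouette(X, range(2, 7))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On the notebook's data, the equivalent call would pass the category-frequency matrix (`grouped_clustering`) with something like `range(2, 11)`. The silhouette score needs at least 2 clusters and fewer clusters than rows.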



### 1) List of neighbourhoods in Old Town Toronto

In [10]:
import pandas as pd
import numpy as np
import json
import urllib.request
from bs4 import BeautifulSoup
import re

In [24]:
# List of neighbourhoods in Old Town Toronto

# The source lists boroughs and neighbourhoods grouped by postal code,
# so neighbourhoods that share a postal code appear together in one row (e.g. 'Regent Park / Harbourfront').
# The table covers all of Toronto, so the data is later filtered to Old Town Toronto, i.e. the boroughs whose name contains 'Toronto'

# data source - webpage table
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")

# find the HTML table using class wikitable sortable
table = soup.find('table', class_='wikitable sortable')

# go through the table rows (tr) and cells (td); the text of each cell is saved in a temporary list
PostalCode = []
Borough = []
Neighbourhood = []

for row in table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:
        # get_text(strip=True) returns the whole cell text without surrounding whitespace
        PostalCode.append(cells[0].get_text(strip=True))
        Borough.append(cells[1].get_text(strip=True))
        Neighbourhood.append(cells[2].get_text(strip=True))

# save the scraped columns in a dataframe
toronto_df = pd.DataFrame({'PostalCode': PostalCode, 'Borough': Borough, 'Neighbourhood': Neighbourhood})
toronto_df.head()

|   | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned |  |
| 1 | M2A | Not assigned |  |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront |


In [25]:
# strip leading and trailing whitespace (such as newlines) from each text column
toronto_df['PostalCode'] = toronto_df['PostalCode'].str.strip()
toronto_df['Borough'] = toronto_df['Borough'].str.strip()
toronto_df['Neighbourhood'] = toronto_df['Neighbourhood'].str.strip()

In [26]:
# Get list of post codes
latlong_df = pd.read_csv('https://cocl.us/Geospatial_data')

# rename column to merge data
latlong_df.rename(columns={'Postal Code':'PostalCode'}, inplace=True)

latlong_df.head()

|   | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [27]:
toronto_df = pd.merge(toronto_df, latlong_df, how='left', on='PostalCode')
toronto_df.head()

|   | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1A | Not assigned |  |  |  |
| 1 | M2A | Not assigned |  |  |  |
| 2 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 3 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront | 43.65426 | -79.360636 |


In [28]:
# Filter out the postal codes / neighbourhoods that don't belong to Old Town Toronto,
# i.e. keep only the four boroughs whose name contains 'Toronto'
toronto_df = toronto_df.loc[toronto_df['Borough'].isin(['Downtown Toronto','Central Toronto','East Toronto', 'West Toronto'])].sort_values(['Borough', 'Neighbourhood']).reset_index(drop=True)

# remove the PostalCode column as this is not needed anymore
toronto_df.drop(columns=['PostalCode'], inplace=True)
toronto_df.head()

|   | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Central Toronto | Davisville | 43.704324 | -79.38879 |
| 1 | Central Toronto | Davisville North | 43.712751 | -79.390197 |
| 2 | Central Toronto | Forest Hill North & West | 43.696948 | -79.411307 |
| 3 | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |
| 4 | Central Toronto | Moore Park / Summerhill East | 43.689574 | -79.38316 |


### 2) Harlem geolocation

In [29]:
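# append the Harlem location to the Toronto dataframe, with numeric coordinates like the Toronto rows
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
harlem_row = pd.DataFrame([{'Borough': 'Manhattan', 'Neighbourhood': 'Harlem', 'Latitude': 40.8115504, 'Longitude': -73.9464769}])
toronto_df = pd.concat([toronto_df, harlem_row], ignore_index=True)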
toronto_df.tail()

|    | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|
| 35 | West Toronto | High Park / The Junction South | 43.6616 | -79.4648 |
| 36 | West Toronto | Little Portugal / Trinity | 43.6479 | -79.4197 |
| 37 | West Toronto | Parkdale / Roncesvalles | 43.649 | -79.4563 |
| 38 | West Toronto | Runnymede / Swansea | 43.6516 | -79.4844 |
| 39 | Manhattan | Harlem | 40.8115504 | -73.9464769 |


### 3) Get venues in each Neighbourhood using Foursquare

In [30]:
# Foursquare client ID and secret (replace the placeholders with your own credentials)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [37]:
# Let's create a function that returns a dataframe with up to 100 venues within a 750 metre radius of each neighbourhood (latitude and longitude)

import requests # library to handle requests

def getNearbyVenues(names, latitudes, longitudes, radius=750, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                'Neighbourhood Latitude', 
                'Neighbourhood Longitude', 
                'Venue', 
                'Venue Latitude', 
                'Venue Longitude', 
                'Venue Category']
    
    return(nearby_venues)
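`getNearbyVenues` reads `response → groups[0] → items` and, for each item, the venue name, `location.lat`/`location.lng` and the first category. It assumes every venue has at least one category, so a venue with an empty `categories` list would raise an `IndexError`.

Below is a minimal defensive variant of just the parsing step. A few assumptions apply:
* `parse_explore_items` is a hypothetical helper.
* The sample payload is hand-built and mirrors only the fields the function above reads. It is not a real API response.
* The second venue in it is invented.

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, first category) tuples from an explore-style response,
    skipping venues that have no category."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to count, so leave this venue out
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hand-built sample shaped like the fields getNearbyVenues reads (not real API output)
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Jules Cafe Patisserie",
                   "location": {"lat": 43.704138, "lng": -79.388413},
                   "categories": [{"name": "Dessert Shop"}]}},
        {"venue": {"name": "Uncategorised venue",
                   "location": {"lat": 43.7, "lng": -79.39},
                   "categories": []}},
    ]}]}
}

print(parse_explore_items(sample_payload))
```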

In [38]:
# Let's use the function above to get the toronto venues
venues_df = getNearbyVenues(names=toronto_df['Neighbourhood'],
                                 latitudes=toronto_df['Latitude'],
                                 longitudes=toronto_df['Longitude'])

Davisville
Davisville North
Forest Hill North & West
Lawrence Park
Moore Park / Summerhill East
North Toronto West
Roselawn
Summerhill West / Rathnelly / South Hill / Forest Hill SE / Deer Park
The Annex / North Midtown / Yorkville
Berczy Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Central Bay Street
Christie
Church and Wellesley
Commerce Court / Victoria Hotel
First Canadian Place / Underground city
Garden District / Ryerson
Harbourfront East / Union Station / Toronto Islands
Kensington Market / Chinatown / Grange Park
Queen's Park / Ontario Provincial Government
Regent Park / Harbourfront
Richmond / Adelaide / King
Rosedale
St. James Town
St. James Town / Cabbagetown
Stn A PO Boxes
Toronto Dominion Centre / Design Exchange
University of Toronto / Harbord
Business reply mail Processing Centre
India Bazaar / The Beaches West
Studio District
The Beaches
The Danforth West / Riverdale
Brockton / Parkdale Village / E

In [39]:
# check the shape and head
print(venues_df.shape)
venues_df.head()

(2766, 7)


|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Davisville | 43.7043 | -79.3888 | Jules Cafe Patisserie | 43.704138 | -79.388413 | Dessert Shop |
| 1 | Davisville | 43.7043 | -79.3888 | Thobors Boulangerie Patisserie Café | 43.704514 | -79.388616 | Café |
| 2 | Davisville | 43.7043 | -79.3888 | Marigold Indian Bistro | 43.702881 | -79.388008 | Indian Restaurant |
| 3 | Davisville | 43.7043 | -79.3888 | XO Gelato | 43.705177 | -79.388793 | Dessert Shop |
| 4 | Davisville | 43.7043 | -79.3888 | Viva Napoli | 43.705752 | -79.389125 | Pizza Place |





## Methodology <a name="methodology"></a>

In this project, I focus on listing and ranking the frequency of venue categories within a 750-metre radius of each Old Town Toronto neighbourhood. I then cluster the neighbourhoods on these category frequencies and use their top 10 venue categories to find which ones are similar to Harlem.

The first steps collected the name and geolocation (latitude and longitude) of each neighbourhood in Old Town Toronto. The same information was collected for Harlem. I then used Foursquare to collect up to 100 venues within 750 metres of each neighbourhood.
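Each neighbourhood is searched with a 750-metre radius, so two centres closer than 1,500 m have overlapping search circles and can return many of the same venues.

The sketch below measures that distance with the haversine formula. A few assumptions apply:
* It uses a spherical Earth with a mean radius of 6,371 km.
* `haversine_m` is a hypothetical helper.
* The two points are the rounded coordinates of Commerce Court / Victoria Hotel and First Canadian Place / Underground city, as listed in the Analysis section.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)
SEARCH_RADIUS_M = 750       # radius used for each Foursquare search


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Rounded neighbourhood centres taken from the cluster table in the Analysis section
commerce_court = (43.6482, -79.3798)
first_canadian_place = (43.6484, -79.3823)

distance = haversine_m(*commerce_court, *first_canadian_place)
# Two 750 m search circles overlap whenever their centres are closer than 1,500 m
overlap = distance < 2 * SEARCH_RADIUS_M
print(f"{distance:.0f} m apart, search circles overlap: {overlap}")
```

These two centres are only about 200 m apart, so their venue lists are likely to be very similar. This may be one reason the central downtown neighbourhoods tend to end up in the same cluster.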

In the analysis phase, I one-hot encode the venue categories and take their mean per neighbourhood, which gives the frequency of each category. From these frequencies I identify the top 10 venue categories in each neighbourhood.
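The one-hot and mean step is easiest to see on a toy example. It uses two hypothetical neighbourhoods, A and B, with five venues between them, rather than Foursquare data.

The sketch makes the same `get_dummies(..., prefix="", prefix_sep="")` call as the notebook, plus `dtype=int` so that the dummies are 0/1 rather than booleans. Within a neighbourhood, the mean of each dummy column is the share of that neighbourhood's venues in the category.

```python
import pandas as pd

# Toy venue list: neighbourhood A has 3 venues, neighbourhood B has 2 (hypothetical)
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Park", "Bar"],
})

# One column per category, named after the category itself (empty prefix and separator)
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot["Neighbourhood"] = venues["Neighbourhood"]

# Mean of the 0/1 columns per neighbourhood = frequency of each category
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1. For example, two of A's three venues are cafés, so A's `Café` frequency is 2/3.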

In the third and final step, I segment the neighbourhoods with K-means on their venue-category frequencies to find the neighbourhoods with characteristics similar to Harlem.

## Analysis <a name="analysis"></a>

In [41]:
# let's have a look at the number of venues per neighbourhood returned by Foursquare
venues_df.groupby('Neighbourhood').count()

| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Berczy Park | 100 | 100 | 100 | 100 | 100 | 100 |
| Brockton / Parkdale Village / Exhibition Place | 84 | 84 | 84 | 84 | 84 | 84 |
| Business reply mail Processing Centre | 51 | 51 | 51 | 51 | 51 | 51 |
| CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport | 26 | 26 | 26 | 26 | 26 | 26 |
| Central Bay Street | 100 | 100 | 100 | 100 | 100 | 100 |
| Christie | 34 | 34 | 34 | 34 | 34 | 34 |
| Church and Wellesley | 100 | 100 | 100 | 100 | 100 | 100 |
| Commerce Court / Victoria Hotel | 100 | 100 | 100 | 100 | 100 | 100 |
| Davisville | 70 | 70 | 70 | 70 | 70 | 70 |
| Davisville North | 32 | 32 | 32 | 32 | 32 | 32 |


In [42]:
# Use one-hot encoding to convert the categorical venue categories into binary variables, one column per category

# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = venues_df['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

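|   | Neighbourhood | ATM | Accessories Store | African Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | ... | University | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Davisville | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Davisville | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Davisville | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Davisville | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Davisville | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |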


In [43]:
# group by neighbourhood and calculate the mean per venue category
venues_freq_df = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
venues_freq_df.head()

|   | Neighbourhood | ATM | Accessories Store | African Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | ... | University | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 1 | Brockton / Parkdale Village / Exhibition Place | 0.0 | 0.011905 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.011905 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.019608 | 0.0 | 0.0 | 0.0 |
| 3 | CN Tower / King and Spadina / Railway Lands / ... | 0.0 | 0.0 | 0.0 | 0.038462 | 0.038462 | 0.038462 | 0.076923 | 0.076923 | 0.038462 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Central Bay Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.01 | 0.01 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |


In [44]:
# let's list the top 10 categories per neighbourhood

num_top_venues = 10

for hood in venues_freq_df['Neighbourhood']:
    print("----"+hood+"----")
    temp = venues_freq_df[venues_freq_df['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0          Coffee Shop  0.09
1                Hotel  0.06
2                 Café  0.05
3  Japanese Restaurant  0.04
4               Bakery  0.03
5         Cocktail Bar  0.03
6           Restaurant  0.03
7             Beer Bar  0.03
8   Seafood Restaurant  0.02
9                 Park  0.02


----Brockton / Parkdale Village / Exhibition Place----
                    venue  freq
0             Coffee Shop  0.07
1                    Café  0.07
2              Restaurant  0.05
3                     Bar  0.04
4               Gift Shop  0.04
5                  Bakery  0.04
6  Thrift / Vintage Store  0.02
7             Music Venue  0.02
8   Performing Arts Venue  0.02
9  Furniture / Home Store  0.02


----Business reply mail Processing Centre----
                  venue  freq
0  Fast Food Restaurant  0.10
1    Italian Restaurant  0.04
2           Coffee Shop  0.04
3                   Bar  0.04
4                Bakery  0.04
5               Brewery 

In [46]:
# let's create a dataframe with this data

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues (1st, 2nd, 3rd, then 4th to 10th)
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
top_10_venues = pd.DataFrame(columns=columns)
top_10_venues['Neighbourhood'] = venues_freq_df['Neighbourhood']

for ind in np.arange(venues_freq_df.shape[0]):
    top_10_venues.iloc[ind, 1:] = return_most_common_venues(venues_freq_df.iloc[ind, :], num_top_venues)

print(top_10_venues.shape)
top_10_venues.head()

(40, 11)


|   | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Coffee Shop | Hotel | Café | Japanese Restaurant | Cocktail Bar | Beer Bar | Restaurant | Bakery | Gastropub | Seafood Restaurant |
| 1 | Brockton / Parkdale Village / Exhibition Place | Café | Coffee Shop | Restaurant | Gift Shop | Bar | Bakery | Furniture / Home Store | Thrift / Vintage Store | Supermarket | Music Venue |
| 2 | Business reply mail Processing Centre | Fast Food Restaurant | Park | Restaurant | Italian Restaurant | Light Rail Station | Bar | Brewery | Burrito Place | Bakery | Clothing Store |
| 3 | CN Tower / King and Spadina / Railway Lands / ... | Rental Car Location | Harbor / Marina | Boat or Ferry | Coffee Shop | Sculpture Garden | Airport Lounge | Airport Service | Boutique | Pier | Park |
| 4 | Central Bay Street | Coffee Shop | Café | Clothing Store | Japanese Restaurant | Art Gallery | Italian Restaurant | Ramen Restaurant | Arts & Crafts Store | Diner | Creperie |


In [47]:
# Now I can cluster the neighbourhoods using K-means

from sklearn.cluster import KMeans

# set the number of clusters: with 40 neighbourhoods (39 in Old Town Toronto plus Harlem), I'm going to create 8 clusters
kclusters = 8

grouped_clustering = venues_freq_df.drop(columns='Neighbourhood')

# run k-means clustering (n_init=10 matches the scikit-learn default before version 1.4)
kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=10, random_state=0).fit(grouped_clustering)


In [48]:
# add cluster numbers to dataframe
top_10_venues.insert(0, 'Cluster Labels', kmeans.labels_)
top_10_venues.head()

|   | Cluster Labels | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Berczy Park | Coffee Shop | Hotel | Café | Japanese Restaurant | Cocktail Bar | Beer Bar | Restaurant | Bakery | Gastropub | Seafood Restaurant |
| 1 | 0 | Brockton / Parkdale Village / Exhibition Place | Café | Coffee Shop | Restaurant | Gift Shop | Bar | Bakery | Furniture / Home Store | Thrift / Vintage Store | Supermarket | Music Venue |
| 2 | 5 | Business reply mail Processing Centre | Fast Food Restaurant | Park | Restaurant | Italian Restaurant | Light Rail Station | Bar | Brewery | Burrito Place | Bakery | Clothing Store |
| 3 | 4 | CN Tower / King and Spadina / Railway Lands / ... | Rental Car Location | Harbor / Marina | Boat or Ferry | Coffee Shop | Sculpture Garden | Airport Lounge | Airport Service | Boutique | Pier | Park |
| 4 | 0 | Central Bay Street | Coffee Shop | Café | Clothing Store | Japanese Restaurant | Art Gallery | Italian Restaurant | Ramen Restaurant | Arts & Crafts Store | Diner | Creperie |


In [51]:
# add the location of each neighbourhood
top_10_venues_df = toronto_df.join(top_10_venues.set_index('Neighbourhood'), on='Neighbourhood')
top_10_venues_df.head()

|   | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | Davisville | 43.7043 | -79.3888 | 5 | Italian Restaurant | Coffee Shop | Dessert Shop | Restaurant | Café | Pizza Place | Sandwich Place | Gym | Indian Restaurant | Bar |
| 1 | Central Toronto | Davisville North | 43.7128 | -79.3902 | 5 | Coffee Shop | Pizza Place | Gym | Park | Bar | Café | Taco Place | Dessert Shop | Diner | Sandwich Place |
| 2 | Central Toronto | Forest Hill North & West | 43.6969 | -79.4113 | 7 | Park | Gym / Fitness Center | Jewelry Store | Sushi Restaurant | Trail | Yoga Studio | Donut Shop | Distribution Center | Dive Bar | Dog Run |
| 3 | Central Toronto | Lawrence Park | 43.728 | -79.3888 | 2 | Coffee Shop | Park | Swim School | Bus Line | Yoga Studio | Dumpling Restaurant | Dive Bar | Dog Run | Doner Restaurant | Donut Shop |
| 4 | Central Toronto | Moore Park / Summerhill East | 43.6896 | -79.3832 | 6 | Grocery Store | Park | Candy Store | Playground | Tennis Court | Thai Restaurant | Sandwich Place | Gym / Fitness Center | Gym | Café |


In [53]:
#Let's find out Harlem's cluster number
harlem_df = top_10_venues_df.loc[top_10_venues_df['Borough'] =='Manhattan']
harlem_df

|    | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | Manhattan | Harlem | 40.8115504 | -73.9464769 | 0 | French Restaurant | Cosmetics Shop | Southern / Soul Food Restaurant | Jazz Club | Cocktail Bar | Burger Joint | Café | American Restaurant | Theater | Pizza Place |


In [54]:
# now let's see all neighbourhoods in Harlem's cluster (cluster 0)
harlem_cluster = harlem_df['Cluster Labels'].iloc[0]
harlem_like_df = top_10_venues_df.loc[top_10_venues_df['Cluster Labels'] == harlem_cluster]
harlem_like_df

|    | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Central Toronto | North Toronto West | 43.7154 | -79.4057 | 0 | Coffee Shop | Sporting Goods Shop | Café | Clothing Store | Grocery Store | Skating Rink | Italian Restaurant | Restaurant | Diner | Dessert Shop |
| 9 | Downtown Toronto | Berczy Park | 43.6448 | -79.3733 | 0 | Coffee Shop | Hotel | Café | Japanese Restaurant | Cocktail Bar | Beer Bar | Restaurant | Bakery | Gastropub | Seafood Restaurant |
| 11 | Downtown Toronto | Central Bay Street | 43.658 | -79.3874 | 0 | Coffee Shop | Café | Clothing Store | Japanese Restaurant | Art Gallery | Italian Restaurant | Ramen Restaurant | Arts & Crafts Store | Diner | Creperie |
| 13 | Downtown Toronto | Church and Wellesley | 43.6659 | -79.3832 | 0 | Coffee Shop | Sushi Restaurant | Japanese Restaurant | Sandwich Place | Restaurant | Men's Store | Diner | Pizza Place | Bookstore | Smoke Shop |
| 14 | Downtown Toronto | Commerce Court / Victoria Hotel | 43.6482 | -79.3798 | 0 | Coffee Shop | Café | Hotel | American Restaurant | Japanese Restaurant | Seafood Restaurant | Restaurant | Concert Hall | Gym | Vegetarian / Vegan Restaurant |
| 15 | Downtown Toronto | First Canadian Place / Underground city | 43.6484 | -79.3823 | 0 | Hotel | Coffee Shop | Café | Japanese Restaurant | Theater | Restaurant | Concert Hall | Seafood Restaurant | Deli / Bodega | Bookstore |
| 16 | Downtown Toronto | Garden District / Ryerson | 43.6572 | -79.3789 | 0 | Coffee Shop | Hotel | Gastropub | Falafel Restaurant | Café | Ramen Restaurant | Japanese Restaurant | Sushi Restaurant | Burger Joint | Park |
| 17 | Downtown Toronto | Harbourfront East / Union Station / Toronto Is... | 43.6408 | -79.3818 | 0 | Coffee Shop | Hotel | Boat or Ferry | Brewery | Park | Japanese Restaurant | Bar | Pizza Place | Café | Plaza |
| 18 | Downtown Toronto | Kensington Market / Chinatown / Grange Park | 43.6532 | -79.4 | 0 | Café | Bar | Coffee Shop | Vegetarian / Vegan Restaurant | Mexican Restaurant | Bakery | Dessert Shop | Art Gallery | Ice Cream Shop | Record Shop |
| 19 | Downtown Toronto | Queen's Park / Ontario Provincial Government | 43.6623 | -79.3895 | 0 | Coffee Shop | Italian Restaurant | Café | Sandwich Place | Sushi Restaurant | Park | Japanese Restaurant | Gastropub | Diner | Ice Cream Shop |
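K-means only says which neighbourhoods share Harlem's cluster; it does not order them. An optional refinement, not part of the original analysis, is to rank the cluster-mates by cosine similarity between their venue-frequency vectors and Harlem's.

The sketch below does this with NumPy. A few assumptions apply:
* `rank_by_cosine` is a hypothetical helper.
* The three-category vectors are made up for illustration.
* A candidate whose vector is all zeros would produce a NaN similarity.

```python
import numpy as np


def rank_by_cosine(reference, candidates):
    """Rank candidate vectors by cosine similarity to the reference vector, highest first."""
    ref = np.asarray(reference, dtype=float)
    names = list(candidates)
    mat = np.array([candidates[name] for name in names], dtype=float)
    # cosine similarity = dot product divided by the product of the vector norms
    sims = mat @ ref / (np.linalg.norm(mat, axis=1) * np.linalg.norm(ref))
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order]


# Hypothetical frequencies for three categories (e.g. Café, Cocktail Bar, Park), not real data
harlem = [0.5, 0.5, 0.0]
candidates = {
    "Neighbourhood A": [0.5, 0.5, 0.0],
    "Neighbourhood B": [0.0, 0.0, 1.0],
    "Neighbourhood C": [1.0, 0.0, 0.0],
}

for name, sim in rank_by_cosine(harlem, candidates):
    print(f"{name}: {sim:.3f}")
```

On the notebook's data, one could do the following with `venues_freq_df.set_index('Neighbourhood')`:
* pass Harlem's row as the reference;
* pass the rows of the neighbourhoods in `harlem_like_df` as the candidates.

This is a suggestion rather than something the original analysis did.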


In [56]:
# let's plot these neighbourhoods in a map

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors

In [78]:
# Toronto latitude and longitude; Harlem is left off the Toronto map
plot_df = top_10_venues_df.loc[top_10_venues_df['Borough'] != 'Manhattan']
latitude = 43.6532
longitude = -79.3832

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# colour scheme: Harlem's cluster in red, every other cluster in dark grey
harlem_label = top_10_venues_df.loc[top_10_venues_df['Neighbourhood'] == 'Harlem', 'Cluster Labels'].iloc[0]

# add markers to the map
for lat, lon, poi, cluster in zip(plot_df['Latitude'], plot_df['Longitude'], plot_df['Neighbourhood'], plot_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    marker_colour = '#FF0000' if cluster == harlem_label else '#3d3d3d'
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=marker_colour,
        fill=True,
        fill_color=marker_colour,
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Results and Discussion <a name="results"></a>

The analysis shows that Downtown Toronto is the borough with the most neighbourhoods clustered with Harlem. A few neighbourhoods in West Toronto and Central Toronto share the same profile. The tables above list the top 10 most common venue categories of each neighbourhood in Harlem's cluster. The map shows the location of each neighbourhood, with Harlem's cluster in red.
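The "most neighbourhoods per borough" comparison is a simple count over the rows in Harlem's cluster.

The sketch below does the count with pandas `value_counts`. A few assumptions apply:
* `cluster_borough_counts` is a hypothetical helper.
* The toy frame only reuses the column names of `top_10_venues_df`.
* Its cluster labels are illustrative and are not the notebook's results.

```python
import pandas as pd


def cluster_borough_counts(df, cluster_label):
    """Count neighbourhoods per borough in one cluster, ignoring the Manhattan (Harlem) reference row."""
    in_cluster = df[(df["Cluster Labels"] == cluster_label) & (df["Borough"] != "Manhattan")]
    return in_cluster["Borough"].value_counts()


# Toy frame with the same column names as top_10_venues_df; the labels are illustrative only
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Central Toronto", "West Toronto", "Manhattan"],
    "Neighbourhood": ["Berczy Park", "Central Bay Street", "North Toronto West", "Runnymede / Swansea", "Harlem"],
    "Cluster Labels": [0, 0, 0, 3, 0],
})

print(cluster_borough_counts(toy, 0))
```

On the notebook's data, the call would be `cluster_borough_counts(top_10_venues_df, harlem_df['Cluster Labels'].iloc[0])`.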

I can also observe that none of Harlem's top 4 most common venue categories appears among the top 10 of any suggested neighbourhood. The first match is Harlem's 5th most common category, "Cocktail Bar", which also appears in Little Portugal / Trinity and Berczy Park.

In fact, no neighbourhood in Toronto has "French Restaurant" as its most common venue category. This suggests that life in Toronto may feel somewhat different from Harlem, because Harlem's most common venues overlap only weakly with those of its Toronto cluster-mates.
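The overlap described above can be checked mechanically by intersecting Harlem's top 10 list with a candidate's top 10 list.

The sketch below uses the Harlem and Berczy Park rows exactly as they appear in the output tables. `shared_categories` is a hypothetical helper that reports each shared category together with its rank in Harlem's list.

```python
# Top 10 venue categories copied from the Harlem and Berczy Park output rows
HARLEM_TOP10 = ["French Restaurant", "Cosmetics Shop", "Southern / Soul Food Restaurant",
                "Jazz Club", "Cocktail Bar", "Burger Joint", "Café",
                "American Restaurant", "Theater", "Pizza Place"]
BERCZY_PARK_TOP10 = ["Coffee Shop", "Hotel", "Café", "Japanese Restaurant", "Cocktail Bar",
                     "Beer Bar", "Restaurant", "Bakery", "Gastropub", "Seafood Restaurant"]


def shared_categories(reference, candidate):
    """Return (rank in reference, category) for every reference category the candidate also lists."""
    candidate_set = set(candidate)
    return [(rank, cat) for rank, cat in enumerate(reference, start=1) if cat in candidate_set]


print(shared_categories(HARLEM_TOP10, BERCZY_PARK_TOP10))
# prints: [(5, 'Cocktail Bar'), (7, 'Café')]
```

For Berczy Park, the first shared category is Harlem's 5th, which matches the observation that no match appears before the 5th venue category.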

## Conclusion <a name="conclusion"></a>

This project was created to help me identify neighbourhoods in Old Town Toronto that are similar to Harlem (Manhattan), after a recent career change that is forcing me to leave the place I love. To find the places in Toronto that I will enjoy as much as Harlem, I used Foursquare data to list the most common venues within 750 metres of each Toronto neighbourhood. I then looked for the neighbourhoods whose venue profiles match Harlem's.

Downtown Toronto is the place to start, as it has the most neighbourhoods in the same cluster as Harlem.

## References

(1) New York City (Geography) - https://en.wikipedia.org/wiki/New_York_City#Geography

(2) List of Manhattan neighborhoods - https://en.wikipedia.org/wiki/List_of_Manhattan_neighborhoods

(3) Toronto Neighbourhoods - https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto

(4) Toronto demographics - https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods