# Outline
Part 1 Transform Wikipedia page to designated dataframe
1. Build dependencies and load data
2. Clean the dataframe as instructed

Part 2 Fetch coordinates for each PostalCode

Part 3 Explore Neighborhoods in Toronto
1. Visualize Toronto neighborhoods
2. Explore neighborhood
3. Analyze each neighborhood
4. Cluster neighborhoods
5. Analysis and summary

# Part 1 Transform Wikipedia page to designated dataframe

## 1. Build dependencies and load data


In [1]:
#!pip install geocoder

In [2]:
import datetime
import geocoder
import numpy as np
import pandas as pd

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df_TO = pd.read_html(url)[0]
df_TO.head()

Unnamed: 0,0,1,2
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


## 2. Clean the dataframe

In [4]:
# Assign column names, remove the first row
df_TO.columns = ["PostalCode","Borough","Neighborhood"]
df_TO.drop(0,inplace=True)
df_TO.reset_index(drop=True,inplace=True)
df_TO.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [5]:
# Ignore cells with a borough that is "Not assigned".
df_TO = df_TO[df_TO["Borough"]!="Not assigned"]
df_TO.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [6]:
# change Neighborhood from "Not assigned" to name of Borough
mask = df_TO["Neighborhood"]=="Not assigned"
df_TO.loc[mask,"Neighborhood"] = df_TO.loc[mask,"Borough"]
df_TO.loc[7]

PostalCode               M7A
Borough         Queen's Park
Neighborhood    Queen's Park
Name: 7, dtype: object

In [7]:
# group by PostalCode, combine Neighborhood into one row separated by commas
df_TO = df_TO.groupby(["PostalCode"],as_index=False).agg({"Borough":"first","Neighborhood":", ".join})
df_TO.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [8]:
# display the shape of dataframe
df_TO.shape

(103, 3)
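
The cleaning steps above can be checked end to end on a small hand-made table. This is only a sketch: the `toy` rows and the `clean_postal_table` helper are made up for illustration and are not part of the notebook; it assumes the same three column names.

```python
import pandas as pd


def clean_postal_table(df):
    """Apply the three cleaning rules to a PostalCode/Borough/Neighborhood table."""
    # 1. Drop rows whose borough is "Not assigned"; copy so later writes are safe.
    df = df[df["Borough"] != "Not assigned"].copy()
    # 2. Where the neighborhood is "Not assigned", fall back to the borough name.
    mask = df["Neighborhood"] == "Not assigned"
    df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]
    # 3. One row per postal code, neighborhoods joined with a comma.
    return df.groupby("PostalCode", as_index=False).agg(
        {"Borough": "first", "Neighborhood": ", ".join}
    )


toy = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M6A", "North York", "Lawrence Heights"),
        ("M6A", "North York", "Lawrence Manor"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)
cleaned = clean_postal_table(toy)
print(cleaned)
```

The `.copy()` after filtering is a design choice of this sketch: it makes the later `.loc` assignment act on an independent dataframe rather than on a slice of the original.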

# Part 2 Fetch coordinates for each PostalCode

In [9]:
# fetch geo coordinates for each postal code and fill into df_TO
#for i in range(len(df_TO)):
#    # initialize your variable to None
#    lat_lng_coords = None
#
#    # loop until you get the coordinates
#    while(lat_lng_coords is None):
#        g = geocoder.google('{}, Toronto, Ontario'.format(df_TO.loc[i,"PostalCode"]))
#        lat_lng_coords = g.latlng
#    df_TO.loc[i,"Latitude"] = lat_lng_coords[0]
#    df_TO.loc[i,"Longitude"] = lat_lng_coords[1]
#df_TO.head()
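
The commented-out loop above retries until coordinates come back, so a postal code that never resolves would keep it spinning forever. A bounded variant could look like the sketch below. Here `lookup` is a hypothetical stand-in for the real geocoding call, and `max_attempts` and `delay` are assumed parameters, not anything taken from `geocoder`.

```python
import time


def fetch_with_retries(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns a value or the attempts run out.

    `lookup` is a placeholder for the real geocoding call; it should return
    a (lat, lng) pair on success and None on failure.
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # pause before trying again
    return None  # give up instead of looping forever


# Fake lookup that fails twice and then succeeds, to exercise the helper.
calls = {"n": 0}


def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else (43.806686, -79.194353)


result = fetch_with_retries(flaky_lookup, "M1B, Toronto, Ontario")
never = fetch_with_retries(lambda q: None, "M0X, Toronto, Ontario", max_attempts=2)
print(result, never)
```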

In [10]:
# geocoder was unreliable, so I load the coordinates from the CSV file instead
geodata = pd.read_csv("http://cocl.us/Geospatial_data")
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
# for each PostalCode find its corresponding Latitude and Longitude from geodata
df_TO = df_TO.join(geodata.set_index('Postal Code'), on='PostalCode')
df_TO.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
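
A left join silently leaves NaN wherever a postal code has no match in the coordinate table. One way to surface that, sketched here on invented toy frames, is `DataFrame.merge` with `indicator=True` and `validate`. The code "M9Z" is made up purely to show the unmatched case.

```python
import pandas as pd

codes = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"]})  # "M9Z" is made up
geo = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

merged = codes.merge(
    geo,
    how="left",
    left_on="PostalCode",
    right_on="Postal Code",
    validate="one_to_one",  # raise if a postal code appears twice on either side
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```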


# Part 3 Explore Neighborhoods in Toronto

## 1. Visualize Toronto neighborhoods

In [12]:
#address = 'Toronto, ON'
#geolocator = Nominatim(user_agent="ny_explorer")
#location = geolocator.geocode(address)
#latitude = location.latitude
#longitude = location.longitude
#print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

# geographical coordinates of Toronto, hard-coded
longitude = -79.3832
latitude = 43.6532
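
As an alternative to hard-coding the center, the map could be centered on the mean of the neighborhood coordinates. The sketch below uses only the five coordinate pairs shown in the table above (all in Scarborough), so its result is not the city center; with the full dataframe the same idea would be `df_TO['Latitude'].mean()` and `df_TO['Longitude'].mean()`.

```python
from statistics import mean

# The five (latitude, longitude) pairs shown in the table above.
coords = [
    (43.806686, -79.194353),
    (43.784535, -79.160497),
    (43.763573, -79.188711),
    (43.770992, -79.216917),
    (43.773136, -79.239476),
]

center_lat = mean(lat for lat, _ in coords)
center_lng = mean(lng for _, lng in coords)
print(round(center_lat, 4), round(center_lng, 4))
```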

In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df_TO['Latitude'], df_TO['Longitude'], df_TO['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## 2. Explore neighborhood

First, we explore a single neighborhood.

In [14]:
# define Foursquare credentials and version
CLIENT_ID = '1FZ5D3IFSLKIRBG42HEPKV10T5IOGKIO3NGTUZWBMYLKT2CX' # your Foursquare ID
CLIENT_SECRET = 'TBGVXN3CCJHJDVV2TJQWIVMOO1AVZLBDYZC3W3JAI2UR1JN4' # your Foursquare Secret
VERSION = (datetime.date.today()-datetime.timedelta(1)).strftime("%Y%m%d") # Foursquare API version: yesterday

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1FZ5D3IFSLKIRBG42HEPKV10T5IOGKIO3NGTUZWBMYLKT2CX
CLIENT_SECRET:TBGVXN3CCJHJDVV2TJQWIVMOO1AVZLBDYZC3W3JAI2UR1JN4
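
Credentials written into a cell and printed are visible to anyone who sees the notebook. One common alternative, sketched below, is to read them from environment variables and print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example.

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables.

    The variable names are an assumption made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters so output does not leak the secret."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a fake environment; real values would come from os.environ.
fake_env = {"FOURSQUARE_CLIENT_ID": "ABCD1234", "FOURSQUARE_CLIENT_SECRET": "WXYZ5678"}
client_id, client_secret = load_credentials(fake_env)
print("CLIENT_ID:", mask(client_id))
```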


In [15]:
# Let's explore the first neighborhood in our dataframe
neighborhood_latitude = df_TO.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_TO.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_TO.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge, Malvern are 43.806686299999996, -79.19435340000001.


In [16]:
# Now, let's get the top 100 venues that are in Rouge, Malvern within a radius of 500 meters.
# First, let's create the GET request URL. Name your URL url
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
# send the get request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5df4c2a414a126001bccb83a'},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.8111863045, 'lng': -79.18812958073042},
   'sw': {'lat': 43.80218629549999, 'lng': -79.2005772192696}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bb6b9446edc76b0d771311c',
       'name': "Wendy's",
       'location': {'crossStreet': 'Morningside & Sheppard',
        'lat': 43.80744841934756,
        'lng': -79.19905558052072,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.80744841934756,
          'lng': -79.19905558052072}],
        'distance': 387,
        'cc': 'CA',
        'city': 'Toronto',
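
The request URL above is assembled with `str.format`, which makes it easy to mix up the argument order. A sketch of the same query built from named parameters with the standard library follows. The endpoint and parameter names are the ones used in the URL above; the credentials and version date are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as above


def build_explore_url(client_id, client_secret, lat, lng, version, radius=500, limit=100):
    """Assemble the explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and an arbitrary version date, for illustration only.
demo_url = build_explore_url("MY_ID", "MY_SECRET", 43.806686, -79.194353, "20191213")
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["radius"])
```

With `requests`, the same dictionary can also be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.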
    

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [18]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056
1,Interprovincial Group,Print Shop,43.80563,-79.200378
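
To make the flattening step concrete without pandas, here is a plain-Python sketch that pulls the same four fields out of the `items` list and tolerates venues with no category. The sample payload is hand-written to follow the shape of the response above, with most fields omitted and one invented venue.

```python
def flatten_items(items):
    """Turn explore 'items' into flat dicts with name, category, lat and lng."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append(
            {
                "name": venue["name"],
                # Some venues may come back without a category; keep None then.
                "categories": categories[0]["name"] if categories else None,
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            }
        )
    return rows


# Hand-written sample shaped like the response above (most fields omitted).
sample_items = [
    {
        "venue": {
            "name": "Wendy's",
            "location": {"lat": 43.80744841934756, "lng": -79.19905558052072},
            "categories": [{"name": "Fast Food Restaurant"}],
        }
    },
    {
        "venue": {
            "name": "Uncategorized place",  # invented to show the empty case
            "location": {"lat": 43.8, "lng": -79.2},
            "categories": [],
        }
    },
]
rows = flatten_items(sample_items)
print(rows)
```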


### Now let's repeat the process to explore all neighborhoods in Toronto
Let's create a function that repeats the same process for every neighborhood in Toronto.

In [19]:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry



def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    i = 0
    
    session = requests.Session()
    retry = Retry(connect=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(str(i)+" "+name)
        i += 1
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = session.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [20]:
toronto_venues = getNearbyVenues(names=df_TO['Neighborhood'],
                                   latitudes=df_TO['Latitude'],
                                   longitudes=df_TO['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head()

0 Rouge, Malvern
1 Highland Creek, Rouge Hill, Port Union
2 Guildwood, Morningside, West Hill
3 Woburn
4 Cedarbrae
5 Scarborough Village
6 East Birchmount Park, Ionview, Kennedy Park
7 Clairlea, Golden Mile, Oakridge
8 Cliffcrest, Cliffside, Scarborough Village West
9 Birch Cliff, Cliffside West
10 Dorset Park, Scarborough Town Centre, Wexford Heights
11 Maryvale, Wexford
12 Agincourt
13 Clarks Corners, Sullivan, Tam O'Shanter
14 Agincourt North, L'Amoreaux East, Milliken, Steeles East
15 L'Amoreaux West
16 Upper Rouge
17 Hillcrest Village
18 Fairview, Henry Farm, Oriole
19 Bayview Village
20 Silver Hills, York Mills
21 Newtonbrook, Willowdale
22 Willowdale South
23 York Mills West
24 Willowdale West
25 Parkwoods
26 Don Mills North
27 Flemingdon Park, Don Mills South
28 Bathurst Manor, Downsview North, Wilson Heights
29 Northwood Park, York University
30 CFB Toronto, Downsview East
31 Downsview West
32 Downsview Central
33 Downsview Northwest
34 Victoria Village
35 Woodbine Gardens, Pa

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge, Malvern",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
3,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target


In [22]:
# check how many venues returned for each neighborhood
print(toronto_venues.groupby('Neighborhood')['Venue'].count())
# how many unique categories were found
print(len(toronto_venues["Venue Category"].unique()))

Neighborhood
Adelaide, King, Richmond                                                                                         100
Agincourt                                                                                                          5
Agincourt North, L'Amoreaux East, Milliken, Steeles East                                                           3
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown      8
Alderwood, Long Branch                                                                                             9
Bathurst Manor, Downsview North, Wilson Heights                                                                   19
Bayview Village                                                                                                    4
Bedford Park, Lawrence Manor East                                                                                 23
Berczy Park                                        

## 3. Analyze Each Neighborhood

Convert the dataframe with one-hot encoding and group venues by neighborhood.

In [23]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(2241, 272)


Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
# group venues by neighborhood, take the mean of each category
toronto_grouped = toronto_onehot.groupby(["Neighborhood"]).mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(101, 272)


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
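
Taking the mean of the one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall into it. The plain-Python sketch below reproduces that arithmetic on the first five venue rows shown earlier; it only illustrates the calculation and does not replace the pandas code.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs taken from the first rows of toronto_venues.
venues = [
    ("Rouge, Malvern", "Fast Food Restaurant"),
    ("Rouge, Malvern", "Print Shop"),
    ("Highland Creek, Rouge Hill, Port Union", "Construction & Landscaping"),
    ("Highland Creek, Rouge Hill, Port Union", "Bar"),
    ("Highland Creek, Rouge Hill, Port Union", "Moving Target"),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Mean of a 0/1 column = (venues in that category) / (all venues in the neighborhood).
frequencies = {
    neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighborhood, counter in counts.items()
}
print(frequencies["Rouge, Malvern"])
```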


Select the top 10 venues for each neighborhood and put them into a new dataframe.

In [25]:
# a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Steakhouse,Burger Joint,Bar,Bakery,Restaurant,Sushi Restaurant,Asian Restaurant
1,Agincourt,Latin American Restaurant,Skating Rink,Clothing Store,Lounge,Breakfast Spot,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Arts & Crafts Store,Playground,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Pizza Place,Grocery Store,Video Store,Beer Store,Pharmacy,Fried Chicken Joint,Fast Food Restaurant,Sandwich Place,Discount Store,Department Store
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Pharmacy,Sandwich Place,Dance Studio,Skating Rink,Pub,Gym,Colombian Restaurant,Curling Ice
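
The column labels above come from a three-item suffix list with a "th" fallback, which is fine for 10 columns but would give "21th" if `num_top_venues` were raised past 20. A more general helper could look like this sketch; `ordinal` is a hypothetical name, not something the notebook defines.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], columns[-1])
```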


## 4. Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [26]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated
print(kmeans.labels_)

[4 4 2 1 1 1 4 4 4 4 1 4 4 2 4 4 2 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4
 1 4 2 4 2 4 4 4 4 2 4 4 4 4 4 4 4 1 1 4 1 4 2 4 1 4 2 4 4 4 4 2 4 4 4 4 2
 4 2 4 1 4 4 3 0 4 4 4 4 4 4 4 4 4 4 4 1 2 4 1 1 1 4 2]
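
Before interpreting the clusters it helps to see how large each one is. The sketch below counts the labels copied from the printed output above; with the fitted model in memory, `np.bincount(kmeans.labels_)` would give the same counts.

```python
from collections import Counter

# Labels copied from the printed output above.
labels_text = """
4 4 2 1 1 1 4 4 4 4 1 4 4 2 4 4 2 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4
1 4 2 4 2 4 4 4 4 2 4 4 4 4 4 4 4 1 1 4 1 4 2 4 1 4 2 4 4 4 4 2 4 4 4 4 2
4 2 4 1 4 4 3 0 4 4 4 4 4 4 4 4 4 4 4 1 2 4 1 1 1 4 2
"""
labels = [int(token) for token in labels_text.split()]
cluster_sizes = Counter(labels)

for cluster, size in sorted(cluster_sizes.items()):
    print(f"cluster {cluster}: {size} neighborhoods")
```

Most neighborhoods land in cluster 4, while clusters 0 and 3 each hold a single neighborhood, which is worth keeping in mind when reading the per-cluster tables.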


Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join df_TO with neighborhoods_venues_sorted so each postal code row gets its cluster label and top venues
toronto_merged = df_TO.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,1.0,Fast Food Restaurant,Print Shop,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,4.0,Bar,Construction & Landscaping,Moving Target,Women's Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,4.0,Electronics Store,Breakfast Spot,Medical Center,Intersection,Mexican Restaurant,Pizza Place,Rental Car Location,Spa,Doner Restaurant,Donut Shop
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Korean Restaurant,Pharmacy,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,4.0,Bakery,Bank,Athletics & Sports,Hakka Restaurant,Fried Chicken Joint,Caribbean Restaurant,Gas Station,Thai Restaurant,Women's Store,Diner
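
Rouge, Malvern returned only two venues (Wendy's and Interprovincial Group), so every category from its 3rd column onward has a frequency of 0 and appears only because of how ties happen to sort. A sketch of a ranking that keeps just the categories actually present is shown below; `top_categories` and the toy shares are illustrative assumptions.

```python
def top_categories(frequencies, n=10):
    """Return up to n category names with a nonzero share, most frequent first."""
    nonzero = [(cat, share) for cat, share in frequencies.items() if share > 0]
    # Sort by share (descending), then by name so ties come out in a stable order.
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in nonzero[:n]]


# Shares for a neighborhood with two venues; the zero entries are other categories.
rouge_malvern = {
    "Fast Food Restaurant": 0.5,
    "Print Shop": 0.5,
    "Women's Store": 0.0,
    "Dog Run": 0.0,
}
top = top_categories(rouge_malvern)
print(top)
```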


Now we check the labels to see whether there are any NaN values.

In [28]:
sum(toronto_merged["Cluster Labels"].isnull())

1

There is indeed one neighborhood with no cluster label, because no venue was found for it. We therefore remove this neighborhood.

In [29]:
toronto_merged = toronto_merged[~toronto_merged["Cluster Labels"].isnull()]
toronto_merged.shape

(102, 16)
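
The labels in the merged table show up as 1.0, 4.0 and so on because a column containing NaN is stored as float. Once the unlabeled row is gone, the column can be cast back to integers, as in this toy sketch (the neighborhood names "A", "B", "C" are made up).

```python
import pandas as pd

left = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})  # toy names
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [1, 4]})

merged = left.join(labels.set_index("Neighborhood"), on="Neighborhood")
# "C" has no label, so the column holds NaN and pandas stores it as float.
print(merged["Cluster Labels"].dtype)

labelled = merged.dropna(subset=["Cluster Labels"]).copy()
labelled["Cluster Labels"] = labelled["Cluster Labels"].astype(int)
print(labelled)
```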

Finally, let's visualize the resulting clusters

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
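
The map code picks colors with `rainbow[int(cluster)-1]`, so label 0 takes the last color through negative indexing; the colors stay distinct, but the mapping is easy to misread. A sketch of a palette indexed directly by label, using only the standard library, follows. The saturation and brightness values are arbitrary choices for this example.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(5)
# Index directly by the integer label: cluster 0 -> palette[0], and so on.
color_for = {label: palette[label] for label in range(5)}
print(color_for)
```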

## 5. Analysis and summary

Cluster 0

It is a single neighborhood whose most common venues are a cafeteria, a dog run and a deli.

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"Silver Hills, York Mills",0.0,Cafeteria,Dog Run,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store,Dance Studio


Cluster 1

It contains suburban neighborhoods with mainly fast food restaurants, pizza places and dog runs.

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rouge, Malvern",1.0,Fast Food Restaurant,Print Shop,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
3,Woburn,1.0,Coffee Shop,Korean Restaurant,Pharmacy,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
13,"Clarks Corners, Sullivan, Tam O'Shanter",1.0,Pizza Place,Bank,Noodle House,Shopping Mall,Pharmacy,Italian Restaurant,Chinese Restaurant,Fast Food Restaurant,Fried Chicken Joint,Thai Restaurant
15,L'Amoreaux West,1.0,Fast Food Restaurant,Chinese Restaurant,Thrift / Vintage Store,Coffee Shop,Grocery Store,Gym,Breakfast Spot,Sandwich Place,Pharmacy,Pizza Place
17,Hillcrest Village,1.0,Golf Course,Pool,Fast Food Restaurant,Dog Run,Mediterranean Restaurant,Athletics & Sports,Comic Shop,Department Store,Eastern European Restaurant,Dumpling Restaurant
24,Willowdale West,1.0,Coffee Shop,Pharmacy,Pizza Place,Butcher,Discount Store,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
28,"Bathurst Manor, Downsview North, Wilson Heights",1.0,Coffee Shop,Middle Eastern Restaurant,Frozen Yogurt Shop,Supermarket,Sushi Restaurant,Deli / Bodega,Fast Food Restaurant,Restaurant,Pizza Place,Pharmacy
33,Downsview Northwest,1.0,Athletics & Sports,Grocery Store,Gym / Fitness Center,Liquor Store,Women's Store,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
35,"Woodbine Gardens, Parkview Hill",1.0,Pizza Place,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Gym / Fitness Center,Gastropub,Pharmacy,Bus Line,Dessert Shop
88,"Humber Bay Shores, Mimico South, New Toronto",1.0,Flower Shop,Restaurant,Liquor Store,Coffee Shop,Café,Gym,Sandwich Place,Pharmacy,American Restaurant,Bakery


Cluster 2

It contains suburban areas with mainly parks, playgrounds and dog runs.

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"Agincourt North, L'Amoreaux East, Milliken, St...",2.0,Park,Arts & Crafts Store,Playground,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
21,"Newtonbrook, Willowdale",2.0,Park,Piano Bar,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store
23,York Mills West,2.0,Park,Bank,Convenience Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Deli / Bodega
25,Parkwoods,2.0,Park,Food & Drink Shop,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Ethiopian Restaurant
30,"CFB Toronto, Downsview East",2.0,Park,Airport,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
40,East Toronto,2.0,Park,Convenience Store,Coffee Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Women's Store
44,Lawrence Park,2.0,Park,Gym / Fitness Center,Bus Line,Swim School,Dumpling Restaurant,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dance Studio
50,Rosedale,2.0,Park,Playground,Trail,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Eastern European Restaurant,Cupcake Shop
64,"Forest Hill North, Forest Hill West",2.0,Park,Sushi Restaurant,Trail,Jewelry Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
74,Caledonia-Fairbanks,2.0,Park,Fast Food Restaurant,Market,Women's Store,General Travel,General Entertainment,Dumpling Restaurant,Drugstore,Donut Shop,Gift Shop


Cluster 3

It is a single neighborhood with a playground, a women's store and a doner restaurant.

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough Village,3.0,Playground,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop


Cluster 4

It contains downtown areas with mainly bars, banks and restaurants.

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Highland Creek, Rouge Hill, Port Union",4.0,Bar,Construction & Landscaping,Moving Target,Women's Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
2,"Guildwood, Morningside, West Hill",4.0,Electronics Store,Breakfast Spot,Medical Center,Intersection,Mexican Restaurant,Pizza Place,Rental Car Location,Spa,Doner Restaurant,Donut Shop
4,Cedarbrae,4.0,Bakery,Bank,Athletics & Sports,Hakka Restaurant,Fried Chicken Joint,Caribbean Restaurant,Gas Station,Thai Restaurant,Women's Store,Diner
6,"East Birchmount Park, Ionview, Kennedy Park",4.0,Discount Store,Chinese Restaurant,Convenience Store,Hobby Shop,Department Store,Bus Station,Coffee Shop,Dumpling Restaurant,Drugstore,Eastern European Restaurant
7,"Clairlea, Golden Mile, Oakridge",4.0,Bus Line,Bakery,Metro Station,Park,Bus Station,Soccer Field,Fast Food Restaurant,Intersection,Diner,Dessert Shop
8,"Cliffcrest, Cliffside, Scarborough Village West",4.0,Motel,American Restaurant,Women's Store,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
9,"Birch Cliff, Cliffside West",4.0,Skating Rink,College Stadium,Café,General Entertainment,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
10,"Dorset Park, Scarborough Town Centre, Wexford ...",4.0,Indian Restaurant,Furniture / Home Store,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Thrift / Vintage Store,Women's Store,Deli / Bodega,Department Store,Dessert Shop
11,"Maryvale, Wexford",4.0,Auto Garage,Breakfast Spot,Bakery,Sandwich Place,Middle Eastern Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Eastern European Restaurant,Ethiopian Restaurant
12,Agincourt,4.0,Latin American Restaurant,Skating Rink,Clothing Store,Lounge,Breakfast Spot,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store


In summary, I think the neighborhoods of Toronto can be divided mainly into 3 clusters: the downtown areas, the suburban areas with mainly parks, and the suburban areas with residences nearby.