<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto (Part 3)</font></h1>

<h1 align=center><font size = 1>Part 1 - Scrape Wikipedia Table</font></h1>

In this notebook, I scrape the table of postal codes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and load it into a pandas dataframe.

First let's install the necessary packages and libraries.

In [1]:
#install relevant packages
!pip install beautifulsoup4
!pip install lxml
!pip install requests

#import relevant libraries
from bs4 import BeautifulSoup
import numpy as np 
import pandas as pd 
import requests



Now let's assign the website link to a variable named postal_codes.

In [2]:
#assign name to link
postal_codes = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
postal_codes

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of postal codes of Canada: M - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":890001695,"wgRevisionId":890001695,"wgArticleId":539066,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat"

You can see the data comes in a big, messy "soup". Let's clean it up.

We can create a BeautifulSoup object to parse the HTML and call prettify() to view how the tags are nested in the document.

In [3]:
#create a BeautifulSoup object (imported in the first cell) and use prettify() to view tags
soup = BeautifulSoup(postal_codes,"lxml")
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":890001695,"wgRevisionId":890001695,"wgArticleId":539066,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wg

The data is a little more readable, but we still need to extract and clean the information for the columns "PostalCode", "Borough", and "Neighborhood".
Looking at the HTML source, we see that all the table data we need sits inside the table with class 'wikitable sortable'.

In [4]:
#find the table with class 'wikitable sortable' in the HTML source
toronto_table = soup.find("table",{"class":"wikitable sortable"})
toronto_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

Now let's turn the extracted table into a dataframe. First we create one list per column ("PostalCode", "Borough", and "Neighborhood") and fill each list with the text found inside the HTML tr (row) and td (cell) tags, using the findAll function. Then we build the dataframe from those lists.

In [5]:
#create our table columns as lists
PostalCode = []
Borough = []
Neighborhood = []

In [6]:
#extract data from the rows of the table found above and append to lists
#(searching toronto_table rather than the whole page avoids picking up rows from other tables)
for row in toronto_table.findAll("tr"):
    cells = row.findAll("td")
    if len(cells) == 3:
        PostalCode.append(cells[0].findAll(text = True))
        Borough.append(cells[1].findAll(text = True))
        Neighborhood.append(cells[2].findAll(text = True))

In [7]:
#create dataframe
table = pd.DataFrame(data=[PostalCode,Borough,Neighborhood]).transpose()

#name columns
table.columns = ['PostalCode', 'Borough', 'Neighborhood']

#adjust type so we can format and clean the dataframe
table['PostalCode'] = table['PostalCode'].str.get(0)
table['Borough'] = table['Borough'].str.get(0)
table['Neighborhood'] = table['Neighborhood'].str.get(0)
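
As a minimal sketch of the row-by-row extraction above, the snippet below parses a tiny inline table (rows copied from the output shown earlier) rather than the live page. It uses Python's built-in "html.parser" backend instead of lxml, and get_text(strip=True) instead of findAll(text = True); both are substitutions made for illustration, not what the notebook itself runs.

```python
from bs4 import BeautifulSoup

# A three-row excerpt of the Wikipedia table, inlined so no network access is needed.
html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    # The header row holds <th> cells only, so it yields no <td> and is skipped.
    if len(cells) == 3:
        # get_text(strip=True) returns the cell text with the trailing newline removed.
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

Because the text is stripped while it is read, this variant would not need the later step that removes '\n' from the neighborhood names.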

Now that we have our dataframe set up, let's clean it up.

In [8]:
#clean up neighborhoods with '\n' at the end
table['Neighborhood'] = table['Neighborhood'].replace('\n','',regex=True)

#remove boroughs listed as not assigned
table.drop(table[table.Borough == "Not assigned"].index, inplace=True)

#remove exact duplicate rows (drop_duplicates returns a new dataframe, so assign it back)
table = table.drop_duplicates()

#assign neighborhood equal to borough if not assigned
table.loc[table['Neighborhood'] == 'Not assigned', 'Neighborhood'] = table['Borough']

#group same postal codes on one line, separating neighborhoods by commas
table_group = table.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()

#view table
table_group

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [9]:
#print number of rows in the dataframe
table_group.shape

(103, 3)
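
The cleaning steps can be tried on a toy dataframe to see what each one does. This is only a sketch: the M5A rows come from the table shown earlier, while the M7A row is a hypothetical example of a borough whose neighborhood is "Not assigned".

```python
import pandas as pd

# Toy rows mirroring the scraped table; the M7A row is hypothetical.
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Drop rows whose borough is unassigned.
toy = toy[toy["Borough"] != "Not assigned"].copy()

# Where the neighborhood is unassigned, fall back to the borough name.
mask = toy["Neighborhood"] == "Not assigned"
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]

# Collapse rows that share a postal code into one comma-separated line.
toy_group = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)
print(toy_group)
```

Using .loc with a boolean mask writes to the dataframe itself, which avoids the chained-assignment warning pandas raises when a selection is indexed a second time before assigning.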

There you have it!

<h1 align=center><font size = 1>Part 2 - Pulling in Location Coordinates</font></h1>

Now we are going to pull in the latitude and longitude for each postal code using the Google Maps Geocoding CSV file.

In [77]:
#import pandas library
import pandas as pd

#download csv file
geo_data = pd.read_csv("https://cocl.us/Geospatial_data")
geo_data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Now that we have the location data available, pull in the latitude/longitude values for each postal code.

In [11]:
#combine our postal code table with the lat/long values from the csv file
geo_table_join = pd.merge(table_group, geo_data, left_on='PostalCode', right_on='Postal Code', how = 'left')
geo_table_join

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,M1J,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",M1K,43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",M1L,43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",M1M,43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",M1N,43.692657,-79.264848
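
A left join keeps every row of the left table even when no coordinates are found, so it can be worth checking for unmatched postal codes. The sketch below does this on toy data with the indicator and validate options of pd.merge; the "M0X" row is a made-up code added only to show what a miss looks like.

```python
import pandas as pd

# Toy versions of the two tables; "M0X" is a hypothetical code with no coordinates.
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M0X"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a "_merge" column; validate raises if a key repeats on either side.
checked = pd.merge(
    left, right,
    left_on="PostalCode", right_on="Postal Code",
    how="left", indicator=True, validate="one_to_one",
)

# Rows marked "left_only" found no match and carry NaN coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```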


In [86]:
#remove the extraneous column of postal codes
geo_table = geo_table_join.drop(['Postal Code'], axis=1)
geo_table

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


All done!

<h1 align=center><font size = 1>Part 3 - Analysis</font></h1>

Now we are going to explore and cluster the neighborhoods in Toronto. 

In [13]:
#install packages and import libraries to be used for analysis

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import; the pandas.io.json path is deprecated in newer pandas)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Now we will use the geopy library to get the latitude and longitude values of Toronto, Canada.

In [46]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.653963, -79.387207.


Now let's create a map of Toronto.

In [47]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(geo_table['Latitude'], geo_table['Longitude'], geo_table['Borough'], geo_table['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='pink',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Let's use the Foursquare API to explore and segment the neighborhoods.

In [48]:
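#define foursquare credentials
#read them from environment variables instead of hard-coding secrets in the notebook
#(the variable names below are placeholders; use whatever names you have set)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605' 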

What are the top 5 venues within a radius of 1000 meters of the first postal code (M1B) in Scarborough?

In [49]:
#get the latitude and longitude of the first row (postal code M1B) and the name of its borough
scarborough_latitude = geo_table.loc[0, 'Latitude'] 
scarborough_longitude = geo_table.loc[0, 'Longitude']
scarborough_name = geo_table.loc[0, 'Borough']

print('Latitude and longitude values of {} are {}, {}.'.format(scarborough_name, 
                                                               scarborough_latitude, 
                                                               scarborough_longitude))

Latitude and longitude values of Scarborough are 43.806686299999996, -79.19435340000001.


In [50]:
#limit to 5 venues and 1000 meters
limit = 5
radius = 1000

#create url and get results
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    scarborough_latitude, 
    scarborough_longitude, 
    radius, 
    limit)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cb14e101ed2196d933d4531'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 15,
  'suggestedBounds': {'ne': {'lat': 43.81568630900001,
    'lng': -79.18190576146081},
   'sw': {'lat': 43.797686290999984, 'lng': -79.20680103853921}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d669cba83865481c948fa53',
       'name': 'Images Salon & Spa',
       'location': {'address': '8130 Sheppard Ave E',
        'crossStreet': 'Morningside Ave',
        'lat': 43.80228301948931,
        'lng': -79.19856472801668,
        'labeledLatLngs'
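
The request URL can also be assembled from a dictionary of parameters, which keeps each value next to its name. The sketch below uses urlencode from the standard library; build_explore_url is a hypothetical helper and the credentials are placeholders, so nothing is sent over the network.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude as one value
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, so the comma in "ll" becomes %2C.
    return BASE_URL + "?" + urlencode(params)

example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    43.806686, -79.194353, 1000, 5,
)
print(example_url)
```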

Extract the category of each venue. Then clean the JSON response and create a pandas dataframe.

In [51]:
#function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#name results and normalize json file
venues = results['response']['groups'][0]['items']
from pandas import json_normalize
nearby_venues = json_normalize(venues) 

#filter columns and category for each row
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Images Salon & Spa,Spa,43.802283,-79.198565
1,Caribbean Wave,Caribbean Restaurant,43.798558,-79.195777
2,Harvey's,Fast Food Restaurant,43.800106,-79.198258
3,Staples Morningside,Paper / Office Supplies Store,43.800285,-79.196607
4,Wendy's,Fast Food Restaurant,43.802008,-79.19808
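
To see what the normalization step produces, here is a sketch on a hand-written list shaped like the 'items' entries in the response. The first venue reuses values shown above; the second is hypothetical and has an empty category list to exercise the None branch.

```python
import pandas as pd

# Hand-written stand-in for results['response']['groups'][0]['items'].
items = [
    {"venue": {"name": "Images Salon & Spa",
               "categories": [{"name": "Spa"}],
               "location": {"lat": 43.802283, "lng": -79.198565}}},
    {"venue": {"name": "Hypothetical Venue",  # made-up entry with no category
               "categories": [],
               "location": {"lat": 43.8, "lng": -79.19}}},
]

# Nested dictionaries become dotted column names such as 'venue.location.lat';
# lists (here the categories) are left as they are.
flat = pd.json_normalize(items)

# Take the first category name, or None when the list is empty.
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "category"]])
```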


In [20]:
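#collect the venues near every neighborhood into one dataframe
#(limit is now an explicit argument instead of a global; its default matches the value set above)
def getNearbyVenues(names, latitudes, longitudes, radius=1000, limit=5):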
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
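
The function above indexes categories[0] directly, which would raise an IndexError for a venue that comes back without a category. As a hedged sketch, the row-building step could be written with a small guard; first_category_name and venue_rows are hypothetical helpers, and the second item below is made up to show the empty case.

```python
def first_category_name(venue):
    """Return the first category name, or None if the venue has no categories."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None

def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, in the same column order as getNearbyVenues."""
    return [
        (
            name,
            lat,
            lng,
            item["venue"]["name"],
            item["venue"]["location"]["lat"],
            item["venue"]["location"]["lng"],
            first_category_name(item["venue"]),
        )
        for item in items
    ]

# Stand-in for one API response; the second venue is hypothetical.
items = [
    {"venue": {"name": "Images Salon & Spa",
               "location": {"lat": 43.802283, "lng": -79.198565},
               "categories": [{"name": "Spa"}]}},
    {"venue": {"name": "Hypothetical Venue",
               "location": {"lat": 43.8, "lng": -79.19},
               "categories": []}},
]

rows = venue_rows("Rouge, Malvern", 43.806686, -79.194353, items)
print(rows)
```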

In [63]:
#list nearby venues
toronto_venues = getNearbyVenues(names=geo_table['Neighborhood'],
                                   latitudes=geo_table['Latitude'],
                                   longitudes=geo_table['Longitude'])

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [64]:
#view venues dataframe
toronto_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,"Rouge, Malvern",43.806686,-79.194353,Caribbean Wave,43.798558,-79.195777,Caribbean Restaurant
2,"Rouge, Malvern",43.806686,-79.194353,Harvey's,43.800106,-79.198258,Fast Food Restaurant
3,"Rouge, Malvern",43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store
4,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
5,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Shamrock Burgers,43.783823,-79.168406,Burger Joint
6,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Fratelli Village Pizzeria,43.784008,-79.169787,Italian Restaurant
7,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Ted's Restaurant,43.784468,-79.1692,Breakfast Spot
8,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Colonel Danforth Park,43.777507,-79.164303,Playground
9,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Centennial Park,43.786257,-79.148776,Park


Analyze the distribution of venue types within each neighborhood.

In [87]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
grouped.head(10)

Unnamed: 0,Neighborhood,Wings Joint,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Beer Bar,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Café,Caribbean Restaurant,Chinese Restaurant,Churrascaria,Clothing Store,Coffee Shop,College Rec Center,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Liquor Store,Lounge,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Museum,Noodle House,Organic Grocery,Paper / Office Supplies Store,Park,Pastry Shop,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shopping Mall,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theme Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
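
Each value in the table above is the share of a neighborhood's venues that fall in a category, so every row sums to 1. A toy sketch of the same one-hot and group-mean steps, using made-up neighborhoods "A" and "B":

```python
import pandas as pd

# Made-up venue list: neighborhood A has 2 venues, B has 4.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B", "B"],
    "Venue Category": ["Park", "Café", "Park", "Park", "Bank", "Café"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of the 0/1 columns is the frequency of each category per neighborhood.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

With only 5 venues requested per neighborhood, the frequencies in the real table can only be multiples of 0.2, which is why so many categories tie.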


What are the top 5 most common venues in each neighborhood?


In [88]:
top_venues = 5

for hood in grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = grouped[grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(top_venues))
    print('\n')

----Adelaide, King, Richmond----
          venue  freq
0  Concert Hall   0.2
1         Plaza   0.2
2    Steakhouse   0.2
3         Hotel   0.2
4     Speakeasy   0.2


----Agincourt----
                   venue  freq
0   Caribbean Restaurant   0.4
1     Chinese Restaurant   0.2
2         Breakfast Spot   0.2
3  Sri Lankan Restaurant   0.2
4           Liquor Store   0.0


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                           venue  freq
0             Chinese Restaurant   0.2
1           Caribbean Restaurant   0.2
2  Vegetarian / Vegan Restaurant   0.2
3                    Event Space   0.2
4                   Noodle House   0.2


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                 venue  freq
0        Grocery Store   0.4
1  Fried Chicken Joint   0.2
2       Sandwich Place   0.2
3             Pharmacy   0.2
4       Hardware Store   0.0


----Alderwood, Long Branch----
 

It looks like restaurants appear on many of the top 5 venue lists.

Let's combine the above into one big dataframe of all the top 5 venues by neighborhood.

In [89]:
#sort venues in descending order
def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_venues]

In [90]:
#ordinal suffixes used to build the column names (1st, 2nd, 3rd)
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = grouped['Neighborhood']

for ind in np.arange(grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Concert Hall,Speakeasy,Hotel,Plaza,Steakhouse
1,Agincourt,Caribbean Restaurant,Breakfast Spot,Sri Lankan Restaurant,Chinese Restaurant,Warehouse Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Event Space,Vegetarian / Vegan Restaurant,Noodle House,Chinese Restaurant,Caribbean Restaurant
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pharmacy,Fried Chicken Joint,Sandwich Place,Dessert Shop
4,"Alderwood, Long Branch",Grocery Store,Gym,Pizza Place,Discount Store,Pub
5,"Bathurst Manor, Downsview North, Wilson Heights",Park,Restaurant,Coffee Shop,Deli / Bodega,Bridal Shop
6,Bayview Village,Café,Fast Food Restaurant,Bank,Chinese Restaurant,Japanese Restaurant
7,"Bedford Park, Lawrence Manor East",Café,Bagel Shop,Restaurant,Hardware Store,Italian Restaurant
8,Berczy Park,Concert Hall,Liquor Store,Farmers Market,Museum,Steakhouse
9,"Birch Cliff, Cliffside West",General Entertainment,Park,Gym,Thai Restaurant,Café
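
Note that a neighborhood with fewer than five distinct categories still gets five entries: for Agincourt, only four categories have a nonzero frequency in the printout above, so the fifth slot is filled by a category with frequency 0. One possible variant, sketched here with a hypothetical helper and toy frequencies, keeps only categories that actually occur:

```python
import pandas as pd

def top_nonzero_categories(freqs, n):
    """Return up to n category names with frequency above zero, most frequent first."""
    nonzero = freqs[freqs > 0]
    return nonzero.sort_values(ascending=False).index.tolist()[:n]

# Toy frequency row: two categories occur, two do not.
row = pd.Series({"Park": 0.4, "Café": 0.2, "Bank": 0.0, "Gym": 0.0})
print(top_nonzero_categories(row, 3))
```

The result can be shorter than n, so code that writes it into fixed columns would need to pad the missing slots.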


From the first few rows, it looks like grocery stores, cafés, and restaurants are among the most frequent top venues.

Now let's cluster the neighborhoods into 5 clusters by running k-means.

In [91]:
# set number of clusters
kclusters = 5

grouped_clustering = grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe

kmeans.labels_[0:10] 

array([0, 4, 4, 0, 0, 3, 2, 2, 0, 2], dtype=int32)
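
For intuition about what the clustering step does, here is a simplified NumPy sketch of the basic k-means loop (Lloyd's algorithm): assign each row to its nearest center, then move each center to the mean of its rows. It is not scikit-learn's implementation, which differs in details such as how the starting centers are chosen; lloyd_kmeans and the toy data are assumptions for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=20, seed=0):
    """Simplified k-means: returns (labels, centers) for the rows of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every row to every center, shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy frequency rows: the first two are similar, as are the last two.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which rows share a label matters, which is also true of the kmeans.labels_ array above.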

In [92]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

merged = geo_table

# left-join neighborhoods_venues_sorted onto geo_table so each neighborhood keeps its latitude/longitude
merged = merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# neighborhoods without a match get NaN from the join; give them their own label, 5
merged['Cluster Labels'] = merged['Cluster Labels'].fillna(5).astype("int")

merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,4,Fast Food Restaurant,Paper / Office Supplies Store,Caribbean Restaurant,Spa,Warehouse Store
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,3,Park,Breakfast Spot,Playground,Burger Joint,Italian Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0,Pizza Place,Fried Chicken Joint,Liquor Store,Food & Drink Shop,Warehouse Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1,Coffee Shop,Pharmacy,Chinese Restaurant,Park,Furniture / Home Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,4,Caribbean Restaurant,Indian Restaurant,Coffee Shop,Hakka Restaurant,Burger Joint
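
The `fillna(5)` step matters because a left join keeps every row of the coordinate table, and any neighborhood missing from `neighborhoods_venues_sorted` (presumably one with no venues returned) ends up with no cluster label. Below is a plain-Python analogue of that join-plus-fallback, as a sketch. The Woburn and Cedarbrae values are taken from the output above; "Example Heights" and its coordinates are hypothetical.

```python
NO_DATA_LABEL = 5  # same fallback value the notebook passes to fillna

# cluster labels keyed by neighborhood (values as in the output above)
cluster_by_neighborhood = {"Woburn": 1, "Cedarbrae": 4}

# coordinate rows; "Example Heights" is hypothetical and has no venue data
geo_rows = [
    {"Neighborhood": "Woburn", "Latitude": 43.770992, "Longitude": -79.216917},
    {"Neighborhood": "Cedarbrae", "Latitude": 43.773136, "Longitude": -79.239476},
    {"Neighborhood": "Example Heights", "Latitude": 43.7, "Longitude": -79.4},
]

# left join: keep every coordinate row, fall back to NO_DATA_LABEL when unmatched
merged_rows = [
    {**row,
     "Cluster Labels": cluster_by_neighborhood.get(row["Neighborhood"],
                                                   NO_DATA_LABEL)}
    for row in geo_rows
]

for row in merged_rows:
    print(row["Neighborhood"], row["Cluster Labels"])
```

Using a label outside the range 0 to 4 keeps the "no data" neighborhoods from being mistaken for members of a real cluster.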


In [94]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one color per k-means cluster plus one for the fallback label 5
colors_array = cm.rainbow(np.linspace(0, 1, kclusters + 1))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the palette directly by cluster label
for lat, lon, poi, cluster in zip(merged['Latitude'], merged['Longitude'], merged['Neighborhood'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
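
Because the merge step adds a fallback label 5 on top of the k-means labels 0 to 4, any palette lookup has to cover six labels. The sketch below shows one way to check that every label gets its own color. `make_palette` is a hypothetical stand-in for the matplotlib colormap call, built on the standard-library `colorsys` module so the example runs without plotting libraries.

```python
import colorsys


def make_palette(n):
    """Return n evenly spaced hex colors around the hue wheel."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


kclusters = 5
labels = list(range(kclusters + 1))  # 0..4 from k-means, plus fallback 5

# one palette entry per label, indexed directly by the label
palette = make_palette(kclusters + 1)
color_by_label = {label: palette[label] for label in labels}

# for contrast: only kclusters colors, indexed with label - 1
short_palette = make_palette(kclusters)
shifted = {label: short_palette[label - 1] for label in labels}

print(len(set(color_by_label.values())))  # number of distinct colors
print(shifted[0] == shifted[5])           # do labels 0 and 5 collide?
```

With the shifted lookup, label 0 wraps around to index -1, the last palette entry, which is the same entry label 5 resolves to, so the two groups would be indistinguishable on the map.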

Now let's examine each cluster, showing the borough alongside the cluster label and top venues.

In [38]:
merged.loc[merged['Cluster Labels'] == 0, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,ClusterLabels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Scarborough,0.0,0.0,0.0,Pizza Place,Fried Chicken Joint,Liquor Store,Food & Drink Shop,Warehouse Store
11,Scarborough,0.0,0.0,0.0,Vietnamese Restaurant,Breakfast Spot,Korean Restaurant,Supermarket,Fish Market
15,Scarborough,0.0,0.0,0.0,Chinese Restaurant,Sandwich Place,Hotpot Restaurant,Bakery,Warehouse Store
18,North York,0.0,0.0,0.0,Pharmacy,Burger Joint,Shopping Mall,Electronics Store,Toy / Game Store
22,North York,0.0,0.0,0.0,Seafood Restaurant,Grocery Store,Hotel,Steakhouse,Ramen Restaurant
23,North York,0.0,0.0,0.0,Grocery Store,Dog Run,French Restaurant,Restaurant,Bank
27,North York,0.0,0.0,0.0,Discount Store,Gym,Japanese Restaurant,History Museum,Italian Restaurant
30,North York,0.0,0.0,0.0,Turkish Restaurant,Airport,Latin American Restaurant,Liquor Store,Warehouse Store
31,North York,0.0,0.0,0.0,Bank,Vietnamese Restaurant,Grocery Store,Pizza Place,Shopping Mall
32,North York,0.0,0.0,0.0,Vietnamese Restaurant,Baseball Field,Restaurant,Farmers Market,Event Space


The first cluster seems very varied: no single venue category dominates the rows shown.

In [39]:
merged.loc[merged['Cluster Labels'] == 1, merged.columns[[1] + list(range(5, merged.shape[1]))]]


Unnamed: 0,Borough,Cluster_Labels,ClusterLabels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Scarborough,1.0,1.0,1.0,Coffee Shop,Pharmacy,Chinese Restaurant,Park,Furniture / Home Store
5,Scarborough,1.0,1.0,1.0,Fast Food Restaurant,Coffee Shop,Pizza Place,Sandwich Place,Warehouse Store
6,Scarborough,1.0,1.0,1.0,Coffee Shop,Discount Store,Chinese Restaurant,Department Store,Warehouse Store
7,Scarborough,1.0,1.0,1.0,Coffee Shop,Beer Store,Bank,Fast Food Restaurant,Dog Run
10,Scarborough,1.0,1.0,1.0,Indian Restaurant,Coffee Shop,Asian Restaurant,Chinese Restaurant,Burger Joint
17,North York,1.0,1.0,1.0,Park,Grocery Store,Korean Restaurant,Coffee Shop,Bakery
24,North York,1.0,1.0,1.0,Pharmacy,Grocery Store,Pizza Place,Coffee Shop,Dessert Shop
29,North York,1.0,1.0,1.0,Coffee Shop,Middle Eastern Restaurant,Massage Studio,Pizza Place,Eastern European Restaurant
34,North York,1.0,1.0,1.0,Coffee Shop,Intersection,Hockey Arena,Portuguese Restaurant,Empanada Restaurant
43,East Toronto,1.0,1.0,1.0,Bookstore,Coffee Shop,Sandwich Place,Ice Cream Shop,Fish Market


Coffee Shop appears among the top five venues of every neighborhood shown in the second cluster.

In [40]:
merged.loc[merged['Cluster Labels'] == 2, merged.columns[[1] + list(range(5, merged.shape[1]))]]


Unnamed: 0,Borough,Cluster_Labels,ClusterLabels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,Scarborough,2.0,2.0,2.0,General Entertainment,Park,Gym,Thai Restaurant,Café
19,North York,2.0,2.0,2.0,Café,Fast Food Restaurant,Bank,Chinese Restaurant,Japanese Restaurant
21,North York,2.0,2.0,2.0,Café,Grocery Store,Korean Restaurant,Asian Restaurant,Hookah Bar
25,North York,2.0,2.0,2.0,Park,Fast Food Restaurant,Caribbean Restaurant,Café,Supermarket
26,North York,2.0,2.0,2.0,Café,Gym / Fitness Center,Supermarket,Caribbean Restaurant,Japanese Restaurant
45,Central Toronto,2.0,2.0,2.0,Café,Park,Churrascaria,Food & Drink Shop,Supermarket
54,Downtown Toronto,2.0,2.0,2.0,Burrito Place,Comic Shop,Pizza Place,Clothing Store,Café
60,Downtown Toronto,2.0,2.0,2.0,Café,Gym,Hotel,Restaurant,Diner
61,Downtown Toronto,2.0,2.0,2.0,Café,Gym,Museum,Restaurant,Dessert Shop
62,North York,2.0,2.0,2.0,Café,Bagel Shop,Restaurant,Hardware Store,Italian Restaurant


Café appears in the top five of every neighborhood shown in the third cluster, and it ranks first in seven of the ten.

In [41]:
merged.loc[merged['Cluster Labels'] == 3, merged.columns[[1] + list(range(5, merged.shape[1]))]]


Unnamed: 0,Borough,Cluster_Labels,ClusterLabels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Scarborough,3.0,3.0,3.0,Park,Breakfast Spot,Playground,Burger Joint,Italian Restaurant
8,Scarborough,3.0,3.0,3.0,Beach,Park,Fast Food Restaurant,Furniture / Home Store,Warehouse Store
20,North York,3.0,3.0,3.0,Park,Pool,Dessert Shop,Empanada Restaurant,Electronics Store
28,North York,3.0,3.0,3.0,Park,Restaurant,Coffee Shop,Deli / Bodega,Bridal Shop
36,East York,3.0,3.0,3.0,Pastry Shop,Skating Rink,Farmers Market,Sandwich Place,Park
44,Central Toronto,3.0,3.0,3.0,Park,Coffee Shop,Gym / Fitness Center,Trail,Bookstore
48,Central Toronto,3.0,3.0,3.0,Park,Bagel Shop,Grocery Store,Café,Empanada Restaurant
59,Downtown Toronto,3.0,3.0,3.0,Park,Sporting Goods Shop,Salad Place,Lake,Empanada Restaurant
81,York,3.0,3.0,3.0,Brewery,Park,Burger Joint,Athletics & Sports,Discount Store
82,West Toronto,3.0,3.0,3.0,Gastropub,Bar,Seafood Restaurant,Italian Restaurant,Park


Park is the most frequent top venue in the fourth cluster.

In [42]:
merged.loc[merged['Cluster Labels'] == 4, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,ClusterLabels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Scarborough,4.0,4.0,4.0,Fast Food Restaurant,Paper / Office Supplies Store,Caribbean Restaurant,Spa,Warehouse Store
4,Scarborough,4.0,4.0,4.0,Caribbean Restaurant,Indian Restaurant,Coffee Shop,Hakka Restaurant,Burger Joint
12,Scarborough,4.0,4.0,4.0,Caribbean Restaurant,Breakfast Spot,Sri Lankan Restaurant,Chinese Restaurant,Warehouse Store
13,Scarborough,4.0,4.0,4.0,Pharmacy,Noodle House,Italian Restaurant,Chinese Restaurant,Caribbean Restaurant
14,Scarborough,4.0,4.0,4.0,Event Space,Vegetarian / Vegan Restaurant,Noodle House,Chinese Restaurant,Caribbean Restaurant


The fifth cluster is rather small, but Caribbean Restaurant and Chinese Restaurant are top venues.
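
Observations like this can be backed by counting, for each category, how many neighborhoods in the cluster list it anywhere in their top five. The sketch below does that for the five rows of the fifth cluster, copied by hand from the output above; the variable names are illustrative.

```python
from collections import Counter

# top-five venue categories for each neighborhood in cluster label 4
cluster_4_rows = [
    ["Fast Food Restaurant", "Paper / Office Supplies Store",
     "Caribbean Restaurant", "Spa", "Warehouse Store"],
    ["Caribbean Restaurant", "Indian Restaurant", "Coffee Shop",
     "Hakka Restaurant", "Burger Joint"],
    ["Caribbean Restaurant", "Breakfast Spot", "Sri Lankan Restaurant",
     "Chinese Restaurant", "Warehouse Store"],
    ["Pharmacy", "Noodle House", "Italian Restaurant",
     "Chinese Restaurant", "Caribbean Restaurant"],
    ["Event Space", "Vegetarian / Vegan Restaurant", "Noodle House",
     "Chinese Restaurant", "Caribbean Restaurant"],
]

# count each category at most once per neighborhood, regardless of rank
coverage = Counter(venue for row in cluster_4_rows for venue in set(row))

n_rows = len(cluster_4_rows)
for venue, count in coverage.most_common(2):
    print(f"{venue}: {count}/{n_rows} neighborhoods")
```

Counting across all five ranks, not just the first, is what surfaces Caribbean Restaurant here: it is ranked first in only two of the five rows but appears in all of them.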