Coursera IBM Capstone
==
## Cluster Toronto Neighborhoods by Ted Hartnell
In this assignment, you will be required to explore, segment, and cluster the neighborhoods in the city of Toronto. Unlike New York's, however, Toronto's neighborhood data is not readily available on the internet. Each data science project is challenging in its own way, so you need to stay agile and practice picking up new libraries and tools quickly as each project demands.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did on the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

Your submission will be a link to your Jupyter Notebook on your GitHub repository.

## Step 1: Load Dependencies

In [1]:
import pandas as pd
from pandas.io.json import json_normalize # transform a JSON file into a pandas dataframe

import numpy as np

In [2]:
pip install wikipedia

Note: you may need to restart the kernel to use updated packages.


In [3]:
# Import the Python Wikipedia Crawl library
import wikipedia
wikipedia.summary("List of neighbourhoods in Toronto", sentences=2)

'The strength and vitality of the many neighbourhoods that make up Toronto, Ontario, Canada has earned the city its unofficial nickname of "the city of neighbourhoods." There are 140 neighbourhoods officially recognized by the City of Toronto  and upwards of 240 official and unofficial neighbourhoods within the city\'s boundaries. Before 1998, Toronto was a much smaller municipality and formed part of Metropolitan Toronto.'

In [4]:
# Import BeautifulSoup to parse the crawled HTML
try: 
    from BeautifulSoup import BeautifulSoup
except ImportError:
    from bs4 import BeautifulSoup

In [5]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [6]:
# Import the Geocoders Library to lookup Latitude and Longitude of Toronto Neighborhoods
import geopy.geocoders
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [7]:
# Instantiate the Geolocator with a long timeout (in seconds) to lookup latitude and longitude values
geolocator = Nominatim(user_agent="exploreToronto", timeout=4)
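
Even with a longer timeout, lookups against a public geocoding service can fail intermittently. The following is a minimal sketch of a retry wrapper, not part of the original notebook: `geocode_with_retry` is a hypothetical helper that accepts any geocoding callable (for example `geolocator.geocode`), and the flaky stand-in function and its coordinates are made up so the sketch runs without network access.

```python
import time

def geocode_with_retry(geocode, query, attempts=3, delay=1.0, errors=(Exception,)):
    """Call geocode(query), retrying on errors or empty results (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            location = geocode(query)
            if location is not None:
                return location
        except errors:
            pass  # fall through and retry
        if attempt < attempts:
            time.sleep(delay * attempt)  # linear back-off between attempts
    return None

# Stand-in geocoder that fails twice before succeeding, to exercise the retry path
calls = {"count": 0}

def flaky_geocode(query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return (43.6548, -79.3883)  # made-up coordinates for illustration

result = geocode_with_retry(flaky_geocode, "Example, Toronto", delay=0.0)
print(result, "after", calls["count"], "calls")
```

With the real geolocator, the call would presumably look like `geocode_with_retry(geolocator.geocode, "The Annex, Toronto")`, ideally with `errors` narrowed to the exception classes the geocoding library raises.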

In [8]:
# Import library to handle REST requests
import requests

In [9]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


In [10]:
# Import Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering stage
from sklearn.cluster import KMeans

# Import Map Rendering Library
import folium

In [11]:
# Have the user manually enter the Foursquare Credentials so they are not stored in the raw Jupyter notebook
# Clear Cell Outputs after running!!!
FOURSQUARE_CLIENT_ID = input("[1/3] Enter your FourSquare Client ID:")
FOURSQUARE_CLIENT_SECRET = input("[2/3] Enter your FourSquare Client Secret:")
FOURSQUARE_VERSION = input("[3/3] Enter your FourSquare Version Number:")
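
As an alternative to typing the credentials into `input()` and clearing the cell output afterwards, the sketch below reads a value from an environment variable and only falls back to a hidden prompt when it is unset. `read_credential` and the environment variable name are assumptions made for illustration; the dummy value is set here only so the example runs without prompting.

```python
import os
from getpass import getpass

def read_credential(env_name, prompt):
    """Return a credential from the environment, prompting only if it is unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    return getpass(prompt)  # typed input is not echoed, so it does not land in the cell output

# Demonstration with a dummy value; real keys would be set outside the notebook
os.environ["FOURSQUARE_CLIENT_ID"] = "dummy-client-id"
client_id = read_credential("FOURSQUARE_CLIENT_ID", "Enter your Foursquare Client ID: ")
print(len(client_id))
```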

## Step 2: Crawl Wikipedia Toronto Neighborhoods

In [12]:
# Get the link to the 'List of Neighborhoods in Toronto' Wikipedia Page and test it
wikipediaNeighborhoods = wikipedia.page("List_of_neighbourhoods_in_Toronto")
print( wikipediaNeighborhoods.title )
print( wikipediaNeighborhoods.url )

List of neighbourhoods in Toronto
https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto


In [13]:
# Get the HTML content from the 'List of Neighborhoods in Toronto' Wikipedia Page as a String
htmlNeighborhoods = wikipediaNeighborhoods.html()

# Check the HTML content was collected
print( htmlNeighborhoods[0:100] )

<div class="mw-parser-output"><div class="thumb tright"><div class="thumbinner" style="width:252px;"


In [14]:
# Parse the HTML so it can be searched by BeautifulSoup
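parsedNeighborhoods = BeautifulSoup(htmlNeighborhoods, 'html.parser') # Name the parser explicitly to avoid the "no parser was explicitly specified" warning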

In [15]:
# Find the list of Neighborhoods at the bottom of the 'List of Neighborhoods in Toronto' Wikipedia Page
find_allNeighborhoods = parsedNeighborhoods.find('table', class_='nowraplinks').find_all('li', class_='')
print('Found {} Neighborhoods in the HTML'.format( len(find_allNeighborhoods) ))

# Check that the first 5 neighborhoods contain useful data
find_allNeighborhoods[:5]

Found 201 Neighborhoods in the HTML


[<li><a href="/wiki/Alexandra_Park,_Toronto" title="Alexandra Park, Toronto">Alexandra Park</a></li>,
 <li><a href="/wiki/Allenby,_Toronto" title="Allenby, Toronto">Allenby</a></li>,
 <li><a href="/wiki/The_Annex" title="The Annex">The Annex</a></li>,
 <li><a href="/wiki/The_Beaches" title="The Beaches">The Beaches</a></li>,
 <li><a href="/wiki/Bedford_Park,_Toronto" title="Bedford Park, Toronto">Bedford Park</a></li>]

In [16]:
# Define the dataframe columns for the Neighborhoods
columnsNeighborhoods = ['Neighborhood', 'URL', 'Title', 'Latitude', 'Longitude'] 

# Instantiate the dataframe
neighborhoods = pd.DataFrame(columns=columnsNeighborhoods)

In [61]:
# Reload the neighborhoods saved by an earlier run, since looking up every neighborhood again takes a very long time
neighborhoods = pd.read_csv("Toronto Neighborhoods 002 Tab-Delineated.csv", sep='\t')
neighborhoods.drop(neighborhoods.columns[0], axis=1, inplace=True) # Drop the index column that to_csv wrote

In [52]:
# Convert crawled neighborhood HTML data into structured DataFrames
# Note that this routine can be run several times to retry latitude/longitude lookups as they very often fail

# Limit the number of neighborhoods that can be added each time this cell is run
MAX_NEIGHBORHOODS = 210
countNeighborhoods = 0

for tag in find_allNeighborhoods:
    
    countNeighborhoods += 1
    if countNeighborhoods > MAX_NEIGHBORHOODS:
        break
    
    nameNeighborhood = tag.a.get_text().strip()
    nameNeighborhood = nameNeighborhood.replace(", Toronto","").replace(" (Toronto)","").replace(" (neighbourhood)","").replace(" (page does not exist)","")

    try:
        urlNeighborhood = tag.a['href'].strip()
    except (KeyError, TypeError):
        # The link has no href attribute
        urlNeighborhood = ''

    try:
        titleNeighborhood = tag.a['title'].strip()
        titleNeighborhood = titleNeighborhood.replace(", Toronto","").replace(" (Toronto)","").replace(" (neighbourhood)","").replace(" (page does not exist)","")
        titleNeighborhood = titleNeighborhood + ', Toronto' # Use the Neighborhood Title to find the latitude and longitude
    except (KeyError, TypeError):
        # The link has no title attribute
        titleNeighborhood = ''

    # Check that the Name being added to the DataFrame is unique
    if nameNeighborhood in neighborhoods["Neighborhood"].tolist():
        print('The Neighborhood of {} has already been loaded into the DataFrame'.format(nameNeighborhood))
        continue
    
    # Look up the coordinates; skip this neighborhood if the lookup fails so that a later run can retry it
    try:
        locationNeighborhood = geolocator.geocode(titleNeighborhood)
        latitudeNeighborhood = locationNeighborhood.latitude
        longitudeNeighborhood = locationNeighborhood.longitude
        print('The geographical coordinate of {} is {}, {}.'.format(titleNeighborhood, latitudeNeighborhood, longitudeNeighborhood))
    except Exception:
        # geocode() returns None when nothing is found and raises on timeouts or service errors
        print('The geographical coordinates of {} failed'.format(titleNeighborhood))
        continue

    print('Adding Neighborhood {}'.format(titleNeighborhood))
    rowNeighborhood = pd.DataFrame([{'Neighborhood': nameNeighborhood,
                                     'URL': urlNeighborhood,
                                     'Title': titleNeighborhood,
                                     'Latitude': latitudeNeighborhood,
                                     'Longitude': longitudeNeighborhood}])
    # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
    neighborhoods = pd.concat([neighborhoods, rowNeighborhood], ignore_index=True)

# Note that not all of the locations of the Neighborhood names from the Wikipedia page can be found by the geolocator
print('Finished adding Neighborhoods {} of {}'.format(len(neighborhoods), len(find_allNeighborhoods), ))

The Neighborhood of Alexandra Park has already been loaded into the DataFrame
The Neighborhood of Allenby has already been loaded into the DataFrame
The Neighborhood of The Annex has already been loaded into the DataFrame
The Neighborhood of The Beaches has already been loaded into the DataFrame
The Neighborhood of Bedford Park has already been loaded into the DataFrame
The Neighborhood of Bickford Park has already been loaded into the DataFrame
The Neighborhood of Bloor West Village has already been loaded into the DataFrame
The Neighborhood of Bloor Street Culture Corridor has already been loaded into the DataFrame
The Neighborhood of Bloordale Village has already been loaded into the DataFrame
The Neighborhood of Bracondale Hill has already been loaded into the DataFrame
The Neighborhood of Brockton Village has already been loaded into the DataFrame
The Neighborhood of Cabbagetown has already been loaded into the DataFrame
The Neighborhood of Carleton Village has already been loaded

The geographical coordinates of Humbermede, Toronto failed
The Neighborhood of Jane and Finch has already been loaded into the DataFrame
The Neighborhood of Lansing has already been loaded into the DataFrame
The Neighborhood of Lawrence Heights has already been loaded into the DataFrame
The Neighborhood of Lawrence Manor has already been loaded into the DataFrame
The Neighborhood of Maple Leaf has already been loaded into the DataFrame
The Neighborhood of Newtonbrook has already been loaded into the DataFrame
The Neighborhood of North York City Centre has already been loaded into the DataFrame
The Neighborhood of Parkway Forest has already been loaded into the DataFrame
The Neighborhood of Parkwoods has already been loaded into the DataFrame
The Neighborhood of Pelmo Park – Humberlea has already been loaded into the DataFrame
The Neighborhood of Pleasant View has already been loaded into the DataFrame
The Neighborhood of Teddington Park has already been loaded into the DataFrame
The Nei
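
The loop above cleans the link text and the link title with the same chain of `replace` calls. A small sketch of that cleanup pulled into helpers is shown below; `clean_neighborhood_name` and `geocoder_query` are hypothetical names, and the list of suffixes is copied from the notebook's own code.

```python
# Suffixes that Wikipedia adds to link text and titles, taken from the loop above
WIKIPEDIA_SUFFIXES = (", Toronto", " (Toronto)", " (neighbourhood)", " (page does not exist)")

def clean_neighborhood_name(raw_name):
    """Strip whitespace and Wikipedia disambiguation suffixes from a name."""
    name = raw_name.strip()
    for suffix in WIKIPEDIA_SUFFIXES:
        name = name.replace(suffix, "")
    return name

def geocoder_query(raw_title):
    """Build the '<name>, Toronto' string that is passed to the geocoder."""
    return clean_neighborhood_name(raw_title) + ", Toronto"

print(clean_neighborhood_name("Alexandra Park, Toronto"))
print(geocoder_query("The Annex"))
```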

In [62]:
# Check that the neighborhoods have been loaded correctly
neighborhoods.head()

,Neighborhood,URL,Title,Latitude,Longitude
0,The Beaches,/wiki/The_Beaches,"The Beaches, Toronto",43.671024,-79.296712
1,Alexandra Park,"/wiki/Alexandra_Park,_Toronto","Alexandra Park, Toronto",43.650758,-79.404298
2,Allenby,"/wiki/Allenby,_Toronto","Allenby, Toronto",43.711351,-79.553424
3,The Annex,/wiki/The_Annex,"The Annex, Toronto",43.670338,-79.407117
4,Bedford Park,"/wiki/Bedford_Park,_Toronto","Bedford Park, Toronto",43.737388,-79.410925


In [63]:
# Save the neighborhoods as a tab-delimited CSV file
neighborhoods.to_csv("Toronto Neighborhoods 003 Tab-Delineated.csv", sep='\t')
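
The reload cell earlier drops the first column because `to_csv` writes the DataFrame index by default. The sketch below, using an in-memory buffer instead of a file and two rows copied from the table above, illustrates the difference that `index=False` makes; it is an illustration rather than a change to the notebook's saved files.

```python
import io
import pandas as pd

frame = pd.DataFrame({"Neighborhood": ["The Beaches", "The Annex"],
                      "Latitude": [43.671024, 43.670338]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
frame.to_csv(with_index, sep="\t")
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index, sep="\t")

# With index=False the round trip returns the original columns only
without_index = io.StringIO()
frame.to_csv(without_index, sep="\t", index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index, sep="\t")

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```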

In [97]:
# Get the Location of the geographic center of Toronto to center the map
nameLocation = "Leaside, Canada" # The geographic center is not "Toronto, Canada"
locationToronto = geolocator.geocode(nameLocation)
print('The geographical coordinate of the city center at {} is {}, {}.'.format(nameLocation, locationToronto.latitude, locationToronto.longitude))

The geographical coordinate of the city center at Leaside, Canada is 43.7047983, -79.3680904.


In [22]:
# Create map of Toronto using {latitude, longitude} values and Neighborhood label
mapToronto = folium.Map(location=[locationToronto.latitude, locationToronto.longitude], zoom_start=11)

# Add markers to Map
for latitude, longitude, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(mapToronto)  
    
mapToronto

## Step 3: Explore Toronto Neighborhoods

In [23]:
# Create a function to collect nearby venues from all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):

    # Count the number of neighborhoods being explored
    countNeighborhoods = 0

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # Update the user on which neighborhood is being explored
        countNeighborhoods += 1
        print('{} of {}: Exploring venues nearby {}.'.format(countNeighborhoods, len(names), name))

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            FOURSQUARE_CLIENT_ID, 
            FOURSQUARE_CLIENT_SECRET, 
            FOURSQUARE_VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # Continue if Error with processing the GET request
        try:
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']

            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
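        except Exception:
            # Skip this neighborhood if the request or the response parsing fails.
            # Catching Exception rather than using a bare except lets Ctrl-C (KeyboardInterrupt) stop the loop.
            pass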

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
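
To make the response handling in `getNearbyVenues` easier to follow, here is a sketch of the flattening step run against a hand-written payload. The nesting (`response` → `groups[0]` → `items` → `venue`) mirrors the keys the function reads; the sample values and the helper name `flatten_explore_response` are made up for illustration, and the fallback for a venue without categories is an assumption about how one might want to handle that case.

```python
def flatten_explore_response(payload, name, lat, lng):
    """Turn one explore response into rows of (neighborhood, coordinates, venue details)."""
    rows = []
    items = payload["response"]["groups"][0]["items"]
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # tolerate missing categories
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hand-written sample payload, not real API output
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 43.651, "lng": -79.401},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Example Lot",
                   "location": {"lat": 43.652, "lng": -79.402},
                   "categories": []}},
    ]}]}
}

rows = flatten_explore_response(sample_payload, "Example Neighborhood", 43.65, -79.40)
for row in rows:
    print(row)
```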

In [24]:
# Limit the number of neighborhoods that can be explored
MAX_NEIGHBORHOODS = 200

# Call Foursquare to explore nearby venues in each Neighborhood
venuesToronto = getNearbyVenues(names=neighborhoods.head(MAX_NEIGHBORHOODS)['Neighborhood'],
                                   latitudes=neighborhoods.head(MAX_NEIGHBORHOODS)['Latitude'],
                                   longitudes=neighborhoods.head(MAX_NEIGHBORHOODS)['Longitude']
                                  )

1 of 190: Exploring venues nearby The Beaches.
2 of 190: Exploring venues nearby Alexandra Park.
3 of 190: Exploring venues nearby Allenby.
4 of 190: Exploring venues nearby The Annex.
5 of 190: Exploring venues nearby Bedford Park.
6 of 190: Exploring venues nearby Bloor Street Culture Corridor.
7 of 190: Exploring venues nearby Bloordale Village.
8 of 190: Exploring venues nearby Bracondale Hill.
9 of 190: Exploring venues nearby Bickford Park.
10 of 190: Exploring venues nearby Bloor West Village.
11 of 190: Exploring venues nearby Cabbagetown.
12 of 190: Exploring venues nearby Carleton Village.
13 of 190: Exploring venues nearby Chaplin Estates.
14 of 190: Exploring venues nearby Christie Pits.
15 of 190: Exploring venues nearby Corktown.
16 of 190: Exploring venues nearby Brockton Village.
17 of 190: Exploring venues nearby Chinatown.
18 of 190: Exploring venues nearby Church and Wellesley.
19 of 190: Exploring venues nearby CityPlace.
20 of 190: Exploring venues nearby Casa Loma

From cffi callback <function _verify_callback at 0x00000203BC4227B8>:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\OpenSSL\SSL.py", line 306, in wrapper
    @wraps(callback)
KeyboardInterrupt


23 of 190: Exploring venues nearby Earlscourt.
24 of 190: Exploring venues nearby East Chinatown.
25 of 190: Exploring venues nearby East Toronto.
26 of 190: Exploring venues nearby Entertainment District.
27 of 190: Exploring venues nearby Financial District.
28 of 190: Exploring venues nearby Kensington Market.
29 of 190: Exploring venues nearby Lawrence Park.
30 of 190: Exploring venues nearby Discovery District.
31 of 190: Exploring venues nearby Davenport.
32 of 190: Exploring venues nearby East Danforth.
33 of 190: Exploring venues nearby Forest Hill.
34 of 190: Exploring venues nearby Harbord Village.
35 of 190: Exploring venues nearby Gerrard India Bazaar.
36 of 190: Exploring venues nearby Grange Park.
37 of 190: Exploring venues nearby Junction Triangle.
38 of 190: Exploring venues nearby Corso Italia.
39 of 190: Exploring venues nearby Distillery District.
40 of 190: Exploring venues nearby Dufferin Grove.
41 of 190: Exploring venues nearby Harbourfront.
42 of 190: Exploring

188 of 190: Exploring venues nearby Thorncliffe Park.
189 of 190: Exploring venues nearby West Hill.
190 of 190: Exploring venues nearby Liberty Village.


In [25]:
# Check how many venues were found in each neighborhood
venuesToronto.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,13,13,13,13,13,13
Alderwood,10,10,10,10,10,10
Alexandra Park,100,100,100,100,100,100
Allenby,10,10,10,10,10,10
Amesbury,5,5,5,5,5,5
Armadale,14,14,14,14,14,14
Armour Heights,4,4,4,4,4,4
Baby Point,4,4,4,4,4,4
Bathurst Manor,4,4,4,4,4,4
Bayview Village,12,12,12,12,12,12


In [26]:
# Check how many unique categories were found
print('There are {} unique categories.'.format(len(venuesToronto['Venue Category'].unique())))

There are 335 unique categories.


In [27]:
# Prepare to Cluster the data
onehotToronto = pd.get_dummies(venuesToronto[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
onehotToronto['Neighborhood'] = venuesToronto['Neighborhood'] 

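# Move neighborhood column to the first column
# Select it by name rather than by position: if a venue category is itself named 'Neighborhood',
# the assignment above overwrites that column instead of appending a new last column
onehotColumns = ['Neighborhood'] + [column for column in onehotToronto.columns if column != 'Neighborhood']
onehotToronto = onehotToronto[onehotColumns]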

# Check what the data looks like
print( onehotToronto.shape )
onehotToronto.head()

(5525, 335)


,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [28]:
# Group the Explored Venues by Neighborhood and take the mean to get the share (0 to 1) of each Venue Category

groupedToronto = onehotToronto.groupby('Neighborhood').mean().reset_index()
groupedToronto

,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.076923,0.000000,0.0,0.000000,0.000000,0.000000,0.00
1,Alderwood,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
2,Alexandra Park,0.010000,0.00,0.000000,0.000000,0.0,0.010000,0.0,0.000000,0.0,...,0.040000,0.000000,0.000000,0.010000,0.000000,0.0,0.010000,0.000000,0.000000,0.00
3,Allenby,0.000000,0.00,0.000000,0.100000,0.1,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
4,Amesbury,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
5,Armadale,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.071429,0.00
6,Armour Heights,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
7,Baby Point,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
8,Bathurst Manor,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
9,Bayview Village,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00
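
The one-hot encoding and the grouped mean can be checked by hand on a tiny example. The sketch below uses made-up neighborhoods and categories; it places the `Neighborhood` column with `DataFrame.insert`, which raises an error if a column of that name already exists, so a clash with a venue category of the same name would surface instead of silently overwriting data.

```python
import pandas as pd

# Six made-up venues in two made-up neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Cafe", "Cafe", "Bar", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of each category within a neighborhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Neighborhood A has four venues, two of which are cafes, so its `Cafe` share is 0.5; every venue in B is a park, so its `Park` share is 1.0.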


In [29]:
# Limit the number of neighborhoods that can be explored
MAX_NEIGHBORHOODS = 5

# Print each neighborhood along with the top 10 most common venues
COUNT_TOP_VENUES = 10

for hood in groupedToronto['Neighborhood'].head(MAX_NEIGHBORHOODS):
    print("----"+hood+"----")
    temp = groupedToronto[groupedToronto['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(COUNT_TOP_VENUES))
    print('\n')

----Agincourt----
                   venue  freq
0     Chinese Restaurant  0.23
1      Korean Restaurant  0.08
2          Shopping Mall  0.08
3            Coffee Shop  0.08
4  Vietnamese Restaurant  0.08
5    Rental Car Location  0.08
6             Restaurant  0.08
7   Cantonese Restaurant  0.08
8          Train Station  0.08
9       Asian Restaurant  0.08


----Alderwood----
            venue  freq
0     Pizza Place   0.2
1    Dance Studio   0.1
2    Skating Rink   0.1
3  Sandwich Place   0.1
4     Coffee Shop   0.1
5        Pharmacy   0.1
6             Gym   0.1
7            Pool   0.1
8             Pub   0.1
9       Nightclub   0.0


----Alexandra Park----
                           venue  freq
0                            Bar  0.10
1                           Café  0.04
2  Vegetarian / Vegan Restaurant  0.04
3         Furniture / Home Store  0.04
4             Italian Restaurant  0.04
5                     Restaurant  0.03
6                    Coffee Shop  0.03
7               Asia

In [30]:
# Sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
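
A quick way to see what `return_most_common_venues` returns is to call it on a single hand-made row. The function is restated below so the snippet runs on its own; the example row and its frequencies are made up.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# One made-up row shaped like a row of groupedToronto
example_row = pd.Series({"Neighborhood": "Example", "Bar": 0.1, "Café": 0.6, "Park": 0.3})
top_two = list(return_most_common_venues(example_row, 2))
print(top_two)
```

The example avoids equal frequencies on purpose: categories that tie (such as the several 0.08 values for Agincourt above) are not guaranteed to come out in any particular order.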

In [45]:
# Create a new DataFrame with the Top-10 Venues for each Neighborhood
COUNT_TOP_VENUES = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(COUNT_TOP_VENUES):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # Beyond the third column, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
sortedNeighborhoodVenues = pd.DataFrame(columns=columns)
sortedNeighborhoodVenues['Neighborhood'] = groupedToronto['Neighborhood']

for ind in np.arange(groupedToronto.shape[0]):
    sortedNeighborhoodVenues.iloc[ind, 1:] = return_most_common_venues(groupedToronto.iloc[ind, :], COUNT_TOP_VENUES)

sortedNeighborhoodVenues.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Train Station,Cantonese Restaurant,Korean Restaurant,Vietnamese Restaurant,Hong Kong Restaurant,Shopping Mall,Restaurant,Rental Car Location,Asian Restaurant
1,Alderwood,Pizza Place,Pool,Dance Studio,Pub,Gym,Sandwich Place,Skating Rink,Pharmacy,Coffee Shop,Farm
2,Alexandra Park,Bar,Café,Vegetarian / Vegan Restaurant,Italian Restaurant,Furniture / Home Store,Restaurant,Coffee Shop,Pizza Place,Asian Restaurant,Caribbean Restaurant
3,Allenby,Restaurant,Afghan Restaurant,African Restaurant,Fast Food Restaurant,Bookstore,Intersection,Fish & Chips Shop,Discount Store,Café,Big Box Store
4,Amesbury,Bank,Park,Intersection,Athletics & Sports,Coffee Shop,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm
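
The column names above are built with a three-entry suffix list plus an exception fallback, which is fine for ten columns but would label a 21st column "21th". A sketch of a general ordinal helper follows; `ordinal` is a hypothetical name and is not used elsewhere in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns)
```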


## Step 4: Cluster Toronto Neighborhoods

In [86]:
# Run k-means to cluster the neighborhoods into 6 clusters
COUNT_K_MEANS_CLUSTERS = 6

clusteredTorontoGroups = groupedToronto.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=COUNT_K_MEANS_CLUSTERS, random_state=0).fit(clusteredTorontoGroups)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:300]

array([5, 5, 0, 5, 0, 5, 5, 3, 1, 5, 5, 5, 5, 0, 0, 1, 0, 0, 0, 1, 5, 0,
       5, 0, 1, 0, 5, 0, 0, 0, 0, 5, 1, 5, 5, 0, 0, 1, 5, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 5, 0, 0, 5, 0, 0, 0, 0, 0, 1, 0, 0, 5, 0, 3,
       0, 5, 5, 0, 0, 1, 0, 1, 0, 0, 3, 1, 5, 5, 1, 5, 5, 0, 0, 3, 0, 0,
       5, 5, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 1, 5, 0, 5, 5, 5,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 0, 0, 1, 0, 0, 5, 5, 0, 5, 3,
       5, 0, 3, 0, 0, 0, 0, 0, 5, 5, 5, 0, 1, 1, 0, 0, 5, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 1, 5, 0, 0, 0, 5, 0, 0, 5, 5, 5, 0, 5, 5, 0, 0, 0,
       0, 5, 5, 5, 0, 0, 0, 5, 0, 0, 4, 5, 0])
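
To illustrate what the label array means, the sketch below runs k-means on four made-up rows of category shares, two dominated by the first category and two by the last. The data and the choice of two clusters are assumptions for illustration; `n_init` is set explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four made-up neighborhoods described by the share of three venue categories
features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
labels = model.labels_

# Number of neighborhoods assigned to each cluster
sizes = np.bincount(labels, minlength=2)
print(labels, sizes)
```

The label numbers are arbitrary identifiers: rows with similar profiles share a label, but which group is called 0 and which 1 carries no meaning.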

In [87]:
# Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

# Add clustering labels
labeledSortedNeighborhoodVenues = sortedNeighborhoodVenues.copy()

labeledSortedNeighborhoodVenues.insert(0, 'Cluster Labels', kmeans.labels_)

# Extend the original Neighborhood Data
mergedToronto = neighborhoods.copy()

# Merge the grouped data with the neighborhood data to add latitude/longitude for each neighborhood
mergedToronto = mergedToronto.join(labeledSortedNeighborhoodVenues.set_index('Neighborhood'), on='Neighborhood')

mergedToronto.head() # Check the merged columns

,Neighborhood,URL,Title,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,/wiki/The_Beaches,"The Beaches, Toronto",43.671024,-79.296712,0.0,Beach,Park,Coffee Shop,Japanese Restaurant,Breakfast Spot,Tea Room,Bar,Pizza Place,Thai Restaurant,Pub
1,Alexandra Park,"/wiki/Alexandra_Park,_Toronto","Alexandra Park, Toronto",43.650758,-79.404298,0.0,Bar,Café,Vegetarian / Vegan Restaurant,Italian Restaurant,Furniture / Home Store,Restaurant,Coffee Shop,Pizza Place,Asian Restaurant,Caribbean Restaurant
2,Allenby,"/wiki/Allenby,_Toronto","Allenby, Toronto",43.711351,-79.553424,5.0,Restaurant,Afghan Restaurant,African Restaurant,Fast Food Restaurant,Bookstore,Intersection,Fish & Chips Shop,Discount Store,Café,Big Box Store
3,The Annex,/wiki/The_Annex,"The Annex, Toronto",43.670338,-79.407117,0.0,Pizza Place,Coffee Shop,Grocery Store,Bistro,Thai Restaurant,Ice Cream Shop,Park,Indian Restaurant,Korean Restaurant,Bakery
4,Bedford Park,"/wiki/Bedford_Park,_Toronto","Bedford Park, Toronto",43.737388,-79.410925,5.0,Seafood Restaurant,Rental Car Location,Gym / Fitness Center,Women's Store,Fast Food Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
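
The cluster labels appear as `0.0` and `5.0` in the merged table because the join is a left join: a neighborhood that has no row in the venue table gets a missing label, and a column containing missing values is stored as floats. The sketch below reproduces this on made-up data and shows one possible way to restore integer labels, by dropping the unlabeled rows first.

```python
import pandas as pd

# Three made-up neighborhoods, only two of which received a cluster label
neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"],
                              "Latitude": [43.65, 43.66, 43.67]})
labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [0, 2]})

merged = neighborhoods.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].dtype)  # float64, because B has no label

# Keep only labeled neighborhoods and convert the labels back to integers
clustered = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(clustered)
```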


### Map of 6 Clusters

In [88]:
# Visualize the Clusters

# Create a map
map_clusters = folium.Map(location=[locationToronto.latitude, locationToronto.longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(COUNT_K_MEANS_CLUSTERS)
ys = [i + x + (i*x)**2 for i in range(COUNT_K_MEANS_CLUSTERS)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mergedToronto['Latitude'], mergedToronto['Longitude'], mergedToronto['Neighborhood'], mergedToronto['Cluster Labels']):
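    try:
        idCluster = int(cluster)
    except (ValueError, TypeError):
        # Neighborhoods without venues have no cluster label (NaN); they are drawn as cluster 0
        idCluster = 0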
    label = folium.Popup(str(poi) + ' Cluster ' + str(idCluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[idCluster-1],
        fill=True,
        fill_color=rainbow[idCluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
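
The color handling in the map cell can be tried without rendering a map. The sketch below builds one hex color per cluster from the same `rainbow` colormap and, as an assumption that differs from the cell above, gives neighborhoods without a label their own gray color instead of folding them into cluster 0. `cluster_palette` and `color_for` are hypothetical helper names.

```python
import math

import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

def cluster_palette(n_clusters):
    """Return one hex color per cluster, spread evenly across the rainbow colormap."""
    return [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, n_clusters))]

def color_for(label, palette, missing="#808080"):
    """Map a cluster label to its color; unlabeled (NaN or None) entries get a gray marker."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return missing
    return palette[int(label)]

palette = cluster_palette(6)
print(palette)
print(color_for(5.0, palette), color_for(float("nan"), palette))
```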

## Step 5: Examine Toronto Clusters

### Cluster 0

In [90]:
mergedToronto.loc[mergedToronto['Cluster Labels'] == 0, mergedToronto.columns[[1] + list(range(5, mergedToronto.shape[1]))]]

,URL,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,/wiki/The_Beaches,0.0,Beach,Park,Coffee Shop,Japanese Restaurant,Breakfast Spot,Tea Room,Bar,Pizza Place,Thai Restaurant,Pub
1,"/wiki/Alexandra_Park,_Toronto",0.0,Bar,Café,Vegetarian / Vegan Restaurant,Italian Restaurant,Furniture / Home Store,Restaurant,Coffee Shop,Pizza Place,Asian Restaurant,Caribbean Restaurant
3,/wiki/The_Annex,0.0,Pizza Place,Coffee Shop,Grocery Store,Bistro,Thai Restaurant,Ice Cream Shop,Park,Indian Restaurant,Korean Restaurant,Bakery
5,/wiki/Bloor_Street_Culture_Corridor,0.0,Coffee Shop,Boutique,Café,Restaurant,Clothing Store,French Restaurant,Women's Store,Spa,Cosmetics Shop,Italian Restaurant
6,/wiki/Bloordale_Village,0.0,Bar,Café,Cocktail Bar,Caribbean Restaurant,Sandwich Place,Bakery,Pub,Gift Shop,Portuguese Restaurant,Thrift / Vintage Store
8,/wiki/Palmerston%E2%80%93Little_Italy,0.0,Bar,Italian Restaurant,Café,Sandwich Place,Korean Restaurant,Pub,Art Gallery,Record Shop,Pizza Place,Asian Restaurant
9,/wiki/Bloor_West_Village,0.0,Coffee Shop,Café,Sushi Restaurant,Pizza Place,Fish & Chips Shop,Bookstore,Gourmet Shop,Falafel Restaurant,Bar,Boutique
10,"/wiki/Cabbagetown,_Toronto",0.0,Restaurant,Coffee Shop,Café,Diner,Bakery,Pub,Japanese Restaurant,Italian Restaurant,Beer Store,Gastropub
13,/wiki/Christie_Pits,0.0,Grocery Store,Café,Indian Restaurant,Ethiopian Restaurant,Park,Cocktail Bar,Italian Restaurant,Diner,Nightclub,Restaurant
14,"/wiki/Corktown,_Toronto",0.0,Coffee Shop,Park,Gym / Fitness Center,Pub,Restaurant,Pizza Place,Auto Dealership,Spa,Mediterranean Restaurant,Fast Food Restaurant


### Cluster 1

In [92]:
mergedToronto.loc[mergedToronto['Cluster Labels'] == 1, mergedToronto.columns[[1] + list(range(5, mergedToronto.shape[1]))]]

,URL,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,/wiki/Bracondale_Hill,1.0,Park,Bakery,Bar,Grocery Store,Coffee Shop,Art Gallery,Other Repair Shop,Flower Shop,Ethiopian Restaurant,Fountain
11,/wiki/Carleton_Village,1.0,Jewelry Store,Park,Bus Line,Dog Run,Fish & Chips Shop,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
32,"/wiki/Forest_Hill,_Toronto",1.0,Playground,Park,Mediterranean Restaurant,Bank,Women's Store,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm
60,"/wiki/Swansea,_Toronto",1.0,Park,Restaurant,Skating Rink,Social Club,Bus Line,Women's Store,Event Space,Exhibit,Falafel Restaurant,Farm
67,/wiki/Lawrence_Manor,1.0,Bank,Electronics Store,Park,Kids Store,Doctor's Office,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm
72,"/wiki/Highland_Creek,_Toronto",1.0,Park,Pharmacy,Women's Store,Elementary School,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market
81,"/wiki/Old_Mill,_Toronto",1.0,Park,Spa,American Restaurant,Metro Station,Women's Store,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm
84,/wiki/Crescent_Town,1.0,Baseball Field,Park,Convenience Store,Metro Station,Golf Course,Women's Store,Filipino Restaurant,Exhibit,Falafel Restaurant,Farm
87,/wiki/Dovercourt_Park,1.0,Café,Park,Brazilian Restaurant,Coffee Shop,Bar,Food Court,Fountain,Exhibit,Falafel Restaurant,French Restaurant
102,/wiki/Henry_Farm,1.0,Park,Tennis Court,Women's Store,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
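
The same column selection is repeated for every cluster, so it could be wrapped in a small helper. The sketch below is only an illustration and is not part of the original notebook: `show_cluster` and the frame `toy` are hypothetical names, and the toy columns `Neighbourhood`, `Latitude`, `Longitude`, and `Venue Count` merely stand in for whatever occupies positions 0 and 2 to 4 of `mergedToronto`.

```python
import pandas as pd

def show_cluster(df, label, id_pos=1, first_pos=5):
    """Return one cluster's rows: the identifier column plus every column from first_pos onward."""
    keep = df.columns[[id_pos] + list(range(first_pos, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, keep]

# Toy stand-in for mergedToronto. Columns at positions 0 and 2-4 are
# placeholders (assumed), filled with dummy values.
toy = pd.DataFrame({
    'Neighbourhood': ['Baby Point', 'Swansea', 'Old Mill'],
    'URL': ['/wiki/Baby_Point', '/wiki/Swansea,_Toronto', '/wiki/Old_Mill,_Toronto'],
    'Latitude': [0.0, 0.0, 0.0],
    'Longitude': [0.0, 0.0, 0.0],
    'Venue Count': [0, 0, 0],
    'Cluster Labels': [3.0, 1.0, 1.0],
    '1st Most Common Venue': ['Park', 'Park', 'Park'],
})

cluster_1 = show_cluster(toy, 1)
print(cluster_1)
```

With the real data, `show_cluster(mergedToronto, 1)` should return the same frame as the cell above, provided the column layout matches the assumed positions.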


### Cluster 3

In [94]:
# Same column selection as before, filtered to Cluster 3.
mergedToronto.loc[mergedToronto['Cluster Labels'] == 3, mergedToronto.columns[[1] + list(range(5, mergedToronto.shape[1]))]]

Unnamed: 0,URL,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,/wiki/Pelmo_Park_%E2%80%93_Humberlea,3.0,Park,Women's Store,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
78,/wiki/Baby_Point,3.0,Park,River,Elementary School,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
116,"/wiki/Port_Union,_Toronto",3.0,Park,Women's Store,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
178,/wiki/Humber_Heights_%E2%80%93_Westmount,3.0,Park,Women's Store,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
180,/wiki/Kingsview_Village,3.0,Park,Women's Store,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
186,"/wiki/Governor%27s_Bridge,_Toronto",3.0,Park,Trail,Women's Store,Elementary School,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market
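
Several rows above end in the same run of categories (Event Space, Exhibit, Falafel Restaurant, Farm, and so on). One plausible explanation, assuming the top-10 ranking was produced by sorting each neighbourhood's per-category frequencies as in the New York lab, is that these neighbourhoods have fewer than ten distinct venue categories and the remaining slots are filled by zero-frequency categories. The sketch below shows one way such slots could be left empty instead. `top_venues` and the example frequencies are hypothetical and not taken from the notebook.

```python
import pandas as pd

def top_venues(freq_row, n=10):
    """Rank categories by frequency, keeping only those that actually occur."""
    ranked = freq_row[freq_row > 0].sort_values(ascending=False)
    names = list(ranked.index[:n])
    # Pad with None so every neighbourhood still yields exactly n slots.
    return names + [None] * (n - len(names))

# Made-up frequencies for a neighbourhood with only two venue categories.
freq = pd.Series({'Exhibit': 0.0, 'Farm': 0.0, 'Park': 0.75, 'River': 0.25})
ranked = top_venues(freq, n=4)
print(ranked)
```

Leaving the empty slots as `None` would keep the table honest about how little venue data a neighbourhood has, at the cost of missing values that any later processing has to handle.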


### Cluster 5

In [96]:
# Same column selection as before, filtered to Cluster 5.
mergedToronto.loc[mergedToronto['Cluster Labels'] == 5, mergedToronto.columns[[1] + list(range(5, mergedToronto.shape[1]))]]

Unnamed: 0,URL,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"/wiki/Allenby,_Toronto",5.0,Restaurant,Afghan Restaurant,African Restaurant,Fast Food Restaurant,Bookstore,Intersection,Fish & Chips Shop,Discount Store,Café,Big Box Store
4,"/wiki/Bedford_Park,_Toronto",5.0,Seafood Restaurant,Rental Car Location,Gym / Fitness Center,Women's Store,Fast Food Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
12,/wiki/Chaplin_Estates,5.0,Restaurant,Italian Restaurant,Gym,Japanese Restaurant,French Restaurant,Breakfast Spot,Mexican Restaurant,Fast Food Restaurant,Sushi Restaurant,Gym / Fitness Center
30,"/wiki/Davenport,_Toronto",5.0,Bus Line,Dog Run,Playground,Convenience Store,Park,Music Venue,Bus Stop,Exhibit,Falafel Restaurant,Farm
31,/wiki/East_Danforth,5.0,Pharmacy,Coffee Shop,Bus Line,Skating Rink,Mexican Restaurant,Café,Baseball Field,Sushi Restaurant,Fried Chicken Joint,Metro Station
46,"/wiki/Parkdale,_Toronto",5.0,Tibetan Restaurant,Café,Restaurant,Bar,Pharmacy,Indian Restaurant,Diner,Chinese Restaurant,Sandwich Place,Liquor Store
50,"/wiki/The_Ward,_Toronto",5.0,Restaurant,Fish & Chips Shop,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
52,/wiki/Bayview_Village,5.0,Sporting Goods Shop,Persian Restaurant,Fish Market,Gas Station,Fast Food Restaurant,Bank,Sandwich Place,Hardware Store,Coffee Shop,Outdoor Supply Store
57,"/wiki/Riverdale,_Toronto",5.0,Vietnamese Restaurant,Chinese Restaurant,Bakery,Fast Food Restaurant,Light Rail Station,Trail,Breakfast Spot,Moving Target,Café,Fish Market
61,/wiki/Bayview_Woods_%E2%80%93_Steeles,5.0,Dog Run,Women's Store,Fish & Chips Shop,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
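
To compare clusters without reading every row, one option is to tally which category appears most often in the `1st Most Common Venue` column of each cluster. The sketch below is a hedged illustration: `dominant_first_venue` and `sample` are hypothetical names, and `sample` holds only a handful of rows copied from the outputs above, so its shares say nothing about the full clusters.

```python
import pandas as pd

def dominant_first_venue(df):
    """For each cluster, report the most frequent first-ranked venue and its share of rows."""
    rows = []
    for label, group in df.groupby('Cluster Labels'):
        counts = group['1st Most Common Venue'].value_counts()
        rows.append({
            'Cluster Labels': label,
            'Dominant Venue': counts.index[0],
            'Share': counts.iloc[0] / len(group),
        })
    return pd.DataFrame(rows).set_index('Cluster Labels')

# A few rows copied from the cluster tables above (illustrative subset only).
sample = pd.DataFrame({
    'URL': ['/wiki/Bracondale_Hill', '/wiki/Carleton_Village', '/wiki/Swansea,_Toronto',
            '/wiki/Baby_Point', '/wiki/Kingsview_Village',
            '/wiki/Allenby,_Toronto', '/wiki/Chaplin_Estates', '/wiki/East_Danforth'],
    'Cluster Labels': [1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 5.0],
    '1st Most Common Venue': ['Park', 'Jewelry Store', 'Park',
                              'Park', 'Park',
                              'Restaurant', 'Restaurant', 'Pharmacy'],
})

summary = dominant_first_venue(sample)
print(summary)
```

Applied to `mergedToronto` itself, the same function would give one line per cluster, which may make it easier to attach a descriptive name to each cluster.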


## The End.