<font size="5"> **Segmenting and Clustering Neighbourhoods in Toronto**</font>

This notebook explores, segments and clusters neighbourhoods in the city of Toronto.

<font size="5"> **Table of Contents**</font>
<div id="toc"></div>

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

<IPython.core.display.Javascript object>

# Part 1: Download and Transform Data into Pandas Dataframe

The neighbourhood data for the city of Toronto is scraped from Wikipedia using the BeautifulSoup package.

## 1.1 Installing & Importing Packages

### 1.1.1 Description: BeautifulSoup
Beautiful Soup is a library for pulling data out of HTML and XML files. It provides ways of navigating, searching, and modifying parse trees.

In [2]:
# Install BeautifulSoup
!conda install -c anaconda beautifulsoup4 --yes # --yes pass the yes flag to the installation package

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/antoinekhalife/opt/anaconda3

  added / updated specs:
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.8.2                |           py37_0         3.0 MB  anaconda
    ------------------------------------------------------------
                                           Total:         3.0 MB

The following packages will be SUPERSEDED by a higher-priority channel:

  conda                                         conda-forge --> anaconda



Downloading and Extracting Packages
conda-4.8.2          | 3.0 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [3]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

### 1.1.2 Description: LXML
lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping.

In [4]:
# Install lxml
!conda install -c anaconda lxml --yes # --yes pass the yes flag to the installation package

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



### 1.1.3 Description: Requests 
Requests is an HTTP library for Python. It is used here to download the Wikipedia page that holds the postal code table.

In [5]:
# Install Requests
!conda install -c anaconda requests --yes # --yes pass the yes flag to the installation package

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [6]:
# Import Requests
import requests

## 1.2 Making a Soup 

### 1.2.1 Requests
Many sites take precautions to keep scrapers away from their data. A first step is to send a browser-like User-Agent header with each request, so that the request resembles ordinary browser traffic.

In [7]:
# Set headers
headers = requests.utils.default_headers()
headers.update({ 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'})

### 1.2.2 BeautifulSoup
To parse a document, we pass it to the BeautifulSoup constructor together with the name of the parser to use (here, lxml).

In [8]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M" # url address we want to scrape
req = requests.get(url, headers=headers) # pass headers by keyword; the second positional argument of requests.get is params
soup = BeautifulSoup(req.content, 'lxml')

### 1.2.3 Soup
Let's see what our variable soup looks like

In [9]:
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"XkB-KwpAAD0AAJUL5s8AAAAY","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":935851093,"wgRevisionId":935851093,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communi

## 1.3 Targeting HTML Elements 

### 1.3.1 HTML Element of Interest
The required information, Borough and Neighbourhoods, is located in the HTML code under "table" with the class "wikitable sortable"

In [10]:
# Let's define a variable that holds the desired data
target_elements = soup.find('table',class_='wikitable sortable').text # find() locates the table; .text strips the HTML tags and keeps only the text
print(target_elements)



Postcode
Borough
Neighbourhood


M1A
Not assigned
Not assigned


M2A
Not assigned
Not assigned


M3A
North York
Parkwoods


M4A
North York
Victoria Village


M5A
Downtown Toronto
Harbourfront


M6A
North York
Lawrence Heights


M6A
North York
Lawrence Manor


M7A
Downtown Toronto
Queen's Park


M8A
Not assigned
Not assigned


M9A
Queen's Park
Not assigned


M1B
Scarborough
Rouge


M1B
Scarborough
Malvern


M2B
Not assigned
Not assigned


M3B
North York
Don Mills North


M4B
East York
Woodbine Gardens


M4B
East York
Parkview Hill


M5B
Downtown Toronto
Ryerson


M5B
Downtown Toronto
Garden District


M6B
North York
Glencairn


M7B
Not assigned
Not assigned


M8B
Not assigned
Not assigned


M9B
Etobicoke
Cloverdale


M9B
Etobicoke
Islington


M9B
Etobicoke
Martin Grove


M9B
Etobicoke
Princess Gardens


M9B
Etobicoke
West Deane Park


M1C
Scarborough
Highland Creek


M1C
Scarborough
Rouge Hill


M1C
Scarborough
Port Union


M2C
Not assigned
Not assigned


M3C
North York
Flemingdon Par

### 1.3.2 Convert HTML Element of Interest into a List
We will convert the HTML code of interest into a list for data processing purposes

In [11]:
target_elements_split = target_elements.split('\n')
print(target_elements_split)

['', '', 'Postcode', 'Borough', 'Neighbourhood', '', '', 'M1A', 'Not assigned', 'Not assigned', '', '', 'M2A', 'Not assigned', 'Not assigned', '', '', 'M3A', 'North York', 'Parkwoods', '', '', 'M4A', 'North York', 'Victoria Village', '', '', 'M5A', 'Downtown Toronto', 'Harbourfront', '', '', 'M6A', 'North York', 'Lawrence Heights', '', '', 'M6A', 'North York', 'Lawrence Manor', '', '', 'M7A', 'Downtown Toronto', "Queen's Park", '', '', 'M8A', 'Not assigned', 'Not assigned', '', '', 'M9A', "Queen's Park", 'Not assigned', '', '', 'M1B', 'Scarborough', 'Rouge', '', '', 'M1B', 'Scarborough', 'Malvern', '', '', 'M2B', 'Not assigned', 'Not assigned', '', '', 'M3B', 'North York', 'Don Mills North', '', '', 'M4B', 'East York', 'Woodbine Gardens', '', '', 'M4B', 'East York', 'Parkview Hill', '', '', 'M5B', 'Downtown Toronto', 'Ryerson', '', '', 'M5B', 'Downtown Toronto', 'Garden District', '', '', 'M6B', 'North York', 'Glencairn', '', '', 'M7B', 'Not assigned', 'Not assigned', '', '', 'M8B', 'N

### 1.3.3 Convert List into List of Lists
Each observation needs a Postcode, a Borough and a Neighbourhood, so we create nested lists that each hold these three features. The nested lists are later used to build the dataframe.

In [12]:
parent_list = [] # parent_list will have all observations
nested_list = [] # nested_list will be used to capture one observation at a time
counter = 0
flag = 0

for i in range(0,len(target_elements_split)):
    if target_elements_split[i] != "" and counter < 4:
        nested_list.append(target_elements_split[i])
        counter = counter + 1
    if counter == 3:
        parent_list.append(nested_list)
        nested_list = []
        counter = 0

parent_list       

[['Postcode', 'Borough', 'Neighbourhood'],
 ['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M8A', 'Not assigned', 'Not assigned'],
 ['M9A', "Queen's Park", 'Not assigned'],
 ['M1B', 'Scarborough', 'Rouge'],
 ['M1B', 'Scarborough', 'Malvern'],
 ['M2B', 'Not assigned', 'Not assigned'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens'],
 ['M4B', 'East York', 'Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson'],
 ['M5B', 'Downtown Toronto', 'Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M7B', 'Not assigned', 'Not assigned'],
 ['M8B', 'Not assigned', 'Not assigned'],
 ['M9B', 'Etobicoke', 'Cloverdale'],
 ['M9B', 'Etobicoke', 'Islington'],
 ['M9B',
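The loop above can also be written as a filter followed by fixed-size chunking. The following is a minimal sketch, assuming every table row contributes exactly three non-empty strings; the helper name `chunk_rows` and the sample list are illustrative and not part of the notebook.

```python
def chunk_rows(cells, width=3):
    """Drop empty strings, then group the remaining cells into rows of `width`."""
    non_empty = [cell for cell in cells if cell != ""]
    # Ignore an incomplete trailing row, as the counter-based loop does
    usable = len(non_empty) - len(non_empty) % width
    return [non_empty[i:i + width] for i in range(0, usable, width)]


sample = ['', '', 'Postcode', 'Borough', 'Neighbourhood', '', '',
          'M1A', 'Not assigned', 'Not assigned', '', '',
          'M3A', 'North York', 'Parkwoods']
rows = chunk_rows(sample)
print(rows)
```

Like the original loop, this approach would misalign the columns if a table cell were empty, so it relies on the table being complete.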

## 1.4 Data Cleansing 

### 1.4.1 Remove Data Header
Currently, the first nested list holds the table header: Postcode, Borough, Neighbourhood. We remove it here and create the dataframe column headers at a later stage.

In [13]:
parent_list.pop(0)

['Postcode', 'Borough', 'Neighbourhood']

In [14]:
parent_list

[['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M8A', 'Not assigned', 'Not assigned'],
 ['M9A', "Queen's Park", 'Not assigned'],
 ['M1B', 'Scarborough', 'Rouge'],
 ['M1B', 'Scarborough', 'Malvern'],
 ['M2B', 'Not assigned', 'Not assigned'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens'],
 ['M4B', 'East York', 'Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson'],
 ['M5B', 'Downtown Toronto', 'Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M7B', 'Not assigned', 'Not assigned'],
 ['M8B', 'Not assigned', 'Not assigned'],
 ['M9B', 'Etobicoke', 'Cloverdale'],
 ['M9B', 'Etobicoke', 'Islington'],
 ['M9B', 'Etobicoke', 'Martin Grove'],
 ['M9B', 'Et

### 1.4.2 Remove Elements with no Borough
All rows whose Borough is "Not assigned" are removed from the dataset.

In [15]:
parent_list_01 = [] # will hold only the rows whose Borough is not "Not assigned"

for i in range(0,len(parent_list)):
    if parent_list[i][1] != "Not assigned":
        parent_list_01.append(parent_list[i])
        
parent_list_01 

[['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M9A', "Queen's Park", 'Not assigned'],
 ['M1B', 'Scarborough', 'Rouge'],
 ['M1B', 'Scarborough', 'Malvern'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens'],
 ['M4B', 'East York', 'Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson'],
 ['M5B', 'Downtown Toronto', 'Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M9B', 'Etobicoke', 'Cloverdale'],
 ['M9B', 'Etobicoke', 'Islington'],
 ['M9B', 'Etobicoke', 'Martin Grove'],
 ['M9B', 'Etobicoke', 'Princess Gardens'],
 ['M9B', 'Etobicoke', 'West Deane Park'],
 ['M1C', 'Scarborough', 'Highland Creek'],
 ['M1C', 'Scarborough', 'Rouge Hill'],
 ['M1C', 'Scarborough', 'Port Union'],
 ['M3C', 'North York', 'Flemingdon Park'],
 ['M3C', 'North

### 1.4.3 Neighbourhood "Not assigned"
If a row has a Borough but a "Not assigned" Neighbourhood, the Neighbourhood is set to the same value as the Borough.

In [16]:
parent_list_02 = [] # list where all Neighbourhood's are assigned an area
nested_list_02 = [] # Temporarily holds a nested list

for i in range(0,len(parent_list_01)):
    if parent_list_01[i][2] == "Not assigned":
        nested_list_02 = [parent_list_01[i][0],parent_list_01[i][1],parent_list_01[i][1]] # copy the Borough into the Neighbourhood slot
        parent_list_02.append(nested_list_02)
        nested_list_02 = []
    else:
        parent_list_02.append(parent_list_01[i])    
        
parent_list_02

[['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M9A', "Queen's Park", "Queen's Park"],
 ['M1B', 'Scarborough', 'Rouge'],
 ['M1B', 'Scarborough', 'Malvern'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens'],
 ['M4B', 'East York', 'Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson'],
 ['M5B', 'Downtown Toronto', 'Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M9B', 'Etobicoke', 'Cloverdale'],
 ['M9B', 'Etobicoke', 'Islington'],
 ['M9B', 'Etobicoke', 'Martin Grove'],
 ['M9B', 'Etobicoke', 'Princess Gardens'],
 ['M9B', 'Etobicoke', 'West Deane Park'],
 ['M1C', 'Scarborough', 'Highland Creek'],
 ['M1C', 'Scarborough', 'Rouge Hill'],
 ['M1C', 'Scarborough', 'Port Union'],
 ['M3C', 'North York', 'Flemingdon Park'],
 ['M3C', 'North
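The two cleansing rules from sections 1.4.2 and 1.4.3 can be applied in a single pass. This is a hedged sketch rather than the notebook's own code: `clean_rows`, `NOT_ASSIGNED` and the sample rows are names introduced here for illustration.

```python
NOT_ASSIGNED = "Not assigned"


def clean_rows(rows):
    """Drop rows without a borough; copy the borough into a missing neighbourhood."""
    cleaned = []
    for postcode, borough, neighbourhood in rows:
        if borough == NOT_ASSIGNED:
            continue  # no borough: discard the row
        if neighbourhood == NOT_ASSIGNED:
            neighbourhood = borough  # fall back to the borough name
        cleaned.append([postcode, borough, neighbourhood])
    return cleaned


sample = [
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
    ['M9A', "Queen's Park", 'Not assigned'],
]
cleaned = clean_rows(sample)
print(cleaned)
```

Unpacking each row into named variables avoids index arithmetic such as `row[1]` and `row[2]`, which makes it harder to read from the wrong list or the wrong column by accident.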

### 1.4.4 Neighbourhoods with same Postcode
If two or more Neighbourhoods have the same Postcode, then those Neighbourhoods will be merged under one postcode and will be separated by a comma

In [17]:
parent_list_03 = [] # list of all Neighbourhoods merged if they share same postcode
nested_list_03 = [] # Temporarily holds a nested list

for i in range(0,len(parent_list_02)):
    if i+1 == len(parent_list_02):
        break
    else:
        if parent_list_02[i][0] == parent_list_02[i+1][0] :
            if parent_list_02[i][0] == parent_list_02[i-1][0] :
                nested_list_03 = [parent_list_03[len(parent_list_03)-1][0],parent_list_03[len(parent_list_03)-1][1],parent_list_03[len(parent_list_03)-1][2]+','+parent_list_02[i+1][2]]
                parent_list_03[len(parent_list_03)-1] = nested_list_03
                nested_list_03 = []
            else:
                nested_list_03 = [parent_list_02[i][0],parent_list_02[i][1],parent_list_02[i][2]+','+parent_list_02[i+1][2]]
                parent_list_03.append(nested_list_03)
                nested_list_03 = []
        else:
            if parent_list_02[i][0] != parent_list_02[i-1][0]:
                parent_list_03.append(parent_list_02[i])

parent_list_03

[['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M6A', 'North York', 'Lawrence Heights,Lawrence Manor'],
 ['M7A', 'Downtown Toronto', "Queen's Park"],
 ['M9A', "Queen's Park", "Queen's Park"],
 ['M1B', 'Scarborough', 'Rouge,Malvern'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens,Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson,Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M9B',
  'Etobicoke',
  'Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park'],
 ['M1C', 'Scarborough', 'Highland Creek,Rouge Hill,Port Union'],
 ['M3C', 'North York', 'Flemingdon Park,Don Mills South'],
 ['M4C', 'East York', 'Woodbine Heights'],
 ['M5C', 'Downtown Toronto', 'St. James Town'],
 ['M6C', 'York', 'Humewood-Cedarvale'],
 ['M9C',
  'Etobicoke',
  'Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe'],
 ['M1E', 'Scarborough', 'Guildwood,Morningside,We
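Merging by postcode can also be done with a dictionary keyed on the postcode. The sketch below is an alternative to the neighbour-comparison loop, not the notebook's method; `merge_by_postcode` and the sample rows are illustrative. It does not require rows with the same postcode to be adjacent, and it treats the first and last rows like any other.

```python
def merge_by_postcode(rows):
    """Join the neighbourhoods of rows that share a postcode, keeping first-seen order."""
    merged = {}  # postcode -> [postcode, borough, list of neighbourhoods]
    for postcode, borough, neighbourhood in rows:
        if postcode not in merged:
            merged[postcode] = [postcode, borough, [neighbourhood]]
        else:
            merged[postcode][2].append(neighbourhood)
    # dicts preserve insertion order in Python 3.7+
    return [[pc, b, ','.join(names)] for pc, b, names in merged.values()]


sample = [
    ['M5A', 'Downtown Toronto', 'Harbourfront'],
    ['M6A', 'North York', 'Lawrence Heights'],
    ['M6A', 'North York', 'Lawrence Manor'],
    ['M1B', 'Scarborough', 'Rouge'],
    ['M1B', 'Scarborough', 'Malvern'],
]
merged = merge_by_postcode(sample)
print(merged)
```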

## 1.5 Transform Data into Dataframe
All the data is currently held in a list of lists. In this section we convert it into a dataframe.

In [18]:
import pandas as pd

# define the dataframe columns
column_names = ['Postal Code','Borough', 'Neighborhood'] 

# instantiate the dataframe
toronto_FSAs = pd.DataFrame(parent_list_03,columns=column_names)
toronto_FSAs

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout..."


In [19]:
toronto_FSAs.shape

(103, 3)

# Part 2: Get Longitude and Latitude per Neighbourhood 
In this part of the notebook, the pandas dataframe will be extended to include the longitude and latitude for each Neighbourhood.
<br>Note: The Geocoder package was tried first, but it was unresponsive, so the coordinates are read from a CSV file instead.

### 2.1.1  Download Latitude and Longitude into a pandas Dataframe

In [20]:
geospatial_coordinates = pd.read_csv("https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv")

In [21]:
geospatial_coordinates

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


### 2.1.2 Add Latitude and Longitude to Neighbourhoods 

Match each Postal Code in the toronto_FSAs dataframe against the geospatial_coordinates dataframe to build ordered lists of Latitude and Longitude values, which are then added to toronto_FSAs.

In [22]:
Latitude_list = []
Longitude_list = []

for i in range(0, len(toronto_FSAs)):
    for j in range(0,len(geospatial_coordinates)):
        if toronto_FSAs["Postal Code"][i] == geospatial_coordinates["Postal Code"][j]:
            Latitude_list.append(geospatial_coordinates["Latitude"][j])
            Longitude_list.append(geospatial_coordinates["Longitude"][j])
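
The same lookup can be expressed as a join on the shared column. The sketch below uses `DataFrame.merge` on two small stand-in dataframes; `fsas` and `coords` are illustrative names, not the notebook's variables.

```python
import pandas as pd

# Toy stand-ins for toronto_FSAs and geospatial_coordinates
fsas = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M4A', 'M5A', 'M3A'],  # deliberately in a different order
    'Latitude': [43.725882, 43.654260, 43.753259],
    'Longitude': [-79.315572, -79.360636, -79.329656],
})

# A left join keeps every row of `fsas`, in its original order
joined = fsas.merge(coords, on='Postal Code', how='left')
print(joined)
```

With a left join, a postal code that has no match in the coordinates table ends up with NaN coordinates instead of shortening the result, so a missing code stays visible in the dataframe.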

Add Latitude and Longitude to each Neighbourhood in the toronto_FSAs Dataframe

In [23]:
toronto_FSAs.insert(3,"Latitude",Latitude_list)
toronto_FSAs.insert(4,"Longitude",Longitude_list)
toronto_FSAs

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout...",43.636258,-79.498509


# Part 3: Explore and Cluster Neighbourhoods in Toronto 

## 3.1 Create a Map of Toronto and of Downtown Toronto 

### 3.1.1 Create a Map of Toronto 

In [24]:
!conda install -c conda-forge folium --yes # folium is a map rendering library

Collecting package metadata (current_repodata.json): done
Solving environment: \ 
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::certifi-2019.9.11-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::certifi-2019.9.11-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2, defaults/osx-64::ca-certificates-2019.8.28-0
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2, defaults/osx-64::certifi-2019.9.11-py37_0
  - anaconda/osx-64::openssl-1.1.1d-h1de35cc_2, defaults/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::certifi-2019.9.11-py37_0
  - anaconda/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - defaults/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::certifi-2019.9.11-py37_0, defaults/

In [25]:
import folium

In [26]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[43.6532, -79.3832], zoom_start=10)

lat = 0
lng = 0
borough =""
postcode = ""

# add markers to map
for i in range(0,len(toronto_FSAs)):
    # Step 1: Get geospatial coordinates for each Neighbourhood
    lat = toronto_FSAs["Latitude"][i]
    lng = toronto_FSAs["Longitude"][i]
    borough = toronto_FSAs["Borough"][i]
    postcode = toronto_FSAs["Postal Code"][i]
    # Step 2: Plot the geospatial coordinates on the map of Toronto 
    label = '{}, {}'.format(borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

### 3.1.2 Create a Map of Downtown Toronto

The original dataframe will be amended to cover neighbourhoods in the area of Downtown Toronto

In [27]:
downtown_toronto = toronto_FSAs[toronto_FSAs['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [28]:
downtown_toronto.shape

(19, 5)

In [29]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[43.6548, -79.3883], zoom_start=12)

lat = 0
lng = 0
borough =""
postcode = ""

# add markers to map
for i in range(0,len(downtown_toronto)):
    # Step 1: Get geospatial coordinates for each Neighbourhood
    lat = downtown_toronto["Latitude"][i]
    lng = downtown_toronto["Longitude"][i]
    borough = downtown_toronto["Borough"][i]
    postcode = downtown_toronto["Postal Code"][i]
    # Step 2: Plot the geospatial coordinates on the map of Downtown Toronto 
    label = '{}, {}'.format(borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)
    
map_downtown_toronto

## 3.2 Foursquare Set-up  

### 3.2.1 Define Foursquare Credentials and Version 

In [30]:
CLIENT_ID = '4EGYKOC2CGZBHUTI5VG0KUOEGWHJ4ES05WKRQHHDFHHC2JIW' # your Foursquare ID
CLIENT_SECRET = 'TGXXOPMD0ZXQPOAMQWRJCFC0C0ES1NZBQDENTDEZ5B4OLUDV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4EGYKOC2CGZBHUTI5VG0KUOEGWHJ4ES05WKRQHHDFHHC2JIW
CLIENT_SECRET:TGXXOPMD0ZXQPOAMQWRJCFC0C0ES1NZBQDENTDEZ5B4OLUDV
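
Hardcoding and printing API credentials exposes them to anyone who can read the notebook. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and would have to be set outside the notebook; `load_credentials` is likewise an illustrative helper.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the given mapping, or raise if one is missing."""
    try:
        # Variable names are assumptions made for this example
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with an explicit mapping instead of the real environment
client_id, client_secret = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'my-id',
    'FOURSQUARE_CLIENT_SECRET': 'my-secret',
})
print('Credentials loaded:', bool(client_id and client_secret))
```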


### 3.2.2 Explore first Neighbourhood in Downtown Toronto dataframe 

Get the neighbourhood's name

In [31]:
downtown_toronto.loc[0, 'Neighborhood']

'Harbourfront'

Get the neighbourhood latitude and longitude value

In [32]:
neighborhood_latitude = downtown_toronto.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = downtown_toronto.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = downtown_toronto.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Harbourfront are 43.6542599, -79.3606359.


Now, let's get up to 100 venues within a radius of 500 meters of Harbourfront.

First, let's create the GET request URL. Name your URL **url**.

In [33]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=4EGYKOC2CGZBHUTI5VG0KUOEGWHJ4ES05WKRQHHDFHHC2JIW&client_secret=TGXXOPMD0ZXQPOAMQWRJCFC0C0ES1NZBQDENTDEZ5B4OLUDV&v=20180605&ll=43.6542599,-79.3606359&radius=500&limit=100'
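
Instead of formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a sketch under the assumption that the endpoint accepts the same parameters as in the URL above; `build_explore_url` is an illustrative helper and the `'ID'` and `'SECRET'` values are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL from a dict of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes reserved characters such as the comma in `ll`
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


example_url = build_explore_url('ID', 'SECRET', '20180605', 43.6542599, -79.3606359, 500, 100)
print(example_url)

# Decoding the query string recovers the original values
print(parse_qs(urlsplit(example_url).query)['ll'])
```

The Requests library also accepts such a dictionary directly through the `params` argument of `requests.get`, which performs the encoding itself.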

Send the GET request and examine the results

In [34]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e407f043907e7001bbfbdcf'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 46,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
 

All the information is in the *items* key. Before we proceed, let's create the **get_category_type** function

In [35]:
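# function that extracts the category of the venue
def get_category_type(row):
    # the category list sits under 'categories' or, after flattening, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']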

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [36]:
import json # library to handle JSON files
import pandas as pd # pd.json_normalize (pandas 1.0+) transforms nested JSON into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Impact Kitchen,Restaurant,43.656369,-79.35698


And how many venues were returned by Foursquare?

In [37]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

46 venues were returned by Foursquare.


## 3.3 Explore Neighbourhoods in Downtown Toronto 

#### Let's create a function to repeat the same process for all the neighborhoods in Downtown Toronto

In [38]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    
    # note: this version reads the global downtown_toronto dataframe rather than the function arguments
    for i in range(0,len(downtown_toronto)):
        neighborhood_latitude = downtown_toronto.loc[i, 'Latitude'] # latitude of neighborhood i
        neighborhood_longitude = downtown_toronto.loc[i, 'Longitude'] # longitude of neighborhood i
        neighborhood_name = downtown_toronto.loc[i, 'Neighborhood'] # name of neighborhood i
        print(neighborhood_name)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            neighborhood_latitude, 
            neighborhood_longitude, 
            radius, 
            LIMIT)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            neighborhood_name, 
            neighborhood_latitude, 
            neighborhood_longitude, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
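
The expression `v['venue']['categories'][0]['name']` raises an `IndexError` for a venue whose category list is empty. One way to guard against that is to separate the flattening step from the network call, which also makes it testable without an API key. The sketch below is illustrative: `venue_rows` is a hypothetical helper, and the sample items only mimic the shape of the response shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples; venues without a category get None."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample_items = [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.653447, 'lng': -79.362017},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorised Place',  # made-up venue with no category
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]
rows = venue_rows('Harbourfront', 43.65426, -79.360636, sample_items)
print(rows)
```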

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *downtown_toronto_venues*.

In [40]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto['Neighborhood'],
                                   latitudes=downtown_toronto['Latitude'],
                                   longitudes=downtown_toronto['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city
Church and Wellesley


#### Let's check the size of the resulting dataframe

In [41]:
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(1296, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Let's check how many venues were returned for each neighborhood

In [42]:
downtown_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",16,16,16,16,16,16
"Cabbagetown,St. James Town",42,42,42,42,42,42
Central Bay Street,84,84,84,84,84,84
"Chinatown,Grange Park,Kensington Market",80,80,80,80,80,80
Christie,18,18,18,18,18,18
Church and Wellesley,81,81,81,81,81,81
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"Design Exchange,Toronto Dominion Centre",100,100,100,100,100,100
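
Because `count()` repeats the same number in every column, a per-neighborhood tally can be written more compactly with `size()`. The sketch below uses a small made-up frame (the rows are illustrative and are not taken from the Foursquare results).

```python
import pandas as pd

# Toy stand-in for downtown_toronto_venues (values are made up for illustration)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'],
    'Venue Category': ['Café', 'Bar', 'Café', 'Park', 'Café', 'Gym'],
})

# size() returns one count per group instead of one count per column
venues_per_neighborhood = (
    toy_venues.groupby('Neighborhood').size().sort_values(ascending=False)
)
print(venues_per_neighborhood)
```

Sorting the result makes it easy to spot neighborhoods with very few venues, which carry less information into the clustering step.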


#### Let's find out how many unique categories can be curated from all the returned venues

In [43]:
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 199 unique categories.
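
As a small aside, `nunique()` gives the same count directly, and `value_counts()` shows which categories dominate. The sketch below runs on a made-up category column, not on the real venue data.

```python
import pandas as pd

# Toy category column (made-up values)
categories = pd.Series(
    ['Café', 'Bar', 'Café', 'Park', 'Café', 'Gym'], name='Venue Category'
)

n_unique = categories.nunique()          # number of distinct categories
top_counts = categories.value_counts()   # occurrences per category, most frequent first
print(n_unique)
print(top_counts.head(3))
```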


## 3.4 Analyze each neighbourhood in Downtown Toronto 

In [44]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [45]:
downtown_toronto_onehot.shape

(1296, 199)
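
The shape above has 199 columns, the same as the number of unique categories, even though a `Neighborhood` column was added. One plausible explanation, which is an assumption here because the raw data is not shown, is that the results contain a venue category literally named `Neighborhood`, so the assignment overwrote that dummy column in place instead of appending a new one. That would also explain why `Yoga Studio`, not `Neighborhood`, is the first column in the output above. The toy example below (made-up data) reproduces the effect and sketches one way to avoid it.

```python
import pandas as pd

# Toy venues; one venue has the category "Neighborhood" (made-up data)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Café', 'Neighborhood', 'Yoga Studio'],
})

onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="")
print(list(onehot.columns))   # one dummy column per category

# Naive approach: the assignment overwrites the existing dummy column,
# so the "last" column that gets moved to the front is a category
naive = onehot.copy()
naive['Neighborhood'] = toy['Neighborhood']
naive_cols = [naive.columns[-1]] + list(naive.columns[:-1])
print(naive_cols)

# Safer approach: drop any clashing dummy column, then insert the label first
safe = onehot.drop(columns='Neighborhood', errors='ignore')
safe.insert(0, 'Neighborhood', toy['Neighborhood'])
print(list(safe.columns))
```

Dropping the clashing column discards that category as a feature; renaming it before the assignment would be an alternative if it should be kept.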

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighborhood

In [46]:
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()
downtown_toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.125,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,...,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0
5,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0125,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,...,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0
8,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
9,"Design Exchange,Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0


Let's confirm the new size

In [47]:
downtown_toronto_grouped.shape

(19, 199)
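
To see why the mean of a one-hot column is a frequency, consider a toy case (made-up data): a neighborhood with four venues, two of which are coffee shops, ends up with 0.5 in the Coffee Shop column, and each row of the grouped frame sums to 1.

```python
import pandas as pd

# Toy one-hot frame: neighborhood A has four venues, B has two (made-up data)
toy_onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Coffee Shop':  [1, 1, 0, 0, 0, 1],
    'Park':         [0, 0, 1, 0, 1, 0],
    'Bar':          [0, 0, 0, 1, 0, 0],
})

# Mean of 0/1 indicators = share of venues in that category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)

# Each neighborhood's frequencies add up to 1
print(toy_grouped.drop(columns='Neighborhood').sum(axis=1))
```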

Let's print each neighborhood along with the top 5 most common venues

In [48]:
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
              venue  freq
0       Coffee Shop  0.07
1        Steakhouse  0.04
2              Café  0.04
3               Bar  0.04
4  Asian Restaurant  0.03


----Berczy Park----
                venue  freq
0         Coffee Shop  0.07
1        Cocktail Bar  0.05
2            Beer Bar  0.04
3  Seafood Restaurant  0.04
4      Farmers Market  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.19
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3               Bar  0.06
4       Coffee Shop  0.06


----Cabbagetown,St. James Town----
         venue  freq
0   Restaurant  0.07
1  Coffee Shop  0.07
2          Pub  0.05
3       Bakery  0.05
4         Café  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.17
1      Sandwich Place  0.05
2      Ice Cream Shop  0.05
3  Italian Restaurant  0.05
4                Café  0

Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
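
As a quick check of what the function returns, the sketch below repeats its definition so that it runs on its own and applies it to a toy row with made-up frequencies.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as above: skip the first entry (the neighborhood name),
    # sort the remaining frequencies, and return the top category names
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Toy row shaped like one row of downtown_toronto_grouped (made-up frequencies)
toy_row = pd.Series({'Neighborhood': 'A', 'Coffee Shop': 0.5, 'Park': 0.2, 'Bar': 0.3})
top2 = return_most_common_venues(toy_row, 2)
print(top2)
```

Categories with equal frequency come back in whatever order the sort leaves them, which may be why the top-5 printout above and the top-10 table rank tied categories differently.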

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [50]:
import numpy as np # library to handle data in a vectorized manner

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Cosmetics Shop,Burger Joint,Asian Restaurant,Bakery,Breakfast Spot
1,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Steakhouse,Seafood Restaurant,Bakery,Café,Beer Bar,Farmers Market,Diner
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Terminal,Airport Lounge,Boat or Ferry,Rental Car Location,Harbor / Marina,Coffee Shop,Sculpture Garden,Airport Gate,Airport Food Court
3,"Cabbagetown,St. James Town",Coffee Shop,Restaurant,Pizza Place,Bakery,Italian Restaurant,Café,Pub,Playground,Indian Restaurant,Japanese Restaurant
4,Central Bay Street,Coffee Shop,Sandwich Place,Ice Cream Shop,Café,Italian Restaurant,Burger Joint,Juice Bar,Japanese Restaurant,Department Store,Bar
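
The three-entry suffix list used for the column names is enough for ten columns, but it would label a 21st column as "21th". If `num_top_venues` were ever raised that far, a general helper could be used instead. The function name `ordinal` below is hypothetical and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 always take "th"; otherwise the last digit decides
    if n % 100 in (11, 12, 13):
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```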


## 3.5 Cluster Neighbourhoods

In [51]:
# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [52]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 3, 2, 2, 2, 4, 2, 2, 2], dtype=int32)
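
The notebook fixes the number of clusters at 5 without justifying it. One common way to compare candidate values of *k* is the silhouette score. The sketch below is only an illustration on synthetic data: the matrix `X` and its three groups are made up, and `n_init=10` is set explicitly as an assumption so that the behavior does not depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the frequency matrix: 19 rows in 3 loose groups (made-up data)
rng = np.random.default_rng(0)
centers = np.array([[0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]])
X = np.vstack([
    c + rng.normal(scale=0.03, size=(n, 3)) for c, n in zip(centers, (8, 6, 5))
])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher means better-separated clusters
    # print k, the size of each cluster, and the score
    print(k, np.bincount(labels), round(scores[k], 3))
```

Printing the cluster sizes alongside the score also reveals clusters that contain a single neighborhood, as happens for several of the labels above.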

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [53]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_toronto_merged = downtown_toronto

# join the sorted venues (including cluster labels) onto downtown_toronto so each neighborhood keeps its latitude/longitude
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Pub,Park,Café,Bakery,Mexican Restaurant,Breakfast Spot,Restaurant,Ice Cream Shop,Spa
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Gym,Park,Yoga Studio,College Auditorium,Sandwich Place,Salad Place,Restaurant,Portuguese Restaurant,Burger Joint
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,2,Coffee Shop,Clothing Store,Japanese Restaurant,Café,Cosmetics Shop,Bubble Tea Shop,Ramen Restaurant,Restaurant,Bakery,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Coffee Shop,Café,Restaurant,Cocktail Bar,American Restaurant,Bakery,Cosmetics Shop,Breakfast Spot,Clothing Store,Hotel
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,2,Coffee Shop,Cocktail Bar,Cheese Shop,Steakhouse,Seafood Restaurant,Bakery,Café,Beer Bar,Farmers Market,Diner
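
In this run every neighborhood returned venues, so the join leaves no gaps. If a neighborhood had no venues, its cluster label would be missing after the join and the column would become floating point, which would break the list indexing used for the map colors. The sketch below, on made-up frames, shows the effect and one way to handle it.

```python
import pandas as pd

# Toy frames (made-up data): neighborhood 'C' returned no venues, so it has no label
locations = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                          'Latitude': [43.65, 43.66, 43.67]})
labels = pd.DataFrame({'Cluster Labels': [0, 1],
                       'Neighborhood': ['A', 'B']})

merged = locations.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)   # float, because of the missing value for 'C'

# Drop unmatched rows and restore integer labels before using them as list indices
merged_clean = merged.dropna(subset=['Cluster Labels']).copy()
merged_clean['Cluster Labels'] = merged_clean['Cluster Labels'].astype(int)
print(merged_clean)
```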


Finally, let's visualize the resulting clusters

In [54]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[43.6548, -79.3883], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 3.6 Examine Clusters 

### 3.6.1 Cluster 1

In [55]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Pub,Park,Café,Bakery,Mexican Restaurant,Breakfast Spot,Restaurant,Ice Cream Shop,Spa
1,Downtown Toronto,0,Coffee Shop,Gym,Park,Yoga Studio,College Auditorium,Sandwich Place,Salad Place,Restaurant,Portuguese Restaurant,Burger Joint


### 3.6.2 Cluster 2

In [56]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,1,Park,Trail,Playground,Cosmetics Shop,Doner Restaurant,Dog Run,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop


### 3.6.3 Cluster 3

In [57]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,2,Coffee Shop,Clothing Store,Japanese Restaurant,Café,Cosmetics Shop,Bubble Tea Shop,Ramen Restaurant,Restaurant,Bakery,Middle Eastern Restaurant
3,Downtown Toronto,2,Coffee Shop,Café,Restaurant,Cocktail Bar,American Restaurant,Bakery,Cosmetics Shop,Breakfast Spot,Clothing Store,Hotel
4,Downtown Toronto,2,Coffee Shop,Cocktail Bar,Cheese Shop,Steakhouse,Seafood Restaurant,Bakery,Café,Beer Bar,Farmers Market,Diner
5,Downtown Toronto,2,Coffee Shop,Sandwich Place,Ice Cream Shop,Café,Italian Restaurant,Burger Joint,Juice Bar,Japanese Restaurant,Department Store,Bar
7,Downtown Toronto,2,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Cosmetics Shop,Burger Joint,Asian Restaurant,Bakery,Breakfast Spot
8,Downtown Toronto,2,Coffee Shop,Aquarium,Café,Hotel,Italian Restaurant,Restaurant,Brewery,Scenic Lookout,Sporting Goods Shop,Fried Chicken Joint
9,Downtown Toronto,2,Coffee Shop,Café,Hotel,Steakhouse,Restaurant,Italian Restaurant,Bar,Gastropub,Seafood Restaurant,Deli / Bodega
10,Downtown Toronto,2,Coffee Shop,Café,Hotel,Restaurant,Bakery,Deli / Bodega,Gym,American Restaurant,Steakhouse,Gastropub
11,Downtown Toronto,2,Café,Bar,Sandwich Place,Japanese Restaurant,Bookstore,Restaurant,Bakery,Noodle House,Beer Bar,Beer Store
12,Downtown Toronto,2,Bar,Café,Dumpling Restaurant,Vietnamese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mexican Restaurant,Donut Shop,Dessert Shop


### 3.6.4 Cluster 4

In [58]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,3,Airport Service,Airport Terminal,Airport Lounge,Boat or Ferry,Rental Car Location,Harbor / Marina,Coffee Shop,Sculpture Garden,Airport Gate,Airport Food Court


### 3.6.5 Cluster 5

In [59]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,4,Grocery Store,Café,Park,Athletics & Sports,Nightclub,Candy Store,Restaurant,Diner,Italian Restaurant,Baby Store
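
The five cells above repeat the same selection with a different label, and the selected columns leave out the neighborhood name, so the rows are hard to identify. A small helper could cover both points. The function name `cluster_view` and the toy frame are hypothetical and only sketch the idea.

```python
import pandas as pd

def cluster_view(df, label, keep=('Neighborhood',), label_col='Cluster Labels'):
    """Rows of one cluster: the `keep` columns plus the label column and all columns after it."""
    start = df.columns.get_loc(label_col)
    cols = list(keep) + list(df.columns[start:])
    return df.loc[df[label_col] == label, cols]

# Toy frame with the same column layout as downtown_toronto_merged (made-up data)
toy = pd.DataFrame({
    'Postal Code': ['M1', 'M2', 'M3'],
    'Borough': ['Downtown Toronto'] * 3,
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.65, 43.66, 43.67],
    'Longitude': [-79.36, -79.38, -79.37],
    'Cluster Labels': [0, 2, 0],
    '1st Most Common Venue': ['Coffee Shop', 'Café', 'Park'],
})

# One table per cluster label, in ascending label order
for label in sorted(toy['Cluster Labels'].unique()):
    print(cluster_view(toy, label), end='\n\n')
```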
