# Assignment Summary

###  Part 1 <br>First you will see how data from the Wikipedia page on Canadian postal codes is captured and converted <br> into a pandas DataFrame for further analysis. Web scraping and data wrangling techniques are used to clean and prepare the data.

###  Part 2 <br> You'll see how addresses are converted into their equivalent latitude and longitude values.  This step is necessary to visualize <br>the converted source data on a map.

###  Part 3 <br>The Foursquare API will be used to explore neighborhoods in Toronto. You will see the most common venue categories in <br>each neighborhood, and these categories will then be used as features to group the neighborhoods into clusters. The *k*-means clustering algorithm <br>will be used to complete this task.  The Folium library will be used to visualize the neighborhoods in Toronto and their emerging clusters. <br>Finally, an analysis of the clusters will highlight key observations for each cluster.<br><br>

# Table of contents

### Part 1
[1. Create Notebook and download libraries and packages](#1.1.0)<br>
[2. Download Wikipedia page, extract PostalCode, Borough & Neighbourhood table](#1.2.0)<br>
[3.a) Show table column headers, indexing and dataframe size](#1.3.0)<br>
[3.b) Only process cells that have an assigned Borough & ignore those Not assigned](#1.3.1)<br>
[3.c) Combine Neighbourhoods with same Postcode like M5A](#1.3.2)<br>
[3.d) Replace Not assigned Neighbourhoods with the Borough name](#1.3.3)<br>
[3.e&f) Show Number of (Rows, Columns) of new dataframe](#1.3.4)<br>
<br>
### Part 2
[1. Get Geo-spatial data from website](#2.1.0)<br>
[2. Set index of geospatial data and merge data sets](#2.2.0)<br>
[3. Determine the number of Boroughs and Neighborhoods](#2.3.0)<br>
<br>
### Part 3
[1. Import Mapping Libraries](#3.1.0)<br>
[1.a) Get mapping coordinates of Toronto](#3.1.1)<br>
[1.b) Neighborhood Selection Criteria](#3.1.2)<br>
[1.c) Get mapping coordinates for Downtown Toronto](#3.1.3)<br>
[1.d) Get coordinates of Downtown Toronto borough to create a map](#3.1.4)<br>
[1.e) Create map of downtown using latitude and longitude values](#3.1.5)<br>    
[2. Add Foursquare credentials](#3.2.0)<br>
[2.a) Let's get the top 100 venues in Downtown Toronto Borough](#3.2.1)<br>
[2.b) Create JSON file](#3.2.2)<br>
[2.c) Let's borrow the get_category_type function from the Foursquare lab](#3.2.3)<br>
[2.d) Let's capture Downtown Toronto Venues](#3.2.4)<br>
[2.e) Count the number of venues](#3.2.5)<br>    
[2.f) Let's get the coordinates of the nearby venues](#3.2.6)<br>
[2.g) Let's list all the nearby venues](#3.2.7)<br>
[2.h) Let's show the coordinates of each venue by neighborhood](#3.2.8)<br>    
[2.i) Let's check how many venues were returned for each neighborhood](#3.2.9)<br>
[2.j) Let's find out how many unique categories can be curated from all the returned venues](#3.2.10)<br>    
[2.k) Let's analyze each neighbourhood](#3.2.11)<br>
[2.l) Look at venue frequency](#3.2.12)<br>
[2.m) Let's print each neighborhood along with the top 5 most common venues](#3.2.13)<br>    
[2.n) Let's put that into a pandas dataframe](#3.2.14)<br>
[2.o) First, let's write a function to sort the venues in descending order](#3.2.15)<br>    
[2.p) Now let's create the new dataframe and display the top 10 venues for each neighborhood](#3.2.16)<br>
[2.q) Let's look at statistics on neighborhood venues](#3.2.17)<br>
[3. Cluster Neighborhoods](#3.3.0)<br>    
[3.a) Run k-means to cluster the neighborhood into 5 clusters](#3.3.1)<br>
[3.b) Let's create a new dataframe that includes the cluster as well as the top 10 venues sorted by postcode](#3.3.2)<br>
[3.c) Finally, let's visualize the resulting 5 clusters](#3.3.3)<br>    
[3.d) Examine each Cluster of Neighborhoods](#3.3.4)<br>
[3.e) Examine Cluster 1](#3.3.5)<br>
[3.f) Examine Cluster 2](#3.3.6)<br>
[3.g) Examine Cluster 3](#3.3.7)<br>
[3.h) Examine Cluster 4](#3.3.8)<br>
[3.i) Examine Cluster 5](#3.3.9)<br>
[3.j) Observations of the Clusters](#3.3.10)<br>


# Part 1


## 1. Create Notebook and download libraries and packages <a name="1.1.0"></a>

In [1]:

# import Python libraries
import pandas as pd # library to analyze data
import numpy as np # for manipulating arrays and mathematical functions
import requests # library to handle web requests
from bs4 import BeautifulSoup # library to do web scraping
#
print('Libraries imported')

Libraries imported


## 2. Download Wikipedia page, extract PostalCode, Borough & Neighbourhood table <a name="1.2.0"></a>

### <font color=green>_The pandas library can read HTML tables directly from a URL. Its built-in read_html function parses the HTML content of a page and tries to extract the tables it contains.<br>The read_html function returns a list of DataFrames.<br>The attrs argument takes a Python dictionary of attributes and keeps only the HTML tables whose attributes match; here class="wikitable sortable" isolates the table we want from the Wikipedia page.<br>header=0 tells pandas to use row 0 of the table as the column headers.<br>We then print the number of tables extracted, which is the length of the wikitables list._</font>
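As a minimal offline sketch of the same idea, the snippet below runs read_html on a small made-up HTML string instead of the live Wikipedia page. The HTML content and the names `html`, `tables` and `demo_df` are illustrative assumptions; read_html also needs an HTML parser such as lxml (or BeautifulSoup with html5lib) to be installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables; only the first carries the class we filter on.
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
  <tr><td>ignored</td></tr>
</table>
"""

# attrs keeps only the tables whose attributes match;
# header=0 uses the first row of the table as the column names.
tables = pd.read_html(StringIO(html), header=0, attrs={"class": "wikitable sortable"})

print("Extracted {num} tables".format(num=len(tables)))
demo_df = tables[0]
print(list(demo_df.columns))
print(demo_df.shape)
```

Because read_html always returns a list, the table still has to be picked out by position, which is why the notebook takes `wikitables[0]`.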


In [2]:
url ='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wikitables = pd.read_html(url, header=0, attrs={"class":"wikitable sortable"})
print ("Extracted {num} wikitables".format(num=len(wikitables)))
wikidf = wikitables[0]
wikidf.head(10)

Extracted 1 wikitables


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


### <font color=orange> _You'll notice the table above has duplicate postal codes and cells with Not assigned in the Borough and Neighbourhood columns_</font>

## 3.a) Show table column headers, indexing and dataframe size<a name="1.3.0"></a>


### <font color=green> _To do this we use the built-in dataframe attributes columns, index and shape._</font>

In [3]:
wikidf.columns

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')

In [4]:
wikidf.index

RangeIndex(start=0, stop=288, step=1)

In [5]:
wikidf.shape

(288, 3)

## 3.b) Only process cells that have an assigned Borough & ignore those Not assigned<a name="1.3.1"></a>

## Remove table rows where Borough is Not assigned

In [6]:
wikidf = wikidf[wikidf.Borough != 'Not assigned'] # != means not equal to Not assigned
wikidf.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


### <font color=orange> _Notice in the dataframe above that there are fewer rows, as the gaps in the index (first column) show. The rows with a Not assigned Borough were removed from the dataframe._</font>

In [7]:
wikidf.shape

(211, 3)

### <font color=green>_We verify the number of rows removed by looking at the size of the dataframe. In this case the output shows that the number of rows declined from 288 to 211._</font>
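The filter in cell 6 is a boolean mask. The sketch below shows the mechanism on a tiny made-up dataframe (the rows and the names `demo`, `mask` and `kept` are illustrative only).

```python
import pandas as pd

# Tiny made-up sample with the same three columns as wikidf
demo = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Comparing a column to a value gives one True/False entry per row
mask = demo["Borough"] != "Not assigned"

# Indexing with the mask keeps only the rows where it is True
kept = demo[mask]

print(mask.tolist())
print(kept.shape)
print(kept.index.tolist())  # original row labels survive, so the index has gaps
```

The surviving row labels are why the filtered dataframe above starts at index 2 and skips index 9.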

## 3.c) Combine Neighbourhoods with same Postcode like M5A<a name="1.3.2"></a>

### <font color=green>_To combine rows that share a postal code, we first build a dataframe of unique Postcode values with the unique method. Because a postcode can have several Neighbourhood values, we collect them into a list for each postcode and store the lists in a pandas Series.<br>Then we join each list into a comma-separated string with a lambda expression, which removes the list brackets from the column._</font>

In [8]:
# create unique values for Postcode column
new_df = pd.DataFrame({'Postcode':wikidf.Postcode.unique()})
# Add the Borough of each Postcode to the new dataframe
new_df['Borough']=pd.DataFrame(list(set(wikidf['Borough'].loc[wikidf['Postcode'] == pc['Postcode']])) for i, pc in new_df.iterrows())
# Iterates over the rows of the dataframe to add series of multiple Neighbourhoods in list into Neighbourhood column
new_df['Neighbourhood']=pd.Series(list(set(wikidf['Neighbourhood'].loc[wikidf['Postcode'] == pc['Postcode']])) for i, pc in new_df.iterrows())
# remove unwanted [] from text in Neighbourhood column of new dataframe
new_df['Neighbourhood']=new_df['Neighbourhood'].apply(lambda pc: ', '.join(pc))
#
new_df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Not assigned
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
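For comparison, the same combination can be sketched with groupby and agg. This is an alternative formulation, not the method used in the cell above, and the sample rows are a small made-up subset. Unlike the set-based version, it keeps the neighbourhoods in their original order.

```python
import pandas as pd

demo = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A", "M6A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights", "Lawrence Manor", "Parkwoods"],
})

# One row per Postcode: keep the first Borough and join all Neighbourhood names.
# sort=False keeps the postcodes in order of first appearance.
combined = (
    demo.groupby("Postcode", sort=False)
        .agg({"Borough": "first", "Neighbourhood": ", ".join})
        .reset_index()
)

print(combined)
```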


## 3.d) If a cell has a Borough but a Not assigned Neighbourhood, then the Neighbourhood gets the same value as the Borough<a name="1.3.3"></a><br>
### <font color=green>_For Boroughs whose Neighbourhood is Not assigned, replace Not assigned with the Borough's value._<br> # See M7A Queen's Park in row index 4</font>

In [9]:
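# Boolean mask of the rows whose Neighbourhood is 'Not assigned'
not_assigned = new_df['Neighbourhood'] == 'Not assigned'
# Assign through .loc so the change is written to new_df itself;
# rows yielded by iterrows() may be copies, so writing to them is not reliable
new_df.loc[not_assigned, 'Neighbourhood'] = new_df.loc[not_assigned, 'Borough']
new_df.head(20)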

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


### <font color=orange>_Notice the change in row 4: for M7A, Queen's Park now appears in both the Borough and Neighbourhood columns._</font>

## 3.e&f) Show Number of (Rows, Columns) of new dataframe.<a name="1.3.4"></a><br><br>

In [10]:
new_df.shape

(103, 3)

### <font color=green>_We verify the number of rows combined by looking at the size of the dataframe.<br> In this case the output shows that the number of rows declined from 211 to 103.<br> The dataframe is now formatted, cleaned and ready for Part 2._</font>

# Part 2

## 1. Get Geo-spatial data from website<a name="2.1.0"></a>

### <font color=green>_We use the read_csv function because the file format is CSV._</font>

In [11]:
gs_data= pd.read_csv("http://cocl.us/Geospatial_data")
gs_data.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


## 2. Set index of geospatial data and merge data sets<a name="2.2.0"></a>

### <font color=green>_Next we rename the Postal Code column to Postcode so that both data sets share the same key column. Postcode is the key that links the two tables: its values are unique and appear in each data set. We then use the merge function to combine the two data sets (new_df and gs_data) into a new dataframe called table_w_coordinates, which includes the geographic coordinates. Finally we rename Neighbourhood to Neighborhood so that the column name matches the spelling used in the rest of the notebook._</font>

In [12]:
# Rename the key column so it matches the column name in the other data set
gs_data.rename(columns={'Postal Code':'Postcode'}, inplace=True)
# Merge the two data sets on the shared Postcode column;
# the key is passed explicitly, so no index needs to be set beforehand
table_w_coordinates=pd.merge(new_df, gs_data, on='Postcode')
table_w_coordinates.rename(columns={'Neighbourhood':'Neighborhood'}, inplace=True)
table_w_coordinates.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
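One way to check that every postcode found its coordinates is a left merge with the `indicator` and `validate` options of pd.merge. The sketch below uses made-up frames (`left`, `right`, and the postcode M9Z are illustrative) and is not part of the original notebook.

```python
import pandas as pd

# Made-up frames: M9Z has no match on the coordinate side
left = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Made-up Borough"],
})
right = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every postcode from the left frame,
# validate raises an error if a key is duplicated on either side,
# indicator adds a _merge column that says where each row came from.
merged = pd.merge(left, right, on="Postcode", how="left",
                  validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

With the default inner merge used in the notebook, an unmatched postcode would be dropped silently, so comparing row counts before and after the merge is a cheaper version of the same check.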


## 3. Determine the number of Boroughs and Neighborhoods<a name="2.3.0"></a>

### <font color=green>_We count the distinct boroughs and the rows of the merged dataframe (one row per postal code)._</font>

In [13]:
print('The dataframe has {} Boroughs and {} Neighborhoods.'.format(
        len(table_w_coordinates['Borough'].unique()),
        table_w_coordinates.shape[0]
    )
)

The dataframe has 11 Boroughs and 103 Neighborhoods.
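Note that the second number printed above is the row count of the dataframe, so it counts postal codes rather than individual neighborhood names. The sketch below, on made-up rows, shows the same two counts plus a per-borough breakdown; `demo`, `n_boroughs`, `n_rows` and `per_borough` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A", "M1B"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Scarborough"],
})

n_boroughs = demo["Borough"].nunique()        # number of distinct boroughs
n_rows = len(demo)                            # one row per postal code
per_borough = demo["Borough"].value_counts()  # rows per borough, largest first

print(n_boroughs, n_rows)
print(per_borough)
```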


# Part 3

## 1. Import Mapping Libraries<a name="3.1.0"></a>

In [14]:
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.') # show import is complete. This usually takes a few minutes:-)

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.19.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


## 1.a) Get mapping coordinates of Toronto<a name="3.1.1"></a>

## <font color=green>_This code is reusable: change the address and the user agent to look up a different location._</font>

In [15]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
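The geopy `geocode` call returns None when it finds no match, in which case `location.latitude` would fail. A small wrapper can make that case explicit. The sketch below is an assumption-laden illustration: `coordinates_or_raise`, `FakeLocation` and `fake_geocode` are hypothetical names, and the fake geocoder only stands in for `geolocator.geocode` so that the example runs offline.

```python
from collections import namedtuple


def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found.

    geocode is any callable shaped like geolocator.geocode.
    """
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode so the sketch needs no network access
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def fake_geocode(address):
    known = {"Toronto, CA": FakeLocation(43.653963, -79.387207)}
    return known.get(address)  # None for unknown addresses


print(coordinates_or_raise(fake_geocode, "Toronto, CA"))
```

In the notebook, the call would look like `coordinates_or_raise(geolocator.geocode, address)`.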


## 1.b) Neighborhood Selection Criteria<a name="3.1.2"></a>

## <font color=green>_Based on the postal code table, four Toronto boroughs have Toronto in their name: East Toronto, West Toronto, Downtown Toronto and Central Toronto. I will look at the Downtown Toronto borough, since I lived there for a few years in my childhood and would like to explore the downtown core for when I go back to visit._</font>

## So let's slice the original dataframe and create a new dataframe of the Downtown Toronto data.

## 1.c) Get mapping coordinates for Downtown Toronto<a name="3.1.3"></a>

In [16]:
downtown_data = table_w_coordinates[table_w_coordinates['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564
6,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
7,M5J,Downtown Toronto,"Union Station, Harbourfront East, Toronto Islands",43.640816,-79.381752
8,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817


## 1.d) Get coordinates of Downtown Toronto borough to create a map<a name="3.1.4"></a>

In [17]:
address = 'Downtown Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.655115, -79.380219.


## 1.e) Create map of downtown using latitude and longitude values<a name="3.1.5"></a>

In [18]:
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=13, width='70%', height='90%', min_zoom=12)

# add markers to map
for lat, lng, name in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Neighborhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        tooltip=name,  # plain text for the hover label
        popup=folium.Popup(name, parse_html=True),  # shown when the marker is clicked
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5).add_to(map_downtown)
    
map_downtown

## 2. Add Foursquare credentials<a name="3.2.0"></a>

In [19]:
CLIENT_ID = '35IHSKAD1BQVF0FHGVJFIBV5PKESZZW2CZQI0X3XVAVDQBFT' # your Foursquare ID
CLIENT_SECRET = '3B5I55GFVBCT4W0KN2COZV3TYQL04RGE1CZVPSEBT1Y2RGCP' # your Foursquare Secret
VERSION = '20180314' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 35IHSKAD1BQVF0FHGVJFIBV5PKESZZW2CZQI0X3XVAVDQBFT
CLIENT_SECRET:3B5I55GFVBCT4W0KN2COZV3TYQL04RGE1CZVPSEBT1Y2RGCP


## 2.a) Let's get the top 100 venues in Downtown Toronto Borough<a name="3.2.1"></a>

In [20]:
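# Hard-coded search coordinates: latitude 43.773077, longitude -79.257774.
# Note: these are not the Downtown Toronto coordinates found in 1.d (43.655115, -79.380219);
# the venues returned below are located around Scarborough Town Centre.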
s_latitude = 43.773077
s_longitude = -79.257774

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    s_latitude, 
    s_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=35IHSKAD1BQVF0FHGVJFIBV5PKESZZW2CZQI0X3XVAVDQBFT&client_secret=3B5I55GFVBCT4W0KN2COZV3TYQL04RGE1CZVPSEBT1Y2RGCP&v=20180314&ll=43.773077,-79.257774&radius=500&limit=100'
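The same request URL can be assembled from a dictionary of parameters, which keeps each value next to its name and handles URL encoding. The sketch below uses only the standard library. The helper name `build_explore_url`, the environment variable names and the placeholder credentials are assumptions for illustration; reading the keys from the environment is one way to avoid printing them in the notebook.

```python
import os
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with the same parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Hypothetical environment variable names; the fallbacks are placeholders, not real keys
demo_url = build_explore_url(
    os.environ.get("FOURSQUARE_CLIENT_ID", "demo-id"),
    os.environ.get("FOURSQUARE_CLIENT_SECRET", "demo-secret"),
    "20180314",
    43.655115,
    -79.380219,
)

# Parse the query string back to confirm what was encoded
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["radius"], query["limit"])
```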

## 2.b) Create JSON file<a name="3.2.2"></a>

In [21]:
results = requests.get(url).json() # decided not to print out the JSON file here, it is very long and unnecessary to view

## 2.c) Let's borrow the get_category_type function from the Foursquare lab<a name="3.2.3"></a>

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## 2.d) Let's capture Downtown Toronto Venues<a name="3.2.4"></a>

In [23]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Disney Store,Toy / Game Store,43.775537,-79.256833
1,SEPHORA,Cosmetics Shop,43.775017,-79.258109
2,American Eagle Outfitters,Clothing Store,43.775908,-79.258352
3,DAVIDsTEA,Tea Room,43.776613,-79.258516
4,Tommy Hilfiger Company Store,Clothing Store,43.776015,-79.257369
5,Chipotle Mexican Grill,Mexican Restaurant,43.77641,-79.258069
6,Scarborough Town Centre,Shopping Mall,43.775237,-79.257363
7,St. Andrews Fish & Chips,Fish & Chips Shop,43.771865,-79.252645
8,Coliseum Scarborough Cinemas,Movie Theater,43.775995,-79.255649
9,Jimmy The Greek,Greek Restaurant,43.775112,-79.257119
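To see what the flattening step does, the sketch below runs json_normalize on two made-up items shaped like the entries the notebook reads from `results['response']['groups'][0]['items']`. The venue names and coordinates are invented. It uses the top-level `pd.json_normalize`, which is available in pandas 1.0 and later; the notebook imports the same function from `pandas.io.json`.

```python
import pandas as pd

# Made-up items with the nested layout used in the notebook
items = [
    {"venue": {"name": "Example Cafe",
               "categories": [{"name": "Café"}],
               "location": {"lat": 43.65, "lng": -79.38}}},
    {"venue": {"name": "Example Park",
               "categories": [],
               "location": {"lat": 43.66, "lng": -79.39}}},
]

# Nested dictionaries become dotted column names; lists are left as lists
flat = pd.json_normalize(items)

print(sorted(flat.columns))
print(flat.loc[0, "venue.categories"])
```

The dotted names are why the notebook filters on 'venue.name' and 'venue.location.lat' and then keeps only the last part of each column name.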


## 2.e) Count the number of venues<a name="3.2.5"></a>

In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

47 venues were returned by Foursquare.


## 2.f) Let's get the coordinates of the nearby venues<a name="3.2.6"></a>

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
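The list comprehension in getNearbyVenues reads `v['venue']['categories'][0]`, which raises an IndexError if a venue comes back with an empty category list. A defensive variant of the per-venue extraction could look like the sketch below; `venue_record` and the sample items are hypothetical and not part of the original function.

```python
def venue_record(name, lat, lng, item):
    """Build one output row for a venue, tolerating an empty category list."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# Made-up items in the same layout as the API results
sample_items = [
    {"venue": {"name": "Example Cafe", "categories": [{"name": "Café"}],
               "location": {"lat": 43.65, "lng": -79.38}}},
    {"venue": {"name": "Example Spot", "categories": [],
               "location": {"lat": 43.66, "lng": -79.39}}},
]

records = [venue_record("Demo Hood", 43.655, -79.385, item) for item in sample_items]
print(records)
```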

## 2.g) Let's list all the nearby venues<a name="3.2.7"></a>

In [26]:
downtown_venues = getNearbyVenues(names=downtown_data['Neighborhood'],
                                   latitudes=downtown_data['Latitude'],
                                   longitudes=downtown_data['Longitude']
                                  )

Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Union Station, Harbourfront East, Toronto Islands
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Chinatown, Grange Park, Kensington Market
King and Spadina, Harbourfront West, Bathurst Quay, Island airport, South Niagara, CN Tower, Railway Lands
Rosedale
Stn A PO Boxes 25 The Esplanade
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


## 2.h) Let's show the coordinates of each venue by neighborhood<a name="3.2.8"></a>

In [27]:
print(downtown_venues.shape)
downtown_venues.head(10)

(1281, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
5,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
6,"Regent Park, Harbourfront",43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub
7,"Regent Park, Harbourfront",43.65426,-79.360636,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
8,"Regent Park, Harbourfront",43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park
9,"Regent Park, Harbourfront",43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site


## 2.i) Let's check how many venues were returned for each neighborhood<a name="3.2.9"></a>

In [28]:
downtown_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
Central Bay Street,81,81,81,81,81,81
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,89,89,89,89,89,89
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100


## 2.j) Let's find out how many unique categories can be curated from all the returned venues<a name="3.2.10"></a>

In [29]:
print('There are {} unique categories.'.format(len(downtown_venues['Venue Category'].unique())))

There are 205 unique categories.


## 2.k) Let's analyze each neighbourhood<a name="3.2.11"></a>

In [30]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 

# move neighborhood column to the first column
#fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
#downtown_onehot = downtown_onehot[fixed_columns]

downtown_onehot.head(10)

Unnamed: 0,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
downtown_onehot.shape

(1281, 205)
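The shape above has 205 columns, the same as the number of unique categories, even though a Neighborhood column was added after the encoding. One possible explanation (an inference from the numbers, not something the output confirms) is that one venue category is itself named Neighborhood, so the assignment replaced that category column instead of adding a new one. The sketch below reproduces the effect on made-up data; the column name "Neighborhood Name" is an illustrative choice.

```python
import pandas as pd

# Made-up venues; one of them has the category "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Neighborhood", "Park"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
n_category_columns = onehot.shape[1]

# Assigning to an existing column name replaces it rather than adding a column
onehot["Neighborhood"] = venues["Neighborhood"]
print(n_category_columns, onehot.shape[1])

# One way to avoid the clash: store the label under a name no category uses
safe = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
safe.insert(0, "Neighborhood Name", venues["Neighborhood"])
print(safe.shape[1])
```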

## 2.l) Look at venue frequency<a name="3.2.12"></a>

In [32]:
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped.head(10)

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,"Adelaide, King, Richmond",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.012346
3,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.05,0.0,0.05,0.01,0.0,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.0,0.011236,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011236,0.011236,0.0,0.011236,0.022472
6,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
7,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
8,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
9,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0


In [33]:
# Let's confirm the new shape
downtown_grouped.shape

(18, 205)
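Taking the mean of one-hot columns per neighborhood gives the share of each category among that neighborhood's venues. As a cross-check, the same table can be sketched with pd.crosstab on made-up data; this is an alternative route, not the one used in the notebook.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Pub", "Park", "Park"],
})

# normalize="index" divides each row by its total, so every row sums to 1
freq = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")

print(freq)
```

Neighborhood A has four venues, two of them cafés, so its Café share is 0.5; neighborhood B has only parks, so its Park share is 1.0.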

## 2.m) Let's print each neighborhood along with the top 5 most common venues<a name="3.2.13"></a>

In [34]:
num_top_venues = 5

for hood in downtown_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.06
1             Café  0.04
2       Steakhouse  0.04
3  Thai Restaurant  0.04
4     Burger Joint  0.03


----Berczy Park----
            venue  freq
0     Coffee Shop  0.07
1    Cocktail Bar  0.05
2      Steakhouse  0.04
3  Farmers Market  0.04
4          Bakery  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.16
1  Italian Restaurant  0.05
2                 Bar  0.04
3        Burger Joint  0.04
4                Café  0.04


----Chinatown, Grange Park, Kensington Market----
                           venue  freq
0                           Café  0.07
1                            Bar  0.06
2          Vietnamese Restaurant  0.05
3  Vegetarian / Vegan Restaurant  0.05
4                    Coffee Shop  0.04


----Christie----
                venue  freq
0                Café  0.20
1       Grocery Store  0.20
2                Park  0.13
3           Nightclub  0.07
4  Italia

## 2.n) Let's put that into a pandas dataframe<a name="3.2.14"></a>

## 2.o) First, let's write a function to sort the venues in descending order<a name="3.2.15"></a>

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
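A compact variant of this helper can be sketched with `nlargest`. The function name `top_categories` and the sample row are illustrative. When several categories share the same frequency, the order among them depends on the sorting method, which may be why the top-5 printout and the top-10 table list tied categories in a different order.

```python
import pandas as pd


def top_categories(row, n):
    """Return the n category names with the highest frequency in one grouped row."""
    freqs = pd.to_numeric(row.iloc[1:])  # skip the leading Neighborhood label
    return freqs.nlargest(n).index.tolist()


# Made-up row in the layout of downtown_grouped: label first, then frequencies
demo_row = pd.Series({"Neighborhood": "Demo", "Café": 0.20, "Park": 0.13,
                      "Pub": 0.07, "Bakery": 0.40})

print(top_categories(demo_row, 3))
```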

## 2.p) Now let's create the new dataframe and display the top 10 venues for each neighborhood<a name="3.2.16"></a>

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Steakhouse,Thai Restaurant,Café,Hotel,Bakery,Bar,Restaurant,American Restaurant,Gym
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Farmers Market,Bakery,Seafood Restaurant,Steakhouse,Restaurant,Cheese Shop,Pub
2,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Bar,Bubble Tea Shop,Café,Burger Joint,Japanese Restaurant,Salad Place,Thai Restaurant
3,"Chinatown, Grange Park, Kensington Market",Café,Bar,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bakery,Coffee Shop,Chinese Restaurant,Mexican Restaurant,Dim Sum Restaurant,Dumpling Restaurant
4,Christie,Grocery Store,Café,Park,Coffee Shop,Nightclub,Diner,Baby Store,Italian Restaurant,Restaurant,Convenience Store
5,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Gastropub,Mediterranean Restaurant,Men's Store,Café
6,"Commerce Court, Victoria Hotel",Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Italian Restaurant,Gastropub,Seafood Restaurant,Deli / Bodega,Bakery
7,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gastropub,Seafood Restaurant,Bakery,Italian Restaurant,Deli / Bodega
8,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Steakhouse,Deli / Bodega,Burger Joint,Restaurant,Gastropub,Bakery,Seafood Restaurant
9,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Tea Room,Pizza Place,Ramen Restaurant,Bar,Restaurant
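
The column labels above come from a three-item suffix list with a fallback to "th", which works for the top 10 but would produce labels such as "21th" for longer lists. A hedged alternative is sketched here; `ordinal` is a hypothetical helper and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # numbers ending in 11, 12 or 13 take 'th' even though they end in 1, 2 or 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same column names as the cell above.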


## 2.q) Let's look at statistics on neighborhood venues<a name="3.2.17"></a>

In [37]:
neighborhoods_venues_sorted.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,18,18,18,18,18,18,18,18,18,18,18
unique,18,5,12,12,15,13,15,15,15,15,18
top,Berczy Park,Coffee Shop,Café,Café,Café,Bakery,Seafood Restaurant,Gastropub,Mexican Restaurant,Italian Restaurant,Clothing Store
freq,1,13,4,4,2,4,2,3,2,2,1
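
For text columns, `describe()` reports `count`, `unique`, `top` and `freq` instead of means and quartiles. In the table above, that means Coffee Shop is the 1st most common venue in 13 of the 18 neighborhoods. A small sketch on a made-up series (the values are hypothetical) shows how to read the four rows:

```python
import pandas as pd

# hypothetical column of "most common venue" labels
venues = pd.Series(['Coffee Shop', 'Coffee Shop', 'Café', 'Coffee Shop', 'Grocery Store'])

summary = venues.describe()
print(summary)  # count = rows, unique = distinct labels, top = most frequent label, freq = its count
```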


## 3. Cluster Neighborhoods<a name="3.3.0"></a>

## 3.a) Run k-means to cluster the neighborhoods into 5 clusters<a name="3.3.1"></a>


In [38]:
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 4, 1, 1, 1, 1, 1], dtype=int32)
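
In the output above, nine of the first ten rows share label 1. The label numbers are arbitrary identifiers; what matters is which rows end up with the same label. Here is a minimal sketch on made-up venue-frequency rows. The data, `profiles`, `toy_kmeans` and the explicit `n_init=10` are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical venue-frequency rows; columns: Coffee Shop, Café, Park
profiles = np.array([
    [0.60, 0.30, 0.10],  # coffee-heavy neighborhood
    [0.55, 0.35, 0.10],  # coffee-heavy neighborhood
    [0.05, 0.05, 0.90],  # park-heavy neighborhood
    [0.10, 0.05, 0.85],  # park-heavy neighborhood
])

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(profiles)
labels = toy_kmeans.labels_

print(labels)               # one label per row
print(np.bincount(labels))  # number of rows in each cluster
```

Rows with similar frequency profiles receive the same label, and `np.bincount` gives a quick view of how balanced the clusters are.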

## 3.b) Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each postcode<a name="3.3.2"></a>

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)

downtown_merged = downtown_data
# join the sorted venue table onto downtown_data to add the cluster label and top venues for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_merged.head(300) # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Pub,Café,Bakery,Theater,Breakfast Spot,Mexican Restaurant,Restaurant,Health Food Store
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Tea Room,Pizza Place,Ramen Restaurant,Bar,Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Restaurant,Café,Hotel,Breakfast Spot,Italian Restaurant,Gastropub,Cosmetics Shop,Bakery,Clothing Store
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Café,Farmers Market,Bakery,Seafood Restaurant,Steakhouse,Restaurant,Cheese Shop,Pub
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1,Coffee Shop,Italian Restaurant,Sandwich Place,Bar,Bubble Tea Shop,Café,Burger Joint,Japanese Restaurant,Salad Place,Thai Restaurant
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564,4,Grocery Store,Café,Park,Coffee Shop,Nightclub,Diner,Baby Store,Italian Restaurant,Restaurant,Convenience Store
6,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,1,Coffee Shop,Steakhouse,Thai Restaurant,Café,Hotel,Bakery,Bar,Restaurant,American Restaurant,Gym
7,M5J,Downtown Toronto,"Union Station, Harbourfront East, Toronto Islands",43.640816,-79.381752,1,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Scenic Lookout,Fried Chicken Joint,Bakery,Pizza Place,Brewery
8,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576,1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gastropub,Seafood Restaurant,Bakery,Italian Restaurant,Deli / Bodega
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Italian Restaurant,Gastropub,Seafood Restaurant,Deli / Bodega,Bakery
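
The `join` above keeps every row of the left dataframe. If a neighborhood had no row in the venue table, its cluster label would come back as `NaN` and the column would become float, which would break the list indexing used for the map colors. A hedged sketch with hypothetical data shows the effect and one way to clean it up:

```python
import pandas as pd

# hypothetical data: three neighborhoods, only two of which have cluster labels
locations = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.65, 43.66, 43.67],
})
venue_labels = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [1, 4],
})

merged = locations.join(venue_labels.set_index('Neighborhood'), on='Neighborhood')
missing = int(merged['Cluster Labels'].isna().sum())
print(missing)

# drop unmatched rows and restore integer labels before using them as list indices
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```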


## 3.c) Finally, let's visualize the resulting 5 clusters<a name="3.3.3"></a>

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# note: rainbow[cluster-1] gives label 0 the last color (red) and label 1 the first (purple)
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.4).add_to(map_clusters)
       
map_clusters
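
The marker color is looked up with `rainbow[cluster-1]`, so label 0 wraps around to the last entry of the list. The sketch below uses placeholder color names in place of the actual hex codes; `palette` and `marker_color` are hypothetical names used only for illustration.

```python
# placeholder names standing in for the five hex codes in `rainbow`,
# ordered from the start of the colormap (purple) to its end (red)
palette = ['purple', 'blue', 'green', 'orange', 'red']

def marker_color(cluster_label):
    # same indexing as the map cell: label - 1, so label 0 wraps to the last entry
    return palette[cluster_label - 1]

assigned = {label: marker_color(label) for label in range(5)}
print(assigned)
```

This is why the neighborhoods with label 1 appear in purple on the map and the single neighborhood with label 0 appears in red.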

## 3.d) Examine each Cluster of Neighborhoods<a name="3.3.4"></a>
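
Each of the following cells uses the same selection: rows with a given cluster label, keeping column 1 (Borough) plus every column from position 5 onward. Note that the headings count clusters from 1 while the labels start at 0, so "Cluster 1" is label 0. A hypothetical helper, `show_cluster`, sketches the pattern on a two-row frame that mimics the column order of `downtown_merged`:

```python
import pandas as pd

def show_cluster(df, label):
    """Rows of df with the given cluster label, keeping column 1 plus columns 5 onward."""
    keep = df.columns[[1] + list(range(5, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, keep]

# small frame with the same column order as downtown_merged (only one venue column)
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M6G'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto'],
    'Neighborhood': ['Regent Park, Harbourfront', 'Christie'],
    'Latitude': [43.65426, 43.669542],
    'Longitude': [-79.360636, -79.422564],
    'Cluster Labels': [1, 4],
    '1st Most Common Venue': ['Coffee Shop', 'Grocery Store'],
})

christie_cluster = show_cluster(toy, 4)
print(christie_cluster)
```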

## 3.e) Examine Cluster 1<a name="3.3.5"></a>

In [41]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,0,Café,Restaurant,Bar,Bakery,Bookstore,Japanese Restaurant,Chinese Restaurant,Poutine Place,Dessert Shop,Beer Bar


## <font color=orange>_This cluster shows an ethnically diverse area, with Japanese and Chinese restaurants alongside a poutine place. The poutine place may point to a small contingent of French Canadians who came here for job opportunities. It seems to be an ideal cluster for foodies._</font>

## 3.f) Examine Cluster 2<a name="3.3.6"></a>

In [42]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Park,Pub,Café,Bakery,Theater,Breakfast Spot,Mexican Restaurant,Restaurant,Health Food Store
1,Downtown Toronto,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Tea Room,Pizza Place,Ramen Restaurant,Bar,Restaurant
2,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,Breakfast Spot,Italian Restaurant,Gastropub,Cosmetics Shop,Bakery,Clothing Store
3,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Café,Farmers Market,Bakery,Seafood Restaurant,Steakhouse,Restaurant,Cheese Shop,Pub
4,Downtown Toronto,1,Coffee Shop,Italian Restaurant,Sandwich Place,Bar,Bubble Tea Shop,Café,Burger Joint,Japanese Restaurant,Salad Place,Thai Restaurant
6,Downtown Toronto,1,Coffee Shop,Steakhouse,Thai Restaurant,Café,Hotel,Bakery,Bar,Restaurant,American Restaurant,Gym
7,Downtown Toronto,1,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Scenic Lookout,Fried Chicken Joint,Bakery,Pizza Place,Brewery
8,Downtown Toronto,1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gastropub,Seafood Restaurant,Bakery,Italian Restaurant,Deli / Bodega
9,Downtown Toronto,1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Italian Restaurant,Gastropub,Seafood Restaurant,Deli / Bodega,Bakery
11,Downtown Toronto,1,Café,Bar,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bakery,Coffee Shop,Chinese Restaurant,Mexican Restaurant,Dim Sum Restaurant,Dumpling Restaurant


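## <font color=orange>_Based on the top ten venues, we observe a lot of variety. This indicates a hipster-oriented cluster with many restaurants, bars, and pubs. There is probably a vibrant nightlife in this cluster, and I would probably make a stop here when I come to visit. The most common venue is a coffee shop. This makes sense because real estate is probably very expensive, and a coffee shop probably needs less square footage than a restaurant or bar. It would be interesting to further investigate this area before my next trip to Toronto._</font>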

## 3.g) Examine Cluster 3<a name="3.3.7"></a>

In [43]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,2,Park,Playground,Trail,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner


## <font color=orange>_Based on the top ten venues, we see a park, a playground, a trail, and a dog run. This indicates a go-to location for getting closer to nature on weekends. The discount store probably indicates a poorer cluster with older structures._</font>

## 3.h) Examine Cluster 4<a name="3.3.8"></a>

In [44]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,3,Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Sculpture Garden,Boutique,Plane,Boat or Ferry,Airport Gate,Airport


## <font color=orange>_Based on the top ten venues, we see that everything is about transportation, like the airport or the ferry. This is probably a location you would easily forget unless you are a tourist looking for cheap souvenirs._</font>

## 3.i) Examine Cluster 5<a name="3.3.9"></a>

In [45]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,4,Grocery Store,Café,Park,Coffee Shop,Nightclub,Diner,Baby Store,Italian Restaurant,Restaurant,Convenience Store


## <font color=orange>_Based on the top ten venues, we observe the 7th most common being a baby store. This indicates a family-oriented cluster, probably with single-family dwellings and a more suburban landscape._</font>

## 3.j) Observations of the Clusters<a name="3.3.10"></a>

### <font color=orange>_Upon examination of the clusters, I would further investigate Cluster 2 (in purple), because it has the greatest number of different venues compared to the other clusters found in Downtown Toronto. I would also visit Cluster 1 (in red), because I am French Canadian and noticed a poutine place there, which hints at a French-speaking community. All of the clusters have something to offer, depending on what you are looking for. These clusters were formed by grouping neighborhoods with similar venue categories; location itself was not one of the clustering features. Just looking at each cluster, you can identify unique characteristics not found in another cluster, like ethnic diversity._</font>