<p style="text-align: center;"><font size="6">Segmenting and Clustering Neighborhoods in Toronto</font></p>

# Building a Dataframe 

We need to build a dataframe containing the postal code of each neighborhood, along with its borough name and neighborhood name, in order to use the Foursquare location data.
 

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

To create the above dataframe:

+ The dataframe will consist of three columns: **PostalCode**, **Borough**, and **Neighborhood**
+ Only process the cells that have an assigned borough. *Ignore cells with a borough that is Not assigned*.
+ *More than one neighbourhood can exist in one postal code area*. For example, in the table on the Wikipedia page, you will notice that **M5A** is listed twice and has two neighborhoods: **Harbourfront** and **Regent Park**. *These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.*
+ If a cell has a borough but a **Not assigned** neighbourhood, then the neighborhood will be the same as the borough. So for the **9th** cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be **Queen's Park**.
+ Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
+ In the last cell of your notebook, use the *`.shape` attribute to print the number of rows of your dataframe*.
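
The rules above can be tried out on a small hand-made table before touching the real page. This is only a sketch: the rows below are illustrative stand-ins rather than scraped data, and the column names are assumed to follow the headers of the Wikipedia table (`Postcode`, `Borough`, `Neighbourhood`).

```python
import pandas as pd

# Illustrative rows only; the real data comes from the Wikipedia table.
raw = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Rule 1: keep only rows with an assigned borough.
assigned = raw[raw["Borough"] != "Not assigned"]

# Rule 2: one row per postal code, neighbourhoods joined with a comma.
combined = (
    assigned.groupby(["Postcode", "Borough"], sort=True)["Neighbourhood"]
    .agg(",".join)
    .reset_index()
)

# Rule 3: an unassigned neighbourhood takes the borough name.
combined["Neighbourhood"] = combined["Neighbourhood"].mask(
    combined["Neighbourhood"] == "Not assigned", combined["Borough"]
)

print(combined)
print(combined.shape[0])  # number of rows
```

Applying rule 3 after the grouping step mirrors the order used in the rest of this notebook.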

In [1]:
import os
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

## Loading Tables from given Wikipedia page

A forward sortation area (FSA) is a way to designate a geographical unit based on the first three characters in a Canadian postal code. For more detail, see https://www.ic.gc.ca/eic/site/bsf-osb.nsf/eng/br03396.html

We first check whether a CSV file containing the postal code data already exists at the given path. If it does not, we scrape the Wikipedia page to obtain the postal code data and then save it to a CSV file.

In [2]:
file_input_path = "./var/Toronto_FSA.csv"
df = None

if os.path.exists(file_input_path):    
    print("Loading from saved csv '%s' that was downloaded from Wikipedia page" % file_input_path)
    df = pd.read_csv(file_input_path, header=0)
else:
    print("Loading table Toronto FSA from Wikipedia page")
    
    url_page = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
    tables = pd.read_html(url_page)
    
    print("The number of tables in given Wikipedia page : %s" % len(tables))    
    df = tables[0]
    
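    print("Save to file csv: %s" % file_input_path)
    # create the target folder first so that to_csv does not fail if it is missing
    os.makedirs(os.path.dirname(file_input_path), exist_ok=True)
    df.to_csv(file_input_path, header=True, index=False)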

Loading table Toronto FSA from Wikipedia page
The number of tables in given Wikipedia page : 3
Save to file csv: ./var/Toronto_FSA.csv


## Examining the resulting dataframe

The dataframe should consist of three columns. In the scraped table they are named Postcode, Borough, and Neighbourhood.

In [3]:
print("(row, column) = %s" % str(df.shape))

(row, column) = (287, 3)


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 3 columns):
Postcode         287 non-null object
Borough          287 non-null object
Neighbourhood    287 non-null object
dtypes: object(3)
memory usage: 6.9+ KB


In [5]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [6]:
df.columns

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')

## Filtering by "Borough"

Only process the cells that have an assigned borough. Ignore cells with a borough that is **Not assigned**.

In [7]:
COL_NAME_POSTCODE = "Postcode"
COL_NAME_BOROUGH = "Borough"
COL_NAME_NEIGHBOURHOOD = "Neighbourhood"

CONST_NOT_ASSIGNED = "Not assigned"

In [8]:
df = df[df[COL_NAME_BOROUGH] != CONST_NOT_ASSIGNED]

In [9]:
print("(row, column) = %s" % str(df.shape))

(row, column) = (210, 3)


In [10]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


## Combining the neighborhoods that have the same Postcode

*More than one neighbourhood can exist in one postal code area*. For example, in the table on the Wikipedia page, you will notice that **M5A** is listed twice and has two neighborhoods: **Harbourfront** and **Regent Park**. *These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.*

In [11]:
df_combine = df.groupby(by=[COL_NAME_POSTCODE, 
                            COL_NAME_BOROUGH]).agg(lambda x: ','.join(x)).reset_index()

### Showing top-5 in dataframe

In [12]:
df_combine.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### Reviewing a special Postcode and Borough

In [13]:
df_combine[(df_combine[COL_NAME_POSTCODE]=="M5V") & (df_combine[COL_NAME_BOROUGH]=="Downtown Toronto")]

Unnamed: 0,Postcode,Borough,Neighbourhood
68,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf..."


## Updating neighbourhoods that are Not assigned

If a cell has a borough but a **Not assigned** neighbourhood, then the neighborhood will be the same as the borough. So for the **9th** cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be **Queen's Park**.

### Listing neighbourhood having "Not assigned"

In [14]:
df_combine_filter = df_combine[df_combine[COL_NAME_NEIGHBOURHOOD]==CONST_NOT_ASSIGNED]
df_combine_filter

Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Queen's Park,Not assigned


In [15]:
postcode_temp = ""
if not df_combine_filter.empty:
    postcode_temp = df_combine_filter[COL_NAME_POSTCODE].iloc[0]
    print("Postcode whose neighbourhood is 'Not assigned': %s" % postcode_temp)

Postcode whose neighbourhood is 'Not assigned': M7A


### Updating neighbourhood

In [16]:
df_combine[COL_NAME_NEIGHBOURHOOD] = df_combine.apply(
    lambda row: row[COL_NAME_BOROUGH] if row[COL_NAME_NEIGHBOURHOOD]==CONST_NOT_ASSIGNED else row[COL_NAME_NEIGHBOURHOOD], 
    axis=1)

### Verifying the updated value

In [17]:
df_combine[df_combine[COL_NAME_POSTCODE]==postcode_temp]

Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Queen's Park,Queen's Park


In [18]:
df_combine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 3 columns):
Postcode         103 non-null object
Borough          103 non-null object
Neighbourhood    103 non-null object
dtypes: object(3)
memory usage: 2.5+ KB


In [19]:
df_combine

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie..."
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam..."


## Getting the size of dataframe

In [20]:
print("(row, column) = %s" % str(df_combine.shape))

(row, column) = (103, 3)


# Getting the latitude and the longitude coordinates of each neighborhood

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a given postal code can return None, and the same call made again can return the coordinates. So, to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code.

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

## Loading the geographical coordinates of each postal code

In [21]:
COL_NAME_POSTAL_CODE = "Postal Code"
COL_NAME_LATITUDE = "Latitude"
COL_NAME_LONGITUDE = "Longitude"

file_input_path = "./var/Geospatial_Coordinates.csv"

# Loading file csv
df_coordinates = pd.read_csv(file_input_path, header=0)

In [22]:
print("(row, column) = %s" % str(df_coordinates.shape))

(row, column) = (103, 3)


In [23]:
df_coordinates.head(3)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711


In [24]:
df_coordinates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 3 columns):
Postal Code    103 non-null object
Latitude       103 non-null float64
Longitude      103 non-null float64
dtypes: float64(2), object(1)
memory usage: 2.5+ KB


## Merging two dataframes

In [25]:
# List of columns in dataframe df_combine
df_combine.columns

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')

In [26]:
# List of columns in dataframe coordinates
df_coordinates.columns

Index(['Postal Code', 'Latitude', 'Longitude'], dtype='object')

In [27]:
df_merged = pd.merge(df_combine, df_coordinates, 
                     left_on=COL_NAME_POSTCODE, right_on=COL_NAME_POSTAL_CODE,
                     how="inner")
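
Before relying on an inner merge, it can help to check that no postal code is silently dropped. The sketch below uses two small illustrative frames (the code `M9Z` is made up for the demonstration) and the `indicator` and `validate` options of `pd.merge` to list rows that have no partner on the other side.

```python
import pandas as pd

# Illustrative frames; column names mirror the two dataframes in this notebook.
left = pd.DataFrame({"Postcode": ["M1B", "M1C", "M9Z"],  # "M9Z" is a made-up code
                     "Borough": ["Scarborough", "Scarborough", "Example"]})
right = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M1E"],
                      "Latitude": [43.806686, 43.784535, 43.763573],
                      "Longitude": [-79.194353, -79.160497, -79.188711]})

# An outer merge with indicator=True labels each row as
# 'both', 'left_only', or 'right_only' in a "_merge" column.
# validate="one_to_one" raises an error if a key is duplicated on either side.
check = pd.merge(left, right,
                 left_on="Postcode", right_on="Postal Code",
                 how="outer", indicator=True,
                 validate="one_to_one")

unmatched = check[check["_merge"] != "both"]
print(unmatched[["Postcode", "Postal Code", "_merge"]])
```

An empty `unmatched` frame would indicate that an inner merge keeps every row from both sides.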

Quickly examine the resulting dataframe.

In [28]:
df_merged.columns

Index(['Postcode', 'Borough', 'Neighbourhood', 'Postal Code', 'Latitude',
       'Longitude'],
      dtype='object')

In [29]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 103 entries, 0 to 102
Data columns (total 6 columns):
Postcode         103 non-null object
Borough          103 non-null object
Neighbourhood    103 non-null object
Postal Code      103 non-null object
Latitude         103 non-null float64
Longitude        103 non-null float64
dtypes: float64(2), object(4)
memory usage: 5.6+ KB


In [30]:
df_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
...,...,...,...,...,...,...
98,M9N,York,Weston,M9N,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,M9P,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",M9R,43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",M9V,43.739416,-79.588437


## Removing column "Postal Code"

In [31]:
df_merged.drop([COL_NAME_POSTAL_CODE], axis=1, inplace=True)

In [32]:
df_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437


## Getting the size of merged dataframe

In [33]:
print("(row, column) = %s" % str(df_merged.shape))

(row, column) = (103, 5)


## Verifying the result with given list of Postcode 

In [34]:
df_result = df_merged[df_merged[COL_NAME_POSTCODE].isin(["M5G", "M2H", "M4B", 
                                                         "M1J", "M4G", "M4M", 
                                                         "M1R", "M9V", "M9L", 
                                                         "M5V", "M1B", "M5A"])]
df_result.reset_index(drop=True)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
2,M1R,Scarborough,"Maryvale,Wexford",43.750072,-79.295849
3,M2H,North York,Hillcrest Village,43.803762,-79.363452
4,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937
5,M4G,East York,Leaside,43.70906,-79.363452
6,M4M,East Toronto,Studio District,43.659526,-79.340923
7,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
8,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
9,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",43.628947,-79.39442


In [35]:
df_result.columns

Index(['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude'], dtype='object')

# Exploring and clustering the neighborhoods in Toronto

We can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data.

Just make sure:

1. to add enough Markdown cells to explain what we decided to do and to report any observations we make.

2. to generate maps to visualize our neighborhoods and how they cluster together. 

## Listing distinct boroughs

In [36]:
df_merged[COL_NAME_BOROUGH].unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

In [37]:
print("(row, column) = %s" % str(df_merged.shape))

(row, column) = (103, 5)


## Filtering by boroughs that contain the word Toronto

In [38]:
CONST_BOROUGH_TORONTO = "Toronto"
df = df_merged[df_merged[COL_NAME_BOROUGH].str.contains(CONST_BOROUGH_TORONTO, case=False, regex=False)]

Quickly examine the resulting dataframe.

In [39]:
print("(row, column) = %s" % str(df.shape))

(row, column) = (39, 5)


In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 39 entries, 37 to 93
Data columns (total 5 columns):
Postcode         39 non-null object
Borough          39 non-null object
Neighbourhood    39 non-null object
Latitude         39 non-null float64
Longitude        39 non-null float64
dtypes: float64(2), object(3)
memory usage: 1.8+ KB


In [41]:
df.head(3)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572


In [42]:
# Listing distinct borough
df[COL_NAME_BOROUGH].unique()

array(['East Toronto', 'Central Toronto', 'Downtown Toronto',
       'West Toronto'], dtype=object)

In [43]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
      len(df[COL_NAME_BOROUGH].unique()),
      df.shape[0]))

The dataframe has 4 boroughs and 39 neighborhoods.


## Using geopy library to get the latitude and longitude values of Toronto

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>toronto_explorer</em>, as shown below.

In [44]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
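
A geocoder's `geocode` call can return `None` when nothing matches the address, in which case reading `.latitude` fails. The following sketch guards against that case. `coordinates_or_default` is a hypothetical helper, and the two `Fake...` classes are stand-ins so the example runs without network access; with geopy one would pass the `Nominatim` instance instead.

```python
def coordinates_or_default(geolocator, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found.

    `geolocator` is any object with a geocode(address) method that returns
    either None or an object exposing .latitude and .longitude.
    """
    location = geolocator.geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Tiny stand-ins so the sketch runs without network access.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class FakeGeolocator:
    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        return self.known.get(address)


fake = FakeGeolocator({"Toronto, CA": FakeLocation(43.653963, -79.387207)})
found = coordinates_or_default(fake, "Toronto, CA", default=(0.0, 0.0))
missing = coordinates_or_default(fake, "Nowhere, CA", default=(0.0, 0.0))
print(found, missing)
```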


## Creating a map of Toronto with neighborhoods superimposed on top

In [45]:
# List of columns
df.columns

Index(['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude'], dtype='object')

In [46]:
import folium

# create map of Toronto using latitude and longitude values
m = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df[COL_NAME_LATITUDE], 
                                           df[COL_NAME_LONGITUDE], 
                                           df[COL_NAME_BOROUGH], 
                                           df[COL_NAME_NEIGHBOURHOOD]):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(m)  
    
m

<img src="images/w3_h1.png">

## Creating a map of Central Toronto, Canada with neighborhoods superimposed on top

However, for illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in **'Central Toronto'**. So let's slice the original dataframe and create a new dataframe of the **'Central Toronto'** data.

In [47]:
CONST_BOROUGH = "Central Toronto"
df_central_toronto = df[df[COL_NAME_BOROUGH]==CONST_BOROUGH].reset_index(drop=True)
df_central_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
5,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049
6,M5N,Central Toronto,Roselawn,43.711695,-79.416936
7,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307
8,M5R,Central Toronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678


Let's get the geographical coordinates of **'Central Toronto', Canada**

In [48]:
address = 'Central Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Central Toronto, CA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Central Toronto, CA are 43.653963, -79.387207.


As we did with all of Toronto, Canada, let's visualize **Central Toronto** and the neighborhoods in it.

In [49]:
# create map of Central Toronto using latitude and longitude values
m = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_central_toronto[COL_NAME_LATITUDE], 
                           df_central_toronto[COL_NAME_LONGITUDE], 
                           df_central_toronto[COL_NAME_NEIGHBOURHOOD]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(m)  
    
m

<img src="images/w3_h2.png">

### Defining Foursquare Credentials and Version

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [80]:
CLIENT_ID = 'XXX'     # Foursquare ID
CLIENT_SECRET = 'XXX' # Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XXX
CLIENT_SECRET:XXX
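
Printing credentials in a notebook makes it easy to publish them by accident. As a hedged alternative, the sketch below reads them from environment variables and masks them before display. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical; a plain dict stands in for `os.environ` in the demonstration.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read credentials from environment variables instead of hard-coding them.

    The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are
    hypothetical; pick whatever names suit your setup.
    """
    client_id = environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask_secret(value, visible=2):
    """Show only the first few characters so secrets do not leak into output."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with a plain dict standing in for os.environ (values made up).
demo_env = {"FOURSQUARE_CLIENT_ID": "abcd1234",
            "FOURSQUARE_CLIENT_SECRET": "s3cr3tvalue"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask_secret(client_id))
```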


### Let's explore the first neighborhood in our dataframe

Get the neighborhood's name.

In [51]:
df_central_toronto.loc[0, COL_NAME_NEIGHBOURHOOD]

'Lawrence Park'

Get the neighborhood's latitude and longitude values.

In [52]:
neighborhood_latitude = df_central_toronto.loc[0, COL_NAME_LATITUDE]   # neighborhood latitude value
neighborhood_longitude = df_central_toronto.loc[0, COL_NAME_LONGITUDE] # neighborhood longitude value

neighborhood_name = df_central_toronto.loc[0, COL_NAME_NEIGHBOURHOOD] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Lawrence Park are 43.7280205, -79.3887901.


### Now, let's get the top 100 venues that are in Lawrence Park within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [81]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XXX&client_secret=XXX&v=20180604&ll=43.7280205,-79.3887901&radius=500&limit=100'
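
As an alternative to formatting the query string by hand, the URL can be assembled from a dict of parameters with the standard library. This is a sketch only: `build_explore_url` and `EXPLORE_ENDPOINT` are names introduced here for illustration, and the parameters are the same ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of percent-encoding it.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


demo_url = build_explore_url("XXX", "XXX", "20180604", 43.7280205, -79.3887901, 500, 100)
print(demo_url)

# Round-trip check: the query string parses back to the values we passed in.
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["limit"])
```

Building the query from a dict also escapes any characters in the values that are not URL-safe.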

Send the GET request and examine the results.

In [54]:
import requests # library to handle requests

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5df54900760a7f001b149e46'},
 'response': {'headerLocation': 'Toronto',
  'headerFullLocation': 'Toronto',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.7325205045, 'lng': -79.3825744605273},
   'sw': {'lat': 43.7235204955, 'lng': -79.3950057394727}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '50e6da19e4b0d8a78a0e9794',
       'name': 'Lawrence Park Ravine',
       'location': {'address': '3055 Yonge Street',
        'crossStreet': 'Lawrence Avenue East',
        'lat': 43.72696303913755,
        'lng': -79.39438246708775,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.72696303913755,
          'lng': -79.39438246708775}],
        'distance': 465,
        'c

Let's borrow the **get_category_type** function from the Foursquare lab.

In [55]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [56]:
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Lawrence Park Ravine,Park,43.726963,-79.394382
1,Booty Camp Fitness,Gym / Fitness Center,43.728051,-79.387853
2,Zodiac Swim School,Swim School,43.728532,-79.38286
3,TTC Bus #162 - Lawrence-Donway,Bus Line,43.728026,-79.382805


And how many venues were returned by Foursquare?

In [57]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


## Exploring Neighborhoods of Central Toronto, Canada

### Let's create a function to repeat the same process to all the neighborhoods

In [58]:
COL_NAME_VENUE = "Venue"
COL_NAME_CATEGORY = "Category"

COL_NAME_NEIGHBOURHOOD_LATITUDE = COL_NAME_NEIGHBOURHOOD + " " + COL_NAME_LATITUDE
COL_NAME_NEIGHBOURHOOD_LONGITUDE = COL_NAME_NEIGHBOURHOOD + " " + COL_NAME_LONGITUDE
COL_NAME_VENUE_LATITUDE = COL_NAME_VENUE + " " + COL_NAME_LATITUDE
COL_NAME_VENUE_LONGITUDE = COL_NAME_VENUE + " " + COL_NAME_LONGITUDE
COL_NAME_VENUE_CATEGORY = COL_NAME_VENUE + " " + COL_NAME_CATEGORY


def get_near_by_venues(names, latitudes, longitudes, radius=500):    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [COL_NAME_NEIGHBOURHOOD, 
                             COL_NAME_NEIGHBOURHOOD_LATITUDE,
                             COL_NAME_NEIGHBOURHOOD_LONGITUDE,
                             COL_NAME_VENUE,
                             COL_NAME_VENUE_LATITUDE,
                             COL_NAME_VENUE_LONGITUDE,
                             COL_NAME_VENUE_CATEGORY]
    return(nearby_venues)
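
The function above indexes straight into the response, so a response without groups, or a venue without categories, would raise an error partway through the loop. The sketch below shows one possible defensive variant of the extraction step. `extract_venue_rows` is a hypothetical helper, and the payloads are hand-written to resemble the response shown earlier, with made-up values.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore response into rows, tolerating missing pieces.

    Returns a list of tuples; the list is empty when the response carries
    no groups or no items. A venue without categories gets None as category.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng,
                     venue.get("name"),
                     location.get("lat"),
                     location.get("lng"),
                     category))
    return rows


# Hand-written payloads shaped like the response shown earlier (values made up).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Park",
               "location": {"lat": 43.7, "lng": -79.4},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Spot",
               "location": {"lat": 43.71, "lng": -79.41},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, "Lawrence Park", 43.72802, -79.38879)
empty = extract_venue_rows({"meta": {"code": 400}}, "Lawrence Park", 43.72802, -79.38879)
print(rows)
print(empty)
```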


### Getting dataframe that contains all the neighborhoods of Central Toronto

In [59]:
venues_central_toronto = get_near_by_venues(
    names=df_central_toronto[COL_NAME_NEIGHBOURHOOD],
    latitudes=df_central_toronto[COL_NAME_LATITUDE],                           
    longitudes=df_central_toronto[COL_NAME_LONGITUDE])

Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville


### Let's check the size of the resulting dataframe

In [60]:
print("(row, column) = %s" % str(venues_central_toronto.shape))
venues_central_toronto.head()

(row, column) = (114, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Booty Camp Fitness,43.728051,-79.387853,Gym / Fitness Center
2,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
4,Davisville North,43.712751,-79.390197,Sherwood Park,43.716551,-79.387776,Park


### Let's check how many venues were returned for each neighborhood

In [61]:
venues_central_toronto.groupby(COL_NAME_NEIGHBOURHOOD).count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Davisville,35,35,35,35,35,35
Davisville North,9,9,9,9,9,9
"Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West",15,15,15,15,15,15
"Forest Hill North,Forest Hill West",4,4,4,4,4,4
Lawrence Park,4,4,4,4,4,4
"Moore Park,Summerhill East",3,3,3,3,3,3
North Toronto West,22,22,22,22,22,22
Roselawn,2,2,2,2,2,2
"The Annex,North Midtown,Yorkville",20,20,20,20,20,20


### Let's find out how many unique categories can be curated from all the returned venues

In [62]:
venues_central_toronto[COL_NAME_VENUE_CATEGORY].unique()

array(['Park', 'Gym / Fitness Center', 'Swim School', 'Bus Line',
       'Food & Drink Shop', 'Breakfast Spot', 'Clothing Store', 'Hotel',
       'Sandwich Place', 'Gym', 'Pizza Place', 'Chinese Restaurant',
       'Yoga Studio', 'Coffee Shop', 'Restaurant', 'Diner', 'Spa',
       'Salon / Barbershop', 'Sporting Goods Shop', 'Mexican Restaurant',
       'Dessert Shop', 'Burger Joint', 'Toy / Game Store',
       'Health & Beauty Service', 'Cosmetics Shop', 'Rental Car Location',
       'Café', 'Indian Restaurant', 'Seafood Restaurant',
       'Sushi Restaurant', 'Italian Restaurant', 'Brewery',
       'Thai Restaurant', 'Greek Restaurant', 'Gourmet Shop',
       'Farmers Market', 'Pharmacy', 'Costume Shop',
       'American Restaurant', 'Fried Chicken Joint',
       'Japanese Restaurant', 'Intersection', 'Playground',
       'Liquor Store', 'Supermarket', 'Sports Bar', 'Pub',
       'Vietnamese Restaurant', 'Light Rail Station', 'Bagel Shop',
       'Garden', 'Trail', 'Jewelry Store',
 

In [63]:
print('There are {} unique categories.'.format(
    len(venues_central_toronto[COL_NAME_VENUE_CATEGORY].unique())))

There are 56 unique categories.


## Analyzing Each Neighborhood of Central Toronto, Canada

In [64]:
# one hot encoding
central_toronto_onehot = pd.get_dummies(venues_central_toronto[[COL_NAME_VENUE_CATEGORY]], 
                                        prefix="", 
                                        prefix_sep="")

# add neighborhood column back to dataframe
central_toronto_onehot[COL_NAME_NEIGHBOURHOOD] = venues_central_toronto[COL_NAME_NEIGHBOURHOOD] 

# move neighborhood column to the first column
fixed_columns = [central_toronto_onehot.columns[-1]] + list(central_toronto_onehot.columns[:-1])
central_toronto_onehot = central_toronto_onehot[fixed_columns]

central_toronto_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,...,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,Lawrence Park,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Davisville North,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [65]:
central_toronto_onehot.shape

(114, 57)

### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [66]:
central_toronto_grouped = central_toronto_onehot.groupby(COL_NAME_NEIGHBOURHOOD).mean().reset_index()
central_toronto_grouped

Unnamed: 0,Neighbourhood,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,...,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Davisville,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,...,0.0,0.0,0.057143,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0
1,Davisville North,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
3,"Forest Hill North,Forest Hill West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0
4,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,"Moore Park,Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North Toronto West,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,...,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455
7,Roselawn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"The Annex,North Midtown,Yorkville",0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0


### Let's confirm the new size

In [67]:
central_toronto_grouped.shape

(9, 57)
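As a small illustration of the one-hot and group-mean steps above, the sketch below uses a made-up three-venue table. The neighborhood and category values are hypothetical, and the literal column names stand in for `COL_NAME_NEIGHBOURHOOD` and `COL_NAME_VENUE_CATEGORY`. Each row of the grouped result is the share of a neighborhood's venues that falls in each category, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venue list: two neighbourhoods, three venues.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B"],
    "Venue Category": ["Park", "Gym", "Park"],
})

# One indicator column per category; dtype=int keeps 0/1 rather than booleans.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of 0/1 indicators is the fraction of venues in each category.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Neighbourhood A ends up with 0.5 for both Gym and Park, while B has 1.0 for Park only.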

### Let's print each neighborhood along with the top 5 most common venues

In [68]:
num_top_venues = 5
COL_NAME_FREQUENCE = 'freq'

for hood in central_toronto_grouped[COL_NAME_NEIGHBOURHOOD]:
    print("----"+hood+"----")
    temp = central_toronto_grouped[central_toronto_grouped[COL_NAME_NEIGHBOURHOOD] == hood].T.reset_index()
    temp.columns = [COL_NAME_VENUE, COL_NAME_FREQUENCE]
    temp = temp.iloc[1:]
    temp[COL_NAME_FREQUENCE] = temp[COL_NAME_FREQUENCE].astype(float)
    temp = temp.round({COL_NAME_FREQUENCE: 2})
    print(temp.sort_values(COL_NAME_FREQUENCE, ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Davisville----
            Venue  freq
0     Pizza Place  0.09
1  Sandwich Place  0.09
2    Dessert Shop  0.09
3             Gym  0.06
4     Coffee Shop  0.06


----Davisville North----
               Venue  freq
0              Hotel  0.22
1  Food & Drink Shop  0.11
2     Breakfast Spot  0.11
3               Park  0.11
4        Pizza Place  0.11


----Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West----
                 Venue  freq
0          Coffee Shop  0.13
1                  Pub  0.13
2  American Restaurant  0.07
3         Liquor Store  0.07
4          Pizza Place  0.07


----Forest Hill North,Forest Hill West----
                 Venue  freq
0        Jewelry Store  0.25
1                Trail  0.25
2                 Park  0.25
3     Sushi Restaurant  0.25
4  American Restaurant  0.00


----Lawrence Park----
                  Venue  freq
0                  Park  0.25
1              Bus Line  0.25
2           Swim School  0.25
3  Gym / Fitness Center  0.25
4   Ameri

### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [69]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
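
One caveat with sorting by frequency alone: many categories share the same frequency, and the default sort does not guarantee any particular order among tied values. That may explain why the top-5 printout and the top-10 table list tied categories in different orders. A minimal sketch of a variant that breaks ties alphabetically follows; `most_common_venues_stable` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def most_common_venues_stable(row, num_top_venues):
    """Return the top categories by frequency, with ties broken alphabetically."""
    # Skip the first entry (the neighbourhood name), as the original function does.
    row_categories = row.iloc[1:]
    # Sort by descending frequency, then by category name.
    ranked = sorted(row_categories.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:num_top_venues]]

# Hypothetical row with a two-way tie at 0.25.
row = pd.Series({"Neighbourhood": "X", "Trail": 0.25, "Park": 0.25,
                 "Hotel": 0.5, "Gym": 0.0})
print(most_common_venues_stable(row, 3))  # ['Hotel', 'Park', 'Trail']
```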

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [70]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = [COL_NAME_NEIGHBOURHOOD]
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third have no entry in `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted[COL_NAME_NEIGHBOURHOOD] = central_toronto_grouped[COL_NAME_NEIGHBOURHOOD]

for ind in np.arange(central_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(central_toronto_grouped.iloc[ind, :], 
                                                                          num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Davisville,Pizza Place,Sandwich Place,Dessert Shop,Sushi Restaurant,Café,Gym,Coffee Shop,Italian Restaurant,Japanese Restaurant,Diner
1,Davisville North,Hotel,Breakfast Spot,Gym,Park,Clothing Store,Pizza Place,Sandwich Place,Food & Drink Shop,Yoga Studio,Farmers Market
2,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",Pub,Coffee Shop,American Restaurant,Supermarket,Restaurant,Liquor Store,Light Rail Station,Sports Bar,Fried Chicken Joint,Sushi Restaurant
3,"Forest Hill North,Forest Hill West",Trail,Jewelry Store,Sushi Restaurant,Park,Yoga Studio,Dessert Shop,Health & Beauty Service,Gym / Fitness Center,Gym,Greek Restaurant
4,Lawrence Park,Gym / Fitness Center,Swim School,Bus Line,Park,Yoga Studio,Diner,History Museum,Health & Beauty Service,Gym,Greek Restaurant
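
The same table can also be built without a row-by-row loop. The sketch below is one possible alternative, not the notebook's method: `ordinal` and `top_venues_table` are hypothetical helpers, and the input frequencies are made up. It ranks all rows at once with a stable `np.argsort`, so tied categories keep their column order.

```python
import numpy as np
import pandas as pd

def ordinal(n):
    """Format 1 as '1st', 2 as '2nd', 3 as '3rd', 4 as '4th', 11 as '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_venues_table(grouped, key_col, n):
    """Build a table of the n most frequent categories per row of `grouped`."""
    freqs = grouped.drop(columns=[key_col])
    # Stable argsort on negated values: descending order, ties keep column order.
    order = np.argsort(-freqs.to_numpy(dtype=float), axis=1, kind="stable")[:, :n]
    names = freqs.columns.to_numpy()[order]
    cols = [f"{ordinal(i + 1)} Most Common Venue" for i in range(n)]
    table = pd.DataFrame(names, columns=cols)
    table.insert(0, key_col, grouped[key_col].to_numpy())
    return table

# Hypothetical grouped frequencies for two neighbourhoods.
grouped = pd.DataFrame({
    "Neighbourhood": ["A", "B"],
    "Gym": [0.2, 0.0],
    "Park": [0.5, 0.5],
    "Pub": [0.3, 0.5],
})
table = top_venues_table(grouped, "Neighbourhood", 2)
print(table)
```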


## Clustering Neighborhoods of Central Toronto, Canada

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [71]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# drop the neighborhood name so that only the numeric frequency columns remain
central_toronto_grouped_clustering = central_toronto_grouped.drop(COL_NAME_NEIGHBOURHOOD, axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(central_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 4, 3, 2, 0, 1, 0], dtype=int32)
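
With only nine neighborhoods and five clusters, most clusters can hold just one member. A quick check of the cluster sizes makes that visible. This is a sketch in which `labels` is copied by hand from the output above rather than read from `kmeans.labels_`.

```python
import numpy as np

# Labels copied from the output above (one per row of the grouped dataframe).
labels = np.array([0, 0, 0, 4, 3, 2, 0, 1, 0])

# sizes[i] is the number of neighbourhoods assigned to cluster i.
sizes = np.bincount(labels, minlength=5)
print(dict(enumerate(sizes.tolist())))  # {0: 5, 1: 1, 2: 1, 3: 1, 4: 1}
```

One cluster of five and four singletons may be a hint that a smaller number of clusters is worth trying on a dataset this small.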

In [72]:
df_central_toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [73]:
COL_NAME_CLUSTER_LABELS = 'Cluster Labels'

# add clustering labels
neighborhoods_venues_sorted.insert(0, COL_NAME_CLUSTER_LABELS, kmeans.labels_)

df_central_toronto_merged = df_central_toronto

# join the sorted-venues table onto df_central_toronto to add the cluster label and top venues for each neighborhood
df_central_toronto_merged = df_central_toronto_merged.join(neighborhoods_venues_sorted.set_index(COL_NAME_NEIGHBOURHOOD), 
                                                           on=COL_NAME_NEIGHBOURHOOD)

df_central_toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3,Gym / Fitness Center,Swim School,Bus Line,Park,Yoga Studio,Diner,History Museum,Health & Beauty Service,Gym,Greek Restaurant
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Hotel,Breakfast Spot,Gym,Park,Clothing Store,Pizza Place,Sandwich Place,Food & Drink Shop,Yoga Studio,Farmers Market
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Diner,Mexican Restaurant,Park,Dessert Shop,Cosmetics Shop,Health & Beauty Service
3,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Sandwich Place,Dessert Shop,Sushi Restaurant,Café,Gym,Coffee Shop,Italian Restaurant,Japanese Restaurant,Diner
4,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316,2,Intersection,Restaurant,Playground,History Museum,Health & Beauty Service,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop,Garden
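
`DataFrame.join` with `on=` performs a left join by default, so a neighborhood with no matching row in the venues table would keep its coordinates but get `NaN` in the added columns, and the integer cluster labels would be upcast to float. All nine rows match here, so this does not occur in the notebook. The sketch below shows the behaviour on made-up data (all values hypothetical) and one way to guard against it before the labels are used as list indices.

```python
import pandas as pd

# Hypothetical location table with three neighbourhoods.
left = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C"],
    "Latitude": [43.70, 43.71, 43.72],
})
# Hypothetical cluster table that covers only two of them.
right = pd.DataFrame({
    "Cluster Labels": [0, 1],
    "Neighbourhood": ["A", "B"],
})

merged = left.join(right.set_index("Neighbourhood"), on="Neighbourhood")
print(merged)  # "C" has no match, so its cluster label is NaN

# Drop unmatched rows and restore an integer dtype before indexing with the labels.
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean)
```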


Finally, let's visualize the resulting clusters.

In [74]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Let's get the geographical coordinates of **'Central Toronto', Canada**
address = 'Central Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Central Toronto, CA are {}, {}.'.format(latitude, longitude))
# ------------------------------------------------------------------------------------------------
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_central_toronto_merged[COL_NAME_LATITUDE], 
                                  df_central_toronto_merged[COL_NAME_LONGITUDE], 
                                  df_central_toronto_merged[COL_NAME_NEIGHBOURHOOD], 
                                  df_central_toronto_merged[COL_NAME_CLUSTER_LABELS]):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Central Toronto, CA are 43.653963, -79.387207.


<img src="images/w3_h3.png">
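
A note on how the marker color is chosen: `rainbow[cluster-1]` maps label 0 to index -1, which Python resolves to the last list element. Every label still gets a distinct color, just shifted by one position. The sketch below illustrates this with a placeholder palette; the hex values are made up for illustration and are not the ones matplotlib produces.

```python
kclusters = 5

# Placeholder palette standing in for the `rainbow` list.
palette = ["#111111", "#222222", "#333333", "#444444", "#555555"]

# Same indexing as in the map cell: label 0 wraps around to the last colour.
colour_of = {cluster: palette[cluster - 1] for cluster in range(kclusters)}
print(colour_of[0])  # the last palette entry, '#555555'
print(colour_of[1])  # the first palette entry, '#111111'
```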

## Examining Clusters

Now you can examine each cluster and identify the venue categories that distinguish it. Based on those defining categories, you can then assign a name to each cluster. I will leave this exercise to you. Note that the headings below are numbered from 1 while the cluster labels start at 0, so Cluster 1 corresponds to label 0.
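
One way to approach this, sketched below on made-up frequencies, is to average the category frequencies within each cluster and list the highest-scoring categories per cluster. The dataframe and its values are hypothetical; in the notebook the inputs would be `central_toronto_grouped` and `kmeans.labels_`.

```python
import pandas as pd

# Hypothetical category frequencies for three neighbourhoods.
freqs = pd.DataFrame({
    "Park": [0.50, 0.40, 0.00],
    "Pub": [0.10, 0.20, 0.60],
    "Gym": [0.40, 0.40, 0.40],
}, index=["A", "B", "C"])
labels = [0, 0, 1]  # hypothetical cluster label per neighbourhood

# Mean frequency of every category within each cluster.
profile = freqs.assign(cluster=labels).groupby("cluster").mean()

# The two categories with the highest mean frequency in each cluster.
top = {cluster: row.nlargest(2).index.tolist() for cluster, row in profile.iterrows()}
print(top)
```

Categories that rank high in one cluster but not in the others (Park and Pub in this toy data) are natural candidates for naming it, whereas a category that is equally common everywhere (Gym) does not discriminate.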

### Cluster 1

In [75]:
df_temp = df_central_toronto_merged.copy()

df_temp.loc[df_temp[COL_NAME_CLUSTER_LABELS] == 0, df_temp.columns[[1] + list(range(5, df_temp.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Central Toronto,0,Hotel,Breakfast Spot,Gym,Park,Clothing Store,Pizza Place,Sandwich Place,Food & Drink Shop,Yoga Studio,Farmers Market
2,Central Toronto,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Diner,Mexican Restaurant,Park,Dessert Shop,Cosmetics Shop,Health & Beauty Service
3,Central Toronto,0,Pizza Place,Sandwich Place,Dessert Shop,Sushi Restaurant,Café,Gym,Coffee Shop,Italian Restaurant,Japanese Restaurant,Diner
5,Central Toronto,0,Pub,Coffee Shop,American Restaurant,Supermarket,Restaurant,Liquor Store,Light Rail Station,Sports Bar,Fried Chicken Joint,Sushi Restaurant
8,Central Toronto,0,Café,Sandwich Place,Coffee Shop,American Restaurant,Indian Restaurant,Liquor Store,Park,Pharmacy,Pizza Place,Pub


### Cluster 2

In [76]:
df_temp.loc[df_temp[COL_NAME_CLUSTER_LABELS] == 1, df_temp.columns[[1] + list(range(5, df_temp.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Toronto,1,Health & Beauty Service,Garden,Yoga Studio,Vietnamese Restaurant,Hotel,History Museum,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop


### Cluster 3

In [77]:
df_temp.loc[df_temp[COL_NAME_CLUSTER_LABELS] == 2, df_temp.columns[[1] + list(range(5, df_temp.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,2,Intersection,Restaurant,Playground,History Museum,Health & Beauty Service,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop,Garden


### Cluster 4

In [78]:
df_temp.loc[df_temp[COL_NAME_CLUSTER_LABELS] == 3, df_temp.columns[[1] + list(range(5, df_temp.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,3,Gym / Fitness Center,Swim School,Bus Line,Park,Yoga Studio,Diner,History Museum,Health & Beauty Service,Gym,Greek Restaurant


### Cluster 5

In [79]:
df_temp.loc[df_temp[COL_NAME_CLUSTER_LABELS] == 4, df_temp.columns[[1] + list(range(5, df_temp.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Central Toronto,4,Trail,Jewelry Store,Sushi Restaurant,Park,Yoga Studio,Dessert Shop,Health & Beauty Service,Gym / Fitness Center,Gym,Greek Restaurant
