# Segmenting and Clustering Neighbourhoods in Toronto III
## Applied Data Science Capstone by IBM on Coursera
**Fernanda Oliveira**  
Data Analyst

## Introduction

In this lab, I explore the neighborhoods of Toronto using the Foursquare API. I first retrieve the venue categories found in each neighborhood, then use the frequency of those categories as features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I use the `folium` library to draw a map of the Toronto neighborhoods and their emerging clusters.

Create a map of Toronto with neighborhoods superimposed on top.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
df = pd.read_csv("Neighbourhood.csv")
#index_col ="Neighbourhood"

In [3]:
df = df.loc[df['Borough']=='Downtown Toronto'].reset_index()

In [4]:
df.drop(['Unnamed: 0', 'index'], axis=1, inplace=True)

In [5]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1E,Downtown Toronto,Harbourfront,43.763573,-79.188711
1,M1J,Downtown Toronto,Queen's Park,43.744734,-79.239476
2,M1N,Downtown Toronto,"Ryerson, Garden District",43.692657,-79.264848
3,M1W,Downtown Toronto,St. James Town,43.799525,-79.318389
4,M2L,Downtown Toronto,Berczy Park,43.75749,-79.374714


In [6]:
latitude = 43.806686
longitude = -79.194353

In [7]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto


#### Define Foursquare Credentials and Version

In [8]:
CLIENT_ID = 'N1NXHRG0HUV5552OXJTHZVP0NSZHRTBOZKVRQSNSTAVZSABR' # your Foursquare ID
CLIENT_SECRET = 'B2L2LYXQOFBAWUM33DX4IWZIAS04LGN2K0N0DRA4KDLI354O' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: N1NXHRG0HUV5552OXJTHZVP0NSZHRTBOZKVRQSNSTAVZSABR
CLIENT_SECRET:B2L2LYXQOFBAWUM33DX4IWZIAS04LGN2K0N0DRA4KDLI354O


#### Let's explore the venues around a single location first.

Build the Foursquare request URL for the `latitude` and `longitude` defined above, asking for at most 100 venues within a 500-metre radius.

In [9]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
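
The same request URL can also be assembled from a dictionary of query parameters, which keeps the credentials out of the format string. The sketch below is only an illustration: the function name `build_explore_url` and the two environment variable names are my own assumptions, not part of the original lab. Note that `urlencode` percent-encodes the comma in `ll`.

```python
import os
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version='20180605'):
    """Build a venues/explore URL from a dictionary of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Hypothetical environment variable names; placeholders are used if they are unset.
example_url = build_explore_url(
    43.806686, -79.194353,
    client_id=os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
    client_secret=os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
)
print(example_url.split('?')[0])  # print only the endpoint, not the credentials
```

With `requests`, the same dictionary could instead be passed through the `params=` argument of `requests.get`, which performs the encoding itself.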

In [10]:
results = requests.get(url).json()
results.keys()

[u'meta', u'response']

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [11]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
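
To see what the function does, it helps to call it on a few hand-made records. The rows below are invented for illustration and only mimic the two key layouts the function handles (`categories` and `venue.categories`); they are not real Foursquare responses.

```python
def get_category_type(row):
    """Return the name of the first category of a venue record, or None."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical records, one per layout, plus one without any category
flattened_row = {'venue.name': 'Burger Stand',
                 'venue.categories': [{'name': 'Fast Food Restaurant'}]}
plain_row = {'name': 'Some Park', 'categories': [{'name': 'Park'}]}
empty_row = {'venue.name': 'Unlabelled venue', 'venue.categories': []}

for row in (flattened_row, plain_row, empty_row):
    print(get_category_type(row))
```

Only the first category of each venue is kept, so a venue tagged with several categories is represented by whichever one Foursquare lists first.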

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [12]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056


In [13]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

1 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Toronto

#### Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
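
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A more defensive flattening step might look like the sketch below. The helper name `flatten_items` and the sample response fragment are my own assumptions, made up for illustration.

```python
def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples.

    A venue without any category gets None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hypothetical response fragment with one categorised and one uncategorised venue
sample_items = [
    {'venue': {'name': 'Corner Cafe',
               'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed Lot',
               'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]

rows = flatten_items('Example Neighbourhood', 43.655, -79.385, sample_items)
for row in rows:
    print(row)
```

Because the helper takes the already-parsed `items` list, it can be tried without making any network request.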

Now run the above function on each neighborhood and create a new dataframe called `toronto_venues`.

In [15]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1E,Downtown Toronto,Harbourfront,43.763573,-79.188711
1,M1J,Downtown Toronto,Queen's Park,43.744734,-79.239476
2,M1N,Downtown Toronto,"Ryerson, Garden District",43.692657,-79.264848
3,M1W,Downtown Toronto,St. James Town,43.799525,-79.318389
4,M2L,Downtown Toronto,Berczy Park,43.75749,-79.374714


In [16]:
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                )

Harbourfront
Queen's Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
First Canadian Place, Underground city
Church and Wellesley


#### Let's check the size of the resulting dataframe

In [17]:
print(toronto_venues.shape)
toronto_venues.head()

(155, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
1,Harbourfront,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
2,Harbourfront,43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa
3,Harbourfront,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
4,Harbourfront,43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location


Let's check how many venues were returned for each neighborhood

In [18]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",2,2,2,2,2,2
Berczy Park,1,1,1,1,1,1
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",15,15,15,15,15,15
"Cabbagetown, St. James Town",2,2,2,2,2,2
Central Bay Street,6,6,6,6,6,6
"Chinatown, Grange Park, Kensington Market",39,39,39,39,39,39
Christie,3,3,3,3,3,3
Church and Wellesley,8,8,8,8,8,8
"Commerce Court, Victoria Hotel",2,2,2,2,2,2
"Design Exchange, Toronto Dominion Centre",20,20,20,20,20,20


#### Let's find out how many unique categories can be curated from all the returned venues

In [19]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 83 unique categories.


## 3. Analyze Each Neighborhood

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Airport,Arts & Crafts Store,Asian Restaurant,Auto Workshop,Bakery,Bar,Baseball Field,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Butcher,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Stadium,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner,Discount Store,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,French Restaurant,Garden Center,Gastropub,General Entertainment,Gourmet Shop,Grocery Store,Gym,Hardware Store,Health Food Store,Ice Cream Shop,Indie Movie Theater,Intersection,Italian Restaurant,Jewelry Store,Latin American Restaurant,Light Rail Station,Liquor Store,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Nail Salon,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Rental Car Location,Restaurant,Sandwich Place,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Steakhouse,Supplement Shop,Sushi Restaurant,Tea Room,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [21]:
toronto_onehot.shape

(155, 84)
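
To make the one-hot step concrete, here is the same transformation on a three-row toy table. The neighborhood names and categories are invented. The point is only that each venue row becomes a row of indicator columns, one per category, with the neighborhood column placed first.

```python
import pandas as pd

# Hypothetical venues: two in neighborhood A, one in neighborhood B
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Café', 'Park'],
})

# One indicator column per category, without any prefix in the column names
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="")

# Put the neighborhood back as the first column
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

print(toy_onehot)
```

The result has one row per venue and one column per distinct category plus the neighborhood column, which is why the real table above has 155 rows and 83 + 1 = 84 columns.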

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Airport,Arts & Crafts Store,Asian Restaurant,Auto Workshop,Bakery,Bar,Baseball Field,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Butcher,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Stadium,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner,Discount Store,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,French Restaurant,Garden Center,Gastropub,General Entertainment,Gourmet Shop,Grocery Store,Gym,Hardware Store,Health Food Store,Ice Cream Shop,Indie Movie Theater,Intersection,Italian Restaurant,Jewelry Store,Latin American Restaurant,Light Rail Station,Liquor Store,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Nail Salon,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Rental Car Location,Restaurant,Sandwich Place,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Steakhouse,Supplement Shop,Sushi Restaurant,Tea Room,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Yoga Studio
0,"Adelaide, King, Richmond",0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Chinatown, Grange Park, Kensington Market",0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.051282,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.076923,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.051282,0.0,0.0,0.0,0.025641,0.0,0.0,0.025641,0.025641,0.025641,0.025641,0.0,0.025641,0.0,0.025641,0.0,0.025641,0.0,0.025641,0.0,0.025641,0.0,0.051282,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.025641,0.0,0.025641,0.025641,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.051282,0.025641,0.0,0.025641,0.0,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.1,0.05,0.0,0.05,0.0,0.05,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [23]:
toronto_grouped.shape

(19, 84)
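
Taking the mean of one-hot columns within a neighborhood gives the share of that neighborhood's venues falling in each category, so every row of the grouped table sums to 1. A small sketch with invented numbers shows this:

```python
import pandas as pd

# Hypothetical one-hot table: three venues in A, one venue in B
toy_onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Café': [1, 0, 0, 0],
    'Park': [0, 1, 1, 1],
})

# Mean of the indicators = relative frequency of each category per neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)

# Each row of frequencies adds up to 1
row_sums = toy_grouped[['Café', 'Park']].sum(axis=1)
print(row_sums.tolist())
```

One consequence is that a neighborhood with a single venue ends up with a frequency of 1.0 for that one category, as happens for Berczy Park in the table above.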

#### Let's print each neighborhood along with the top 5 most common venues

In [24]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
           venue  freq
0        Airport   0.5
1           Park   0.5
2       Pharmacy   0.0
3     Nail Salon   0.0
4  Movie Theater   0.0


----Berczy Park----
                venue  freq
0           Cafeteria   1.0
1             Airport   0.0
2  Light Rail Station   0.0
3                Park   0.0
4          Nail Salon   0.0


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                  venue  freq
0    Light Rail Station  0.13
1           Yoga Studio  0.07
2                   Spa  0.07
3         Garden Center  0.07
4  Fast Food Restaurant  0.07


----Cabbagetown, St. James Town----
                 venue  freq
0          Pizza Place   0.5
1  Empanada Restaurant   0.5
2                 Park   0.0
3           Nail Salon   0.0
4        Movie Theater   0.0


----Central Bay Street----
           venue  freq
0       Pharmacy  0.17
1    Pizza Place  0.17
2  Grocery Store  0.17
3    Coff

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
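
Because the function always returns `num_top_venues` names, a neighborhood with only one or two venues gets its remaining slots filled with categories whose frequency is 0. A possible variant that keeps only categories that actually occur is sketched below. The name `top_venues_nonzero` and the toy row are assumptions of mine, not part of the original lab.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return the most frequent categories of a row, ignoring zero frequencies.

    The first element of the row is assumed to be the neighborhood name.
    Missing slots are padded with None so the result has a fixed length.
    """
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


# Hypothetical grouped row with two categories present and two absent
toy_row = pd.Series({'Neighborhood': 'A', 'Café': 0.0, 'Park': 0.75,
                     'Pub': 0.25, 'Spa': 0.0})

print(top_venues_nonzero(toy_row, 3))
```

Whether padding with `None` is preferable to listing zero-frequency categories depends on how the table is read afterwards. It mainly avoids suggesting that a category is "common" when it never appears.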

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Airport,Park,French Restaurant,Diner,College Stadium,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice
1,Berczy Park,Cafeteria,Yoga Studio,Discount Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Light Rail Station,Yoga Studio,Spa,Burrito Place,Pizza Place,Brewery,Restaurant,Smoke Shop,Skate Park,Comic Shop
3,"Cabbagetown, St. James Town",Pizza Place,Empanada Restaurant,Yoga Studio,Discount Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop
4,Central Bay Street,Discount Store,Pharmacy,Grocery Store,Pizza Place,Butcher,Coffee Shop,Dessert Shop,Comic Shop,Construction & Landscaping,Convenience Store
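
The column names above come from a list of three suffixes with a fallback to `th`, which is correct for 1 to 10 but would produce names such as `21th` for a larger `num_top_venues`. A small helper that also handles those cases could look like this. The function name `ordinal` is my own choice.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, for example 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```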


## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 1, 4, 1, 1, 1, 1, 3, 1], dtype=int32)
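
As a rough illustration of what the labels mean, the sketch below runs *k*-means on four invented frequency vectors that form two obvious groups. The data and the explicit `n_init=10` are assumptions made for the example. The label numbers themselves are arbitrary identifiers, so only the grouping matters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category frequencies for four neighborhoods (two categories)
toy_freqs = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

# random_state fixes the initialisation so the run is repeatable
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_freqs)
labels = toy_kmeans.labels_

print(labels)
print(toy_kmeans.cluster_centers_)
```

Neighborhoods with only one or two venues have very extreme frequency vectors (values of 0.5 or 1.0 in a single column), so they tend to sit far from everything else and can end up in small clusters of their own.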

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [28]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1E,Downtown Toronto,Harbourfront,43.763573,-79.188711
1,M1J,Downtown Toronto,Queen's Park,43.744734,-79.239476
2,M1N,Downtown Toronto,"Ryerson, Garden District",43.692657,-79.264848
3,M1W,Downtown Toronto,St. James Town,43.799525,-79.318389
4,M2L,Downtown Toronto,Berczy Park,43.75749,-79.374714


In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# join the cluster labels and top venues onto df, which holds the latitude/longitude of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1E,Downtown Toronto,Harbourfront,43.763573,-79.188711,1,Breakfast Spot,Intersection,Mexican Restaurant,Medical Center,Pizza Place,Electronics Store,Spa,Rental Car Location,Comic Shop,Construction & Landscaping
1,M1J,Downtown Toronto,Queen's Park,43.744734,-79.239476,3,Jewelry Store,Playground,Yoga Studio,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner
2,M1N,Downtown Toronto,"Ryerson, Garden District",43.692657,-79.264848,1,General Entertainment,Skating Rink,Café,College Stadium,Food,Fish & Chips Shop,French Restaurant,Comic Shop,Construction & Landscaping,Convenience Store
3,M1W,Downtown Toronto,St. James Town,43.799525,-79.318389,1,Chinese Restaurant,Fast Food Restaurant,Bubble Tea Shop,Breakfast Spot,Nail Salon,Pharmacy,Pizza Place,Sandwich Place,Coffee Shop,Grocery Store
4,M2L,Downtown Toronto,Berczy Park,43.75749,-79.374714,2,Cafeteria,Yoga Studio,Discount Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner
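
In this run every neighborhood returned at least one venue, so each row receives a cluster label. If a neighborhood had returned no venues, the join would leave `NaN` in `Cluster Labels` and turn the column into floats, which would later break the colour lookup on the map. A hedged sketch of one way to handle that case, using invented data, is:

```python
import pandas as pd

# Hypothetical neighborhoods; 'B' has no venues and therefore no cluster label
toy_df = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    'Latitude': [43.65, 43.66, 43.67],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [0, 1],
    'Neighborhood': ['A', 'C'],
    '1st Most Common Venue': ['Park', 'Café'],
})

toy_merged = toy_df.join(toy_sorted.set_index('Neighborhood'), on='Neighbourhood')
print(toy_merged['Cluster Labels'].tolist())  # contains NaN for 'B'

# Drop unlabelled neighborhoods and restore integer labels
toy_merged = (toy_merged
              .dropna(subset=['Cluster Labels'])
              .astype({'Cluster Labels': int}))
print(toy_merged)
```

Dropping the unlabelled rows is only one option. They could also be kept and drawn in a neutral colour, depending on what the map is meant to show.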


Finally, let's visualize the resulting clusters

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

<img src="toronto_map.png">

## 5. Examine Clusters

In this section I will examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, I will assign a name to each cluster. 

#### Cluster 0

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,0,Baseball Field,Yoga Studio,Electronics Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner
17,Downtown Toronto,0,Construction & Landscaping,Baseball Field,Yoga Studio,Electronics Store,Comic Shop,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner


#### Cluster 1

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Breakfast Spot,Intersection,Mexican Restaurant,Medical Center,Pizza Place,Electronics Store,Spa,Rental Car Location,Comic Shop,Construction & Landscaping
2,Downtown Toronto,1,General Entertainment,Skating Rink,Café,College Stadium,Food,Fish & Chips Shop,French Restaurant,Comic Shop,Construction & Landscaping,Convenience Store
3,Downtown Toronto,1,Chinese Restaurant,Fast Food Restaurant,Bubble Tea Shop,Breakfast Spot,Nail Salon,Pharmacy,Pizza Place,Sandwich Place,Coffee Shop,Grocery Store
5,Downtown Toronto,1,Discount Store,Pharmacy,Grocery Store,Pizza Place,Butcher,Coffee Shop,Dessert Shop,Comic Shop,Construction & Landscaping,Convenience Store
6,Downtown Toronto,1,Food & Drink Shop,Bus Stop,Park,Yoga Studio,Diner,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice
8,Downtown Toronto,1,Park,Spa,Pharmacy,Bus Stop,Cosmetics Shop,Curling Ice,Beer Store,Skating Rink,Video Store,Asian Restaurant
9,Downtown Toronto,1,Park,Sandwich Place,Movie Theater,Brewery,Fish & Chips Shop,Food & Drink Shop,Gym,Ice Cream Shop,Italian Restaurant,Liquor Store
11,Downtown Toronto,1,Turkish Restaurant,Bar,Restaurant,Coffee Shop,Sandwich Place,Yoga Studio,Diner,Comic Shop,Construction & Landscaping,Convenience Store
12,Downtown Toronto,1,Café,Pizza Place,Sushi Restaurant,Diner,Bookstore,Coffee Shop,Italian Restaurant,Gym,Latin American Restaurant,Health Food Store
13,Downtown Toronto,1,Light Rail Station,Yoga Studio,Spa,Burrito Place,Pizza Place,Brewery,Restaurant,Smoke Shop,Skate Park,Comic Shop


#### Cluster 2

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,2,Cafeteria,Yoga Studio,Discount Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner


#### Cluster 3

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,3,Jewelry Store,Playground,Yoga Studio,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop,Diner
7,Downtown Toronto,3,Airport,Park,French Restaurant,Diner,College Stadium,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice
10,Downtown Toronto,3,Playground,Park,Yoga Studio,Diner,College Stadium,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice


#### Cluster 4

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Downtown Toronto,4,Pizza Place,Empanada Restaurant,Yoga Studio,Discount Store,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Curling Ice,Dessert Shop
