<h1>IBM Applied Data Science Capstone Project (Geospatial Data Analysis)</h1>
<h2> Author: Tanmay Laud </h2>
<h3> This notebook contains the code required for the capstone project submission </h3>

In [96]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import, pandas >= 1.0)
# import k-means for the clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


In [2]:
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


<h2> Fetching Neighbourhood Data For Toronto, Canada </h2>

In [3]:
import urllib3
url ='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
http = urllib3.PoolManager()
response = http.request('GET', url)

In [9]:
page = response.data
webpage = BeautifulSoup(page,'html.parser')
table = webpage.find_all(class_='wikitable sortable')

In [39]:
df = pd.read_html(str(table))[0]

In [40]:
df = df[df['Borough']!='Not assigned']
df = df.reset_index(drop=True)

In [41]:
df[df['Postal Code']=='M5A']

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [42]:
df.shape

(103, 3)

<h2> Geospatial Co-ordinates For Each Postal Code </h2>

In [43]:
coordinates = pd.read_csv('Geospatial_Coordinates.csv')
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<h4>Let us join the data to get co-ordinate information for each postal code </h4>

In [44]:
df = df.join(coordinates.set_index('Postal Code'),on='Postal Code')
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
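
The join above leaves NaN co-ordinates for any postal code that is missing from the CSV, without raising an error. As a minimal sketch on toy data (the 'M9Z' row and the 'Hypothetical' borough are invented for illustration), `DataFrame.merge` with `validate` and `indicator` can make such gaps visible:

```python
import pandas as pd

# Toy stand-ins for the scraped table and the coordinates CSV (values are illustrative).
neighborhoods = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Left join keeps every neighborhood row; validate raises if a postal code
# appears more than once in coords; indicator records where each row matched.
merged = neighborhoods.merge(coords, on='Postal Code', how='left',
                             validate='many_to_one', indicator=True)

# Rows found only on the left side have no coordinates.
missing = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list would indicate that every postal code received co-ordinates.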


In [45]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


<h2>Using Folium to mark the Neighbourhoods on a Map</h2>

In [46]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [4]:
CLIENT_ID = '55KJ0IHEWQ5YZCWTEZEMKSLSQKOUDLQFS50CHCBEZ2SOUDV2' # your Foursquare ID
CLIENT_SECRET = '__HIDDEN__' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 55KJ0IHEWQ5YZCWTEZEMKSLSQKOUDLQFS50CHCBEZ2SOUDV2
CLIENT_SECRET:__HIDDEN__


In [49]:
df.loc[0,'Neighborhood']

'Parkwoods'

<h3>Let's look at the co-ordinates for Parkwoods </h3>

In [50]:
neighborhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [60]:
radius = 5000
LIMIT = 100

In [61]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
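
The request URL above is assembled by positional string formatting. A minimal sketch of the same URL built from named query parameters with the standard library follows. `build_explore_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Placeholder credentials, not real ones.
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605',
                                43.7532586, -79.3296565, 5000, 100)
print(example_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the query string by hand.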

<h3>Let's fetch the venues for Parkwoods</h3>

In [None]:
results = requests.get(url).json()

In [63]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [64]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719
1,Donalda Golf & Country Club,Golf Course,43.752816,-79.342741
2,Island Foods,Caribbean Restaurant,43.745866,-79.346035
3,Galleria Supermarket,Supermarket,43.75352,-79.349518
4,Graydon Hall Manor,Event Space,43.763923,-79.342961


In [66]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [67]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
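
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue that has no categories. A hedged sketch of a more tolerant parser follows. `extract_venues` is a hypothetical helper, and the payload is hand-written to mimic the structure used above; it is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        location = venue.get('location', {})
        rows.append((
            name, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0]['name'] if categories else None,  # None when uncategorized
        ))
    return rows

# Hand-written payload that mimics the structure used above; not real API output.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.75, 'lng': -79.33},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 43.76, 'lng': -79.32},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Parkwoods', 43.753259, -79.329656)
empty = extract_venues({'response': {}}, 'Parkwoods', 43.753259, -79.329656)
print(rows)
print(empty)
```

Rows with a `None` category can then be dropped or kept deliberately, instead of aborting the whole loop.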

<h3>The API works as expected.</h3>
<h3>Let us now fetch the venues per neighborhood</h3>

In [69]:
toronto_venues = getNearbyVenues(df['Neighborhood'], df['Latitude'], df['Longitude'], 5000)

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [70]:
print(toronto_venues.shape)
toronto_venues.head()

(10264, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Donalda Golf & Country Club,43.752816,-79.342741,Golf Course
2,Parkwoods,43.753259,-79.329656,Island Foods,43.745866,-79.346035,Caribbean Restaurant
3,Parkwoods,43.753259,-79.329656,Galleria Supermarket,43.75352,-79.349518,Supermarket
4,Parkwoods,43.753259,-79.329656,Graydon Hall Manor,43.763923,-79.342961,Event Space


In [71]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,100,100,100,100,100,100
"Alderwood, Long Branch",100,100,100,100,100,100
"Bathurst Manor, Wilson Heights, Downsview North",100,100,100,100,100,100
Bayview Village,100,100,100,100,100,100
"Bedford Park, Lawrence Manor East",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"Birch Cliff, Cliffside West",100,100,100,100,100,100
"Brockton, Parkdale Village, Exhibition Place",100,100,100,100,100,100
Business reply mail Processing Centre,100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",100,100,100,100,100,100


In [73]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 249 unique categories.


<h3> Let us one-hot encode the venue categories so that they can be fed to the clustering algorithm</h3>

In [77]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

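# add neighborhood column back to dataframe
# (caution: if a venue category is itself named 'Neighborhood', this overwrites
#  that dummy column in place instead of appending a new last column)
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move the last column to the front; this is the neighborhood column only if it was appended above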
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.describe()

Unnamed: 0,Zoo Exhibit,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
count,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,...,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0,10264.0
mean,0.003995,0.000682,0.001072,0.000682,0.007697,0.000487,0.000292,0.004287,0.006723,0.004482,...,0.005748,0.000292,0.007502,0.004189,0.000585,0.001461,0.000487,0.000877,0.003605,0.000585
std,0.063079,0.026107,0.032721,0.026107,0.087397,0.022067,0.017095,0.065337,0.081719,0.066798,...,0.075603,0.017095,0.086292,0.064593,0.024172,0.038202,0.022067,0.0296,0.059935,0.024172
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
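
After one-hot encoding, averaging the indicator columns per neighborhood gives the share of each category among that neighborhood's venues. The sketch below, on invented toy data, obtains the same table with `pd.crosstab(..., normalize='index')`. This route does not need a 'Neighborhood' column in the one-hot frame, so it cannot clash with a venue category of the same name (one toy venue is given that category on purpose).

```python
import pandas as pd

# Toy venue table; one venue's category is literally 'Neighborhood' to force the clash.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Park', 'Zoo'],
})

# Row-normalised cross-tabulation: share of each category within each neighborhood.
freq = pd.crosstab(venues['Neighborhood'], venues['Venue Category'], normalize='index')

# Same numbers via one-hot encoding, grouping by the label values directly
# instead of writing them into a column of the one-hot frame.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
via_dummies = onehot.groupby(venues['Neighborhood'].to_numpy()).mean()

print(freq)
```

Both `freq` and `via_dummies` keep the 'Neighborhood' category as an ordinary feature column.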


In [78]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,Agincourt,0.000000,0.000,0.00,0.00,0.0100,0.00,0.00,0.0100,0.0100,...,0.010,0.00,0.0200,0.00,0.00,0.000000,0.000,0.01,0.000,0.000000
1,"Alderwood, Long Branch",0.000000,0.000,0.00,0.00,0.0000,0.00,0.00,0.0100,0.0000,...,0.000,0.01,0.0200,0.01,0.00,0.000000,0.000,0.00,0.020,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000,0.01,0.00,0.0000,0.00,0.00,0.0100,0.0100,...,0.000,0.00,0.0100,0.01,0.00,0.000000,0.000,0.00,0.000,0.000000
3,Bayview Village,0.000000,0.000,0.00,0.00,0.0000,0.00,0.00,0.0000,0.0000,...,0.010,0.00,0.0100,0.00,0.00,0.000000,0.000,0.00,0.000,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000,0.01,0.00,0.0100,0.00,0.00,0.0000,0.0100,...,0.000,0.00,0.0000,0.01,0.00,0.000000,0.000,0.00,0.010,0.000000
5,Berczy Park,0.000000,0.000,0.00,0.00,0.0100,0.00,0.01,0.0100,0.0000,...,0.020,0.00,0.0000,0.00,0.00,0.000000,0.000,0.00,0.010,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.000,0.00,0.00,0.0000,0.00,0.00,0.0000,0.0000,...,0.020,0.00,0.0000,0.01,0.00,0.000000,0.000,0.00,0.000,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.000,0.00,0.00,0.0100,0.00,0.00,0.0100,0.0000,...,0.020,0.00,0.0000,0.00,0.00,0.000000,0.000,0.00,0.000,0.000000
8,Business reply mail Processing Centre,0.000000,0.000,0.00,0.00,0.0100,0.00,0.00,0.0000,0.0100,...,0.010,0.00,0.0100,0.00,0.00,0.000000,0.000,0.00,0.000,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000,0.00,0.00,0.0100,0.00,0.00,0.0100,0.0000,...,0.020,0.00,0.0000,0.00,0.00,0.000000,0.000,0.00,0.000,0.000000


<h3> Let us now get the top 5 venues per Neighborhood </h3>

In [79]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.08
1           Coffee Shop  0.06
2  Caribbean Restaurant  0.05
3                Bakery  0.04
4      Sushi Restaurant  0.03


----Alderwood, Long Branch----
          venue  freq
0   Coffee Shop  0.05
1        Bakery  0.04
2          Café  0.04
3  Burger Joint  0.04
4   Pizza Place  0.03


----Bathurst Manor, Wilson Heights, Downsview North----
             venue  freq
0    Grocery Store  0.04
1      Coffee Shop  0.04
2       Restaurant  0.04
3   Clothing Store  0.03
4  Bubble Tea Shop  0.03


----Bayview Village----
                       venue  freq
0                Coffee Shop  0.08
1                Supermarket  0.06
2  Middle Eastern Restaurant  0.06
3        Japanese Restaurant  0.05
4                     Bakery  0.05


----Bedford Park, Lawrence Manor East----
           venue  freq
0           Park  0.06
1           Café  0.06
2  Grocery Store  0.05
3         Bakery  0.05
4    Coffee Shop  0.05


----Bercz

          venue  freq
0          Park  0.10
1   Coffee Shop  0.09
2        Bakery  0.06
3          Café  0.03
4  Dessert Shop  0.03


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
            venue  freq
0     Coffee Shop  0.06
1           Hotel  0.06
2      Restaurant  0.05
3        Pharmacy  0.05
4  Sandwich Place  0.04


----Lawrence Manor, Lawrence Heights----
                   venue  freq
0     Italian Restaurant  0.08
1            Coffee Shop  0.08
2  Vietnamese Restaurant  0.05
3                   Café  0.04
4                 Bakery  0.04


----Lawrence Park----
                venue  freq
0  Italian Restaurant  0.08
1                Café  0.07
2                Park  0.06
3         Coffee Shop  0.06
4              Bakery  0.04


----Leaside----
                venue  freq
0                Park  0.12
1  Italian Restaurant  0.07
2                Café  0.06
3              Bakery  0.05
4       Grocery Store  0.04


----Little Portugal, Trinity----


           venue  freq
0           Park  0.06
1    Supermarket  0.05
2   Burger Joint  0.04
3    Coffee Shop  0.04
4  Grocery Store  0.03


----West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale----
            venue  freq
0     Coffee Shop  0.06
1      Restaurant  0.04
2  Sandwich Place  0.04
3    Liquor Store  0.04
4   Grocery Store  0.04


----Westmount----
            venue  freq
0     Coffee Shop  0.10
1            Bank  0.06
2  Sandwich Place  0.06
3     Pizza Place  0.05
4          Bakery  0.05


----Weston----
                    venue  freq
0             Coffee Shop  0.06
1                  Bakery  0.06
2  Furniture / Home Store  0.06
3   Vietnamese Restaurant  0.05
4                    Bank  0.04


----Wexford, Maryvale----
                       venue  freq
0  Middle Eastern Restaurant  0.07
1               Burger Joint  0.04
2              Grocery Store  0.03
3                Supermarket  0.03
4                Coffee Shop  0.03


----Willowdale----
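
The `freq` values printed above are the share of a neighborhood's venues that fall in each category. A small pure-Python sketch on invented data makes that computation explicit. `top_categories` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, venue category) pairs; purely illustrative.
venues = [
    ('A', 'Coffee Shop'), ('A', 'Coffee Shop'), ('A', 'Coffee Shop'),
    ('A', 'Park'), ('A', 'Park'), ('A', 'Bakery'),
    ('B', 'Park'), ('B', 'Park'), ('B', 'Park'), ('B', 'Café'),
]

# Count categories separately for each neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

def top_categories(hood, n=2):
    """Return the n most frequent categories with their share of the hood's venues."""
    total = sum(counts[hood].values())
    return [(cat, round(k / total, 2)) for cat, k in counts[hood].most_common(n)]

print(top_categories('A'))
print(top_categories('B'))
```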
     

In [80]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<h2>Top 10 venues per neighborhood </h2>

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Coffee Shop,Caribbean Restaurant,Bakery,Pizza Place,Noodle House,Park,Supermarket,Sushi Restaurant,Indian Restaurant
1,"Alderwood, Long Branch",Coffee Shop,Burger Joint,Café,Bakery,Pizza Place,Seafood Restaurant,Burrito Place,Grocery Store,Breakfast Spot,Middle Eastern Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Grocery Store,Coffee Shop,Restaurant,Park,Middle Eastern Restaurant,Clothing Store,Bagel Shop,Bubble Tea Shop,Liquor Store,Bookstore
3,Bayview Village,Coffee Shop,Middle Eastern Restaurant,Supermarket,Bakery,Japanese Restaurant,Korean Restaurant,Bank,Bagel Shop,Thai Restaurant,Seafood Restaurant
4,"Bedford Park, Lawrence Manor East",Park,Café,Coffee Shop,Bakery,Grocery Store,Sushi Restaurant,Tea Room,Liquor Store,Shopping Mall,Steakhouse
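
The column labels above take their suffixes from a three-element list with a 'th' fallback. That is enough for 1 to 10, but it would label a 21st column '21th'. A sketch of a general ordinal helper (hypothetical name `ordinal`) follows:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column labels as above for the top 10 venues.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```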


<h2> Clustering</h2>
<h3> We will use the K-means algorithm to generate clusters of neighborhoods with similar activity </h3>

In [93]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 3, 1, 2, 4, 4, 4, 2], dtype=int32)
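
The number of clusters is fixed at `kclusters = 5` above. One common way to compare candidate values is the silhouette score. The sketch below uses synthetic blobs rather than the Toronto data, so it only illustrates the procedure and says nothing about whether 5 is a good choice here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the frequency matrix: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # higher silhouette means tighter, better-separated clusters
    scores[k] = silhouette_score(X, model.labels_)

best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

On the real data, `X` would be replaced by `toronto_grouped_clustering`.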

In [94]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_merged = df

# join the postal-code dataframe with the cluster labels and top venues of each neighborhood
df_merged = df_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

df_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3,Middle Eastern Restaurant,Supermarket,Caribbean Restaurant,Grocery Store,Coffee Shop,Liquor Store,Japanese Restaurant,Bakery,Burger Joint,Café
1,M4A,North York,Victoria Village,43.725882,-79.315572,3,Park,Supermarket,Burger Joint,Coffee Shop,Grocery Store,Middle Eastern Restaurant,Gym / Fitness Center,Steakhouse,Sports Bar,Bakery
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Park,Brewery,Restaurant,Café,Pizza Place,Middle Eastern Restaurant,Farmers Market,Ice Cream Shop,Beach
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3,Coffee Shop,Italian Restaurant,Vietnamese Restaurant,Bakery,Café,Park,Liquor Store,Sandwich Place,Breakfast Spot,Brazilian Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Park,Bakery,Spa,Mexican Restaurant,Restaurant,Dessert Shop,Grocery Store,Historic Site,Supermarket


In [97]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'],
                                  df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [103]:
def showCluster(clusterId=0):
    return df_merged.loc[df_merged['Cluster Labels'] == clusterId, 
                     df_merged.columns[[1] + list(range(5, df_merged.shape[1]))]
                    ]

In [104]:
showCluster(0)

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Etobicoke,0,Coffee Shop,Sandwich Place,Liquor Store,Bank,Park,Café,Grocery Store,Restaurant,Bakery,Pizza Place
6,Scarborough,0,Zoo Exhibit,Coffee Shop,Pharmacy,Gas Station,Sandwich Place,Fast Food Restaurant,Pizza Place,Burger Joint,Breakfast Spot,Fried Chicken Joint
12,Scarborough,0,Zoo Exhibit,Coffee Shop,Park,Gas Station,Bank,Breakfast Spot,Smoothie Shop,Beer Store,Fast Food Restaurant,Gift Shop
18,Scarborough,0,Coffee Shop,Bank,Sandwich Place,Park,Indian Restaurant,Pharmacy,Fast Food Restaurant,Gym,Gas Station,Ice Cream Shop
22,Scarborough,0,Coffee Shop,Pizza Place,Fast Food Restaurant,Sandwich Place,Caribbean Restaurant,Pharmacy,Park,Bank,Hotel,Pub
26,Scarborough,0,Coffee Shop,Fast Food Restaurant,Sandwich Place,Bank,Pizza Place,Indian Restaurant,Caribbean Restaurant,Ice Cream Shop,Fried Chicken Joint,Restaurant
50,North York,0,Coffee Shop,Hotel,Pharmacy,Burger Joint,Steakhouse,Italian Restaurant,Indian Restaurant,Bank,Sandwich Place,Chinese Restaurant
51,Scarborough,0,Coffee Shop,Park,Grocery Store,Sandwich Place,Pharmacy,Gym,Bank,Ice Cream Shop,Burger Joint,Pizza Place
57,North York,0,Bank,Coffee Shop,Pizza Place,Fast Food Restaurant,Pharmacy,Gas Station,Sandwich Place,Bakery,Beer Store,Chinese Restaurant
64,York,0,Bakery,Coffee Shop,Furniture / Home Store,Vietnamese Restaurant,Brewery,Bank,Pharmacy,Golf Course,Chinese Restaurant,Restaurant


In [105]:
showCluster(1)

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,North York,1,Café,Park,Coffee Shop,Italian Restaurant,Liquor Store,Dessert Shop,Brewery,Bakery,Trail,Grocery Store
16,York,1,Café,Park,Italian Restaurant,Coffee Shop,Grocery Store,Bar,Bakery,Dessert Shop,Middle Eastern Restaurant,Liquor Store
21,York,1,Café,Park,Coffee Shop,Italian Restaurant,Brewery,Restaurant,Bar,Bakery,Ice Cream Shop,Middle Eastern Restaurant
23,East York,1,Park,Italian Restaurant,Café,Bakery,Grocery Store,BBQ Joint,Coffee Shop,Tea Room,Pizza Place,Greek Restaurant
25,Downtown Toronto,1,Park,Café,Coffee Shop,Bakery,Pizza Place,Brewery,Beer Bar,Spa,Ice Cream Shop,Gastropub
29,East York,1,Park,Café,Italian Restaurant,Coffee Shop,Grocery Store,BBQ Joint,Bakery,Supermarket,American Restaurant,Pizza Place
31,West Toronto,1,Café,Park,Restaurant,Italian Restaurant,Bar,Bakery,Ice Cream Shop,Brewery,Coffee Shop,Beer Bar
55,North York,1,Park,Café,Coffee Shop,Bakery,Grocery Store,Sushi Restaurant,Tea Room,Liquor Store,Shopping Mall,Steakhouse
56,York,1,Café,Coffee Shop,Italian Restaurant,Brewery,Restaurant,Bar,Bakery,BBQ Joint,Park,Liquor Store
61,Central Toronto,1,Italian Restaurant,Café,Coffee Shop,Park,Bakery,Grocery Store,Gym,Sushi Restaurant,BBQ Joint,Restaurant


<h4> In Cluster 2, 'Coffee Shop' and 'Park' are the two most common venues in every row shown below </h4>

In [106]:
showCluster(2)

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,2,Coffee Shop,Park,Brewery,Restaurant,Café,Pizza Place,Middle Eastern Restaurant,Farmers Market,Ice Cream Shop,Beach
4,Downtown Toronto,2,Coffee Shop,Park,Bakery,Spa,Mexican Restaurant,Restaurant,Dessert Shop,Grocery Store,Historic Site,Supermarket
9,Downtown Toronto,2,Coffee Shop,Park,Restaurant,Bakery,Garden,Mediterranean Restaurant,Hotel,Historic Site,Beer Bar,Sandwich Place
15,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Dessert Shop,Mexican Restaurant,Farmers Market,Historic Site,Sandwich Place,Café
20,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Mexican Restaurant,Hotel,Liquor Store,Dessert Shop,Plaza,Farmers Market
24,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Farmers Market,Supermarket,Dessert Shop,Sandwich Place,Gym,Beer Bar
30,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Dessert Shop,Liquor Store,Sandwich Place,Café,Plaza,Garden
36,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Dessert Shop,Pizza Place,Gym,Sandwich Place,Historic Site,Café
42,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Dessert Shop,Plaza,Supermarket,Mexican Restaurant,Mediterranean Restaurant,Liquor Store
48,Downtown Toronto,2,Coffee Shop,Park,Bakery,Restaurant,Liquor Store,Mexican Restaurant,Beer Bar,Plaza,Farmers Market,Vegetarian / Vegan Restaurant
