# Week 3 - Clustering in Toronto!

# Table of Contents

<div style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Get Neighborhoods Data from Wikipedia</a>

2. <a href="#item2">Get and Merge Geocode Data</a>

3. <a href="#item3">Analyze and Cluster the Data</a>

</font>
</div>

<a id='item1'></a>
## Part 1: Get the data from Wikipedia

In [67]:
import pandas as pd
import numpy as np
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
from IPython.display import display # make printed dataframes look nice
from IPython.display import HTML # make printed dataframes look nice
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors
from sklearn.cluster import KMeans # import k-means from clustering stage
import folium # map rendering library

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [68]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
dfs = pd.read_html(url)

In [69]:
# Table we want is the first one
df = dfs[0]

# Get rid of the rows without useful data (note that there aren't any rows that have a Borough but not a neighborhood)
df = df[df.Borough != "Not assigned"].reset_index(drop=True)

# Don't need to do any more processing because the data on Wikipedia is already in desired form
# Print dataframe's shape
print(df.shape)

# Show off the dataframe
display(df.head())

(103, 3)


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
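The filtering step can be tried on a small hand-made table. The sketch below uses made-up rows (not the Wikipedia data) and also checks the assumption from the code comment, namely that no row has a borough but lacks a neighborhood.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the Wikipedia table
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows with an assigned borough
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)

# Count rows that have a borough but no usable neighborhood
missing_hood = (assigned["Neighborhood"].isna()
                | (assigned["Neighborhood"] == "Not assigned")).sum()

print(assigned.shape)
print(missing_hood)
```

If `missing_hood` were ever non-zero on the real table, those rows would need a fallback such as copying the borough name into the neighborhood column.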


## Now, add the lat/long coordinates to the dataset

### After a lot of failed attempts at geocoding the postal codes, I decided to just download the coordinate data
### None of those attempts are needed for what follows. Continue with **"Part 2"**

<a id='item2'></a>
## Part 2: Get and merge the geocoding data

In [70]:
# Load the data in a dataframe, merge with initial data
df_geo = pd.read_csv("Geospatial_Coordinates.csv")
df_merged = pd.merge(df, df_geo, on=['Postal Code'])
df_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
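`pd.merge` defaults to an inner join, so a postal code missing from the CSV would silently drop out of the result. A minimal sketch of one way to check for that, using made-up miniature tables (the `M9Z` row is invented for illustration):

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
hoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Made-up Area"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every neighborhood; `indicator` records where each row
# came from, and `validate` raises if a postal code is duplicated.
checked = pd.merge(hoods, coords, on="Postal Code", how="left",
                   validate="one_to_one", indicator=True)

unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that the inner join above lost nothing.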


<a id='item3'></a>
## Part 3: Analyze, Cluster and Label the Data Accordingly

In [71]:
# Begin by stripping out rows that don't contain the word "Toronto" in the Borough column
df_tor = df_merged[df_merged.Borough.str.contains("Toronto")].reset_index(drop=True)
df_tor

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


In [124]:
CLIENT_ID = 'R1H1D2GMPVIW3DV0ISLESHSVU0Q5TDMSLJHSX3GZQZM0RC1U' # your Foursquare ID
CLIENT_SECRET = '5YPLTXG0405EEDDYWZ03Z0S3OUFANDX1O5BOO1PU13R2QK0D' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
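Credentials written directly into a notebook are exposed to anyone who can read it. One common alternative is to read them from environment variables. The sketch below is only an illustration: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions, not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are assumptions chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

# Example with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_credentials({
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
})
```

In the notebook this would become `CLIENT_ID, CLIENT_SECRET = load_credentials()`, with the variables set in the shell before Jupyter is started.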

<br>
<br>

#### Begin by seeing the neighborhoods on a map

In [72]:
# Get coordinates for Toronto
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
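geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard follows; `resolve_coordinates` is a hypothetical helper, and stand-in functions replace the real geocoder so the example runs without network access.

```python
from types import SimpleNamespace

def resolve_coordinates(geocode, address, fallback):
    """Return (lat, lon) from a geocoder callable, or `fallback` if it finds nothing."""
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude

# Stand-in geocoders so the sketch runs without network access
def found(address):
    return SimpleNamespace(latitude=43.6534817, longitude=-79.3839347)

def not_found(address):
    return None

toronto = (43.6534817, -79.3839347)
print(resolve_coordinates(found, "Toronto, ON, Canada", fallback=toronto))
print(resolve_coordinates(not_found, "Nowhere", fallback=toronto))
```

With the real geocoder, the first argument would be `geolocator.geocode`.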


In [73]:
# create map using latitude and longitude values
map_tor = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df_tor['Latitude'], df_tor['Longitude'], df_tor['Neighborhood']):
    label = neighborhood
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tor)  
    
map_tor

<br>

#### Get first neighborhood's info

In [74]:
neighborhood_latitude = df_tor.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_tor.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_tor.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


<br>

#### Set the GET request URL, show the results

In [75]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
results = requests.get(url).json()
venues1 = results['response']['venues']
venues1 = pd.json_normalize(venues1)
venues1

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.address,location.crossStreet,venuePage.id
0,5bdc6c2bba57b4002c4c71a8,Oldtown Bodega,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",v-1593978421,False,43.653966,-79.360752,"[{'label': 'display', 'lat': 43.653966, 'lng':...",34,M5A 1L6,CA,Toronto,ON,Canada,"[Toronto ON M5A 1L6, Canada]",,,
1,4bc70f5d14d7952126a066e9,Sackville Playground,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1593978421,False,43.654656,-79.359871,"[{'label': 'display', 'lat': 43.65465604258614...",75,,CA,Toronto,ON,Canada,"[420 king st E, Toronto ON, Canada]",420 king st E,,
2,53b8466a498e83df908c3f21,Tandem Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1593978421,False,43.653559,-79.361809,"[{'label': 'display', 'lat': 43.65355870959944...",122,,CA,Toronto,ON,Canada,"[368 King St E (at Trinity St), Toronto ON, Ca...",368 King St E,at Trinity St,
3,50760559e4b0e8c7babe2497,Body Blitz Spa East,"[{'id': '4bf58dd8d48988d1ed941735', 'name': 'S...",v-1593978421,False,43.654735,-79.359874,"[{'label': 'display', 'lat': 43.65473505045365...",80,M5A 1L9,CA,Toronto,ON,Canada,[497 King Street East (btwn Sackville St and S...,497 King Street East,btwn Sackville St and Sumach St,
4,5e5d749285a0610007e60fe8,Terroni Sud Forno Produzione e Spaccio,"[{'id': '4bf58dd8d48988d1f5941735', 'name': 'G...",v-1593978421,False,43.653903,-79.360018,"[{'label': 'display', 'lat': 43.653903, 'lng':...",63,M5A 3E2,CA,Toronto,ON,Canada,"[22 Sackville St, Toronto ON M5A 3E2, Canada]",22 Sackville St,,
5,55e8cc7a498e795a53d81d36,TTC Streetcar #503 Kingston Rd,"[{'id': '4f2a23984b9023bd5841ed2c', 'name': 'M...",v-1593978421,False,43.648099,-79.382932,"[{'label': 'display', 'lat': 43.64809856353395...",1922,,CA,Toronto,ON,Canada,"[Toronto ON, Canada]",,,
6,4dc9d4d9d16495ca5add0803,Cam's Auto Service,"[{'id': '4bf58dd8d48988d124951735', 'name': 'A...",v-1593978421,False,43.654195,-79.360545,"[{'label': 'display', 'lat': 43.65419500779484...",10,M4M 2T7,CA,Toronto,ON,Canada,"[475 King Street East (Sackville Street), Toro...",475 King Street East,Sackville Street,
7,4b0d4672f964a520854523e3,TTC Streetcar #504 King St,"[{'id': '4f2a23984b9023bd5841ed2c', 'name': 'M...",v-1593978421,False,43.646151,-79.396,"[{'label': 'display', 'lat': 43.64615120880793...",2988,,CA,Toronto,ON,Canada,"[King St. (Moving Target!), Toronto ON, Canada]",King St.,Moving Target!,
8,51853a73498e4d97a8b20831,Rooster Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1593978421,False,43.6519,-79.365609,"[{'label': 'display', 'lat': 43.65189965670432...",479,M5A 1L1,CA,Toronto,ON,Canada,"[343 King St E (btwn Princess & Berkeley St), ...",343 King St E,btwn Princess & Berkeley St,
9,54ea41ad498e9a11e9e13308,Roselle Desserts,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",v-1593978421,False,43.653447,-79.362017,"[{'label': 'display', 'lat': 43.65344672305267...",143,M5A 1K9,CA,Toronto,ON,Canada,"[362 King St E (Trinity St), Toronto ON M5A 1K...",362 King St E,Trinity St,
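The request URL above is assembled with a long `format` string, where it is easy to mix up the argument order. A sketch of the same URL built from a dictionary of parameters, using only the standard library; `build_search_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng,
                     version="20180605", radius=500, limit=100):
    """Assemble the venues/search URL from a parameter dictionary."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value readable
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")

url = build_search_url("demo-id", "demo-secret", 43.6542599, -79.3606359)
print(url)
```

`requests.get` also accepts a `params` dictionary directly, which avoids building the query string by hand.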


<br>

#### Define function to get the category from the JSON

In [76]:
# function that extracts the venue's first category name,
# or "Uncategorized" when the venue has no usable category
def get_category_type(row):
    try:
        return row['categories'][0]['name']
    except (IndexError, KeyError, TypeError):
        return "Uncategorized"
        

<br>

#### Organize the data

In [77]:
# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
venues = venues1.loc[:, filtered_columns]

# filter the category for each row
venues['categories'] = venues.apply(get_category_type, axis=1)

# clean columns
#venues.columns = [col.split(".")[-1] for col in venues.columns]
#venues = venues.dropna(how='any',axis=0)
#venues.reset_index(drop=True, inplace=True)

venues.head()

Unnamed: 0,name,categories,location.lat,location.lng
0,Oldtown Bodega,Café,43.653966,-79.360752
1,Sackville Playground,Park,43.654656,-79.359871
2,Tandem Coffee,Coffee Shop,43.653559,-79.361809
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Terroni Sud Forno Produzione e Spaccio,Gourmet Shop,43.653903,-79.360018
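The category extraction can be checked against hand-made records before running it on API data. The sketch below is an illustrative variant that works on plain dictionaries rather than DataFrame rows; `first_category` and the sample records are made up for the example.

```python
def first_category(venue):
    """Return the name of a venue's first category, or "Uncategorized"."""
    categories = venue.get("categories") or []
    if categories:
        return categories[0].get("name", "Uncategorized")
    return "Uncategorized"

# Hand-made records shaped like entries of results['response']['venues']
samples = [
    {"name": "Tandem Coffee", "categories": [{"name": "Coffee Shop"}]},
    {"name": "No category listed", "categories": []},
    {"name": "Key missing"},
]
print([first_category(v) for v in samples])
# prints ['Coffee Shop', 'Uncategorized', 'Uncategorized']
```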


<br>

#### Define function to get nearby venues

In [78]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()
        #with open('debug.json', 'w') as outfile:
        #    json.dump(results, outfile)
        venues1 = results['response']['venues']
        venues1 = pd.json_normalize(venues1)
        filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
        venues = venues1.loc[:, filtered_columns]
        venues = venues.rename(columns={'location.lat': 'latitude', 'location.lng': 'longitude'})
        venues['categories'] = venues.apply(get_category_type, axis=1)
        
        for row in venues.itertuples():
            venues_list.append([(name, 
                               lat, 
                               lng, 
                               row.name, 
                               row.latitude, 
                               row.longitude, 
                               row.categories
                              )])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

<br>

#### Run function on our neighborhoods and see the results

In [79]:
tor_venues = getNearbyVenues(names=df_tor['Neighborhood'],
                             latitudes=df_tor['Latitude'],
                             longitudes=df_tor['Longitude'])

In [80]:
tor_venues = tor_venues[tor_venues["Venue Category"] != "Uncategorized"].reset_index(drop=True)
print(tor_venues.shape)
tor_venues.head(20)

(3292, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Oldtown Bodega,43.653966,-79.360752,Café
1,"Regent Park, Harbourfront",43.65426,-79.360636,Sackville Playground,43.654656,-79.359871,Park
2,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Terroni Sud Forno Produzione e Spaccio,43.653903,-79.360018,Gourmet Shop
5,"Regent Park, Harbourfront",43.65426,-79.360636,TTC Streetcar #503 Kingston Rd,43.648099,-79.382932,Moving Target
6,"Regent Park, Harbourfront",43.65426,-79.360636,Cam's Auto Service,43.654195,-79.360545,Automotive Shop
7,"Regent Park, Harbourfront",43.65426,-79.360636,TTC Streetcar #504 King St,43.646151,-79.396,Moving Target
8,"Regent Park, Harbourfront",43.65426,-79.360636,Rooster Coffee,43.6519,-79.365609,Coffee Shop
9,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
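The rows above include the two TTC streetcar entries, whose `location.distance` in the earlier raw response (1922 and 2988) is well beyond the requested 500 m radius, so the endpoint does not appear to treat `radius` as a hard limit here. If a strict radius matters, one option is to filter by distance afterwards. A minimal sketch using the haversine formula, with coordinates copied from the tables above; `haversine_m` is a hypothetical helper.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6_371_000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

center = (43.65426, -79.360636)      # Regent Park, Harbourfront
bodega = (43.653966, -79.360752)     # Oldtown Bodega
streetcar = (43.648099, -79.382932)  # TTC Streetcar #503 Kingston Rd

for name, point in [("Oldtown Bodega", bodega), ("TTC Streetcar #503", streetcar)]:
    d = haversine_m(*center, *point)
    print(name, round(d), "m", "inside" if d <= 500 else "outside")
```

Applied to `tor_venues`, the same function could be used row by row to keep only venues within 500 m of their neighborhood's coordinates.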


<br>

#### Get some information about the dataset

In [81]:
display(tor_venues.groupby('Neighborhood').count())
print('There are {} unique categories.'.format(len(tor_venues['Venue Category'].unique())))

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,84,84,84,84,84,84
"Brockton, Parkdale Village, Exhibition Place",92,92,92,92,92,92
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",79,79,79,79,79,79
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",62,62,62,62,62,62
Central Bay Street,96,96,96,96,96,96
Christie,94,94,94,94,94,94
Church and Wellesley,90,90,90,90,90,90
"Commerce Court, Victoria Hotel",94,94,94,94,94,94
Davisville,81,81,81,81,81,81
Davisville North,71,71,71,71,71,71


There are 409 unique categories.


<br>

#### Use data for neighborhoods analysis

In [82]:
# one hot encoding
tor_onehot = pd.get_dummies(tor_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tor_onehot['Neighborhood'] = tor_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [tor_onehot.columns[-1]] + list(tor_onehot.columns[:-1])
tor_onehot = tor_onehot[fixed_columns]
tor_grouped = tor_onehot.groupby('Neighborhood').mean().reset_index()
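Averaging one-hot columns per neighborhood turns venue counts into frequencies: each value in `tor_grouped` is the share of that neighborhood's venues falling in a category. A small sketch on made-up data shows the effect.

```python
import pandas as pd

# Hypothetical venues for two neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Office", "Office", "Office"],
})

# One indicator column per category, as floats so the mean is a frequency
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very different venue counts end up on the same scale before clustering.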

<br>

#### Check out the top five venues for each neighborhood

In [83]:
num_top_venues = 5

for hood in tor_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tor_grouped[tor_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                                      venue  freq
0                                    Office  0.08
1                                  Building  0.07
2                                   Parking  0.04
3  Residential Building (Apartment / Condo)  0.04
4                                Food Truck  0.02


----Brockton, Parkdale Village, Exhibition Place----
                                      venue  freq
0                                    Office  0.26
1  Residential Building (Apartment / Condo)  0.17
2                              Tech Startup  0.07
3                        Advertising Agency  0.03
4                                  Building  0.03


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0            Building  0.05
1  Light Rail Station  0.04
2              Office  0.04
3      Medical Center  0.03
4        Antique Shop  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront 

<br> 

#### Now go through and analyze each neighborhood based on most prevalent types of venues

In [84]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
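To see what this function returns, it helps to run the same logic on a single hand-made row. The sketch below uses a hypothetical helper name and made-up frequencies; the first entry is skipped because it holds the neighborhood name.

```python
import pandas as pd

def top_categories(row, n):
    """Return the n column labels with the highest values, skipping the first entry."""
    return row.iloc[1:].sort_values(ascending=False).index.values[:n]

# Hypothetical row shaped like one row of tor_grouped
row = pd.Series({"Neighborhood": "Demo", "Café": 0.2, "Office": 0.5, "Park": 0.3})
print(list(top_categories(row, 2)))
# prints ['Office', 'Park']
```

When several categories share the same frequency, their relative order in the result should not be read as meaningful.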

In [85]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tor_grouped['Neighborhood']

for ind in np.arange(tor_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Office,Building,Residential Building (Apartment / Condo),Parking,Tech Startup,Food Truck,Event Space,Hotel,Laundry Service,Pub
1,"Brockton, Parkdale Village, Exhibition Place",Office,Residential Building (Apartment / Condo),Tech Startup,Advertising Agency,Café,Building,Coworking Space,Bar,Convenience Store,Coffee Shop
2,"Business reply mail Processing Centre, South C...",Building,Office,Light Rail Station,Convenience Store,Medical Center,Butcher,Fast Food Restaurant,Theater,Restaurant,Gym / Fitness Center
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Gate,Airport Service,Moving Target,Airport Terminal,Park,Airport Lounge,Harbor / Marina,Airport,Boat or Ferry,General Travel
4,Central Bay Street,Hospital,Hospital Ward,Coffee Shop,Medical Center,Pharmacy,Emergency Room,Japanese Restaurant,Sandwich Place,Parking,Mediterranean Restaurant
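The column headers are built from a three-element suffix list, which is enough for ten columns but would produce "21th" if the number of top venues ever grew past 20. A minimal sketch of a general helper follows; `ordinal` is a hypothetical name, not something from the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[10])
# prints 1st Most Common Venue | 10th Most Common Venue
```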


#### Now Cluster Neighborhoods

In [86]:
# set number of clusters
kclusters = 5

tor_grouped_clustering = tor_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tor_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 1, 4, 1, 0, 3, 0, 1, 2], dtype=int32)
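The notebook fixes `kclusters = 5` without showing how that number was picked. One common heuristic is to compare the k-means inertia (within-cluster sum of squares) across several values of k and look for the point where it stops dropping sharply. The sketch below runs on synthetic data as a stand-in for `tor_grouped_clustering`; it is an illustration of the heuristic, not the procedure used here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for tor_grouped_clustering: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit k-means for several values of k and record the inertia
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On this synthetic data the inertia falls steeply up to k = 3 and only slightly afterwards; the real venue frequencies may show a much less obvious bend.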

<br>

#### Add labels

In [88]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tor_merged = df_tor

# join the cluster labels and top venues onto df_tor, which holds latitude/longitude for each neighborhood
tor_merged = tor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

tor_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Automotive Shop,Office,Furniture / Home Store,Coffee Shop,Art Gallery,Italian Restaurant,Auto Dealership,Building,Park,Moving Target
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3,Government Building,Medical Center,Building,Office,Capitol Building,Restaurant,Doctor's Office,Medical Lab,College Library,Lounge
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,College Lab,University,College Communications Building,Coffee Shop,General College & University,Parking,College Arts Building,College Academic Building,General Entertainment,Salon / Barbershop
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Office,Building,Residential Building (Apartment / Condo),Event Space,Rental Car Location,Church,Laundry Service,Spa,Furniture / Home Store,Clothing Store
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Park,Jewelry Store,Playground,Laundry Service,Bus Stop,Breakfast Spot,Coffee Shop,Dance Studio,Miscellaneous Shop,Flower Shop


<br>

#### Show the clustered neighborhoods on a map

In [90]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the color list by the cluster label itself
for lat, lon, poi, cluster in zip(tor_merged['Latitude'], tor_merged['Longitude'], tor_merged['Neighborhood'], tor_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [92]:
# Add a column for a descriptive cluster name at the front.
# Work on a copy so tor_merged itself is left unchanged.
new_tor_merged = tor_merged.copy()
new_tor_merged.insert(0, 'Cluster Name', "")
new_tor_merged.head()

Unnamed: 0,Cluster Name,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Automotive Shop,Office,Furniture / Home Store,Coffee Shop,Art Gallery,Italian Restaurant,Auto Dealership,Building,Park,Moving Target
1,,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3,Government Building,Medical Center,Building,Office,Capitol Building,Restaurant,Doctor's Office,Medical Lab,College Library,Lounge
2,,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,College Lab,University,College Communications Building,Coffee Shop,General College & University,Parking,College Arts Building,College Academic Building,General Entertainment,Salon / Barbershop
3,,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Office,Building,Residential Building (Apartment / Condo),Event Space,Rental Car Location,Church,Laundry Service,Spa,Furniture / Home Store,Clothing Store
4,,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Park,Jewelry Store,Playground,Laundry Service,Bus Stop,Breakfast Spot,Coffee Shop,Dance Studio,Miscellaneous Shop,Flower Shop


<br>

## Now, look through the clusters and give each one an appropriate descriptive name

#### Cluster 0

In [103]:
# Check out the first cluster
display(new_tor_merged.loc[new_tor_merged['Cluster Labels'] == 0, new_tor_merged.columns[[2] + [3] + [0] + list(range(6, new_tor_merged.shape[1]))]])
# Give it an appropriate name
Labels = {0: "Commercial District"}

Unnamed: 0,Borough,Neighborhood,Cluster Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Downtown Toronto,St. James Town,,0,Office,Building,Residential Building (Apartment / Condo),Event Space,Rental Car Location,Church,Laundry Service,Spa,Furniture / Home Store,Clothing Store
7,Downtown Toronto,Christie,,0,Office,Café,Furniture / Home Store,Bakery,Design Studio,Grocery Store,Laundry Service,Gym / Fitness Center,Automotive Shop,Nightclub
8,Downtown Toronto,"Richmond, Adelaide, King",,0,Office,Building,Café,Coffee Shop,Food Court,Vegetarian / Vegan Restaurant,Pool,Indian Restaurant,Bike Rental / Bike Share,Ballroom
13,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",,0,Office,Coffee Shop,Restaurant,Park,Café,Building,Coworking Space,Event Space,Deli / Bodega,Italian Restaurant
16,Downtown Toronto,"Commerce Court, Victoria Hotel",,0,Office,Coffee Shop,Financial or Legal Service,Bank,Salon / Barbershop,Food Court,Video Game Store,Restaurant,Building,Deli / Bodega
34,Downtown Toronto,Stn A PO Boxes,,0,Office,Building,Tech Startup,Residential Building (Apartment / Condo),Pub,Hotel,Gym,Bar,Laundry Service,Italian Restaurant


#### Cluster 1

In [131]:
# Check out the next cluster
display(new_tor_merged.loc[new_tor_merged['Cluster Labels'] == 1, new_tor_merged.columns[[2] + [3] + [0] + list(range(6, new_tor_merged.shape[1]))]])
# Give it an appropriate name
Labels[1] = "Culture District"

Unnamed: 0,Borough,Neighborhood,Cluster Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",Culture District,1,Automotive Shop,Office,Furniture / Home Store,Coffee Shop,Art Gallery,Italian Restaurant,Auto Dealership,Building,Park,Moving Target
2,Downtown Toronto,"Garden District, Ryerson",Culture District,1,College Lab,University,College Communications Building,Coffee Shop,General College & University,Parking,College Arts Building,College Academic Building,General Entertainment,Salon / Barbershop
4,East Toronto,The Beaches,Culture District,1,Park,Jewelry Store,Playground,Laundry Service,Bus Stop,Breakfast Spot,Coffee Shop,Dance Studio,Miscellaneous Shop,Flower Shop
6,Downtown Toronto,Central Bay Street,Culture District,1,Hospital,Hospital Ward,Coffee Shop,Medical Center,Pharmacy,Emergency Room,Japanese Restaurant,Sandwich Place,Parking,Mediterranean Restaurant
9,West Toronto,"Dufferin, Dovercourt Village",Culture District,1,Automotive Shop,Park,Church,Furniture / Home Store,Office,Portuguese Restaurant,Factory,Speakeasy,Jewelry Store,Grocery Store
11,West Toronto,"Little Portugal, Trinity",Culture District,1,Art Gallery,Bar,Boutique,Coffee Shop,Furniture / Home Store,Pizza Place,Office,Clothing Store,Cocktail Bar,Salon / Barbershop
12,East Toronto,"The Danforth West, Riverdale",Culture District,1,Greek Restaurant,Spa,Salon / Barbershop,Miscellaneous Shop,Health Food Store,Women's Store,Gym / Fitness Center,Pilates Studio,Ice Cream Shop,Shop & Service
15,East Toronto,"India Bazaar, The Beaches West",Culture District,1,Convenience Store,Salon / Barbershop,Park,Light Rail Station,Church,Laundry Service,Pet Store,Art Gallery,Board Shop,Fast Food Restaurant
17,East Toronto,Studio District,Culture District,1,Automotive Shop,Coffee Shop,Moving Target,Pharmacy,Restaurant,Nail Salon,Dentist's Office,Spa,Building,Seafood Restaurant
18,Central Toronto,Lawrence Park,Culture District,1,College Classroom,Housing Development,Bus Line,School,Park,Pool,Salon / Barbershop,Building,Bank,Church


#### Cluster 2

In [130]:
# Check out the next cluster
display(new_tor_merged.loc[new_tor_merged['Cluster Labels'] == 2, new_tor_merged.columns[[2] + [3] + [0] + list(range(6, new_tor_merged.shape[1]))]])
# Give it an appropriate name
Labels[2] = "Apartments District"

Unnamed: 0,Borough,Neighborhood,Cluster Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",Apartments District,2,Residential Building (Apartment / Condo),Office,Coffee Shop,Building,Doctor's Office,Fried Chicken Joint,Parking,Light Rail Station,Indian Restaurant,Tech Startup
14,West Toronto,"Brockton, Parkdale Village, Exhibition Place",Apartments District,2,Office,Residential Building (Apartment / Condo),Tech Startup,Advertising Agency,Café,Building,Coworking Space,Bar,Convenience Store,Coffee Shop
20,Central Toronto,Davisville North,Apartments District,2,Residential Building (Apartment / Condo),Office,Dog Run,Gym,Scenic Lookout,Playground,Park,Strip Club,Breakfast Spot,Building
31,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",Apartments District,2,Residential Building (Apartment / Condo),Office,Doctor's Office,Building,Dentist's Office,Medical Center,Light Rail Station,Café,Spiritual Center,Diner
33,Downtown Toronto,Rosedale,Apartments District,2,Residential Building (Apartment / Condo),Office,Park,Other Great Outdoors,Trail,Bank,Government Building,Salon / Barbershop,Bridge,Dog Run


#### Cluster 3

In [113]:
# Check out the next cluster
display(new_tor_merged.loc[new_tor_merged['Cluster Labels'] == 3, new_tor_merged.columns[[2] + [3] + [0] + list(range(6, new_tor_merged.shape[1]))]])
# Give it an appropriate name
Labels[3] = "Municipal District"

Unnamed: 0,Borough,Neighborhood,Cluster Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",,3,Government Building,Medical Center,Building,Office,Capitol Building,Restaurant,Doctor's Office,Medical Lab,College Library,Lounge
5,Downtown Toronto,Berczy Park,,3,Office,Building,Residential Building (Apartment / Condo),Parking,Tech Startup,Food Truck,Event Space,Hotel,Laundry Service,Pub
19,Central Toronto,Roselawn,,3,Office,Spa,Playground,Doctor's Office,Residential Building (Apartment / Condo),Gym,General Entertainment,Synagogue,Moving Target,Italian Restaurant
21,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",,3,Residential Building (Apartment / Condo),Office,Park,Gym / Fitness Center,Tech Startup,General Entertainment,Doctor's Office,Gas Station,Asian Restaurant,Lawyer
22,West Toronto,"High Park, The Junction South",,3,Church,Office,Government Building,Building,Salon / Barbershop,Parking,Post Office,College Classroom,Residential Building (Apartment / Condo),Chiropractor
24,Central Toronto,"The Annex, North Midtown, Yorkville",,3,Residential Building (Apartment / Condo),Bed & Breakfast,Building,General Entertainment,Metro Station,Garden,Café,Miscellaneous Shop,Spa,Speakeasy
29,Central Toronto,"Moore Park, Summerhill East",,3,Office,Residential Building (Apartment / Condo),Park,Other Great Outdoors,Building,Trail,Bridge,Playground,Martial Arts Dojo,Gym
37,Downtown Toronto,Church and Wellesley,,3,Residential Building (Apartment / Condo),Spa,General Entertainment,Office,Gym,Gym / Fitness Center,Building,Bank,Hotel,Pub


#### Cluster 4

In [112]:
# Check out the next cluster
display(new_tor_merged.loc[new_tor_merged['Cluster Labels'] == 4, new_tor_merged.columns[[2] + [3] + [0] + list(range(6, new_tor_merged.shape[1]))]])
# Give it an appropriate name
Labels[4] = "Airport"

Unnamed: 0,Borough,Neighborhood,Cluster Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",,4,Airport Gate,Airport Service,Moving Target,Airport Terminal,Park,Airport Lounge,Harbor / Marina,Airport,Boat or Ferry,General Travel


## Now put the labels in the dataframe, show some group statistics, and we are done!

In [123]:
new_tor_merged['Cluster Name'] = new_tor_merged['Cluster Labels'].map(Labels)
grouped = new_tor_merged.groupby(['Cluster Labels', 'Cluster Name']).size().reset_index(name='Number of Areas')
grouped = grouped.set_index('Cluster Labels').sort_index()
grouped

Cluster Labels,Cluster Name,Number of Areas
0,Commercial District,6
1,Culture District,19
2,Apartments District,5
3,Municipal District,8
4,Airport,1
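One thing to watch with this last step: `map` returns NaN for any cluster label missing from the dictionary, and `groupby` drops NaN keys by default, so an unnamed cluster would vanish from the summary without an error. A small sketch on made-up data shows how to spot that case.

```python
import pandas as pd

# Hypothetical cluster assignments and names (cluster 2 is deliberately unnamed)
labels = {0: "Commercial District", 1: "Culture District"}
areas = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 1, 1, 2],
})

areas["Cluster Name"] = areas["Cluster Labels"].map(labels)

# Labels missing from the dictionary map to NaN, so they are easy to spot
unnamed = areas.loc[areas["Cluster Name"].isna(), "Cluster Labels"].unique().tolist()
counts = areas.groupby("Cluster Name").size()
print(unnamed)
print(counts.to_dict())
```

In the notebook all five labels are named, and the counts in the table add up to 39 areas.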
