# Assignment - Toronto data from Wiki page
*Submitted by Vikram Seshadri for Coursera Capstone Project Assignment Week 3*

#### The first step is to import the libraries needed to extract information from the *"List_of_postal_codes_of_Canada:_M"* Wikipedia page.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
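# download the raw HTML of the Wikipedia page
wiki_html = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(wiki_html, "html.parser")  # name the parser explicitly to avoid bs4's warning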

In [3]:
wiki_table = soup.find("table", class_="wikitable sortable")
PostalCode = []
Borough = []
Neighborhood = []

# column titles come from the header cells of this table, not from the first <th> cells on the page
ColumnTitles = [th.get_text(strip=True) for th in wiki_table.find_all("th")[:3]]

# only scan rows of the postal-code table, not every <tr> on the page
for row in wiki_table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 3:  # the header row has <th> cells only, so this keeps just the table body
        PostalCode.append(cells[0].get_text(strip=True))
        Borough.append(cells[1].get_text(strip=True))
        Neighborhood.append(cells[2].get_text(strip=True))
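The row filter used here (keep a `<tr>` only when it has exactly three `<td>` cells) can be illustrated without network access. The sketch below swaps BeautifulSoup for the standard library's `html.parser` and runs on a small hypothetical HTML snippet shaped like the postal-code table; the class name `WikiTableRows` is illustrative only.

```python
from html.parser import HTMLParser


class WikiTableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell strings per <tr>
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text is only recorded while inside a <td>; <th> text is ignored
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical snippet with the same layout as the postal-code table
html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = WikiTableRows()
parser.feed(html)
# the header row contributes an empty list (no <td> cells), so the length test drops it
body_rows = [r for r in parser.rows if len(r) == 3]
print(body_rows)
```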

#### Convert the lists into a dataframe using Pandas

In [4]:
Toronto_df = pd.DataFrame(columns=ColumnTitles)
Toronto_df['Postcode'] = PostalCode
Toronto_df['Borough'] = Borough
Toronto_df['Neighbourhood'] = Neighborhood
Toronto_df.rename(columns={"Neighbourhood" : "Neighborhood"},inplace = True)
Toronto_df.head()

,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### Remove the rows that do not have an assigned Borough

In [5]:
Toronto_new_df = Toronto_df[Toronto_df.Borough != 'Not assigned'].copy()  # .copy() gives an independent frame that can be modified safely

In [6]:
Toronto_new_df.head(10)

,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


#### If a neighborhood is not assigned, give it the same name as its borough.

In [7]:
pd.set_option('mode.chained_assignment', None)
# replace 'Not assigned' neighborhoods with the name of their borough
Toronto_new_df['Neighborhood'] = Toronto_new_df['Neighborhood'].mask(Toronto_new_df['Neighborhood'] == 'Not assigned', Toronto_new_df['Borough'])
Toronto_new_df.head(10)

,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Queen's Park
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern
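Both cleaning steps, dropping rows whose borough is *Not assigned* and falling back to the borough name for an unassigned neighborhood, can be checked on a toy frame. This is a minimal sketch of an alternative formulation using `.copy()` and `.loc` assignment; the three rows are borrowed from the outputs shown in this notebook.

```python
import pandas as pd

# Toy rows in the same shape as the scraped table
df = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# 1. drop rows without a borough; .copy() makes the result an independent frame
assigned = df[df["Borough"] != "Not assigned"].copy()

# 2. where the neighborhood is missing, fall back to the borough name
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

print(assigned)
```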


In [8]:
Toronto_new_df.shape

(211, 3)

#### Download the latitude and longitude of each postal code from the *cocl* link.

In [9]:
postal_codes = Toronto_new_df['Postcode'].values.tolist()
LatLng = pd.read_csv('https://cocl.us/Geospatial_data')
LatLng.rename(columns={"Postal Code" : "Postcode"},inplace = True)
LatLng.head()

,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
Toronto_loc_df = Toronto_new_df.join(LatLng.set_index('Postcode'), on='Postcode')
Toronto_disp_df = Toronto_loc_df
Toronto_postal_agg = pd.DataFrame(Toronto_disp_df.groupby(['Postcode','Borough'])['Neighborhood'].apply(lambda x: ','.join(x)))
Toronto_postal_disp = Toronto_postal_agg.join(LatLng.set_index('Postcode'), on='Postcode')
Toronto_postal_disp.reset_index().head()

,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
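The aggregation in this cell joins all neighborhoods that share a postal code into one comma-separated string and then attaches the coordinates. A toy version, using values from the outputs shown and `merge` in place of `join`, makes the result easy to check:

```python
import pandas as pd

rows = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge", "Malvern", "Woburn"],
})
coords = pd.DataFrame({
    "Postcode": ["M1B", "M1G"],
    "Latitude": [43.806686, 43.770992],
    "Longitude": [-79.194353, -79.216917],
})

combined = (
    rows.groupby(["Postcode", "Borough"])["Neighborhood"]
        .agg(",".join)          # one comma-separated string per postal code
        .reset_index()          # turn the group keys back into columns
        .merge(coords, on="Postcode", how="left")
)
print(combined)
```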


#### List the unique borough names to identify the boroughs whose names contain the word Toronto.

In [11]:
Toronto_loc_df['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

#### Keep the location and neighborhood details only for the boroughs whose names contain the word Toronto.

In [12]:
Toronto_nbh = Toronto_loc_df[Toronto_loc_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
Toronto_nbh.head(10)

,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
5,M4E,East Toronto,The Beaches,43.676357,-79.293031
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,M6G,Downtown Toronto,Christie,43.669542,-79.422564
9,M5H,Downtown Toronto,Adelaide,43.650571,-79.384568


In [13]:
Toronto_nbh.shape

(74, 5)

In [14]:
import json  # library to handle JSON files
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
from pandas import json_normalize  # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans
import folium  # map rendering library
print("Libraries imported!")

Libraries imported!


#### Obtain the latitude and longitude of Toronto.

In [15]:
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


#### Using Folium, visualize a map of Toronto showing the neighborhoods of the boroughs whose names contain Toronto.

In [16]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Toronto_nbh['Latitude'], Toronto_nbh['Longitude'], Toronto_nbh['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)
    
map_Toronto

#### Provide Foursquare client ID, Client Secret, Version, and the limit of the search.

In [17]:
import os

# read the Foursquare credentials from environment variables instead of hard-coding (and printing) them;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are this notebook's own choice
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20190715'  # Foursquare API version
LIMIT = 100  # maximum number of venues returned per request


#### Define a function that retrieves the venues near any number of places, given their names, latitudes, and longitudes. It returns a dataframe holding each neighborhood's location together with the name, location, and category of each nearby venue, and is used below to fetch the venues near the Toronto neighborhoods.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe with the Foursquare venues found within `radius` metres of each place."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # query parameters for the Foursquare "explore" endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep the list of recommended venues
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (venues without a category are skipped, since categories[0] would raise an IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
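To see what `getNearbyVenues` sends and extracts without calling Foursquare, the sketch below builds the same explore URL with `urllib.parse.urlencode` and flattens a response. The response body is hypothetical and trimmed to the fields the notebook reads; the helper names `build_explore_url` and `flatten_items` are illustrative, and skipping venues without a category is a defensive choice of this sketch.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def flatten_items(name, lat, lng, items):
    """Hypothetical helper: one tuple per venue, in the notebook's column order."""
    rows = []
    for item in items:
        venue = item["venue"]
        if not venue.get("categories"):  # skip venues without a category
            continue
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     venue["categories"][0]["name"]))
    return rows


# Hypothetical, trimmed response body (only the fields the notebook reads)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee", "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled spot", "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20190715", 43.65426, -79.360636)
rows = flatten_items("Harbourfront", 43.65426, -79.360636,
                     sample["response"]["groups"][0]["items"])
print(url)
print(rows)
```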

In [19]:
Toronto_nbh.columns

Index(['Postcode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude'], dtype='object')

In [20]:
Toronto_venues = getNearbyVenues(names=Toronto_nbh['Neighborhood'],
                                   latitudes=Toronto_nbh['Latitude'],
                                   longitudes=Toronto_nbh['Longitude']
                                  )

In [21]:
print(Toronto_venues.shape)
Toronto_venues.head()

(3304, 7)


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
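The `radius=500` argument is meant to limit the search to venues within roughly 500 m of each neighborhood's coordinates. A haversine distance sketch, using the Harbourfront and Roselle Desserts coordinates from the first output row, shows how that can be checked; it assumes a spherical Earth with a mean radius of 6,371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# neighborhood centre (Harbourfront) and one returned venue (Roselle Desserts)
d = haversine_m(43.65426, -79.360636, 43.653447, -79.362017)
print(round(d), "m; within the 500 m radius:", d <= 500)
```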


In [22]:
Toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adelaide,100,100,100,100,100,100
Bathurst Quay,16,16,16,16,16,16
Berczy Park,56,56,56,56,56,56
Brockton,24,24,24,24,24,24
Business Reply Mail Processing Centre 969 Eastern,16,16,16,16,16,16
CN Tower,16,16,16,16,16,16
Cabbagetown,44,44,44,44,44,44
Central Bay Street,89,89,89,89,89,89
Chinatown,100,100,100,100,100,100
Christie,15,15,15,15,15,15


In [23]:
print('There are {} unique categories.'.format(Toronto_venues['Venue Category'].nunique()))

There are 231 unique categories.


#### Convert the venue categories into one-hot (dummy) columns so that the K-Means algorithm can work on numerical values.

In [24]:
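# one-hot encoding: one 0/1 column per venue category (dtype=int keeps 0/1 rather than True/False in newer pandas)
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)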
Toronto_onehot.head()

,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### One of the venue categories is itself called *Neighborhood*, so the dataframe *Toronto_onehot* already has a column with that name. The column of neighborhood names is therefore added as *Neighborhoods*.

In [25]:
Toronto_onehot.insert(0,'Neighborhoods', Toronto_venues['Neighborhood'], True)

In [26]:
Toronto_onehot.shape

(3304, 232)

#### Group the rows by neighborhood and take the mean of each category column, which gives the relative frequency of occurrence of each venue category in that neighborhood.

In [27]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhoods').mean().reset_index()
Toronto_grouped.head()

,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Bathurst Quay,0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0
3,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [28]:
Toronto_grouped.shape

(73, 232)
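The one-hot encoding and the per-neighborhood mean can be checked on a toy table: each row of the result holds the share of that neighborhood's venues falling into each category, so every row sums to 1. Neighborhoods A and B and their venues are made up for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Café", "Park", "Park", "Park"],
})

# one 0/1 column per category, named after the category itself
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhoods", venues["Neighborhood"])

# mean of the 0/1 columns = fraction of the neighborhood's venues in each category
freq = onehot.groupby("Neighborhoods").mean()
print(freq)
```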

#### Display the top five venues for each neighborhood.

In [29]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhoods']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
             venue  freq
0      Coffee Shop  0.07
1             Café  0.05
2       Steakhouse  0.04
3              Bar  0.04
4  Thai Restaurant  0.04


----Bathurst Quay----
              venue  freq
0    Airport Lounge  0.12
1   Airport Service  0.12
2  Airport Terminal  0.12
3  Sculpture Garden  0.06
4     Boat or Ferry  0.06


----Berczy Park----
            venue  freq
0     Coffee Shop  0.11
1    Cocktail Bar  0.05
2  Farmers Market  0.04
3     Cheese Shop  0.04
4          Bakery  0.04


----Brockton----
            venue  freq
0            Café  0.08
1     Coffee Shop  0.08
2  Breakfast Spot  0.08
3   Burrito Place  0.04
4   Grocery Store  0.04


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.12
1                 Spa  0.06
2          Restaurant  0.06
3             Brewery  0.06
4         Pizza Place  0.06


----CN Tower----
              venue  freq
0    Airport Lounge  0.12
1   Airport Service  0.12

                venue  freq
0                Café  0.08
1         Pizza Place  0.08
2         Coffee Shop  0.08
3    Sushi Restaurant  0.05
4  Italian Restaurant  0.05


----Ryerson----
                  venue  freq
0           Coffee Shop  0.09
1        Clothing Store  0.06
2        Cosmetics Shop  0.04
3                  Café  0.03
4  Fast Food Restaurant  0.03


----South Hill----
                   venue  freq
0            Coffee Shop  0.13
1                    Pub  0.13
2     Light Rail Station  0.07
3       Sushi Restaurant  0.07
4  Vietnamese Restaurant  0.07


----South Niagara----
              venue  freq
0    Airport Lounge  0.12
1   Airport Service  0.12
2  Airport Terminal  0.12
3  Sculpture Garden  0.06
4     Boat or Ferry  0.06


----St. James Town----
                venue  freq
0         Coffee Shop  0.07
1          Restaurant  0.05
2                Café  0.05
3  Italian Restaurant  0.04
4                Park  0.03


----Stn A PO Boxes 25 The Esplanade----
            

Define a function that returns the requested number of venue categories for a neighborhood, sorted in descending order of frequency.

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
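`return_most_common_venues` always returns `num_top_venues` names, so a neighborhood with fewer distinct categories gets its remaining slots filled with zero-frequency categories, whose order among the ties is arbitrary. A variant that ranks only the categories actually present and pads the rest with `None` might look like this; `most_common_nonzero` is a hypothetical name, not part of the original notebook.

```python
import pandas as pd


def most_common_nonzero(row, num_top_venues):
    """Hypothetical variant: rank only categories with a non-zero frequency."""
    # first entry is the neighborhood name, the rest are category frequencies
    freqs = row.iloc[1:].astype(float)
    present = freqs[freqs > 0]
    # a stable sort, so tied categories keep a deterministic order
    ranked = present.sort_values(ascending=False, kind="mergesort")
    top = list(ranked.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


row = pd.Series({"Neighborhoods": "Example", "Bar": 0.0, "Garden": 0.4,
                 "Music Venue": 0.6, "Yoga Studio": 0.0})
print(most_common_nonzero(row, 4))
```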

Use this function to obtain the 10 most common venue categories for each neighborhood.

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and beyond use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhoods'] = Toronto_grouped['Neighborhoods']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
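The `try`/`except` in this cell gives correct suffixes only up to 20 (for 21 it would produce `21th`). A general ordinal helper is sketched below; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Hypothetical helper: 1 -> '1st', 2 -> '2nd', 11 -> '11th', 21 -> '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhoods"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns)
```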

,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,American Restaurant
1,Bathurst Quay,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
2,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Café,Steakhouse,Bakery,Seafood Restaurant,Farmers Market,Cheese Shop,Belgian Restaurant
3,Brockton,Breakfast Spot,Café,Coffee Shop,Gym,Furniture / Home Store,Burrito Place,Caribbean Restaurant,Stadium,Restaurant,Bar
4,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Pizza Place,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Restaurant,Burrito Place


#### This study uses 7 clusters, chosen to obtain a finer classification of the Toronto neighborhoods.

In [32]:
kclusters = 7

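Toronto_grouped_clustering = Toronto_grouped.drop(columns='Neighborhoods')  # positional axis argument was removed in pandas 2.0
# n_init=10 (the old scikit-learn default) keeps the labels reproducible on newer versions
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(Toronto_grouped_clustering)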
kmeans.labels_[0:10]

array([1, 3, 1, 1, 1, 3, 1, 1, 1, 1])

In [33]:
len(kmeans.labels_)

73
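Conceptually, `KMeans` alternates two steps (Lloyd's algorithm): assign each neighborhood's frequency vector to the nearest centroid, then move each centroid to the mean of its assigned vectors. The NumPy sketch below shows those steps on four made-up two-category vectors; unlike scikit-learn, it uses a single random initialization instead of k-means++ seeding (and, depending on `n_init`, several restarts). `kmeans_lloyd` is a hypothetical name.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise the centroids with k distinct rows picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: index of the nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid moves to the mean of its members (unchanged if it has none)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up category frequencies (columns: "Airport Lounge", "Coffee Shop")
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
```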

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted

,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Adelaide,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,American Restaurant
1,3,Bathurst Quay,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
2,1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Café,Steakhouse,Bakery,Seafood Restaurant,Farmers Market,Cheese Shop,Belgian Restaurant
3,1,Brockton,Breakfast Spot,Café,Coffee Shop,Gym,Furniture / Home Store,Burrito Place,Caribbean Restaurant,Stadium,Restaurant,Bar
4,1,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Pizza Place,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Restaurant,Burrito Place
5,3,CN Tower,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
6,1,Cabbagetown,Coffee Shop,Park,Café,Restaurant,Italian Restaurant,Chinese Restaurant,Bakery,Pub,Pizza Place,Sandwich Place
7,1,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Ice Cream Shop,Burger Joint,Bubble Tea Shop,Spa,Sushi Restaurant,Bar
8,1,Chinatown,Café,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Chinese Restaurant,Bakery,Bar,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Burger Joint
9,1,Christie,Grocery Store,Café,Park,Diner,Convenience Store,Baby Store,Italian Restaurant,Coffee Shop,Restaurant,Nightclub


#### Before joining *neighborhoods_venues_sorted*, whose key column is *Neighborhoods*, rename the *Neighborhood* column of the *Toronto_merged* dataframe to *Neighborhoods* so that the two keys match.

In [35]:
Toronto_merged = Toronto_nbh.rename(columns={'Neighborhood': 'Neighborhoods'})  # returns a new frame; Toronto_nbh is left unchanged
Toronto_merged.head()

,Postcode,Borough,Neighborhoods,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418


In [36]:
# join the cluster labels and top-10 venues onto each neighborhood (which already has its latitude/longitude)
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhoods'), on='Neighborhoods')
Toronto_merged.head() # check the last columns!

,Postcode,Borough,Neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,1,Coffee Shop,Café,Park,Bakery,Restaurant,Mexican Restaurant,Theater,Pub,Breakfast Spot,Gym / Fitness Center
1,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,1,Coffee Shop,Café,Park,Bakery,Restaurant,Mexican Restaurant,Theater,Pub,Breakfast Spot,Gym / Fitness Center
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Theater,Tea Room,Bookstore,Italian Restaurant,Japanese Restaurant
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Theater,Tea Room,Bookstore,Italian Restaurant,Japanese Restaurant
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Restaurant,Café,Italian Restaurant,Hotel,Bakery,Gastropub,Breakfast Spot,Pizza Place,Park


In [37]:
Toronto_merged.shape

(74, 16)

#### Folium map with each neighborhood colored by its cluster.

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhoods'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
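Each cluster label from 0 to k-1 needs one distinct color. The sketch below builds an evenly spaced palette with the standard library's `colorsys` (an HSV hue wheel, swapped in for matplotlib's `cm.rainbow`) and indexes it by the label itself; `cluster_palette` is a hypothetical helper.

```python
import colorsys


def cluster_palette(k):
    """Hypothetical helper: k hex colors with evenly spaced hues."""
    hexes = []
    for j in range(k):
        # full saturation and brightness; hue steps of 1/k around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(j / k, 1.0, 1.0)
        hexes.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return hexes


kclusters = 7
palette = cluster_palette(kclusters)
# label 0 gets palette[0], label 6 gets palette[6]: no offset needed
color_for_label = {label: palette[label] for label in range(kclusters)}
print(color_for_label)
```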

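#### Cluster 1 (label 0):
* Playgrounds and parks are the top two venues in all three rows.
* Yoga Studio, Farmers Market, Falafel Restaurant and Ethiopian Restaurant also appear in every row, but they fill the lower ranks in an order that recurs across other clusters, which suggests they are likely zero-frequency ties rather than venues actually found nearby.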

In [39]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Central Toronto,0,Playground,Park,Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
50,Central Toronto,0,Playground,Park,Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
66,Downtown Toronto,0,Park,Playground,Trail,Building,Yoga Studio,Dive Bar,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


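#### Cluster 2 (label 1):
* In the rows shown, all in Downtown Toronto, Coffee Shop is the most common venue everywhere except row 8 (Christie), and Café ranks in the top 5 of every row.
* The neighborhoods share many of the same venue types, such as cafés, restaurants and bars.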

In [40]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Café,Park,Bakery,Restaurant,Mexican Restaurant,Theater,Pub,Breakfast Spot,Gym / Fitness Center
1,Downtown Toronto,1,Coffee Shop,Café,Park,Bakery,Restaurant,Mexican Restaurant,Theater,Pub,Breakfast Spot,Gym / Fitness Center
2,Downtown Toronto,1,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Theater,Tea Room,Bookstore,Italian Restaurant,Japanese Restaurant
3,Downtown Toronto,1,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Theater,Tea Room,Bookstore,Italian Restaurant,Japanese Restaurant
4,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Italian Restaurant,Hotel,Bakery,Gastropub,Breakfast Spot,Pizza Place,Park
6,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Beer Bar,Café,Steakhouse,Bakery,Seafood Restaurant,Farmers Market,Cheese Shop,Belgian Restaurant
7,Downtown Toronto,1,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Ice Cream Shop,Burger Joint,Bubble Tea Shop,Spa,Sushi Restaurant,Bar
8,Downtown Toronto,1,Grocery Store,Café,Park,Diner,Convenience Store,Baby Store,Italian Restaurant,Coffee Shop,Restaurant,Nightclub
9,Downtown Toronto,1,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,American Restaurant
10,Downtown Toronto,1,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,American Restaurant


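#### Cluster 3 (label 2):
* All five rows have the same top-10 list, led by Coffee Shop, Pub and Fried Chicken Joint, presumably because they share one postal code and therefore one set of coordinates.
* All of them are in Central Toronto.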

In [41]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Central Toronto,2,Coffee Shop,Pub,Fried Chicken Joint,Restaurant,Bagel Shop,Sports Bar,Supermarket,American Restaurant,Sushi Restaurant,Pizza Place
55,Central Toronto,2,Coffee Shop,Pub,Fried Chicken Joint,Restaurant,Bagel Shop,Sports Bar,Supermarket,American Restaurant,Sushi Restaurant,Pizza Place
56,Central Toronto,2,Coffee Shop,Pub,Fried Chicken Joint,Restaurant,Bagel Shop,Sports Bar,Supermarket,American Restaurant,Sushi Restaurant,Pizza Place
57,Central Toronto,2,Coffee Shop,Pub,Fried Chicken Joint,Restaurant,Bagel Shop,Sports Bar,Supermarket,American Restaurant,Sushi Restaurant,Pizza Place
58,Central Toronto,2,Coffee Shop,Pub,Fried Chicken Joint,Restaurant,Bagel Shop,Sports Bar,Supermarket,American Restaurant,Sushi Restaurant,Pizza Place


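#### Cluster 4 (label 3):
* Dominated by airport venues: Airport Lounge, Airport Service and Airport Terminal are the top three in every row.
* All seven rows are in Downtown Toronto and have identical lists, presumably because they share one postal code and therefore one set of coordinates.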

In [42]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
59,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
60,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
61,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
62,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
63,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
64,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court
65,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court


#### Cluster 5 (label 4):
* A single Central Toronto neighborhood whose most common venue is a music venue.

In [43]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Central Toronto,4,Music Venue,Garden,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


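#### Cluster 6 (label 5):
* A single East Toronto neighborhood (The Beaches, row 5) whose most common venue is a health food store.
* Its third most common venue category is *Neighborhood*, the Foursquare category that forced the column of neighborhood names to be called *Neighborhoods*.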

In [44]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 5, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,East Toronto,5,Health Food Store,Other Great Outdoors,Neighborhood,Pub,Trail,Event Space,Ethiopian Restaurant,Electronics Store,Discount Store,Falafel Restaurant


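#### Cluster 7 (label 6):
* Parks are the most common venue in all three rows, followed by a trail or a swim school, with a bus line third.
* Dog Run, Farmers Market, Falafel Restaurant and Event Space recur in the lower ranks, but likely as zero-frequency ties rather than venues actually present.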

In [45]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 6, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Central Toronto,6,Park,Swim School,Bus Line,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
34,Central Toronto,6,Park,Trail,Bus Line,Sushi Restaurant,Jewelry Store,Yoga Studio,Dog Run,Farmers Market,Falafel Restaurant,Event Space
35,Central Toronto,6,Park,Trail,Bus Line,Sushi Restaurant,Jewelry Store,Yoga Studio,Dog Run,Farmers Market,Falafel Restaurant,Event Space


*This marks the end of this Jupyter Notebook, which analyzes the neighborhoods of Toronto, Canada, using the K-Means algorithm.*