<h1>Web Scraping using Beautiful Soup</h1>

First, we import the required packages.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

We then use the requests library to download the HTML of the Wikipedia page that lists Toronto's postal codes, and parse it with BeautifulSoup using the <b>'lxml'</b> parser.

In [2]:
r=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup=BeautifulSoup(r.text,'lxml')

We navigate to the div which contains the required table.

In [3]:
s1=soup.find("div",id="content").find("div",id="bodyContent").find("div",id="mw-content-text").find("div",class_="mw-parser-output").find("table")

We parse the table and build a 2D list of all the postcodes, boroughs and neighbourhoods.

In [4]:
l=[]
for i in s1.find_all('tr'):
    if(i.th):
        l1=i.find_all('th')
        l1[0]=l1[0].text
        l1[1]=l1[1].text
        l1[2]=l1[2].text.strip('\n')
        l.append(l1)
    if(i.td !=None):
        l1=i.find_all('td')
        l1[0]=l1[0].text
        l1[1]=l1[1].text
        l1[2]=l1[2].text.strip('\n')
        l.append(l1)
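
As a rough sketch of the same row-extraction idea, the header and data cells can be collected in a single pass. The HTML string below is a small hypothetical stand-in for the Wikipedia table, and the built-in `html.parser` is used only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; the real page is much larger.
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

table = BeautifulSoup(html, "html.parser").find("table")

rows = []
for tr in table.find_all("tr"):
    # Header rows use <th>, data rows use <td>; collect whichever is present.
    cells = tr.find_all(["th", "td"])
    if cells:
        # strip=True removes the trailing newline found in the last cell.
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

The first entry of `rows` holds the column names and the remaining entries hold the data, which matches the `l[0]` / `l[1:]` split used when building the data frame.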

Using this list, we create our primary data frame, which will be used for further refinement.

In [5]:
df=pd.DataFrame(l[1:],columns=l[0])

We drop those rows which have unassigned Boroughs.

In [6]:
df=pd.DataFrame(l[1:],columns=l[0])
x=(df['Borough']!='Not assigned')
df=df[x]

We reset the index and then clean the data. When several neighbourhoods share the same postcode, we combine them into a single comma-separated entry. If a neighbourhood is not assigned, we use the borough name as its neighbourhood.

In [7]:
df=df.reset_index()
df=df.drop('index',axis=1)

y=set(df['Postcode'])
lomo=[]
moon=[]
y=list(y)
for i in y:
    x=df['Postcode']==i
    dftemp=df[x]
    l1=[]
    for j in dftemp['Neighbourhood']:
        if(j=='Not assigned'):
            yolo=dftemp['Borough']
            l1.append(yolo.iloc[0])
            continue
        l1.append(j)
    lomo.append(','.join(l1).strip(','))
    moon.append(dftemp['Borough'].iloc[0])
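
The same cleaning steps can also be expressed with pandas operations instead of explicit loops. This is only a sketch on hypothetical rows shaped like the scraped table, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical rows in the same shape as the scraped table.
df = pd.DataFrame(
    {
        "Postcode": ["M5A", "M5A", "M7A", "M1A"],
        "Borough": ["Downtown Toronto", "Downtown Toronto", "Queen's Park", "Not assigned"],
        "Neighbourhood": ["Harbourfront", "Regent Park", "Not assigned", "Not assigned"],
    }
)

# Drop rows without a borough.
df = df[df["Borough"] != "Not assigned"].copy()

# Where the neighbourhood is missing, fall back to the borough name.
missing = df["Neighbourhood"] == "Not assigned"
df.loc[missing, "Neighbourhood"] = df.loc[missing, "Borough"]

# One row per postcode, neighbourhoods joined with commas.
grouped = (
    df.groupby("Postcode", sort=False)
    .agg({"Borough": "first", "Neighbourhood": ",".join})
    .reset_index()
)
print(grouped)
```

With `sort=False` the postcodes keep their order of first appearance, whereas iterating over a Python `set` of strings gives an order that can differ between runs.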

The final processed DataFrame is created from the three lists.

In [8]:
df_final=pd.DataFrame({'Postcode':y,'Borough':moon,'Neighbourhood':lomo})
df_final.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M9W,Etobicoke,Northwest
1,M5S,Downtown Toronto,"Harbord,University of Toronto"
2,M3J,North York,"Northwood Park,York University"
3,M2H,North York,Hillcrest Village
4,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B..."


In [10]:
#!conda install -c conda-forge geocoder --yes
import geocoder 

ltemp=[]
for postal_code in df_final['Postcode']:
    lat_lng_coords = None
    
    while(lat_lng_coords is None):
      g = geocoder.google('{}, Toronto, Ontario'.format(postal_code),key="AIzaSyBWB2JRlbE1cZlsf77snlteamt2mdSiMzE")
      lat_lng_coords = g.latlng
    
    l1=[]
    l1.append(lat_lng_coords[0])
    l1.append(lat_lng_coords[1])
    ltemp.append(l1)
temp=pd.DataFrame(ltemp,columns=["lat","long"])
df_final=df_final.join(temp)
df_final.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,lat,long
0,M9W,Etobicoke,Northwest,43.706748,-79.594054
1,M5S,Downtown Toronto,"Harbord,University of Toronto",43.662696,-79.400049
2,M3J,North York,"Northwood Park,York University",43.76798,-79.487262
3,M2H,North York,Hillcrest Village,43.803762,-79.363452
4,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B...",43.643515,-79.577201
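
The `while` loop in the geocoding cell keeps retrying until coordinates come back, so it would never finish if a lookup kept returning `None`. A bounded retry is one way to guard against that. The sketch below uses a hypothetical `fake_lookup` function in place of a real geocoding call, and the helper name `geocode_with_retry` is an assumption made for illustration.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None


# Hypothetical stand-in for a geocoding call: fails twice, then succeeds.
calls = {"count": 0}


def fake_lookup(query):
    calls["count"] += 1
    if calls["count"] < 3:
        return None
    return [43.706748, -79.594054]


result = geocode_with_retry(fake_lookup, "M9W, Toronto, Ontario")
print(result, calls["count"])
```

A postcode whose lookup still fails after `max_attempts` tries yields `None`, which the caller can record or skip instead of blocking the whole run.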


In [11]:
df_final.shape

(103, 5)

With the latitude and longitude of every postal code added, we now import the libraries needed for mapping and clustering.

In [12]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


In [28]:
lnew=[]
for i in df_final["Borough"]:
    if "Toronto" in i:
        lnew.append(True)
    else:
        lnew.append(False)
toronto_data = df_final[lnew].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,lat,long
0,M5S,Downtown Toronto,"Harbord,University of Toronto",43.662696,-79.400049
1,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049
2,M6R,West Toronto,"Parkdale,Roncesvalles",43.64896,-79.456325
3,M5R,Central Toronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678
4,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307
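
The borough filter above builds a list of booleans by hand. A shorter sketch of the same idea uses the pandas string accessor, shown here on a hypothetical sample of the data.

```python
import pandas as pd

# Hypothetical sample of the postcode and borough columns.
df_final = pd.DataFrame(
    {
        "Postcode": ["M9W", "M5S", "M3J", "M6R"],
        "Borough": ["Etobicoke", "Downtown Toronto", "North York", "West Toronto"],
    }
)

# regex=False treats "Toronto" as a plain substring rather than a pattern.
mask = df_final["Borough"].str.contains("Toronto", regex=False)
toronto_data = df_final[mask].reset_index(drop=True)
print(toronto_data)
```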


<h1>Clustering with K-means</h1>

In [29]:
address = 'Toronto'

geolocator = Nominatim(user_agent="tr_exp")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [30]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['lat'], toronto_data['long'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

We have just created a map of all the postcodes whose borough name contains the term "Toronto". Now, using the Foursquare API, let us get the top venues of the first neighbourhood.

In [31]:
CLIENT_ID = '2SA1CUPSN0SUYZ2W3KJMBU2P1MNZQZR3UUUUGAYZVOY2C0JX'
CLIENT_SECRET = 'HCFCBWI4YDQVKKU2YQEJJLQEGONEM0LOONRK5R22NR3FYQCL'
VERSION = '20190520'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 2SA1CUPSN0SUYZ2W3KJMBU2P1MNZQZR3UUUUGAYZVOY2C0JX
CLIENT_SECRET:HCFCBWI4YDQVKKU2YQEJJLQEGONEM0LOONRK5R22NR3FYQCL


In [33]:
toronto_data.loc[0, 'Neighbourhood']

'Harbord,University of Toronto'

In [34]:
neighborhood_latitude = toronto_data.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'long'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Harbord,University of Toronto are 43.6626956, -79.40004929999999.


In [35]:
LIMIT = 100
radius = 500 
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=2SA1CUPSN0SUYZ2W3KJMBU2P1MNZQZR3UUUUGAYZVOY2C0JX&client_secret=HCFCBWI4YDQVKKU2YQEJJLQEGONEM0LOONRK5R22NR3FYQCL&v=20190520&ll=43.6626956,-79.40004929999999&radius=500&limit=100'
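
The request URL can also be assembled from a dictionary of parameters, with the credentials read from the environment so they do not appear in the notebook or its saved output. This is a minimal sketch: the environment variable names and the placeholder values are assumptions, not real settings.

```python
import os
from urllib.parse import urlencode

# Assumed environment variable names; the fallbacks are placeholders, not real credentials.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "v": "20190520",
    "ll": "43.6626956,-79.4000493",
    "radius": 500,
    "limit": 100,
}

base_url = "https://api.foursquare.com/v2/venues/explore"
url = base_url + "?" + urlencode(params)

# Print the URL without the secret so it does not end up in saved output.
safe_params = {k: v for k, v in params.items() if k != "client_secret"}
print(base_url + "?" + urlencode(safe_params))
```

`urlencode` percent-encodes the comma in `ll`, which is a valid way to write the same query string. With the requests library, the same dictionary can be passed directly as `requests.get(base_url, params=params)`.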

In [37]:
results = requests.get(url).json()


In [38]:
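def get_category_type(row):
    # The column is named 'categories' or 'venue.categories' depending on the frame.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the first category name, or None when the venue has no categories.
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']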

In [39]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Yasu,Japanese Restaurant,43.662837,-79.403217
1,Piano Piano,Italian Restaurant,43.662949,-79.402898
2,Rasa,Restaurant,43.662757,-79.403988
3,The Dessert Kitchen,Dessert Shop,43.662823,-79.402746
4,Almond Butterfly,Bakery,43.662836,-79.403365


In [40]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

34 venues were returned by Foursquare.


We will now create a function to do the same for all the neighbourhoods.

In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
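
The expression `v['venue']['categories'][0]['name']` in the function above would raise an `IndexError` for a venue with an empty category list. A defensive variant could look like the sketch below. The helper name `first_category_name` and the sample items are hypothetical, shaped like the `items` list used in the cells above.

```python
def first_category_name(venue):
    """Return the first category name, or None when the list is empty or absent."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


# Hypothetical items shaped like the 'items' list in the explore response.
items = [
    {"venue": {"name": "Yasu",
               "location": {"lat": 43.662837, "lng": -79.403217},
               "categories": [{"name": "Japanese Restaurant"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.66, "lng": -79.40},
               "categories": []}},
]

rows = [
    (v["venue"]["name"],
     v["venue"]["location"]["lat"],
     v["venue"]["location"]["lng"],
     first_category_name(v["venue"]))
    for v in items
]
print(rows)
```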

In [43]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['lat'],
                                   longitudes=toronto_data['long']
                                  )

Harbord,University of Toronto
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Parkdale,Roncesvalles
The Annex,North Midtown,Yorkville
Forest Hill North,Forest Hill West
Berczy Park
North Toronto West
St. James Town
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Davisville
Lawrence Park
Design Exchange,Toronto Dominion Centre
Moore Park,Summerhill East
The Beaches West,India Bazaar
Business Reply Mail Processing Centre 969 Eastern
Dovercourt Village,Dufferin
Commerce Court,Victoria Hotel
Church and Wellesley
The Danforth West,Riverdale
Runnymede,Swansea
Rosedale
Harbourfront,Regent Park
Ryerson,Garden District
Davisville North
Studio District
The Beaches
Stn A PO Boxes 25 The Esplanade
Harbourfront East,Toronto Islands,Union Station
Chinatown,Grange Park,Kensington Market
Cabbagetown,St. James Town
First Canadian Place,Underground city
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Little Portug

The total number of venues returned is shown below, along with the first few rows.

In [45]:
print(toronto_venues.shape)
toronto_venues.head()

(1690, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbord,University of Toronto",43.662696,-79.400049,Yasu,43.662837,-79.403217,Japanese Restaurant
1,"Harbord,University of Toronto",43.662696,-79.400049,Piano Piano,43.662949,-79.402898,Italian Restaurant
2,"Harbord,University of Toronto",43.662696,-79.400049,Rasa,43.662757,-79.403988,Restaurant
3,"Harbord,University of Toronto",43.662696,-79.400049,The Dessert Kitchen,43.662823,-79.402746,Dessert Shop
4,"Harbord,University of Toronto",43.662696,-79.400049,Almond Butterfly,43.662836,-79.403365,Bakery


We count the number of venues returned for each neighbourhood.

In [47]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"Brockton,Exhibition Place,Parkdale Village",20,20,20,20,20,20
Business Reply Mail Processing Centre 969 Eastern,15,15,15,15,15,15
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",14,14,14,14,14,14
"Cabbagetown,St. James Town",44,44,44,44,44,44
Central Bay Street,87,87,87,87,87,87
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,88,88,88,88,88,88


Let's get the total number of unique categories.

In [48]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 235 unique categories.


In [49]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [50]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.214286,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.06,0.0,0.03,0.01,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011364,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.0,0.011364,0.0


In [51]:
toronto_grouped.shape

(38, 235)
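
To make the one-hot and averaging steps concrete, here is a small sketch on hypothetical venues for two neighbourhoods. Because each indicator is 0 or 1, the per-neighbourhood mean is simply the share of venues that fall into each category.

```python
import pandas as pd

# Hypothetical venues for two neighbourhoods.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Café", "Café", "Park", "Bar", "Park", "Park"],
    }
)

# One indicator column per category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Put the neighbourhood name in front of the indicator columns.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

`DataFrame.insert` raises an error if a column with the given name already exists, so a venue category that happened to be called "Neighborhood" would be caught here rather than silently overwritten.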

The top 5 venue categories for each neighbourhood are printed below.

In [52]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2           Steakhouse  0.04
3                  Bar  0.04
4  American Restaurant  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2  Italian Restaurant  0.04
3  Seafood Restaurant  0.04
4         Cheese Shop  0.04


----Brockton,Exhibition Place,Parkdale Village----
            venue  freq
0  Breakfast Spot  0.10
1     Coffee Shop  0.10
2            Café  0.10
3    Climbing Gym  0.05
4             Gym  0.05


----Business Reply Mail Processing Centre 969 Eastern----
           venue  freq
0    Yoga Studio  0.07
1  Auto Workshop  0.07
2     Comic Shop  0.07
3           Park  0.07
4     Restaurant  0.07


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.21
1    Airport Lounge  0.14
2  Airport Terminal  

In [53]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let us now get the top 10 venues of each neighbourhood

In [69]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Thai Restaurant,American Restaurant,Steakhouse,Hotel,Burger Joint,Cosmetics Shop,Bakery
1,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Farmers Market,Bakery,Café,Beer Bar,Italian Restaurant,Steakhouse
2,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Café,Breakfast Spot,Pet Store,Bar,Grocery Store,Furniture / Home Store,Italian Restaurant,Convenience Store,Performing Arts Venue
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Garden,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Fast Food Restaurant,Spa
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Boutique,Airport,Airport Food Court,Airport Gate,Harbor / Marina,Sculpture Garden
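
The ranking step can be illustrated on a single row. The sketch below sorts a hypothetical frequency row and labels the top entries. The `ordinal` helper is an assumption added for illustration, as an alternative to the `try`/`except` used above to build the column names.

```python
import pandas as pd


def ordinal(n):
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer n."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Hypothetical frequency row: neighbourhood name first, then category shares.
row = pd.Series({"Neighborhood": "A", "Bar": 0.25, "Café": 0.5, "Park": 0.15, "Gym": 0.10})

num_top_venues = 3

# Skip the name, sort the shares in descending order, keep the top category names.
top = row.iloc[1:].astype(float).sort_values(ascending=False).index[:num_top_venues]
labels = ["{} Most Common Venue".format(ordinal(i + 1)) for i in range(num_top_venues)]
print(dict(zip(labels, top)))
```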


In [70]:
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
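
The first ten labels shown above are all 1, so it can help to check how many neighbourhoods fall into each cluster. The sketch below does this on a hypothetical label array standing in for `kmeans.labels_`.

```python
import numpy as np

# Hypothetical labels standing in for kmeans.labels_.
labels = np.array([1, 1, 1, 0, 1, 1, 2, 1, 0, 1])

# minlength keeps a slot for every cluster, including empty ones.
sizes = np.bincount(labels, minlength=3)
for cluster, size in enumerate(sizes):
    print("cluster {}: {} neighbourhoods".format(cluster, size))
```

A very uneven split, with one cluster holding most of the neighbourhoods, may suggest trying a different number of clusters.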

Now let us merge the cluster-labelled table of top 10 venues with the original Toronto data set.

In [71]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,lat,long,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5S,Downtown Toronto,"Harbord,University of Toronto",43.662696,-79.400049,1,Café,Bookstore,Restaurant,Bar,Japanese Restaurant,Bakery,Italian Restaurant,Nightclub,French Restaurant,Beer Store
1,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,1,Coffee Shop,Pub,Liquor Store,Pizza Place,Bagel Shop,Fried Chicken Joint,Sports Bar,Supermarket,American Restaurant,Light Rail Station
2,M6R,West Toronto,"Parkdale,Roncesvalles",43.64896,-79.456325,1,Gift Shop,Breakfast Spot,Italian Restaurant,Eastern European Restaurant,Restaurant,Movie Theater,Dessert Shop,Bank,Bar,Cuban Restaurant
3,M5R,Central Toronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678,1,Coffee Shop,Sandwich Place,Café,Pizza Place,American Restaurant,Liquor Store,Burger Joint,Jewish Restaurant,Metro Station,BBQ Joint
4,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307,1,Trail,Mexican Restaurant,Jewelry Store,Sushi Restaurant,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
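
One thing to keep in mind with this join: if any neighbourhood had returned no venues, it would have no row in the clustered table and its cluster label would come back as NaN, which also turns the label column into floats. The sketch below shows this on hypothetical data and one possible way to handle it.

```python
import pandas as pd

# Hypothetical data: neighbourhood "C" has no clustered venues.
toronto_data = pd.DataFrame(
    {"Neighbourhood": ["A", "B", "C"], "lat": [43.66, 43.68, 43.65]}
)
clustered = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [1, 0]})

merged = toronto_data.join(clustered.set_index("Neighborhood"), on="Neighbourhood")
print(merged["Cluster Labels"].tolist())  # the unmatched row holds NaN

# Drop unmatched rows and restore integer labels before using them as indices.
merged = merged.dropna(subset=["Cluster Labels"])
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged)
```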


<h2>Visualised Cluster Map</h2>

Let us create a Folium map to visualize the clustered neighbourhoods in Toronto.

In [72]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['lat'], toronto_merged['long'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now let us explore the clusters. The headings below number the clusters from 1, while the cluster labels in the data start at 0.

<h2>Cluster 1</h2>

In [73]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Central Toronto,0,Bus Line,Park,Swim School,Women's Store,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
12,Central Toronto,0,Playground,Park,Trail,Women's Store,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
20,Downtown Toronto,0,Park,Playground,Trail,Women's Store,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


<h2>Cluster 2</h2>

In [74]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Café,Bookstore,Restaurant,Bar,Japanese Restaurant,Bakery,Italian Restaurant,Nightclub,French Restaurant,Beer Store
1,Central Toronto,1,Coffee Shop,Pub,Liquor Store,Pizza Place,Bagel Shop,Fried Chicken Joint,Sports Bar,Supermarket,American Restaurant,Light Rail Station
2,West Toronto,1,Gift Shop,Breakfast Spot,Italian Restaurant,Eastern European Restaurant,Restaurant,Movie Theater,Dessert Shop,Bank,Bar,Cuban Restaurant
3,Central Toronto,1,Coffee Shop,Sandwich Place,Café,Pizza Place,American Restaurant,Liquor Store,Burger Joint,Jewish Restaurant,Metro Station,BBQ Joint
4,Central Toronto,1,Trail,Mexican Restaurant,Jewelry Store,Sushi Restaurant,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
5,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Farmers Market,Bakery,Café,Beer Bar,Italian Restaurant,Steakhouse
6,Central Toronto,1,Clothing Store,Coffee Shop,Health & Beauty Service,Gym / Fitness Center,Salon / Barbershop,Rental Car Location,Park,Mexican Restaurant,Metro Station,Italian Restaurant
7,Downtown Toronto,1,Coffee Shop,Café,Hotel,Restaurant,Cosmetics Shop,Gastropub,Bakery,Breakfast Spot,Clothing Store,Gym
8,Downtown Toronto,1,Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Boutique,Airport,Airport Food Court,Airport Gate,Harbor / Marina,Sculpture Garden
9,Central Toronto,1,Sandwich Place,Dessert Shop,Coffee Shop,Restaurant,Italian Restaurant,Café,Thai Restaurant,Pizza Place,Sushi Restaurant,Gourmet Shop


<h2>Cluster 3</h2>

In [75]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,Central Toronto,2,Garden,Women's Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
