# Applied Data Science Capstone
<i>IBM Data Science Coursera Course</i>

## Jupyter Notebook for Capstone Project
By Xander Mol

### Week 3 Peer Graded Assignment

### Step 1: Create dataframe of neighborhoods in Toronto from Wikipedia page

In [1]:
#Imports
import pandas as pd
import numpy as np

#Scrape table from website using Pandas
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
neigh=pd.read_html(url, header=0)[0]
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [2]:
#Get original dimensions
neigh.shape

(288, 3)

In [3]:
#Drop all rows where Borough is 'Not assigned'
neigh.drop(neigh[neigh.Borough == 'Not assigned'].index, inplace=True)
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [4]:
#New dimensions
neigh.shape

(211, 3)

In [5]:
#Aggregate per postcode, separating neighbourhoods with a comma
neigh = neigh.groupby(['Postcode' , 'Borough'], as_index=False).agg( ','.join)
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [6]:
#New dimensions
neigh.shape

(103, 3)
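
As a small illustration of the aggregation step, the sketch below applies the same `groupby` and `','.join` pattern to three rows taken from the table above. It is a minimal example, not the full scrape; the variable names `rows` and `combined` are introduced here for illustration only.

```python
import pandas as pd

# Three rows copied from the scraped table shown earlier
rows = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights"],
})

# One row per (Postcode, Borough); neighbourhood names are joined with commas
combined = rows.groupby(["Postcode", "Borough"], as_index=False).agg(
    {"Neighbourhood": ",".join}
)

print(combined)
```

Postcodes that occur once pass through unchanged, while postcodes that occur several times collapse into a single row.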

In [7]:
#Change Not assigned neighbourhood to name of Borough
neigh.loc[neigh['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = neigh['Borough']
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [8]:
#Check on final dimension
neigh.shape

(103, 3)

### Step 2: Get the latitude and the longitude coordinates of each neighborhood

In [9]:
#Imports and installs
!conda install -c conda-forge geocoder --yes
import geocoder

Solving environment: done

# All requested packages already installed.



<b>Obtain geolocation of each neighbourhood</b>

The ArcGIS provider is used instead of Google, because every request to the Google provider was denied.

In [10]:
# Obtain geolocations using Geocoder for all neighbourhoods

for postal_code in neigh['Postcode']:

    # initialize your variable to None
    lat_lng_coords = None

    # loop until you get the coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    neigh.loc[neigh['Postcode'] == postal_code, 'Latitude'] = lat_lng_coords[0]
    neigh.loc[neigh['Postcode'] == postal_code, 'Longitude'] = lat_lng_coords[1]
    
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


<b>Cross check validation: compare with CSV file provided</b>

In [11]:
#Download CSV file provided as cross check
latlongcheck = pd.read_csv('https://cocl.us/Geospatial_data')
latlongcheck.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<b>NB: The lat/long values obtained from geocoder.arcgis deviate slightly from those in the provided CSV file. Because the differences are small, the notebook uses its own geocoded values instead of the provided CSV file.</b>
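
One way to put a number on the deviation is the great-circle distance between the two coordinate pairs for each postcode. The sketch below uses the haversine formula on the five rows shown in the two tables above. The function name `haversine_km` and the Earth radius of 6371 km are assumptions made for this illustration; it was not part of the original notebook.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km (assumed)
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# First five rows of each table, copied from the outputs above
arcgis = {
    "M1B": (43.811525, -79.195517),
    "M1C": (43.78573, -79.15875),
    "M1E": (43.76569, -79.175256),
    "M1G": (43.768359, -79.21759),
    "M1H": (43.769688, -79.23944),
}
provided = {
    "M1B": (43.806686, -79.194353),
    "M1C": (43.784535, -79.160497),
    "M1E": (43.763573, -79.188711),
    "M1G": (43.770992, -79.216917),
    "M1H": (43.773136, -79.239476),
}

distances_km = {
    code: haversine_km(*arcgis[code], *provided[code]) for code in arcgis
}
for code, dist in distances_km.items():
    print("{}: {:.2f} km".format(code, dist))
```

Comparing these distances with the 500 m search radius used later gives a feel for how much the choice of coordinate source could affect the venues returned.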

<b>Complete resulting dataframe:</b>

In [12]:
neigh

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.785730,-79.158750
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.765690,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.217590
4,M1H,Scarborough,Cedarbrae,43.769688,-79.239440
5,M1J,Scarborough,Scarborough Village,43.743125,-79.231750
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.726245,-79.263670
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.713133,-79.285055
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.723575,-79.234976
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.696665,-79.260163


### Step 3: Explore and cluster the neighborhoods in Toronto

In [13]:
#Imports and installs

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import requests for GET requests
import requests

# Import k-means from clustering stage
from sklearn.cluster import KMeans

# Import Folium for maps
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


<b>Create Toronto map with neighbourhoods plotted</b>

Drop all boroughs whose name does not contain 'Toronto'

In [14]:
neigh = neigh[neigh.Borough.str.contains('toronto',case=False)]
neigh

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676845,-79.295225
41,M4K,East Toronto,"The Danforth West,Riverdale",43.683262,-79.35512
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.667965,-79.314673
43,M4M,East Toronto,Studio District,43.662766,-79.33483
44,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085
45,M4P,Central Toronto,Davisville North,43.712815,-79.388526
46,M4R,Central Toronto,North Toronto West,43.714523,-79.40696
47,M4S,Central Toronto,Davisville,43.703395,-79.385964
48,M4T,Central Toronto,"Moore Park,Summerhill East",43.690655,-79.383561
49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686083,-79.402335


New dimensions

In [15]:
neigh.shape

(38, 5)

Create map of neighbourhoods

In [16]:
# Obtain latitude and longitude of Toronto
g = geocoder.arcgis('Toronto, Ontario')
lat_lng_coords = g.latlng
lat_lng_coords

[43.648690000000045, -79.38543999999996]

In [17]:
#Create map of Toronto with neighbourhoods marked
map_toronto = folium.Map(location=[lat_lng_coords[0], lat_lng_coords[1]], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(neigh['Latitude'], neigh['Longitude'], neigh['Borough'], neigh['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<b>Obtain surrounding venues for each neighbourhood from Foursquare</b>

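The hidden cell below contains the secret Foursquare credentials (and presumably also VERSION and LIMIT, which are used later but not defined in any visible cell).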

In [18]:
# The code was removed by Watson Studio for sharing.

In [19]:
# Function to explore a neighbourhood in Foursquare

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
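
The chained lookup `["response"]['groups'][0]['items']` and the `['categories'][0]` access both raise an exception when a key or list entry is missing. A more tolerant parsing step could look like the sketch below. It assumes the response shape implied by the indexing in the function above; `venues_from_response` and `sample_payload` are hypothetical names, and the second venue in the payload is made up to show the empty-category case.

```python
def venues_from_response(name, lat, lng, payload):
    """Turn one explore-style response dict into a list of venue rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload mimicking the structure accessed above
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Glen Manor Ravine",
                   "location": {"lat": 43.676821, "lng": -79.293942},
                   "categories": [{"name": "Trail"}]}},
        {"venue": {"name": "Uncategorised place",  # hypothetical venue
                   "location": {"lat": 43.0, "lng": -79.0},
                   "categories": []}},
    ]}]}
}

rows = venues_from_response("The Beaches", 43.676845, -79.295225, sample_payload)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without network access or credentials.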

In [20]:
# Obtain surrounding venues for all neighbourhoods

toronto_venues = getNearbyVenues(names=neigh['Neighbourhood'],
                                   latitudes=neigh['Latitude'],
                                   longitudes=neigh['Longitude']
                                  )

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

<b>Explore resulting dataframe</b>

In [21]:
# Explore result
print(toronto_venues.shape)
toronto_venues.head()

(1746, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676845,-79.295225,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676845,-79.295225,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676845,-79.295225,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676845,-79.295225,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West,Riverdale",43.683262,-79.35512,Dollarama,43.686197,-79.355989,Discount Store


Let's check how many venues were returned for each neighborhood

In [22]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,63,63,63,63,63,63
"Brockton,Exhibition Place,Parkdale Village",71,71,71,71,71,71
Business Reply Mail Processing Centre 969 Eastern,100,100,100,100,100,100
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",70,70,70,70,70,70
"Cabbagetown,St. James Town",41,41,41,41,41,41
Central Bay Street,98,98,98,98,98,98
"Chinatown,Grange Park,Kensington Market",95,95,95,95,95,95
Christie,10,10,10,10,10,10
Church and Wellesley,84,84,84,84,84,84


Let's find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 213 unique categories.


<b>Analyze each neighbourhood</b>

One hot encoding

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

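# add neighborhood column back to dataframe
# (note: 'Neighborhood' is also one of the venue categories, so this assignment
# overwrites that dummy column rather than adding a new one)
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']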

Shape and head of resulting dataset

In [25]:
print(toronto_onehot.shape)
toronto_onehot.head()

(1746, 213)


Unnamed: 0,Afghan Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
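
The one-hot frame has 213 columns, the same as the number of unique categories, even after the `Neighborhood` column is added. This is because one Foursquare venue category is itself called "Neighborhood" (see "Upper Beaches" in the venue table above), so the assignment replaces that dummy column. The sketch below shows the collision on three rows from that table and one possible way to keep both columns. The new column name `Neighborhood (category)` is a hypothetical choice made for this illustration.

```python
import pandas as pd

# Three venues from The Beaches, copied from the venue table above
venues = pd.DataFrame({
    "Neighborhood": ["The Beaches", "The Beaches", "The Beaches"],
    "Venue Category": ["Trail", "Pub", "Neighborhood"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# True here: a dummy column already carries the name 'Neighborhood'
collision = "Neighborhood" in onehot.columns

# Rename the dummy first, then add the label column at position 0
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

print(collision)
print(onehot.columns.tolist())
```

With the rename in place, the category survives as a feature and the neighbourhood label sits in the first column.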


Group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [26]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.03,0.0,0.01,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,...,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.028169,0.0,0.014085,0.0,0.0,0.0,...,0.0,0.0,0.0,0.028169,0.0,0.0,0.014085,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,...,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.010204,0.0,0.010204,0.010204,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.010204,0.0,0.010204,0.010204,0.010204,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.010526,0.0,0.010526,0.0,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.042105,0.010526,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011905,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.0,0.011905,0.0


New size

In [27]:
toronto_grouped.shape

(38, 213)

Print each neighborhood along with the top 5 most common venues

In [28]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                venue  freq
0         Coffee Shop  0.07
1                Café  0.06
2          Steakhouse  0.04
3               Hotel  0.04
4  Seafood Restaurant  0.03


----Berczy Park----
            venue  freq
0     Coffee Shop  0.10
1      Restaurant  0.05
2    Cocktail Bar  0.05
3          Bakery  0.03
4  Breakfast Spot  0.03


----Brockton,Exhibition Place,Parkdale Village----
                    venue  freq
0             Coffee Shop  0.11
1                    Café  0.06
2              Restaurant  0.04
3  Furniture / Home Store  0.04
4          Sandwich Place  0.04


----Business Reply Mail Processing Centre 969 Eastern----
         venue  freq
0  Coffee Shop  0.09
1         Café  0.05
2        Hotel  0.04
3   Steakhouse  0.04
4          Bar  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                venue  freq
0         Coffee Shop  0.09
1  Italian Restaurant  0.07
2        

Function to sort the venues in descending order

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create new dataframe and display the top 10 venues for each neighborhood.

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Steakhouse,Hotel,Asian Restaurant,Restaurant,Bar,Gym,Gastropub,Burger Joint
1,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Seafood Restaurant,Hotel,Beer Bar,Steakhouse,Café,Breakfast Spot,Bakery
2,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Café,Bakery,Restaurant,Sandwich Place,Furniture / Home Store,Italian Restaurant,Supermarket,Beer Bar,Bar
3,Business Reply Mail Processing Centre 969 Eastern,Coffee Shop,Café,Bar,Hotel,Steakhouse,Pizza Place,Japanese Restaurant,Sushi Restaurant,Italian Restaurant,Restaurant
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Coffee Shop,Italian Restaurant,Bar,Café,Restaurant,Gym / Fitness Center,Pub,Speakeasy,Sandwich Place,Bakery
5,"Cabbagetown,St. James Town",Coffee Shop,Restaurant,Chinese Restaurant,Pet Store,Pizza Place,Café,Market,Italian Restaurant,Bakery,Breakfast Spot
6,Central Bay Street,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Tea Room,Plaza,Bubble Tea Shop,Italian Restaurant,Sushi Restaurant,Bakery,Sandwich Place
7,"Chinatown,Grange Park,Kensington Market",Café,Bar,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Chinese Restaurant,Vietnamese Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Ice Cream Shop
8,Christie,Café,Grocery Store,Athletics & Sports,Baby Store,Playground,Coffee Shop,Italian Restaurant,Farmers Market,Fast Food Restaurant,Yoga Studio
9,Church and Wellesley,Coffee Shop,Gay Bar,Japanese Restaurant,Restaurant,Sushi Restaurant,Dance Studio,Men's Store,Fast Food Restaurant,Bubble Tea Shop,Pub
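
The column names above are built with a `try`/`except` around a three-item suffix list, which is fine for ten venues but would label a 21st column as "21th". An explicit helper is one alternative. The sketch below is illustrative only; `ordinal` is a hypothetical helper name that does not appear in the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this produces the same column names as the cell above.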


<b>Cluster neighbourhoods</b>

Run k-means to cluster the neighborhoods into 5 clusters.

In [31]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
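
The first ten labels are all 0, which suggests that one cluster may hold most of the neighbourhoods. Counting the members of each cluster is a quick check. The sketch below uses a hypothetical label array rather than the real `kmeans.labels_`; in the notebook the same call could be applied to `kmeans.labels_` directly.

```python
import numpy as np

kclusters = 5

# Hypothetical labels for illustration; replace with kmeans.labels_
labels = np.array([0, 0, 0, 0, 4, 0, 0, 1, 2, 3])

# Number of rows assigned to each cluster, including empty clusters
sizes = np.bincount(labels, minlength=kclusters)
shares = sizes / sizes.sum()

for cluster, (size, share) in enumerate(zip(sizes, shares)):
    print("cluster {}: {} rows ({:.0%})".format(cluster, size, share))
```

A very uneven split can be a hint to try a different number of clusters or a different feature representation.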

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# rename on a new frame, so the filtered slice 'neigh' is not modified in place
toronto_merged = neigh.rename(columns={'Neighbourhood': 'Neighborhood'})

# join the sorted venue columns (with cluster labels) onto the neighbourhood data
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged


Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676845,-79.295225,0,Health Food Store,Pub,Trail,Yoga Studio,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
41,M4K,East Toronto,"The Danforth West,Riverdale",43.683262,-79.35512,0,Bus Line,Park,Discount Store,Grocery Store,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Fish Market
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.667965,-79.314673,0,Park,Sandwich Place,Pizza Place,Italian Restaurant,Pub,Movie Theater,Fast Food Restaurant,Fish & Chips Shop,Burrito Place,Liquor Store
43,M4M,East Toronto,Studio District,43.662766,-79.33483,0,Café,Italian Restaurant,Diner,Bakery,American Restaurant,Brewery,Bar,Arts & Crafts Store,Pizza Place,Coffee Shop
44,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085,4,Bus Line,Swim School,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant
45,M4P,Central Toronto,Davisville North,43.712815,-79.388526,0,Food & Drink Shop,Hotel,Park,Breakfast Spot,Gym,Clothing Store,Falafel Restaurant,Food,Flower Shop,Flea Market
46,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,0,Playground,Park,Gym Pool,Garden,Yoga Studio,Ethiopian Restaurant,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
47,M4S,Central Toronto,Davisville,43.703395,-79.385964,0,Dessert Shop,Café,Sandwich Place,Coffee Shop,Italian Restaurant,Pizza Place,Thai Restaurant,Indian Restaurant,Farmers Market,Seafood Restaurant
48,M4T,Central Toronto,"Moore Park,Summerhill East",43.690655,-79.383561,0,Playground,Convenience Store,Gym,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant
49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686083,-79.402335,0,Light Rail Station,Coffee Shop,Supermarket,Liquor Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Flea Market,Fish Market


Shape

In [33]:
toronto_merged.shape

(38, 16)

Create Map

In [34]:
# create map
map_clusters = folium.Map(location=[lat_lng_coords[0], lat_lng_coords[1]], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<b>Examine Clusters</b>

Cluster 1

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,East Toronto,0,Health Food Store,Pub,Trail,Yoga Studio,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
41,East Toronto,0,Bus Line,Park,Discount Store,Grocery Store,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Fish Market
42,East Toronto,0,Park,Sandwich Place,Pizza Place,Italian Restaurant,Pub,Movie Theater,Fast Food Restaurant,Fish & Chips Shop,Burrito Place,Liquor Store
43,East Toronto,0,Café,Italian Restaurant,Diner,Bakery,American Restaurant,Brewery,Bar,Arts & Crafts Store,Pizza Place,Coffee Shop
45,Central Toronto,0,Food & Drink Shop,Hotel,Park,Breakfast Spot,Gym,Clothing Store,Falafel Restaurant,Food,Flower Shop,Flea Market
46,Central Toronto,0,Playground,Park,Gym Pool,Garden,Yoga Studio,Ethiopian Restaurant,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
47,Central Toronto,0,Dessert Shop,Café,Sandwich Place,Coffee Shop,Italian Restaurant,Pizza Place,Thai Restaurant,Indian Restaurant,Farmers Market,Seafood Restaurant
48,Central Toronto,0,Playground,Convenience Store,Gym,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant
49,Central Toronto,0,Light Rail Station,Coffee Shop,Supermarket,Liquor Store,Yoga Studio,Falafel Restaurant,Food,Flower Shop,Flea Market,Fish Market
50,Downtown Toronto,0,Playground,Building,Park,Bank,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop


Cluster 2

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,Central Toronto,1,Park,Yoga Studio,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
82,West Toronto,1,Park,Yoga Studio,Ethiopian Restaurant,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


Cluster 3

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Central Toronto,2,Health & Beauty Service,Yoga Studio,Ethiopian Restaurant,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant


Cluster 4

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
59,Downtown Toronto,3,Harbor / Marina,Café,Music Venue,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop


Cluster 5

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
44,Central Toronto,4,Bus Line,Swim School,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant


<b>Analysing clusters</b>

Cluster 1: Neighbourhoods in the centre of Toronto with many restaurants, shops and hotels. Label = 'City center'

Cluster 2: Outskirts of the centre, where parks are the most common venue. Label = 'Residential'

Cluster 3: Area with a primarily health and welfare function. Label = 'Health and welfare'

Cluster 4: Harbour / marina. Label = 'Harbour'

Cluster 5: Event space and recreation. Label = 'Recreation'