# Segmenting and Clustering Neighborhoods in Toronto

###### Made by David Campos 

<br>In this is made for the Applied Data Science Capstone in Coursera. The step-by-step in the notebook is based on the week 3 lab about Segmenting and Clustering Neighborhoods. <br>
 **Important note:** The same notebook will be used for the 3 parts requested in the lab.

## 1. Importing data from Wikipedia using Pandas

Data will be imported from wikipedia's page that contains the Postcodes, Borough and Neighbourhoods from Canada. Data source: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [169]:
#Importing pandas library.
import pandas as pd 

# Importing the table html from the url using 'pandas.read_html'
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(wiki_url, header=0)

# Checking how many tables are in wiki_url.
print('There are {} tables in the wikipedia url'.format(len(df))) 

There are 3 tables in the wikipedia url


Pandas detects there are 3 tables in the html document, however we need the first table.

In [170]:
# Getting the Canada's Postcode table and cheking if correctly uploaded.
df = df[0]
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
285,M8Z,Etobicoke,South of Bloor


Import have been made succesfully!

## 2. Cleaning the dataframe

### 2.1 Droping unlabeled rows

We will drop the rows that contains 'Not assigned' valued in the Borough column.

In [171]:
# First, checking how many rows does not have a borough assigned.

print( 'There are {} rows with no borought assigned'.format((df['Borough']=='Not assigned').sum()))

There are 77 rows with no borought assigned


In [202]:
# Dropping the rows where there is not an assigned Borough.

df= df[df['Borough'] != 'Not assigned']
df.reset_index(inplace=True) 
df.drop(['index'],axis=1,inplace=True)
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout..."


The original table had 287 rows, after dropping the 'Not assigned' there remains 210 rows, We can infer that the we drop the 77 values correctly.

### 2.2 Replacing not defined neighbourhood names

In [203]:
# Replacing the Neighborhoods catalog as 'Not Assigned' with the Borough name.

df['Neighbourhood'].replace('Not assigned', df['Borough'], inplace=True)
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout..."


Now, lets check if the replacement where done correctly.

In [174]:
# Checking using booleans.

print('Is there any Not assigned value left?:',(df['Neighbourhood'] == 'Not assigned').unique()) 

Is there any Not assigned value left?: [False]


Returning a False confirms that there is no remaining 'Not assigned' values. Also checking the new datframe with the original table from wikipedia confirms the replacement was done correctly.

### 2.3 Joining the neighbourhoods with same Postcode

In [175]:
# Joining the neigbourhoods that shares the same Postal Code.

df = df.groupby(['Postcode','Borough'], sort = False).agg(lambda x: ','.join(x))
df.reset_index(inplace=True) 
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout..."


# 3. Adding the Lat & Lon into the DataFrame

## 3.1 Importing and reading the file

In [176]:
# Importing the coordinates from the CSV file using wget.
import wget
url = 'http://cocl.us/Geospatial_data'
canada_geo = wget.download(url)

In [177]:
# Reading the file.
geo_df = pd.read_csv(canada_geo)
geo_df= geo_df.rename({'Postal Code': 'Postalcode'}, axis=1)  # Renaming the colum to avoid futher inconvenience
geo_df.head()

Unnamed: 0,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


## 3.1 Merging the coordinates into the original DF

In [178]:
# Merging and renaming the Dataframe.
canada_df = df.merge(geo_df, left_on='Postcode', right_on='Postalcode')
canada_df.drop('Postalcode', axis=1, inplace=True)   #Drop the reapeated column
canada_df.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937


Now we have a clean DataFrame with the Postal Codes, borough, Neighbourhoods, attitud and longitude.

# 4. Segmenting and Clustering Toronto Neighborhoods.

In [179]:
# Importing neccesary libraries

#Requests
import requests
from pandas.io.json import json_normalize 

#Matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors

#Sklearn Clusters
from sklearn.cluster import KMeans

#Folium 
import folium 

## 4.1 Filtering by Toronto boroughs

Know we will filter just the Toronto boroughs and neighbourhoods from canada_df.

In [415]:
# Creating a filtered Datframe for Toronto Boroughts and Neighbours using 'str.contains'
toronto_df = canada_df[canada_df['Borough'].str.contains('Toronto')]
toronto_df.reset_index(inplace=True)
toronto_df.drop('index', axis=1, inplace=True) 
toronto_df.head()   # Toronto Dataframe created.

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


## 4.2 Configuring the Foursquare API

In [364]:
# Setting up Foursquare API
LIMIT = 100
CLIENT_ID = '4LPKUYABGERSUZPXNIJERW1TIXMTL1SIG3HKAXJJWR4A0FHR'
CLIENT_SECRET = 'VKEMUOAD1GZM2HUSD0NETPYYX1JQ4WB4OCFE33FVMJL2GG1W'
VERSION = '20200101'

## 4.3 Getting venues nearby Toronto Neighbourhoods

In [365]:
# Defining the function to extract venues from each Toronto nearby 
def NearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [366]:
# Making a new dataframe
toronto_venues = NearbyVenues(names=toronto_df['Neighbourhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude'],
                                   radius=400) # Defining venues in a 400m radius

Harbourfront
Queen's Park
Ryerson,Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Dovercourt Village,Dufferin
Harbourfront East,Toronto Islands,Union Station
Little Portugal,Trinity
The Danforth West,Riverdale
Design Exchange,Toronto Dominion Centre
Brockton,Exhibition Place,Parkdale Village
The Beaches West,India Bazaar
Commerce Court,Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North,Forest Hill West
High Park,The Junction South
North Toronto West
The Annex,North Midtown,Yorkville
Parkdale,Roncesvalles
Davisville
Harbord,University of Toronto
Runnymede,Swansea
Moore Park,Summerhill East
Chinatown,Grange Park,Kensington Market
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city

In [367]:
# Checking the toronto_venues DataFrame
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [368]:
# Cheking the Shape
print(toronto_venues.shape)

(1309, 7)


Lets see wich neighboorhoods have the most venues in a radius of 400 meters

In [369]:
# Cheking how many venues are in each neighborhood(s) area.
toronto_venues['Neighborhood'].value_counts()

Ryerson,Garden District                                                                                 100
Harbourfront East,Toronto Islands,Union Station                                                         100
First Canadian Place,Underground city                                                                   100
Design Exchange,Toronto Dominion Centre                                                                 100
Commerce Court,Victoria Hotel                                                                           100
Adelaide,King,Richmond                                                                                   99
Central Bay Street                                                                                       74
Chinatown,Grange Park,Kensington Market                                                                  66
Stn A PO Boxes 25 The Esplanade                                                                          62
St. James Town              

## 4.4 Rearraging dataframe to apply Kmeans Clustering

In [370]:
# Rearaging to cluster similar nieghbors
toronto_cluster = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_cluster['Hood'] = toronto_venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [toronto_cluster.columns[-1]] + list(toronto_cluster.columns[:-1])
toronto_cluster = toronto_cluster[fixed_columns]

toronto_cluster.head(4)

Unnamed: 0,Hood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [371]:
# Cheking the DF shape.
print(toronto_cluster.shape)

(1309, 209)


In [372]:
# Group by the frequency of the venue categories in each hood.
toronto_cluster1 = toronto_cluster.groupby('Hood').mean().reset_index()

print(toronto_cluster1.shape)
toronto_cluster1.head()

(36, 209)


Unnamed: 0,Hood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,...,0.0,0.0,0.020202,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.076923,0.076923,0.076923,0.153846,0.076923,0.153846,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [373]:
# Making a function to show each hood top 10 venues

num_top_venues = 10  # Number of venues to show.

for hood in toronto_cluster1['Hood']:
    print("----"+hood+"----")
    temp = toronto_cluster1[toronto_cluster1['Hood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    
# Defining a function to show venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
        # create a new dataframe with top 10 venues.
toronto_topvenues = pd.DataFrame(columns=columns)
toronto_topvenues['Neighborhood'] = toronto_cluster1['Hood']

for ind in np.arange(toronto_cluster1.shape[0]):
    toronto_topvenues.iloc[ind, 1:] = return_most_common_venues(toronto_cluster1.iloc[ind, :], num_top_venues)

toronto_topvenues.head(10)

----Adelaide,King,Richmond----
                 venue  freq
0          Coffee Shop  0.06
1           Steakhouse  0.04
2         Burger Joint  0.03
3  Japanese Restaurant  0.03
4  American Restaurant  0.03
5                 Café  0.03
6     Asian Restaurant  0.03
7     Sushi Restaurant  0.03
8                  Bar  0.03
9          Pizza Place  0.02


----Berczy Park----
                     venue  freq
0             Cocktail Bar  0.12
1             Concert Hall  0.06
2             Liquor Store  0.06
3         Department Store  0.06
4  Comfort Food Restaurant  0.06
5                     Park  0.06
6              Coffee Shop  0.06
7                Nightclub  0.06
8                 Fountain  0.06
9        French Restaurant  0.06


----Brockton,Exhibition Place,Parkdale Village----
                venue  freq
0      Breakfast Spot  0.13
1                Café  0.13
2           Nightclub  0.13
3         Coffee Shop  0.13
4  Italian Restaurant  0.07
5                 Bar  0.07
6          Resta

                             venue  freq
0                       Playground  0.33
1                       Bike Trail  0.33
2                         Building  0.33
3                     Neighborhood  0.00
4        Middle Eastern Restaurant  0.00
5               Miscellaneous Shop  0.00
6       Modern European Restaurant  0.00
7  Molecular Gastronomy Restaurant  0.00
8              Monument / Landmark  0.00
9              Moroccan Restaurant  0.00


----Runnymede,Swansea----
              venue  freq
0              Café  0.09
1       Coffee Shop  0.09
2       Pizza Place  0.09
3  Sushi Restaurant  0.06
4         Bookstore  0.03
5     Burrito Place  0.03
6    Sandwich Place  0.03
7        Restaurant  0.03
8               Pub  0.03
9      Dessert Shop  0.03


----Ryerson,Garden District----
                       venue  freq
0                Coffee Shop  0.13
1             Clothing Store  0.05
2                       Café  0.03
3             Sandwich Place  0.03
4                      Hot

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Steakhouse,Asian Restaurant,Bar,Burger Joint,Japanese Restaurant,American Restaurant,Café,Sushi Restaurant,Office
1,Berczy Park,Cocktail Bar,Greek Restaurant,French Restaurant,Comfort Food Restaurant,Coffee Shop,Nightclub,Department Store,Breakfast Spot,Liquor Store,Concert Hall
2,"Brockton,Exhibition Place,Parkdale Village",Nightclub,Coffee Shop,Café,Breakfast Spot,Gym,Bar,Italian Restaurant,Pet Store,Climbing Gym,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Park,Spa
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Terminal,Airport,Bar,Plane,Coffee Shop,Rental Car Location,Boutique,Airport Food Court,Airport Gate
5,"Cabbagetown,St. James Town",Coffee Shop,Café,Restaurant,Pizza Place,Butcher,Italian Restaurant,Japanese Restaurant,Snack Place,Jewelry Store,Sandwich Place
6,Central Bay Street,Coffee Shop,Sandwich Place,Italian Restaurant,Chinese Restaurant,Café,Sushi Restaurant,Japanese Restaurant,Juice Bar,Spa,Thai Restaurant
7,"Chinatown,Grange Park,Kensington Market",Vietnamese Restaurant,Café,Vegetarian / Vegan Restaurant,Bar,Mexican Restaurant,Coffee Shop,Chinese Restaurant,Dumpling Restaurant,Cocktail Bar,Dessert Shop
8,Christie,Café,Grocery Store,Baby Store,Italian Restaurant,Coffee Shop,Nightclub,Candy Store,Diner,Event Space,Ethiopian Restaurant
9,Church and Wellesley,Gay Bar,Coffee Shop,Restaurant,Gym,Men's Store,Japanese Restaurant,Burger Joint,Hotel,Bubble Tea Shop,Pub


In [382]:
# create a new dataframe with top 10 venues.
toronto_topvenues = pd.DataFrame(columns=columns)
toronto_topvenues['Neighborhood'] = toronto_cluster1['Hood']

for ind in np.arange(toronto_cluster1.shape[0]):
    toronto_topvenues.iloc[ind, 1:] = return_most_common_venues(toronto_cluster1.iloc[ind, :], num_top_venues)

toronto_topvenues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Steakhouse,Asian Restaurant,Bar,Burger Joint,Japanese Restaurant,American Restaurant,Café,Sushi Restaurant,Office
1,Berczy Park,Cocktail Bar,Greek Restaurant,French Restaurant,Comfort Food Restaurant,Coffee Shop,Nightclub,Department Store,Breakfast Spot,Liquor Store,Concert Hall
2,"Brockton,Exhibition Place,Parkdale Village",Nightclub,Coffee Shop,Café,Breakfast Spot,Gym,Bar,Italian Restaurant,Pet Store,Climbing Gym,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Park,Spa
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Terminal,Airport,Bar,Plane,Coffee Shop,Rental Car Location,Boutique,Airport Food Court,Airport Gate
5,"Cabbagetown,St. James Town",Coffee Shop,Café,Restaurant,Pizza Place,Butcher,Italian Restaurant,Japanese Restaurant,Snack Place,Jewelry Store,Sandwich Place
6,Central Bay Street,Coffee Shop,Sandwich Place,Italian Restaurant,Chinese Restaurant,Café,Sushi Restaurant,Japanese Restaurant,Juice Bar,Spa,Thai Restaurant
7,"Chinatown,Grange Park,Kensington Market",Vietnamese Restaurant,Café,Vegetarian / Vegan Restaurant,Bar,Mexican Restaurant,Coffee Shop,Chinese Restaurant,Dumpling Restaurant,Cocktail Bar,Dessert Shop
8,Christie,Café,Grocery Store,Baby Store,Italian Restaurant,Coffee Shop,Nightclub,Candy Store,Diner,Event Space,Ethiopian Restaurant
9,Church and Wellesley,Gay Bar,Coffee Shop,Restaurant,Gym,Men's Store,Japanese Restaurant,Burger Joint,Hotel,Bubble Tea Shop,Pub


In [383]:
# Checking if rearrangement where done correctly (Compare with toronto_cluster1 #rows)
toronto_topvenues.shape

(36, 11)

## 4.5 Clustering Neighborhoods

In [384]:
# set number of clusters.
kclusters = 6

toronto_clustering = toronto_cluster1.drop('Hood', 1)

# Fitting Kmeans to toronto_clustering.
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_clustering)

# check cluster labels generated for each row in the dataframe.
kmeans.labels_[0:50] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 3,
       1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1])

By the cluster array and the toronto_topvenues dataframe looks like toronto neighbors are pretty similar.

In [385]:
# Adding cluster labels to toronto_topvenues.
toronto_topvenues.insert(0, 'Cluster Labels', kmeans.labels_)

# Merging top venues to toronto_merge=toronto_df (original dataframe for toronto neighborhoods)
toronto_merged = toronto_df
toronto_merged = toronto_merged.join(toronto_topvenues.set_index('Neighborhood'), on='Neighbourhood')

# Some neighborhoods did not return venues that met the search criteria
# Dropping those neighborhoods.
toronto_merged.dropna(inplace=True)

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,1.0,Coffee Shop,Restaurant,Yoga Studio,Hotel,Pub,Electronics Store,Breakfast Spot,Spa,Bank,Bakery
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,1.0,Coffee Shop,Gym,Fried Chicken Joint,Portuguese Restaurant,Café,Sandwich Place,Italian Restaurant,Park,Gas Station,Deli / Bodega
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,1.0,Coffee Shop,Clothing Store,Hotel,Café,Sandwich Place,Middle Eastern Restaurant,Cosmetics Shop,Tea Room,Bakery,Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1.0,Coffee Shop,Restaurant,Café,Hotel,BBQ Joint,Cocktail Bar,Italian Restaurant,Park,Diner,Beer Bar
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,5.0,Trail,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


In [386]:
# Adjusting the index and Cluster labels Column dataype
toronto_merged = toronto_merged.reset_index(drop=True)
toronto_merged['Cluster Labels']= toronto_merged['Cluster Labels'].astype('int64')
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,1,Coffee Shop,Restaurant,Yoga Studio,Hotel,Pub,Electronics Store,Breakfast Spot,Spa,Bank,Bakery
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,1,Coffee Shop,Gym,Fried Chicken Joint,Portuguese Restaurant,Café,Sandwich Place,Italian Restaurant,Park,Gas Station,Deli / Bodega
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Hotel,Café,Sandwich Place,Middle Eastern Restaurant,Cosmetics Shop,Tea Room,Bakery,Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Restaurant,Café,Hotel,BBQ Joint,Cocktail Bar,Italian Restaurant,Park,Diner,Beer Bar
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,5,Trail,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


## 4.6 Mapping the clusters in folium map

In [400]:
# create map

toronto_lat = 43.66599
toronto_lon = -79.3794
zoom =  12

map_clusters = folium.Map(location=[toronto_lat, toronto_lon], zoom_start=zoom)


# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Well, looks like most toronto's neighborhoods are similar regarding venues. Exceptions are mostly in the border side of toronto. Exception neighborhoods are: 
- High Park,The Junction South (3)
- The Beaches(5), 
- North Toronto West (0)
- Moore Park,Summerhill East (2)
- Forest Hill North,Forest Hill West (4)


## 4.7 Cheking each clustered-neighborhood top ten places and conclusions

### Cluster 0

I would say this is a Shopping-like area.

In [391]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,North Toronto West,0,Boutique,Park,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


### Cluster 1

Looks like Toronto citizens really likes coffee and eating in restaurants.

In [397]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront,1,Coffee Shop,Restaurant,Yoga Studio,Hotel,Pub,Electronics Store,Breakfast Spot,Spa,Bank,Bakery
1,Queen's Park,1,Coffee Shop,Gym,Fried Chicken Joint,Portuguese Restaurant,Café,Sandwich Place,Italian Restaurant,Park,Gas Station,Deli / Bodega
2,"Ryerson,Garden District",1,Coffee Shop,Clothing Store,Hotel,Café,Sandwich Place,Middle Eastern Restaurant,Cosmetics Shop,Tea Room,Bakery,Restaurant
3,St. James Town,1,Coffee Shop,Restaurant,Café,Hotel,BBQ Joint,Cocktail Bar,Italian Restaurant,Park,Diner,Beer Bar
5,Berczy Park,1,Cocktail Bar,Greek Restaurant,French Restaurant,Comfort Food Restaurant,Coffee Shop,Nightclub,Department Store,Breakfast Spot,Liquor Store,Concert Hall
6,Central Bay Street,1,Coffee Shop,Sandwich Place,Italian Restaurant,Chinese Restaurant,Café,Sushi Restaurant,Japanese Restaurant,Juice Bar,Spa,Thai Restaurant
7,Christie,1,Café,Grocery Store,Baby Store,Italian Restaurant,Coffee Shop,Nightclub,Candy Store,Diner,Event Space,Ethiopian Restaurant
8,"Adelaide,King,Richmond",1,Coffee Shop,Steakhouse,Asian Restaurant,Bar,Burger Joint,Japanese Restaurant,American Restaurant,Café,Sushi Restaurant,Office
9,"Dovercourt Village,Dufferin",1,Bakery,Park,Pharmacy,Bank,Music Venue,Middle Eastern Restaurant,Pizza Place,Bar,Gym / Fitness Center,Fast Food Restaurant
10,"Harbourfront East,Toronto Islands,Union Station",1,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Pizza Place,Italian Restaurant,Chinese Restaurant,Sporting Goods Shop,Sports Bar


### Cluster 2

This people in this area like to exercise and do outdoors activities aparrently.

In [398]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,"Moore Park,Summerhill East",2,Gym,Trail,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### Cluster 3

How many parks could be arround here? 

In [399]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"High Park,The Junction South",3,Park,Bed & Breakfast,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


### Cluster 4

Looks like this area is the bussiness side of toronto.

In [395]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,"Forest Hill North,Forest Hill West",4,Business Service,Yoga Studio,Comfort Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


### Cluster 5

I would consider doing yoga in nearby the beach, would you?

In [396]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,The Beaches,5,Trail,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
