## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto with coordinates below


##### Part ONE: creating a dataframe

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.21.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

##### To create the above dataframe:

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.


Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.


In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [24]:
from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(url).text
Canada_data = BeautifulSoup(source, 'lxml')

In [26]:
column_names = ['Postalcode','Borough','Neighborhood']
toronto = pd.DataFrame(columns = column_names)

In [29]:
content = Canada_data.find('div', class_='mw-parser-output')
table = content.table.tbody
postcode = 0
borough = 0
neighborhood = 0

for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            postcode = td.text
            i = i + 1
        elif i == 1:
            borough = td.text
            i = i + 1
        elif i == 2: 
            neighborhood = td.text.strip('\n').replace(']','')
    toronto = toronto.append({'Postalcode': postcode,'Borough': borough,'Neighborhood': neighborhood},ignore_index=True)

In [31]:
toronto = toronto[toronto.Borough!='Not assigned']
toronto = toronto[toronto.Borough!= 0]
toronto.reset_index(drop = True, inplace = True)
i = 0
for i in range(0,toronto.shape[0]):
    if toronto.iloc[i][2] == 'Not assigned':
        toronto.iloc[i][2] = toronto.iloc[i][1]
        i = i+1
                                 
df = toronto.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern, Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union, Highla..."
2,M1E,Scarborough,"Guildwood, Morningside, West Hill, Guildwood, ..."
3,M1G,Scarborough,"Woburn, Woburn"
4,M1H,Scarborough,"Cedarbrae, Cedarbrae"
5,M1J,Scarborough,"Scarborough Village, Scarborough Village"
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park, E..."
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge, Clairlea, Gol..."
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village Wes..."
9,M1N,Scarborough,"Birch Cliff, Cliffside West, Birch Cliff, Clif..."


In [32]:
df.tail(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
93,M9A,Etobicoke,"Islington Avenue, Islington Avenue"
94,M9B,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ..."
95,M9C,Etobicoke,"Bloordale Gardens, Eringate, Markland Wood, Ol..."
96,M9L,North York,"Humber Summit, Humber Summit"
97,M9M,North York,"Emery, Humberlea, Emery, Humberlea"
98,M9N,York,"Weston, Weston"
99,M9P,Etobicoke,"Westmount, Westmount"
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
102,M9W,Etobicoke,"Northwest, Northwest"


In [119]:
print(df.shape)

(103, 3)


##### PART TWO: Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.




In [33]:
!pip install geocoder

Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K     |████████████████████████████████| 102kB 11.3MB/s ta 0:00:01
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


##### lets check if geocoder is working

In [34]:
import geocoder
g = geocoder.google('Mountain View, CA')
g.latlng
(37.3860517, -122.0838511)

(37.3860517, -122.0838511)

##### good. now let's build the dataframe

In [35]:
def get_latlng(postal_code):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords
    
get_latlng('M9A')

[43.66229908300005, -79.52819499999998]

In [42]:
postal_codes = df['Postalcode']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes.tolist() ]

In [37]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [47]:
df[df.Postalcode == 'M9A']

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
93,M9A,Etobicoke,"Islington Avenue, Islington Avenue",43.662299,-79.528195


In [46]:
df.head(15)

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern, Rouge, Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union, Highla...",43.785665,-79.158725
2,M1E,Scarborough,"Guildwood, Morningside, West Hill, Guildwood, ...",43.765815,-79.175193
3,M1G,Scarborough,"Woburn, Woburn",43.768369,-79.21759
4,M1H,Scarborough,"Cedarbrae, Cedarbrae",43.769688,-79.23944
5,M1J,Scarborough,"Scarborough Village, Scarborough Village",43.743125,-79.23175
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park, E...",43.726276,-79.263625
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge, Clairlea, Gol...",43.713054,-79.285055
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village Wes...",43.724235,-79.227925
9,M1N,Scarborough,"Birch Cliff, Cliffside West, Birch Cliff, Clif...",43.69677,-79.259967


In [45]:
df.tail(3)

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.68681,-79.557284
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.743145,-79.584664
102,M9W,Etobicoke,"Northwest, Northwest",43.71174,-79.579181


##### Part THREE Creating a map

In [127]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

Solving environment: done

# All requested packages already installed.



In [50]:
address = 'Toronto'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

  app.launch_new_instance()


##### Foursquare

In [52]:
CLIENT_ID = 'AJVRRHMYY3EEPYOZBIBNIVCSHPNNDGO50G0ZTYUEMJNWSTAB' # your Foursquare ID
CLIENT_SECRET = '1UCZY3IMQM02NFH0N3KPT2WWYH0UKVEVK2N0T2RKEWBKZIZP' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: AJVRRHMYY3EEPYOZBIBNIVCSHPNNDGO50G0ZTYUEMJNWSTAB
CLIENT_SECRET:1UCZY3IMQM02NFH0N3KPT2WWYH0UKVEVK2N0T2RKEWBKZIZP


In [53]:
address = 'Toronto'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.653963 -79.387207


In [59]:
Etobicoke_data = df[df['Borough'] == 'Etobicoke'].reset_index(drop=True)
address1 = 'Etobicoke,Toronto'

geolocator1 = Nominatim()
location1 = geolocator1.geocode(address1)
latitude1 = location1.latitude
longitude1 = location1.longitude
print('The geograpical coordinate of Etobicoke are {}, {}.'.format(latitude1, longitude1))

The geograpical coordinate of Etobicoke are 43.671459150000004, -79.55249206611668.




In [61]:
map_eto = folium.Map(location=[latitude1, longitude1], zoom_start=11)

for lat, lng, label in zip(Etobicoke_data ['Latitude'], Etobicoke_data ['Longitude'], Etobicoke_data ['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_eto)  
    
map_eto

In [62]:
neighborhood_latitude = Etobicoke_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Etobicoke_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = Etobicoke_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude1, longitude1, VERSION, radius, LIMIT)

Latitude and longitude values of Humber Bay Shores, Mimico South, New Toronto, Humber Bay Shores, Mimico South, New Toronto are 43.61220000000003, -79.49514569099995.


In [63]:

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [64]:
results = requests.get(url).json()
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(5)

Unnamed: 0,name,categories,lat,lng
0,Metro,Supermarket,43.67489,-79.555697
1,Starbucks,Coffee Shop,43.674358,-79.555189
2,Shoppers Drug Mart,Pharmacy,43.674209,-79.555424
3,The Garden,Garden,43.671618,-79.553836
4,Lloyd Manor Park,Playground,43.672486,-79.554382


##### Venues of Etobicoke

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
          # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [67]:
Etobicoke_venues = getNearbyVenues(names=Etobicoke_data['Neighborhood'],
                                   latitudes=Etobicoke_data['Latitude'],
                                   longitudes=Etobicoke_data['Longitude']
                                  )

Humber Bay Shores, Mimico South, New Toronto, Humber Bay Shores, Mimico South, New Toronto
Alderwood, Long Branch, Alderwood, Long Branch
The Kingsway, Montgomery Road, Old Mill North, The Kingsway, Montgomery Road, Old Mill North
Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea, Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor, Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
Islington Avenue, Islington Avenue
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park, Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe, Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
West

In [68]:
print(Etobicoke_venues.shape)
Etobicoke_venues.head()

(80, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Humber Bay Shores, Mimico South, New Toronto, ...",43.6122,-79.495146,No Frills,43.612186,-79.497828,Grocery Store
1,"Humber Bay Shores, Mimico South, New Toronto, ...",43.6122,-79.495146,Mimico Arena,43.612739,-79.498682,Skating Rink
2,"Humber Bay Shores, Mimico South, New Toronto, ...",43.6122,-79.495146,TD Canada Trust,43.613182,-79.489958,Bank
3,"Humber Bay Shores, Mimico South, New Toronto, ...",43.6122,-79.495146,7-Eleven,43.613076,-79.490047,Convenience Store
4,"Humber Bay Shores, Mimico South, New Toronto, ...",43.6122,-79.495146,Canadiana Restaurant,43.613588,-79.489666,Breakfast Spot


In [69]:
Etobicoke_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown, Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",15,15,15,15,15,15
"Alderwood, Long Branch, Alderwood, Long Branch",4,4,4,4,4,4
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe, Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe",6,6,6,6,6,6
"Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park, Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park",1,1,1,1,1,1
"Humber Bay Shores, Mimico South, New Toronto, Humber Bay Shores, Mimico South, New Toronto",8,8,8,8,8,8
"Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea, Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea",4,4,4,4,4,4
"Islington Avenue, Islington Avenue",8,8,8,8,8,8
"Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips, Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips",3,3,3,3,3,3
"Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor, Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor",17,17,17,17,17,17
"Northwest, Northwest",7,7,7,7,7,7


In [70]:
print('There are {} uniques categories.'.format(len(Etobicoke_venues['Venue Category'].unique())))

There are 49 uniques categories.


In [71]:
# one hot encoding
Etobicoke_onehot = pd.get_dummies(Etobicoke_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Etobicoke_onehot['Neighborhood'] = Etobicoke_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Etobicoke_onehot.columns[-1]] + list(Etobicoke_onehot.columns[:-1])
Etobicoke_onehot = Etobicoke_onehot[fixed_columns]

Etobicoke_onehot.head()

Unnamed: 0,Neighborhood,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bank,Beer Store,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Bus Line,Café,Caribbean Restaurant,Carpet Store,Chinese Restaurant,Coffee Shop,College Rec Center,Convenience Store,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Gas Station,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Indian Restaurant,Italian Restaurant,Liquor Store,Mattress Store,Middle Eastern Restaurant,Miscellaneous Shop,Optical Shop,Park,Pharmacy,Pizza Place,Pool,Print Shop,Pub,Restaurant,Sandwich Place,Shopping Mall,Skating Rink,Storage Facility,Sushi Restaurant,Thai Restaurant,Video Store
0,"Humber Bay Shores, Mimico South, New Toronto, ...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Humber Bay Shores, Mimico South, New Toronto, ...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,"Humber Bay Shores, Mimico South, New Toronto, ...",0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Humber Bay Shores, Mimico South, New Toronto, ...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Humber Bay Shores, Mimico South, New Toronto, ...",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [73]:
Etobicoke_onehot.shape

(80, 50)

In [74]:
Etobicoke_grouped = Etobicoke_onehot.groupby('Neighborhood').mean().reset_index()
Etobicoke_grouped

Unnamed: 0,Neighborhood,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bank,Beer Store,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Bus Line,Café,Caribbean Restaurant,Carpet Store,Chinese Restaurant,Coffee Shop,College Rec Center,Convenience Store,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Gas Station,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Indian Restaurant,Italian Restaurant,Liquor Store,Mattress Store,Middle Eastern Restaurant,Miscellaneous Shop,Optical Shop,Park,Pharmacy,Pizza Place,Pool,Print Shop,Pub,Restaurant,Sandwich Place,Shopping Mall,Skating Rink,Storage Facility,Sushi Restaurant,Thai Restaurant,Video Store
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.133333,0.0,0.0,0.066667,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667
1,"Alderwood, Long Branch, Alderwood, Long Branch",0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
3,"Cloverdale, Islington, Martin Grove, Princess ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Humber Bay Shores, Mimico South, New Toronto, ...",0.0,0.0,0.0,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
5,"Humber Bay, King's Mill Park, Kingsway Park So...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
6,"Islington Avenue, Islington Avenue",0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0
7,"Kingsview Village, Martin Grove Gardens, Richv...",0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Kingsway Park South West, Mimico NW, The Queen...",0.0,0.058824,0.0,0.058824,0.0,0.0,0.058824,0.058824,0.176471,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0
9,"Northwest, Northwest",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0


In [75]:
Etobicoke_grouped.shape

(12, 50)

In [76]:
num_top_venues = 5

for hood in Etobicoke_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Etobicoke_grouped[Etobicoke_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown, Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0         Grocery Store  0.13
1           Video Store  0.07
2  Caribbean Restaurant  0.07
3              Gym Pool  0.07
4          Liquor Store  0.07


----Alderwood, Long Branch, Alderwood, Long Branch----
                venue  freq
0                 Gym  0.25
1  Athletics & Sports  0.25
2                 Pub  0.25
3   Convenience Store  0.25
4            Pharmacy  0.00


----Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe, Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe----
                venue  freq
0  College Rec Center  0.17
1       Grocery Store  0.17
2       Shopping Mall  0.17
3   Fish & Chips Shop  0.17
4   Electronics Store  0.17


----Cloverdale, Islington, Martin Grove, Princess Gardens, We

In [77]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [79]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Etobicoke_grouped['Neighborhood']

for ind in np.arange(Etobicoke_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Etobicoke_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Video Store,Hardware Store,Caribbean Restaurant,Coffee Shop,Fast Food Restaurant,Park,Pharmacy,Pizza Place,Gym Pool
1,"Alderwood, Long Branch, Alderwood, Long Branch",Gym,Athletics & Sports,Pub,Convenience Store,Carpet Store,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",Carpet Store,Fish & Chips Shop,Shopping Mall,Electronics Store,College Rec Center,Grocery Store,Video Store,Fried Chicken Joint,Fast Food Restaurant,Eastern European Restaurant
3,"Cloverdale, Islington, Martin Grove, Princess ...",Print Shop,Video Store,Caribbean Restaurant,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant,Convenience Store,College Rec Center
4,"Humber Bay Shores, Mimico South, New Toronto, ...",Convenience Store,Grocery Store,Skating Rink,Bank,Beer Store,Breakfast Spot,Indian Restaurant,Chinese Restaurant,Fried Chicken Joint,Fish & Chips Shop


#### Clustering

In [80]:
# set number of clusters
kclusters = 5

Etobicoke_grouped_clustering = Etobicoke_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Etobicoke_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 0, 1, 0, 2, 0, 4, 2, 2], dtype=int32)

In [82]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Etobicoke_merged = Etobicoke_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
Etobicoke_merged = Etobicoke_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Etobicoke_merged.head() # check the last columns!

ValueError: cannot insert Cluster Labels, already exists

In [83]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Etobicoke_merged['Latitude'], Etobicoke_merged['Longitude'], Etobicoke_merged['Neighborhood'], Etobicoke_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [84]:
Etobicoke_merged.loc[Etobicoke_merged['Cluster Labels'] == 0, Etobicoke_merged.columns[[1] + list(range(5, Etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Etobicoke,0,Convenience Store,Grocery Store,Skating Rink,Bank,Beer Store,Breakfast Spot,Indian Restaurant,Chinese Restaurant,Fried Chicken Joint,Fish & Chips Shop
5,Etobicoke,0,Pharmacy,Park,Grocery Store,Skating Rink,Shopping Mall,Bank,Café,Caribbean Restaurant,Fast Food Restaurant,Electronics Store
7,Etobicoke,0,Carpet Store,Fish & Chips Shop,Shopping Mall,Electronics Store,College Rec Center,Grocery Store,Video Store,Fried Chicken Joint,Fast Food Restaurant,Eastern European Restaurant


In [85]:
Etobicoke_merged.loc[Etobicoke_merged['Cluster Labels'] == 1, Etobicoke_merged.columns[[1] + list(range(5, Etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Etobicoke,1,Print Shop,Video Store,Caribbean Restaurant,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant,Convenience Store,College Rec Center


In [86]:
Etobicoke_merged.loc[Etobicoke_merged['Cluster Labels'] == 2, Etobicoke_merged.columns[[1] + list(range(5, Etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Etobicoke,2,Gym,Athletics & Sports,Pub,Convenience Store,Carpet Store,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant
3,Etobicoke,2,Sushi Restaurant,Fast Food Restaurant,Italian Restaurant,Coffee Shop,Video Store,Caribbean Restaurant,Fish & Chips Shop,Electronics Store,Eastern European Restaurant,Convenience Store
4,Etobicoke,2,Burrito Place,Gym,Eastern European Restaurant,Thai Restaurant,Gym / Fitness Center,Coffee Shop,Mattress Store,Middle Eastern Restaurant,Miscellaneous Shop,Optical Shop
10,Etobicoke,2,Grocery Store,Video Store,Hardware Store,Caribbean Restaurant,Coffee Shop,Fast Food Restaurant,Park,Pharmacy,Pizza Place,Gym Pool
11,Etobicoke,2,Gym,Sandwich Place,Coffee Shop,Middle Eastern Restaurant,Restaurant,Gas Station,Storage Facility,Carpet Store,Fish & Chips Shop,Fast Food Restaurant


In [87]:
Etobicoke_merged.loc[Etobicoke_merged['Cluster Labels'] == 3, Etobicoke_merged.columns[[1] + list(range(5, Etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Etobicoke,3,Park,Pool,Video Store,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant,Convenience Store,College Rec Center,Coffee Shop


In [88]:
Etobicoke_merged.loc[Etobicoke_merged['Cluster Labels'] == 4, Etobicoke_merged.columns[[1] + list(range(5, Etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Etobicoke,4,Pizza Place,Sandwich Place,Coffee Shop,Chinese Restaurant,Video Store,Caribbean Restaurant,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant
9,Etobicoke,4,Arts & Crafts Store,Bus Line,Pizza Place,Carpet Store,Fried Chicken Joint,Fish & Chips Shop,Fast Food Restaurant,Electronics Store,Eastern European Restaurant,Convenience Store
