# Segmenting and Clustering Neighborhoods in Toronto

## Scraping Data from a Web Page

This notebook is my submission for the peer-graded assignment *Segmenting and Clustering Neighbourhoods in Toronto*. The first part scrapes the necessary data from a web page: I scrape the list of Toronto postal codes from [Wikipedia](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
and prepare it for use in the rest of the assignment.

In [1]:
import pandas as pd
import numpy as np

To get the postal codes of Toronto neighbourhoods, I first install the *lxml* library, a feature-rich and easy-to-use library for processing **XML** and **HTML** in Python. It is the first parser that `pd.read_html` tries.


In [2]:
%pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/64/28/0b761b64ecbd63d272ed0e7a6ae6e4402fc37886b59181bfdf274424d693/lxml-4.6.1-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.6.1
Note: you may need to restart the kernel to use updated packages.


`pd.read_html` searches a web page for `<table>` elements and returns a list of DataFrames, one per table found. With `header = 0`, the first row of each table is used for the column names.

In [3]:
table = pd.read_html('http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header = 0)
table[0]

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


Next, I build a DataFrame from the first scraped table, `table[0]`, keeping only the `Postal Code`, `Borough` and `Neighbourhood` columns.

In [4]:
column_n = ['Postal Code', 'Borough', 'Neighbourhood']
df = pd.DataFrame(table[0], columns = column_n)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [5]:
# Keep only rows with an assigned Borough (.copy() makes df1 independent of df before the in-place reset below)
df1 = df[df['Borough'] != 'Not assigned'].copy()
df1.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
df1.shape  # 103 rows and 3 columns

(103, 3)

In [7]:
df1.reset_index(drop = True, inplace = True)  # Reset the index after discarding rows whose Borough is not assigned
df1.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
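The column selection, filtering and re-indexing above can be sketched on a tiny hand-made table. The rows are hypothetical stand-ins for `table[0]`, and the `Extra` column exists only to show that `columns=` drops anything not listed. The `.copy()` call is an addition of this sketch; it makes `df1` an independent frame rather than a slice of `df`.

```python
import pandas as pd

# Hypothetical stand-in for table[0]; 'Extra' mimics a column we do not need.
scraped = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Regent Park, Harbourfront'],
    'Extra': [1, 2, 3],
})

# Passing `columns` with existing DataFrame data keeps only the listed columns, in that order.
df = pd.DataFrame(scraped, columns=['Postal Code', 'Borough', 'Neighbourhood'])

# Keep rows with an assigned Borough; .copy() detaches the result from df.
df1 = df[df['Borough'] != 'Not assigned'].copy()

# Renumber rows 0..n-1 and discard the old index labels.
df1.reset_index(drop=True, inplace=True)
print(df1)
```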


In [8]:
df1['Postal Code'].value_counts()  # Check for duplicate postal codes: every code appears exactly once

M4G    1
M4M    1
M1L    1
M1W    1
M1K    1
      ..
M2L    1
M6H    1
M6N    1
M3L    1
M9A    1
Name: Postal Code, Length: 103, dtype: int64

In [9]:
df1['Neighbourhood'] == 'Not assigned'  # Flag any neighbourhood that is still 'Not assigned'

0      False
1      False
2      False
3      False
4      False
       ...  
98     False
99     False
100    False
101    False
102    False
Name: Neighbourhood, Length: 103, dtype: bool
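Printing a whole boolean Series only shows its first and last few rows, so on its own it does not prove that every value is `False`. A minimal sketch of more direct checks, on a hypothetical three-row sample, reduces each question to a single boolean with `Series.is_unique` and `Series.any()`:

```python
import pandas as pd

# Hypothetical sample; in the notebook these checks would run on df1.
df1 = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront'],
})

# True when no postal code occurs more than once.
codes_unique = df1['Postal Code'].is_unique

# True when at least one neighbourhood is still 'Not assigned'.
has_unassigned = (df1['Neighbourhood'] == 'Not assigned').any()

print(codes_unique, has_unassigned)
```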

## Adding Latitude and Longitude to the DataFrame

I was unable to obtain latitude and longitude coordinates with the geocoder package, so instead I used the geospatial data file from [GeoCoordinates](https://cocl.us/Geospatial_data), saved locally as `Geospatial_Coordinates.csv`.

In [10]:
geo_data = pd.read_csv('Geospatial_Coordinates.csv', header = 0)
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
df1.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [12]:
# Left-join the coordinates onto df1, matching rows on Postal Code
new_df = pd.merge(df1, geo_data, how = 'left', on = 'Postal Code')
new_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
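A left merge silently fills `NaN` for any postal code that is missing from the coordinates file. As a sketch (the rows, including the unmatched code `M9Z`, are hypothetical), `pd.merge` can be asked to verify the join with `validate` and to flag unmatched rows with `indicator`:

```python
import pandas as pd

# Hypothetical rows standing in for df1 and geo_data.
df1 = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M9Z'],
                    'Borough': ['North York', 'North York', 'Etobicoke']})
geo_data = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                         'Latitude': [43.753259, 43.725882],
                         'Longitude': [-79.329656, -79.315572]})

# validate='one_to_one' raises MergeError if a postal code repeats on either side;
# indicator=True adds a '_merge' column recording where each row was found.
checked = pd.merge(df1, geo_data, how='left', on='Postal Code',
                   validate='one_to_one', indicator=True)

# Rows that found no coordinates are marked 'left_only'.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that all 103 postal codes received coordinates.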


In [13]:
print('Toronto has {} boroughs and {} neighbourhoods'.format(len(new_df['Borough'].unique()), new_df.shape[0]))

Toronto has 10 boroughs and 103 neighbourhoods


In [14]:
new_df['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

I want to examine Downtown Toronto and provide an analysis of this borough.

## Segmenting, mapping and clustering data for Downtown Toronto Borough

Install the necessary libraries and packages.

In [15]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.1
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.11.8  |       ha878542_0         145 KB  conda-forge
    certifi-2020.11.8          |   py36h5fab9bb_0         150 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         392 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forg

In [16]:
!conda install -c conda-forge folium=0.5.0 --yes

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 4.9.1
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    attrs-20.3.0               |     pyhd3deb0d_0          41 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    brotlipy-0.7.0             |py36he6145b8_1001         347 KB  conda-forge
    chardet-3.0.4              |py36h9880bd3_1008         194 KB  c

In [17]:
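from geopy.geocoders import Nominatim  # convert an address into latitude and longitude
import folium  # interactive maps
from sklearn.cluster import KMeans  # k-means clustering
import requests  # HTTP requests to the Foursquare API
# json_normalize is used below as pd.json_normalize; pandas.io.json.json_normalize is deprecated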

In [18]:
downtown_toronto = new_df[new_df['Borough'] == 'Downtown Toronto'].reset_index(drop = True)
downtown_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


Let's get the geographical coordinates of the Downtown Toronto borough.

In [19]:
address = 'Downtown Toronto, Toronto'
geolocator = Nominatim(user_agent = 'downtown_toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto borough are: latitude {} and longitude {}'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto borough are: latitude 43.6563221 and longitude -79.3809161


In [20]:
# Create a map of Downtown Toronto and all its neighbourhoods
dt_map = folium.Map(location = [latitude, longitude], zoom_start = 13)

# Add a marker for each neighbourhood
for lat, lng, label in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 10,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = 'purple',
        fill_opacity = .6,
        parse_html = False).add_to(dt_map)
dt_map

**Define Foursquare credentials and version**

In [21]:
client_id = '0U4M3ZHKDLOWLMYZNTHUPMWF0HUMVGG5VADKKNRULDMEGW4M'
client_secret = '3A1EPFMKK1VQ1QRTT455CMJ2EK3ZE0WQ23M3RK4AQSKYTHSX'
version = '20201101'
limit = 100
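The credentials above are hard-coded in the notebook, which exposes the client secret to anyone who reads it. A minimal sketch of an alternative reads them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials():
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')
    # Fail early with a clear message instead of sending an unauthenticated request.
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET '
                           'before running the notebook.')
    return client_id, client_secret
```

With the variables exported before starting Jupyter, the cell would become `client_id, client_secret = load_foursquare_credentials()`.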

**Let's explore the St. James Town neighbourhood of Downtown Toronto**

In [22]:
downtown_toronto.loc[3, 'Neighbourhood']

'St. James Town'

In [23]:
#Latitude and Longitude
st_james_latitude = downtown_toronto.loc[3, 'Latitude']
st_james_longitude = downtown_toronto.loc[3, 'Longitude']
name = downtown_toronto.loc[3, 'Neighbourhood']

print('Latitude and Longitude values of {} are {} and {}'.format(name, st_james_latitude, st_james_longitude))


Latitude and Longitude values of St. James Town are 43.6514939 and -79.3754179


In [24]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(client_id, client_secret,st_james_latitude, st_james_longitude, version, radius, limit)
url

'https://api.foursquare.com/v2/venues/explore?client_id=0U4M3ZHKDLOWLMYZNTHUPMWF0HUMVGG5VADKKNRULDMEGW4M&client_secret=3A1EPFMKK1VQ1QRTT455CMJ2EK3ZE0WQ23M3RK4AQSKYTHSX&ll=43.6514939,-79.3754179&v=20201101&radius=500&limit=100'
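The URL above is assembled with `str.format`, which does not escape special characters in the values. A sketch using the standard library's `urllib.parse.urlencode` builds the same query; the helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng,
                      version='20201101', radius=500, limit=100):
    """Build a v2 /venues/explore URL with every query value properly escaped."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', 43.6514939, -79.3754179)
print(url)
```

`urlencode` escapes the comma in `ll` as `%2C`, which servers normally decode back to a comma. Passing the same dictionary as `requests.get(base_url, params=params)` would have the same effect.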

In [25]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fafc4f3a5421e17cc3f6fd7'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'St. Lawrence',
  'headerFullLocation': 'St. Lawrence, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 85,
  'suggestedBounds': {'ne': {'lat': 43.6559939045, 'lng': -79.36921018606671},
   'sw': {'lat': 43.646993895499996, 'lng': -79.3816256139333}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '574ad72238fa943556d93b8e',
       'name': 'Gyu-Kaku Japanese BBQ',
       'location': {'address': '81 Church St',
        'crossStreet': 'at Adelaide St E',
        'lat': 43.651422275497914,
        'lng': -79.37504693687086,
        'labeledLatLngs':

In [26]:
# Return the name of a venue's first category, or None if it has no categories
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else: 
        return categories_list[0]['name']

Extract the relevant part of the JSON response and flatten it into a DataFrame.

In [27]:
venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)

#filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

#filter category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)

#clean columns
nearby_venues.columns = [col.split('.')[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Gyu-Kaku Japanese BBQ,Japanese Restaurant,43.651422,-79.375047
1,GEORGE Restaurant,Restaurant,43.653346,-79.374445
2,Terroni,Italian Restaurant,43.650927,-79.375602
3,Pearl Diver,Gastropub,43.651481,-79.3736
4,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719


In [28]:
print('{} nearby venues were returned by the Foursquare API'.format(nearby_venues.shape[0]))

85 nearby venues were returned by the Foursquare API


## Explore Neighbourhoods in Downtown Toronto


In [29]:
#Function for extracting data about all neighbourhoods from Downtown Toronto
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        #create the api request url
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
        #make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        #return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat, 
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results   
        ])
        
    # Flatten the per-neighbourhood lists into one list of rows
    nearby_venues = pd.DataFrame([item for neighbourhood_venues in venues_list for item in neighbourhood_venues])
    nearby_venues.columns = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return nearby_venues
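The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue with an empty `categories` list would raise an `IndexError` and abort the whole loop. A hedged sketch of a hypothetical helper, `venues_to_rows`, records `None` for such venues instead; the sample items and the neighbourhood coordinates are made up.

```python
import pandas as pd


def venues_to_rows(name, lat, lng, items):
    """Convert one neighbourhood's Foursquare 'items' into row tuples.

    Hypothetical helper: venues with an empty or missing 'categories' list
    get None as their category instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Made-up items in the shape returned by the explore endpoint.
items = [
    {'venue': {'name': 'Corner Grocer', 'categories': [{'name': 'Grocery Store'}],
               'location': {'lat': 43.6690, 'lng': -79.4226}}},
    {'venue': {'name': 'Mystery Spot', 'categories': [],
               'location': {'lat': 43.6692, 'lng': -79.4230}}},
]
rows = venues_to_rows('Christie', 43.67, -79.42, items)
venues_df = pd.DataFrame(rows, columns=['Neighbourhood', 'Neighbourhood Latitude',
                                        'Neighbourhood Longitude', 'Venue',
                                        'Venue Latitude', 'Venue Longitude',
                                        'Venue Category'])
print(venues_df[['Venue', 'Venue Category']])
```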

In [30]:
downtown_toronto_venues = getNearbyVenues(names = downtown_toronto['Neighbourhood'], 
                                          latitudes = downtown_toronto['Latitude'], 
                                          longitudes = downtown_toronto['Longitude'])
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head(10)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
(1248, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
5,"Regent Park, Harbourfront",43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub
6,"Regent Park, Harbourfront",43.65426,-79.360636,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
7,"Regent Park, Harbourfront",43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park
8,"Regent Park, Harbourfront",43.65426,-79.360636,The Extension Room,43.653313,-79.359725,Gym / Fitness Center
9,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


In [31]:
downtown_toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,55,55,55,55,55,55
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,68,68,68,68,68,68
Christie,16,16,16,16,16,16
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",74,74,74,74,74,74


In [32]:
print('There are {} unique venue categories in Downtown Toronto.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 212 unique venue categories in Downtown Toronto.


## Analyze each neighbourhood

In [33]:
#one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix = '', prefix_sep = '')

#add Neighbourhood column back to the dataframe
downtown_toronto_onehot['Neighbourhood'] = downtown_toronto_venues['Neighbourhood']

#move Neighbourhood column to the front of dataframe
fixed_col = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])

downtown_toronto_onehot = downtown_toronto_onehot[fixed_col]                                                        

downtown_toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
downtown_toronto_onehot.shape

(1248, 213)

In [35]:
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighbourhood').mean().reset_index()
downtown_toronto_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0625,0.0625,0.0625,0.125,0.125,0.0625,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.014706
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,...,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.026667
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.054054,0.0,0.040541,0.013514,0.0,0.0


In [36]:
downtown_toronto_grouped.shape

(19, 213)
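Averaging the one-hot columns per neighbourhood gives, for each category, the fraction of that neighbourhood's venues that fall into it, so each row of `downtown_toronto_grouped` sums to 1. A small sketch with hypothetical neighbourhoods `A` and `B`; `dtype=int` is added because recent pandas versions otherwise return boolean dummy columns.

```python
import pandas as pd

# Hypothetical venues for two neighbourhoods.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Park', 'Park'],
})

# One 0/1 column per category; a Series input gives unprefixed column names.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot['Neighbourhood'] = venues['Neighbourhood']

# Mean of a 0/1 column within a group = share of that group's venues in the category.
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
```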

In [37]:
#print each neighbourhood with top five venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighbourhood']:
    print('____'+hood+'____')
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')
    

____Berczy Park____
                venue  freq
0         Coffee Shop  0.09
1  Seafood Restaurant  0.04
2         Cheese Shop  0.04
3        Cocktail Bar  0.04
4            Beer Bar  0.04


____CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport____
             venue  freq
0   Airport Lounge  0.12
1  Airport Service  0.12
2            Plane  0.06
3          Airport  0.06
4         Boutique  0.06


____Central Bay Street____
                venue  freq
0         Coffee Shop  0.18
1                Café  0.06
2  Italian Restaurant  0.04
3      Sandwich Place  0.04
4     Bubble Tea Shop  0.03


____Christie____
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3     Restaurant  0.06
4     Baby Store  0.06


____Church and Wellesley____
                 venue  freq
0          Coffee Shop  0.09
1  Japanese Restaurant  0.05
2              Gay Bar  0.05
3     Sushi Restaurant  0.05
4           Res

**Let's put that into a pandas DataFrame**

In [38]:
# Return the names of the num_top_venues most frequent venue categories in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    return row_categories_sorted.index.values[0:num_top_venues]
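`return_most_common_venues` sorts the whole row only to keep its first `num_top_venues` entries. A sketch of an equivalent using `Series.nlargest` follows; the name `top_categories` is hypothetical and the sample frequencies are the rounded Christie values printed earlier. The `astype(float)` matters: a row taken from a frame that also holds the neighbourhood name has `object` dtype, and `nlargest` refuses that dtype.

```python
import pandas as pd


def top_categories(row, n):
    """Return the n category names with the highest frequency in one grouped row.

    Assumes the first entry of `row` is the neighbourhood name, as in
    downtown_toronto_grouped.
    """
    frequencies = row.iloc[1:].astype(float)
    # nlargest keeps the first occurrence on ties (keep='first' by default).
    return frequencies.nlargest(n).index.tolist()


row = pd.Series({'Neighbourhood': 'Christie', 'Grocery Store': 0.25,
                 'Café': 0.19, 'Park': 0.12, 'Baby Store': 0.06})
print(top_categories(row, 3))
```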

Let's create a new DataFrame with the top 10 venues for each neighbourhood.

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create column names according to the number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most common venue'.format(ind + 1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most common venue'.format(ind + 1))   
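The `try`/`except` approach works for ten columns, but the suffix list only covers 1 to 3, so larger counts would be mislabelled (21 would become `21th`). A sketch of a general English ordinal helper, with the hypothetical name `ordinal`, that also handles the irregular 11th to 13th:

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighbourhood'] + ['{} Most common venue'.format(ordinal(i + 1))
                               for i in range(num_top_venues)]
print(columns)
```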

In [40]:
#create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns = columns)
neighbourhoods_venues_sorted['Neighbourhood'] = downtown_toronto_grouped['Neighbourhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,Berczy Park,Coffee Shop,Cheese Shop,Bakery,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Breakfast Spot,Shopping Mall
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Harbor / Marina,Bar,Coffee Shop,Plane,Rental Car Location,Sculpture Garden,Boutique,Boat or Ferry
2,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Salad Place,Department Store,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Thai Restaurant
3,Christie,Grocery Store,Café,Park,Candy Store,Restaurant,Italian Restaurant,Baby Store,Athletics & Sports,Nightclub,Coffee Shop
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Pub,Men's Store,Mediterranean Restaurant,Hotel,Yoga Studio
5,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym,Seafood Restaurant,Japanese Restaurant,American Restaurant,Deli / Bodega,Cocktail Bar
6,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Restaurant,Gym,Japanese Restaurant,Salad Place,Seafood Restaurant,Asian Restaurant,Steakhouse
7,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Cosmetics Shop,Bubble Tea Shop,Japanese Restaurant,Bookstore,Diner,Lingerie Store,Italian Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Aquarium,Hotel,Café,Fried Chicken Joint,Scenic Lookout,Pizza Place,Brewery,Restaurant,Park
9,"Kensington Market, Chinatown, Grange Park",Bar,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Café,Vietnamese Restaurant,Park,Gaming Cafe,Dumpling Restaurant,Burger Joint


## Cluster Neighbourhoods

In [41]:
# Set the number of clusters
kclusters = 5

# Drop the name column so that only the category frequencies are clustered
downtown_toronto_grouped_clust = downtown_toronto_grouped.drop(columns = 'Neighbourhood')

kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(downtown_toronto_grouped_clust)
kmeans.labels_[0 : 10]


array([1, 2, 1, 4, 1, 1, 1, 1, 1, 1], dtype=int32)
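The notebook fixes `kclusters = 5` without justifying the choice. One common sanity check, swapped in here and not part of the original analysis, is the mean silhouette score for a range of `k`; values closer to 1 indicate tighter, better-separated clusters. A sketch on hypothetical toy data with three obvious groups (`silhouette_by_k` is a hypothetical helper name):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit KMeans for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Hypothetical toy data: three tight, well-separated groups of three points.
X = np.array([[0, 0], [0, 0.1], [0.1, 0],
              [5, 5], [5, 5.1], [5.1, 5],
              [10, 0], [10, 0.1], [10.1, 0]])
scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data this would be `silhouette_by_k(downtown_toronto_grouped_clust, range(2, 10))`; with only 19 neighbourhoods the scores are likely to be noisy, so they are a hint rather than a verdict.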

In [50]:
#add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labeles', kmeans.labels_)
downtown_toronto_merged = downtown_toronto 

# Join the cluster labels and top venues onto each neighbourhood (downtown_toronto already holds the latitude and longitude)
downtown_toronto_merged = downtown_toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Neighbourhood')
downtown_toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Shoe Store,Brewery,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Yoga Studio,Portuguese Restaurant,Italian Restaurant,Smoothie Shop,Beer Bar,Sandwich Place,Restaurant,Distribution Center,Diner
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Bubble Tea Shop,Japanese Restaurant,Bookstore,Diner,Lingerie Store,Italian Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Restaurant,Cocktail Bar,Gastropub,Beer Bar,American Restaurant,Clothing Store,Cosmetics Shop,Seafood Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cheese Shop,Bakery,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Breakfast Spot,Shopping Mall


In [51]:
# matplotlib modules used to build the cluster color palette
import matplotlib.cm as cm
import matplotlib.colors as colors


In [55]:
# Create a map of the clusters
map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 12)

# Color scheme: one distinct rainbow color per cluster label (0 .. kclusters - 1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add a marker per neighbourhood, colored by its cluster label
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighbourhood'], downtown_toronto_merged['Cluster Labeles']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster],
        fill = True,
        fill_color = rainbow[cluster],
        fill_opacity = .7).add_to(map_clusters)
    
map_clusters

## Examine clusters

Cluster 1 (label 0)

In [56]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labeles'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,Downtown Toronto,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Shoe Store,Brewery,Restaurant
1,Downtown Toronto,0,Coffee Shop,Yoga Studio,Portuguese Restaurant,Italian Restaurant,Smoothie Shop,Beer Bar,Sandwich Place,Restaurant,Distribution Center,Diner


Cluster 2 (label 1)

In [57]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labeles'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))] ]

Unnamed: 0,Borough,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
2,Downtown Toronto,1,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Bubble Tea Shop,Japanese Restaurant,Bookstore,Diner,Lingerie Store,Italian Restaurant
3,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Cocktail Bar,Gastropub,Beer Bar,American Restaurant,Clothing Store,Cosmetics Shop,Seafood Restaurant
4,Downtown Toronto,1,Coffee Shop,Cheese Shop,Bakery,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Breakfast Spot,Shopping Mall
5,Downtown Toronto,1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Salad Place,Department Store,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Thai Restaurant
7,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Gym,Bar,Hotel,Clothing Store,Thai Restaurant,Steakhouse,Office
8,Downtown Toronto,1,Coffee Shop,Aquarium,Hotel,Café,Fried Chicken Joint,Scenic Lookout,Pizza Place,Brewery,Restaurant,Park
9,Downtown Toronto,1,Coffee Shop,Hotel,Restaurant,Café,American Restaurant,Seafood Restaurant,Salad Place,Japanese Restaurant,Bakery,Asian Restaurant
10,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,Gym,Seafood Restaurant,Japanese Restaurant,American Restaurant,Deli / Bodega,Cocktail Bar
11,Downtown Toronto,1,Café,Bookstore,Sandwich Place,Bar,Japanese Restaurant,Bakery,Yoga Studio,Italian Restaurant,Beer Bar,Beer Store
12,Downtown Toronto,1,Bar,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Café,Vietnamese Restaurant,Park,Gaming Cafe,Dumpling Restaurant,Burger Joint


Cluster 3 (label 2)

In [58]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labeles'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))] ]

Unnamed: 0,Borough,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
13,Downtown Toronto,2,Airport Lounge,Airport Service,Harbor / Marina,Bar,Coffee Shop,Plane,Rental Car Location,Sculpture Garden,Boutique,Boat or Ferry


Cluster 4 (label 3)

In [59]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labeles'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))] ]

Unnamed: 0,Borough,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
14,Downtown Toronto,3,Park,Playground,Trail,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


Cluster 5 (label 4)

In [60]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labeles'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))] ]

Unnamed: 0,Borough,Cluster Labeles,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
6,Downtown Toronto,4,Grocery Store,Café,Park,Candy Store,Restaurant,Italian Restaurant,Baby Store,Athletics & Sports,Nightclub,Coffee Shop
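Instead of one `.loc` query per label, cluster membership can be summarised in a single table with a grouped named aggregation. A sketch on a hypothetical four-row excerpt of `downtown_toronto_merged` (labels copied from the tables above); neighbourhood names are joined with `'; '` because several names already contain commas.

```python
import pandas as pd

# Hypothetical excerpt of downtown_toronto_merged; labels copied from the cluster tables.
merged = pd.DataFrame({
    'Neighbourhood': ['Regent Park, Harbourfront', 'Garden District, Ryerson',
                      'Berczy Park', 'Christie'],
    'Cluster Labeles': [0, 1, 1, 4],
    '1st Most common venue': ['Coffee Shop', 'Clothing Store', 'Coffee Shop',
                              'Grocery Store'],
})

# One row per cluster label: member count, member names, most frequent top venue.
summary = merged.groupby('Cluster Labeles').agg(
    n_neighbourhoods=('Neighbourhood', 'size'),
    members=('Neighbourhood', lambda s: '; '.join(s)),
    top_venue=('1st Most common venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```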
