# __Segmenting and Clustering Neighborhoods in Toronto__ Part 1


<br/>

## *Web Scraping*

<br/>

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

<br/>

1. Start by creating a new Notebook for this assignment.

<br/>

2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

<br/>

![alt text](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1581984000000&hmac=aqqnfeTZdyKUZ-RkUdcZZEunf_3-V_IR0cy_wrB4KTw)

<br/>

3. To create the above dataframe:

<br/>

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is __Not assigned.__
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that __M5A__ is listed twice and has two neighborhoods: __Harbourfront__ and __Regent Park__. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in __row 11__ in the above table.
- If a cell has a borough but a __Not assigned neighborhood__, then the neighborhood will be the same as the borough. So for the __9th__ cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be __Queen's Park.__
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the __.shape__ method to print the number of rows of your dataframe.
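
The rules in the list above can be sketched with pandas on a tiny table. This is only an illustration under stated assumptions: the four sample rows are hypothetical stand-ins for the Wikipedia table, and the variable names (`raw`, `assigned`, `clean`) are not part of the assignment.

```python
import pandas as pd

# Hypothetical sample rows mimicking the Wikipedia table layout (not the real data).
rows = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M5A", "Downtown Toronto", "Harbourfront"),
    ("M5A", "Downtown Toronto", "Regent Park"),
    ("M7A", "Queen's Park", "Not assigned"),
]
raw = pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighborhood"])

# Rule 1: keep only rows with an assigned borough.
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: an unassigned neighborhood takes the borough's name.
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# Rule 3: combine neighborhoods that share a postal code into one comma-separated row.
clean = (
    assigned.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(clean)
print(clean.shape)
```

Grouping with `sort=False` keeps the postal codes in their order of first appearance, which makes the result easier to compare with the source table.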

<br/>

4. Submit a link to your Notebook on your GitHub repository. __(10 marks)__

<br/>

__Note__: *There are different website scraping libraries and packages in Python. For scraping the above table, you can simply use pandas to read the table into a pandas dataframe.*

*Another way, which would help you learn more complicated cases of web scraping, is to use the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/*

*The package is so popular that there is a plethora of tutorials and examples on how to use it. Here is a very good YouTube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k*

*Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.*
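
As one illustration of "any other way", the sketch below reads a table with only Python's standard-library `html.parser`. It is not the approach used later in this notebook; the `TableParser` class and the HTML fragment are hypothetical and exist only to show how rows and cells can be collected.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical HTML fragment shaped like the postal-code table (not fetched from Wikipedia).
html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
parser.close()
header, *body = parser.rows
print(header)
print(body)
```

Stripping each cell removes the trailing newline that the last column carries, which is the same clean-up the notebook performs with `.replace('\n', '')`.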

<br/>
<br/>

___

<br/>

## Install Newest Version of Pandas for new functionalities

<br/>

In [0]:
!pip3 -q install pandas==1.0.1

<br/>

## Import Necessary Packages for Web Scraping

<br/>

In [49]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
print("Pandas Version: " + pd.__version__)

Pandas Version: 1.0.1


<br/>

## Scrape the List of Postal Codes of Canada using __Beautiful Soup__ with *html.parser*

<br/>

In [0]:
url = "http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
postal_codes_canada  = requests.get(url)
soup = BeautifulSoup(postal_codes_canada.text, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody

<br/>

## Find Column Headers of Table

<br/>

In [51]:
column_headers = table.find_all('th')
column_headers = [c.text.replace('\n', '') for c in column_headers]
print(column_headers)

['Postcode', 'Borough', 'Neighbourhood']


<br/>

## Set Column Headers of Table to a new DataFrame: __df_postal_codes__. Then, print the column headers.

<br/>

In [52]:
df_postal_codes = pd.DataFrame(columns = column_headers)
df_postal_codes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood


<br/>

## Scrape for the column data in the table. Append the scraped data into the DataFrame and organize the DataFrame based on the directions in __STEP 3__.

<br/>

In [0]:
# Note: DataFrame.append works with the pinned pandas 1.0.1 but was removed in pandas 2.0.
num_rows = table.find_all('tr')
for i in range(1, len(num_rows)):  # start at 1 to skip the header row
  rows = num_rows[i].find_all('td')
  row = [rows[0].text, rows[1].text, rows[2].text.replace('\n','')]
  if rows[2].text.replace('\n','') == 'Not assigned' and rows[1].text != 'Not assigned':
    # Borough assigned but neighbourhood missing: use the borough as the neighbourhood
    row = [rows[0].text, rows[1].text, rows[1].text]
    df_postal_codes = df_postal_codes.append(pd.Series(row, index = column_headers), ignore_index = True)
  elif rows[1].text != 'Not assigned' and rows[2].text.replace('\n','') != 'Not assigned':
    row = [rows[0].text, rows[1].text, rows[2].text.replace('\n','')]
    df_postal_codes = df_postal_codes.append(pd.Series(row, index = column_headers), ignore_index = True)
    if i != 1 and row[0] == row_prev[0]:
      # Same Postcode as the previous row: also append a combined row (duplicates are dropped in the next step)
      row = [rows[0].text, rows[1].text, rows[2].text.replace('\n','') + ", " + row_prev[2]]
      df_postal_codes = df_postal_codes.append(pd.Series(row, index = column_headers), ignore_index = True)
  row_prev = row

<br/>

## Remove Duplicate Rows with the same __Postcode__ based on directions in __STEP 3__.

<br/>

In [0]:
df_postal_codes.drop_duplicates(subset = ['Postcode'], keep = 'last', inplace = True, ignore_index = True)
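
A small sketch of why `keep='last'` is used here: the loop above appends single-neighbourhood rows first and the combined row last, so keeping the last row per __Postcode__ retains the combined one. The sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the single-neighbourhood row comes first,
# then the combined row for the same Postcode.
df = pd.DataFrame(
    {
        "Postcode": ["M4A", "M6A", "M6A"],
        "Borough": ["North York", "North York", "North York"],
        "Neighbourhood": [
            "Victoria Village",
            "Lawrence Heights",
            "Lawrence Manor, Lawrence Heights",
        ],
    }
)

# keep="last" retains the final (combined) row for each Postcode;
# ignore_index=True renumbers the remaining rows from 0.
deduped = df.drop_duplicates(subset=["Postcode"], keep="last", ignore_index=True)
print(deduped)
```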

<br/>

## Print First 20 Rows in the DataFrame, __df_postal_codes__.

<br/>

In [55]:
df_postal_codes.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


<br/>

## Print Number of Rows using the __.shape__ attribute based on directions in __STEP 3__.

<br/>

In [56]:
df_postal_codes.shape

(103, 3)

<br/>

## Convert DataFrame to __.csv__ File to use in the next portion of the assignment.

<br/>

In [0]:
df_postal_codes.to_csv("postal_codes_canada_m.csv")
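
The sketch below illustrates the CSV round trip used between the parts of this assignment. It writes to an in-memory buffer instead of a file, and the two-row DataFrame is a hypothetical sample.

```python
import io
import pandas as pd

df = pd.DataFrame({"Postcode": ["M3A", "M4A"], "Borough": ["North York", "North York"]})

# to_csv writes the index as an unnamed first column by default.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back with index_col=0 turns that first column into the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```

Without `index_col=0`, the saved index would come back as an extra data column, which is why the later cells pass it to `pd.read_csv`.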

<br/>

# __Segmenting and Clustering Neighborhoods in Toronto__ Part 2

<br/>

## *Adding Latitude and Longitude Coordinates to Postal Codes*

<br/>


Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

<br/>


In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

<br/>




## Geocoder did not work

<br/>

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
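
One way to cope with an unreliable lookup is to retry a few times and fall back to the csv file if no result arrives. The sketch below is an assumption-laden illustration: `coords_with_retry` and `flaky_lookup` are hypothetical helpers, and no real Geocoder call is made.

```python
import time

def coords_with_retry(postal_code, lookup, attempts=3, delay=0.0):
    """Call lookup(postal_code) until it returns a (lat, lng) pair or attempts run out."""
    for _ in range(attempts):
        result = lookup(postal_code)
        if result is not None:
            return result
        time.sleep(delay)
    return None  # the caller should fall back to the csv file

# Hypothetical stand-in for a geocoding call that fails twice before succeeding.
calls = {"count": 0}

def flaky_lookup(postal_code):
    calls["count"] += 1
    return (43.753259, -79.329656) if calls["count"] >= 3 else None

print(coords_with_retry("M3A", flaky_lookup))
```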

<br/>

Use the Geocoder package or the csv file to create the following dataframe:

<br/>

![alt text](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/HZ3jNHNOEeiMwApe4i-fLg_f44f0f10ccfaf42fcbdba9813364e173_Screen-Shot-2018-06-18-at-7.18.16-PM.png?expiry=1582070400000&hmac=pJ5WPxucteThdms8RjVnATQ9zADCb6v4WwZSBiAQJYY)

<br/>

Important Note: There is a limit on how many times you can call the geocoder.google function: 2500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

<br/>

Once you are able to create the above dataframe, submit a link to the new Notebook on your GitHub repository. __(2 marks)__

<br/>

___

<br/>

## Install Newest Version of Pandas for new functionalities

<br/>

In [0]:
!pip3 -q install pandas==1.0.1

<br/>

## Import Necessary Packages

<br/>

In [59]:
import pandas as pd
import numpy as np
from google.colab import files


print("Pandas Version: " + pd.__version__)

Pandas Version: 1.0.1


<br/>

## Optional: Upload the CSV File Created in the Previous Part of the Assignment

<br/>

In [60]:
dataframe = files.upload()

<br/>

## Use Pandas to read CSV and set it to DataFrame: __df_postal_codes__

<br/>

In [61]:
df_postal_codes = pd.read_csv("postal_codes_canada_m.csv", index_col=0)
df_postal_codes

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"Old Mill North, Montgomery Road, The Kingsway"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Sunnylea, Royal York South East, The Queensway..."


<br/>

## Read the Geospatial Data CSV and Print the First 20 Rows of the DataFrame, __df_geo_coords__.

<br/>

In [62]:
df_geo_coords = pd.read_csv("http://cocl.us/Geospatial_data")
df_geo_coords.head(20)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


<br/> 

## Retrieve Latitude and Longitude Values from DataFrame, __df_geo_coords__.

<br/>

In [0]:
latitude = np.zeros(len(df_postal_codes))
longitude = np.zeros(len(df_postal_codes))

# Look up each postal code in df_geo_coords; n is the row position in df_postal_codes,
# so coordinates stay aligned even if a postal code has no match.
for n, code in enumerate(df_postal_codes['Postcode']):
  match = df_geo_coords.loc[df_geo_coords['Postal Code'] == code]
  if not match.empty:
    latitude[n] = match['Latitude'].iloc[0]
    longitude[n] = match['Longitude'].iloc[0]
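
An alternative to the explicit loop is a left join on the postal code. The sketch below assumes miniature, hypothetical versions of the two DataFrames; `M0Z` is a made-up code included to show what happens when there is no match.

```python
import pandas as pd

# Hypothetical miniature versions of the two DataFrames used above.
postal = pd.DataFrame({"Postcode": ["M3A", "M4A", "M0Z"]})
geo = pd.DataFrame(
    {
        "Postal Code": ["M4A", "M3A"],
        "Latitude": [43.725882, 43.753259],
        "Longitude": [-79.315572, -79.329656],
    }
)

# A left join keeps every row of `postal` and aligns coordinates by key,
# leaving NaN where a postal code has no match.
merged = postal.merge(geo, how="left", left_on="Postcode", right_on="Postal Code")
merged = merged.drop(columns="Postal Code")
print(merged)
```

Because the join matches on the key rather than on row position, the order of the rows in the coordinates file does not matter.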

<br/>

## Add Latitude and Longitude Values to DataFrame, __df_postal_codes__, as new Columns

<br/>

In [0]:
df_postal_codes['Latitude'] = latitude
df_postal_codes['Longitude'] = longitude

<br/>

## Display the Updated DataFrame, __df_postal_codes__.

<br/>

In [65]:
df_postal_codes

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"Old Mill North, Montgomery Road, The Kingsway",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Sunnylea, Royal York South East, The Queensway...",43.636258,-79.498509


<br/>

## Convert DataFrame to __.csv__ File to use in the next portion of the assignment.

<br/>

In [0]:
df_postal_codes.to_csv("postal_codes_canada_latlng.csv")

<br/>

# __Segmenting and Clustering Neighborhoods in Toronto__ Part 3

<br/>

## *Exploring and Clustering Neighborhoods in Toronto*

<br/>

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

<br/>

Just make sure:

<br/>

- To add enough Markdown cells to explain what you decided to do and to report any observations you make.

- To generate maps to visualize your neighborhoods and how they cluster together.

<br/>

Once you are happy with your analysis, submit a link to the new Notebook on your GitHub repository. __(3 marks)__

<br/>

___

<br/>

## Install Newest Version of Pandas for new functionalities

<br/>

In [67]:
!pip3 -q install pandas==1.0.1


<br/> 

## Import Necessary Packages

<br/>

In [68]:
import pandas as pd
import numpy as np
from google.colab import files
import folium


print("Pandas Version: " + pd.__version__)

Pandas Version: 1.0.1


<br/>

## Optional: Upload the CSV File Created in the Previous Part of the Assignment

<br/>

In [69]:
dataframe = files.upload()

<br/>

## Use Pandas to read CSV and set it to DataFrame: __df_postal_codes__

<br/>

In [70]:
df_postal_codes = pd.read_csv("postal_codes_canada_latlng.csv", index_col=0)
df_postal_codes

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"Old Mill North, Montgomery Road, The Kingsway",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Sunnylea, Royal York South East, The Queensway...",43.636258,-79.498509


<br/>

## Use __.unique()__ to count the boroughs and __len()__ to count the neighborhood entries in Toronto.

<br/>

In [71]:
unique_boroughs = df_postal_codes['Borough'].unique()
print("There are", len(unique_boroughs), "Boroughs.")
print("There are", len(df_postal_codes['Neighbourhood']), "Neighborhoods; however, many of these overlap with each other.")

There are 11 Boroughs.
There are 103 Neighborhoods; however, many of these overlap with each other.
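
Since several rows hold comma-separated neighborhood names, `len()` counts rows rather than distinct neighborhoods. A hedged sketch of one way to count distinct names, using a hypothetical three-row sample in the same shape as __df_postal_codes__:

```python
import pandas as pd

# Hypothetical sample in the same shape as df_postal_codes.
df = pd.DataFrame(
    {
        "Borough": ["Scarborough", "Downtown Toronto", "Downtown Toronto"],
        "Neighbourhood": ["Malvern, Rouge", "Harbourfront", "Garden District, Ryerson"],
    }
)

# Number of distinct boroughs.
print(df["Borough"].nunique())

# Split each comma-separated cell into one row per neighbourhood, then count distinct names.
names = df["Neighbourhood"].str.split(", ").explode()
print(names.nunique())
```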


<br/> 

## Create a Folium Map of Toronto and add Markers to show the Location of the Neighborhoods

<br/>


In [0]:
toronto_lat_lng = [43.6532, -79.3832]

map_toronto = folium.Map(location = toronto_lat_lng, zoom_start = 12)

# Add one circle marker per postal code, labelled with its neighbourhood(s) and borough
for lat, lng, neighbourhood, borough in zip(df_postal_codes['Latitude'], df_postal_codes['Longitude'],
                                            df_postal_codes['Neighbourhood'], df_postal_codes['Borough']):
  label = folium.Popup('{}, {}'.format(neighbourhood, borough), parse_html=True)
  folium.CircleMarker(
    location=[lat, lng],
    radius=8,
    popup=label,
    color='green',
    fill=True,
    fill_color='blue',
    fill_opacity=0.5
  ).add_to(map_toronto)

map_toronto

<br/>

## Define Foursquare Credentials and Version

<br/>

In [73]:
CLIENT_ID = 'QKJXTMWEPIBSRFF321OXJU0YO1CKOELO3TET4QRR14NMWTNT' 
CLIENT_SECRET = 'NRFEGIB5AYPEXBVKDRBHOYRSZK3LPENSYSRRRYGJP2NLECK3' 
VERSION = '20180605'

print('Foursquare credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Foursquare credentials:
CLIENT_ID: QKJXTMWEPIBSRFF321OXJU0YO1CKOELO3TET4QRR14NMWTNT
CLIENT_SECRET:NRFEGIB5AYPEXBVKDRBHOYRSZK3LPENSYSRRRYGJP2NLECK3
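
Printing credentials in a notebook that is shared publicly exposes them. A minimal sketch of an alternative, assuming the credentials are stored in environment variables: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, and the demonstration uses placeholder values.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are hypothetical)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters so the full value never lands in cell output."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demonstration with placeholder values instead of real credentials.
client_id, client_secret = load_credentials(
    {"FOURSQUARE_CLIENT_ID": "EXAMPLE_ID", "FOURSQUARE_CLIENT_SECRET": "EXAMPLE_SECRET"}
)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```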


<br/>

## Explore Neighborhoods in Toronto

<br/>

In [0]:
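import requests  # used for the API calls below; not imported in this part's import cell

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    LIMIT = 100  # maximum number of venues requested per location
    venues_list=[]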
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
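
The function above indexes straight into the JSON response, so a venue with an empty category list would raise an `IndexError`. The sketch below shows a more defensive extraction on a hypothetical response that only mimics the nesting used above; `extract_venues` is an illustrative helper, and the second venue is made up.

```python
# Hypothetical response shaped like the JSON that getNearbyVenues indexes into above.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Brookbanks Park",
                            "location": {"lat": 43.751976, "lng": -79.332140},
                            "categories": [{"name": "Park"}],
                        }
                    },
                    {
                        "venue": {
                            "name": "Uncategorized Venue",  # made-up entry
                            "location": {"lat": 43.75, "lng": -79.33},
                            "categories": [],
                        }
                    },
                ]
            }
        ]
    }
}

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, using None when a venue has no category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

print(extract_venues(sample_response))
print(extract_venues({"response": {}}))
```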

<br/>

## Get Nearby Venues of Toronto

<br/>

In [93]:
toronto_venues = getNearbyVenues(names=df_postal_codes['Neighbourhood'],
                                   latitudes=df_postal_codes['Latitude'],
                                   longitudes=df_postal_codes['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park
Queen's Park
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Port Union, Rouge Hill, Highland Creek
Don Mills South, Flemingdon Park
Woodbine Heights
St. James Town
Humewood-Cedarvale
Old Burnhamthorpe, Markland Wood, Eringate, Bloordale Gardens
West Hill, Morningside, Guildwood
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Wilson Heights, Downsview North, Bathurst Manor
Thorncliffe Park
Richmond, King, Adelaide
Dufferin, Dovercourt Village
Scarborough Village
Oriole, Henry Farm, Fairview
York University, Northwood Park
East Toronto
Union Station, Toronto Islands, Harbourfront East
Trinity, Little Portugal
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East, CFB Toronto
Riverdale, The Danfort

<br/>

## Display the DataFrame, __toronto_venues__.

<br/>

In [94]:
toronto_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
2220,"South of Bloor, Royal York South West, The Que...",43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon
2221,"South of Bloor, Royal York South West, The Que...",43.628841,-79.520999,Once Upon A Child,43.631075,-79.518290,Kids Store
2222,"South of Bloor, Royal York South West, The Que...",43.628841,-79.520999,Value Village,43.631269,-79.518238,Thrift / Vintage Store
2223,"South of Bloor, Royal York South West, The Que...",43.628841,-79.520999,Kingsway Boxing Club,43.627254,-79.526684,Gym


<br/>

## Check how many venues were retrieved using __.shape__

<br/>

In [95]:
print("There were", toronto_venues.shape[0], "venues retrieved.")

There were 2225 venues retrieved.


<br/> 

## Count the Venues in Each Neighborhood

<br/>

In [97]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
Bayview Village,4,4,4,4,4,4
Berczy Park,56,56,56,56,56,56
Business Reply Mail Processing Centre 969 Eastern,18,18,18,18,18,18
Caledonia-Fairbanks,4,4,4,4,4,4
...,...,...,...,...,...,...
Woodbine Heights,9,9,9,9,9,9
York Mills West,5,5,5,5,5,5
"York Mills, Silver Hills",1,1,1,1,1,1
"York University, Northwood Park",7,7,7,7,7,7


<br/>

## One-Hot Encode the Venue Categories in __toronto_venues__

<br/>

In [98]:
toronto_venues_fix = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_venues_fix['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_venues_fix.columns[-1]] + list(toronto_venues_fix.columns[:-1])
toronto_venues_fix = toronto_venues_fix[fixed_columns]

toronto_venues_fix.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,...,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Social Club,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
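
To make the one-hot step easier to follow, here is a sketch on three hypothetical venues. The `dtype=int` argument is an addition for readability (0/1 instead of booleans in recent pandas versions) and is not used in the cell above.

```python
import pandas as pd

# Hypothetical venues; the category names also appear in toronto_venues.
venues = pd.DataFrame(
    {
        "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
        "Venue Category": ["Park", "Bus Stop", "Coffee Shop"],
    }
)

# One indicator column per category; the empty prefix keeps the bare category names.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the neighborhood name back as the first column.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```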


<br/>

## Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

<br/>

In [101]:
toronto_grouped = toronto_venues_fix.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,...,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Social Club,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.200000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
1,Bayview Village,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.25,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
2,Berczy Park,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.017857,0.0,0.000000,0.0,0.0,0.000000,0.017857,0.0,0.017857,0.035714,0.00,0.000000,0.0,0.0,0.0,0.017857,0.017857,0.0,0.035714,0.000000,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,...,0.017857,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.017857,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
3,Business Reply Mail Processing Centre 969 Eastern,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.055556,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.055556,0.000000,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
4,Caledonia-Fairbanks,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Woodbine Heights,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.111111,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.111111,0.000000,0.0,0.0,0.0,0.0,0.111111,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.0,0.0,0.00
96,York Mills West,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.20,0.200000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
97,"York Mills, Silver Hills",0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
98,"York University, Northwood Park",0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.142857,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
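
The grouped means can be read as category shares: within a neighborhood, the mean of a 0/1 column is the fraction of that neighborhood's venues in the category. A small sketch with hypothetical one-hot rows:

```python
import pandas as pd

# Hypothetical one-hot rows: two venues in Parkwoods, one in Victoria Village.
onehot = pd.DataFrame(
    {
        "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
        "Park": [1, 0, 0],
        "Bus Stop": [0, 1, 0],
        "Coffee Shop": [0, 0, 1],
    }
)

# The mean of each indicator column per group gives the category's share of venues.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of shares sums to 1, which is why neighborhoods with very few venues show large values such as 0.25 or 0.50.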


<br/>

## Print each neighborhood along with the top 5 most common venues

<br/>

In [102]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge   0.2
1               Skating Rink   0.2
2  Latin American Restaurant   0.2
3             Breakfast Spot   0.2
4             Clothing Store   0.2


----Bayview Village----
                 venue  freq
0                 Café  0.25
1                 Bank  0.25
2  Japanese Restaurant  0.25
3   Chinese Restaurant  0.25
4          Yoga Studio  0.00


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1          Steakhouse  0.04
2              Bakery  0.04
3  Seafood Restaurant  0.04
4         Cheese Shop  0.04


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.11
1         Yoga Studio  0.06
2       Auto Workshop  0.06
3         Pizza Place  0.06
4          Restaurant  0.06


----Caledonia-Fairbanks----
                venue  freq
0                Park  0.50
1       Women's Store  0.25
2                Pool  0.25
3       Lugg

<br/> 

## Create function __return_most_common_venues__ to sort the venues in descending order

<br/>

In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
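
A quick check of the function on a single hypothetical row laid out like a row of __toronto_grouped__ (the name first, then the category frequencies). The function is repeated so the sketch runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the cell above: skip the Neighborhood label, sort the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row in the layout of toronto_grouped.
row = pd.Series(
    {"Neighborhood": "Example", "Bakery": 0.10, "Coffee Shop": 0.50, "Park": 0.25, "Bank": 0.15}
)
print(return_most_common_venues(row, 3))
```

When several categories share the same frequency, their relative order depends on the sort, so ties among the top venues should not be over-interpreted.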

<br/>

## Create the new dataframe, __neighborhoods_venues_sorted__, and populate it with the top 10 venues for each neighborhood.

<br/>

In [0]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix past the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

# find the top 10 venues for each Neighborhood group and populate the associated column
for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
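
The column labels above rely on an exception to pick the "th" suffix, which works for 10 venues but would produce "21th" if `num_top_venues` exceeded 20. A hedged sketch of an explicit alternative; the `ordinal` helper is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Build the same column labels as the cell above for the top 10 venues.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[4], "|", columns[10])
```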

<br/> 

## Begin Clustering the Data from the Dataframe, __toronto_grouped__

<br/>

In [105]:
from sklearn.cluster import KMeans
kclusters = 5

toronto_grouped_neighborhood=toronto_grouped['Neighborhood']

# the neighborhood names are labels, not features, so drop them before clustering
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check the cluster label generated for each row in the dataframe
print(kmeans.labels_)

[1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 4 0 1 0 1 0 0 1 0 1 1 1 4 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 3 1 0 1 1 1 1 1 1 1 0 1 1 0 1 4 1 1 0
 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 2 1 1]
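
The printed labels above appear to be dominated by a single cluster. Counting the labels makes such an imbalance explicit; the sketch below uses a short hypothetical label list in place of `kmeans.labels_`.

```python
import pandas as pd

# Hypothetical label array standing in for kmeans.labels_.
labels = [1, 1, 0, 1, 4, 1, 0, 3, 1, 2]

# Count how many neighborhoods fall into each cluster, ordered by cluster id.
sizes = pd.Series(labels).value_counts().sort_index()
print(sizes)
```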


<br/>

## Add the cluster labels to our dataframe, as they are in the same row order

<br/>

In [106]:
toronto_grouped_clustering.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_grouped_clustering

Unnamed: 0,Cluster Labels,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,...,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Social Club,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.200000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
1,1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.25,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
2,1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.017857,0.0,0.000000,0.0,0.0,0.000000,0.017857,0.0,0.017857,0.035714,0.00,0.000000,0.0,0.0,0.0,0.017857,0.017857,0.0,0.035714,0.000000,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,...,0.017857,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.017857,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
3,1,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.055556,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.055556,0.000000,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
4,0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.111111,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.111111,0.000000,0.0,0.0,0.0,0.0,0.111111,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.0,0.0,0.00
96,0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.20,0.200000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
97,2,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00
98,1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.142857,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00


<br/>

## Create a new dataframe, __toronto_merged__, that adds the cluster label and the venue-category frequencies for each neighborhood to the postal code data.

<br/>

In [0]:
toronto_grouped_clustering['Neighborhood'] = toronto_grouped_neighborhood
toronto_merged = df_postal_codes.join(
    toronto_grouped_clustering.set_index('Neighborhood'),
    on='Neighbourhood',
    how='inner')
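
The join matches the `Neighbourhood` column of the postal code data against the `Neighborhood` index of the clustering data, and `how='inner'` keeps only the neighborhoods present in both. The sketch below illustrates this on two tiny, hypothetical frames (`postal` and `clustered`) that stand in for `df_postal_codes` and `toronto_grouped_clustering`.

```python
import pandas as pd

# hypothetical stand-in for df_postal_codes
postal = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M5A'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Harbourfront'],
})

# hypothetical stand-in for toronto_grouped_clustering; one neighborhood is missing
clustered = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Harbourfront'],
    'Cluster Labels': [0, 1],
})

# match the left column 'Neighbourhood' against the right index 'Neighborhood'
merged = postal.join(clustered.set_index('Neighborhood'), on='Neighbourhood', how='inner')
print(merged)
```

In this toy case the row for Victoria Village is dropped because it has no cluster label, which is the same reason `toronto_merged` can end up with fewer rows than `df_postal_codes`.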

<br/>

## Show the DataFrame, __toronto_merged__

<br/>

In [115]:
toronto_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,...,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Social Club,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,0.000000,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,0.000000,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636,1,0.021739,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.021739,0.0,0.0,0.000000,0.021739,0.0,0.0,0.000000,0.0,0.0,0.0,0.065217,0.021739,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.021739,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.021739,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.043478,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,0.000000,0.153846,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.076923,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.000000,0.153846
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,1,0.026316,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.026316,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.026316,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.026316,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.026316,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"Old Mill North, Montgomery Road, The Kingsway",43.653654,-79.506944,0,0.000000,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1,0.012500,0.000000,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.000000,0.0,0.0,0.012500,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.012500,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.012500,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0125,0.0,0.0,0.0125,0.0125,0.0,0.0000,0.050000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0125,0.012500,0.0125,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.012500,0.000000
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558,1,0.055556,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.055556,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0000,0.0,0.055556,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000
101,M8Y,Etobicoke,"Sunnylea, Royal York South East, The Queensway...",43.636258,-79.498509,4,0.000000,0.000000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,1.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0000,0.0,0.000000,0.0,0.000000,0.0000,0.0,0.0,0.0000,0.0000,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0000,0.0,0.0000,0.000000,0.0000,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.000000


<br/>

## Visualize the Clusters by plotting a Folium Map

<br/>

In [120]:
from random import randint
import matplotlib.cm as cm
import matplotlib.colors as colors
map_clusters_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_toronto)
       
map_clusters_toronto