# IBM Data Science Capstone Project

**Hello and Welcome!**

This notebook will be used solely for the IBM Data Science Capstone Project. I wish you a great day and a lot of luck reviewing my results!
<br/>
Below you can find a couple of my favourite jokes about being a data scientist to lighten your mood:

___

>*The data science motto: If at first you don’t succeed; call it version 1.0*

<!-- -->
___

>*Why should you take a data scientist with you into the jungle? <br/>- They can take care of Python problems*

___

In [211]:
import pandas as pd
import numpy as np
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


# Assignment #2: Segmenting and Clustering Neighborhoods in Toronto

The main task is to explore and cluster neighborhoods in Toronto, Canada, using the Foursquare API and the k-means clustering algorithm.

**Part 1**<br/>
Scrape the neighborhood list from Wikipedia.

In [212]:
# Importing scraping libraries
import requests
from bs4 import BeautifulSoup as bs4

In [213]:
# Defining url and requesting source HTML
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
r = requests.get(url)

In [214]:
# Souping the HTML :D (with Python's built-in parser)
soup = bs4(r.content, 'html.parser')

In [215]:
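# Find the table by class name and convert it to a DataFrame
# (recent pandas expects a file-like object rather than a literal HTML string, hence StringIO)
from io import StringIO
table = soup.find_all('table', {'class': 'wikitable sortable'})
neighbrhd_df = pd.read_html(StringIO(str(table[0])))[0]
neighbrhd_df.head()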

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [216]:
# Filtering DataFrame
neighbrhd_df = neighbrhd_df[neighbrhd_df.Neighbourhood != 'Not assigned']
neighbrhd_df.columns = ['PostalCode','Borough','Neighborhood']
neighbrhd_df.reset_index(drop=True, inplace=True)
neighbrhd_df.head()

# NOTE: As you can see below, Wikipedia already groups neighborhoods by postal code (e.g. M5A: "Regent Park, Harbourfront"),
# hence no additional grouping is required.

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
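The note above says no extra grouping is needed because Wikipedia already lists one row per postal code. Had the table listed one row per neighborhood instead, the rows could be collapsed with a `groupby` and a string join. A minimal sketch on a hypothetical three-row frame (the rows are made up for illustration, not taken from the scraped table):

```python
import pandas as pd

# Hypothetical input: one row per neighborhood, two rows sharing postal code M5A
raw = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Collapse to one row per postal code, joining neighborhood names with ", "
grouped = (raw.groupby(['PostalCode', 'Borough'], as_index=False)
              .agg({'Neighborhood': ', '.join}))
print(grouped)
```

Grouping on both `PostalCode` and `Borough` keeps the borough column in the result, on the assumption that a postal code never spans two boroughs.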


In [217]:
neighbrhd_df.shape

(103, 3)

**Part 2**<br/>
Retrieving the latitude and longitude of each neighborhood.

In [218]:
# The Google geocoder did not work for me, so I used the CSV file attached to the task
geo_df = pd.read_csv('Geospatial_Coordinates.csv')
geo_df.columns = ['PostalCode','Latitude','Longitude']
geo_df.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [219]:
# Example: look up the coordinates of a single postal code
postal_code = 'M5G'
geo_df[geo_df['PostalCode'] == postal_code]

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 57 | M5G | 43.657952 | -79.387383 |


In [220]:
neighbrhd_df = pd.merge(neighbrhd_df, geo_df, on='PostalCode', how='left')
neighbrhd_df.shape

(103, 5)
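Because the merge is a left join, a postal code missing from the CSV would silently get `NaN` coordinates, and the shape check alone would not reveal it. One way to guard against that, sketched here on hypothetical toy frames (`M9Z` and "Nowhere" are made up), is pandas' `validate` and `indicator` merge options:

```python
import pandas as pd

# Hypothetical toy frames standing in for neighbrhd_df and geo_df
hoods = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M9Z'],
                      'Neighborhood': ['Parkwoods', 'Victoria Village', 'Nowhere']})
coords = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# validate='one_to_one' raises a MergeError if a postal code is duplicated on either side;
# indicator=True adds a '_merge' column saying where each row was found
merged = pd.merge(hoods, coords, on='PostalCode', how='left',
                  validate='one_to_one', indicator=True)

# Rows found only in the left frame have no coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print('Postal codes without coordinates:', missing)
```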

In [221]:
neighbrhd_df.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


In [222]:
import folium

latitude, longitude = 43.6532, -79.3832  # approximate centre of Toronto
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighbrhd_df['Latitude'], neighbrhd_df['Longitude'], neighbrhd_df['Borough'], neighbrhd_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

In [226]:
# Initialize Foursquare credentials
CLIENT_ID = '#' # your Foursquare ID
CLIENT_SECRET = '#' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]


In [227]:
# Get nearby venue function
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Distances between neighborhoods in Toronto are larger than in Manhattan, and the density of venues is lower. Therefore, I used a larger radius of 1 km to collect nearby venues.
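To put a number on "larger distances", the great-circle distance between two neighborhood centroids can be computed with the haversine formula. A minimal sketch: the helper name `haversine_km` is my own, and 6371 km is the usual mean-Earth-radius approximation. The two points are Parkwoods (M3A) and Victoria Village (M4A), taken from the merged coordinates table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Parkwoods (M3A) and Victoria Village (M4A)
d = haversine_km(43.753259, -79.329656, 43.725882, -79.315572)
print(f'{d:.2f} km')
```

These two adjacent North York centroids come out a little over 3 km apart. A 500 m circle around each centroid therefore covers only a small part of the area between them.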

In [228]:
# Get venue for each neighborhood
torronto_venues = getNearbyVenues(neighbrhd_df.Neighborhood, neighbrhd_df.Latitude, neighbrhd_df.Longitude, radius=1000)

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [230]:
print(torronto_venues.shape)
torronto_venues.head()

(4985, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Allwyn's Bakery | 43.75984 | -79.324719 | Caribbean Restaurant |
| 1 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 2 | Parkwoods | 43.753259 | -79.329656 | Tim Hortons | 43.760668 | -79.326368 | Café |
| 3 | Parkwoods | 43.753259 | -79.329656 | A&W | 43.760643 | -79.326865 | Fast Food Restaurant |
| 4 | Parkwoods | 43.753259 | -79.329656 | Bruno's valu-mart | 43.746143 | -79.32463 | Grocery Store |


Let's see how many venues we have per neighborhood.

In [231]:
torronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt | 55 | 55 | 55 | 55 | 55 | 55 |
| Alderwood, Long Branch | 26 | 26 | 26 | 26 | 26 | 26 |
| Bathurst Manor, Wilson Heights, Downsview North | 30 | 30 | 30 | 30 | 30 | 30 |
| Bayview Village | 16 | 16 | 16 | 16 | 16 | 16 |
| Bedford Park, Lawrence Manor East | 40 | 40 | 40 | 40 | 40 | 40 |
| ... | ... | ... | ... | ... | ... | ... |
| Willowdale, Willowdale West | 11 | 11 | 11 | 11 | 11 | 11 |
| Woburn | 9 | 9 | 9 | 9 | 9 | 9 |
| Woodbine Heights | 30 | 30 | 30 | 30 | 30 | 30 |
| York Mills West | 23 | 23 | 23 | 23 | 23 | 23 |


In [148]:
print('There are {} unique categories.'.format(torronto_venues['Venue Category'].nunique()))

There are 337 unique categories.


One-hot encoding of venue categories per neighborhood

In [232]:
# one hot encoding
toronto_onehot = pd.get_dummies(torronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = torronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + [x for x in list(toronto_onehot.columns) if x not in ['Neighborhood']]
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

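| | Neighborhood | Accessories Store | Afghan Restaurant | African Restaurant | Airport | American Restaurant | Amphitheater | Animal Shelter | Antique Shop | Aquarium | ... | Video Store | Vietnamese Restaurant | Warehouse Store | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |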


In [233]:
toronto_onehot.shape

(4985, 337)

In [234]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

| | Neighborhood | Accessories Store | Afghan Restaurant | African Restaurant | Airport | American Restaurant | Amphitheater | Animal Shelter | Antique Shop | Aquarium | ... | Video Store | Vietnamese Restaurant | Warehouse Store | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.018182 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 1 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 3 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 4 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.025 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.025 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.025 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 94 | Willowdale, Willowdale West | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 95 | Woburn | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 96 | Woodbine Heights | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
| 97 | York Mills West | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
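The encoding and grouping steps above turn each neighborhood into a vector of venue-category frequencies: the share of its venues that fall into each category. Below is a toy sketch of the same two steps with made-up venues. As a precaution I am suggesting, not something the notebook does, the key is stored in a column named `_hood`. In the notebook's code, a venue category that happens to be called "Neighborhood" would produce a dummy column with that name, and `toronto_onehot['Neighborhood'] = ...` would then overwrite it.

```python
import pandas as pd

# Hypothetical venues: one row per venue, shaped like the getNearbyVenues output
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Coffee Shop', 'Park', 'Coffee Shop', 'Neighborhood'],
})

# One 0/1 indicator column per venue category, named after the category itself
onehot = pd.get_dummies(venues['Venue Category'], prefix='', prefix_sep='', dtype=int)

# Store the key under a name no category can take, then average per neighborhood
onehot.insert(0, '_hood', venues['Neighborhood'])
freq = onehot.groupby('_hood').mean()
print(freq)
```

Each row of `freq` sums to 1. For example, two of neighborhood A's three venues are parks, so A has a Park frequency of about 0.67.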


Review the top 5 venues in each neighborhood

In [235]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.13
1  Caribbean Restaurant  0.05
2         Shopping Mall  0.05
3                Bakery  0.04
4           Coffee Shop  0.04


----Alderwood, Long Branch----
               venue  freq
0     Discount Store  0.12
1           Pharmacy  0.08
2  Convenience Store  0.08
3        Pizza Place  0.08
4               Park  0.08


----Bathurst Manor, Wilson Heights, Downsview North----
                venue  freq
0                Bank  0.07
1         Coffee Shop  0.07
2                Park  0.07
3          Restaurant  0.03
4  Chinese Restaurant  0.03


----Bayview Village----
                 venue  freq
0        Grocery Store  0.12
1  Japanese Restaurant  0.12
2                 Bank  0.12
3          Gas Station  0.12
4                 Park  0.06


----Bedford Park, Lawrence Manor East----
                venue  freq
0         Coffee Shop  0.08
1  Italian Restaurant  0.08
2         Pizza Place  0.05
3                Bank  0.

                    venue  freq
0  Furniture / Home Store  0.12
1             Coffee Shop  0.12
2              Restaurant  0.08
3             Pizza Place  0.08
4          Sandwich Place  0.04


----Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East----
                         venue  freq
0                         Park   0.3
1                Shopping Mall   0.1
2                     Bus Stop   0.1
3         Gym / Fitness Center   0.1
4  Eastern European Restaurant   0.1


----Parkdale, Roncesvalles----
              venue  freq
0              Café  0.05
1       Coffee Shop  0.04
2               Bar  0.04
3  Sushi Restaurant  0.04
4       Pizza Place  0.04


----Parkview Hill, Woodbine Gardens----
                        venue  freq
0        Gym / Fitness Center  0.09
1                 Pizza Place  0.09
2  Construction & Landscaping  0.09
3                 Coffee Shop  0.09
4                     Brewery 
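The loop above transposes one neighborhood row at a time. The same top-N listing can also be produced in long format with `melt` and `groupby().head()`. This is an alternative I am suggesting, not what the notebook does, and the frequencies below are made up:

```python
import pandas as pd

# Hypothetical grouped frequencies, shaped like toronto_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Park': [0.50, 0.10],
    'Coffee Shop': [0.30, 0.60],
    'Bank': [0.20, 0.30],
})

num_top_venues = 2

# Long format: one row per (neighborhood, venue category) pair
long_df = grouped.melt(id_vars='Neighborhood', var_name='venue', value_name='freq')

# Sort by frequency within each neighborhood, then keep the first num_top_venues rows per group
top = (long_df.sort_values(['Neighborhood', 'freq'], ascending=[True, False])
              .groupby('Neighborhood')
              .head(num_top_venues)
              .round({'freq': 2}))
print(top)
```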

Sort each neighborhood's venues and keep the top 10

In [236]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [237]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Chinese Restaurant | Shopping Mall | Caribbean Restaurant | Bakery | Sandwich Place | Indian Restaurant | Coffee Shop | Sushi Restaurant | Supermarket | Clothing Store |
| 1 | Alderwood, Long Branch | Discount Store | Park | Convenience Store | Pizza Place | Pharmacy | Trail | Shopping Mall | Donut Shop | Garden Center | Liquor Store |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | Park | Coffee Shop | Bank | Ice Cream Shop | Ski Area | Sushi Restaurant | Frozen Yogurt Shop | Supermarket | Fried Chicken Joint | Mediterranean Restaurant |
| 3 | Bayview Village | Bank | Grocery Store | Japanese Restaurant | Gas Station | Skating Rink | Playground | Park | Restaurant | Café | Chinese Restaurant |
| 4 | Bedford Park, Lawrence Manor East | Italian Restaurant | Coffee Shop | Pizza Place | Sandwich Place | Bank | Café | Baby Store | Comfort Food Restaurant | Bagel Shop | Thai Restaurant |
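The `try`/`except` suffix trick is correct for ten columns because only 1st, 2nd and 3rd are irregular up to 10. A general rule also needs the 11th to 13th exception. Below is a sketch of a hypothetical `ordinal` helper, together with a vectorized NumPy take on `return_most_common_venues` that uses `argsort` on a made-up frequency matrix:

```python
import numpy as np

def ordinal(n):
    """Hypothetical helper: 1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Hypothetical frequency matrix: rows = neighborhoods, columns = venue categories
categories = np.array(['Park', 'Coffee Shop', 'Bank'])
freqs = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])

num_top_venues = 2
# argsort of the negated values gives column indices from highest to lowest frequency;
# a stable sort keeps tied categories in their original column order
top_idx = np.argsort(-freqs, axis=1, kind='stable')[:, :num_top_venues]
top_names = categories[top_idx]

columns = [f'{ordinal(i + 1)} Most Common Venue' for i in range(num_top_venues)]
print(columns)
print(top_names)
```

The stable sort makes tie-breaking deterministic, which `Series.sort_values` with its default quicksort does not guarantee. As a result, venues with equal frequency may be ordered differently than in the table above.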


### Finding Clusters with K-means

In [238]:
# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [239]:
# set number of clusters
kclusters = 7

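toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 pins the number of restarts, whose default changed in newer scikit-learn)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)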

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 6, 1, 1, 6, 1, 1, 1])
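Under the hood, k-means alternates between two steps until the centroids stop moving. First, it assigns every neighborhood vector to its nearest centroid. Second, it moves each centroid to the mean of its members. The bare-bones NumPy sketch of that loop (Lloyd's algorithm) below is for intuition only. The function name and the random initialisation are my own choices. scikit-learn's `KMeans` adds k-means++ seeding and several restarts (`n_init`), so its labels will not match this sketch.

```python
import numpy as np

def _assign(X, centroids):
    # Squared Euclidean distance from every point to every centroid, shape (n_points, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means with random initial centroids; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)                      # assignment step
        new_centroids = np.array([                          # update step
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                               # keep an empty cluster's centroid
        ])
        if np.allclose(new_centroids, centroids):           # converged
            break
        centroids = new_centroids
    return _assign(X, centroids), centroids

# Toy data: two well-separated blobs of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```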

In [240]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = neighbrhd_df

# join the cluster labels and top venues onto neighbrhd_df, which already holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 6 | Park | Bus Stop | Convenience Store | Shopping Mall | Pharmacy | Tennis Court | Shop & Service | Coffee Shop | Laundry Service | Chinese Restaurant |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1 | Coffee Shop | Sporting Goods Shop | Intersection | Café | Boxing Gym | French Restaurant | Men's Store | Golf Course | Grocery Store | Gym / Fitness Center |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 1 | Coffee Shop | Pub | Park | Theater | Café | Restaurant | Bakery | Breakfast Spot | Diner | Thai Restaurant |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 0 | Clothing Store | Fast Food Restaurant | Coffee Shop | Restaurant | Vietnamese Restaurant | Dessert Shop | Sushi Restaurant | Fried Chicken Joint | Furniture / Home Store | Bank |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 1 | Coffee Shop | Park | Sushi Restaurant | Café | Italian Restaurant | Thai Restaurant | Ramen Restaurant | Bubble Tea Shop | Middle Eastern Restaurant | Pizza Place |


In [241]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one colour per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
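Because cluster labels start at 0, each label can index a palette directly, without any offset. The sketch below is a standard-library alternative to the `cm.rainbow` palette. It spaces hues evenly around the colour wheel with `colorsys`. The helper name `cluster_palette` and the saturation/value defaults are hypothetical choices.

```python
import colorsys

def cluster_palette(k, saturation=0.8, value=0.9):
    """Return k hex colours with evenly spaced hues, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(7)
# Label 0 maps to palette[0] and label 6 to palette[6]
print(palette)
```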

Check how many neighborhoods are in each cluster

In [242]:
toronto_merged.groupby('Cluster Labels').count()

| Cluster Labels | PostalCode | Borough | Neighborhood | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| 1 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 | 47 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 6 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |


The summary above shows three large clusters (labels 0, 1 and 6). The four clusters that each contain a single neighborhood (labels 2, 3, 4 and 5) can be grouped together into a separate fourth category.
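In code, that grouping amounts to relabelling every singleton cluster with one catch-all label. Below is a sketch with a hypothetical helper `merge_small_clusters`. The toy labels mirror the size pattern above (three larger clusters and four singletons), and the catch-all label of -1 is an arbitrary choice.

```python
from collections import Counter

def merge_small_clusters(labels, min_size=2, other_label=-1):
    """Relabel every cluster with fewer than min_size members as other_label."""
    sizes = Counter(labels)
    return [lab if sizes[lab] >= min_size else other_label for lab in labels]

# Hypothetical labels: clusters 0, 1 and 6 have several members; 2, 3, 4 and 5 have one each
labels = [0, 0, 0, 1, 1, 1, 1, 2, 3, 4, 5, 6, 6]
merged = merge_small_clusters(labels)
print(Counter(merged))
```

In the notebook, the input would be `toronto_merged['Cluster Labels'].tolist()`.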

# Review of the 3 Largest Clusters

### Cluster 1 (label 0)

In [250]:
cluster1 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]
cluster1

| | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Lawrence Manor, Lawrence Heights | 0 | Clothing Store | Fast Food Restaurant | Coffee Shop | Restaurant | Vietnamese Restaurant | Dessert Shop | Sushi Restaurant | Fried Chicken Joint | Furniture / Home Store | Bank |
| 6 | Malvern, Rouge | 0 | Coffee Shop | Trail | Fast Food Restaurant | Bank | Restaurant | Caribbean Restaurant | Bakery | Paper / Office Supplies Store | Gym | Park |
| 8 | Parkview Hill, Woodbine Gardens | 0 | Construction & Landscaping | Pizza Place | Brewery | Gym / Fitness Center | Coffee Shop | Breakfast Spot | Bakery | Bus Line | Fast Food Restaurant | Gastropub |
| 10 | Glencairn | 0 | Grocery Store | Fast Food Restaurant | Coffee Shop | Italian Restaurant | Gas Station | Pizza Place | Pet Store | Restaurant | Rental Car Location | Convenience Store |
| 16 | Humewood-Cedarvale | 0 | Pizza Place | Convenience Store | Coffee Shop | Ice Cream Shop | Sandwich Place | Bagel Shop | Bakery | Bank | Korean Restaurant | Optical Shop |
| 17 | Eringate, Bloordale Gardens, Old Burnhamthorpe... | 0 | Coffee Shop | Convenience Store | Farmers Market | Shopping Mall | Shopping Plaza | Beer Store | Gas Station | Grocery Store | Pharmacy | College Rec Center |
| 18 | Guildwood, Morningside, West Hill | 0 | Pizza Place | Fast Food Restaurant | Bank | Coffee Shop | Food & Drink Shop | Beer Store | Sandwich Place | Supermarket | Greek Restaurant | Grocery Store |
| 23 | Leaside | 0 | Sporting Goods Shop | Coffee Shop | Grocery Store | Furniture / Home Store | Electronics Store | Burger Joint | Shopping Mall | Sandwich Place | Bank | Restaurant |
| 26 | Cedarbrae | 0 | Coffee Shop | Pizza Place | Indian Restaurant | Gas Station | Bank | Pharmacy | Bakery | Fried Chicken Joint | Hakka Restaurant | Burger Joint |
| 27 | Hillcrest Village | 0 | Coffee Shop | Park | Pharmacy | Ice Cream Shop | Residential Building (Apartment / Condo) | Pool | Shopping Mall | Chinese Restaurant | Sandwich Place | Bank |


In [252]:
cluster1.groupby(['1st Most Common Venue']).count()['Neighborhood']

1st Most Common Venue
Bakery                         1
Chinese Restaurant             3
Clothing Store                 1
Coffee Shop                   14
Construction & Landscaping     1
Discount Store                 1
Furniture / Home Store         1
Grocery Store                  1
Ice Cream Shop                 1
Intersection                   1
Middle Eastern Restaurant      1
Park                           1
Pharmacy                       2
Pizza Place                    4
Restaurant                     1
Sporting Goods Shop            1
Name: Neighborhood, dtype: int64

### Cluster 2 (label 1)

In [253]:
cluster2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]
cluster2

| | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Victoria Village | 1 | Coffee Shop | Sporting Goods Shop | Intersection | Café | Boxing Gym | French Restaurant | Men's Store | Golf Course | Grocery Store | Gym / Fitness Center |
| 2 | Regent Park, Harbourfront | 1 | Coffee Shop | Pub | Park | Theater | Café | Restaurant | Bakery | Breakfast Spot | Diner | Thai Restaurant |
| 4 | Queen's Park, Ontario Provincial Government | 1 | Coffee Shop | Park | Sushi Restaurant | Café | Italian Restaurant | Thai Restaurant | Ramen Restaurant | Bubble Tea Shop | Middle Eastern Restaurant | Pizza Place |
| 7 | Don Mills | 1 | Coffee Shop | Restaurant | Japanese Restaurant | Gym | Supermarket | Café | Burger Joint | Bank | Mobile Phone Shop | Asian Restaurant |
| 9 | Garden District, Ryerson | 1 | Coffee Shop | Gastropub | Japanese Restaurant | Restaurant | Hotel | Italian Restaurant | Diner | Café | Seafood Restaurant | Middle Eastern Restaurant |
| 13 | Don Mills | 1 | Coffee Shop | Restaurant | Japanese Restaurant | Gym | Supermarket | Café | Burger Joint | Bank | Mobile Phone Shop | Asian Restaurant |
| 14 | Woodbine Heights | 1 | Park | Coffee Shop | Café | Skating Rink | Sandwich Place | Pizza Place | Ice Cream Shop | Plaza | Snack Place | Restaurant |
| 15 | St. James Town | 1 | Coffee Shop | Café | Restaurant | Japanese Restaurant | Gastropub | Bakery | Italian Restaurant | Beer Bar | Seafood Restaurant | Hotel |
| 19 | The Beaches | 1 | Pub | Coffee Shop | Pizza Place | Beach | Japanese Restaurant | Breakfast Spot | Health Food Store | Park | Caribbean Restaurant | Sandwich Place |
| 20 | Berczy Park | 1 | Coffee Shop | Café | Japanese Restaurant | Hotel | Beer Bar | Park | Restaurant | Gastropub | Bakery | Cocktail Bar |


In [254]:
cluster2.groupby(['1st Most Common Venue']).count()['Neighborhood']

1st Most Common Venue
Café                   8
Coffee Shop           26
Greek Restaurant       1
Indian Restaurant      1
Italian Restaurant     4
Korean Restaurant      2
Park                   3
Pub                    1
Restaurant             1
Name: Neighborhood, dtype: int64

### Cluster 6 (label 6)

In [255]:
cluster6 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]
cluster6

| | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 6 | Park | Bus Stop | Convenience Store | Shopping Mall | Pharmacy | Tennis Court | Shop & Service | Coffee Shop | Laundry Service | Chinese Restaurant |
| 5 | Islington Avenue, Humber Valley Village | 6 | Pharmacy | Park | Convenience Store | Café | Skating Rink | Shopping Mall | Golf Course | Baseball Field | Bakery | Grocery Store |
| 11 | West Deane Park, Princess Gardens, Martin Grov... | 6 | Park | Pizza Place | Hotel | Bank | Fish & Chips Shop | Restaurant | Mexican Restaurant | Clothing Store | Grocery Store | Gym |
| 12 | Rouge Hill, Port Union, Highland Creek | 6 | Breakfast Spot | Italian Restaurant | Burger Joint | Playground | Park | Deli / Bodega | Filipino Restaurant | Ethiopian Restaurant | Event Space | Falafel Restaurant |
| 21 | Caledonia-Fairbanks | 6 | Park | Pizza Place | Pharmacy | Cosmetics Shop | Gym | Discount Store | Coffee Shop | Falafel Restaurant | Fast Food Restaurant | Bus Stop |
| 22 | Woburn | 6 | Park | Coffee Shop | Fast Food Restaurant | Dog Run | Chinese Restaurant | Mobile Phone Shop | Indian Restaurant | Flower Shop | Food | Elementary School |
| 39 | Bayview Village | 6 | Bank | Grocery Store | Japanese Restaurant | Gas Station | Skating Rink | Playground | Park | Restaurant | Café | Chinese Restaurant |
| 49 | North Park, Maple Leaf Park, Upwood Park | 6 | Park | Coffee Shop | Bakery | Dim Sum Restaurant | Gas Station | Athletics & Sports | Chinese Restaurant | Pizza Place | Mediterranean Restaurant | Convenience Store |
| 57 | Humberlea, Emery | 6 | Convenience Store | Intersection | Storage Facility | Gas Station | Golf Course | Bakery | Auto Workshop | Discount Store | Park | African Restaurant |
| 58 | Birch Cliff, Cliffside West | 6 | Park | Ice Cream Shop | College Stadium | Thai Restaurant | Motorcycle Shop | Gym | Skating Rink | Gym Pool | Auto Workshop | Asian Restaurant |


In [256]:
cluster6.groupby(['1st Most Common Venue']).count()['Neighborhood']

1st Most Common Venue
Bank                  1
Breakfast Spot        1
Coffee Shop           1
College Gym           1
Convenience Store     1
Park                 10
Pharmacy              1
Train Station         1
Name: Neighborhood, dtype: int64
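The three per-cluster counts above can also be read from a single table with `pd.crosstab`, which has one row per cluster label and one column per most common venue. Below is a sketch on made-up rows. In the notebook, the two inputs would be `toronto_merged['Cluster Labels']` and `toronto_merged['1st Most Common Venue']`.

```python
import pandas as pd

# Hypothetical stand-in for two columns of toronto_merged
df = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1, 1, 6, 6],
    '1st Most Common Venue': ['Coffee Shop', 'Pizza Place', 'Coffee Shop',
                              'Pub', 'Coffee Shop', 'Park', 'Park'],
})

# Rows: cluster label; columns: most common venue; cells: number of neighborhoods
table = pd.crosstab(df['Cluster Labels'], df['1st Most Common Venue'])
print(table)
```

Combinations that never occur show up as 0. This makes it easy to see, for example, that parks dominate one cluster and coffee shops another.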