# Segmenting and Clustering Neighborhoods in Toronto

## Start by creating a new Notebook for this assignment

Use the notebook to scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, extract the table of postal codes, and transform it into a pandas dataframe like the one shown below:

In [1]:
# imports
import pandas as pd

### Read Wikipedia Raw Dataframe

In [2]:
wiki_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
wiki_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
Rename the columns to `PostalCode`, `Borough`, and `Neighborhood`.

In [3]:
wiki_df = wiki_df.rename(columns={
    'Postal Code':'PostalCode',
    'Borough':'Borough',
    'Neighbourhood':'Neighborhood'
})
wiki_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [4]:
wiki_df = wiki_df[wiki_df['Borough']!='Not assigned']
wiki_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

In [5]:
wiki_df[wiki_df['PostalCode']=='M5A']

Unnamed: 0,PostalCode,Borough,Neighborhood
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
wiki_df.describe()

Unnamed: 0,PostalCode,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M5P,North York,Downsview
freq,1,24,4


#### Postal codes are already unique in the data, so no comma-joining of neighborhoods is necessary

In [7]:
# Fallback in case duplicate postal codes appear: join their neighborhoods with a comma
grouped_df = wiki_df.groupby(['PostalCode','Borough'], as_index=False).agg(lambda x: ', '.join(x))
grouped_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
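
Because the scraped table has no repeated postal codes, the grouping step has nothing to combine here. The following is a minimal sketch on a small hand-made frame (the rows are illustrative, based on the M5A example in the assignment text, and `toy_df` is a hypothetical name) showing what the aggregation does when duplicates are present:

```python
import pandas as pd

# Hypothetical input: M5A appears twice, as described in the assignment text.
toy_df = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# One row per (PostalCode, Borough); neighborhoods are joined with ', '.
toy_grouped = (
    toy_df.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood']
          .agg(', '.join)
)
print(toy_grouped)
```

The two M5A rows collapse into a single row whose neighborhood reads `Harbourfront, Regent Park`, while M3A is left unchanged.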


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.


In [8]:
wiki_df[wiki_df['Neighborhood']=='Not assigned']

Unnamed: 0,PostalCode,Borough,Neighborhood


In [9]:
# Fallback in case a Neighborhood is 'Not assigned'
def new_neighborhood(row):
    if row.Neighborhood == 'Not assigned':
        row.Neighborhood = row.Borough
    return row
# apply returns a new DataFrame, so the result is assigned back
wiki_df = wiki_df.apply(new_neighborhood, axis=1)
wiki_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
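
A row-wise function is one way to apply this rule; a boolean mask with `.loc` is a common vectorized alternative. The sketch below uses made-up rows (the postal code `M9Z` and the frame name `toy_df` are hypothetical) because the scraped data contains no such case:

```python
import pandas as pd

# Hypothetical rows: the second one has a borough but no neighborhood.
toy_df = pd.DataFrame({
    'PostalCode': ['M3A', 'M9Z'],
    'Borough': ['North York', 'Etobicoke'],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# Copy the borough into the neighborhood wherever it is 'Not assigned'.
mask = toy_df['Neighborhood'] == 'Not assigned'
toy_df.loc[mask, 'Neighborhood'] = toy_df.loc[mask, 'Borough']
print(toy_df)
```

Only the masked rows are modified, so rows that already have a neighborhood keep it.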


#### No row with an assigned borough has a "Not assigned" neighborhood

Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.


In [10]:
wiki_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.

In [11]:
wiki_df.shape

(103, 3)

## Part 2

In [12]:
#!pip install geocoder
import geocoder

def get_coords_geocoder(postal_code):
    # start with no coordinates
    lat_lng_coords = None
    cnt = 0
    # retry until coordinates are returned, giving up after more than 10 attempts
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        cnt += 1
        if cnt > 10:
            return [None, None]
    return lat_lng_coords
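
The retry pattern in this function can be tried out without network access. The following is a rough sketch, not the geocoder library's own API: `retry_lookup`, `fake_lookup`, and the `delay_seconds` parameter are hypothetical names introduced for illustration, and the stand-in lookup simply fails twice before returning coordinates.

```python
import time

def retry_lookup(lookup, query, max_attempts=10, delay_seconds=0.0):
    """Call lookup(query) until it returns a non-None result or attempts run out."""
    for _ in range(max_attempts):
        result = lookup(query)
        if result is not None:
            return result
        time.sleep(delay_seconds)  # pause between attempts; 0 disables the pause
    return [None, None]

# Stand-in lookup that fails twice, then succeeds (hypothetical, no network access).
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.65426, -79.360636]

coords_m5a = retry_lookup(fake_lookup, 'M5A, Toronto, Ontario')
print(coords_m5a, calls['n'])
```

Passing the lookup as an argument keeps the retry logic separate from any particular geocoding service, and a non-zero delay avoids sending repeated requests back to back.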

In [13]:
# Import Coordinates List
coords = pd.read_csv('https://cocl.us/Geospatial_data')
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge and cleanup DataFrame with Latitude/Longitude

In [14]:
lnglat_df = wiki_df.merge(how='left', right=coords, left_on='PostalCode', right_on='Postal Code')
lnglat_df = lnglat_df[['PostalCode','Borough','Neighborhood','Latitude','Longitude']]
lnglat_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
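
A left merge silently leaves `NaN` coordinates for any postal code missing from the coordinates file. One way to check for that, sketched here on made-up frames (`M0X` is a hypothetical code with no match, and `left`/`right` are illustrative names), is the `indicator=True` option of `merge`:

```python
import pandas as pd

# Hypothetical frames; 'M0X' is a made-up code with no coordinates.
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0X']})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

merged = left.merge(right, how='left', left_on='PostalCode',
                    right_on='Postal Code', indicator=True)

# Rows tagged 'left_only' found no match in the coordinates table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates.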


## Part 3

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 



In [15]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.6534817, -79.3839347.


# Visualization of Toronto Ontario Neighborhoods

In [16]:
import folium
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(lnglat_df['Latitude'], lnglat_df['Longitude'], lnglat_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Stored Image is
![Toronto Neighborhoods](img/Toronto_Map_1.PNG)

# Visualization of Toronto Neighborhoods (Boroughs Containing "Toronto")

In [17]:
### Get Only Toronto Neighborhoods
# filter first, then reset the index on the new frame instead of modifying a slice in place
df_toronto = lnglat_df[lnglat_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [18]:
# create map of Toronto using latitude and longitude values
map_toronto2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto2)  
    
map_toronto2

Stored Image is
![Toronto Neighborhoods](img/Toronto_Map_2.PNG)

# Foursquare API for reading venues

In [19]:
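import os
# Credentials are read from environment variables instead of being hard-coded (the variable names are an assumption)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret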
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 500

In [20]:
import requests
venues = []
for index, row in df_toronto.iterrows():
    print('Latitude and Longitude of values of {} are {}, {}.'.format(row.Neighborhood,
                                                                 row.Latitude,
                                                                 row.Longitude))
    url = f"https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={row.Latitude},{row.Longitude}&radius={RADIUS}&limit={LIMIT}"
    results = requests.get(url).json()
    venue_result = results['response']['groups'][0]['items']
    for venue in venue_result:
        venues.append({
            "PostalCode":row.PostalCode,
            "Borough": row.Borough,
            "Neighborhood": row.Neighborhood,
            "Venue": venue['venue']['name'],
            "Latitude": venue['venue']['location']['lat'],
            "Longitude": venue['venue']['location']['lng'],
            "Category": venue['venue']['categories'][0]['name']
        })

Latitude and Longitude of values of Regent Park, Harbourfront are 43.6542599, -79.3606359.
Latitude and Longitude of values of Queen's Park, Ontario Provincial Government are 43.6623015, -79.3894938.
Latitude and Longitude of values of Garden District, Ryerson are 43.6571618, -79.37893709999999.
Latitude and Longitude of values of St. James Town are 43.6514939, -79.3754179.
Latitude and Longitude of values of The Beaches are 43.67635739999999, -79.2930312.
Latitude and Longitude of values of Berczy Park are 43.644770799999996, -79.3733064.
Latitude and Longitude of values of Central Bay Street are 43.6579524, -79.3873826.
Latitude and Longitude of values of Christie are 43.669542, -79.4225637.
Latitude and Longitude of values of Richmond, Adelaide, King are 43.65057120000001, -79.3845675.
Latitude and Longitude of values of Dufferin, Dovercourt Village are 43.66900510000001, -79.4422593.
Latitude and Longitude of values of Harbourfront East, Union Station, Toronto Islands are 43.640815
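
The loop above indexes `categories[0]` and `groups[0]` directly, which raises an error if a venue has no category or a response has no groups. A more defensive variant might look like the sketch below. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mirrors the keys accessed in the loop and is not real API output.

```python
def extract_venues(results):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = results.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], venue['location']['lat'],
                     venue['location']['lng'], category))
    return rows

# Hand-made payload mirroring the keys used in the loop (not real API output).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))
```

With this shape, a neighborhood that returns no venues yields an empty list rather than stopping the whole loop.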

### List Venues

In [21]:
venue_df = pd.DataFrame(venues)
venue_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Venue,Latitude,Longitude,Category
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,Downtown Toronto,"Regent Park, Harbourfront",Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",Body Blitz Spa East,43.654735,-79.359874,Spa
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",Impact Kitchen,43.656369,-79.35698,Restaurant


## Show count of Venues for each Neighborhood

In [22]:
venue_df.groupby(['PostalCode','Borough','Neighborhood']).count().head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Venue,Latitude,Longitude,Category
PostalCode,Borough,Neighborhood,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M4E,East Toronto,The Beaches,5,5,5,5
M4K,East Toronto,"The Danforth West, Riverdale",43,43,43,43
M4L,East Toronto,"India Bazaar, The Beaches West",20,20,20,20
M4M,East Toronto,Studio District,36,36,36,36
M4N,Central Toronto,Lawrence Park,4,4,4,4


# Explore the Neighborhoods in Toronto

In [23]:
print('There are {} unique categories.'.format(len(venue_df['Category'].unique())))

There are 235 unique categories.


### Analyze Each Neighborhood

One-hot encode the venue categories to show how many venues of each type are available in a given neighborhood.

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(venue_df[['Category']], prefix="", prefix_sep="")
# rename the venue category 'Neighborhood' so it does not clash with the Neighborhood column added below
toronto_onehot.rename(columns={'Neighborhood':'Cat_Neighborhood'},inplace=True)
# add the postal code, borough and neighborhood columns back to the dataframe
toronto_onehot['PostalCode'] = venue_df['PostalCode']
toronto_onehot['Borough'] = venue_df['Borough']
toronto_onehot['Neighborhood'] = venue_df['Neighborhood']
# move these three columns to the front
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
#fixed_columns
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
toronto_onehot.shape

(1604, 238)

#### Next, group the rows by neighborhood and take the mean of each one-hot column

The result is a relative value: the share of each venue category among all venues in that neighborhood.

In [26]:
toronto_grouped = toronto_onehot.groupby(['PostalCode','Borough','Neighborhood']).mean().reset_index()
toronto_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West, Riverdale",0.0,0.0,0.0,0.0,0.0,0.023256,0.0,...,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256
2,M4L,East Toronto,"India Bazaar, The Beaches West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [27]:
toronto_grouped.shape

(39, 238)
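
Why the mean of one-hot columns gives a share per category is easier to see on a tiny example. The sketch below uses made-up venues (neighborhoods `A` and `B` and the frame name `toy_venues` are hypothetical):

```python
import pandas as pd

# Hypothetical venues: three in neighborhood 'A', one in 'B'.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Category': ['Park', 'Park', 'Pub', 'Pub'],
})

# One indicator column per category, then the per-neighborhood mean.
onehot = pd.get_dummies(toy_venues['Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Neighborhood `A` has two parks among three venues, so its `Park` share is 2/3, and each row of `freq` sums to 1.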

#### Intermediate result: the most common venues for each neighborhood

In [28]:
num_top_5_venues = 5
sample_size = 5
cnt = 0
for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[3:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_5_venues))
    print("\n")
    cnt += 1
    if cnt > sample_size:
        break

----The Beaches----
               venue  freq
0                Pub   0.2
1              Trail   0.2
2   Asian Restaurant   0.2
3   Cat_Neighborhood   0.2
4  Health Food Store   0.2


----The Danforth West, Riverdale----
                    venue  freq
0        Greek Restaurant  0.19
1             Coffee Shop  0.09
2      Italian Restaurant  0.07
3  Furniture / Home Store  0.05
4          Ice Cream Shop  0.05


----India Bazaar, The Beaches West----
                  venue  freq
0  Fast Food Restaurant  0.10
1                  Park  0.10
2            Steakhouse  0.05
3            Restaurant  0.05
4      Sushi Restaurant  0.05


----Studio District----
                 venue  freq
0          Coffee Shop  0.08
1              Brewery  0.06
2            Gastropub  0.06
3                 Café  0.06
4  American Restaurant  0.06


----Lawrence Park----
              venue  freq
0              Park  0.25
1          Bus Line  0.25
2  Business Service  0.25
3       Swim School  0.25
4           

Put that result in a dataframe

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [30]:
import numpy as np
num_top_venues = 10
indicators = ['st','nd','rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# create a new Dataframe 
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
#neighborhoods_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']
#neighborhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind,1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,Cat_Neighborhood,Health Food Store,Asian Restaurant,Trail,Pub,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run
1,"The Danforth West, Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Bubble Tea Shop,Spa,Japanese Restaurant,Cosmetics Shop
2,"India Bazaar, The Beaches West",Park,Fast Food Restaurant,Sushi Restaurant,Movie Theater,Sandwich Place,Restaurant,Italian Restaurant,Fish & Chips Shop,Intersection,Steakhouse
3,Studio District,Coffee Shop,American Restaurant,Bakery,Brewery,Café,Gastropub,Yoga Studio,Fish Market,Park,Cat_Neighborhood
4,Lawrence Park,Park,Bus Line,Business Service,Swim School,Yoga Studio,Discount Store,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant
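
The column labels above rely on a three-item suffix list with a fallback to `th`, which is fine for ten columns but would produce labels such as `21th` for larger values. A small helper could cover the general case; the function name `ordinal` is hypothetical, and this is only a sketch of one way to build the same labels:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the cell above.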


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [31]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5
# keep only the category frequency columns as clustering features
toronto_grouped_clustering = toronto_grouped.drop(columns=['PostalCode','Borough','Neighborhood'])
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]


array([0, 2, 2, 2, 0, 2, 2, 2, 3, 2])
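
The value of 5 clusters is fixed by hand in this notebook. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) across several values of k. The sketch below does this on synthetic data, which is an assumption standing in for the frequency matrix; it is not a result from the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency matrix: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Inertia (within-cluster sum of squares) for k = 1..5.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to the true number of groups and flattens afterwards; applying the same loop to `toronto_grouped_clustering` would show whether 5 is a reasonable choice here.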

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)


## Add Cluster Number to Dataframe

In [33]:
toronto_merged = df_toronto

# join neighborhoods_venues_sorted onto df_toronto to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Pub,Theater,French Restaurant,Greek Restaurant,Wine Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Sushi Restaurant,Diner,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Coffee Shop,Clothing Store,Italian Restaurant,Cosmetics Shop,Hotel,Bubble Tea Shop,Middle Eastern Restaurant,Café,Japanese Restaurant,Lingerie Store
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Coffee Shop,Café,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Department Store,Lingerie Store,Gym,Park
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Cat_Neighborhood,Health Food Store,Asian Restaurant,Trail,Pub,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run


## Visualize cluster on Map

In [34]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Stored Image is
![Toronto Venue Map](img/Toronto_Venue_Map.PNG)

### Examine the Clusters

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0,1,2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4E,East Toronto,The Beaches,0,Cat_Neighborhood,Health Food Store,Asian Restaurant,Trail,Pub,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run
18,M4N,Central Toronto,Lawrence Park,0,Park,Bus Line,Business Service,Swim School,Yoga Studio,Discount Store,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant
33,M4W,Downtown Toronto,Rosedale,0,Park,Playground,Trail,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0,1,2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,M5N,Central Toronto,Roselawn,1,Health & Beauty Service,Garden,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0,1,2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",2,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Pub,Theater,French Restaurant,Greek Restaurant,Wine Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",2,Coffee Shop,Sushi Restaurant,Diner,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",2,Coffee Shop,Clothing Store,Italian Restaurant,Cosmetics Shop,Hotel,Bubble Tea Shop,Middle Eastern Restaurant,Café,Japanese Restaurant,Lingerie Store
3,M5C,Downtown Toronto,St. James Town,2,Coffee Shop,Café,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Department Store,Lingerie Store,Gym,Park
5,M5E,Downtown Toronto,Berczy Park,2,Coffee Shop,Cocktail Bar,Bakery,Restaurant,Pharmacy,Seafood Restaurant,Farmers Market,Beer Bar,Cheese Shop,Greek Restaurant
6,M5G,Downtown Toronto,Central Bay Street,2,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Burger Joint,Thai Restaurant,Japanese Restaurant,Bubble Tea Shop,Salad Place,Portuguese Restaurant
7,M6G,Downtown Toronto,Christie,2,Grocery Store,Café,Park,Coffee Shop,Italian Restaurant,Candy Store,Restaurant,Athletics & Sports,Baby Store,Nightclub
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",2,Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Deli / Bodega,Thai Restaurant,Bakery,Gym,Concert Hall
9,M6H,West Toronto,"Dufferin, Dovercourt Village",2,Bakery,Pharmacy,Music Venue,Café,Recording Studio,Middle Eastern Restaurant,Bar,Supermarket,Bank,Pool
10,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",2,Coffee Shop,Aquarium,Café,Hotel,Scenic Lookout,Restaurant,Brewery,Italian Restaurant,Fried Chicken Joint,Pizza Place


In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0,1,2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,M4T,Central Toronto,"Moore Park, Summerhill East",3,Lawyer,Trail,Summer Camp,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0,1,2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",4,Trail,Jewelry Store,Mexican Restaurant,Sushi Restaurant,Yoga Studio,Diner,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


1. Cluster 0 is a park and leisure cluster, with beaches and parks mainly at the edges of the city center.
2. Cluster 1 is only one neighborhood and is mainly focused on wellness, such as health and beauty, garden, and yoga, plus other leisure activities like event spaces and escape rooms.
3. Cluster 2 is mainly a standard city business district with cafés and other restaurants.
4. Cluster 3 is some kind of business area, possibly near courts, because lawyers are its most common venue.
5. Cluster 4 is some kind of multicultural leisure area.