# The Battle of Neighborhoods (Week 2)

## Determine the best area to open a new restaurant in Toronto using location data 

I will use this notebook to show how to determine the best area to open a new restaurant in Toronto using location data from the Foursquare API.

### **Target Audience**

New business owners who want to open a restaurant in the Toronto area.

### **Introduction**

The purpose of this project is to explore the neighborhoods of Toronto using location data, so that the target audience can make an informed decision about where to open a new restaurant. The project gives them data about the competition, the neighborhoods, the population, and so on.

### **Business Problem:**

Toronto is the provincial capital of Ontario and the most populous city in Canada, with a population of 2,731,571 as of 2016. In the same year, the Toronto census metropolitan area (CMA), the majority of which lies within the Greater Toronto Area (GTA), had a population of 5,928,040, making it Canada's most populous CMA. The city is the anchor of the Golden Horseshoe, an urban agglomeration of 9,245,438 people (as of 2016) surrounding the western end of Lake Ontario. Toronto is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world.

Toronto is a prominent centre for music, theatre, motion picture production, and television production, and is home to the headquarters of Canada's major national broadcast networks and media outlets. Its varied cultural institutions, which include numerous museums and galleries, festivals and public events, entertainment districts, national historic sites, and sports activities, attract over 43 million tourists each year.

Toronto encompasses a geographical area formerly administered by many separate municipalities. These municipalities have each developed a distinct history and identity over the years, and their names remain in common use among Torontonians. Former municipalities include East York, Etobicoke, Forest Hill, Mimico, North York, Parkdale, Scarborough, Swansea, Weston and York. Throughout the city there exist hundreds of small neighbourhoods and some larger neighbourhoods covering a few square kilometres.

A diverse population and a vast geographical area mean that businesses compete intensely to attract customers and maximize profit. This project addresses that problem by using location data to give new business owners insight into the various neighborhoods, the competition, the population, and so on.




### **Data**

For this project, I will be using the following datasets:

1. Neighborhoods in Toronto - I'll be scraping this dataset from Wikipedia: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. It contains the neighborhoods of Toronto with their corresponding borough and postal code.

2. Restaurants in Toronto (using the Foursquare API) - https://foursquare.com/explore?mode=url&ne=44.418088%2C-78.362732&q=Restaurant&sw=42.742978%2C-80.554504


#### Get Toronto Neighborhood Data

In [None]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import time
import json
import requests 
from pandas import json_normalize  # pandas.io.json.json_normalize was deprecated in pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
# !conda install -c conda-forge folium=0.5.0 --yes
import folium
from sklearn.cluster import KMeans
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)



In [10]:
# define the dataframe columns
column_names = ['Postal_Code','Borough', 'Neighborhood'] 
toronto_nebr_df = pd.DataFrame(columns=column_names)



In [11]:
# Scrape neighborhood data from wiki
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urlopen(wiki)
soup = BeautifulSoup(page, "lxml")
_tbl = soup.find('table', class_='wikitable sortable')


In [12]:
_postal_code=[]
_borough=[]
_neighborhood=[]
for _row in _tbl.findAll("tr"):
    cells = _row.findAll('td')
    if len(cells)==3: #Only extract table body not heading
        _postal_code.append(cells[0].find(text=True))
        _borough.append(cells[1].find(text=True))
        _neighborhood.append(cells[2].find(text=True))

        
#Adding Data to toronto_nebr_df DataFrame
toronto_nebr_df['Postal_Code']=_postal_code
toronto_nebr_df['Borough']=_borough
toronto_nebr_df['Neighborhood']=_neighborhood

toronto_nebr_df



Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


In [13]:
toronto_nebr_df.shape

(287, 3)
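
The row filter above (`len(cells)==3`) keeps only table rows that have exactly three `<td>` cells, which skips the header row built from `<th>` cells. As a minimal sketch of that rule, the block below parses a small hand-written HTML snippet with the standard library's `html.parser` instead of BeautifulSoup. The `TableRowParser` class and `SAMPLE_HTML` are hypothetical and only imitate the shape of the Wikipedia table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (for example links) still belongs to the cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet shaped like the postal code table
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="#">North York</a></td><td>Parkwoods
</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)

# Same rule as the notebook: only rows with exactly three <td> cells are data rows
records = [row for row in parser.rows if len(row) == 3]
print(records)
```

Stripping each cell while parsing also removes the trailing newline that the last column of the table tends to carry.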

#### Data Cleaning
Drop rows where Borough is "Not assigned", then reset the index.

In [14]:
# Keep only rows whose Borough is assigned, then renumber the rows from 0
toronto_nebr_df = toronto_nebr_df[~toronto_nebr_df['Borough'].str.contains("Not assigned", na=False)].reset_index(drop=True)
toronto_nebr_df



Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Downtown Toronto,Queen's Park
6,M9A,Queen's Park,Not assigned
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North


In [15]:
# Assign Borough value to Neighborhood if Neighborhood value is "Not Assigned"
# Work on a copy so that the original dataframe is left untouched
toronto_nebr_df1 = toronto_nebr_df.copy()

# Scraped cells can carry a trailing newline, so strip whitespace before comparing
not_assigned = toronto_nebr_df1['Neighborhood'].str.strip() == 'Not assigned'
toronto_nebr_df1.loc[not_assigned, 'Neighborhood'] = toronto_nebr_df1.loc[not_assigned, 'Borough']
        
toronto_nebr_df1


Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Downtown Toronto,Queen's Park
6,M9A,Queen's Park,Queen's Park
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North
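
The two cleaning rules can be checked on a tiny frame before running them on the scraped data. The sketch below assumes pandas is available. `clean_neighborhoods` is an illustrative helper name, and the four sample rows are modelled on the output above (the trailing newline in the last row is added on purpose).

```python
import pandas as pd


def clean_neighborhoods(df):
    """Drop unassigned boroughs, then fill unassigned neighborhoods from the borough."""
    # Rule 1: keep only rows whose Borough is assigned
    out = df[df["Borough"].str.strip() != "Not assigned"].reset_index(drop=True)
    # Rule 2: where Neighborhood is unassigned, reuse the Borough name
    unassigned = out["Neighborhood"].str.strip() == "Not assigned"
    out.loc[unassigned, "Neighborhood"] = out.loc[unassigned, "Borough"]
    return out


sample = pd.DataFrame({
    "Postal_Code": ["M1A", "M3A", "M7A", "M9A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Queen's Park", "Not assigned\n"],
})

cleaned = clean_neighborhoods(sample)
print(cleaned)
```

The M1A row is dropped, and the M9A row takes "Queen's Park" as its neighborhood, matching what the notebook output shows for that postal code.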


In [16]:
# Narrow down to the four boroughs whose names contain "Toronto": East, West, Central and Downtown Toronto
# Each neighborhood already has its own row in the scraped table, so no ungrouping is needed
toronto_nebr_df_ungrp = toronto_nebr_df1[toronto_nebr_df1['Borough'].str.contains("Toronto", na=False)].reset_index(drop=True)
toronto_nebr_df_ungrp

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M5A,Downtown Toronto,Harbourfront
1,M7A,Downtown Toronto,Queen's Park
2,M5B,Downtown Toronto,Ryerson
3,M5B,Downtown Toronto,Garden District
4,M5C,Downtown Toronto,St. James Town
5,M4E,East Toronto,The Beaches
6,M5E,Downtown Toronto,Berczy Park
7,M5G,Downtown Toronto,Central Bay Street
8,M6G,Downtown Toronto,Christie
9,M5H,Downtown Toronto,Adelaide


In [None]:
# Geocode each neighborhood with Nominatim (OpenStreetMap)
geolocator = Nominatim(user_agent="ES1234")
for row_index, row in toronto_nebr_df_ungrp.iterrows():
    # Build a full address such as "Harbourfront, Toronto, Ontario, Canada"
    address = '{}, Toronto, Ontario, Canada'.format(row['Neighborhood'].strip())
    location = geolocator.geocode(address)
    time.sleep(1)  # the public Nominatim service allows at most one request per second
    # Rows that cannot be geocoded keep NaN coordinates and are dropped in the next cell
    if location is not None:
        toronto_nebr_df_ungrp.loc[row_index, 'Latitude'] = location.latitude
        toronto_nebr_df_ungrp.loc[row_index, 'Longitude'] = location.longitude
        

In [18]:
#Drop records that could not be geocoded
toronto_nebr_df_ungrp.dropna(inplace =True)
toronto_nebr_df_ungrp.index = pd.RangeIndex(len(toronto_nebr_df_ungrp.index))
toronto_nebr_df_ungrp.head()
    

Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015
1,M7A,Downtown Toronto,Queen's Park,43.659659,-79.39034
2,M5B,Downtown Toronto,Ryerson,43.658469,-79.378993
3,M5B,Downtown Toronto,Garden District,43.6565,-79.377114
4,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704
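
A geocoder can occasionally match a neighborhood name to a same-named place somewhere else, so it may be worth checking that each result lies near Toronto. The sketch below computes the great-circle (haversine) distance to the map centre used elsewhere in this notebook. The helper names and the 30 km threshold are assumptions for illustration, not part of the original analysis.

```python
import math

# Centre of Toronto, the same coordinates the notebook uses for its Folium maps
TORONTO_CENTRE = (43.653963, -79.387207)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def is_plausible(lat, lon, max_km=30.0):
    """True if the point is within max_km of the Toronto centre (threshold is an assumption)."""
    return haversine_km(lat, lon, *TORONTO_CENTRE) <= max_km


# Harbourfront coordinates taken from the geocoded output above
harbourfront_km = haversine_km(43.64008, -79.38015, *TORONTO_CENTRE)
print(round(harbourfront_km, 2), is_plausible(43.64008, -79.38015))
```

Rows that fail such a check could be dropped together with the rows that could not be geocoded at all.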


In [19]:
# Create a map of Toronto using Folium
map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=10)

# Mark the Neighborhoods in the map
for lat, lng, borough, neighborhood in zip(toronto_nebr_df_ungrp['Latitude'], toronto_nebr_df_ungrp['Longitude'], toronto_nebr_df_ungrp['Borough'], toronto_nebr_df_ungrp['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='gray',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  

map_toronto

#### Explore the Neighborhoods using Foursquare API

In [20]:
import os
# Credentials are read from the environment rather than hard-coded (the variable names are placeholders)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20200121' # Foursquare API version


In [23]:
# Limit to top 100 locations in a 500 meter radius per Neighborhood
def getNearbyVenues(names, latitudes, longitudes,LIMIT=100, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
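
`getNearbyVenues` indexes straight into the JSON response, so it raises an exception if a response has no groups or a venue has no category. The sketch below separates the parsing step into a small function that tolerates both cases. It assumes the response layout that the function above already relies on (`response` → `groups[0]` → `items` → `venue`). The name `extract_venues` and the sample payload are hypothetical, with the first venue modelled on a row of the notebook output and the second one made up.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # A venue without any category gets None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload in the shape the notebook code expects
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Miku",
                           "location": {"lat": 43.641374, "lng": -79.377531},
                           "categories": [{"name": "Japanese Restaurant"}]}},
                {"venue": {"name": "Uncategorized Venue",
                           "location": {"lat": 43.64, "lng": -79.38},
                           "categories": []}},
            ]
        }]
    }
}

venues = extract_venues(sample_payload)
print(venues)
print(extract_venues({"response": {}}))  # an empty response yields an empty list
```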

In [None]:
toronto_venues = getNearbyVenues(names=toronto_nebr_df_ungrp['Neighborhood'],
                                   latitudes=toronto_nebr_df_ungrp['Latitude'],
                                   longitudes=toronto_nebr_df_ungrp['Longitude']
                                  )



In [30]:
print(toronto_venues.shape)
print(toronto_venues.head())
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

(3425, 7)
   Neighborhood  Neighborhood Latitude  Neighborhood Longitude                Venue  Venue Latitude  Venue Longitude       Venue Category
0  Harbourfront               43.64008               -79.38015  Harbour Square Park       43.639253       -79.378395                 Park
1  Harbourfront               43.64008               -79.38015         Lake Ontario       43.638945       -79.379665                 Lake
2  Harbourfront               43.64008               -79.38015         Harbourfront       43.639526       -79.380688         Neighborhood
3  Harbourfront               43.64008               -79.38015                 Miku       43.641374       -79.377531  Japanese Restaurant
4  Harbourfront               43.64008               -79.38015     Natrel Pond/Rink       43.638431       -79.382528         Skating Rink
There are 287 unique categories.


#### Analyze each Neighborhood

In [34]:
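# one hot encoding
onehot_toronto_df = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Foursquare has a venue category that is itself named "Neighborhood", so this assignment
# overwrites that dummy column in place instead of appending a new column
onehot_toronto_df['Neighborhood'] = toronto_venues['Neighborhood'] 

# move the last column to the front; the groupby on 'Neighborhood' in the next cell
# puts the neighborhood name first in the grouped dataframe regardless of this ordering
fixed_columns = [onehot_toronto_df.columns[-1]] + list(onehot_toronto_df.columns[:-1])
onehot_toronto_df = onehot_toronto_df[fixed_columns]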

onehot_toronto_df.head()
onehot_toronto_df.shape


(3425, 287)

In [36]:
toronto_grpd_df = onehot_toronto_df.groupby('Neighborhood').mean().reset_index()
toronto_grpd_df.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,African Restaurant,Airport Service,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Butcher,Café,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Theater,Indonesian Restaurant,Indoor Play Area,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoor Supply Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Racetrack,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tree,Tunnel,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,Bathurst Quay,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
4,CN Tower,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032609,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.021739,0.01087,0.0,0.0,0.01087,0.0,0.01087,0.01087,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.021739,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.032609,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.01087,0.0,0.021739,0.0,0.0,0.0,0.0,0.043478,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.032609,0.0,0.01087,0.0,0.01087,0.0,0.01087,0.032609,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,0.0
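
Taking the mean of the one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. For example, the value 0.041667 in the Bathurst Quay row is consistent with one venue out of 24. The sketch below makes the equivalence explicit with the standard library only. `category_frequencies` is an illustrative name, and the sample list mixes rows from the output above with made-up ones, as marked.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {
            category: count / sum(counter.values())
            for category, count in counter.items()
        }
        for neighborhood, counter in counts.items()
    }


venues = [
    ("Harbourfront", "Park"),
    ("Harbourfront", "Lake"),
    ("Harbourfront", "Japanese Restaurant"),
    ("Harbourfront", "Park"),      # made-up extra row
    ("Ryerson", "Coffee Shop"),    # made-up row
]

freq = category_frequencies(venues)
print(freq["Harbourfront"])
```

Within one neighborhood the shares always add up to 1, which is also true of each row of `toronto_grpd_df`.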


Check the top 10 venues in each neighborhood

In [39]:
n = 10

for _neighborhood in toronto_grpd_df['Neighborhood']:
    print("----"+_neighborhood+"----")
    temp = toronto_grpd_df[toronto_grpd_df['Neighborhood'] == _neighborhood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(n))
    print('\n')

----Adelaide
----
                 venue  freq
0          Coffee Shop  0.09
1           Restaurant  0.05
2       Cosmetics Shop  0.04
3                 Café  0.04
4  Japanese Restaurant  0.03
5   Italian Restaurant  0.03
6  American Restaurant  0.03
7                Hotel  0.03
8                  Gym  0.03
9            Gastropub  0.03


----Bathurst Quay
----
                  venue  freq
0           Coffee Shop  0.17
1                  Café  0.12
2                  Park  0.08
3          Dance Studio  0.04
4                Tunnel  0.04
5  Caribbean Restaurant  0.04
6                Garden  0.04
7      Sushi Restaurant  0.04
8      Sculpture Garden  0.04
9                   Gym  0.04


----Berczy Park----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.06
2           Restaurant  0.04
3  Japanese Restaurant  0.04
4   Italian Restaurant  0.04
5               Bakery  0.04
6                Hotel  0.03
7         Cocktail Bar  0.03
8             Beer Bar  0

Collect the top 10 venues of each neighborhood into a dataframe

In [43]:
n = 10

def _most_common_venues(row, n):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:n]


indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(n):
    try:
        columns.append('{}{} Popular Venues'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd fall back to the "th" suffix
        columns.append('{}th Popular Venues'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grpd_df['Neighborhood']

for ind in np.arange(toronto_grpd_df.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = _most_common_venues(toronto_grpd_df.iloc[ind, :], n)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
0,Adelaide,Coffee Shop,Restaurant,Cosmetics Shop,Café,Japanese Restaurant,Italian Restaurant,Gym,American Restaurant,Gastropub,Hotel
1,Bathurst Quay,Coffee Shop,Café,Park,Diner,Tunnel,Caribbean Restaurant,Sculpture Garden,Japanese Restaurant,Ramen Restaurant,Dance Studio
2,Berczy Park,Coffee Shop,Café,Japanese Restaurant,Restaurant,Italian Restaurant,Bakery,Seafood Restaurant,Beer Bar,Hotel,Gym
3,Brockton,Bar,Park,Vietnamese Restaurant,Grocery Store,Coffee Shop,Dive Bar,Portuguese Restaurant,Art Gallery,Playground,Bakery
4,CN Tower,Hotel,Pizza Place,Coffee Shop,Aquarium,Italian Restaurant,Restaurant,Gym,Bar,Scenic Lookout,Fast Food Restaurant
5,Cabbagetown,Restaurant,Coffee Shop,Café,Gastropub,Italian Restaurant,Diner,Pizza Place,Japanese Restaurant,Pub,Indian Restaurant
6,Chinatown,Café,Chinese Restaurant,Dessert Shop,Vietnamese Restaurant,Dumpling Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant,Coffee Shop,Cocktail Bar,Grocery Store
7,Christie,Korean Restaurant,Coffee Shop,Café,Karaoke Bar,Japanese Restaurant,Indian Restaurant,Cocktail Bar,Ice Cream Shop,Dessert Shop,Mexican Restaurant
8,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Café,Gay Bar,Burger Joint,Restaurant,Hotel,Pizza Place,Mediterranean Restaurant
9,Commerce Court,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Gym,Deli / Bodega,Beer Bar,Italian Restaurant,Steakhouse
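
The column names above come from a three-item suffix list with a fallback to "th", which is correct for `n = 10`. If `n` were ever raised past 20, positions such as 21 would need "st" again. The sketch below is a more general helper under that assumption. `ordinal` is a hypothetical name, not something the notebook defines.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are exceptions to the last-digit rule
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Rebuild the same column names that the notebook uses for the top 10 venues
columns = ["Neighborhood"] + ["{} Popular Venues".format(ordinal(i)) for i in range(1, 11)]
print(columns)
```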


#### Cluster Neighborhoods using K-Means

In [46]:
kClusters = 5
toronto_grpd_clustered_df = toronto_grpd_df.drop(columns='Neighborhood')

kmeans = KMeans(init = "k-means++", n_clusters=kClusters, random_state=0).fit(toronto_grpd_clustered_df)

# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_[0:63] 
print(labels)



[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0]
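
A quick tally of the labels shows how balanced the clusters are. In the printed output, 57 of the 63 neighborhoods fall into cluster 0. The sketch below reproduces that count with `collections.Counter`. The list has the same label counts as the output above, but the order is not preserved.

```python
from collections import Counter

# Same label counts as the printed kmeans.labels_ output (order not preserved)
labels = [0] * 57 + [1] + [2] * 2 + [3] * 2 + [4]

sizes = Counter(labels)
largest_label, largest_size = sizes.most_common(1)[0]
share = largest_size / len(labels)

for label in sorted(sizes):
    print("cluster {}: {} neighborhoods".format(label, sizes[label]))
print("largest cluster holds {:.0%} of the neighborhoods".format(share))
```

In the notebook itself, the same tally can be obtained by passing `kmeans.labels_` to `Counter`.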


In [48]:
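toronto_merged = toronto_nebr_df_ungrp.copy()
print(toronto_merged.shape)

# kmeans.labels_ follows the row order of toronto_grpd_df (one row per unique neighborhood,
# sorted by name), so match the labels to neighborhoods by name instead of by position
cluster_by_neighborhood = pd.Series(kmeans.labels_, index=toronto_grpd_df['Neighborhood'])
labels = toronto_merged['Neighborhood'].map(cluster_by_neighborhood)
print(labels.shape)

# add clustering labels; drop any neighborhood for which no venues were returned
toronto_merged['Cluster Labels'] = labels
toronto_merged = toronto_merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})

# merge with neighborhoods_venues_sorted to add the most popular venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!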



(64, 5)
(64,)


Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
0,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015,0,Coffee Shop,Café,Hotel,Pizza Place,Restaurant,Italian Restaurant,History Museum,Chinese Restaurant,Plaza,Sandwich Place
1,M7A,Downtown Toronto,Queen's Park,43.659659,-79.39034,0,Coffee Shop,Café,Sandwich Place,Italian Restaurant,French Restaurant,Ice Cream Shop,Bubble Tea Shop,Middle Eastern Restaurant,Chinese Restaurant,Juice Bar
2,M5B,Downtown Toronto,Ryerson,43.658469,-79.378993,0,Coffee Shop,Café,Clothing Store,Fast Food Restaurant,Japanese Restaurant,Diner,Middle Eastern Restaurant,Ramen Restaurant,Burger Joint,Bakery
3,M5B,Downtown Toronto,Garden District,43.6565,-79.377114,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Cosmetics Shop,Restaurant,Middle Eastern Restaurant,Café,Plaza,Tea Room,Theater
4,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704,0,Coffee Shop,Pizza Place,Café,Grocery Store,Pharmacy,Bakery,Restaurant,Bike Rental / Bike Share,Beer Store,Library


In [53]:
# Create a map of Toronto using Folium and plot the clusters
map_clusters = folium.Map(location=[43.653963, -79.387207], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kClusters)
ys = [i+x+(i*x)**2 for i in range(kClusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
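
The map picks one colour per cluster from matplotlib's rainbow colormap. As a minimal sketch of the same idea without matplotlib, the block below spreads `k` hues evenly around the colour wheel with the standard library's `colorsys` and formats them as hex strings, which is the form Folium's `color` argument accepts. `cluster_colors` and the saturation and brightness values are assumptions for illustration.

```python
import colorsys


def cluster_colors(k, saturation=0.8, value=0.9):
    """Return k visually distinct hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Evenly spaced hues in [0, 1) give maximally separated colours
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(5)
print(palette)
```

Indexing such a list directly with the cluster label (`palette[cluster]`) keeps the mapping from label to colour easy to read.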