# Capstone Project - The Battle of the Neighborhoods

## Introduction: Business Problem 

The Vancouver real estate market is among the hottest globally. Just last year, Vancouver was ranked the 4th most expensive place in the [world](https://dailyhive.com/vancouver/vancouver-4-most-expensive-housing-market-cbre-2019) to purchase a home. Real estate agents and companies must provide their clients with the best recommendations to ensure company growth and client satisfaction.

If real estate agencies were able to utilize data, they could provide their clients with a tailored set of neighbourhoods and areas to choose from. These recommendations could be based on amenities in the neighbourhood as well as safety metrics and quality of life. Home buyers would be able to use this information, together with their budget, to target certain neighbourhoods in which to purchase a home. With home buying being such a large investment in Vancouver, it is important that all stakeholders utilize all the information available to them.

## Data: 

Based on the problem above, the following data would be ideal for such a project:

- Neighbourhood names and locations in Vancouver 
- Crime data for these neighbourhoods in order to advise on safety 
- Census information for the neighbourhoods to further understand their demographics, population and general makeup
- Information on amenities in these neighbourhoods to tailor recommendations to each client's persona

Utilizing all this information is beyond the scope of this project. For this project, the following sources of data will be used.

- [Vancouver Police Crime Data](https://geodash.vpd.ca/opendata/): Downloaded as a CSV and added to the project workbook
- [2016 Census Data](https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CMACA&Code1=933&Geo2=PR&Code2=59&Data=Count&SearchText=Vancouver&SearchType=Begins&SearchPR=01&B1=All&TABID=1): Downloaded as a CSV and added to the project workbook. *Note that only population data is used in this project; other census data is beyond its scope.*
- Foursquare API: Used as a source of information on amenities in the neighbourhoods


 
## Use of Data and Data Wrangling:

The data will be gathered and used in the following way:

#### Part 1: Crime Data Wrangling

- VPD data will be uploaded to the workbook. Neighbourhoods will be accessed and crimes will be summed to find the total number of crimes per neighbourhood. Only crime data from 2018 onwards will be used.
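
As a rough sketch of this step, the snippet below filters a few made-up rows to 2018 onwards and counts incidents per neighbourhood. Only the `YEAR` column name is confirmed by the notebook code; `TYPE` and `NEIGHBOURHOOD` are assumed names, and the rows themselves are invented for illustration.

```python
import pandas as pd

# Invented rows; only the YEAR column name is confirmed by the notebook,
# TYPE and NEIGHBOURHOOD are assumed names
raw = pd.DataFrame({
    "TYPE": ["Mischief", "Other Theft", "Mischief", "Theft of Bicycle"],
    "YEAR": [2017, 2018, 2019, 2020],
    "NEIGHBOURHOOD": ["Fairview", "Fairview", "West End", "West End"],
})

# Keep 2018 onwards, then count incidents per neighbourhood
recent = raw.loc[raw["YEAR"] >= 2018]
crime_counts = recent.groupby("NEIGHBOURHOOD").size().rename("Total Crime")
print(crime_counts)
```

The 2017 row is excluded by the filter, so Fairview is counted once and West End twice.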

#### Part 2: Merge with Census Data 

- Population census data will be added to the workbook. The neighbourhoods will be accessed and reconciled with the VPD data to ensure the neighbourhoods match. If there are discrepancies between the two sets of neighbourhoods, neighbourhoods will be merged based on geographic location so that they match.
- Once neighbourhoods match in VPD and Census data, these two datasets will be merged so the "Crime per population" can be calculated. This makes it possible to compare neighbourhood crime rates even if populations are different.
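
The reconciliation step described above can be checked mechanically before merging. The sketch below is only an illustration on a handful of rows: the frame names `crime` and `census` and the mismatched census label "Downtown" are assumptions, not taken from the project files. It uses the `indicator` option of `pandas.merge` to list neighbourhoods that appear in only one source, then computes crime per population on the matched rows.

```python
import pandas as pd

# Toy rows for illustration; "Downtown" is an assumed census label
crime = pd.DataFrame({
    "Neighbourhood": ["Arbutus Ridge", "Central Business District", "Musqueam"],
    "Total Crime": [933, 31010, 53],
})
census = pd.DataFrame({
    "Neighbourhood": ["Arbutus Ridge", "Downtown"],
    "Population": [15295, 62030],
})

# An outer merge with indicator=True flags names found in only one source
check = pd.merge(crime, census, on="Neighbourhood", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Neighbourhood", "_merge"]]
print(unmatched)

# After renaming the census label, an inner merge keeps only matched rows
census["Neighbourhood"] = census["Neighbourhood"].replace(
    {"Downtown": "Central Business District"}
)
merged = pd.merge(census, crime, on="Neighbourhood", how="inner")
merged["Crime per Population"] = merged["Total Crime"] / merged["Population"]
print(merged)
```

Rows that remain unmatched after renaming (Musqueam in this toy case) drop out of the inner merge, so it is worth reviewing the `unmatched` list before relying on the merged rates.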

#### Part 3: Defining geographic locations for neighbourhoods 

- Latitude and longitude will be generated for each neighbourhood by using a [Geocoder](https://developers.google.com/maps/documentation/geocoding/start) to determine coordinates.

#### Part 4: Foursquare API integration to find venues 

- The Foursquare API will be used to find venues in each Vancouver neighbourhood


#### Part 5: K-means Clustering and Analysis

- K-means clustering will be used to divide the neighbourhoods into groups with similar venue profiles. Each group will then be analyzed and assigned a "persona", which real estate agencies could use to tailor recommendations to their clientele. The crime rate will be shown for each group so the user can see which neighbourhood is the safest within the target group they are considering.




In [2]:
import requests  # library to handle HTTP requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation

from geopy.geocoders import Nominatim  # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transforming a JSON file into a pandas dataframe
from pandas import json_normalize

#!conda install -c conda-forge folium
import folium  # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

# Command to install OpenCage Geocoder for fetching Lat and Lng of each neighbourhood
#!pip install opencage

# Importing OpenCage Geocoder
from opencage.geocoder import OpenCageGeocode

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [3]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_49ed30c51b484654ac74f6108c3dc574 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_49ed30c51b484654ac74f6108c3dc574.get_object(Bucket='capstoneprojectibmdatascience-donotdelete-pr-hbypfe38nmhkdl',Key='crimedata_csv_all_years.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)

van_crime_df = df_data_1



# Drop columns not needed for the analysis; X and Y (the coordinate columns) appear to be corrupt
van_crime_df = van_crime_df.drop(['MONTH', 'DAY', 'HOUR', 'MINUTE', 'HUNDRED_BLOCK', 'X', 'Y'], axis=1)
# Keep only crimes from 2018 onwards; copy so later assignments do not act on a slice
van_crime_df = van_crime_df.loc[van_crime_df['YEAR'] > 2017].copy()


van_crime_df.columns = ['Type', 'Year', 'Neighbourhood']
van_crime_df.head()


Unnamed: 0,Type,Year,Neighbourhood
1,Break and Enter Commercial,2019,Fairview
2,Break and Enter Commercial,2019,West End
4,Break and Enter Commercial,2020,West End
12,Break and Enter Commercial,2018,West End
20,Break and Enter Commercial,2020,West End


In [4]:
van_crime_df['Neighbourhood'].value_counts()


Central Business District    31010
West End                      8635
Mount Pleasant                6908
Strathcona                    5883
Fairview                      5742
Renfrew-Collingwood           5315
Grandview-Woodland            5103
Kitsilano                     5013
Kensington-Cedar Cottage      4258
Hastings-Sunrise              3713
Sunset                        2891
Riley Park                    2547
Marpole                       2496
Victoria-Fraserview           1707
Killarney                     1583
Oakridge                      1393
Dunbar-Southlands             1322
Kerrisdale                    1222
West Point Grey               1157
South Cambie                  1110
Shaughnessy                   1094
Arbutus Ridge                  933
Stanley Park                   408
Musqueam                        53
Name: Neighbourhood, dtype: int64

In [5]:
van_crime_cat = pd.pivot_table(van_crime_df,
                               values=['Year'],
                               index=['Neighbourhood'],
                               columns=['Type'],
                               aggfunc=len,
                               fill_value=0,
                               margins=True)
van_crime_cat

Unnamed: 0_level_0,Year,Year,Year,Year,Year,Year,Year,Year,Year,Year
Type,Break and Enter Commercial,Break and Enter Residential/Other,Mischief,Other Theft,Theft from Vehicle,Theft of Bicycle,Theft of Vehicle,Vehicle Collision or Pedestrian Struck (with Fatality),Vehicle Collision or Pedestrian Struck (with Injury),All
Neighbourhood,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
Arbutus Ridge,39,237,139,62,320,55,34,1,46,933
Central Business District,2064,420,5661,5617,14664,1583,428,3,570,31010
Dunbar-Southlands,32,267,219,91,543,81,32,1,56,1322
Fairview,626,263,810,801,2210,743,136,1,152,5742
Grandview-Woodland,453,412,1044,567,1792,347,331,1,156,5103
Hastings-Sunrise,153,379,667,236,1729,156,215,1,177,3713
Kensington-Cedar Cottage,202,414,797,483,1658,264,210,5,225,4258
Kerrisdale,78,291,163,35,464,59,46,1,85,1222
Killarney,102,163,255,108,679,80,90,3,103,1583
Kitsilano,362,477,808,522,1954,571,160,1,158,5013


In [6]:
van_crime_cat.reset_index(inplace=True)
# Flatten the two-level column index: ('Year', 'All') becomes 'YearAll'
van_crime_cat.columns = van_crime_cat.columns.map(''.join)
van_crime_cat.rename(columns={'YearAll': 'Total Crime'}, inplace=True)
# Take an explicit copy so the drop below does not trigger SettingWithCopyWarning
van_crime_df = van_crime_cat[['Neighbourhood', 'Total Crime']].copy()

# Drop rows 12, 18 and 24 (Musqueam, Stanley Park and the 'All' margin row)
van_crime_df.drop([12, 18, 24], axis=0, inplace=True)
van_crime_df.reset_index()


Unnamed: 0,index,Neighbourhood,Total Crime
0,0,Arbutus Ridge,933
1,1,Central Business District,31010
2,2,Dunbar-Southlands,1322
3,3,Fairview,5742
4,4,Grandview-Woodland,5103
5,5,Hastings-Sunrise,3713
6,6,Kensington-Cedar Cottage,4258
7,7,Kerrisdale,1222
8,8,Killarney,1583
9,9,Kitsilano,5013


In [7]:
body = client_49ed30c51b484654ac74f6108c3dc574.get_object(Bucket='capstoneprojectibmdatascience-donotdelete-pr-hbypfe38nmhkdl',Key='CensusLocalAreaProfiles2016.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )
    
df_data_2 = pd.read_csv(body)

df_data_2.loc[0, 'Neighborhood'] = 'Arbutus Ridge' 
df_data_2.loc[1, 'Neighborhood'] = 'Central Business District' 
df_data_2.loc[2, 'Neighborhood'] = 'Dunbar-Southlands' 
df_data_2.loc[3, 'Neighborhood'] = 'Fairview' 
df_data_2.loc[4, 'Neighborhood'] = 'Grandview-Woodland' 
df_data_2.loc[5, 'Neighborhood'] = 'Hastings-Sunrise' 
df_data_2.loc[6, 'Neighborhood'] = 'Kensington-Cedar Cottage' 
df_data_2.loc[7, 'Neighborhood'] = 'Kerrisdale' 
df_data_2.loc[8, 'Neighborhood'] = 'Killarney' 
df_data_2.loc[9, 'Neighborhood'] = 'Kitsilano' 
df_data_2.loc[10, 'Neighborhood'] = 'Marpole' 
df_data_2.loc[11, 'Neighborhood'] = 'Mount Pleasant' 
df_data_2.loc[12, 'Neighborhood'] = 'Oakridge' 
df_data_2.loc[13, 'Neighborhood'] = 'Renfrew-Collingwood' 
df_data_2.loc[14, 'Neighborhood'] = 'Riley Park' 
df_data_2.loc[15, 'Neighborhood'] = 'Shaughnessy' 
df_data_2.loc[16, 'Neighborhood'] = 'South Cambie' 
df_data_2.loc[17, 'Neighborhood'] = 'Strathcona' 
df_data_2.loc[18, 'Neighborhood'] = 'Sunset' 
df_data_2.loc[19, 'Neighborhood'] = 'Victoria-Fraserview' 
df_data_2.loc[20, 'Neighborhood'] = 'West End' 
df_data_2.loc[21, 'Neighborhood'] = 'West Point Grey' 

df_data_2.rename(columns={'Neighborhood':'Neighbourhood'}, inplace=True)


df_data_2

Unnamed: 0,Neighbourhood,Population
0,Arbutus Ridge,15295
1,Central Business District,62030
2,Dunbar-Southlands,21425
3,Fairview,33620
4,Grandview-Woodland,29175
5,Hastings-Sunrise,34575
6,Kensington-Cedar Cottage,49325
7,Kerrisdale,13975
8,Killarney,29325
9,Kitsilano,43045


In [8]:
crime_pop_df = pd.merge(df_data_2,van_crime_df, on ='Neighbourhood', how = 'inner')

crime_pop_df["Crime per Population"] = crime_pop_df["Total Crime"] / crime_pop_df["Population"]
crime_pop_df





Unnamed: 0,Neighbourhood,Population,Total Crime,Crime per Population
0,Arbutus Ridge,15295,933,0.061
1,Central Business District,62030,31010,0.499919
2,Dunbar-Southlands,21425,1322,0.061704
3,Fairview,33620,5742,0.170791
4,Grandview-Woodland,29175,5103,0.17491
5,Hastings-Sunrise,34575,3713,0.10739
6,Kensington-Cedar Cottage,49325,4258,0.086325
7,Kerrisdale,13975,1222,0.087442
8,Killarney,29325,1583,0.053981
9,Kitsilano,43045,5013,0.11646


In [9]:
Latitude = []
Longitude = []

Neighbourhood = crime_pop_df['Neighbourhood']

key = '<YOUR_OPENCAGE_API_KEY>'  # OpenCage API key (redacted; supply your own)
geocoder = OpenCageGeocode(key)

# Geocode each neighbourhood and keep the coordinates of the first result
for name in Neighbourhood:
    address = '{}, Vancouver, BC, Canada'.format(name)
    location = geocoder.geocode(address)
    Latitude.append(location[0]['geometry']['lat'])
    Longitude.append(location[0]['geometry']['lng'])
print(Latitude, Longitude)


[49.2463051, 49.2714086, 49.237864, 49.2619557, 49.2758495, 49.2778297, 49.2467896, 49.2209848, 49.2180118, 49.2694099, 49.2092233, 49.2640483, 49.2266149, 49.2485768, 49.2448536, 49.2463051, 49.2464639, 49.2776935, 49.2190935, 49.2189795, 49.2841308, 49.2681022] [-123.159636, -123.1012588, -123.1843544, -123.1304084, -123.0669344, -123.0400054, -123.0734751, -123.1595484, -123.037115, -123.155267, -123.1361495, -123.0962492, -123.1229433, -123.0401793, -123.1030349, -123.1384051, -123.1216027, -123.0885393, -123.0916654, -123.0638159, -123.1317949, -123.2026425]
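
The loop above indexes `location[0]` directly, which raises an error if the geocoder returns no match for a name. A defensive variant is sketched below. To keep it runnable without an API key, `fake_geocode` stands in for `geocoder.geocode`; it and the helper `geocode_names` are hypothetical names, and the sketch assumes that an empty or missing result means no match was found.

```python
def geocode_names(names, geocode_fn, suffix="Vancouver, BC, Canada"):
    """Return {name: (lat, lng)} plus a list of names with no match."""
    coords, missing = {}, []
    for name in names:
        results = geocode_fn("{}, {}".format(name, suffix))
        if not results:  # None or empty list: nothing found
            missing.append(name)
            continue
        geometry = results[0]["geometry"]  # first result, as in the loop above
        coords[name] = (geometry["lat"], geometry["lng"])
    return coords, missing


def fake_geocode(address):
    """Hypothetical stand-in for geocoder.geocode, for illustration only."""
    known = {
        "Kitsilano, Vancouver, BC, Canada": [
            {"geometry": {"lat": 49.2694099, "lng": -123.155267}}
        ]
    }
    return known.get(address, [])


coords, missing = geocode_names(["Kitsilano", "Nowhere"], fake_geocode)
print(coords, missing)
```

Collecting unmatched names in a separate list makes it easy to review them by hand instead of stopping the whole run.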


In [10]:
ws_neig_dict = {'Neighbourhood': Neighbourhood,'Latitude': Latitude,'Longitude':Longitude}
ws_neig_geo = pd.DataFrame(data=ws_neig_dict, columns=['Neighbourhood', 'Latitude', 'Longitude'], index=None)

map_df = pd.merge(ws_neig_geo,crime_pop_df, on ='Neighbourhood', how = 'inner')

map_df





Unnamed: 0,Neighbourhood,Latitude,Longitude,Population,Total Crime,Crime per Population
0,Arbutus Ridge,49.246305,-123.159636,15295,933,0.061
1,Central Business District,49.271409,-123.101259,62030,31010,0.499919
2,Dunbar-Southlands,49.237864,-123.184354,21425,1322,0.061704
3,Fairview,49.261956,-123.130408,33620,5742,0.170791
4,Grandview-Woodland,49.275849,-123.066934,29175,5103,0.17491
5,Hastings-Sunrise,49.27783,-123.040005,34575,3713,0.10739
6,Kensington-Cedar Cottage,49.24679,-123.073475,49325,4258,0.086325
7,Kerrisdale,49.220985,-123.159548,13975,1222,0.087442
8,Killarney,49.218012,-123.037115,29325,1583,0.053981
9,Kitsilano,49.26941,-123.155267,43045,5013,0.11646


In [11]:
address = 'Vancouver, BC, Canada'

location = geocoder.geocode(address)
latitude = location[0]['geometry']['lat']
longitude = location[0]['geometry']['lng']

print('The geographical coordinates of Vancouver, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver, Canada are 49.2608724, -123.1139529.


In [12]:
van_map = folium.Map(location=[latitude, longitude], zoom_start=12)


for lat, lon, area, size in zip(map_df['Latitude'], map_df['Longitude'], map_df['Neighbourhood'], map_df['Crime per Population']):
    # marker radius is proportional to crime per population
    folium.CircleMarker([lat, lon],
                        popup=area,
                        radius=size*50,
                        color='blue',
                        fill=True,
                        fill_opacity=0.7,
                        fill_color='#3186cc',
                        ).add_to(van_map)
    
van_map
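
In the map cell above the marker radius is `size*50`, so a crime rate of about 0.05 gives a radius under 3 pixels while the Central Business District (about 0.50) gives roughly 25. If the small markers prove hard to see, one option is to clamp the radius to a visible range. The helper `marker_radius` and its default bounds are assumptions for illustration, not part of the notebook.

```python
def marker_radius(rate, scale=50.0, min_radius=4.0, max_radius=30.0):
    """Scale a crime rate to a marker radius, clamped to a visible range."""
    return max(min_radius, min(max_radius, rate * scale))


# Rates copied from the crime-per-population table earlier in the notebook
for name, rate in [("Killarney", 0.053981), ("Central Business District", 0.499919)]:
    print(name, round(marker_radius(rate), 2))
```

Clamping trades exact proportionality for legibility, so it suits a quick visual overview more than a precise comparison.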

In [13]:
CLIENT_ID = '<YOUR_FOURSQUARE_CLIENT_ID>' # your Foursquare ID (redacted)
CLIENT_SECRET = '<YOUR_FOURSQUARE_CLIENT_SECRET>' # your Foursquare Secret (redacted)
ACCESS_TOKEN = '' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <YOUR_FOURSQUARE_CLIENT_ID>
CLIENT_SECRET:<YOUR_FOURSQUARE_CLIENT_SECRET>


In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues
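
The request URL above is assembled with string formatting, and the parsing step assumes every venue has at least one category. The sketch below shows one way to separate those concerns: a parameter dictionary that could be passed to `requests.get(url, params=...)`, and a parser that skips venues without categories. The helper names `explore_params` and `parse_venues` and the sample response are hypothetical; the response shape simply mirrors the keys used in `getNearbyVenues`.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the query parameters used by the explore request."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }


def parse_venues(name, lat, lng, items):
    """Turn explore items into rows, skipping venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue
        rows.append((name, lat, lng, venue["name"], categories[0]["name"]))
    return rows


params = explore_params("ID", "SECRET", "20180604", 49.2463051, -123.159636, 1000, 30)
print(EXPLORE_URL + "?" + urlencode(params))

# Made-up items shaped like the response consumed by getNearbyVenues
sample_items = [
    {"venue": {"name": "Quilchena Park", "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Place", "categories": []}},
]
rows = parse_venues("Arbutus Ridge", 49.2463051, -123.159636, sample_items)
print(rows)
```

Keeping the parser separate from the network call also makes it possible to test the row-building logic without an API key.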

In [15]:
van_venues = getNearbyVenues(names=map_df['Neighbourhood'],latitudes=map_df['Latitude'],longitudes=map_df['Longitude'])

Arbutus Ridge
Central Business District
Dunbar-Southlands
Fairview
Grandview-Woodland
Hastings-Sunrise
Kensington-Cedar Cottage
Kerrisdale
Killarney
Kitsilano
Marpole
Mount Pleasant
Oakridge
Renfrew-Collingwood
Riley Park
Shaughnessy
South Cambie
Strathcona
Sunset
Victoria-Fraserview
West End
West Point Grey


In [16]:
print(van_venues.shape)
van_venues.head()

(591, 5)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Arbutus Ridge,49.246305,-123.159636,Quilchena Park,Park
1,Arbutus Ridge,49.246305,-123.159636,The Arbutus Club,Event Space
2,Arbutus Ridge,49.246305,-123.159636,The Patty Shop,Caribbean Restaurant
3,Arbutus Ridge,49.246305,-123.159636,Butter Baked Goods,Bakery
4,Arbutus Ridge,49.246305,-123.159636,La Buca,Italian Restaurant


In [17]:
# one hot encoding
van_onehot = pd.get_dummies(van_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
van_onehot['Neighbourhood'] = van_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [van_onehot.columns[-1]] + list(van_onehot.columns[:-1])
van_onehot = van_onehot[fixed_columns]

van_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Amphitheater,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,...,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Arbutus Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arbutus Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Arbutus Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arbutus Ridge,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,Arbutus Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
van_grouped = van_onehot.groupby('Neighbourhood').mean().reset_index()
van_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Amphitheater,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,...,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Arbutus Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Central Business District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
2,Dunbar-Southlands,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Fairview,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,...,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0
4,Grandview-Woodland,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0
5,Hastings-Sunrise,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
6,Kensington-Cedar Cottage,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.233333,0.0,0.0
7,Kerrisdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Killarney,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,...,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0
9,Kitsilano,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667
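
Because each row of the one-hot table is a single venue, the mean per neighbourhood is the share of that neighbourhood's venues falling in each category. A toy example (venue rows invented for illustration) makes this concrete:

```python
import pandas as pd

# Invented venue rows: three venues in "A", two in "B"
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Bakery", "Park", "Bakery", "Park", "Gym"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of 0/1 indicators = fraction of venues in each category
shares = onehot.groupby("Neighbourhood").mean()
print(shares)
```

Each row of `shares` sums to 1, which is why the values can be read as proportions and compared across neighbourhoods with different venue counts.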


In [19]:
num_top_venues = 10

for hood in van_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = van_grouped[van_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arbutus Ridge----
               venue  freq
0             Bakery  0.11
1  Convenience Store  0.06
2                Gym  0.06
3      Shopping Mall  0.06
4               Park  0.06
5     Discount Store  0.06
6         Food Truck  0.06
7   Sushi Restaurant  0.06
8    Bubble Tea Shop  0.06
9          Pet Store  0.06


----Central Business District----
              venue  freq
0           Brewery  0.13
1            Bakery  0.13
2    Ice Cream Shop  0.07
3              Park  0.03
4    Science Museum  0.03
5       Coffee Shop  0.03
6  Community Center  0.03
7       Salad Place  0.03
8      Dessert Shop  0.03
9       Pizza Place  0.03


----Dunbar-Southlands----
                  venue  freq
0         Grocery Store  0.19
1                   Gym  0.08
2           Coffee Shop  0.08
3           Golf Course  0.08
4  Gym / Fitness Center  0.04
5                  Café  0.04
6              Bus Stop  0.04
7        Sandwich Place  0.04
8      Sushi Restaurant  0.04
9    Mexican Restaurant  0.04



In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = van_grouped['Neighbourhood']

for ind in np.arange(van_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(van_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arbutus Ridge,Bakery,Convenience Store,Sushi Restaurant,Park,Coffee Shop,Discount Store,Caribbean Restaurant,Sandwich Place,Event Space,Shopping Mall
1,Central Business District,Bakery,Brewery,Ice Cream Shop,Pizza Place,Salad Place,Park,Liquor Store,Sculpture Garden,Science Museum,Middle Eastern Restaurant
2,Dunbar-Southlands,Grocery Store,Gym,Golf Course,Coffee Shop,Pub,Mexican Restaurant,Café,Sandwich Place,Bus Stop,Pet Store
3,Fairview,Japanese Restaurant,Camera Store,Park,Restaurant,Bakery,Coffee Shop,French Restaurant,Kitchen Supply Store,Juice Bar,Breakfast Spot
4,Grandview-Woodland,Coffee Shop,Pizza Place,Brewery,Indian Restaurant,Cuban Restaurant,Tapas Restaurant,Scandinavian Restaurant,Mexican Restaurant,Pub,French Restaurant


In [22]:
# set number of clusters
kclusters = 6

van_clustering = van_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(van_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:21]

array([3, 0, 0, 3, 0, 3, 1, 4, 3, 3, 1, 0, 1, 2, 3, 5, 5, 3, 1, 3, 3],
      dtype=int32)
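
A quick look at cluster sizes helps when interpreting the groups. The sketch below counts the labels printed above with `collections.Counter`. Note that the slice `[0:21]` shows 21 labels, so the count covers 21 of the 22 neighbourhoods.

```python
from collections import Counter

# Labels copied from the output above (first 21 neighbourhoods)
labels = [3, 0, 0, 3, 0, 3, 1, 4, 3, 3, 1, 0, 1, 2, 3, 5, 5, 3, 1, 3, 3]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print("Cluster {}: {} neighbourhoods".format(cluster, count))
```

Clusters 2 and 4 each contain a single neighbourhood, so any "persona" assigned to them describes one area rather than a group.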

In [23]:
# add clustering labels

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vancouver_merged = map_df

vancouver_merged = vancouver_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

vancouver_merged.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude,Population,Total Crime,Crime per Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arbutus Ridge,49.246305,-123.159636,15295,933,0.061,3,Bakery,Convenience Store,Sushi Restaurant,Park,Coffee Shop,Discount Store,Caribbean Restaurant,Sandwich Place,Event Space,Shopping Mall
1,Central Business District,49.271409,-123.101259,62030,31010,0.499919,0,Bakery,Brewery,Ice Cream Shop,Pizza Place,Salad Place,Park,Liquor Store,Sculpture Garden,Science Museum,Middle Eastern Restaurant
2,Dunbar-Southlands,49.237864,-123.184354,21425,1322,0.061704,0,Grocery Store,Gym,Golf Course,Coffee Shop,Pub,Mexican Restaurant,Café,Sandwich Place,Bus Stop,Pet Store
3,Fairview,49.261956,-123.130408,33620,5742,0.170791,3,Japanese Restaurant,Camera Store,Park,Restaurant,Bakery,Coffee Shop,French Restaurant,Kitchen Supply Store,Juice Bar,Breakfast Spot
4,Grandview-Woodland,49.275849,-123.066934,29175,5103,0.17491,0,Coffee Shop,Pizza Place,Brewery,Indian Restaurant,Cuban Restaurant,Tapas Restaurant,Scandinavian Restaurant,Mexican Restaurant,Pub,French Restaurant


In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster, size in zip(vancouver_merged['Latitude'], vancouver_merged['Longitude'], vancouver_merged['Neighbourhood'], vancouver_merged['Cluster Labels'],vancouver_merged['Crime per Population']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=size*50,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [25]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 0, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Central Business District,0.499919,Bakery,Brewery,Ice Cream Shop,Pizza Place,Salad Place,Park,Liquor Store,Sculpture Garden,Science Museum,Middle Eastern Restaurant
2,Dunbar-Southlands,0.061704,Grocery Store,Gym,Golf Course,Coffee Shop,Pub,Mexican Restaurant,Café,Sandwich Place,Bus Stop,Pet Store
4,Grandview-Woodland,0.17491,Coffee Shop,Pizza Place,Brewery,Indian Restaurant,Cuban Restaurant,Tapas Restaurant,Scandinavian Restaurant,Mexican Restaurant,Pub,French Restaurant
11,Mount Pleasant,0.209619,Brewery,Coffee Shop,Sushi Restaurant,Yoga Studio,Bookstore,Burrito Place,Dessert Shop,Diner,Outdoor Sculpture,Donut Shop


In [26]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 1, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Kensington-Cedar Cottage,0.086325,Vietnamese Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Café,Cantonese Restaurant,Smoke Shop,Seafood Restaurant,Burger Joint,Sandwich Place,Grocery Store
10,Marpole,0.102044,Sushi Restaurant,Chinese Restaurant,Bank,Café,Japanese Restaurant,American Restaurant,Scenic Lookout,Sandwich Place,Restaurant,Coffee Shop
12,Oakridge,0.106907,Fast Food Restaurant,Coffee Shop,Sushi Restaurant,Bubble Tea Shop,Tea Room,Shopping Mall,Chocolate Shop,Restaurant,Burger Joint,Light Rail Station
18,Sunset,0.079205,Indian Restaurant,Chinese Restaurant,Pharmacy,Restaurant,Bakery,Bank,Coffee Shop,Sushi Restaurant,Cosmetics Shop,Dessert Shop


In [27]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 2, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Renfrew-Collingwood,0.103144,Park,Pizza Place,Bus Stop,Bookstore,Plaza,Chinese Restaurant,Business Service,Bus Station,Deli / Bodega,Malay Restaurant


In [28]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 3, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arbutus Ridge,0.061,Bakery,Convenience Store,Sushi Restaurant,Park,Coffee Shop,Discount Store,Caribbean Restaurant,Sandwich Place,Event Space,Shopping Mall
3,Fairview,0.170791,Japanese Restaurant,Camera Store,Park,Restaurant,Bakery,Coffee Shop,French Restaurant,Kitchen Supply Store,Juice Bar,Breakfast Spot
5,Hastings-Sunrise,0.10739,Coffee Shop,Vietnamese Restaurant,Event Space,Sushi Restaurant,Pharmacy,Convenience Store,Chinese Restaurant,Middle Eastern Restaurant,Fair,Sandwich Place
8,Killarney,0.053981,Bus Stop,Coffee Shop,Sushi Restaurant,Grocery Store,Gas Station,Juice Bar,Liquor Store,Fast Food Restaurant,Farmers Market,Park
9,Kitsilano,0.11646,Restaurant,Yoga Studio,Coffee Shop,French Restaurant,Bakery,Beach,Ice Cream Shop,Asian Restaurant,Record Shop,Pool
14,Riley Park,0.112924,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Café,Farmers Market,Seafood Restaurant,Coffee Shop,Gym,Furniture / Home Store,Restaurant,Falafel Restaurant
17,Strathcona,0.467461,Pizza Place,Café,Restaurant,Sandwich Place,Park,Gourmet Shop,Asian Restaurant,Sushi Restaurant,Cheese Shop,Chinese Restaurant
19,Victoria-Fraserview,0.054949,Pizza Place,Gas Station,Sandwich Place,Convenience Store,Park,Restaurant,Post Office,Pharmacy,Pet Store,Cosmetics Shop
20,West End,0.182945,Farmers Market,Bakery,Restaurant,Park,Bookstore,Convenience Store,Sushi Restaurant,Greek Restaurant,Falafel Restaurant,Café
21,West Point Grey,0.088557,Bakery,Harbor / Marina,Sushi Restaurant,Park,Beach,Bank,Pizza Place,Bus Stop,Café,Sandwich Place


In [29]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 4, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Kerrisdale,0.087442,Golf Course,Bus Stop,Spanish Restaurant,Park,Café,Supermarket,Gift Shop,Pool,Grocery Store,Gourmet Shop


In [30]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 5, vancouver_merged.columns[[0] + [5] + list(range(7, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Crime per Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Shaughnessy,0.129775,Park,Coffee Shop,Bus Stop,Bank,Sporting Goods Shop,Sushi Restaurant,Bubble Tea Shop,Malay Restaurant,Sandwich Place,Garden
16,South Cambie,0.139272,Coffee Shop,Park,Garden,Grocery Store,Bank,Sandwich Place,Sushi Restaurant,Chinese Restaurant,Seafood Restaurant,Bubble Tea Shop


In [35]:
vancouver_merged_2 = vancouver_merged[vancouver_merged["Cluster Labels"].isin([5, 2,4])]
vancouver_merged_2

Unnamed: 0,Neighbourhood,Latitude,Longitude,Population,Total Crime,Crime per Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Kerrisdale,49.220985,-123.159548,13975,1222,0.087442,4,Golf Course,Bus Stop,Spanish Restaurant,Park,Café,Supermarket,Gift Shop,Pool,Grocery Store,Gourmet Shop
13,Renfrew-Collingwood,49.248577,-123.040179,51530,5315,0.103144,2,Park,Pizza Place,Bus Stop,Bookstore,Plaza,Chinese Restaurant,Business Service,Bus Station,Deli / Bodega,Malay Restaurant
15,Shaughnessy,49.246305,-123.138405,8430,1094,0.129775,5,Park,Coffee Shop,Bus Stop,Bank,Sporting Goods Shop,Sushi Restaurant,Bubble Tea Shop,Malay Restaurant,Sandwich Place,Garden
16,South Cambie,49.246464,-123.121603,7970,1110,0.139272,5,Coffee Shop,Park,Garden,Grocery Store,Bank,Sandwich Place,Sushi Restaurant,Chinese Restaurant,Seafood Restaurant,Bubble Tea Shop
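
To pick out the safest neighbourhood within the selected clusters, the rows can be sorted by crime rate. The sketch below rebuilds the relevant columns from the table above so that it runs on its own; in the notebook the same `sort_values` call could be applied to `vancouver_merged_2` directly.

```python
import pandas as pd

# Values copied from the table above
target = pd.DataFrame({
    "Neighbourhood": ["Kerrisdale", "Renfrew-Collingwood", "Shaughnessy", "South Cambie"],
    "Cluster Labels": [4, 2, 5, 5],
    "Crime per Population": [0.087442, 0.103144, 0.129775, 0.139272],
})

# Lowest crime rate first
ranked = target.sort_values("Crime per Population").reset_index(drop=True)
print(ranked)

safest = ranked.loc[0, "Neighbourhood"]
print("Lowest crime rate in the target clusters:", safest)
```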
