# Segmenting and Clustering Neighborhoods in Toronto

<h3>Introduction</h3>
<p>
This is an assignment for the Introduction to Artificial Intelligence course (SOFE 3720U). In it, we explore how to segment and cluster the neighborhoods of Toronto.
</p>

<h3>Import Statements</h3>

In [10]:
from dotenv import load_dotenv
from dotenv import dotenv_values

import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
import geojson

import requests
from pandas import json_normalize

import folium

from bs4 import BeautifulSoup as bs

print('Libraries Imported')

Libraries Imported


<h3>Week 1 - Foursquare API</h3>
<p>
In this section we use the Foursquare API to find the latitude, longitude, and venues within the Toronto area.
</p>

<h4>
Setting up Foursquare API
</h4>

In [11]:
# Load the values stored in the .env file; these are the keys used to access the API
config = dotenv_values(".env")
# Endpoint URL for the initial request
url = "https://api.foursquare.com/v3/places/nearby"
# Headers that pass the API key along with each request
headers = {"Accept": "application/json", "Authorization": config["API_KEY"]}
# Send a first request to the API
response = requests.request("GET", url, headers=headers)

# Define a helper that searches Foursquare for venues near a location
def findNearbyVenues(location, categories, limit):
    # Build the search URL from the arguments passed to the function
    url = "https://api.foursquare.com/v3/places/search?" + "categories=" + categories + "&near=" + str(location[0]) + "%2C" + str(location[1]) + "&limit=" + limit
    # Send the request to the API
    response = requests.request("GET", url, headers=headers)
    # Return the parsed JSON if the request succeeded
    if response.status_code == 200:
        return response.json()
    # Otherwise return False to signal failure
    else:
        return False
    
print('API initialize and custom function created')

API initialize and custom function created
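`findNearbyVenues` assembles its query string by hand, including the `%2C` escape for the comma. As a minimal sketch of an alternative, the standard library can do the escaping. `build_search_url` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v3/places/search"

def build_search_url(location, categories, limit):
    """Hypothetical helper: build the search URL with URL-encoded parameters."""
    params = {
        "categories": categories,
        # Join the location parts with a comma, e.g. "Toronto,ON"
        "near": ",".join(str(part) for part in location),
        "limit": limit,
    }
    # urlencode escapes reserved characters such as the comma
    return SEARCH_URL + "?" + urlencode(params)

print(build_search_url(["Toronto", "ON"], "17000", "50"))
# https://api.foursquare.com/v3/places/search?categories=17000&near=Toronto%2CON&limit=50
```

The resulting URL is the same one the hand-written concatenation produces for these arguments.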


<h4>
Creating a dataframe with the Foursquare API helper function
</h4>

In [12]:
#Assign lat and long for Toronto
latitude = 43.6532 
longitude = -79.3832
# Call the custom function with the parameters below and store the result
results = findNearbyVenues(location = ["Toronto", "ON"], categories="17000", limit="50")
# Flatten the nested JSON results into a dataframe so they can be manipulated
venuesdf = json_normalize(results['results'], max_level=3)
# Drop unnecessary columns (selected by position)
venuesdf.drop(venuesdf.columns[[0,1,2,3,5,8,9,10,11,12,13,17,18,19,20,22]], axis=1, inplace=True)
# Display the first five rows
venuesdf.head()

Unnamed: 0,name,geocodes.main.latitude,geocodes.main.longitude,location.locality,location.neighborhood,location.postcode,related_places.parent.name
0,Fiesta Farms,43.668877,-79.420664,Toronto,[Christie Pitts],M6G 3B6,
1,Yorkdale,43.725902,-79.453193,Toronto,[Downsview],M6A 2T9,
2,Uniqlo ユニクロ,43.726446,-79.450564,Toronto,"[Lawrence Heights, Toronto, ON]",M6A 2T9,Yorkdale
3,Loblaws,43.661965,-79.379811,Toronto,,M5B 1J2,
4,CF Toronto Eaton Centre,43.653807,-79.380056,Toronto,[Eaton Centre],M5B 2H1,
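The cell above drops columns by position, which can select the wrong columns if the API response changes shape. A minimal sketch of selecting by name instead, assuming the column names shown in the output above. The record in `sample_results` is made up for illustration and is not real API output.

```python
import pandas as pd

# Made-up record shaped like one entry of results['results']; not real API output
sample_results = [
    {
        "fsq_id": "hypothetical-id",
        "name": "Example Venue",
        "geocodes": {"main": {"latitude": 43.65, "longitude": -79.38}},
        "location": {"locality": "Toronto", "postcode": "M5B 2H1"},
    }
]

# Column names taken from the dataframe preview in the notebook
wanted = [
    "name",
    "geocodes.main.latitude",
    "geocodes.main.longitude",
    "location.locality",
    "location.neighborhood",
    "location.postcode",
    "related_places.parent.name",
]

flat = pd.json_normalize(sample_results, max_level=3)
# reindex keeps the listed columns in order and fills absent ones with NaN
venues = flat.reindex(columns=wanted)
print(venues.columns.tolist())
```

With `reindex`, a field that is missing from the response becomes a column of `NaN` rather than shifting the positions of the others.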


In [13]:
#Create the map which is based on the coordinates for Toronto
map_creation = folium.Map(location=[latitude, longitude], zoom_start=10)
#Display Map
map_creation

<h3>Week 2 - Prepare your data</h3>
<p>
In this section we use the provided source to build a large dataframe that contains the necessary information for the chosen correlations.
</p>

In [14]:
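from io import StringIO

# Fixed revision of the Wikipedia page that lists the postal codes starting with M
url = 'https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050'
temp = requests.get(url)
data = temp.text
# Parse the page and take the first table
soup = bs(data,'html.parser')
wiki = soup.find('table')
# Wrap the HTML in StringIO; recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(wiki)))[0]
df.head()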

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [15]:
# Keep only rows with an assigned borough, then renumber the index from 0
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


In [16]:
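# Load the latitude and longitude of each postal code
dfPostalCodes = pd.read_csv('Geospatial_Coordinates.csv')
# Use the same column name as df so the two frames can be merged
dfPostalCodes.rename(columns={'Postal Code':'Postcode'}, inplace=True)
# Join the coordinates onto the neighbourhood table by postal code
dfMerge = pd.merge(df, dfPostalCodes, on='Postcode')
dfMerge.head()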

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Heights,43.718518,-79.464763
4,M6A,North York,Lawrence Manor,43.718518,-79.464763
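`pd.merge` performs an inner join by default, so any postal code without a match in the coordinates file is dropped without notice. A minimal sketch of checking for such rows with a left join and `indicator=True`. The frames here are small illustrative samples, and `M0Z` is a hypothetical code with no coordinates.

```python
import pandas as pd

# Small illustrative frames; "M0Z" is a hypothetical code with no coordinates
left = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M0Z"],
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every row of `left`; the _merge column records where each row matched
checked = pd.merge(left, coords, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)  # ['M0Z']
```

An empty `unmatched` list would indicate that every postal code found its coordinates.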


In [22]:
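# Load the crime-rate GeoJSON file
with open('Crime_Rates.geojson') as f:
    data = geojson.load(f)
# Flatten each GeoJSON feature into one row
dfCrime = pd.json_normalize(data["features"])
# Drop the geometry and bookkeeping columns that are not needed
dfCrime = dfCrime.drop(columns=["type", "geometry.type", "geometry.coordinates", "properties.OBJECTID", "properties.Shape__Area", "properties.Shape__Length"])
# Rename the neighbourhood column so it matches the other dataframes
dfCrime.rename(columns={'properties.Neighbourhood':'Neighbourhood'}, inplace=True)
dfCrime.head()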

Unnamed: 0,Neighbourhood,properties.Hood_ID,properties.Population,properties.Assault_2014,properties.Assault_2015,properties.Assault_2016,properties.Assault_2017,properties.Assault_2018,properties.Assault_2019,properties.Assault_AVG,properties.Assault_CHG,properties.Assault_Rate_2019,properties.AutoTheft_2014,properties.AutoTheft_2015,properties.AutoTheft_2016,properties.AutoTheft_2017,properties.AutoTheft_2018,properties.AutoTheft_2019,properties.AutoTheft_AVG,properties.AutoTheft_CHG,properties.AutoTheft_Rate_2019,properties.BreakandEnter_2014,properties.BreakandEnter_2015,properties.BreakandEnter_2016,properties.BreakandEnter_2017,properties.BreakandEnter_2018,properties.BreakandEnter_2019,properties.BreakandEnter_AVG,properties.BreakandEnter_CHG,properties.BreakandEnter_Rate_2019,properties.Homicide_2014,properties.Homicide_2015,properties.Homicide_2016,properties.Homicide_2017,properties.Homicide_2018,properties.Homicide_2019,properties.Homicide_AVG,properties.Homicide_CHG,properties.Homicide_Rate_2019,properties.Robbery_2014,properties.Robbery_2015,properties.Robbery_2016,properties.Robbery_2017,properties.Robbery_2018,properties.Robbery_2019,properties.Robbery_AVG,properties.Robbery_CHG,properties.Robbery_Rate_2019,properties.TheftOver_2014,properties.TheftOver_2015,properties.TheftOver_2016,properties.TheftOver_2017,properties.TheftOver_2018,properties.TheftOver_2019,properties.TheftOver_AVG,properties.TheftOver_CHG,properties.TheftOver_Rate_2019
0,Yonge-St.Clair,97,12528,20,29,39,27,34,37,31.0,0.09,295.3,2,3,7,2,6,6,4.3,0.0,47.9,37,20,12,19,24,28,23.3,0.17,223.5,0,0,0,0,0,0,0.0,0.0,0.0,6,5,6,8,5,4,5.7,-0.2,31.9,4,5,8,0,3,6,4.3,1.0,47.9
1,York University Heights,27,27593,271,296,361,344,357,370,333.2,0.04,1340.9,105,100,105,92,92,144,106.3,0.57,521.9,107,139,98,105,122,108,113.2,-0.11,391.4,1,0,2,1,1,0,0.8,-1.0,0.0,59,84,70,75,88,79,75.8,-0.1,286.3,30,46,37,39,38,28,36.3,-0.26,101.5
2,Lansing-Westgate,38,16164,44,80,68,85,75,72,70.7,-0.04,445.4,19,22,27,26,16,32,23.7,1.0,198.0,34,27,41,42,50,39,38.8,-0.22,241.3,0,0,0,0,10,0,1.7,-1.0,0.0,11,5,9,17,35,11,14.7,-0.69,68.1,4,5,5,11,6,11,7.0,0.83,68.1
3,Yorkdale-Glen Park,31,14804,106,136,174,161,175,209,160.2,0.19,1411.8,63,53,41,52,63,61,55.5,-0.03,412.1,51,57,66,58,64,84,63.3,0.31,567.4,1,1,1,1,2,1,1.2,-0.5,6.8,23,21,24,35,44,42,31.5,-0.05,283.7,23,14,26,23,20,29,22.5,0.45,195.9
4,Stonegate-Queensway,16,25051,88,71,76,95,87,82,83.2,-0.06,327.3,34,29,12,32,31,34,28.7,0.1,135.7,71,45,49,49,39,64,52.8,0.64,255.5,0,0,0,0,0,0,0.0,0.0,0.0,21,14,16,26,25,22,20.7,-0.12,87.8,7,8,4,6,7,4,6.0,-0.43,16.0
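After flattening, every crime column still carries the `properties.` prefix that `json_normalize` adds for nested keys. One possible cleanup, sketched here on a made-up feature rather than the real file, is to strip that prefix from all columns at once. The feature values below are illustrative only.

```python
import pandas as pd

# Made-up GeoJSON-like feature; the values are illustrative only
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": []},
        "properties": {"Neighbourhood": "Example Hood", "Hood_ID": 1, "Population": 1000},
    }
]

flat = pd.json_normalize(features)
flat = flat.drop(columns=["type", "geometry.type", "geometry.coordinates"])
# Remove the leading "properties." that json_normalize adds to nested keys
flat = flat.rename(columns=lambda c: c.replace("properties.", "", 1))
print(sorted(flat.columns))  # ['Hood_ID', 'Neighbourhood', 'Population']
```

This is optional; the notebook itself renames only the `Neighbourhood` column.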
