# Segmenting and Clustering Neighborhoods in Toronto, Canada

## Aim

This notebook collects information about neighborhoods in Toronto, Canada, and then plots those locations on a map.
The postal codes come from <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">this list on Wikipedia</a>.

## Libraries

1. Pandas - To work with dataframes
2. Numpy - To work with arrays and numbers
3. BeautifulSoup - To extract data from HTML files
4. Matplotlib - To plot charts
5. Folium - To draw maps
6. Geopy - To convert a given address to latitude and longitude values
7. Requests - To download the Wikipedia page

## Package installation

You will need to install geopy and folium if your computer doesn't have these packages.

```
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
```





In [57]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests


# !conda install -c conda-forge geopy --yes # commented out because the Geopy package was already installed.
# !conda install -c conda-forge folium=0.5.0 --yes # commented out because the Folium package was already installed.

Getting data from the list of postal codes and storing it in a dataframe

In [65]:
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

source = requests.get(link).text
soup = BeautifulSoup(source, 'lxml')
table = soup.find('table', class_="wikitable")

# column names for dataframe
column_names = ['Postcode', 'Borough', 'Neighborhood']

# collect row data in a list first; DataFrame.append was removed in pandas 2.0
rows = []
for row in table.find_all("tr"):
    items = row.find_all('td')
    if len(items) == 3: # skip the header row, which has no <td> cells
        if items[1].find(string=True) != 'Not assigned': # ignore non-assigned borough
            rows.append({'Postcode': items[0].find(string=True),
                         'Borough': items[1].find(string=True),
                         'Neighborhood': items[2].find(string=True)})

# build the dataframe in one step from the collected rows
df = pd.DataFrame(rows, columns=column_names)
df.head()

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
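
The row-filtering logic above can be tried without a network connection. The following is a minimal sketch, assuming a small hand-written HTML table (`sample_html`) in place of the Wikipedia page and the built-in `html.parser` instead of `lxml`. The helper name `parse_postcode_table` is hypothetical, and `get_text(strip=True)` is used here to trim whitespace, which the cell above does not do.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical inline HTML standing in for the Wikipedia table.
sample_html = """
<table class="wikitable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
</table>
"""

def parse_postcode_table(html):
    """Return a dataframe of assigned postcodes found in the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3 and cells[1] != "Not assigned":
            rows.append(cells)
    return pd.DataFrame(rows, columns=["Postcode", "Borough", "Neighborhood"])

sample_df = parse_postcode_table(sample_html)
```

Collecting the rows in a plain list and building the dataframe once avoids growing a dataframe row by row.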


Combine neighborhoods into a single row if they belong to the same Postcode and Borough

In [66]:
# note: 'sum' concatenates the neighborhood strings directly, with no separator
df = df.groupby(['Postcode','Borough'], as_index=False).agg('sum')
df.shape

(103, 3)

In [67]:
df.head()

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | RougeMalvern |
| 1 | M1C | Scarborough | Highland CreekRouge HillPort Union |
| 2 | M1E | Scarborough | Guildwood\nMorningsideWest Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
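
In the output above the neighborhood names run together (for example `RougeMalvern`), because summing strings joins them without a separator, and stray `\n` characters from the HTML survive. One possible alternative, shown here as a sketch on a small illustrative dataframe (`sample` is made up for the example and is not the scraped data), is to strip whitespace first and then join with a comma:

```python
import pandas as pd

# Illustrative sample rows; the trailing "\n" mimics whitespace left over from the HTML.
sample = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge", "Malvern\n", "Woburn"],
})

# Remove leading and trailing whitespace before combining.
sample["Neighborhood"] = sample["Neighborhood"].str.strip()

# Join the neighborhoods of each (Postcode, Borough) pair with ", ".
grouped = (
    sample.groupby(["Postcode", "Borough"], as_index=False)
          .agg({"Neighborhood": ", ".join})
)
```

Applied to `df`, the same two steps would produce readable labels for the map popups later on.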


Getting the geographical coordinates for each postal code. The geographical data is available at <a href="http://cocl.us/Geospatial_data">this link</a>.
The Google Maps Geocoding API is an alternative, but Google charges for API use. Another option is the Geocoder Python package <a href="https://geocoder.readthedocs.io/index.html">(link)</a>, but that package is considered unreliable. I am therefore using the Geospatial data (CSV).



In [68]:
geospatial_data_df = pd.read_csv("https://cocl.us/Geospatial_data") 

geospatial_data_df.head()

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


The "Postal Code" column should be renamed to "Postcode" to be consistent with the column in the df dataframe.

In [69]:
geospatial_data_df.rename({'Postal Code': 'Postcode'}, axis=1, inplace=True)

geospatial_data_df.head()

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Merging two dataframes based on their "Postcode" for data preparation.

In [70]:
merged_df = pd.merge(df, geospatial_data_df, on='Postcode')
merged_df.head()

| | Postcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | RougeMalvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland CreekRouge HillPort Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood\nMorningsideWest Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
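
The default inner merge silently drops any postcode that is missing from either dataframe. A possible sanity check, sketched here on two small illustrative dataframes (`left`, `right`, and the postcode `M9Z` are made up for the example), is to merge with `how="left"`, `validate="one_to_one"`, and `indicator=True`, then list the rows that found no coordinates:

```python
import pandas as pd

# Illustrative stand-ins for df and geospatial_data_df; "M9Z" is a made-up postcode.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Example"],
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate raises MergeError if a postcode appears more than once on either side;
# indicator adds a "_merge" column recording where each row came from.
checked = pd.merge(left, right, on="Postcode", how="left",
                   validate="one_to_one", indicator=True)

# Postcodes present in the left dataframe that have no coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
```

An empty `unmatched` list would indicate that every postcode received a latitude and longitude.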


Preparing a final dataframe for map visualization. I am going to draw only those boroughs whose names contain the word "Toronto", so the other boroughs are left out of this final dataframe.

In [71]:
# keep only the boroughs whose name contains the word 'Toronto'
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# a boolean mask replaces the row-by-row DataFrame.append loop (append was removed in pandas 2.0)
is_toronto = merged_df['Borough'].str.contains('Toronto')
neighborhoods = merged_df.loc[is_toronto, column_names].reset_index(drop=True)

neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | East Toronto | The Danforth West\nRiverdale | 43.679557 | -79.352188 |
| 2 | East Toronto | The Beaches West\nIndia Bazaar | 43.668999 | -79.315572 |
| 3 | East Toronto | Studio District | 43.659526 | -79.340923 |
| 4 | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |


## Map visualization

In [73]:
# !conda install -c conda-forge folium=0.5.0 --yes # uncomment if you wish to install folium package

import folium

In [74]:
from geopy.geocoders import Nominatim

In [75]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [76]:
# The code was removed by Watson Studio for sharing.

Getting Toronto's latitude and longitude values.

In [77]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent=my_email) # my_email identifies the caller to the API; for security it is defined in the hidden cell above.
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.653963, -79.387207.
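
Geopy's `geocode` returns `None` when the service finds no match, in which case reading `.latitude` would raise an error. A minimal sketch of a guard follows. The helper `resolve_coordinates` and the stand-in geocoder `fake_geocode` are hypothetical names introduced for illustration, and the fallback values are the coordinates printed above.

```python
from types import SimpleNamespace

def resolve_coordinates(geocode, address, fallback):
    """Return (latitude, longitude) from geocode(address), or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude

# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
def fake_geocode(address):
    if address == "Toronto, Canada":
        return SimpleNamespace(latitude=43.653963, longitude=-79.387207)
    return None

found = resolve_coordinates(fake_geocode, "Toronto, Canada", (0.0, 0.0))
missing = resolve_coordinates(fake_geocode, "Nowhere", (43.653963, -79.387207))
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`.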


### Here we are, the map

In [80]:

# create a map
map_clusters = folium.Map(location=[latitude,longitude], zoom_start=11)

for lat, long, borough, neighborhood in zip(neighborhoods['Latitude'],
                                           neighborhoods['Longitude'],
                                           neighborhoods['Borough'],
                                           neighborhoods['Neighborhood']):
    label = folium.Popup(str(borough) + ' borough cluster - ' + str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color="green",
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
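
The `matplotlib.cm` and `matplotlib.colors` modules imported earlier are not used by the map cell, which draws every marker in the same colors. One way they could be used, sketched below under the assumption that one color per borough is wanted, is to sample the rainbow colormap once per distinct borough. The helper name `borough_colors` is hypothetical.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def borough_colors(boroughs):
    """Map each distinct borough name to a hex color taken from the rainbow colormap."""
    unique = sorted(set(boroughs))
    # One evenly spaced colormap sample (an RGBA row) per distinct borough.
    rgba = cm.rainbow(np.linspace(0, 1, len(unique)))
    return {name: colors.rgb2hex(c) for name, c in zip(unique, rgba)}

palette = borough_colors(["East Toronto", "Central Toronto", "East Toronto"])
```

Passing `color=palette[borough]` to `folium.CircleMarker` in the loop above, with `palette` built from `neighborhoods['Borough']`, would then make the boroughs distinguishable on the map.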


### Enjoy!