# For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.
2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

In [59]:
# Import all the dependencies
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
from geopy.geocoders import Nominatim  # turns an address into latitude/longitude
import folium  # renders interactive Leaflet maps
print('Libraries imported.')

Libraries imported.


In [49]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'html.parser')
#soup

Find the raw table inside the webpage

In [None]:
# Assumes the postal-code table is the first <table> on the page
table = soup.find("table")
table_rows = table.tbody.find_all("tr")
#table
#table_rows

# Processing 
1. Cleaning

In [39]:
res = []
for tr in table_rows:
    # Header rows contain only <th> cells, so their <td> list is empty
    cells = tr.find_all("td")
    row = [cell.text for cell in cells]

    # Only process rows that have an assigned borough; ignore rows whose borough is "Not assigned".
    if row != [] and row[1] != "Not assigned\n":
        # If a row has a borough but a "Not assigned" neighborhood, use the borough as the neighborhood.
        if "Not assigned\n" in row[2]:
            row[2] = row[1]
        res.append(row)
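The filtering rule in the loop can be isolated into a small pure function, which makes it easy to check on hand-written rows. This is a sketch rather than the notebook's code: the helper name `clean_row` is hypothetical, and unlike the loop it strips the `\n` from each cell immediately instead of in a later step, so it compares whole values rather than testing for a substring. The last sample row uses a made-up postal code and borough.

```python
def clean_row(cells):
    """Apply the notebook's filtering rule to one table row.

    `cells` is a list of cell strings [postal_code, borough, neighborhood],
    possibly ending in "\n" as returned by BeautifulSoup's `.text`.
    Returns the cleaned row, or None if the row should be dropped.
    """
    # Header rows contain only <th> cells, so they arrive as an empty list.
    if not cells:
        return None
    postal_code, borough, neighborhood = (c.strip() for c in cells[:3])
    # Drop rows whose borough is not assigned.
    if borough == "Not assigned":
        return None
    # A row with a borough but no neighborhood takes the borough's name.
    if neighborhood == "Not assigned":
        neighborhood = borough
    return [postal_code, borough, neighborhood]


sample_rows = [
    [],                                                 # header row
    ["M1A\n", "Not assigned\n", "Not assigned\n"],      # dropped
    ["M3A\n", "North York\n", "Parkwoods\n"],           # kept as-is
    ["M9Z\n", "Example Borough\n", "Not assigned\n"],   # hypothetical row
]
cleaned = [r for r in (clean_row(c) for c in sample_rows) if r is not None]
print(cleaned)
```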

2. Define the DataFrame columns

In [40]:
df = pd.DataFrame(res, columns=['PostalCode', 'Borough', 'Neighborhood'])
df.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A\n | North York\n | Parkwoods\n |
| 1 | M4A\n | North York\n | Victoria Village\n |
| 2 | M5A\n | Downtown Toronto\n | Regent Park, Harbourfront\n |
| 3 | M6A\n | North York\n | Lawrence Manor, Lawrence Heights\n |
| 4 | M7A\n | Downtown Toronto\n | Queen's Park, Ontario Provincial Government\n |


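3. Remove the trailing newline characters ('\n') from the DataFrame

In [41]:
# Delete every newline character in every cell (regex replacement)
df = df.replace('\n', '', regex=True)
df.head()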

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
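An alternative to the regex replacement is to trim whitespace at the ends of each value with the pandas `.str.strip()` accessor. Unlike `replace('\n', '', regex=True)`, it leaves any newline that sits inside a value untouched and also removes stray spaces at the ends. This sketch assumes every column holds strings (`.str` raises on non-string columns); the two rows are copied from the raw output above.

```python
import pandas as pd

# Two rows in the raw shape produced by the scraping loop (trailing "\n" included).
raw = pd.DataFrame(
    [["M3A\n", "North York\n", "Parkwoods\n"],
     ["M5A\n", "Downtown Toronto\n", "Regent Park, Harbourfront\n"]],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Strip leading/trailing whitespace (including "\n") from every column.
clean = raw.apply(lambda col: col.str.strip())
print(clean)
```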


In [48]:
print(df.shape)

(103, 3)


4. Get the latitude and longitude of each postal code area (and therefore of its neighborhoods) from the Geospatial_data CSV.

In [51]:
geo = pd.read_csv('http://cocl.us/Geospatial_data')
geo.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [53]:
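# Attach coordinates by matching PostalCode against the CSV's 'Postal Code' column
df_final = df.set_index('PostalCode').join(geo.set_index('Postal Code'))
# Shuffle the rows; without random_state the order differs on each run.
# Note that reset_index(drop=True) discards the PostalCode index.
df_final = df_final.sample(frac=1).reset_index(drop=True)
df_final.head()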

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Scarborough | Woburn | 43.770992 | -79.216917 |
| 1 | Downtown Toronto | St. James Town, Cabbagetown | 43.667967 | -79.367675 |
| 2 | Etobicoke | New Toronto, Mimico South, Humber Bay Shores | 43.605647 | -79.501321 |
| 3 | Scarborough | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 |
| 4 | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
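A `merge` with `validate` and `indicator` is one way to make the join step check itself: it raises `MergeError` if a postal code appears twice on either side, and it flags codes that found no coordinates. It also keeps `PostalCode` as an ordinary column, which the `reset_index(drop=True)` call in the cell above discards. The frames below are small stand-ins: the two coordinate rows are copied from the `geo.head()` output, while the neighborhood labels and the postal code `M0Z` are hypothetical.

```python
import pandas as pd

# Stand-in for the cleaned Wikipedia table; labels and M0Z are hypothetical.
df = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M0Z"],
    "Neighborhood": ["example A", "example B", "example C"],
})
# Two rows copied from the geo.head() output.
geo = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = df.merge(
    geo,
    left_on="PostalCode",
    right_on="Postal Code",
    how="left",             # keep every row of the scraped table, in order
    validate="one_to_one",  # raise MergeError if a postal code is duplicated
    indicator=True,         # add a _merge column: 'both' or 'left_only'
)
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)  # ['M0Z']
merged = merged.drop(columns=["Postal Code", "_merge"])
print(merged)
```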


# Explore and cluster the neighborhoods in Toronto

1. Add enough Markdown cells to explain what you decided to do and to report any observations you make.
2. Generate maps to visualize your neighborhoods and how they cluster together.
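The section does not say which clustering method or features to use. As one possibility, here is a minimal k-means (Lloyd's algorithm) written with NumPy only, clustering on latitude/longitude. The choice of features, the value of `k`, and the helper name `kmeans` are assumptions for illustration, and the sample points are hypothetical. Treating degrees of latitude and longitude as Euclidean coordinates is a rough approximation that is tolerable within a single city.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize centroids with k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two tight, well-separated hypothetical groups of (latitude, longitude) pairs.
pts = np.array([[43.70, -79.40], [43.71, -79.41], [43.70, -79.41],
                [43.80, -79.20], [43.81, -79.21], [43.80, -79.21]])
labels, centroids = kmeans(pts, k=2)
print(labels)
```

On real data, scikit-learn's `sklearn.cluster.KMeans` implements the same algorithm with k-means++ initialization and several restarts, and would normally be the better choice; the hand-written version only makes the assign/update loop visible.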

In [57]:
# Get the geographical coordinates of Toronto
address = 'Toronto, ON'

# geopy's Nominatim requires a custom, identifying user_agent
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
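`geolocator.geocode` returns `None` when Nominatim finds no match, in which case `location.latitude` above would raise an `AttributeError`. A minimal guard, sketched below, falls back to a fixed coordinate pair. The helper name `coords_or_default` and the `StubGeocoder` class are hypothetical, used only so the sketch runs without network access; network failures such as timeouts raise exceptions rather than returning `None`, so they are not covered here.

```python
from types import SimpleNamespace


def coords_or_default(geolocator, address, default):
    """Return (latitude, longitude) for `address`, or `default` if nothing is found.

    `geolocator` is any object with a geopy-style geocode(address) method
    that returns None when there is no match.
    """
    location = geolocator.geocode(address)
    if location is None:
        return default
    return location.latitude, location.longitude


class StubGeocoder:
    """Hypothetical offline stand-in for Nominatim, for illustration only."""

    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        coords = self.known.get(address)
        if coords is None:
            return None
        return SimpleNamespace(latitude=coords[0], longitude=coords[1])


TORONTO = (43.6534817, -79.3839347)  # value printed by the cell above
stub = StubGeocoder({'Toronto, ON': TORONTO})
print(coords_or_default(stub, 'Toronto, ON', TORONTO))
print(coords_or_default(stub, 'Nowhere, ZZ', TORONTO))
```

With the real geocoder the call would be `coords_or_default(Nominatim(user_agent='toronto_explorer'), 'Toronto, ON', TORONTO)`.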


Visualize the neighborhoods of Toronto on a map

In [61]:
# Create a map of Toronto centered on the geocoded coordinates
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add a circle marker for each postal code area, labeled with its neighborhoods
for lat, lng, label in zip(df_final['Latitude'], df_final['Longitude'], df_final['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto