## Segmenting and Clustering Neighborhoods in Toronto
#### For this assignment, we will explore and cluster the neighborhoods in Toronto.

Use the notebook to build the code that scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, extracts its table of postal codes, and transforms that table into a pandas DataFrame.

Import the required libraries.

```python
import warnings

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')
```

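Download the page.

```python
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# Wikimedia asks clients to send a descriptive User-Agent; this string is a placeholder
headers = {'User-Agent': 'toronto-neighborhoods-notebook/0.1'}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
```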

We parse the downloaded Wikipedia page with BeautifulSoup.

```python
soup = BeautifulSoup(response.text, 'html.parser')
```

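We build the data frame from the rows of the postal code table. A neighborhood marked "Not assigned" takes the name of its borough, postal codes whose borough is "Not assigned" are dropped, and the neighborhoods that share a postal code are joined into a single comma-separated row.

```python
table = soup.find('table', {'class': 'wikitable sortable'})

rows = []
for row in table.find_all('tr')[1:]:  # skip the header row
    cells = [td.text.strip() for td in row.find_all('td')]
    if len(cells) < 3:
        continue  # ignore rows that are not PostalCode/Borough/Neighborhood triples
    postal_code, borough, neighborhood = cells[:3]
    # A neighborhood marked "Not assigned" takes the name of its borough
    if neighborhood.lower() == 'not assigned':
        neighborhood = borough
    rows.append([postal_code, borough, neighborhood])

canada_df = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])

# Drop postal codes that have no borough
canada_df = canada_df[canada_df['Borough'] != 'Not assigned']

# One row per postal code: keep the borough, join the neighborhoods with commas
agg_funcs = {'Borough': 'first', 'Neighborhood': ', '.join}
groupdf = canada_df.groupby('PostalCode').agg(agg_funcs).reset_index()
```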
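To see the cleaning and grouping rules in isolation, here is a minimal sketch on a few hand-written rows. The rows are hypothetical values shaped like the table cells, not scraped data, and `clean_row` is a hypothetical helper that mirrors the cleaning rule applied in the loop.

```python
import pandas as pd

# Hypothetical raw cells (PostalCode, Borough, Neighborhood); illustrative only
raw_rows = [
    ["M1A", "Not assigned", "Not assigned"],
    ["M1B", "Scarborough", "Rouge"],
    ["M1B", "Scarborough", "Malvern"],
    ["M7A", "Queen's Park", "Not assigned"],
]


def clean_row(postal_code, borough, neighborhood):
    """Strip whitespace and fall back to the borough when the neighborhood is 'Not assigned'."""
    postal_code, borough, neighborhood = (s.strip() for s in (postal_code, borough, neighborhood))
    if neighborhood.lower() == "not assigned":
        neighborhood = borough
    return [postal_code, borough, neighborhood]


df = pd.DataFrame([clean_row(*r) for r in raw_rows],
                  columns=["PostalCode", "Borough", "Neighborhood"])

# Drop postal codes without a borough, then collapse neighborhoods per postal code
df = df[df["Borough"] != "Not assigned"]
grouped = (df.groupby("PostalCode")
             .agg({"Borough": "first", "Neighborhood": ", ".join})
             .reset_index())
print(grouped)
```

M1A disappears because its borough is unassigned, M1B collapses into one row, and M7A keeps its borough name as its neighborhood.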

Check the size of the grouped data frame and preview its first rows.

```python
print(groupdf.shape)
groupdf.head()
```

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


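Get the locations: load the latitude and longitude of each postal code, renaming the `Postal Code` column to `PostalCode` so it matches the scraped data frame.

```python
# Latitude/longitude for each postal code
dataframe = pd.read_csv('http://cocl.us/Geospatial_data')
dataframe = dataframe.rename(columns={'Postal Code': 'PostalCode'})
dataframe.head()
```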

|   | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
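The rename matters because `pd.merge(..., on='PostalCode')` needs a column with that exact name in both frames. A minimal sketch, assuming the CSV header is `Postal Code` as the rename implies, reads the five rows displayed above from an in-memory string (so it runs offline) and checks that the key is unique before any merge.

```python
import io

import pandas as pd

# The five rows displayed above, inlined so the sketch runs without network access.
# Assumption: the real CSV uses the header "Postal Code", as the rename implies.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
"""

coords = pd.read_csv(io.StringIO(csv_text))
coords = coords.rename(columns={"Postal Code": "PostalCode"})

# A duplicated key would multiply rows in the later merge
assert coords["PostalCode"].is_unique
print(coords.columns.tolist())  # ['PostalCode', 'Latitude', 'Longitude']
```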


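Join the neighborhoods with their coordinates on `PostalCode`. An inner join keeps only the postal codes that appear in both tables, so every remaining row has both a borough and a location.

```python
newdf = pd.merge(groupdf, dataframe, on='PostalCode', how='inner')
```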

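Keep only the boroughs whose name contains "Toronto".

```python
# na=False treats a missing borough as "no match" instead of leaving NaN in the mask
canada_locs = newdf[newdf['Borough'].str.contains('Toronto', na=False)].reset_index(drop=True)
```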
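Why `na=False` (or an inner join) matters: with `how='outer'`, a postal code that has coordinates but no scraped row ends up with `NaN` in `Borough`. For object-dtype columns, `str.contains` then returns `NaN` for that row, and pandas rejects a mask containing `NaN` for boolean indexing. The frames below are hypothetical miniatures with placeholder coordinates, not real data.

```python
import pandas as pd

# Hypothetical miniature inputs; coordinates are placeholders, not real data
neighborhoods = pd.DataFrame({
    "PostalCode": ["M4E", "M5A", "M1B"],
    "Borough": ["East Toronto", "Downtown Toronto", "Scarborough"],
    "Neighborhood": ["The Beaches", "Harbourfront", "Rouge, Malvern"],
})
coords = pd.DataFrame({
    "PostalCode": ["M4E", "M5A", "M1B", "M9Z"],  # M9Z has no scraped row
    "Latitude": [43.68, 43.65, 43.81, 43.70],
    "Longitude": [-79.29, -79.36, -79.19, -79.50],
})

# Outer join: M9Z survives with NaN in Borough and Neighborhood
outer = pd.merge(neighborhoods, coords, on="PostalCode", how="outer")

# na=False turns the missing borough into False, so the mask is purely boolean
toronto = outer[outer["Borough"].str.contains("Toronto", na=False)].reset_index(drop=True)

# Inner join: the unmatched postal code never appears
inner = pd.merge(neighborhoods, coords, on="PostalCode", how="inner")

print(sorted(toronto["PostalCode"]), len(outer), len(inner))
```

Either guard alone is enough here; using both keeps the filter safe even if the join type is changed later.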

Create a map of Toronto with one marker per neighborhood.

```python
# k-means for the clustering stage
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes  # uncomment this line if you haven't completed the Foursquare API lab
import folium  # map rendering library

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Create a map of Toronto centred on the geocoded coordinates
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add one circle marker per neighborhood, with its name as the popup
for lat, lng, label in zip(canada_locs['Latitude'], canada_locs['Longitude'], canada_locs['Neighborhood']):
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_toronto)

map_toronto
```
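Geocoding with Nominatim needs network access and depends on a public service. A minimal alternative sketch, not what the notebook does above, centres the map on the mean of the neighborhood coordinates instead. It runs here on the five coordinate rows shown earlier; in the notebook you would pass `canada_locs`. `map_center` is a hypothetical helper.

```python
import pandas as pd

# Coordinates of the five postal codes shown in the geospatial output earlier
coords = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E", "M1G", "M1H"],
    "Latitude": [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
    "Longitude": [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
})


def map_center(df):
    """Return (lat, lng) as the arithmetic mean of the points in df."""
    return float(df["Latitude"].mean()), float(df["Longitude"].mean())


center = map_center(coords)
print(center)
```

The pair can be passed as `location=list(center)` to `folium.Map`. A plain mean of latitudes and longitudes is adequate for a city-sized area; it would misbehave only for points spread across the antimeridian.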

