## Scraper

The first thing we need to do is build a scraper for this page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Before we begin, we import the packages that we will be using.

In [1]:
import pandas as pd
import numpy as np
import json 
import requests
from bs4 import BeautifulSoup

In this step we build a web scraper that locates the table on the wiki page so that it can be turned into a dataframe with the pandas read_html method. We then drop the rows whose Borough is "Not assigned", combine the rows that share a Postcode, and replace any "Not assigned" Neighbourhood with the value of its Borough.

In [51]:
def scrapper(url):
    """Fetch url and return the HTML of its first sortable wikitable as a string."""
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # find_all is the current spelling of findAll
    results = soup.find_all("table", {"class": "wikitable sortable"})
    return str(results[0])

a = scrapper('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
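
The class lookup in the scraper matches the exact string "wikitable sortable", so it can miss the table if the page ever lists the classes in a different order or adds another one. A minimal sketch of a more tolerant lookup follows, assuming a CSS selector is acceptable here. The helper name `extract_table_html` and the sample HTML are made up for illustration and are not part of the original notebook.

```python
from bs4 import BeautifulSoup


def extract_table_html(html, selector="table.wikitable.sortable"):
    """Return the HTML of the first table matching a CSS selector.

    Hypothetical helper: a CSS selector matches each class separately,
    so class order and extra classes do not matter.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(selector)
    if table is None:
        # Fail with a clear message instead of an IndexError on results[0]
        raise ValueError("no table matched selector {!r}".format(selector))
    return str(table)


# Made-up sample page: classes are reordered and one extra class is present
sample_html = """
<html><body>
<table class="sortable wikitable extra">
<tr><th>Postcode</th><th>Borough</th></tr>
<tr><td>M1B</td><td>Scarborough</td></tr>
</table>
</body></html>
"""

table_html = extract_table_html(sample_html)
print(table_html[:40])
```

The returned string can be passed to pd.read_html in the same way as the output of `scrapper`.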

In [64]:
# Data cleaning
df = pd.read_html(a)[0]
# Drop rows whose borough is not assigned
df = df.drop(df[df.Borough == 'Not assigned'].index)
# Combine rows that share a postcode, joining their neighbourhoods with commas
agg_function = {'Borough': 'first', 'Neighbourhood': ', '.join}
df = df.groupby(df['Postcode']).agg(agg_function).reset_index()
# Fall back to the borough name where the neighbourhood is not assigned
df['Neighbourhood'] = df.apply(
    lambda row: row['Borough'] if row['Neighbourhood'] == 'Not assigned' else row['Neighbourhood'], axis=1)
df = df.rename(columns={'Postcode': 'PostalCode'})
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [53]:
df.shape

(103, 3)
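
To make the three cleaning rules easier to check, here is a small sketch that applies them to a hand-made frame instead of the scraped table. The rows in `raw` are invented for illustration, and the boolean-mask fill is one possible alternative to the row-wise apply used in the notebook, not a correction of it.

```python
import pandas as pd

# Made-up rows that exercise each cleaning rule once
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# 1. Drop rows that have no borough
cleaned = raw[raw["Borough"] != "Not assigned"]

# 2. Combine rows that share a postcode into one comma-separated row
cleaned = cleaned.groupby("Postcode", as_index=False).agg(
    {"Borough": "first", "Neighbourhood": ", ".join}
)

# 3. Use the borough name where the neighbourhood is still not assigned
mask = cleaned["Neighbourhood"] == "Not assigned"
cleaned.loc[mask, "Neighbourhood"] = cleaned.loc[mask, "Borough"]

cleaned = cleaned.rename(columns={"Postcode": "PostalCode"})
print(cleaned)
```

Five input rows become three: M1A is dropped, the two M5A rows are merged, and M7A takes its borough name as its neighbourhood.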

## Part 2 

The next part of the capstone adds the latitude and longitude of each postal code, using the CSV file at http://cocl.us/Geospatial_data

In [65]:
geo_df = pd.read_csv('http://cocl.us/Geospatial_data')
geo_df = geo_df.rename(columns={'Postal Code':'PostalCode'})
geo_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now we combine the two dataframes on the postal code to produce a new dataframe.

In [66]:
merge_df = pd.merge(left=df, right=geo_df, how='left', on='PostalCode')
merge_df

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
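
A left merge keeps every postal code from the first dataframe, so a code with no match in the coordinates file would silently end up with NaN in Latitude and Longitude. The sketch below shows one way to check for that on toy data, using the `indicator` and `validate` options of pd.merge. The frames and the postal code M9Z are invented for the example.

```python
import pandas as pd

# Toy stand-ins for df and geo_df; M9Z is a made-up code with no coordinates
neighbourhoods = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
coordinates = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

checked = pd.merge(
    neighbourhoods,
    coordinates,
    how="left",
    on="PostalCode",
    validate="one_to_one",  # raises MergeError if a postal code repeats on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Postal codes that found no coordinates
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that every postal code received coordinates.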


## Part 3

In this next part we explore and cluster the neighbourhoods in Toronto. To do that we need to import some additional packages.

In [69]:
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


Now that all the libraries have been imported, the next step is to set the credentials from our Foursquare developer account.

In [70]:
CLIENT_ID = 'lol cant show you this' # your Foursquare ID
CLIENT_SECRET = 'lol this too' # your Foursquare Secret
VERSION = '20200205'
LIMIT = 30
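
Rather than editing the credentials out by hand before sharing the notebook, one option is to read them from environment variables. This is only a sketch: the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helper `load_foursquare_credentials` are assumptions made for this example, not something Foursquare or the notebook defines.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from a mapping of environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret


# Demonstration with a plain dict so the example runs without real credentials
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook this would be called with no argument, as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, so that no secret appears in the cell.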

Now we define an instance of the geocoder to find the latitude and longitude of Toronto.

In [72]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="6_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
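
The geocode call returns None when the service finds no match, in which case reading `location.latitude` would fail with an AttributeError. Below is a minimal sketch of a guard around it. The helper `coordinates_or_default` is hypothetical, and the stub `fake_geocode` stands in for `geolocator.geocode` so that the example runs without a network connection.

```python
from collections import namedtuple


def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found.

    geocode is any callable that returns an object with latitude and
    longitude attributes, or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Offline stand-in for geolocator.geocode, for illustration only
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def fake_geocode(address):
    known = {"Toronto, ON": FakeLocation(43.653963, -79.387207)}
    return known.get(address)


toronto = coordinates_or_default(fake_geocode, "Toronto, ON", default=(0.0, 0.0))
unknown = coordinates_or_default(fake_geocode, "Nowhere, ZZ", default=(0.0, 0.0))
print(toronto, unknown)
```

With the real geocoder the call would be `coordinates_or_default(geolocator.geocode, address, default)`, where the default is whatever fallback suits the map.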


In this step I create a map of Toronto with the neighbourhoods superimposed on top.

In [75]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(merge_df['Latitude'], merge_df['Longitude'], merge_df['Borough'], merge_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto