# Applied Data Science Capstone
<i>IBM Data Science Coursera Course</i>

## Jupyter Notebook for Capstone Project
By Xander Mol

### Week 3 Peer Graded Assignment

### Step 1: Create a dataframe of Toronto neighborhoods from the Wikipedia page

In [1]:
#Imports
import pandas as pd

#Scrape table from website using Pandas
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
neigh=pd.read_html(url, header=0)[0]
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [2]:
#Get original dimensions
neigh.shape

(288, 3)
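
The scrape above takes the first table on the page with `[0]`, which only works while the postal code table stays first and keeps its column names. A minimal sketch of a more defensive selection follows, assuming the three column names shown in the output above. The helper `pick_table` and the stand-in `tables` list are hypothetical and replace the real `pd.read_html(url, header=0)` call so the example runs offline.

```python
import pandas as pd

# Column names taken from the output above; adjust if the page layout changes.
REQUIRED_COLUMNS = {"Postcode", "Borough", "Neighbourhood"}


def pick_table(tables, required=REQUIRED_COLUMNS):
    """Return the first DataFrame whose columns include all required names."""
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table with columns {sorted(required)} found")


# Hypothetical stand-in for the list returned by pd.read_html(url, header=0)
tables = [
    pd.DataFrame({"Note": ["legend"]}),
    pd.DataFrame({
        "Postcode": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }),
]

picked = pick_table(tables)
print(picked.head())
```

With the real page, the call would be `pick_table(pd.read_html(url, header=0))`.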

In [3]:
#Drop all rows where Borough is Not assigned
neigh.drop(neigh[neigh.Borough == 'Not assigned'].index, inplace=True)
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [4]:
#New dimensions
neigh.shape

(211, 3)
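
The same filtering can be written without `inplace=True` by keeping the rows that pass a boolean mask. This is only a sketch on a small illustrative frame; the name `demo` and its three rows are stand-ins, not the scraped data.

```python
import pandas as pd

# Illustrative rows in the same shape as the scraped table
demo = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose Borough is assigned, then renumber the index
assigned = demo[demo["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```

Returning a new frame leaves the original untouched, which makes it easier to re-run a cell without side effects.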

In [5]:
#Aggregate neighbourhoods per postcode, separating them with a comma
neigh = neigh.groupby(['Postcode' , 'Borough'], as_index=False).agg( ','.join)
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [10]:
#New dimensions
neigh.shape

(103, 3)
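
To make the aggregation step easier to follow, here is a small sketch that names the column being joined explicitly instead of applying `','.join` to every remaining column. The frame `demo` is an illustrative stand-in built from three rows shown in the earlier output.

```python
import pandas as pd

# Three rows as they appear before aggregation (M5A occurs twice)
demo = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights"],
})

# One row per (Postcode, Borough); neighbourhood names joined with a comma
grouped = (
    demo.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .agg(",".join)
    .reset_index()
)
print(grouped)
```

Selecting `"Neighbourhood"` before `agg` means a later extra text column would not be joined by accident.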

In [6]:
#Change Not assigned neighbourhood to name of Borough
neigh.loc[neigh['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = neigh['Borough']
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [7]:
#Check final dimensions
neigh.shape

(103, 3)
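
The replacement of unassigned neighbourhood names can also be expressed with `Series.mask`, which swaps in the borough name wherever the condition holds. The sketch below uses a hypothetical two-row frame, since none of the rows displayed above happen to contain an unassigned neighbourhood.

```python
import pandas as pd

# Hypothetical rows: the first has a borough but no neighbourhood name
demo = pd.DataFrame({
    "Postcode": ["M0X", "M1G"],
    "Borough": ["Example Borough", "Scarborough"],
    "Neighbourhood": ["Not assigned", "Woburn"],
})

# Where the neighbourhood is unassigned, fall back to the borough name
unassigned = demo["Neighbourhood"] == "Not assigned"
demo["Neighbourhood"] = demo["Neighbourhood"].mask(unassigned, demo["Borough"])
print(demo)
```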

### Step 2: Get the latitude and the longitude coordinates of each neighborhood

In [29]:
#Imports and installs
!conda install -c conda-forge geocoder --yes
import geocoder  # used below to look up coordinates per postal code

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    geocoder-1.38.1            |             py_1          53 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          59 KB

The following NEW packages will be INSTALLED:

    geocoder: 1.38.1-py_1 conda-forge
    ratelim:  0.1.6-py_2  conda-forge


Downloading and Extracting Packages
ratelim-0.1.6        | 6 KB      | ##################################### | 100% 
geocoder-1.38.1      | 53 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [42]:
#Look up latitude and longitude for every postal code
for postal_code in neigh['Postcode']:

    # start without coordinates for this postal code
    lat_lng_coords = None

    # retry until the geocoder returns coordinates
    # (note: this never stops if the service keeps returning None)
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    # store the result in the row(s) for this postal code
    neigh.loc[neigh['Postcode'] == postal_code, 'Latitude'] = lat_lng_coords[0]
    neigh.loc[neigh['Postcode'] == postal_code, 'Longitude'] = lat_lng_coords[1]
    
neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


In [30]:
#Download CSV file provided as cross check
latlongcheck = pd.read_csv('https://cocl.us/Geospatial_data')
latlongcheck.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<b>NB: The latitude/longitude values obtained from geocoder.arcgis deviate slightly from those in the provided CSV file. Because the differences are small, I decided to use the values obtained from geocoder.arcgis instead of the provided CSV file.</b>
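
One way to put a number on "deviate slightly" is to merge the two sources on postal code and take the absolute differences. The sketch below uses only the five rows displayed in the two outputs above; the frame names `own` and `reference` and the `_own`/`_ref` suffixes are illustrative choices.

```python
import pandas as pd

# First five rows from the geocoder.arcgis result shown above
own = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E", "M1G", "M1H"],
    "Latitude": [43.811525, 43.78573, 43.76569, 43.768359, 43.769688],
    "Longitude": [-79.195517, -79.15875, -79.175256, -79.21759, -79.23944],
})

# First five rows from the provided CSV file shown above
reference = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E", "M1G", "M1H"],
    "Latitude": [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
    "Longitude": [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
})

# Align the two sources on postal code and compare coordinates
merged = own.merge(
    reference,
    left_on="Postcode",
    right_on="Postal Code",
    suffixes=("_own", "_ref"),
)
merged["lat_diff"] = (merged["Latitude_own"] - merged["Latitude_ref"]).abs()
merged["lng_diff"] = (merged["Longitude_own"] - merged["Longitude_ref"]).abs()

print(merged[["Postcode", "lat_diff", "lng_diff"]])
print(merged[["lat_diff", "lng_diff"]].max())
```

Running the same comparison on the full `neigh` and `latlongcheck` frames would show the largest deviation across all postal codes.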

Complete resulting dataframe:

In [43]:
neigh

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.785730,-79.158750
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.765690,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.217590
4,M1H,Scarborough,Cedarbrae,43.769688,-79.239440
5,M1J,Scarborough,Scarborough Village,43.743125,-79.231750
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.726245,-79.263670
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.713133,-79.285055
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.723575,-79.234976
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.696665,-79.260163
