### Import packages

In [None]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Get data

Fetch the Wikipedia page that lists the postal codes of Toronto and store its HTML in the **data** object.

In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(url).text

### Parse data

Parse the HTML with BeautifulSoup, using the **html5lib** parser.

In [54]:
soup = BeautifulSoup(data, 'html5lib')

### Find table

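Use **soup.find()** to locate the first table on the page, which holds the data we need, and store it in **table**. The list **table_contents** will collect the parsed rows.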

In [55]:
table_contents = []
table = soup.find('table')
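
As a small illustration of the lookup above, here is a sketch of how **find()** and **find_all()** behave on a page with more than one table. The HTML snippet, the `id` values and the `wanted` class are made up for this example and are not taken from the Wikipedia page; the built-in `html.parser` is used only to keep the sketch dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page, used only to illustrate the lookup behaviour.
html = """
<table id="first"><tr><td>A</td></tr></table>
<table id="second" class="wanted"><tr><td>B</td></tr></table>
"""
demo_soup = BeautifulSoup(html, "html.parser")

first_table = demo_soup.find("table")                # first <table> in document order
all_tables = demo_soup.find_all("table")             # every <table> on the page
by_class = demo_soup.find("table", class_="wanted")  # first <table> with this CSS class

print(first_table["id"], len(all_tables), by_class["id"])
```

If the table we want were not the first one on the page, filtering by class or indexing into the result of **find_all()** would be one way to reach it.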

### Find data and format the table

Iterate over the table cells, format the text and assign each piece to the correct column of our dataframe.
We read every cell of the table. If its text is **Not assigned**, we skip it. Otherwise, we extract the **PostalCode**, **Borough** and **Neighborhood** information.

In [56]:
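# Extract the postal code, borough and neighborhood from each table cell
for row in table.find_all('td'):
    cell = {}
    if row.span.text == 'Not assigned':
        # Skip postal codes that have no borough assigned
        continue
    cell['PostalCode'] = row.p.text[:3]
    # The borough is the text before the first '('
    cell['Borough'] = row.span.text.split('(')[0]
    # The neighborhoods are inside the parentheses, separated by ' /'
    cell['Neighborhood'] = (row.span.text.split('(')[1]
                            .strip(')').replace(' /', ',').replace(')', ' ').strip(' '))
    table_contents.append(cell)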
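
To make the string handling in the loop easier to follow, here is a minimal sketch that applies the same steps to plain strings. The helper name `split_cell` is hypothetical, and the two sample strings are assumptions about the shape of the cell text, inferred from the resulting dataframe rather than copied from the page.

```python
def split_cell(text):
    """Split 'Borough(Neighborhood / ...)' text the same way the loop does (hypothetical helper)."""
    parts = text.split("(")
    borough = parts[0]  # everything before the first '('
    # Drop the closing parenthesis, turn ' /' separators into commas, trim spaces
    neighborhood = parts[1].strip(")").replace(" /", ",").replace(")", " ").strip(" ")
    return borough, neighborhood

# Sample inputs are illustrative assumptions, not scraped values.
print(split_cell("North York(Parkwoods)"))
print(split_cell("Downtown Toronto(Regent Park / Harbourfront)"))
```

Note that a cell whose text contains no `(` would raise an `IndexError` at `parts[1]`, so this approach relies on every assigned cell having a parenthesized neighborhood list.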

In [57]:
# Build the dataframe and fix borough names whose parts were run together during parsing
df = pd.DataFrame(table_contents)
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})
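
The cleaning step relies on **Series.replace()** with a dictionary, which swaps whole cell values that match a key exactly and leaves everything else untouched. A minimal sketch on a two-row frame (the `demo` and `fixes` names are only for illustration):

```python
import pandas as pd

# Tiny illustrative frame; only the first value appears in the mapping.
demo = pd.DataFrame({"Borough": ["EtobicokeNorthwest", "North York"]})
fixes = {"EtobicokeNorthwest": "Etobicoke Northwest"}

# Exact-value replacement: rows not listed in the mapping are unchanged.
demo["Borough"] = demo["Borough"].replace(fixes)
print(demo["Borough"].tolist())
```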

### Explore results

Display the first five rows of the dataframe.

In [58]:
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |


### Show the dataframe dimensions

In [59]:
df.shape

(103, 3)

### Import GeoSpatial Dataset

Using the URL provided by the course, we load the latitude and longitude of each postal code.

In [60]:
lat_lng_coords = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')

### Get the latitude and longitude 

We need the latitude and longitude of every postal code in our dataset, so we merge **df** with **lat_lng_coords**, matching **PostalCode** to **Postal Code**. A left join keeps every row of **df**.

In [61]:
df = df.merge(lat_lng_coords, left_on='PostalCode', right_on='Postal Code', how='left')
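
Because a left join keeps rows even when no coordinates are found, it can be useful to check for unmatched postal codes. The sketch below shows one possible check using the `indicator=True` option of **merge()** on two small frames; the frames are illustrative, and `X0X` is a made-up code included only to produce an unmatched row.

```python
import pandas as pd

# Illustrative frames; "X0X" is a made-up code with no coordinates.
codes = pd.DataFrame({"PostalCode": ["M3A", "M4A", "X0X"]})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column telling where each row came from.
merged = codes.merge(coords, left_on="PostalCode", right_on="Postal Code",
                     how="left", indicator=True)

# Rows marked "left_only" had no match, so their coordinates are NaN.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(len(merged), unmatched)
```

An empty `unmatched` list on the real data would suggest that every postal code received coordinates.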

In [65]:
# Drop the duplicate key column left over from the merge
df = df.drop(columns=['Postal Code'])
df.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
