**<h2>Importing Libraries**

In [0]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt


**<h2>Using beautifulsoup for scraping**

In [0]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')

**<h2>Getting the table**

The HTML for the table sits inside a 'table' tag with the class name 'wikitable sortable'. We use this to locate the table before extracting its rows.


In [4]:
table=soup.find('table',class_='wikitable sortable')
print(table)

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td>

**<h2>Getting the rows**

Each row of the table starts with a 'tr' tag, and each cell within a row is a 'td' tag. We use this to collect the postal codes, boroughs and neighbourhoods into three lists.

In [0]:
postalCodeList = []
boroughList = []
neighbourhoodList = []

In [0]:
for row in table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:  # the header row has only 'th' cells, so it is skipped
        # rstrip removes the trailing newline character from each cell
        postalCodeList.append(cells[0].text.rstrip('\n'))
        boroughList.append(cells[1].text.rstrip('\n'))
        neighbourhoodList.append(cells[2].text.rstrip('\n'))
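
The row-by-row extraction described above can be tried without a network request. The sketch below assumes a hypothetical two-row miniature of the Wikipedia table (`sample_html` and the other `sample_*` names are made up for illustration) and collects each row as a list of cell strings.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table, for illustration only.
sample_html = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
</tbody></table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("table", class_="wikitable sortable")

rows = []
for tr in sample_table.find_all("tr"):
    cells = tr.find_all("td")
    if cells:  # the header row has only <th> cells, so it is skipped
        # get_text(strip=True) trims surrounding whitespace, including newlines
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

Here `get_text(strip=True)` is used instead of `.text.rstrip('\n')`; it removes whitespace on both sides of the cell text, which may or may not be what you want for other tables.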


In [7]:
toronto_df = pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList,
                           "Neighbourhood": neighbourhoodList})

toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


**<h2>Dropping values**

We drop the rows whose Borough is 'Not assigned'.

In [8]:
toronto_df.drop(toronto_df[toronto_df['Borough']=='Not assigned'].index,axis=0,inplace=True)
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
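
The same filtering can also be written as a boolean mask, which avoids `inplace=True`. This is a minimal sketch on a hypothetical three-row frame (`sample_df` is made up for illustration), not the notebook's own data; `reset_index(drop=True)` is an optional extra that renumbers the remaining rows from 0.

```python
import pandas as pd

# Hypothetical sample mirroring the structure of toronto_df.
sample_df = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only the rows whose Borough has been assigned.
assigned_df = sample_df[sample_df["Borough"] != "Not assigned"].reset_index(drop=True)

print(assigned_df)
```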


**<h2>Grouping Data**

Next, we group the rows that share the same postal code and borough, joining their neighbourhoods into a single comma-separated string.

In [9]:
toronto_df=toronto_df.groupby(["PostalCode", "Borough"], as_index=False).agg(lambda x: ", ".join(x))
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
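
To see what the grouping does when a postal code appears on more than one row, here is a small sketch on hypothetical data (`sample_df` is made up for illustration) in which 'M5A' is split across two rows.

```python
import pandas as pd

# Hypothetical sample: postal code M5A is listed on two separate rows.
sample_df = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# Rows sharing (PostalCode, Borough) collapse into one row; the remaining
# Neighbourhood column is joined into a comma-separated string.
grouped = sample_df.groupby(["PostalCode", "Borough"], as_index=False).agg(
    lambda x: ", ".join(x)
)

print(grouped)
```

The result is sorted by the grouping keys, so 'M3A' comes before 'M5A', and the two 'M5A' neighbourhoods end up in one cell.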


**<h2>Replacing 'Not assigned' in Neighbourhood**

Where a neighbourhood is 'Not assigned', we use the borough name instead.

In [11]:
x=list(toronto_df['Neighbourhood'])
y=list(toronto_df['Borough'])
for i in range(len(x)):
  if(x[i]=='Not assigned'):
    x[i]=y[i]
toronto_df['Neighbourhood']=x
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
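
The loop above could also be expressed without converting the columns to lists. The sketch below uses `Series.mask` on a hypothetical two-row frame (`sample_df` and its postal codes and borough names are made up for illustration); it is one possible alternative, not the notebook's original approach.

```python
import pandas as pd

# Hypothetical sample: the first row has a borough but no neighbourhood.
sample_df = pd.DataFrame({
    "PostalCode": ["M0X", "M0Y"],
    "Borough": ["Example Borough", "Other Borough"],
    "Neighbourhood": ["Not assigned", "Example Village"],
})

# Wherever the condition is True, take the value from Borough instead.
sample_df["Neighbourhood"] = sample_df["Neighbourhood"].mask(
    sample_df["Neighbourhood"] == "Not assigned", sample_df["Borough"]
)

print(sample_df)
```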


In [12]:
toronto_df.shape

(103, 3)

**<h2>Adding Location Data**

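We upload the Geospatial_Coordinates.csv file, which holds the latitude and longitude of each postal code, through the Colab file picker.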

In [1]:
from google.colab import files
uploaded = files.upload()

Saving Geospatial_Coordinates.csv to Geospatial_Coordinates.csv


In [14]:
import io
df = pd.read_csv(io.StringIO(uploaded['Geospatial_Coordinates.csv'].decode('utf-8')))
df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
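
The upload step returns the file contents as bytes, which is why the cell above decodes them before parsing. As a sketch of the same idea outside Colab, the snippet below assumes a hypothetical in-memory `csv_bytes` value standing in for `uploaded['Geospatial_Coordinates.csv']` and reads it through `io.BytesIO`, which lets pandas handle the decoding.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded file contents (bytes).
csv_bytes = (
    b"Postal Code,Latitude,Longitude\n"
    b"M1B,43.806686,-79.194353\n"
    b"M1C,43.784535,-79.160497\n"
)

# read_csv accepts a binary file-like object and decodes it as UTF-8 by default.
coords = pd.read_csv(io.BytesIO(csv_bytes))

print(coords)
```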


**Renaming the postal code column so that both DataFrames share the same merge key**

In [16]:
df.rename(columns={"Postal Code":"PostalCode"},inplace=True)
df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


**<h2>Merging**

We merge the two DataFrames, toronto_df and df, on their shared PostalCode column.

In [17]:
new_data = pd.merge(toronto_df, df, on='PostalCode')  # PostalCode is the only shared column
new_data.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
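
By default `pd.merge` performs an inner join, so a postal code missing from either frame is dropped silently. The sketch below, on hypothetical data (`left`, `right` and the 'M9Z' row are made up for illustration), shows one way to surface such rows with `how="left"` and `indicator=True`, and to assert that each postal code appears only once on each side with `validate`.

```python
import pandas as pd

# Hypothetical samples: 'M9Z' has no matching coordinates.
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Example Borough"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = pd.merge(
    left,
    right,
    on="PostalCode",
    how="left",               # keep every row of the left frame
    validate="one_to_one",    # raise if a key is duplicated on either side
    indicator=True,           # adds a _merge column describing each row's origin
)

# Postal codes that found no coordinates in the right frame.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()

print(unmatched)
```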
