## Scrape the data from Wikipedia Page - Question 1
Use the Notebook to write code that scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, extracts the data in the table of postal codes, and transforms it into a pandas dataframe:

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells whose borough is **Not assigned**.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a borough but a **Not assigned** neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the **.shape** attribute to print the number of rows of your dataframe.
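
The cleaning rules in the list above can be sketched on a tiny, made-up dataframe. This is only an illustration under stated assumptions: the rows in `raw` are hypothetical stand-ins for the scraped table, and the names `raw`, `df`, `missing`, and `cleaned` are not part of the notebook.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped table (not the real data).
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)

# Rule 1: keep only rows with an assigned borough.
df = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a "Not assigned" neighborhood takes the borough's name.
missing = df["Neighborhood"] == "Not assigned"
df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]

# Rule 3: combine rows that share a postal code into one comma-separated row.
cleaned = (
    df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(cleaned)
```

Passing `sort=False` to `groupby` keeps the postal codes in their order of first appearance, which matches the order of the source table.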

In [1]:
pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl (115kB)
Collecting soupsieve>1.2; python_version >= "3.0" (from BeautifulSoup4)
  Downloading https://files.pythonhosted.org/packages/02/fb/1c65691a9aeb7bd6ac2aa505b84cb8b49ac29c976411c6ab3659425e045f/soupsieve-2.1-py3-none-any.whl
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.9.3 soupsieve-2.1
Note: you may need to restart the kernel to use updated packages.


In [2]:
# import the library we use to open URLs
import urllib.request
# import the BeautifulSoup library so we can parse HTML and XML documents
from bs4 import BeautifulSoup

In [3]:
# specify which URL/web page we are going to be scraping
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
# open the url using urllib.request and put the HTML into the page variable
page = urllib.request.urlopen(url)
# parse the HTML with BeautifulSoup, using Python's built-in html.parser
soup = BeautifulSoup(page, "html.parser")

In [4]:
soup.title.string

'List of postal codes of Canada: M - Wikipedia'

In [5]:
# scrape the table from the Wikipedia page
wiki_table = soup.find('table', class_='wikitable sortable')

A = []  # Postal Code
B = []  # Borough
C = []  # Neighborhood

# data rows have exactly three <td> cells; the header row uses <th> and is skipped
for row in wiki_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:
        A.append(cells[0].get_text().strip())
        B.append(cells[1].get_text().strip())
        C.append(cells[2].get_text().strip())
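
To check the row-extraction loop without hitting the network, the same logic can be run against a small inline table. This is a minimal sketch: `SAMPLE_HTML` is a hypothetical miniature of the page, not its real content, and `extract_rows` is a helper name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table (not the real page content).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
  <tr><td>M5A
</td><td><a href="#">Downtown Toronto</a>
</td><td>Regent Park, Harbourfront
</td></tr>
</table>
"""


def extract_rows(html):
    """Return one (postal code, borough, neighborhood) tuple per data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable sortable")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) == 3:  # the header row contains <th> cells, not <td>
            # get_text() collects all text in the cell, including text inside links
            rows.append(tuple(cell.get_text().strip() for cell in cells))
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Reading each cell with `get_text()` rather than only its first text node keeps the full value even when part of the cell is wrapped in a link.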


In [30]:
import pandas as pd
# build the TorontoPosts dataframe from the scraped lists
TorontoPosts = pd.DataFrame({'Postal Code': A, 'Borough': B, 'Neghborhood': C})
# drop rows where Borough is 'Not assigned'; copy so later assignments modify this frame
TorontoPosts = TorontoPosts[TorontoPosts['Borough'] != 'Not assigned'].copy()
# where Neghborhood is 'Not assigned', use the Borough name instead
unassigned = TorontoPosts['Neghborhood'] == 'Not assigned'
TorontoPosts.loc[unassigned, 'Neghborhood'] = TorontoPosts.loc[unassigned, 'Borough']

# reset index
TorontoPosts.reset_index(drop=True, inplace=True)
# no need to combine neighborhoods: each postal code appears only once in the scraped table
TorontoPosts.head()

| | Postal Code | Borough | Neghborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [36]:
print('The shape of the DataFrame is: {}.'.format(TorontoPosts.shape))

The shape of the DataFrame is: (103, 3).


## **Get the latitude and the longitude of each neighborhood - Question 2**
We now have a dataframe with the postal code, borough name, and neighborhood name of each neighborhood. To use the Foursquare location data, we also need the latitude and longitude coordinates of each neighborhood.

In [32]:
# Not used in the end: the coordinates are read from a CSV file instead.
import geocoder

def get_geocode(postal, max_attempts=10):
    # retry a bounded number of times, since the service may return no result
    for _ in range(max_attempts):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal))
        if g.latlng:
            latitude, longitude = g.latlng
            return latitude, longitude
    # no coordinates were returned within max_attempts tries
    return None

In [38]:
geo_data=pd.read_csv('https://cocl.us/Geospatial_data')
print('The shape of the DataFrame is: {}.'.format(geo_data.shape))
geo_data.head()


The shape of the DataFrame is: (103, 3).


| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [37]:
Merged_df = pd.merge(TorontoPosts, geo_data, on = 'Postal Code')
print('The shape of the DataFrame is: {}.'.format(Merged_df.shape))
Merged_df.head()

The shape of the DataFrame is: (103, 5).


| | Postal Code | Borough | Neghborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
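
The merge keeps all 103 rows here, but an inner merge silently drops any postal code missing from either side. One way to make that visible is a left merge with `validate` and `indicator`, sketched below on hypothetical miniature frames. The frames `posts` and `coords` and the postal code `M9Z` are made up for illustration; only the `Postal Code` column name and the two coordinate pairs come from the notebook.

```python
import pandas as pd

# Hypothetical miniature frames; 'M9Z' is an invented code with no coordinates.
posts = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M9Z"],
        "Borough": ["North York", "North York", "Etobicoke"],
    }
)
coords = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A"],
        "Latitude": [43.753259, 43.725882],
        "Longitude": [-79.329656, -79.315572],
    }
)

# validate raises MergeError if a postal code is duplicated on either side;
# indicator adds a '_merge' column recording where each row was found.
checked = pd.merge(
    posts, coords, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)

# Postal codes present in posts but absent from coords.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code received coordinates.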
