# Segmenting and Clustering Neighborhoods in Toronto - Part 2
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

* [Step 1: Load the data to a dataframe](#1)<br>
* [Step 2: Process the data](#2)<br>
    * [2.1. Ignore rows with a Borough that is "Not assigned"](#21)<br>
    * [2.2. Combine neighborhoods that share the same postcode](#22)<br>
    * [2.3. Name "Not assigned" neighbourhood after the Borough](#23)<br>
* [Step 3: Load and include the geospatial data](#3)<br>

## Step 1: Load the data to a dataframe<a id='1'></a>

In [54]:
#Load necessary libraries
import pandas as pd
import requests

In [185]:
from bs4 import BeautifulSoup
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
soup = BeautifulSoup(requests.get(url).text,'lxml')
My_table = soup.find('table',{'class':'wikitable sortable'})

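# Each <tr> is one table row; collect the text of its <td> cells
table_rows = My_table.find_all('tr')

t = []
for tr in table_rows:
    cells = tr.find_all('td')
    row = [cell.text.rstrip('\n') for cell in cells]
    t.append(row)
df = pd.DataFrame(t)

df.columns = ['Postcode', 'Borough', 'Neighbourhood']

# The first row is the table header (it has no <td> cells), so it parsed as an empty row; drop it
df.drop([0], axis=0, inplace=True)

# The index now starts at 1; it is reset in Step 2.1 after filtering
df.head()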

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 1 | M1A | Not assigned | Not assigned |
| 2 | M2A | Not assigned | Not assigned |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Harbourfront |
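
The parsing cell drops row 0 by position because the header row of the table has no `<td>` cells and therefore comes through empty. As a minimal sketch (the `rows` list and the name `df_demo` are made up for illustration, not taken from the scraped page), the same cleanup can be done by content, which still works if the empty row is not the first one:

```python
import pandas as pd

# Hypothetical parsed rows: the header <tr> has no <td> cells, so it yields [].
rows = [
    [],
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
]

df_demo = pd.DataFrame(rows)
df_demo.columns = ["Postcode", "Borough", "Neighbourhood"]

# Drop rows in which every cell is missing, then renumber from 0.
df_demo = df_demo.dropna(how="all").reset_index(drop=True)
```

Dropping by content avoids hard-coding the position of the header row, at the cost of also removing any other fully empty rows.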


#### A different way to load the table

<details><summary>Click here to see...</summary>
<p>


```python
#Use pd.read_html to load the table
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

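from io import StringIO

req = requests.get(url, headers=header)
# read_html returns a list of DataFrames, one per <table> found in the page;
# wrapping the HTML in StringIO avoids the literal-string deprecation warning in recent pandas
tables = pd.read_html(StringIO(req.text))

# Take the first table on the page (it is already a DataFrame)
df = tables[0]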

df.head()

```

</p>
</details>

## Step 2: Process the data<a id='2'></a>

### 2.1. Ignore rows with a Borough that is "Not assigned"<a id='21'></a>

In [181]:
# Keep only the rows whose Borough has been assigned
df = df[df['Borough'] != 'Not assigned']

# Re-index the dataframe so the index starts at 0 again
df = df.reset_index(drop=True)

df.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |


### 2.2. Combine neighborhoods that share the same postcode<a id='22'></a>

In [182]:
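# Group by postcode and join the distinct values of each column with ", ".
# set() removes duplicates but does not preserve order, so the joined
# neighbourhoods may appear in a different order than in the source table.
df_new = df.groupby("Postcode").agg(lambda x: ', '.join(set(x)))
df_new = df_new.reset_index()
df_new.loc[51:55]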

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 51 | M4X | Downtown Toronto | Cabbagetown, St. James Town |
| 52 | M4Y | Downtown Toronto | Church and Wellesley |
| 53 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 54 | M5B | Downtown Toronto | Ryerson, Garden District |
| 55 | M5C | Downtown Toronto | St. James Town |
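
Because `set` has no guaranteed order, the joined neighbourhood string can change between runs. The following is a minimal sketch of an order-preserving variant, assuming each postcode belongs to a single borough; the `sample` frame and the `join_unique` helper are illustrative names that are not part of the notebook:

```python
import pandas as pd

# Made-up sample in the same shape as df after Step 2.1.
sample = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M1C", "M1C", "M1C"],
    "Borough": ["Downtown Toronto", "Downtown Toronto",
                "Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Harbourfront", "Regent Park",
                      "Highland Creek", "Rouge Hill", "Port Union"],
})


def join_unique(values):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(values))


combined = (
    sample.groupby("Postcode", as_index=False)
          .agg({"Borough": "first", "Neighbourhood": join_unique})
)
```

Taking the `first` borough instead of joining it keeps that column a plain name, and the neighbourhoods stay in the order in which they appear in the input.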


### 2.3. Name "Not assigned" neighbourhood after the Borough<a id='23'></a>

In [183]:
# Where a postcode has a borough but no neighbourhood, use the borough name
for index, row in df_new.iterrows():
    if row['Neighbourhood'] == "Not assigned":
        df_new.loc[index, 'Neighbourhood'] = row['Borough']
        print(index)  # index of each row that was updated

85
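
The same replacement can also be written without an explicit loop. This is a minimal sketch on a made-up two-row frame (`sample`, `mask` and `changed` are illustrative names), using a boolean mask to select the rows to update:

```python
import pandas as pd

# Made-up sample: one row with an unassigned neighbourhood, one without.
sample = pd.DataFrame({
    "Postcode": ["M7A", "M5A"],
    "Borough": ["Queen's Park", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Harbourfront, Regent Park"],
})

# True for every row whose neighbourhood is still unassigned.
mask = sample["Neighbourhood"] == "Not assigned"

# Remember which rows will change, then copy the borough name over.
changed = sample.index[mask].tolist()
sample.loc[mask, "Neighbourhood"] = sample.loc[mask, "Borough"]
```

Here `changed` plays the role of the `print(index)` call in the loop: it lists the index labels of the rows that were updated.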


In [184]:
df_new.shape

(103, 3)

## Step 3: Load and include the geospatial data<a id='3'></a>

In [192]:
g_data = pd.read_csv("https://cocl.us/Geospatial_data")
g_data.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [193]:
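# Join on the postcode rather than by row position, so every row gets the
# coordinates of its own postcode even if the two tables are ordered differently
df_g = df_new.merge(g_data, left_on='Postcode', right_on='Postal Code')
df_g = df_g.drop(columns='Postal Code')
df_g.head()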

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Port Union, Highland Creek, Rouge Hill | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Morningside, Guildwood, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
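
Attaching coordinates is only correct if each postcode is paired with its own latitude and longitude. As a minimal sketch on made-up frames (`left`, `geo`, `merged` and `missing` are illustrative names, and `M9Z` is a hypothetical postcode with no coordinates), a key-based `merge` with `validate` and `indicator` makes that pairing explicit and reports any postcode that found no match:

```python
import pandas as pd

# Made-up neighbourhood table; "M9Z" is a hypothetical code with no coordinates.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})

# Coordinates deliberately listed in a different order than the table above.
geo = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})

merged = left.merge(
    geo,
    how="left",                 # keep every postcode from the left table
    left_on="Postcode",
    right_on="Postal Code",
    validate="one_to_one",      # raise if a postcode is duplicated on either side
    indicator=True,             # adds a "_merge" column describing each match
)

# Postcodes that did not find coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
```

An empty `missing` list on the real data would confirm that every postcode received coordinates; positional assignment gives no such signal when the two tables are misaligned.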
