## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

### Part I
1. Start by creating a new Notebook for this assignment.
1. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

In [1]:
import requests
import csv
import json

import numpy as np
import pandas as pd
# json_normalize is used as pd.json_normalize below; the old
# `from pandas.io.json import json_normalize` path is deprecated


In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url, header = 0)[0]

df.head()

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.

Filter out 'Not assigned' values:

In [8]:
filtered_df = df[df['Borough'] != 'Not assigned']

filtered_df

|     | Postal Code | Borough | Neighbourhood |
|-----|-------------|---------|---------------|
| 2   | M3A | North York | Parkwoods |
| 3   | M4A | North York | Victoria Village |
| 4   | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5   | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6   | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |
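The filter above drops rows whose borough is 'Not assigned', but the rule that a 'Not assigned' neighbourhood takes the borough's name is not applied anywhere. Here is a minimal sketch of all three cleaning rules on a tiny hand-made frame. The rows (including `M9Z` and `Example Borough`) are invented for illustration, and the column names follow the Wikipedia table.

```python
import pandas as pd

# toy rows shaped like the Wikipedia table (illustrative, not real data)
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M5A', 'M9Z'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', 'Example Borough'],
    'Neighbourhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# rule 1: keep only rows with an assigned borough
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# rule 2: an unassigned neighbourhood inherits the borough's name
unassigned = clean['Neighbourhood'] == 'Not assigned'
clean.loc[unassigned, 'Neighbourhood'] = clean.loc[unassigned, 'Borough']

# rule 3: one row per postal code, neighbourhoods joined with commas
clean = (clean.groupby(['Postal Code', 'Borough'], as_index=False)['Neighbourhood']
              .agg(', '.join))
print(clean)
```

On the real data the same three steps would run on `df` in place of `raw`.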


Combine rows that share a postal code:

In [13]:
# combine the neighbourhoods of rows that share a postal code (and borough)
df_grouped = filtered_df.groupby(['Postal Code', 'Borough'], as_index = False)['Neighbourhood'].agg(', '.join)
# turn " / " separators into commas and use the column name the assignment asks for,
# so the frame ends up with exactly three columns
df_grouped['Neighbourhood'] = df_grouped['Neighbourhood'].str.replace(' / ', ', ', regex = False)
df_grouped = df_grouped.rename(columns = {'Neighbourhood': 'Neighborhood'})

df_grouped.head()

|   | Postal Code | Borough | Neighborhood |
|---|-------------|---------|--------------|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [14]:
df_grouped.shape

(103, 3)

✅ Part one 

### Part II

Given that the Geocoder package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods with it, here is a link to a CSV file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
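Because the geocoder can fail intermittently, a common pattern is to retry a few times and only then fall back to the CSV lookup. A minimal sketch follows. `coords_for` is a hypothetical helper, and `geocode_fn` is assumed to be something like `lambda addr: geocoder.osm(addr).latlng`, which is assumed to return `None` (or an empty value) on failure.

```python
import time


def coords_for(postal_code, geocode_fn, fallback, max_tries=3, pause=1.0):
    """Try a geocoder a few times, then fall back to a precomputed lookup.

    geocode_fn: callable taking an address string and returning (lat, lng) or None.
    fallback:   dict mapping postal code -> (lat, lng), e.g. built from the CSV.
    """
    for attempt in range(max_tries):
        latlng = geocode_fn(f'{postal_code}, Toronto, Ontario')
        if latlng:
            return tuple(latlng)
        if attempt < max_tries - 1:
            time.sleep(pause)  # back off a little before retrying
    # every attempt failed: use the CSV coordinates (None if the code is unknown)
    return fallback.get(postal_code)
```

With the CSV loaded as `geospatial`, the fallback dict could be built as `dict(zip(geospatial['Postal Code'], zip(geospatial['Latitude'], geospatial['Longitude'])))`.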

In [16]:
geospatial = pd.read_csv('https://cocl.us/Geospatial_data')
geospatial

|     | Postal Code | Latitude | Longitude |
|-----|-------------|----------|-----------|
| 0   | M1B | 43.806686 | -79.194353 |
| 1   | M1C | 43.784535 | -79.160497 |
| 2   | M1E | 43.763573 | -79.188711 |
| 3   | M1G | 43.770992 | -79.216917 |
| 4   | M1H | 43.773136 | -79.239476 |
| ... | ... | ... | ... |
| 98  | M9N | 43.706876 | -79.518188 |
| 99  | M9P | 43.696319 | -79.532242 |
| 100 | M9R | 43.688905 | -79.554724 |
| 101 | M9V | 43.739416 | -79.588437 |


In [18]:
# I will use filtered_df to avoid merged neighbourhoods

df_coords = pd.concat([filtered_df, geospatial], axis = 1)
df_coords.dropna(inplace = True)

df_coords

|     | Postal Code | Borough | Neighbourhood | Postal Code.1 | Latitude | Longitude |
|-----|-------------|---------|---------------|---------------|----------|-----------|
| 2   | M3A | North York | Parkwoods | M1E | 43.763573 | -79.188711 |
| 3   | M4A | North York | Victoria Village | M1G | 43.770992 | -79.216917 |
| 4   | M5A | Downtown Toronto | Regent Park, Harbourfront | M1H | 43.773136 | -79.239476 |
| 5   | M6A | North York | Lawrence Manor, Lawrence Heights | M1J | 43.744734 | -79.239476 |
| 6   | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | M1K | 43.727929 | -79.262029 |
| ... | ... | ... | ... | ... | ... | ... |
| 95  | M6N | York | Runnymede, The Junction North | M9C | 43.643515 | -79.577201 |
| 98  | M9N | York | Weston | M9N | 43.706876 | -79.518188 |
| 99  | M1P | Scarborough | Dorset Park, Wexford Heights, Scarborough Town... | M9P | 43.696319 | -79.532242 |
| 100 | M2P | North York | York Mills West | M9R | 43.688905 | -79.554724 |
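`pd.concat(..., axis = 1)` lines rows up by index label, not by postal code. That is why the output above pairs M3A (Parkwoods) with the coordinates of M1E, and why the Foursquare results for Parkwoods later come back as Scarborough Village. `dropna` then also silently discards every row of `filtered_df` whose index has no counterpart in `geospatial`. Joining on the shared `Postal Code` column avoids both problems. Below is a minimal sketch on two toy frames whose values are copied from the outputs above and deliberately listed in different orders.

```python
import pandas as pd

# neighbourhood rows (values from the grouped table above), in a different order on purpose
hoods = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge Hill, Port Union, Highland Creek', 'Malvern, Rouge'],
})
# coordinates (values from the Geospatial_data CSV above)
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# join on the key; validate raises if a postal code appears twice in `coords`,
# and how='left' keeps neighbourhoods without coordinates (as NaN) instead of dropping them
df_coords_fixed = hoods.merge(coords, on='Postal Code', how='left', validate='many_to_one')
print(df_coords_fixed)
```

On the real data the equivalent would be `filtered_df.merge(geospatial, on='Postal Code', how='left')`.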


Install the folium library for map visualization:

In [19]:
! pip install folium==0.5.0
import folium # plotting library

Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
Collecting branca
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=eec0bca8f50294a2a418ca0efdb78e70ec771b674f1b8abcccb994cbcf94d548
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.5.0


### Part III

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did on the New York City data. It is up to you. 

Just make sure:

* to add enough Markdown cells to explain what you decided to do and to report any observations you make. 
* to generate maps to visualize your neighborhoods and how they cluster together. 

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

In [35]:
! pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


The Google Maps Geocoding API requires an API key tied to a billing account, which I'm not comfortable with since I'll be publishing this Notebook on GitHub and the key could leak. So I'll use OpenStreetMap instead (through `geocoder.osm`).

It would be nice if the people who run this course were more helpful. Right now they seem pretty useless: I have to google, check Stack Overflow, and read the official documentation to submit this assignment. I could've done that without the course, right? IBM just failed a little bit.

In [44]:

#url = 'https://maps.googleapis.com/maps/api/geocode/json'
#params = {'sensor': 'false', 'address': 'Toronto, Ontario'}
#r = requests.get(url, params = params)

#print(r.json())

#results = r.json()['results']
#location = results[0]['geometry']['location']

#latitude = location['lat']
#longitude = location['lng']

import geocoder
address = 'Toronto, Ontario'
g = geocoder.osm(address)

print(g.latlng)

latitude = g.lat
longitude = g.lng

print('The geographical coordinates of Toronto are ' + str(latitude) + ', ' + str(longitude))

[43.6534817, -79.3839347]
The geographical coordinates of Toronto are 43.6534817, -79.3839347


### Part IV

Let's create a map using folium

In [56]:

# create map of Toronto using latitude and longitude values
# (named map_toronto so the built-in map() is not shadowed)
map_toronto = folium.Map(location=[latitude, longitude], zoom_start = 12)

# add a circle marker with a popup for every neighbourhood
for lat, lng, borough, neighbourhood in zip(df_coords['Latitude'], df_coords['Longitude'], df_coords['Borough'], df_coords['Neighbourhood']):
    label = neighbourhood + ', ' + borough
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 8,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = 'white',
        fill_opacity = 0.3).add_to(map_toronto)

map_toronto

### Part V

Now it's time to categorize and cluster the neighbourhoods.


In [57]:
ny_data = df_coords[df_coords['Borough'] == 'North York'].reset_index(drop = True)
ny_data.head()

|   | Postal Code | Borough | Neighbourhood | Postal Code.1 | Latitude | Longitude |
|---|-------------|---------|---------------|---------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | M1E | 43.763573 | -79.188711 |
| 1 | M4A | North York | Victoria Village | M1G | 43.770992 | -79.216917 |
| 2 | M6A | North York | Lawrence Manor, Lawrence Heights | M1J | 43.744734 | -79.239476 |
| 3 | M3B | North York | Don Mills | M1R | 43.750072 | -79.295849 |
| 4 | M6B | North York | Glencairn | M1V | 43.815252 | -79.284577 |
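The assignment also allows a different selection: keep every borough whose name contains the word Toronto, rather than a single borough. Here is a minimal pandas sketch on a few `Borough`/`Neighbourhood` pairs copied from the outputs above. With the real data, `df_coords` would take the place of `sample`.

```python
import pandas as pd

# a few borough/neighbourhood pairs taken from the outputs above
sample = pd.DataFrame({
    'Borough': ['North York', 'Downtown Toronto', 'Etobicoke', 'York'],
    'Neighbourhood': ['Parkwoods', 'Church and Wellesley',
                      'The Kingsway, Montgomery Road, Old Mill North', 'Weston'],
})

# keep boroughs whose name contains "Toronto" (na=False guards against missing values)
toronto_data = sample[sample['Borough'].str.contains('Toronto', na=False)].reset_index(drop=True)
print(toronto_data)
```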


In [58]:
address_ny = 'North York, Toronto'
g_ny = geocoder.osm(address_ny)

print(g_ny.latlng)

latitude_ny = g_ny.lat
longitude_ny = g_ny.lng


[43.7543263, -79.44911696639593]


In [59]:
# create map of North York using latitude and longitude values
map_ny = folium.Map(location=[latitude_ny, longitude_ny], zoom_start=12)

# add markers to map (the loop variable must match the name used in the label,
# otherwise every popup shows the last neighbourhood from the previous cell)
for lat, lng, borough, neighbourhood in zip(ny_data['Latitude'], ny_data['Longitude'], ny_data['Borough'], ny_data['Neighbourhood']):
    label = neighbourhood + ', ' + borough
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 8,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = 'white',
        fill_opacity = 0.3).add_to(map_ny)

map_ny

#### Foursquare

I will fetch information about the venues around each neighbourhood from Foursquare using the venues explore API.
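The cell that talks to Foursquare was removed before sharing, so what follows is only a hypothetical reconstruction of the `venues_from_foursquare(lat, lng)` helper that `get_venues` calls below. It assumes the v2 `venues/explore` endpoint, which matches the `groups`/`items` structure in the output below, with the standard `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters. The version date, radius, limit and the environment-variable names are all assumptions. Foursquare has since introduced a newer Places API, so this v2 endpoint may not accept current credentials.

```python
import json
import os
import urllib.parse
import urllib.request

# Foursquare v2 "venues/explore" endpoint (assumed; see lead-in)
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build the explore request URL for venues around (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,      # API version date (YYYYMMDD); the value is an assumption
        'll': f'{lat},{lng}',
        'radius': radius,  # search radius in metres; 500 is an assumed default
        'limit': limit,    # maximum number of venues; also an assumption
    }
    return EXPLORE_URL + '?' + urllib.parse.urlencode(params)


def venues_from_foursquare(lat, lng):
    """Hypothetical stand-in for the removed helper; returns the parsed JSON response.

    Credentials are read from environment variables (hypothetical names),
    so they never end up in the notebook published on GitHub.
    """
    url = build_explore_url(lat, lng,
                            os.environ['FOURSQUARE_CLIENT_ID'],
                            os.environ['FOURSQUARE_CLIENT_SECRET'])
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Keeping the URL construction separate from the network call makes it easy to inspect the request without spending API quota.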

In [95]:
# The code was removed by Watson Studio for sharing.

{'meta': {'code': 200, 'requestId': '6009c777574e826918c83d9d'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Scarborough Village',
  'headerFullLocation': 'Scarborough Village, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 8,
  'suggestedBounds': {'ne': {'lat': 43.768072604500006,
    'lng': -79.18249216787879},
   'sw': {'lat': 43.7590725955, 'lng': -79.1949308321212}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4beee041e24d20a1cd857314',
       'name': 'RBC Royal Bank',
       'location': {'address': '4374 KINGSTON RD',
        'crossStreet': 'Kingston & Lawrence',
        'lat': 43.76678992471017,
        'lng': -79.19115118872593,
        ...

In [78]:
result['response']

{'suggestedFilters': {'header': 'Tap to show:',
  'filters': [{'name': 'Open now', 'key': 'openNow'}]},
 'headerLocation': 'Scarborough Village',
 'headerFullLocation': 'Scarborough Village, Toronto',
 'headerLocationGranularity': 'neighborhood',
 'totalResults': 8,
 'suggestedBounds': {'ne': {'lat': 43.768072604500006,
   'lng': -79.18249216787879},
  'sw': {'lat': 43.7590725955, 'lng': -79.1949308321212}},
 'groups': [{'type': 'Recommended Places',
   'name': 'recommended',
   'items': [{'reasons': {'count': 0,
      'items': [{'summary': 'This spot is popular',
        'type': 'general',
        'reasonName': 'globalInteractionReason'}]},
     'venue': {'id': '4beee041e24d20a1cd857314',
      'name': 'RBC Royal Bank',
      'location': {'address': '4374 KINGSTON RD',
       'crossStreet': 'Kingston & Lawrence',
       'lat': 43.76678992471017,
       'lng': -79.19115118872593,
       'labeledLatLngs': [{'label': 'display',
         'lat': 43.76678992471017,
         'lng': -79.19115

Get all the venues in the vicinity of this neighborhood:

In [116]:
# function that extracts the category of the venue
def get_category_type(row):
    # raw venue dicts use 'categories'; rows flattened by json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


def venues_to_df(venues):
    """Turn the 'items' list of one explore response into a small DataFrame."""
    prox_venues = pd.json_normalize(venues)  # flatten JSON

    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    prox_venues = prox_venues.loc[:, filtered_columns]

    # replace the list of category dicts with the name of the first category
    prox_venues['venue.categories'] = prox_venues.apply(get_category_type, axis = 1)
    # 'venue.location.lat' -> 'lat', and so on
    prox_venues.columns = [col.split(".")[-1] for col in prox_venues.columns]

    return prox_venues


def get_venues(names, latitudes, longitudes):
    """Query Foursquare around each neighbourhood and return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # venues_from_foursquare(lat, lng) lives in the cell removed by Watson Studio
        results = venues_from_foursquare(lat, lng)
        venues_list.append([(
            name,
            lat,
            lng,
            row['venue']['name'],
            row['venue']['location']['lat'],
            row['venue']['location']['lng'],
            # some venues have no category; record None instead of raising IndexError
            row['venue']['categories'][0]['name'] if row['venue']['categories'] else None)
            for row in results['response']['groups'][0]['items']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues



In [100]:
nearby_venues = venues_to_df(result['response']['groups'][0]['items'])
nearby_venues.head()

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | RBC Royal Bank | Bank | 43.76679 | -79.191151 |
| 1 | G & G Electronics | Electronics Store | 43.765309 | -79.191537 |
| 2 | Sail Sushi | Restaurant | 43.765951 | -79.191275 |
| 3 | Big Bite Burrito | Mexican Restaurant | 43.766299 | -79.19072 |
| 4 | Enterprise Rent-A-Car | Rental Car Location | 43.764076 | -79.193406 |


In [90]:
nearby_venues.shape

(8, 4)
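The search radius used in the removed cell isn't shown. As a sanity check, the haversine formula gives the great-circle distance between a returned venue and the query point. Here is a minimal sketch; the mean Earth radius of 6,371 km is the usual approximation, and the two points are the Parkwoods query coordinates and RBC Royal Bank from the outputs above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# query point and RBC Royal Bank, both taken from the outputs above
print(round(haversine_m(43.763573, -79.188711, 43.76679, -79.191151)))
```

Applied row by row to `ny_venues`, this would show whether any venue lies unexpectedly far from its neighbourhood.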

To get the venues for many neighbourhoods in one call, I wrapped the logic in the `get_venues` function. Now I can run it for every neighbourhood in the borough in a single loop.

In [117]:
ny_venues = get_venues(names = ny_data['Neighbourhood'], latitudes = ny_data['Latitude'], longitudes = ny_data['Longitude'])
ny_venues.head()

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---------------|------------------------|-------------------------|-------|----------------|-----------------|----------------|
| 0 | Parkwoods | 43.763573 | -79.188711 | RBC Royal Bank | 43.76679 | -79.191151 | Bank |
| 1 | Parkwoods | 43.763573 | -79.188711 | G & G Electronics | 43.765309 | -79.191537 | Electronics Store |
| 2 | Parkwoods | 43.763573 | -79.188711 | Sail Sushi | 43.765951 | -79.191275 | Restaurant |
| 3 | Parkwoods | 43.763573 | -79.188711 | Big Bite Burrito | 43.766299 | -79.19072 | Mexican Restaurant |
| 4 | Parkwoods | 43.763573 | -79.188711 | Enterprise Rent-A-Car | 43.764076 | -79.193406 | Rental Car Location |


In [118]:
ny_venues.groupby('Neighbourhood').count()

| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---------------|------------------------|-------------------------|-------|----------------|-----------------|----------------|
| Bathurst Manor, Wilson Heights, Downsview North | 30 | 30 | 30 | 30 | 30 | 30 |
| Bayview Village | 4 | 4 | 4 | 4 | 4 | 4 |
| Bedford Park, Lawrence Manor East | 30 | 30 | 30 | 30 | 30 | 30 |
| Don Mills | 6 | 6 | 6 | 6 | 6 | 6 |
| Downsview | 52 | 52 | 52 | 52 | 52 | 52 |
| Fairview, Henry Farm, Oriole | 30 | 30 | 30 | 30 | 30 | 30 |
| Glencairn | 3 | 3 | 3 | 3 | 3 | 3 |
| Hillcrest Village | 19 | 19 | 19 | 19 | 19 | 19 |
| Humber Summit | 5 | 5 | 5 | 5 | 5 | 5 |
| Humberlea, Emery | 6 | 6 | 6 | 6 | 6 | 6 |


In [119]:
len(ny_venues['Venue Category'].unique())

122
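The venue counts per neighbourhood range from 3 to 52, so raw counts would mostly measure how busy an area is. Replicating the New York approach means one-hot encoding the categories and taking the mean per neighbourhood, which turns each neighbourhood into a vector of category frequencies (122 dimensions here), and then clustering those vectors with k-means. scikit-learn's `KMeans` is the standard tool for the real data. To keep this sketch dependency-light it implements plain Lloyd's iterations in NumPy instead. The toy rows standing in for `ny_venues` are invented, and `category_frequencies` and `kmeans` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def category_frequencies(venues):
    """One-hot encode 'Venue Category' and average it per neighbourhood."""
    onehot = pd.get_dummies(venues['Venue Category'], prefix='', prefix_sep='', dtype=int)
    onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])
    return onehot.groupby('Neighbourhood').mean()


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# toy stand-in for ny_venues (illustrative rows, not the real results)
venues = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Parkwoods', 'Parkwoods', 'Glencairn', 'Glencairn', 'Don Mills'],
    'Venue Category': ['Bank', 'Restaurant', 'Bank', 'Park', 'Park', 'Bank'],
})
freq = category_frequencies(venues)
labels, _ = kmeans(freq.to_numpy(), k=2)
print(freq.assign(Cluster=labels))
```

With the real data this would be `category_frequencies(ny_venues)` followed by `kmeans(..., k=5)`, where the number of clusters is a choice to tune rather than something the data dictates.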