# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, I want to help people who plan to open a restaurant in Toronto. How can we find the best place to open one? My working assumption is that the best location is close to an area with a large population. Every city is divided into neighborhoods, and each neighborhood lies at some distance from the others. By segmenting and clustering Toronto's neighborhoods, we can identify the area where neighborhoods are most concentrated and use that as an indicator of where the population is highest.

I will use data science methods to identify a few of the most promising neighborhoods based on this criterion. The advantages of each area will then be clearly stated so that the best possible final location can be chosen.

## Data <a name="data"></a>

### A description of the data and how it will be used to solve the problem.

The labs that segmented and clustered the neighborhoods of New York and Toronto gave me good examples of how to make use of the Foursquare location data.

In particular, the New York and Toronto labs report the 10 most common venues in each district. Once the data has been retrieved as JSON, it is easy to load it into a Pandas DataFrame and then select the columns we need. The Geopy library is used to get latitude and longitude values. With these locations we can create maps that give a direct view of the neighborhoods and the proposed areas. The Folium library is used to render the maps.
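
As a rough illustration of the JSON-to-DataFrame step described above, the sketch below flattens a small, made-up response with `pd.json_normalize`. The `sample_response` dictionary and its field names are assumptions that only imitate the general shape of a Foursquare result; they are not real API output.

```python
import pandas as pd

# Hypothetical, heavily trimmed response; field names are assumed for illustration
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Cafe A",
                               "location": {"lat": 43.65, "lng": -79.38},
                               "categories": [{"name": "Cafe"}]}},
                    {"venue": {"name": "Diner B",
                               "location": {"lat": 43.66, "lng": -79.39},
                               "categories": [{"name": "Diner"}]}},
                ]
            }
        ]
    }
}

items = sample_response["response"]["groups"][0]["items"]

# Flatten the nested dictionaries into dotted column names
venues = pd.json_normalize(items)

# Keep only the columns of interest and give them short names
venues = venues[["venue.name", "venue.location.lat",
                 "venue.location.lng", "venue.categories"]].copy()
venues = venues.rename(columns={
    "venue.name": "name",
    "venue.location.lat": "lat",
    "venue.location.lng": "lng",
    "venue.categories": "categories",
})

# Each venue carries a list of categories; keep the first category name
venues["category"] = venues["categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
venues = venues.drop(columns="categories")
print(venues)
```

Selecting and renaming the columns right after flattening keeps the later steps independent of the nested layout of the response.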

### Get Data from Wikipedia

In [2]:
## import libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [3]:
quote_page = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
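## scrape the page content from the URL with bs4
res = requests.get(quote_page)
soup = BeautifulSoup(res.content,'lxml')
# take the first table on the page
table = soup.find_all('table')[0]
# wrap the HTML in StringIO, since newer pandas versions deprecate passing a raw HTML string
from io import StringIO
df = pd.read_html(StringIO(str(table)))[0]

# preview the first rows of the table
df.head()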

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


### Clean Data

In [4]:
## drop rows whose Borough is not assigned
df_bor = df[~df.Borough.isin(['Not assigned'])]
df_bor.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |


In [5]:
## combine neighbourhoods that share a Postcode into one row
df_combine = df_bor.groupby(['Postcode','Borough'])['Neighbourhood'].apply(','.join).reset_index()
df_combine.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [6]:
## where Neighbourhood is not assigned, use the Borough name instead
df_combine.loc[df_combine['Neighbourhood']=="Not assigned",'Neighbourhood'] = df_combine.loc[df_combine['Neighbourhood']=="Not assigned",'Borough']
df_combine.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
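
The three cleaning steps above can also be collected into one helper, which makes them easy to check on a small sample. The sketch below is one possible arrangement, not the notebook's own code: the function name `clean_postcodes` and the toy rows are assumptions, and it fills missing neighbourhoods before grouping rather than after.

```python
import pandas as pd

# Toy rows in the same layout as the scraped table (illustrative sample only)
raw = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront",
                      "Regent Park", "Not assigned"],
})


def clean_postcodes(df):
    """Hypothetical helper: one row per postcode with joined neighbourhoods."""
    # 1. Drop rows that have no borough
    out = df[df["Borough"] != "Not assigned"].copy()

    # 2. Where the neighbourhood is missing, fall back to the borough name
    missing = out["Neighbourhood"] == "Not assigned"
    out.loc[missing, "Neighbourhood"] = out.loc[missing, "Borough"]

    # 3. Join the neighbourhoods that share a postcode into a single row
    return (
        out.groupby(["Postcode", "Borough"])["Neighbourhood"]
        .agg(",".join)
        .reset_index()
    )


cleaned = clean_postcodes(raw)
print(cleaned)
```

Filling the missing neighbourhoods before grouping avoids joined strings that contain the literal text "Not assigned".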


## Conclusion <a name="conclusion"></a>

In [7]:
## show shape of the data
df_combine.shape

(103, 3)

### Assign Geospatial Data to the DataFrame

In [8]:
## read geospatial data from csv
df_geo = pd.read_csv('Geospatial_Coordinates.csv')
df_geo.head()

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [9]:
## assign lat,long to postcode dataframe
df_post = pd.merge(df_combine,df_geo,how='left',left_on='Postcode',right_on='Postal Code')
df_post.head()

| | Postcode | Borough | Neighbourhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | M1B | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | M1C | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | M1E | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | M1G | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 |


In [10]:
df_post.drop('Postal Code',axis=1,inplace=True)

In [11]:
df_post.shape

(103, 5)
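
A left merge keeps every postcode even when no coordinates are found for it, so it can be worth checking for unmatched rows. The sketch below shows one way to do that with the `validate` and `indicator` options of `DataFrame.merge`. The frames are toy data, and `M9Z` is a made-up postcode included only to produce an unmatched row.

```python
import pandas as pd

# Toy stand-ins for df_combine and df_geo; "M9Z" is a made-up postcode
postcodes = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Example"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Renaming the key first avoids a duplicate "Postal Code" column after the merge
merged = postcodes.merge(
    coords.rename(columns={"Postal Code": "Postcode"}),
    on="Postcode",
    how="left",
    validate="one_to_one",  # raise if a postcode appears twice on either side
    indicator=True,         # add a "_merge" column describing each row's origin
)

# Postcodes that received no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
merged = merged.drop(columns="_merge")

print(unmatched)
print(merged.shape)
```

If `unmatched` is empty, every postcode has a latitude and longitude and the markers can be plotted without missing values.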

In [12]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [13]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
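
The geocoding call needs network access, and `geocode` returns `None` when no match is found, which would make `location.latitude` fail. A possible fallback, sketched below, is to centre the map on the mean of the coordinates already in the dataframe. The helper `map_centre` and the two-row sample are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def map_centre(df, geocoded=None):
    """Hypothetical helper: return (latitude, longitude) for the map centre.

    Uses the geocoded point when one is supplied, otherwise the mean of
    the Latitude and Longitude columns.
    """
    if geocoded is not None:
        return geocoded
    return (df["Latitude"].mean(), df["Longitude"].mean())


# Two rows taken from the geospatial table above
sample = pd.DataFrame({
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# No geocoding result available: fall back to the data's own centre
centre = map_centre(sample)

# Geocoding succeeded: the supplied point is used unchanged
centre_geocoded = map_centre(sample, geocoded=(43.653963, -79.387207))

print(centre)
print(centre_geocoded)
```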


### Create map

In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_post['Latitude'], df_post['Longitude'], df_post['Borough'], df_post['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Results and Discussion <a name="results"></a>

As the map shows, there are 103 markers across Toronto, one for each postal code area and its neighborhoods. Zooming in, we can see six of them between Queen Street and Union Street. This area has the highest number of neighborhoods, so its population should be among the top five in Toronto. As stated in the introduction, we want to find the area with the largest population in Toronto, and this makes it the best place to open a restaurant.
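
One way to put a number on this visual impression is to count, for each postal code, how many other postal codes lie within a fixed radius. The sketch below does this with the haversine formula on five coordinates from the geospatial table. The 3 km radius, the function names and the use of a simple neighbour count are assumptions for illustration; this is not the analysis performed in the notebook.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def neighbours_within(points, radius_km):
    """Count, for each key, how many other points lie within radius_km."""
    counts = {}
    for code, (lat, lng) in points.items():
        counts[code] = sum(
            1
            for other, (olat, olng) in points.items()
            if other != code and haversine_km(lat, lng, olat, olng) <= radius_km
        )
    return counts


# Five postal codes from the geospatial table shown earlier
points = {
    "M1B": (43.806686, -79.194353),
    "M1C": (43.784535, -79.160497),
    "M1E": (43.763573, -79.188711),
    "M1G": (43.770992, -79.216917),
    "M1H": (43.773136, -79.239476),
}

# The radius is an assumed value; a different choice changes the counts
counts = neighbours_within(points, radius_km=3.0)
densest = max(counts, key=counts.get)

print(counts)
print(densest)
```

Run on all 103 rows of `df_post`, the same count would show whether the area picked out by eye is also the densest one by this measure.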