# Capstone Project - The Battle of Neighborhoods (Week 2)
### Author - Ruoyu Yan

## 1. A description of the problem and a discussion of the background.

### 1.1 Problem Statement
#### Which Toronto neighborhood should I recommend to a client who wants to buy a new house or condo?

### 1.2 Background Discussion 
#### Choosing a location when looking for a home is very important! After all, you can always update or fix your house, but you can't easily change its location and the vibe of the community! In this project, I would like to create a hypothetical business scenario with an aim of finding a compatible location for one of my clients.

### 1.3 Who is my audience and why would they care?

#### Many zones already have their own labels, such as 'China Town' or 'Little Italy'. However, the perceived quality of a community is highly subjective, because each client has their own needs and expectations. In this project, the 'hypothetical client' has never lived in Toronto before, so it would be very convenient for him to have a smart 'recommendation system' built from his own expectations and ratings of various aspects, the data retrieved from FourSquare, and a recommendation engine designed and customized for him.

## 2. A description of the data and how it will be used to solve the problem.

#### In this project, I will use raw data from Wikipedia, geographical coordinates, data retrieved from FourSquare, and a dataset provided by the customer describing his own expectations/ratings.



#### 2.1 Data from Wikipedia contains a list of postal codes of Canada. It will be retrieved by scraping a table from the website. This data is used to create a geographical segmentation of Toronto based on postal code, and it will be linked with the actual coordinates later. Note that the methodology used in this subsection overlaps with the previous project, so feel free to skip through it if you are already familiar with it.

In [1]:
!pip install bs4
!pip install lxml

Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/cb/a1/c698cf319e9cfed6b17376281bd0efc6bfc8465698f54170ef60a485ab5d/beautifulsoup4-4.8.2-py3-none-any.whl (106kB)
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.8.2 bs

In [2]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
import urllib.request
import folium

#### Obtain the Wikipedia table and perform data wrangling.

In [3]:
#Obtain the Wikipedia article as a local copy.
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
request = urllib.request.urlopen(url)
wiki_article = request.read().decode()

with open('List_of_postal_codes_of_Canada:_M.html', 'w') as fo:
    fo.write(wiki_article)
    

# Load article, use beautiful soup to get the tables.
wiki_article = open('List_of_postal_codes_of_Canada:_M.html').read()
soup = BeautifulSoup(wiki_article, 'html.parser')
tables = soup.find_all('table', class_='sortable')

# Search through all the tables, identify the table with the header we want.
for table in tables:
    header_cells = table.find_all('th')
    header = [th.text.strip() for th in header_cells]
    # Compare only the first three header cells with the columns we need.
    if header[:3] == ['Postcode', 'Borough', 'Neighborhood']:
        break

# Extract the columns we want and write to a semicolon-delimited text file.
with open('List_of_postal_codes_of_Canada:_M.txt', 'w') as fo:
    for tr in table.find_all('tr'):
        tds = tr.find_all('td')
        if not tds:
            continue
        Postcode, Borough, Neighborhood = [td.text.strip() for td in tds[:3]]
        
        print('; '.join([Postcode, Borough, Neighborhood]), file=fo)

# Load to pandas dataframe        
df = pd.read_table('List_of_postal_codes_of_Canada:_M.txt', delimiter = ';', header = None)
df.columns = ['PostCode', 'Borough', 'Neighborhood']

# Ignore not assigned borough
df1 = df[df['Borough'] != ' Not assigned']
df1.reset_index(inplace = True, drop = True)

# Assign borough value to neighborhood if the neighborhood is not assigned.
position = 0
neigh_list = []
for i,j in zip(df1['Borough'], df1['Neighborhood']):
    if j == ' Not assigned':
        neigh_list.append(i)
    else:
        neigh_list.append(j)

post_list = df1['PostCode'].tolist()
br_list = df1['Borough'].tolist()

df1 = pd.DataFrame([post_list, br_list, neigh_list]).T
df1.columns = ['PostCode', 'Borough', 'Neighborhood']
df1.head()

# Group dataframe by PostCode and combine neighborhood values.
borough_list =[] 
Neighborhood_list = []

for item in df1['Borough']:
    item_new = str(item)[1:] + ':'
    borough_list.append(item_new)

for item in df1['Neighborhood']:
    item_new = str(item)[1:] + ':'
    Neighborhood_list.append(item_new)
    
PostCode_list = df1['PostCode'].tolist()

df2 = pd.DataFrame([PostCode_list, borough_list, Neighborhood_list]).T
df2.columns = ['PostCode', 'Borough', 'Neighborhood']

new_df = df2.groupby('PostCode').sum()

borough_list=[]
Neighborhood_list = []
PostCode_list = new_df.index.tolist()

for item in new_df['Borough']:
    item_new = np.unique(np.array(str(item).split(':')))[1]
    borough_list.append(item_new)

for item in new_df['Neighborhood']:
    item_new = str(np.array(str(item).split(':'))[:-1].tolist())[1:][:-1].replace("'","")
    Neighborhood_list.append(item_new)

df3 = pd.DataFrame([PostCode_list, borough_list, Neighborhood_list]).T
df3.columns = ['PostCode', 'Borough', 'Neighborhood']
df3    

Unnamed: 0,PostCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
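The grouping step above can also be expressed with pandas' own `groupby` and `agg`. The following is only a minimal sketch on made-up rows, not the notebook's actual data; the frame `raw` and its values are assumptions for illustration, and it assumes the text has already been stripped of leading spaces.

```python
import pandas as pd

# Toy rows standing in for the scraped table (values are illustrative only).
raw = pd.DataFrame({
    'PostCode': ['M1A', 'M1B', 'M1B', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Scarborough', 'Scarborough',
                'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Rouge', 'Malvern',
                     'Harbourfront', 'Not assigned'],
})

# Drop postal codes without a borough.
assigned = raw[raw['Borough'] != 'Not assigned'].copy()

# Fall back to the borough name where the neighborhood is missing.
missing = assigned['Neighborhood'] == 'Not assigned'
assigned.loc[missing, 'Neighborhood'] = assigned.loc[missing, 'Borough']

# One row per postal code, with its neighborhoods joined by ', '.
combined = (assigned
            .groupby(['PostCode', 'Borough'])['Neighborhood']
            .agg(', '.join)
            .reset_index())
print(combined)
```

Grouping on both `PostCode` and `Borough` keeps the borough as a column without the split-and-deduplicate step, under the assumption that each postal code belongs to exactly one borough.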


#### 2.2 Geographical coordinates. This dataset was provided by the previous project. It contains the coordinates of each postal code and is used to link postal codes with coordinates. Note that the methodology used in this subsection overlaps with the previous project, so feel free to skip through it if you are already familiar with it.

In [4]:
# Load the csv file from source.
df_co = pd.read_csv('https://cocl.us/Geospatial_data')

lat_list = []
long_list = []

for PostCode_target in df3['PostCode']:
    for PostCode, Latitude, Longitude in zip (df_co['Postal Code'],
                                                 df_co['Latitude'],
                                               df_co['Longitude']):
        if PostCode_target == PostCode:
            lat_list.append(Latitude)
            long_list.append(Longitude)

# Add coordinates information to the pandas dataframe.
Final_df = pd.DataFrame([df3['PostCode'].tolist(), 
                         df3['Borough'].tolist(), 
                         df3['Neighborhood'].tolist(),
                         lat_list,
                         long_list]).T
Final_df.columns = ['PostCode','Borough','Neighborhood', 'Latitude', 'Longitude']
Final_df

Unnamed: 0,PostCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.8067,-79.1944
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.7845,-79.1605
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7636,-79.1887
3,M1G,Scarborough,Woburn,43.771,-79.2169
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395
...,...,...,...,...,...
98,M9N,York,Weston,43.7069,-79.5182
99,M9P,Etobicoke,Westmount,43.6963,-79.5322
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.6889,-79.5547
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.7394,-79.5884
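The nested loop above works as long as every postal code has exactly one match in the coordinates file. If a code were missing, the coordinate lists would come out shorter than the table and could fall out of alignment. A possible alternative is a left join with `DataFrame.merge`, sketched here on toy frames whose names and values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for df3 and df_co (values are illustrative only).
areas = pd.DataFrame({
    'PostCode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Latitude': [43.7845, 43.8067],
    'Longitude': [-79.1605, -79.1944],
})

# A left join keeps every area; codes without a match get NaN coordinates
# instead of shifting later rows out of alignment.
merged = (areas
          .merge(coords, how='left', left_on='PostCode', right_on='Postal Code')
          .drop(columns='Postal Code'))
print(merged)
```

Rows with missing coordinates can then be inspected or dropped explicitly, for example with `merged.dropna(subset=['Latitude', 'Longitude'])`.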


#### 2.3 Data retrieved from FourSquare will be used to evaluate each zone of interest. For each postal code area, obtain the frequency of occurrence of interesting venues by further data wrangling and preparation. Note that the methodology used in this subsection overlaps with the previous project, so feel free to skip through it if you are already familiar with it.

In [5]:
import requests # library to handle requests

import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [8]:
# Extract latitude and longitude of downtown Toronto
df_to = Final_df
lat = Final_df[Final_df['Borough']=='Downtown Toronto'].iloc[0].Latitude
long = Final_df[Final_df['Borough']=='Downtown Toronto'].iloc[0].Longitude

print('The geographical coordinate of downtown Toronto City are {}, {}.'.format(lat, long))

# create map of Toronto using latitude and longitude values.
map_to = folium.Map(location=[lat, long], zoom_start=10)

# add markers to map
for lat, lng, borough, postcode in zip(df_to['Latitude'], df_to['Longitude'], df_to['Borough'], 
                                           df_to['PostCode']):
    label = '{}, {}'.format(postcode, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_to)  
    
map_to

The geographical coordinate of downtown Toronto City are 43.6795626, -79.37752940000001.


### The resulting map looks like this
![alt text](https://user-images.githubusercontent.com/59368572/72224432-84df6c00-3572-11ea-8c15-174ba128be23.png)


In [9]:
from pandas.io.json import json_normalize

# Enter your own credentials below. The actual values were removed before sharing because they are sensitive. Please see the output.
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = ''
VERSION = '20200112' # Foursquare API version

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 300 # define radius

neighborhood_latitude = df_to[df_to['PostCode']=='M5C'].Latitude.iloc[0]
neighborhood_longitude = df_to[df_to['PostCode']=='M5C'].Longitude.iloc[0]

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

# Retrieve information from FourSquare
# results = requests.get(url).json()

# This function is from applied data science course materials, which will be utilized by this assignment.
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
#         print(name)       
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    
    nearby_venues.columns = ['PostCode', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    return(nearby_venues)

to_venues = getNearbyVenues(names=df_to['PostCode'],
                                   latitudes=df_to['Latitude'],
                                   longitudes=df_to['Longitude'])
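`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without a category would raise an exception. The sketch below shows one defensive way to flatten such a payload. The key names mirror those used in the function above. The helper name `extract_venues`, the `'Uncategorized'` fallback label, and the hand-written sample payload are assumptions for illustration, not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows like those built above.

    The nesting (response -> groups[0] -> items -> venue) mirrors the keys
    used in getNearbyVenues; the fallbacks are assumptions added here.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Hypothetical fallback label for venues that list no category.
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written sample payload, not real API output.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'M5C', 43.6515, -79.3754)
empty = extract_venues({}, 'M5C', 43.6515, -79.3754)
print(rows)
print(empty)
```

An empty payload yields an empty list, so one failed request would not abort the loop over all postal codes.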

In [12]:
# one hot encoding
to_onehot = pd.get_dummies(to_venues[['Venue Category']], prefix="", prefix_sep="")

# add postal code column back to dataframe
to_onehot['PostCode'] = to_venues['PostCode'] 

# move postal code column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

to_onehot.head()

Unnamed: 0,PostCode,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
#Group rows by PostCode and take the mean of the frequency of occurrence of each category
to_grouped = to_onehot.groupby('PostCode').mean().reset_index()
to_grouped

Unnamed: 0,PostCode,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,M9N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
95,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
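To see why the mean of the one-hot columns gives a frequency of occurrence, here is a small sketch on a made-up venue list (the postal codes and categories are illustrative only).

```python
import pandas as pd

# Toy venue list (illustrative only): three venues in M1G, one in M1B.
venues = pd.DataFrame({
    'PostCode': ['M1G', 'M1G', 'M1G', 'M1B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One column per category, with a 1 where the venue belongs to it.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
onehot = onehot.astype(float)
onehot.insert(0, 'PostCode', venues['PostCode'])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby('PostCode').mean().reset_index()
print(freq)
```

Two of the three venues in the toy M1G area are coffee shops, so its 'Coffee Shop' frequency is 2/3. Note that this is a share of the venues returned for an area, not an absolute count.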


#### 2.4 The dataset provided by the customer, containing his own expectations/ratings, will be used to create a customized recommendation engine. Detailed data processing will be provided in the next project. Note that the metrics that matter to this customer may have broad and vague definitions, so they will be carefully mapped to the corresponding categories retrieved from FourSquare in the next project.

In [14]:
columns = ['Customer_Name', 'Cafe', 'Health Care', 'School', 'Gym', 'Full Score']
Rating = ['John', 7, 10, 9, 6, 10]
df_rating = pd.DataFrame([Rating])
df_rating.columns = columns
df_rating.set_index('Customer_Name')
df_rating

Unnamed: 0,Customer_Name,Cafe,Health Care,School,Gym,Full Score
0,John,7,10,9,6,10


#### Interpret the customer's expectations in terms of venue categories more precisely. For example, 'Cafe' could mean both café and coffee shop, so both should be taken into account. Likewise, 'School' could mean college, high school, university, etc. These alternative keywords should be identified and considered.

In [119]:
# Explore other possible key words that also mean the same metric that the customer cares about.
key_words = to_grouped.columns.tolist()
cafe_alt = []
health_alt=[]
school_alt = []
gym_alt = []

for i in key_words:
    if 'Coffee' in str(i):
        cafe_alt.append(i)
    elif 'Cafe' in str(i):
        cafe_alt.append(i)
    elif 'Café' in str(i):
        cafe_alt.append(i)


for i in key_words:
    if 'Hospital' in str(i):
        health_alt.append(i)
    elif 'Clinic' in str(i):
        health_alt.append(i)
    elif 'Pharmacy' in str(i):
        health_alt.append(i)

for i in key_words:
    if 'University' in str(i):
        school_alt.append(i)
    elif 'College' in str(i):
        school_alt.append(i)
    elif 'school' in str(i):
        school_alt.append(i)
    elif 'college' in str(i):
        school_alt.append(i)

for i in key_words:
    if 'Gym' in str(i):
        gym_alt.append(i)
    if 'Fitness' in str(i):
        gym_alt.append(i)


health_r = [10]* len(health_alt)
cafe_r = [7]* len(cafe_alt)
school_r = [9]*len(school_alt)
gym_r = [6] * len(gym_alt)

new_metric = (cafe_alt + health_alt + school_alt + gym_alt)

new_r = (cafe_r + health_r + school_r + gym_r)

extra_rating = pd.DataFrame([new_metric, new_r])
extra_rating


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Gym,College Rec Center,College Stadium,Climbing Gym,College Gym,Gym,Gym / Fitness Center,Gym / Fitness Center,Gym Pool
1,7,7,7,7,10,10,9,9,9,9,9,6,6,6,6,6,6
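The output above contains 'College Gym' and 'Gym / Fitness Center' twice, because a category name can match more than one keyword. One way to avoid such duplicates is to let the first matching metric win, as in this sketch. The keyword lists and ratings follow the cell above. The dictionary layout, the helper name `rate_categories`, the first-match rule, and the case-insensitive comparison are assumptions made for illustration.

```python
# Keyword lists and ratings follow the cell above; the first-match rule
# and the helper name are assumptions made for this sketch.
KEYWORDS = {
    'Cafe': (7, ['Coffee', 'Cafe', 'Café']),
    'Health Care': (10, ['Hospital', 'Clinic', 'Pharmacy']),
    'School': (9, ['University', 'College', 'School']),
    'Gym': (6, ['Gym', 'Fitness']),
}


def rate_categories(category_names):
    """Map each venue category to the rating of the first metric it matches."""
    ratings = {}
    for name in category_names:
        for score, words in KEYWORDS.values():
            if any(word.lower() in name.lower() for word in words):
                ratings[name] = score
                break  # first match wins, so no category is counted twice
    return ratings


sample = ['Coffee Shop', 'College Gym', 'Gym / Fitness Center', 'Park']
print(rate_categories(sample))
```

Under this rule 'College Gym' is rated as a school venue because 'School' is checked before 'Gym'. Whether that is the right call depends on what the customer means by each metric.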


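#### Remove the redundant columns. 'College Gym' matched both the school and the gym keywords, and 'Gym / Fitness Center' matched both 'Gym' and 'Fitness', so each appears twice. Dropping by label removes every copy, so 'College Gym' is left out and 'Gym / Fitness Center' is added back once with the gym rating.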

In [120]:
extra_rating.columns = extra_rating.iloc[0].tolist()
extra_rating.drop(extra_rating.index[0], inplace = True)
# Take an explicit copy so that adding a column below does not raise a SettingWithCopyWarning.
new_rating = extra_rating.drop(extra_rating.columns[[8, 15]], axis = 1).copy()
new_rating['Gym / Fitness Center']=6
new_rating


Unnamed: 0,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center
1,7,7,7,7,10,10,9,9,9,9,6,6,6,6


#### Convert the ratings to relative scores, which will be used as scaling factors to weight the venues later.

In [128]:
# The customer rates four main categories, each out of a full score of 10,
# so we divide each score by 4 * 10 = 40 to get a relative score.
# Dividing by the sum of all the scores instead could over-dilute the importance
# of a metric that happens to have many subcategories.
rating_r = new_rating/40
rating_r

Unnamed: 0,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center
1,0.175,0.175,0.175,0.175,0.25,0.25,0.225,0.225,0.225,0.225,0.15,0.15,0.15,0.15
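The relative scores above can be reproduced by hand. This sketch assumes the divisor 40 is the number of main categories times the full score (4 x 10), which is my reading of the comment in the cell above. The dictionary and the constant name `FULL_SCORE` are illustrative.

```python
# Ratings from the customer table; FULL_SCORE is the 'Full Score' column.
ratings = {'Cafe': 7, 'Health Care': 10, 'School': 9, 'Gym': 6}
FULL_SCORE = 10

# Denominator = number of main categories x full score = 4 x 10 = 40.
denominator = len(ratings) * FULL_SCORE
weights = {metric: score / denominator for metric, score in ratings.items()}
print(weights)
```

These weights do not add up to 1 (they sum to 32/40 = 0.8). That is fine for ranking areas for a single customer, since only the relative sizes matter.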


#### Filter the dataset based on customers’ expectation to reduce computational complexity.

In [134]:
filtered_columns = rating_r.columns.tolist()
filtered_columns.append('PostCode')
to_filtered = to_grouped [filtered_columns]
to_filtered

Unnamed: 0,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center,PostCode
0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M1B
1,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M1C
2,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M1E
3,0.0,0.0,0.666667,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M1G
4,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M1H
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M9N
95,0.0,0.0,0.142857,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M9P
96,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M9R
97,0.0,0.0,0.000000,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,M9V


#### Weight the venues by combining the actual frequency of occurrence of each venue category with the customer's personal preference (the rating dataset). This is done by multiplying each column of the venue frequency matrix element-wise by the corresponding relative rating.

In [167]:
weighted_to = (to_filtered.iloc[:,0:14].values) * (rating_r.values)
weighted_df = pd.DataFrame(weighted_to)
weighted_df.columns = to_filtered.columns.tolist()[:-1]
weighted_df['PostCode'] = to_filtered['PostCode']
weighted_df

Unnamed: 0,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center,PostCode
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M1B
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M1C
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M1E
3,0,0,0.116667,0,0,0,0,0,0,0,0,0,0,0,M1G
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M1H
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M9N
95,0,0,0.025,0,0,0,0,0,0,0,0,0,0,0,M9P
96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,M9R
97,0,0,0,0,0,0.0277778,0,0,0,0,0,0,0,0,M9V
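The multiplication above relies on NumPy broadcasting: a matrix of shape (areas, categories) times a single row of shape (1, categories) scales each column by its rating. A minimal sketch with made-up numbers (the array values are illustrative, not taken from the data):

```python
import numpy as np

# Toy frequency matrix: 3 areas x 2 categories (illustrative values).
freq = np.array([[0.0, 0.5],
                 [0.4, 0.0],
                 [0.2, 0.2]])
# One row of relative ratings, shaped (1, 2) like rating_r.values.
weights = np.array([[0.175, 0.25]])

# Broadcasting stretches the (1, 2) row across all 3 rows of freq.
weighted = freq * weights

# Summing each row gives the same result as a matrix-vector product.
scores = weighted.sum(axis=1)
assert np.allclose(scores, freq @ weights.ravel())
print(weighted)
print(scores)
```

Summing the weighted columns per area is therefore equivalent to a dot product between each area's frequency vector and the rating vector.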


#### Generate recommended candidate areas for the customer

In [168]:
weighted_df.set_index('PostCode', inplace = True)
weighted_df

PostCode,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center
M1B,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M1C,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M1E,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M1G,0,0,0.116667,0,0,0,0,0,0,0,0,0,0,0
M1H,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
M9N,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M9P,0,0,0.025,0,0,0,0,0,0,0,0,0,0,0
M9R,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M9V,0,0,0,0,0,0.0277778,0,0,0,0,0,0,0,0


In [169]:
# Calculate the overall score for each postal code area using sum.
weighted_df['Score'] = weighted_df.sum(axis=1)
weighted_df

PostCode,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center,Score
M1B,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
M1C,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
M1E,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
M1G,0,0,0.116667,0,0,0,0,0,0,0,0,0,0,0,0.116667
M1H,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
M9N,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
M9P,0,0,0.025,0,0,0,0,0,0,0,0,0,0,0,0.025000
M9R,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000000
M9V,0,0,0,0,0,0.0277778,0,0,0,0,0,0,0,0,0.027778


In [179]:
# Sort the postal code areas by score.
weighted_df_sorted = weighted_df.sort_values(by=['Score'], ascending = False)
weighted_df_sorted.head(10)

PostCode,Cafeteria,Café,Coffee Shop,Gaming Cafe,Hospital,Pharmacy,College Arts Building,College Auditorium,College Rec Center,College Stadium,Climbing Gym,Gym,Gym Pool,Gym / Fitness Center,Score
M2L,0.175,0.0,0.0,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0.175
M1G,0.0,0.0,0.116667,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0.116667
M1N,0.0,0.04375,0.0,0,0,0.0,0,0.0,0,0.05625,0,0.0,0,0.0,0.1
M9C,0.0,0.025,0.025,0,0,0.0357143,0,0.0,0,0.0,0,0.0,0,0.0,0.085714
M3B,0.0,0.04375,0.0,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0375,0.08125
M8W,0.0,0.0,0.021875,0,0,0.03125,0,0.0,0,0.0,0,0.01875,0,0.0,0.071875
M7A,0.0,0.00460526,0.0506579,0,0,0.0,0,0.00592105,0,0.0,0,0.00789474,0,0.0,0.069079
M2R,0.0,0.0,0.025,0,0,0.0357143,0,0.0,0,0.0,0,0.0,0,0.0,0.060714
M1V,0.0,0.0,0.0583333,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0.058333
M4J,0.0,0.0,0.0583333,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0.058333


#### Visualize the top 10 areas on the map. Marker size represents the score: a higher score gets a larger marker to guide the eye.

In [213]:
# Extract the latitude and longitude of the top 10 areas.

lat_list = Final_df.loc[Final_df['PostCode'].isin(weighted_df_sorted.index.tolist()[0:10])].Latitude
long_list = Final_df.loc[Final_df['PostCode'].isin(weighted_df_sorted.index.tolist()[0:10])].Longitude

# Center the map on the first of these areas.
lat = lat_list.iloc[0]
long = long_list.iloc[0]

In [214]:
# Get corresponding info for visualization
# Take an explicit copy so that adding the Score column does not raise a SettingWithCopyWarning.
selected_df = Final_df.loc[Final_df['PostCode'].isin(weighted_df_sorted.index.tolist()[0:10])].copy()

# Look up the score of each selected postal code.
Score = []
for i in selected_df['PostCode']:
    Score.append(weighted_df_sorted.loc[i]['Score'])

selected_df['Score'] = Score
# Scale the scores into marker radii for the map.
radius_list = (selected_df['Score'].values)*80


In [215]:
from folium.features import DivIcon

# create map of Toronto using latitude and longitude values.
map_to = folium.Map(location=[lat, long], zoom_start=10)
selected_df = Final_df.loc[Final_df['PostCode'].isin(weighted_df_sorted.index.tolist()[0:10])]

# add markers to map
for lat, lng, borough, postcode,radius in zip(lat_list,
                                              long_list, 
                                              selected_df['Borough'], 
                                              selected_df['PostCode'],
                                              radius_list):
    label = '{}, {}'.format(postcode, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=radius,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False,
    ).add_to(map_to)  
    
map_to
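The score lookup and radius scaling used for the markers can also be written without an explicit loop, by mapping postal codes onto a score series. This is a sketch on toy data: the frames and values are illustrative, and only the scaling factor 80 comes from the cell above.

```python
import pandas as pd

# Toy stand-ins for Final_df and the Score column (illustrative values).
areas = pd.DataFrame({'PostCode': ['M1G', 'M2L', 'M9P'],
                      'Latitude': [43.77, 43.75, 43.69],
                      'Longitude': [-79.21, -79.37, -79.53]})
scores = pd.Series({'M2L': 0.175, 'M1G': 0.116667, 'M9P': 0.025}, name='Score')

# Keep the two highest-scoring areas.
top = scores.nlargest(2)
selected = areas[areas['PostCode'].isin(top.index)].copy()

# Look up each area's score by postal code, then scale it to a marker radius.
selected['Score'] = selected['PostCode'].map(top)
selected['Radius'] = selected['Score'] * 80
print(selected)
```

Because the score is attached by postal code, the radius stays tied to the right area even though the filtered rows follow the order of the coordinates table rather than the score ranking.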

### The resulting map looks like this
![alt text](https://user-images.githubusercontent.com/59368572/72234365-8f2a5600-35c4-11ea-83ae-2fe27fd02c4a.png)