# The Battle of Neighborhoods: Introduction/Business Problem

## Introduction

The main objective of this project is to help people explore the facilities around a neighborhood. It should help them make a smart, efficient decision when choosing among the many neighborhoods in Scarborough, Toronto.

Many people are migrating to different provinces of Canada and need to do a lot of research on housing prices and reputable schools for their children. This project is for people looking for a better neighborhood, with easy access to cafes, schools, supermarkets, medical shops, grocery shops, malls, theatres, hospitals, like-minded people, and so on.

This project aims to provide a comparative analysis of neighborhood features for people who are migrating to Scarborough and searching for the best neighborhood.

It will help people become familiar with an area and its neighborhoods before moving to a new city, state, country, or place for work or to start a fresh life.


## Problem the Project Tries to Solve

The major goal of this project is to recommend a better neighborhood in a new city for a person who is moving there. Relevant factors include social presence in terms of like-minded people, and connectivity to the airport, bus stand, city center, markets, and other daily needs nearby. More precisely, it aims to answer:

1. What amenities are available if a person buys a house in a particular neighbourhood?
2. If a person prefers certain amenities (shopping mall, supermarket, etc.), in which neighbourhoods can such amenities be found?

## The Location

Scarborough is a popular destination for new immigrants to Canada. As a result, it is one of the most diverse and multicultural areas in the Greater Toronto Area, home to various religious groups and places of worship. Although immigration has become a hot topic over the past few years, with more governments seeking tighter constraints on immigrants and refugees, the general trend of immigration into Canada has been upward.

## Foursquare API

This project uses the Foursquare API as its primary data source. Foursquare holds data on millions of places; its Places API supports location search, location sharing, and retrieval of details about a business.

## Workflow

Using Foursquare API credentials, features of places near each neighborhood are retrieved. Due to HTTP request restrictions, the number of places per neighborhood is capped at 100 and the search radius is set to 700 meters, matching the `LIMIT` and `radius` values used in the code.
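
The request parameters described above can be assembled as in the following minimal sketch. The endpoint, the version string and the default radius and limit are taken from the notebook's own request cells; the helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration. The sketch only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the notebook's request cells.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=700, limit=100):
    """Return the explore URL for one neighbourhood centre (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date
        "ll": f"{lat},{lng}",    # latitude,longitude of the search centre
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    # urlencode escapes special characters such as the comma in "ll"
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials, for illustration only.
example_url = build_explore_url(43.7729744, -79.2576479, "MY_ID", "MY_SECRET")
print(example_url)
```

Building the query string from a dictionary keeps each parameter next to its meaning and avoids hand-escaping values.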

## Libraries used

1. Pandas: To create and manipulate data frames.
1. Folium: To visualize the neighborhood distribution on an interactive Leaflet map.
1. Scikit-learn: To import k-means clustering.
1. JSON: To handle JSON data.
1. XML: To handle data stored in plain-text XML format.
1. Geocoder: To retrieve location data.
1. Beautiful Soup and Requests: To scrape web pages and handle HTTP requests.
1. Matplotlib: Python plotting module.

__P.S. - I've tried to make this notebook self sufficient (complete with markdowns and comments) so that ppt/report is barely required, so please grade the 2nd part of assignment based on this notebook itself.__

# Week 5 - Final Capstone Project: The Battle of Neighborhoods

## 1. Importing required libraries

In [1]:
# importing required libraries
import pandas as pd
import requests
import numpy as np
import geocoder
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import xml
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

# json_normalize lives in the top-level pandas namespace from pandas 1.0 onwards
from pandas import json_normalize
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup


## 2. Data extraction and cleaning
Scrape the list of postal codes from this Wikipedia [page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M).

In [2]:
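url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
html = requests.get(url).content
# read_html returns a list with one data frame per table found on the page
df_list = pd.read_html(html)
# the first table holds the postal codes
df = df_list[0]
display(df)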

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


In [3]:
# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
df = df.query('Borough != "Not assigned"')
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [4]:
df = df.reset_index()[["Postal Code", "Borough", "Neighbourhood"]]
display(df.head())
print(df.shape)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


(103, 3)


In [5]:
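# fetch the latitude/longitude corresponding to a postal code
def get_latlong(postal_code, max_attempts=10):
    # the geocoder can return no result, so retry a bounded number of times
    # instead of looping forever (max_attempts is an arbitrary default)
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        if g.latlng is not None:
            return g.latlng[0], g.latlng[1]
    raise RuntimeError('No coordinates found for {}'.format(postal_code))
    
get_latlong('M3A')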

(43.75245000000007, -79.32990999999998)

In [6]:
df[["lat", "long"]] = pd.DataFrame(df["Postal Code"].apply(get_latlong).to_list(), columns=['lat', 'long'])
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,lat,long
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


In [7]:
address = 'Scarborough,Toronto'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude_x = location.latitude
longitude_y = location.longitude
print('The geographical coordinates of Scarborough, Toronto are {}, {}.'.format(latitude_x, longitude_y))

The geographical coordinates of Scarborough, Toronto are 43.7729744, -79.2576479.


## 3. Map of Scarborough
Explore the area around Scarborough and mark every neighbourhood in the data frame on the map.

In [8]:
map_Scarborough = folium.Map(location=[latitude_x, longitude_y], zoom_start=10)

for lat, lng, nei in zip(df['lat'], df['long'], df['Neighbourhood']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Scarborough)  
    
map_Scarborough

## 4. Explore Scarborough Neighbourhood

In [9]:
# NOTE: this imports the credentials from a local secrets.py file;
# a module with that name shadows the standard-library `secrets` module
from secrets import CLIENT_ID, CLIENT_SECRET

VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

radius = 700 # search radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_x, 
    longitude_y, 
    radius, 
    LIMIT)
results = requests.get(url).json()


In [10]:
# from pprint import pprint as pp
# pp(results)

In [11]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.columns

Index(['referralId', 'reasons.count', 'reasons.items', 'venue.id',
       'venue.name', 'venue.location.address', 'venue.location.crossStreet',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.labeledLatLngs', 'venue.location.distance',
       'venue.location.postalCode', 'venue.location.cc', 'venue.location.city',
       'venue.location.state', 'venue.location.country',
       'venue.location.formattedAddress', 'venue.categories',
       'venue.photos.count', 'venue.photos.groups',
       'venue.location.neighborhood', 'venue.venuePage.id'],
      dtype='object')

In [12]:
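def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the data frame
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']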

## 5. Get nearby venues/locations and their categories

In [13]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Disney Store,"[{'id': '4bf58dd8d48988d1f3941735', 'name': 'T...",43.775537,-79.256833
1,SEPHORA,"[{'id': '4bf58dd8d48988d10c951735', 'name': 'C...",43.775017,-79.258109
2,Coliseum Scarborough Cinemas,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",43.775995,-79.255649
3,Tommy Hilfiger,"[{'id': '4bf58dd8d48988d103951735', 'name': 'C...",43.776015,-79.257369
4,Shoppers Drug Mart,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",43.773305,-79.251662


In [14]:
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Disney Store,Toy / Game Store,43.775537,-79.256833
1,SEPHORA,Cosmetics Shop,43.775017,-79.258109
2,Coliseum Scarborough Cinemas,Movie Theater,43.775995,-79.255649
3,Tommy Hilfiger,Clothing Store,43.776015,-79.257369
4,Shoppers Drug Mart,Pharmacy,43.773305,-79.251662


In [15]:
# Top 10 Categories
a = pd.Series(nearby_venues.categories)
a.value_counts()[:10]

Clothing Store            8
Coffee Shop               5
Restaurant                5
Sandwich Place            2
Pharmacy                  2
Gas Station               2
Department Store          2
Furniture / Home Store    2
Intersection              2
Greek Restaurant          1
Name: categories, dtype: int64

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # making GET request; skip this neighbourhood if the request or parsing fails,
        # otherwise venue_results would be undefined or left over from the previous iteration
        try:
            venue_results = requests.get(url).json()["response"]['groups'][0]['items']
        except Exception as e:
            print('Request failed for', name, lat, lng, e)
            continue
        
        # return only relevant information for each nearby venue;
        # venues without any category are skipped
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in venue_results
            if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [17]:
# Nearby Venues
Scarborough_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['lat'],
                                   longitudes=df['long']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [18]:
print('There are {} unique categories.'.format(Scarborough_venues['Venue Category'].nunique()))
Scarborough_venues.groupby('Neighborhood').count().head()

There are 307 unique categories.


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,20,20,20,20,20,20
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Wilson Heights, Downsview North",13,13,13,13,13,13
Bayview Village,6,6,6,6,6,6
"Bedford Park, Lawrence Manor East",24,24,24,24,24,24


In [19]:
from pprint import pprint as pp
pp(Scarborough_venues['Venue Category'].unique())

array(['Park', 'Pet Store', 'Food & Drink Shop', 'Burger Joint',
       'Middle Eastern Restaurant', 'Portuguese Restaurant',
       'Coffee Shop', 'Intersection', 'Pizza Place', 'Bakery',
       'Breakfast Spot', 'Yoga Studio', 'Spa', 'Restaurant',
       'Italian Restaurant', 'Thai Restaurant', 'Event Space',
       'Liquor Store', 'Farmers Market', 'Distribution Center',
       'Historic Site', 'Pub', 'Dessert Shop', 'Chocolate Shop',
       'Gym / Fitness Center', 'Theater', 'Pool', 'Café',
       'Performing Arts Venue', 'Mediterranean Restaurant',
       'Tech Startup', 'Food Truck', 'French Restaurant',
       'German Restaurant', 'Sandwich Place', 'Animal Shelter',
       'Furniture / Home Store', 'Gastropub', 'Shoe Store', 'Karaoke Bar',
       'Art Gallery', 'Gym Pool', 'Brewery', 'Cosmetics Shop', 'Diner',
       'Asian Restaurant', 'Electronics Store', 'Sushi Restaurant',
       'Indian Restaurant', 'Beer Store', 'Skating Rink', 'Bank',
       'Pharmacy', 'Fast Food Restaur

## 5. Getting different venue counts 

In [20]:
# one hot encoding
Scarborough_onehot = pd.get_dummies(Scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Scarborough_onehot['Neighborhood'] = Scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Scarborough_onehot.columns[-1]] + list(Scarborough_onehot.columns[:-1])
Scarborough_onehot = Scarborough_onehot[fixed_columns]
Scarborough_grouped = Scarborough_onehot.groupby('Neighborhood').mean().reset_index()
Scarborough_onehot.head(5)

Unnamed: 0,Zoo Exhibit,Accessories Store,Adult Boutique,African Restaurant,Airport,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
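
The output above lists `Zoo Exhibit`, not `Neighborhood`, as the first column. One possible explanation is that a venue category literally named `Neighborhood` collided with the label column, so the "move last column first" step moved a different column; this is an inference from the printed output, not something the notebook states. The sketch below uses invented toy data to show how one-hot encoding followed by a per-group mean yields category frequencies, and how such a collision could be guarded against. The renamed column `Neighborhood (category)` is a hypothetical choice.

```python
import pandas as pd

# Toy data, invented for illustration; one venue category is literally "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Pub", "Park", "Neighborhood", "Pub", "Cafe"],
})

# One-hot encode the categories; float columns keep the later mean() straightforward.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Rename a colliding category column before adding the neighbourhood labels.
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column within a group is the share of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Using `insert(0, ...)` places the label column first directly, so no column reordering step is needed.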


In [21]:
num_top_venues = 5
for hood in Scarborough_grouped['Neighborhood']:
    print("---- "+hood+" ----")
    temp =Scarborough_grouped[Scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Agincourt ----
                venue  freq
0       Shopping Mall  0.10
1      Breakfast Spot  0.05
2  Chinese Restaurant  0.05
3        Skating Rink  0.05
4                Park  0.05


---- Alderwood, Long Branch ----
         venue  freq
0          Pub  0.14
1  Pizza Place  0.14
2          Gym  0.14
3  Gas Station  0.14
4     Pharmacy  0.14


---- Bathurst Manor, Wilson Heights, Downsview North ----
                      venue  freq
0               Coffee Shop  0.15
1  Mediterranean Restaurant  0.08
2               Men's Store  0.08
3                      Park  0.08
4          Sushi Restaurant  0.08


---- Bayview Village ----
              venue  freq
0              Park  0.17
1  Asian Restaurant  0.17
2       Flower Shop  0.17
3             Trail  0.17
4           Dog Run  0.17


---- Bedford Park, Lawrence Manor East ----
                venue  freq
0      Sandwich Place  0.08
1         Coffee Shop  0.08
2           Pet Store  0.08
3  Italian Restaurant  0.08
4     Thai Restau

                  venue  freq
0  Fast Food Restaurant  0.09
1            Restaurant  0.09
2                Bakery  0.06
3               Brewery  0.06
4           Coffee Shop  0.06


---- Islington Avenue, Humber Valley Village ----
                 venue  freq
0             Pharmacy  0.18
1    Convenience Store  0.09
2  Japanese Restaurant  0.09
3                 Café  0.09
4         Skating Rink  0.09


---- Kennedy Park, Ionview, East Birchmount Park ----
                venue  freq
0      Discount Store  0.33
1  Chinese Restaurant  0.17
2    Department Store  0.17
3         Bus Station  0.17
4         Coffee Shop  0.17


---- Kensington Market, Chinatown, Grange Park ----
                           venue  freq
0                           Café  0.08
1                    Coffee Shop  0.06
2  Vegetarian / Vegan Restaurant  0.06
3                            Bar  0.04
4                    Yoga Studio  0.03


---- Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens ---

                  venue  freq
0           Coffee Shop  0.07
1           Pizza Place  0.06
2   Japanese Restaurant  0.05
3        Sandwich Place  0.05
4  Fast Food Restaurant  0.05


---- Willowdale, Willowdale West ----
           venue  freq
0           Park  0.14
1    Coffee Shop  0.14
2  Grocery Store  0.14
3     Baby Store  0.14
4       Pharmacy  0.14


---- Woburn ----
                  venue  freq
0    Chinese Restaurant  0.25
1  Fast Food Restaurant  0.25
2           Coffee Shop  0.25
3                  Park  0.25
4                Museum  0.00


---- Woodbine Heights ----
            venue  freq
0   Grocery Store  0.08
1        Bus Line  0.08
2   Metro Station  0.08
3            Café  0.04
4  Breakfast Spot  0.04


---- York Mills West ----
            venue  freq
0      Restaurant  0.19
1     Coffee Shop  0.14
2             Gym  0.10
3  Sandwich Place  0.05
4   Metro Station  0.05


---- York Mills, Silver Hills ----
             venue  freq
0     Concert Hall   0.5
1          

## 6. Most Common Venues near neighbourhood

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    # the first three ranks take 'st', 'nd', 'rd'; every later rank falls back to 'th'
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Scarborough_grouped['Neighborhood']

for ind in np.arange(Scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Shopping Mall,Pizza Place,Dim Sum Restaurant,Bubble Tea Shop,Breakfast Spot,Skating Rink,Supermarket,Sushi Restaurant,Latin American Restaurant,Sandwich Place
1,"Alderwood, Long Branch",Gas Station,Gym,Print Shop,Pub,Pizza Place,Coffee Shop,Pharmacy,Doner Restaurant,Donut Shop,Dry Cleaner
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Men's Store,Mediterranean Restaurant,Sushi Restaurant,Fried Chicken Joint,Sandwich Place,Park,Intersection,Restaurant,Middle Eastern Restaurant
3,Bayview Village,Dog Run,Flower Shop,Park,Trail,Gas Station,Asian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Pet Store,Italian Restaurant,Sandwich Place,Pharmacy,Café,Sushi Restaurant,Juice Bar,Thai Restaurant,Restaurant
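
The column labels above are built from a three-item suffix list with a fallback to "th", which is fine for ten columns but would produce labels such as "21th" if `num_top_venues` were raised above 20. A small helper like the following sketch handles the general case; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    # Numbers whose last two digits are 10..20 always take 'th' (11th, 12th, 113th).
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```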


## Q1. Checking Neighbourhood preference based on nearby amenities

In this section, let's try to answer Q1: which amenities are available if a person decides to buy a house in a particular neighbourhood?

Suppose someone is looking to buy property in Agincourt. They can view the most common amenities offered around that area.

In [24]:
# IMPORTANT CELL 1
neighborhoods_venues_sorted.query("Neighborhood == 'Agincourt'")

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Shopping Mall,Pizza Place,Dim Sum Restaurant,Bubble Tea Shop,Breakfast Spot,Skating Rink,Supermarket,Sushi Restaurant,Latin American Restaurant,Sandwich Place


In [25]:
agin_neighbourhood = Scarborough_venues.query("Neighborhood == 'Agincourt'").sort_values(by="Venue Category")
agin_neighbourhood["Venue Category"] = agin_neighbourhood["Venue Category"].astype('category')
agin_neighbourhood["cat_code"] = agin_neighbourhood["Venue Category"].cat.codes
agin_neighbourhood

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,cat_code
2557,Agincourt,43.79452,-79.26708,Commander Badminton,43.793546,-79.269835,Badminton Court,0
2548,Agincourt,43.79452,-79.26708,Aromaz Cake and Pastry 龍騰閣,43.797714,-79.27087,Bakery,1
2549,Agincourt,43.79452,-79.26708,TD Canada Trust,43.78848,-79.269408,Bank,2
2545,Agincourt,43.79452,-79.26708,Panagio's Breakfast & Lunch,43.79237,-79.260203,Breakfast Spot,3
2558,Agincourt,43.79452,-79.26708,Real Fruit Bubble Tea 真果茶坊,43.797208,-79.271523,Bubble Tea Shop,4
2544,Agincourt,43.79452,-79.26708,Grandeur Palace 華丽宮 (Grandeur Palace 華麗宮),43.797885,-79.270585,Chinese Restaurant,5
2559,Agincourt,43.79452,-79.26708,Tim Hortons,43.798307,-79.272655,Coffee Shop,6
2562,Agincourt,43.79452,-79.26708,Fountain Cuisine 奉天一品,43.78836,-79.26821,Dim Sum Restaurant,7
2555,Agincourt,43.79452,-79.26708,El Pulgarcito,43.792648,-79.259208,Latin American Restaurant,8
2560,Agincourt,43.79452,-79.26708,Agincourt Park,43.791383,-79.272092,Park,9


In [26]:

map_agin = folium.Map(location=[agin_neighbourhood["Neighborhood Latitude"].unique()[0], 
                                agin_neighbourhood["Neighborhood Longitude"].unique()[0]], zoom_start=15)

uniq_venues = len(agin_neighbourhood["Venue Category"].unique())
# one colour per venue category, indexed directly by the 0-based category code
colors_array = cm.rainbow(np.linspace(0, 1, uniq_venues))
rainbow = [colors.rgb2hex(i) for i in colors_array]


for lat, long, cat, code in zip(agin_neighbourhood['Venue Latitude'], 
                                agin_neighbourhood['Venue Longitude'], 
                                agin_neighbourhood['Venue Category'], 
                                agin_neighbourhood['cat_code']):
    
    label = '{}'.format(cat)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=rainbow[code],
        fill=True,
        fill_color=rainbow[code],
        fill_opacity=0.7,
        parse_html=False).add_to(map_agin)  
    

map_agin



## Q2. Checking out neighbourhoods based on amenities preference
In this section, let's try to answer Q2: if a person prefers certain amenities (shopping mall, supermarket, etc.), in which neighbourhoods can such amenities be found?

Suppose a person is looking to buy a house in a neighbourhood where certain amenities are nearby. Which neighbourhoods satisfy their condition?

In [27]:
# IMPORTANT CELL 2
requirements = ['Pub', 'Park']

In [28]:
neighbourhoods = []

# keep the neighbourhoods whose 10 most common venue categories include every requirement
for idx, nbrhd, *amenities in neighborhoods_venues_sorted.itertuples():
    if set(requirements).issubset(amenities):
        neighbourhoods.append(nbrhd)
print(neighbourhoods)

['Regent Park, Harbourfront', 'Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park']
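
The check above only looks at each neighbourhood's ten most common venue categories, so a neighbourhood that has both a pub and a park outside its top ten is not returned. A possible alternative, sketched here with invented toy data in the shape of `Scarborough_grouped`, is to test the full frequency table for a non-zero share in every required category. The variable names `known`, `mask` and `matches` are assumptions for illustration.

```python
import pandas as pd

# Toy frequency table in the shape of Scarborough_grouped; the values are invented.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Park": [0.20, 0.00, 0.05],
    "Pub": [0.10, 0.30, 0.00],
    "Cafe": [0.70, 0.70, 0.95],
})

requirements = ["Pub", "Park"]

# Keep only requirements that exist as columns, so a typo does not raise a KeyError.
known = [r for r in requirements if r in grouped.columns]

if len(known) == len(requirements):
    # A neighbourhood qualifies when every required category has a non-zero frequency.
    mask = (grouped[known] > 0).all(axis=1)
    matches = grouped.loc[mask, "Neighborhood"].tolist()
else:
    matches = []  # at least one required category never appears in the data

print(matches)
```

This variant is more permissive than the top-ten check, so it can return many more neighbourhoods; which behaviour is preferable depends on how strongly the amenity should characterise the area.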


In [29]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted["Neighborhood"].isin(neighbourhoods)]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,"Regent Park, Harbourfront",Coffee Shop,Restaurant,Park,Bakery,Italian Restaurant,Theater,Café,Pub,Thai Restaurant,Breakfast Spot
78,"Summerhill West, Rathnelly, South Hill, Forest...",Skating Rink,Coffee Shop,Tennis Court,Liquor Store,Restaurant,Fried Chicken Joint,Light Rail Station,Bagel Shop,Park,Pub


In [30]:
#  plot neighbourhoods on map
df2 = df[df.Neighbourhood.isin(neighbourhoods)]
display(df2)

map_Scarborough = folium.Map(location=[latitude_x, longitude_y], zoom_start=11)

for nbrhd, lat, long in zip(df2.Neighbourhood, df2.lat, df2.long):
    label = '{}'.format(nbrhd)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=15,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Scarborough)  
    
map_Scarborough   


Unnamed: 0,Postal Code,Borough,Neighbourhood,lat,long
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
86,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.68568,-79.40237


# Conclusion
In this project I tried to answer two questions: where a person should buy a house given their preferred nearby amenities, and which amenities are available nearby if they buy in a particular neighbourhood.

Feel free to experiment: for question 1, change the neighbourhood in the cell labelled `# IMPORTANT CELL 1`; for question 2, change the requirements in the cell labelled `# IMPORTANT CELL 2`.

