# The Battle of Neighborhoods (Week 1)

## Determine the best area to open a new restaurant in Toronto using location data 

I will use this notebook to show how to determine the best area to open a new restaurant in Toronto, using location data from the Foursquare API.

### **Target Audience** : New business owner(s) who want to open a restaurant in Toronto area

### **Introduction**

The purpose of this project is to explore the various neighborhoods in Toronto using location data, to help the target audience make an informed decision about the area in which to open a new restaurant. The project will provide them with data about competition, neighborhoods, population, etc.

### **Business Problem:**

Toronto is the provincial capital of Ontario and the most populous city in Canada, with a population of 2,731,571 as of 2016. In the same year, the Toronto census metropolitan area (CMA), most of which lies within the Greater Toronto Area (GTA), had a population of 5,928,040, making it Canada's most populous CMA. The city is the anchor of the Golden Horseshoe, an urban agglomeration of 9,245,438 people (as of 2016) surrounding the western end of Lake Ontario. Toronto is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world.

Toronto is a prominent centre for music, theatre, motion picture production, and television production, and is home to the headquarters of Canada's major national broadcast networks and media outlets. Its varied cultural institutions, which include numerous museums and galleries, festivals and public events, entertainment districts, national historic sites, and sports activities, attract over 43 million tourists each year.

Toronto encompasses a geographical area formerly administered by many separate municipalities. These municipalities have each developed a distinct history and identity over the years, and their names remain in common use among Torontonians. Former municipalities include East York, Etobicoke, Forest Hill, Mimico, North York, Parkdale, Scarborough, Swansea, Weston and York. Throughout the city there exist hundreds of small neighbourhoods and some larger neighbourhoods covering a few square kilometres.

A diverse population and a vast geographical area mean that businesses compete intensely to attract customers and maximize profit. This project addresses that problem by using location data to give new business owners insights about the various neighborhoods, competition, population, etc.




### **Data**

For this project, I will be using the following datasets:

1. Neighborhoods in Toronto - I'll be scraping this dataset from Wikipedia -- https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. It contains the neighborhoods in Toronto with their corresponding borough and postal code.


2. Restaurants in Toronto (using Foursquare API) - https://foursquare.com/explore?mode=url&ne=44.418088%2C-78.362732&q=Restaurant&sw=42.742978%2C-80.554504
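
The restaurant data in item 2 would be fetched by querying Foursquare around each neighborhood's coordinates. A minimal sketch of building such a request, assuming the legacy v2 `venues/explore` endpoint; the helper name `build_explore_url`, the placeholder credentials, and the default radius, limit and version date are illustrative assumptions, not values taken from this notebook:

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare's legacy v2 "explore" API.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      query="Restaurant", radius=500, limit=50,
                      version="20180605"):
    """Return a request URL for venues matching `query` near (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (YYYYMMDD)
        "ll": f"{lat},{lng}",    # latitude,longitude of the search centre
        "query": query,
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials; real ones come from a Foursquare developer account.
url = build_explore_url(43.64008, -79.38015, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()` and flattened into a DataFrame with `json_normalize`.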


In [6]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import time
import json
import requests 
from pandas import json_normalize  # moved out of pandas.io.json in pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
from sklearn.cluster import KMeans
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)




In [7]:
# define the dataframe columns
column_names = ['Postal_Code','Borough', 'Neighborhood'] 
toronto_nebr_df = pd.DataFrame(columns=column_names)



In [None]:
# Scrape neighborhood data from wiki
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urlopen(wiki)
soup = BeautifulSoup(page, "lxml")
_tbl = soup.find('table', class_='wikitable sortable')


In [9]:
_postal_code=[]
_borough=[]
_neighborhood=[]
for _row in _tbl.findAll("tr"):
    cells = _row.findAll('td')
    if len(cells)==3: #Only extract table body not heading
        _postal_code.append(cells[0].find(text=True))
        _borough.append(cells[1].find(text=True))
        _neighborhood.append(cells[2].find(text=True))

        
#Adding Data to toronto_nebr_df DataFrame
toronto_nebr_df['Postal_Code']=_postal_code
toronto_nebr_df['Borough']=_borough
toronto_nebr_df['Neighborhood']=_neighborhood

toronto_nebr_df



| | Postal_Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Downtown Toronto | Queen's Park |
| 8 | M8A | Not assigned | Not assigned |
| 9 | M9A | Queen's Park | Not assigned |
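
The scraping loop reads each cell with `find(text=True)`, which returns only the first text node of the cell and can keep a trailing newline such as `'Not assigned\n'`. A minimal sketch of an alternative using `get_text(strip=True)`, run on an inline HTML snippet that stands in for the Wikipedia table; the snippet and the variable names are illustrative assumptions:

```python
from bs4 import BeautifulSoup

# Inline stand-in for the Wikipedia table (illustrative rows only).
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td><a href="#">North York</a></td><td><a href="#">Parkwoods</a>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable sortable")

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) == 3:  # the header row uses <th>, so it is skipped
        # get_text(strip=True) collects all text in the cell and trims whitespace
        rows.append(tuple(cell.get_text(strip=True) for cell in cells))

print(rows)
```

Because the values are already stripped, later comparisons against `'Not assigned'` would not need a separate case for the trailing newline.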


In [10]:
toronto_nebr_df.shape

(287, 3)

#### Data Cleaning

1. Drop rows where Borough is "Not assigned".
2. Reset the index.

In [11]:
# Keep only rows whose Borough is assigned, then renumber the index from 0
toronto_nebr_df = toronto_nebr_df[~toronto_nebr_df['Borough'].str.contains("Not assigned", na=False)]
toronto_nebr_df = toronto_nebr_df.reset_index(drop=True)
toronto_nebr_df



| | Postal_Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
| 5 | M7A | Downtown Toronto | Queen's Park |
| 6 | M9A | Queen's Park | Not assigned |
| 7 | M1B | Scarborough | Rouge |
| 8 | M1B | Scarborough | Malvern |
| 9 | M3B | North York | Don Mills North |


In [12]:
# Assign the Borough value to Neighborhood where Neighborhood is "Not assigned"
toronto_nebr_df1 = toronto_nebr_df.copy()  # copy so the original frame is not modified

# strip() also catches values with a trailing newline, e.g. 'Not assigned\n'
not_assigned = toronto_nebr_df1['Neighborhood'].str.strip() == 'Not assigned'
toronto_nebr_df1.loc[not_assigned, 'Neighborhood'] = toronto_nebr_df1.loc[not_assigned, 'Borough']

toronto_nebr_df1


| | Postal_Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
| 5 | M7A | Downtown Toronto | Queen's Park |
| 6 | M9A | Queen's Park | Queen's Park |
| 7 | M1B | Scarborough | Rouge |
| 8 | M1B | Scarborough | Malvern |
| 9 | M3B | North York | Don Mills North |


In [13]:
# Narrow down to the four boroughs whose names contain "Toronto":
# East, West, Central and Downtown Toronto
# Each row already holds a single neighborhood, so no ungrouping is needed
toronto_nebr_df_ungrp = toronto_nebr_df1[toronto_nebr_df1['Borough'].str.contains("Toronto", na=False)]
toronto_nebr_df_ungrp = toronto_nebr_df_ungrp.reset_index(drop=True)
toronto_nebr_df_ungrp

| | Postal_Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M5A | Downtown Toronto | Harbourfront |
| 1 | M7A | Downtown Toronto | Queen's Park |
| 2 | M5B | Downtown Toronto | Ryerson |
| 3 | M5B | Downtown Toronto | Garden District |
| 4 | M5C | Downtown Toronto | St. James Town |
| 5 | M4E | East Toronto | The Beaches |
| 6 | M5E | Downtown Toronto | Berczy Park |
| 7 | M5G | Downtown Toronto | Central Bay Street |
| 8 | M6G | Downtown Toronto | Christie |
| 9 | M5H | Downtown Toronto | Adelaide |
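
The table above already holds one neighborhood per row. Should the source instead list several neighborhoods in a single cell, they would need to be ungrouped first. A minimal sketch of one way to do that, assuming the names are comma-separated; the toy data and the names `grouped` and `ungrouped` are illustrative:

```python
import pandas as pd

# Toy frame in which one row holds two neighborhoods (illustrative data).
grouped = pd.DataFrame({
    "Postal_Code": ["M5B", "M4E"],
    "Borough": ["Downtown Toronto", "East Toronto"],
    "Neighborhood": ["Ryerson, Garden District", "The Beaches"],
})

ungrouped = (
    grouped
    .assign(Neighborhood=grouped["Neighborhood"].str.split(","))  # string -> list
    .explode("Neighborhood")                                      # one row per list item
    .reset_index(drop=True)
)
# Remove the spaces left over after splitting on commas
ungrouped["Neighborhood"] = ungrouped["Neighborhood"].str.strip()

print(ungrouped)
```

`DataFrame.explode` repeats the other columns for every list element, so each neighborhood keeps its postal code and borough.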


In [None]:
# Geocode each neighborhood with Nominatim (OpenStreetMap)
geolocator = Nominatim(scheme='http', user_agent="ES1234")
for row_index, item in toronto_nebr_df_ungrp.iterrows():
    # Build a single query string, e.g. "Harbourfront, Toronto, Ontario, Canada"
    query = str(item['Neighborhood']).strip() + ', Toronto, Ontario, Canada'
    print(query)
    location = geolocator.geocode(query)
    # Rows that cannot be geocoded keep missing Latitude/Longitude values
    if location is not None:
        toronto_nebr_df_ungrp.loc[row_index, 'Latitude'] = location.latitude
        toronto_nebr_df_ungrp.loc[row_index, 'Longitude'] = location.longitude
    # Nominatim's usage policy allows at most one request per second
    time.sleep(1)
        

In [18]:
toronto_nebr_df_ungrp.head()

| | Postal_Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Harbourfront | 43.64008 | -79.38015 |
| 1 | M7A | Downtown Toronto | Queen's Park | 43.659659 | -79.39034 |
| 2 | M5B | Downtown Toronto | Ryerson | 43.658469 | -79.378993 |
| 3 | M5B | Downtown Toronto | Garden District | 43.6565 | -79.377114 |
| 4 | M5C | Downtown Toronto | St. James Town | 43.669403 | -79.372704 |
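
Geocoding sends one network request per row, so it can help to skip repeated names and to pause between calls. A minimal sketch of such a helper, assuming any callable that takes a query string and returns an object with `latitude` and `longitude` attributes (or `None`); the name `geocode_neighborhoods` and the stand-in geocoder are hypothetical and only used for illustration:

```python
import time
from collections import namedtuple

def geocode_neighborhoods(names, geocode, suffix=", Toronto, Ontario, Canada",
                          delay_seconds=1.0):
    """Return {name: (lat, lon) or None}, calling `geocode` once per unique name."""
    results = {}
    for name in names:
        if name in results:          # reuse earlier lookups for repeated names
            continue
        location = geocode(name.strip() + suffix)
        results[name] = (
            (location.latitude, location.longitude) if location is not None else None
        )
        time.sleep(delay_seconds)    # pause between requests
    return results

# Stand-in geocoder for demonstration; a real run would pass geolocator.geocode.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
known = {"Harbourfront, Toronto, Ontario, Canada": FakeLocation(43.64008, -79.38015)}

coords = geocode_neighborhoods(
    ["Harbourfront", "Nowhere", "Harbourfront"], known.get, delay_seconds=0
)
print(coords)
```

Names that cannot be resolved map to `None`, which makes it easy to list the neighborhoods that still need coordinates.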
