# Applied Data Analytics - Battle of Neighbourhoods Project

__Table of contents:__
1. Introduction
2. Data
3. Methodology
4. Results
5. Discussion
6. Conclusion

__1. Introduction__

'Fitpro' is a US-based fitness gym chain with over 100 gyms in the US. 'Fitpro' is looking to expand its business to new, emerging locations where the gym industry is not yet fully established. <br>
The main market they are currently interested in is Canada, specifically Toronto. They want to expand their chain by one new facility in the Toronto area. <br>
Given the growing interest in health and sport in recent years, they are aware that the industry is highly competitive. As such, their first Canada-based gym should be located in an area with little competition. They have approached our data analytics team to help them find the best neighbourhood in Toronto in which to establish a new 'Fitpro' gym. <br>
Per the above, the business problem we aim to address in the analysis below is as follows: "What is the best neighbourhood in Toronto to open a new gym, taking into consideration the least number of other gyms in the area?"

__2. Data__

For the purpose of our analysis, we obtained the data of all Toronto neighbourhoods using the most common and easily accessible source, i.e. Wikipedia. We formatted and cleaned the data for our purposes, as shown below.

Let's obtain data for all Canadian postal codes starting with the letter 'M' (which include the Toronto neighbourhoods).

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import urllib.request
import numpy as np

%config IPCompleter.greedy = True

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
table_from_top = 1
wikipedia_page = 'List_of_postal_codes_of_Canada'
trace = False

In [3]:
wikipedia_url = 'https://en.wikipedia.org/wiki/{}:_M'.format(wikipedia_page)
page = requests.get(wikipedia_url)
soup = BeautifulSoup(page.content, 'lxml')
tables = soup.find_all('table', {'class': 'wikitable'})
table = tables[table_from_top - 1]

In [4]:
feature_names = []

header_row = table.find('tr')
for header in header_row.find_all('th'):
    feature_name = ' '.join(header.find_all(text=True))
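    # str.replace returns a new string; the result is not assigned here, so the
    # trailing '\n' stays in the header names and is removed in a later cell
    feature_name.replace('\n', '')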
    feature_names.append(feature_name)

'Postal Code'

'Borough'

'Neighbourhood'

In [5]:
def has_coords(tag):
    if tag.has_attr('class'):
        if tag['class'][0] == 'latitude' or tag['class'][0] == 'longitude':
            return True
    return False

def get_coords(child):
    coords = []
    for coord in child.find_all(has_coords):
        coords.append(coord.string)
    if coords:
        if trace:
            return 'C = {}'.format(' '.join(coords))
        else:
            return ' '.join(coords)
    else:
        return ''

samples = []
sample_rows = table.find_all('tr')[1:]
for sample_row in sample_rows:
    features = []
    for feature_col in sample_row.find_all('td'):
        feature_value = ''
        text = feature_col.string
        if text:
            if trace:
                features.append('T = {}'.format(text))
            else:
                features.append(text)
            continue
        
        for child in feature_col.children:
            if child.name == 'span':
                if child.has_attr('class'):
                    if child['class'] == 'display:none':
                        continue
                if child.find_all(has_coords):
                    feature_value = get_coords(child)
                    if feature_value:
                        break
                    else:
                        continue
            if child.name == 'sup':
                continue
            if child.name == 'a':
                if child.string[0] == '[':
                    continue            
            if child.name == 'a':
                if trace:
                    feature_value = 'A = {}'.format(child.string)
                else:
                    feature_value = child.string
                break
            if child.name == 'font':
                if trace:
                    feature_value = 'F = {}'.format(child.string)
                else:
                    feature_value = child.string
                break
            try:
                # feature_value = '' for any tags not covered above
                content = child.contents
            except AttributeError:
                # Handle whitespace between child tags, treated as a child string
                if child.isspace():
                    continue
                if trace:
                    feature_value = 'E = {}'.format(child)
                else:
                    feature_value = child
                break
        features.append(feature_value)
    samples.append(dict(zip(feature_names, features)))

In [6]:
canada = pd.DataFrame(samples)
canada.head()

Unnamed: 0,Postal Code\n,Borough\n,Neighbourhood\n
0,M1A\n,Not assigned\n,Not assigned\n
1,M2A\n,Not assigned\n,Not assigned\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"


We will format the table so we can perform further analysis.

In [7]:
canada.rename(columns={'Postal Code\n': 'Postal Code', 'Borough\n': 'Borough', 'Neighbourhood\n': 'Neighbourhood'}, inplace=True)
canada.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A\n,Not assigned\n,Not assigned\n
1,M2A\n,Not assigned\n,Not assigned\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"


In [8]:
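# Match the newline character literally so the result does not depend on the pandas default for `regex`
canada['Postal Code'] = canada['Postal Code'].str.replace('\n', '', regex=False)
canada['Borough'] = canada['Borough'].str.replace('\n', '', regex=False)
canada['Neighbourhood'] = canada['Neighbourhood'].str.replace('\n', '', regex=False)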
canada.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
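
The clean-up in the cells above can also be expressed in a single pass. The following is a minimal sketch on toy rows that only mimic the scraped table (the data and the names `raw` and `clean` are illustrative, not part of the original notebook); it strips surrounding whitespace from both the column labels and the cell values.

```python
import pandas as pd

# Toy rows mimicking the scraped table, trailing newlines included (illustrative data only)
raw = pd.DataFrame({
    'Postal Code\n': ['M1A\n', 'M3A\n'],
    'Borough\n': ['Not assigned\n', 'North York\n'],
    'Neighbourhood\n': ['Not assigned\n', 'Parkwoods\n'],
})

# Strip surrounding whitespace from the column labels ...
clean = raw.rename(columns=str.strip)
# ... and from every string cell, one column at a time
for col in clean.columns:
    clean[col] = clean[col].str.strip()

print(clean.columns.tolist())
print(clean.iloc[1].tolist())
```

Compared with removing only `'\n'`, `str.strip` would also drop stray spaces or tabs, which may or may not be desired depending on the source table.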


Now we will keep only the Toronto neighbourhoods by filtering for rows whose 'Borough' column contains 'Toronto'.

In [9]:
toronto = canada[canada['Borough'].str.contains('Toronto')]
toronto=toronto.reset_index(drop=True)
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M5A,Downtown Toronto,"Regent Park, Harbourfront"
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
2,M5B,Downtown Toronto,"Garden District, Ryerson"
3,M5C,Downtown Toronto,St. James Town
4,M4E,East Toronto,The Beaches
5,M5E,Downtown Toronto,Berczy Park
6,M5G,Downtown Toronto,Central Bay Street
7,M6G,Downtown Toronto,Christie
8,M5H,Downtown Toronto,"Richmond, Adelaide, King"
9,M6H,West Toronto,"Dufferin, Dovercourt Village"


In [10]:
toronto.shape

(39, 3)

In [11]:
test=toronto.groupby('Borough').count()
test

Unnamed: 0_level_0,Postal Code,Neighbourhood
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1
Central Toronto,9,9
Downtown Toronto,19,19
East Toronto,5,5
West Toronto,6,6


Now that we have obtained the list of potential neighbourhoods in Toronto, let's add their coordinates using the geocoder package.

In [12]:
!pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [13]:
import geocoder

In [14]:
Postal_Code = toronto['Postal Code']

In [15]:
latitude=[]
longitude=[]
for code in toronto['Postal Code']:
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(code))
    print(code, g.latlng)
    while (g.latlng is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(code))
        print(code, g.latlng)
    latlng = g.latlng
    latitude.append(latlng[0])
    longitude.append(latlng[1])

M5A [43.65512000000007, -79.36263999999994]
M7A [43.66253000000006, -79.39187999999996]
M5B [43.65739000000008, -79.37803999999994]
M5C [43.65215000000006, -79.37586999999996]
M4E [43.67709000000008, -79.29546999999997]
M5E [43.64536000000004, -79.37305999999995]
M5G [43.65609000000006, -79.38492999999994]
M6G [43.668690000000026, -79.42070999999999]
M5H [43.64970000000005, -79.38257999999996]
M6H [43.665050000000065, -79.43890999999996]
M5J [43.64285000000007, -79.38075999999995]
M6J [43.64848000000006, -79.41773999999998]
M4K [43.68375000000003, -79.35511999999994]
M5K [43.64710000000008, -79.38152999999994]
M6K [43.639410000000055, -79.42675999999994]
M4L [43.667970000000025, -79.31466999999998]
M5L [43.64840000000004, -79.37913999999995]
M4M [43.66213000000005, -79.33496999999994]
M4N [43.72843000000006, -79.38712999999996]
M5N [43.71208000000007, -79.41847999999999]
M4P [43.71276000000006, -79.38850999999994]
M5P [43.69479000000007, -79.41439999999994]
M6P [43.659730000000025, -79

In [17]:
toronto['Latitude'] = latitude
toronto['Longitude'] = longitude
toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
4,M4E,East Toronto,The Beaches,43.67709,-79.29547


Let's visualize the data. For that purpose, we will use the folium and json packages.

In [18]:
!pip install folium

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [26]:
import folium
import json
from geopy.geocoders import Nominatim 

In [27]:
address = 'Toronto'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [28]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, borough, neighborhood in zip(toronto['Latitude'], toronto['Longitude'], toronto['Borough'], toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto


Now, let's use the Foursquare API to obtain the names and locations of the venues near each Toronto neighbourhood.

In [29]:
import os

# Credentials are read from environment variables instead of being hard-coded;
# the variable names below are placeholders, set them to match your own setup
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


We will reuse the previously created function below to obtain the venues located near each Toronto neighbourhood.

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
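
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response with no groups, or a venue with no category, would raise an exception. A more defensive way to flatten the payload could look like the following sketch. The `sample_payload` is hypothetical: it only reproduces the nesting that the function relies on, and `flatten_items` is an illustrative helper, not part of any Foursquare client.

```python
# Hypothetical payload with the same nesting the function above indexes into:
# response -> groups[0] -> items -> venue -> name / location / categories
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Gym',
                   'location': {'lat': 43.65, 'lng': -79.38},
                   'categories': [{'name': 'Gym'}]}},
        {'venue': {'name': 'Uncategorised Spot',
                   'location': {'lat': 43.66, 'lng': -79.39},
                   'categories': []}},
    ]}]}
}

def flatten_items(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], venue['location']['lat'],
                     venue['location']['lng'], category))
    return rows

rows = flatten_items(sample_payload)
print(rows)
```

Rows with a `None` category are then simply skipped by the later `str.contains(..., na=False)` filter.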

In [31]:
toronto_venues = getNearbyVenues(names=toronto['Neighbourhood'],
                                   latitudes=toronto['Latitude'],
                                   longitudes=toronto['Longitude']
                                  )


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


After obtaining the list of venues in Toronto, we will filter the results to include only places relevant to our business problem. Since the investor is interested in opening a new location for their gym chain, we will include venues that contain __'Gym'__ in the 'Venue Category' column.<br>
Additionally, we will include places that might affect attendance at the gym, i.e. places considered natural competitors of gyms: any venues whose category contains __'Park', 'Sport', 'Pool', 'Stadium', 'Studio'__.

In [32]:
pd.set_option('display.max_rows', None)
categories=toronto_venues.groupby(['Venue Category']).count()
categories

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Accessories Store,1,1,1,1,1,1
Adult Boutique,2,2,2,2,2,2
American Restaurant,25,25,25,25,25,25
Antique Shop,2,2,2,2,2,2
Aquarium,2,2,2,2,2,2
Art Gallery,13,13,13,13,13,13
Art Museum,1,1,1,1,1,1
Arts & Crafts Store,5,5,5,5,5,5
Asian Restaurant,19,19,19,19,19,19
Athletics & Sports,2,2,2,2,2,2


In [33]:
pd.set_option('display.max_rows', 10)

In [34]:
toronto_venues_gym=toronto_venues[toronto_venues['Venue Category'].str.contains('Pool|Gym|Park|Sport|Stadium|Studio', na=False)]
toronto_venues_gym=toronto_venues_gym.reset_index(drop=True)
toronto_venues_gym.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65512,-79.36264,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
1,"Regent Park, Harbourfront",43.65512,-79.36264,The Extension Room,43.653313,-79.359725,Gym / Fitness Center
2,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,Queen's Park,43.663946,-79.39218,Park
3,"Garden District, Ryerson",43.65739,-79.37804,306 Yonge Street - Jordan Store,43.656495,-79.381015,Sporting Goods Shop
4,"Garden District, Ryerson",43.65739,-79.37804,Hard Candy Fitness,43.659556,-79.38244,Gym / Fitness Center
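
Because `str.contains` performs substring matching, the keyword filter also picks up categories that merely contain one of the keywords, such as the 'Sporting Goods Shop' row above, which is a retail store rather than a sport facility. The sketch below, on illustrative toy categories (not taken from the Foursquare results), contrasts the substring filter with a stricter whole-word variant; the whole-word pattern is an assumption about what a tighter filter could look like, not the method used in this analysis.

```python
import pandas as pd

# Toy categories for illustration only
venues = pd.DataFrame({
    'Venue': ['A', 'B', 'C', 'D', 'E'],
    'Venue Category': ['Gym / Fitness Center', 'Park', 'Parking',
                       'Sporting Goods Shop', 'Coffee Shop'],
})

keywords = ['Pool', 'Gym', 'Park', 'Sport', 'Stadium', 'Studio']
pattern = '|'.join(keywords)

# Substring match, as in the cell above
matched = venues[venues['Venue Category'].str.contains(pattern, na=False)]

# Whole-word match: \b requires a word boundary on both sides of the keyword
strict_pattern = r'\b(?:' + pattern + r')\b'
strict = venues[venues['Venue Category'].str.contains(strict_pattern, na=False)]

print(matched['Venue Category'].tolist())
print(strict['Venue Category'].tolist())
```

Whether the looser or the stricter filter is preferable depends on which venues the stakeholders regard as competition.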


Let's visualize the data.

In [36]:
map_toronto_venues_gym = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, neighbourhood in zip(toronto_venues_gym['Venue Latitude'], toronto_venues_gym['Venue Longitude'], toronto_venues_gym['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_venues_gym)  
    
map_toronto_venues_gym


A quick glance at the map suggests that some places are potentially better suited for a new gym than others. We will perform a formal data analysis to confirm this. Let's first reduce our table to the columns we will use in the further analysis.

In [37]:
toronto_gym=toronto_venues_gym[['Neighbourhood', 'Venue', 'Venue Category']]
toronto_gym

Unnamed: 0,Neighbourhood,Venue,Venue Category
0,"Regent Park, Harbourfront",The Yoga Lounge,Yoga Studio
1,"Regent Park, Harbourfront",The Extension Room,Gym / Fitness Center
2,"Queen's Park, Ontario Provincial Government",Queen's Park,Park
3,"Garden District, Ryerson",306 Yonge Street - Jordan Store,Sporting Goods Shop
4,"Garden District, Ryerson",Hard Candy Fitness,Gym / Fitness Center
...,...,...,...
106,"Business reply mail Processing Centre, South C...",Cardio-Go,Gym
107,"Business reply mail Processing Centre, South C...",Equinox Bay Street,Gym
108,"Business reply mail Processing Centre, South C...",The Cambridge Club,Gym
109,"Business reply mail Processing Centre, South C...",Osgoode Hall Park,Park


__3. Methodology__

In this project we will direct our efforts at detecting neighbourhoods in Toronto that have a low sport area presence, particularly those with a low number of existing gyms.

In the first step we collected the required data: the location and type (category) of venues in Toronto. We also identified whether a particular venue is a gym or some other sport area.

The second step in our analysis will be detecting which neighbourhoods are least populated with sport areas as a whole, and focusing our attention on the areas with the lowest number.

In the third and final step we will identify whether each sport area is a gym or some other sport location, to narrow down the number of promising locations presented to our stakeholders.

__4. Results__

Let's first review a general overview of the data we gathered.

In [38]:
toronto_gym

Unnamed: 0,Neighbourhood,Venue,Venue Category
0,"Regent Park, Harbourfront",The Yoga Lounge,Yoga Studio
1,"Regent Park, Harbourfront",The Extension Room,Gym / Fitness Center
2,"Queen's Park, Ontario Provincial Government",Queen's Park,Park
3,"Garden District, Ryerson",306 Yonge Street - Jordan Store,Sporting Goods Shop
4,"Garden District, Ryerson",Hard Candy Fitness,Gym / Fitness Center
...,...,...,...
106,"Business reply mail Processing Centre, South C...",Cardio-Go,Gym
107,"Business reply mail Processing Centre, South C...",Equinox Bay Street,Gym
108,"Business reply mail Processing Centre, South C...",The Cambridge Club,Gym
109,"Business reply mail Processing Centre, South C...",Osgoode Hall Park,Park


In [39]:
print('Total number of Neighbourhoods with sport areas in Toronto:', toronto_gym['Neighbourhood'].nunique())
print('Total number of sport areas in Toronto:', len(toronto_gym['Venue Category']))
print('Average number of sport areas in neighborhood:', len(toronto_gym['Venue Category'])/toronto_gym['Neighbourhood'].nunique())

Total number of Neighbourhoods with sport areas in Toronto: 35
Total number of sport areas in Toronto: 111
Average number of sport areas in neighborhood: 3.1714285714285713


In [40]:
print('List of all sport areas')
print('-----------------------')
for r in toronto_gym['Venue']:
    print(r)
print('...')

List of all sport areas
-----------------------
The Yoga Lounge
The Extension Room
Queen's Park
306 Yonge Street - Jordan Store
Hard Candy Fitness
GoodLife Fitness Toronto 137 Yonge Street
Berczy Park
Wynn Fitness
Glen Stewart Park
Berczy Park
Bikram Yoga Centre
Spirit of Hockey
College Park Area
Hard Candy Fitness
Queens Club
Adelaide Club Toronto
Equinox Bay Street
The Cambridge Club
GoodLife Fitness Toronto 137 Yonge Street
Toronto Athletic Club
Dovercourt Park
Wallace Emerson Park
Wallace Emerson Community Centre
Wallace Emerson Gym
Batl Backyard Axe Throwing League
Real Sports Apparel
Scotiabank Arena
Roundhouse Park
Harbour Square Park
RS - Real Sports
YogaSpace
Trinity Bellwoods Park
Charles Sauriol Parkette
Equinox Bay Street
Adelaide Club Toronto
Scotiabank Arena
Real Sports Apparel
Spirit of Hockey
Reebok Crossfit Liberty Village
Lamport Stadium
Gyan yoga
Joe Rockhead's Climbing Gym
Pia Bouman School
Lisgar Park
System Fitness
Woodbine Park
Measurement Park
Equinox Bay Street

To decide which neighbourhood is best suited for a new sport business, specifically a gym, let's see which areas are least populated with sport venues.

In [41]:
pd.set_option('display.max_rows', None)
gym_count2=toronto_gym.groupby('Neighbourhood').count()
gym_count1=gym_count2[['Venue']]
gym_count=gym_count1.sort_values(by=['Venue'])
gym_count

Unnamed: 0_level_0,Venue
Neighbourhood,Unnamed: 1_level_1
"Kensington Market, Chinatown, Grange Park",1
"High Park, The Junction South",1
"Queen's Park, Ontario Provincial Government",1
"Forest Hill North & West, Forest Hill Road Park",1
Christie,1
"Moore Park, Summerhill East",1
The Beaches,1
"The Danforth West, Riverdale",1
"St. James Town, Cabbagetown",2
Davisville,2
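
One caveat of counting on the filtered table is that a neighbourhood with no matching venue never appears in it, so the count cannot report zero. The figures above (39 rows in `toronto`, but 35 neighbourhoods with sport areas) suggest that this may apply to a few neighbourhoods here. A minimal sketch of a count that keeps such neighbourhoods, on hypothetical toy data, could be:

```python
import pandas as pd

# Hypothetical neighbourhood names and venues, for illustration only
all_neighbourhoods = ['Alpha', 'Beta', 'Gamma', 'Delta']
sport_venues = pd.DataFrame({
    'Neighbourhood': ['Alpha', 'Alpha', 'Beta', 'Gamma', 'Gamma', 'Gamma'],
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'],
})

# Count venues per neighbourhood, then add the neighbourhoods that have none
counts = (sport_venues.groupby('Neighbourhood')['Venue'].count()
          .reindex(all_neighbourhoods, fill_value=0)
          .sort_values())

print(counts.to_dict())
```

In the notebook, `all_neighbourhoods` could be taken from `toronto['Neighbourhood']`; neighbourhoods with a count of zero would then be natural candidates to examine alongside those with a single sport area.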



We can see that there are several neighbourhoods with only 1 sport area. Let's restrict the analysis to those.

In [47]:
pd.set_option('display.max_rows', 10)

In [48]:
gym_count_small = gym_count.loc[gym_count['Venue'] == 1]
gym_count_small

Unnamed: 0_level_0,Venue
Neighbourhood,Unnamed: 1_level_1
"Kensington Market, Chinatown, Grange Park",1
"High Park, The Junction South",1
"Queen's Park, Ontario Provincial Government",1
"Forest Hill North & West, Forest Hill Road Park",1
Christie,1
"Moore Park, Summerhill East",1
The Beaches,1
"The Danforth West, Riverdale",1


Now that we have our candidate locations, we will go back to our overall data and check whether those venues are gyms or some other sport areas. <br>
The assumption is that nearby gyms are potentially greater competitors than an outdoor park or a yoga studio.

In [49]:
toronto_temp=toronto_gym[toronto_gym['Neighbourhood'].str.match('Kensington Market, Chinatown, Grange Park|High Park, The Junction South|Queen\'s Park, Ontario Provincial Government|Forest Hill North & West, Forest Hill Road Park|Christie|Moore Park, Summerhill East|The Beaches|The Danforth West, Riverdale', na=False)]
toronto_temp

Unnamed: 0,Neighbourhood,Venue,Venue Category
2,"Queen's Park, Ontario Provincial Government",Queen's Park,Park
8,The Beaches,Glen Stewart Park,Park
14,Christie,Queens Club,Athletics & Sports
32,"The Danforth West, Riverdale",Charles Sauriol Parkette,Park
59,"Forest Hill North & West, Forest Hill Road Park",Suydam Park,Park
60,"High Park, The Junction South",Lithuania Park,Park
76,"Moore Park, Summerhill East",Pure Fitness,Gym
77,"Kensington Market, Chinatown, Grange Park",Grange Park,Park


Finally, let's eliminate neighbourhoods with indoor sport venues.

In [50]:
toronto_final=toronto_temp[toronto_temp['Venue Category'].str.contains('Park')]
toronto_final

Unnamed: 0,Neighbourhood,Venue,Venue Category
2,"Queen's Park, Ontario Provincial Government",Queen's Park,Park
8,The Beaches,Glen Stewart Park,Park
32,"The Danforth West, Riverdale",Charles Sauriol Parkette,Park
59,"Forest Hill North & West, Forest Hill Road Park",Suydam Park,Park
60,"High Park, The Junction South",Lithuania Park,Park
77,"Kensington Market, Chinatown, Grange Park",Grange Park,Park
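The substring filter on `'Park'` works for this data because every remaining outdoor venue is categorised as `Park`. As an alternative sketch, the indoor categories could be listed explicitly and any neighbourhood containing one of them dropped. The rows below are copied from the candidate table above; `INDOOR_CATEGORIES` is an assumed, illustrative list and would need to be adapted to the categories actually present in the data.

```python
import pandas as pd

# Candidate rows copied from the table above.
candidates = pd.DataFrame({
    'Neighbourhood': [
        "Queen's Park, Ontario Provincial Government",
        'The Beaches',
        'Christie',
        'The Danforth West, Riverdale',
        'Forest Hill North & West, Forest Hill Road Park',
        'High Park, The Junction South',
        'Moore Park, Summerhill East',
        'Kensington Market, Chinatown, Grange Park',
    ],
    'Venue Category': ['Park', 'Park', 'Athletics & Sports', 'Park',
                       'Park', 'Park', 'Gym', 'Park'],
})

# Assumption: these categories count as indoor competition.
INDOOR_CATEGORIES = {'Gym', 'Athletics & Sports'}

# Drop every neighbourhood that has at least one indoor venue.
has_indoor = candidates['Venue Category'].isin(INDOOR_CATEGORIES)
blocked = set(candidates.loc[has_indoor, 'Neighbourhood'])
remaining = candidates[~candidates['Neighbourhood'].isin(blocked)]

print(remaining['Neighbourhood'].nunique())
```

Listing the excluded categories makes the competition assumption explicit and avoids keeping a venue only because its category name happens to contain the word "Park".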


In [52]:
print('The number of neighbourhoods suited for new gym:', toronto_final['Neighbourhood'].nunique())

The number of neighbourhoods suited for new gym: 6


Let's visualize potential locations.

In [53]:
# Reuse the neighbourhoods kept in toronto_final instead of repeating the names
toronto_final_for_map = toronto_venues_gym[toronto_venues_gym['Neighbourhood'].isin(toronto_final['Neighbourhood'])]
toronto_final_for_map

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,Queen's Park,43.663946,-79.39218,Park
8,The Beaches,43.67709,-79.29547,Glen Stewart Park,43.675278,-79.294647,Park
32,"The Danforth West, Riverdale",43.68375,-79.35512,Charles Sauriol Parkette,43.68527,-79.356588,Park
59,"Forest Hill North & West, Forest Hill Road Park",43.69479,-79.4144,Suydam Park,43.69042,-79.4139,Park
60,"High Park, The Junction South",43.65973,-79.46281,Lithuania Park,43.658667,-79.463038,Park
77,"Kensington Market, Chinatown, Grange Park",43.65351,-79.39722,Grange Park,43.652488,-79.392053,Park


In [54]:
map_toronto_final = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, neighbourhood in zip(toronto_final_for_map['Venue Latitude'], toronto_final_for_map['Venue Longitude'], toronto_final_for_map['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto_final)
    
map_toronto_final


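The map above is centred on the `latitude` and `longitude` values defined earlier in the notebook. As an alternative sketch, the centre and bounding box could be derived from the candidate venues themselves. The coordinates below are copied from the table above; the variable names are illustrative assumptions.

```python
import pandas as pd

# Venue coordinates copied from the final candidate table.
venues = pd.DataFrame({
    'Venue': ["Queen's Park", 'Glen Stewart Park', 'Charles Sauriol Parkette',
              'Suydam Park', 'Lithuania Park', 'Grange Park'],
    'Venue Latitude': [43.663946, 43.675278, 43.68527,
                       43.69042, 43.658667, 43.652488],
    'Venue Longitude': [-79.39218, -79.294647, -79.356588,
                        -79.4139, -79.463038, -79.392053],
})

lat = venues['Venue Latitude']
lng = venues['Venue Longitude']

# Map centre: the mean position of the candidate venues.
centre = [float(lat.mean()), float(lng.mean())]

# Bounding box corners that enclose all candidate venues.
south_west = [float(lat.min()), float(lng.min())]
north_east = [float(lat.max()), float(lng.max())]

print(centre, south_west, north_east)
```

The resulting `centre` could be passed as `location` to `folium.Map`, and the two corner points to the map's `fit_bounds` method, so that all six candidates are visible without manual zooming.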
__5. Discussion__

Our analysis shows that, although Toronto has a large number of sport venues, some neighbourhoods still have no gym facilities. The highest concentration of gyms was found in the south of the city, so we focused our attention on other areas.

Within this narrower area of interest, we first identified the neighbourhoods with the fewest sport venues.

We then sorted these candidates and kept only the neighbourhoods with exactly 1 sport venue. We categorized those venues and excluded the neighbourhoods that already have an indoor sport facility, leaving only locations where residents currently have access to outdoor activities alone.

The result is 6 neighbourhoods that are candidate locations for a new gym, based on the lowest number of nearby competitors. Please note that this analysis only provides high-level information on potential competitors in Toronto's neighbourhoods, and further analysis might be required. However, we can suggest with some level of confidence that the proposed neighbourhoods will minimize the threat of competing with other gym companies for the market.

__6. Conclusion__


The purpose of this project was to identify Toronto neighbourhoods with a low number of gyms, in order to help stakeholders narrow down the search for the optimal location for a new gym. By counting the existing gyms and other sport areas, we first narrowed the search to 8 potential locations. Then, after excluding areas with existing gyms or other indoor sport venues, we proposed 6 potential neighbourhoods in Toronto for establishing a new gym.
The final decision on the optimal gym location will be made by stakeholders based on the specific characteristics of the neighbourhoods, taking additional factors into consideration.