# Capstone Project: The Battle of the Neighborhoods

### Written by: Tim Kreutzfeldt

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction: Business Problem

On January 1, 2021, OpenBox Athletics suffered a fire which destroyed large portions of the building and a majority of its gym equipment. As one of the only gyms in Philadelphia which offered structured outdoor workouts throughout the COVID-19 pandemic, the loss of the gym came as a great shock to the Point Breeze neighborhood in which OpenBox Athletics is located. The gym's ownership is now actively looking for a new gym location.

In this project, we will explore the optimal location to open a gym in Philadelphia. Special attention will be given to locations which are not already saturated with gyms. Of additional interest to OpenBox Athletics will be selecting a location which is close to its original location so that the gym can maintain its clientele in the Point Breeze neighborhood.

We will explore this topic in detail using the data science methodology and tools which have been laid out in prior courses in the IBM Data Science Professional Certificate. Specifically, we will attempt to find the optimal location (in terms of Philadelphia city blocks) to be the home of the new OpenBox Athletics.

## Data

The main factors that will influence the choice of the new gym location are the following:

* Proximity to old location of OpenBox Athletics (1931 Washington Ave, Philadelphia, PA 19146)
* Distance from the next closest gym

To quantify these factors, the Foursquare API will be used to obtain geographical coordinates and other relevant information about nearby gyms and venues. In addition, a shapefile (.shp) of Philadelphia zip code boundaries will be loaded to restrict a spatial grid to the 19146 zip code in which OpenBox Athletics is located. This grid will have a resolution roughly matching the size of a Philadelphia city block.

### Find candidate city blocks

The first step in this process will be to identify candidate locations near the old site of OpenBox Athletics. We will do so using the grid method laid out in the example Coursera projects. The following code block, taken from the example Python project notebook, defines functions to convert between longitude/latitude and Cartesian coordinates and to compute distances in the Cartesian plane.

In [1]:
import pyproj

import math


def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
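
The UTM zone passed to `pyproj.Proj` affects how well planar distances approximate distances on the ground. As a quick sanity check, the standard 6-degree zone numbering can be computed from a longitude. This is a minimal sketch: the helper name `utm_zone_for_longitude` is an assumption introduced for illustration, and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone_for_longitude(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide, starting with zone 1 at 180 degrees west.
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

print(utm_zone_for_longitude(-75.175))  # OpenBox Athletics longitude (rounded)
print(utm_zone_for_longitude(13.4))     # a longitude in Berlin, for comparison
```

For the two inputs shown, the helper gives zone 18 for Philadelphia and zone 33 for Berlin. The `zone=33` value in the functions above therefore appears to be carried over from the example notebook; the results in this report were produced with that setting.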

In [3]:
import numpy as np
import pandas as pd
import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Let's start with a map of Philadelphia centered on OpenBox Athletics.

In [71]:
address = '1931 Washington Ave, Philadelphia, PA 19146'

geolocator = Nominatim(user_agent="Kreutzfeldt-Coursera-Capstone")
location = geolocator.geocode(address)
oba_latitude = location.latitude
oba_longitude = location.longitude
oba_center = [oba_latitude,oba_longitude]
print('The geographical coordinates of OBA in Philadelphia are {}, {}.'.format(oba_latitude, oba_longitude))

The geographical coordinates of OBA in Philadelphia are 39.93897444897959, -75.17527151020408.


Next, let's superimpose a grid of circles onto the map with hypothetical new gym locations (again using the example Python notebook functions). Relative to the example notebook, we will use a reduced size grid to maintain the location of the Point Breeze neighborhood and smaller circles to allow for a more refined assessment of the optimal location.
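
To make the hexagonal layout concrete, here is a minimal pure-Python sketch of such a grid in planar coordinates. It is an illustration rather than the notebook's code: the function name `hex_grid` and the radius and step values are assumptions chosen for the example. Rows are spaced by `step * sqrt(3) / 2` and every other row is shifted by half a step, so each point sits the same distance from all of its nearest neighbors.

```python
import math

def hex_grid(center_x, center_y, radius, step):
    """Return (x, y) points of a hexagonal grid within `radius` of the center."""
    k = math.sqrt(3) / 2          # row spacing factor for a hexagonal layout
    row_height = step * k
    n_rows = int(radius / row_height)
    n_cols = int(radius / step)
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        # Shift every other row by half a step so the rows interlock.
        x_offset = step / 2 if i % 2 else 0.0
        for j in range(-n_cols - 1, n_cols + 1):
            x = center_x + j * step + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0, radius=1000.0, step=200.0)
print(len(points), 'candidate locations generated.')
```

Working in meters in a projected coordinate system, as the notebook does, is what allows the spacing to be expressed as a fixed step.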

In [49]:
oba_center_x, oba_center_y = lonlat_to_xy(oba_longitude, oba_latitude) # OpenBox Athletics location in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = oba_center_x - 4000
x_step = 200
y_min = oba_center_y - 4000 - (int(41/k)*k*200 - 8000)/2
y_step = 200 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(41/k)):
    y = y_min + i * y_step
    x_offset = 100 if i%2==0 else 0
    for j in range(0, 41):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(oba_center_x, oba_center_y, x, y)
        if (distance_from_center <= 4001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate locations generated.')

1453 candidate locations generated.

The zip code for the Point Breeze neighborhood in Philadelphia is 19146. We will confine our analysis of the optimal gym location to only those that exist in the 19146 zip code. However, to do so, we need to know exactly which points in the grid we just created exist inside of the 19146 boundary.

Fortunately, the geographical boundaries of Philadelphia zip codes are freely available online. We can begin by loading this file into a dataframe and filtering by our zip code of choice.

In [13]:
# load the zip code boundary shapefile into a GeoDataFrame
# !pip install geopandas
import geopandas
from geopandas import GeoDataFrame, GeoSeries
zipCodes = GeoDataFrame.from_file('Zipcodes_Poly.shp')
zipCodes.head()

| | OBJECTID | CODE | COD | Shape__Are | Shape__Len | geometry |
|---|---|---|---|---|---|---|
| 0 | 1 | 19120 | 20 | 91779700.0 | 49921.544063 | POLYGON ((-75.11107 40.04682, -75.10943 40.045... |
| 1 | 2 | 19121 | 21 | 69598790.0 | 39534.887217 | POLYGON ((-75.19227 39.99463, -75.19205 39.994... |
| 2 | 3 | 19122 | 22 | 35916320.0 | 24124.645221 | POLYGON ((-75.15406 39.98601, -75.15328 39.985... |
| 3 | 4 | 19123 | 23 | 35851750.0 | 26421.728982 | POLYGON ((-75.15190 39.97056, -75.15150 39.970... |
| 4 | 5 | 19124 | 24 | 144808000.0 | 63658.77042 | POLYGON ((-75.09660 40.04249, -75.09281 40.039... |


In [70]:
zipCodes = zipCodes[zipCodes['CODE'].str.contains("19146")]

Next, find the grid points generated earlier that lie within the boundary of 19146.

In [68]:
from shapely.geometry import Point, Polygon

lng19146 = []
lat19146 = []
pts = GeoSeries([Point(x, y) for x, y in zip(longitudes, latitudes)])
in_map =  np.array([pts.within(geom) for geom in zipCodes.geometry]).sum(axis=0)
pts = GeoSeries([val for pos,val in enumerate(pts) if in_map[pos]])
for pos,val in enumerate(pts):
    lng19146.append(val.x)
    lat19146.append(val.y)
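
The `within` test used above is provided by shapely. For intuition about what such a containment test decides, here is a minimal pure-Python sketch of the classic ray-casting approach. It is an illustration under simplifying assumptions (a single polygon without holes and a made-up rectangular boundary), not a description of how shapely works internally; `point_in_polygon`, `boundary`, and `candidates` are hypothetical names.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +infinity cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular boundary in (longitude, latitude) order.
boundary = [(-75.20, 39.93), (-75.16, 39.93), (-75.16, 39.95), (-75.20, 39.95)]
candidates = [(-75.18, 39.94), (-75.10, 39.94)]
inside_points = [p for p in candidates if point_in_polygon(p[0], p[1], boundary)]
print(inside_points)
```

Each crossing of the ray with an edge flips the inside/outside state, so a point with an odd number of crossings is inside the polygon.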

Now let's plot the grid inside the 19146 zip code.

In [80]:
map_oba = folium.Map(location=oba_center, zoom_start=15)
folium.Marker(oba_center).add_to(map_oba)
for lat, lon in zip(lat19146, lng19146):
    folium.Circle([lat, lon], radius=64, color='blue', fill=False).add_to(map_oba)
map_oba

### Foursquare API

Foursquare credentials (replace the placeholders with your own values and keep real credentials out of public repositories)

In [74]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
GCODE = 'YOUR_GCODE'
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
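
One way to keep credentials out of the notebook entirely is to read them from environment variables at run time. The sketch below is a minimal example of that pattern, not part of the original analysis; the function name and the environment variable names are assumptions chosen for illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables."""
    # The variable names below are assumptions chosen for this sketch.
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET', 'FOURSQUARE_ACCESS_TOKEN')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: env[name] for name in names}

# Example with a stand-in mapping instead of the real environment.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
    'FOURSQUARE_ACCESS_TOKEN': 'demo-token',
}
credentials = load_foursquare_credentials(demo_env)
print(sorted(credentials))
```

Failing early with a clear message when a variable is missing avoids sending a request with empty credentials.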

Now let's get all of the venues near the original OpenBox Athletics containing the words "Gym" or "Fitness".

In [76]:
search_query = 'Gym Fitness'
radius = 1000
print(search_query + ' .... OK!')
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, oba_latitude, oba_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# copy the selection so the column assignment below does not modify a view of `dataframe`
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# replace each row's category list with the name of its first category
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head(10)
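
The request URL above is assembled with string formatting, which leaves the encoding of values such as the space in the search query to the HTTP library. As an alternative, the query string can be built explicitly from a dictionary. This is a minimal sketch using only the standard library: the endpoint and parameter names are the ones used in the cell above, while the function name `build_search_url` and the placeholder credential values are assumptions.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, access_token, lat, lng,
                     query, radius, limit, version='20180605'):
    """Build a venue search URL with explicitly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'oauth_token': access_token,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials; the coordinates are rounded from the geocoding step.
url = build_search_url('ID', 'SECRET', 'TOKEN', 39.93897, -75.17527,
                       'Gym Fitness', 1000, 100)
print(url)
```

Keeping the parameters in a dictionary also makes it easier to log or inspect a request without exposing the credential values.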

Gym Fitness .... OK!


| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | City Fitness | Gym / Fitness Center | 2101 South St. | at 21st St | 39.945191 | -75.177352 | [{'label': 'display', 'lat': 39.9451908, 'lng'... | 714 | 19146.0 | US | Philadelphia | PA | United States | [2101 South St. (at 21st St), Philadelphia, PA... | | 4e3804eecc3f2228972873b0 |
| 1 | Naval Square Fitness Center | Gym | 800 Admirals Way | | 39.944479 | -75.184431 | [{'label': 'display', 'lat': 39.94447864473365... | 993 | 19146.0 | US | Philadelphia | PA | United States | [800 Admirals Way, Philadelphia, PA 19146] | | 4b2e516cf964a5203fde24e3 |
| 2 | Freehouse Fitness Studio | Gym / Fitness Center | 1430 South St | btwn 15th & Broad Sts | 39.943556 | -75.16665 | [{'label': 'display', 'lat': 39.94355613012257... | 895 | 19146.0 | US | Philadelphia | PA | United States | [1430 South St (btwn 15th & Broad Sts), Philad... | | 57658fef498e30d93719e606 |
| 3 | 12FIT Spa & Gym | Gym | 1100 S Broad St | | 39.937551 | -75.167835 | [{'label': 'display', 'lat': 39.93755119592002... | 654 | 19146.0 | US | Philadelphia | PA | United States | [1100 S Broad St, Philadelphia, PA 19146] | | 4bec68a875b2c9b67971438d |
| 4 | Zarett Rehab Fitness | Physical Therapist | 520 S 19th St | | 39.94505 | -75.17359 | [{'label': 'display', 'lat': 39.94505, 'lng': ... | 691 | 19146.0 | US | Philadelphia | PA | United States | [520 S 19th St, Philadelphia, PA 19146] | | 4d38beeca558a1cd4573cb43 |
| 5 | The Gym at 21st Street | Gym / Fitness Center | | | 39.950562 | -75.175203 | [{'label': 'display', 'lat': 39.950562, 'lng':... | 1289 | | US | Philadelphia | PA | United States | [Philadelphia, PA] | | 56c9c74a498e8a0c7460f3f7 |
| 6 | YMCA | Gym / Fitness Center | 1724 Christian St | | 39.940744 | -75.172143 | [{'label': 'display', 'lat': 39.94074449462739... | 331 | 19146.0 | US | Philadelphia | PA | United States | [1724 Christian St, Philadelphia, PA 19146] | | 4be175e3d816c928581cf0d9 |
| 7 | Daddis MMA and Fitness Academy | Gym / Fitness Center | 1719 Washington Ave | | 39.93877 | -75.172401 | [{'label': 'display', 'lat': 39.93877029418945... | 246 | 19146.0 | US | Philadelphia | PA | United States | [1719 Washington Ave, Philadelphia, PA 19146] | | 4e5e7585814d9e0233e0536a |
| 8 | The Newport Gym | Gym | 1530 Spruce St | | 39.947174 | -75.167923 | [{'label': 'display', 'lat': 39.94717407226562... | 1107 | 19102.0 | US | Philadelphia | PA | United States | [1530 Spruce St, Philadelphia, PA 19102] | | 4c621113fa7bc928a0770d27 |
| 9 | Daddis Womens Fitness Boot Camp | Gym / Fitness Center | | | 39.939198 | -75.17114 | [{'label': 'display', 'lat': 39.93919766020181... | 353 | 19146.0 | US | Philadelphia | PA | United States | [Philadelphia, PA 19146] | | 4f4ece4ae4b0b0c45d48c889 |


We will filter the results further to only look at rows which are actually classified as a gym and eliminate the columns which are not useful.

In [77]:
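# Note: the column selection on the second line is taken from dataframe_filtered,
# not from the filtered nearby_gyms, so all 27 venues are kept regardless of category.
nearby_gyms = dataframe_filtered[dataframe_filtered['categories'].str.contains("Gym")]
nearby_gyms = dataframe_filtered[['name','distance','lat','lng']]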
print(nearby_gyms.shape)

(27, 4)


And voila! We now have a dataframe with the 27 closest gyms in the area plus their latitudes and longitudes.

Let's see where these are located on the map we made earlier. 
* Candidate gym locations are denoted by the grid of blue circles.
* Rival gym locations are denoted by red circles.
* The old location of OpenBox Athletics is denoted by the light blue marker.

In [79]:
oba_center = [oba_latitude,oba_longitude]
map_oba = folium.Map(location=oba_center, zoom_start=15)
for lat, lon in zip(lat19146, lng19146):
    folium.Circle([lat, lon], radius=64, color='blue', fill=False).add_to(map_oba)
for lat, lon in zip(nearby_gyms["lat"], nearby_gyms["lng"]):
    folium.Circle([lat, lon], radius=25, color='yellow', fill=True, fill_color='red', fill_opacity=1).add_to(map_oba)
folium.Marker(oba_center).add_to(map_oba)
map_oba

## Methodology

As shown in the above map, the old location of OpenBox Athletics is quite close to several other gyms. What is the best method for OpenBox Athletics to choose its new gym location which will maximize the number of customers?

The answer to this question is not trivial. Assuming OpenBox Athletics intends to appeal only to residents of the 19146 zip code, one method for choosing a new location would be to maximize the number of grid points for which OpenBox Athletics is the *closest* gym. This assumes that:

1. Gym customers choose their gym based solely on the gym which is closest to them.
2. An equal number of 19146 residents live in each grid location.

These assumptions will make determining the optimal new gym location much easier.

To do this, we will begin by calculating the distance of each point in the grid to all 27 other gyms in the area. Then, we will loop through all points in the grid as a hypothetical new location for OpenBox Athletics and count the number of other grid points for which OpenBox Athletics would be the closest gym. Finally, we will determine the location which maximizes this number and plot our results on a heat map.
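
The procedure described above can be sketched on a toy example before applying it to the real grid. The code below is a minimal illustration in planar coordinates, assuming Python 3.8+ for `math.dist`; the function name `count_closest_blocks` and the toy grid with a single rival gym are assumptions made for the example, not the notebook's data.

```python
import math

def count_closest_blocks(grid, rivals):
    """For each grid point, count the grid points it would serve as closest gym."""
    # Distance from every grid point to its nearest existing (rival) gym.
    rival_dist = [min(math.dist(p, r) for r in rivals) for p in grid]
    counts = []
    for candidate in grid:
        # A point is "won" if the candidate is closer to it than any rival.
        won = sum(1 for p, d in zip(grid, rival_dist) if math.dist(candidate, p) < d)
        counts.append(won)
    return counts

# Toy example: five blocks on a line with a single rival gym at the left end.
grid = [(float(x), 0.0) for x in range(1, 6)]   # blocks at x = 1..5
rivals = [(0.0, 0.0)]
counts = count_closest_blocks(grid, rivals)
best = grid[counts.index(max(counts))]
print(counts, best)
```

Because every pair of grid points is compared, the work grows with the square of the number of grid points, which is manageable for a few hundred candidate blocks.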

## Results

First we must loop through each grid location and find the distance to the closest rival gym.

In [117]:
oba_dict = {'Latitude': lat19146, 'Longitude': lng19146}
oba_df = pd.DataFrame(data=oba_dict)
nearby_gyms.head()

| | name | distance | lat | lng |
|---|---|---|---|---|
| 0 | City Fitness | 714 | 39.945191 | -75.177352 |
| 1 | Naval Square Fitness Center | 993 | 39.944479 | -75.184431 |
| 2 | Freehouse Fitness Studio | 895 | 39.943556 | -75.16665 |
| 3 | 12FIT Spa & Gym | 654 | 39.937551 | -75.167835 |
| 4 | Zarett Rehab Fitness | 691 | 39.94505 | -75.17359 |


In [129]:
from geopy.distance import geodesic

oba_df["Rival gym distance"] = 99999
oba_df.head()

| | Latitude | Longitude | Rival gym distance |
|---|---|---|---|
| 0 | 39.94423 | -75.165513 | 99999 |
| 1 | 39.94539 | -75.16552 | 99999 |
| 2 | 39.939005 | -75.166788 | 99999 |
| 3 | 39.940165 | -75.166795 | 99999 |
| 4 | 39.941325 | -75.166802 | 99999 |


In [130]:
for ii in range(0,len(oba_df)):
    min_distance = 99999
    coordinate1 = (oba_df['Latitude'][ii],oba_df['Longitude'][ii])
    
    for jj in range(0,len(nearby_gyms)):
        coordinate2 = (nearby_gyms['lat'][jj],nearby_gyms['lng'][jj])
        distance = geodesic(coordinate1, coordinate2).miles
        if distance < min_distance:
            min_distance = distance
    oba_df.iloc[ii, 2] = min_distance
    
oba_df.head()
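
The loop above relies on geopy's `geodesic` distance. For readers who want to see the idea without that dependency, the following sketch uses the haversine great-circle formula as an approximation. It is not the method used for the results in this report; the function names and the Earth radius constant are assumptions of the sketch, and the coordinates are copied from the tables above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_rival_miles(point, rivals):
    """Distance in miles from `point` to the closest entry in `rivals`."""
    return min(haversine_miles(point[0], point[1], r[0], r[1]) for r in rivals)

# One grid point and two gyms, copied from the tables above.
grid_point = (39.94423, -75.165513)
rivals = [(39.945191, -75.177352), (39.943556, -75.16665)]
print(round(nearest_rival_miles(grid_point, rivals), 3))
```

Over distances of a mile or two the spherical approximation and the ellipsoidal geodesic differ only slightly, so either is adequate for ranking nearby blocks.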

| | Latitude | Longitude | Rival gym distance |
|---|---|---|---|
| 0 | 39.94423 | -75.165513 | 0.076174 |
| 1 | 39.94539 | -75.16552 | 0.140022 |
| 2 | 39.939005 | -75.166788 | 0.112305 |
| 3 | 39.940165 | -75.166795 | 0.07791 |
| 4 | 39.941325 | -75.166802 | 0.046773 |


Now that we have the distance to the closest rival gym at each grid point, we need to calculate the distance from each candidate grid point to every point in the grid. Whenever that distance is less than the other point's distance to its closest rival gym, we add one to the number of blocks for which the candidate location would be the closest gym in 19146.

In [134]:
oba_df['Closest gym blocks'] = 0
oba_df.head()

| | Latitude | Longitude | Rival gym distance | Closest gym blocks |
|---|---|---|---|---|
| 0 | 39.94423 | -75.165513 | 0.076174 | 0 |
| 1 | 39.94539 | -75.16552 | 0.140022 | 0 |
| 2 | 39.939005 | -75.166788 | 0.112305 | 0 |
| 3 | 39.940165 | -75.166795 | 0.07791 | 0 |
| 4 | 39.941325 | -75.166802 | 0.046773 | 0 |


In [136]:
for ii in range(0,len(oba_df)):
    closest_gym_blocks = 0
    # hypothetical new gym location
    coordinate1 = (oba_df['Latitude'][ii],oba_df['Longitude'][ii])
    print(ii/len(oba_df)*100,'% Complete')
    for jj in range(0,len(oba_df)):
        coordinate2 = (oba_df['Latitude'][jj],oba_df['Longitude'][jj])
        distance = geodesic(coordinate1, coordinate2).miles
        # block jj is counted if the new gym would be closer than its nearest rival
        if distance < oba_df['Rival gym distance'][jj]:
            closest_gym_blocks = closest_gym_blocks + 1
    oba_df.iloc[ii, 3] = closest_gym_blocks

oba_df.head()

| | Latitude | Longitude | Rival gym distance | Closest gym blocks |
|---|---|---|---|---|
| 0 | 39.94423 | -75.165513 | 0.076174 | 3 |
| 1 | 39.94539 | -75.16552 | 0.140022 | 2 |
| 2 | 39.939005 | -75.166788 | 0.112305 | 2 |
| 3 | 39.940165 | -75.166795 | 0.07791 | 4 |
| 4 | 39.941325 | -75.166802 | 0.046773 | 4 |


In [146]:
oba_df.describe()

| | Latitude | Longitude | Rival gym distance | Closest gym blocks |
|---|---|---|---|---|
| count | 318.0 | 318.0 | 318.0 | 318.0 |
| mean | 39.939259 | -75.183375 | 0.416459 | 70.600629 |
| std | 0.003963 | 0.010549 | 0.316639 | 45.890567 |
| min | 39.930876 | -75.204657 | 0.010715 | 1.0 |
| 25% | 39.936045 | -75.191556 | 0.138767 | 24.5 |
| 50% | 39.939222 | -75.182453 | 0.342062 | 74.0 |
| 75% | 39.942395 | -75.174614 | 0.646027 | 108.0 |
| max | 39.947644 | -75.165513 | 1.222576 | 155.0 |


In [200]:
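oba_df_optimal = oba_df[oba_df["Closest gym blocks"] > 140]
# use the row with the maximum count, not simply the first row above the threshold
best_row = oba_df.loc[oba_df["Closest gym blocks"].idxmax()]
oba_optimal = [best_row["Latitude"], best_row["Longitude"]]
oba_df_optimal.head(20)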

| | Latitude | Longitude | Rival gym distance | Closest gym blocks |
|---|---|---|---|---|
| 185 | 39.938938 | -75.18506 | 0.383741 | 145 |
| 186 | 39.940098 | -75.185067 | 0.304132 | 153 |
| 187 | 39.941258 | -75.185075 | 0.224827 | 155 |
| 188 | 39.942418 | -75.185083 | 0.146321 | 154 |
| 189 | 39.943578 | -75.18509 | 0.071313 | 153 |
| 197 | 39.938353 | -75.186361 | 0.434905 | 147 |
| 198 | 39.939513 | -75.186369 | 0.357739 | 151 |
| 199 | 39.940673 | -75.186376 | 0.282174 | 152 |
| 200 | 39.941833 | -75.186384 | 0.20995 | 150 |
| 201 | 39.942993 | -75.186392 | 0.146111 | 147 |


The optimal new gym location is at (39.941258, -75.185075), which would be the closest gym location for 155 city blocks in the 19146 zip code. Let's plot our results as a heat map in 19146 to see where this is.

In [193]:
from folium import plugins
from folium.plugins import HeatMap

oba_center = [oba_latitude,oba_longitude]
map_oba = folium.Map(location=oba_center, zoom_start=15)
for lat, lon in zip(lat19146, lng19146):
    folium.Circle([lat, lon], radius=64, color='blue', fill=False).add_to(map_oba)
for lat, lon in zip(nearby_gyms["lat"], nearby_gyms["lng"]):
    folium.Circle([lat, lon], radius=25, color='yellow', fill=True, fill_color='red', fill_opacity=1).add_to(map_oba)
folium.Marker(oba_center).add_to(map_oba)
folium.Marker(oba_optimal, icon=folium.Icon(color='pink', icon='home', prefix='fa')).add_to(map_oba)
heat_data = []
for ii in range(0,len(oba_df)):
    heat_data.append([oba_df.iloc[ii,0],oba_df.iloc[ii,1],oba_df.iloc[ii,3]])
# Plot it on the map
colormap = {0.0: 'pink', 0.3: 'blue', 0.5: 'green',  0.7: 'yellow', 1: 'red'}
HeatMap(heat_data, min_opacity = 0, max_val = 20,gradient=colormap).add_to(map_oba)
map_oba
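
The heat map weights above are raw block counts. One way to make the weights independent of any particular maximum setting is to rescale them to the range 0 to 1 before plotting. The sketch below assumes that simple min-max scaling is acceptable for this visualization; `min_max_scale` is a hypothetical helper, and the sample counts and coordinates are placeholders rather than the full results.

```python
def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# A few sample block counts, including the minimum (1) and maximum (155).
counts = [3, 2, 2, 4, 4, 1, 155]
weights = min_max_scale(counts)
# Each heat map row is [latitude, longitude, weight]; coordinates are placeholders.
heat_rows = [[39.94, -75.18, w] for w in weights]
print(heat_rows[-1])
```

With scaled weights, the color gradient covers the full range of the data regardless of how large the raw counts are.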

In [179]:
oba_df[oba_df["Closest gym blocks"] == 155]

| | Latitude | Longitude | Rival gym distance | Closest gym blocks |
|---|---|---|---|---|
| 187 | 39.941258 | -75.185075 | 0.224827 | 155 |


As shown by the pink house icon in the map above, the optimal new location for OpenBox Athletics is at the intersection of 25th Street and Kimball Street.

## Discussion

The results showed that the optimal location for the new site of OpenBox Athletics is on the block of 25th Street and Kimball Street. Several caveats of this result warrant additional discussion.

First, not all locations will be able to host a new gym. The 19146 zip code is heavily residential, especially in the northern neighborhood of Graduate Hospital, which may have little commercial space available for new businesses. This is why a heat map visualization of grid locations is useful. The heat map shows a group of city blocks between Grays Ferry Avenue and 24th Street that would serve as good locations for a gym seeking to attract the most customers. Fortunately, the intersection of Washington Avenue and 25th Street, just south of the optimal location calculated from the data, is home to a lot of commercial real estate. As such, this block is a more practical candidate for a gym location. A gym at 25th and Washington would be the closest gym for 154 city blocks in the 19146 zip code (only one fewer than 25th and Kimball).

Another thing to consider is the two assumptions which were made about customer gym choices. Specifically, it was assumed that an equal number of people live at each city block and that all of these people are equally likely to enroll as a member at OpenBox Athletics if it is the closest gym to them. Of course, our results neglect that some areas are more populous than others in Philadelphia. They also neglect that demographics of certain areas may make them more or less inclined to purchase a gym membership. Both of these assumptions could be accounted for with more data. Housing data showing the number of residents that are located at each city block and survey data showing how likely each 19146 demographic group is to buy a gym membership would be very useful in modifying these findings.
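
If such data were available, the block count could be replaced by a weighted sum. The sketch below shows the idea on toy data, assuming Python 3.8+ for `math.dist`: each block contributes a weight (for example, its number of residents, possibly multiplied by an estimated likelihood of buying a membership) instead of counting as one. The function name and all numbers are hypothetical.

```python
import math

def weighted_coverage(candidate, grid, rival_dist, weights):
    """Sum the weights of grid points closer to `candidate` than to any rival."""
    total = 0.0
    for point, d_rival, w in zip(grid, rival_dist, weights):
        if math.dist(candidate, point) < d_rival:
            total += w
    return total

# Toy data: three blocks on a line, with made-up resident counts.
grid = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
rival_dist = [1.0, 2.0, 3.0]        # distance of each block to its nearest rival
residents = [50.0, 200.0, 400.0]    # hypothetical residents per block
scores = [weighted_coverage(c, grid, rival_dist, residents) for c in grid]
print(scores)
```

Setting every weight to 1 recovers the unweighted block count used in the Results section.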

## Conclusion

In this report, we used several data science tools to find the optimal location for a new gym in the 19146 zip code of Philadelphia. Geographical data analysis techniques in Python allowed us to create an equally sized grid of locations in the 19146 zip code. We also used the Foursquare API to locate other gyms in the area and determine how close they were to various locations in the neighborhood. Finally, we used a heat map via the Folium Python package to visualize which gym locations would attract the most customers based on being the closest gym to each 