## TYK Battle of the Neighborhoods

#### Toni Krowisz  June 2020
Rows with confidential data have been removed.

## Problem Description and Background


One of our clients, <em><b>LiveLong</b></em>, has asked our firm to research the market for healthy food cafes. They want recommendations on the cities where such cafes are most likely to be profitable and to have a positive impact on the community. <em><b>LiveLong's</b></em> goal is to launch its initial health food cafes in two cities and then track revenue and community impact. The criterion is to target neighborhoods with a high concentration of venues for healthy-choice activities and the lowest level of competition.

## Data Description and Approach

A study by Adam McCann, <em><b>Healthiest and Unhealthiest Cities in America</b></em> (posted February 10, 2020), ranks cities from healthiest to unhealthiest. We will use its total health score rankings to decide which two cities to analyze further.

The top 5 healthiest cities and the 5 lowest-ranking cities will be taken from this study. Within these cities, we will gather Foursquare location data for venues that represent healthy-choice activities (parks with bike or hiking trails, gyms, and organic food stores), along with other types of cafes in the area. The healthy-choice venues give insight into the potential market in each neighborhood; the other cafes give insight into potential competitors.

For these venues, we will also gather visitor ratings.
The location data and other descriptive information will be presented visually on a map.

Two locations will be recommended based on the concentration of nearby healthy-choice activity venues and the ratings of potential competitors. People who frequent healthy-choice activity venues are likely to be the best market for a healthy foods café.


In [8]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# JSON responses are flattened into dataframes below with pd.json_normalize,
# so no separate json_normalize import is needed


import folium # plotting library


print('Libraries imported.')

Libraries imported.


In [9]:


# pull in the dataset to view what city(ies) may be used

df_healthy_cities = pd.read_csv("rank_healthy_cities.csv")
df_healthy_cities

Unnamed: 0,overall_rank,city,total_health_score,health_care_rank,food_rank,fitness_rank,green_space_rank
0,1,"San Francisco, CA",73.99,29,1,4,1
1,2,"Seattle, WA",70.62,19,4,3,2
2,3,"San Diego, CA",70.01,25,3,1,8
3,4,"Portland, OR",65.66,61,6,16,3
4,5,"Washington, DC",63.87,47,9,26,5
...,...,...,...,...,...,...,...
169,170,"Memphis, TN",29.64,166,155,169,160
170,171,"Shreveport, LA",27.42,165,171,171,165
171,172,"Gulfport, MS",24.82,171,172,167,174
172,173,"Laredo, TX",24.06,151,170,174,156


In [10]:
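# take the top 5 and the bottom 5
# sort by score, highest first, and keep the result so head/tail pick the right rows
df_healthy_cities = df_healthy_cities.sort_values(by='total_health_score', ascending=False)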

top5 = df_healthy_cities.head(5)
top5.head()

Unnamed: 0,overall_rank,city,total_health_score,health_care_rank,food_rank,fitness_rank,green_space_rank
0,1,"San Francisco, CA",73.99,29,1,4,1
1,2,"Seattle, WA",70.62,19,4,3,2
2,3,"San Diego, CA",70.01,25,3,1,8
3,4,"Portland, OR",65.66,61,6,16,3
4,5,"Washington, DC",63.87,47,9,26,5


In [11]:
bottom5 = df_healthy_cities.tail(5)
bottom5.head()

Unnamed: 0,overall_rank,city,total_health_score,health_care_rank,food_rank,fitness_rank,green_space_rank
169,170,"Memphis, TN",29.64,166,155,169,160
170,171,"Shreveport, LA",27.42,165,171,171,165
171,172,"Gulfport, MS",24.82,171,172,167,174
172,173,"Laredo, TX",24.06,151,170,174,156
173,174,"Brownsville, TX",21.41,174,174,173,173


In [12]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
city_scores = pd.concat([top5, bottom5])
city_scores

Unnamed: 0,overall_rank,city,total_health_score,health_care_rank,food_rank,fitness_rank,green_space_rank
0,1,"San Francisco, CA",73.99,29,1,4,1
1,2,"Seattle, WA",70.62,19,4,3,2
2,3,"San Diego, CA",70.01,25,3,1,8
3,4,"Portland, OR",65.66,61,6,16,3
4,5,"Washington, DC",63.87,47,9,26,5
169,170,"Memphis, TN",29.64,166,155,169,160
170,171,"Shreveport, LA",27.42,165,171,171,165
171,172,"Gulfport, MS",24.82,171,172,167,174
172,173,"Laredo, TX",24.06,151,170,174,156
173,174,"Brownsville, TX",21.41,174,174,173,173


In [13]:
# reset the index to run 0-9, since the combined frame still carries the original row labels (0-4 and 169-173)
city_scores.reset_index(drop=True,inplace=True)
city_scores.index

RangeIndex(start=0, stop=10, step=1)

In [14]:
city_scores

Unnamed: 0,overall_rank,city,total_health_score,health_care_rank,food_rank,fitness_rank,green_space_rank
0,1,"San Francisco, CA",73.99,29,1,4,1
1,2,"Seattle, WA",70.62,19,4,3,2
2,3,"San Diego, CA",70.01,25,3,1,8
3,4,"Portland, OR",65.66,61,6,16,3
4,5,"Washington, DC",63.87,47,9,26,5
5,170,"Memphis, TN",29.64,166,155,169,160
6,171,"Shreveport, LA",27.42,165,171,171,165
7,172,"Gulfport, MS",24.82,171,172,167,174
8,173,"Laredo, TX",24.06,151,170,174,156
9,174,"Brownsville, TX",21.41,174,174,173,173
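
Taking `head(5)` and `tail(5)` works here because the CSV is already ordered by rank. As a hedged alternative that does not depend on file order, the same selection could be sketched with `nlargest` and `nsmallest`. The helper name `top_and_bottom` and the toy frame are illustrative assumptions; the scores are copied from rows shown above.

```python
import pandas as pd

# Toy stand-in for rank_healthy_cities.csv, deliberately out of order.
# Scores are copied from rows displayed above.
toy = pd.DataFrame({
    "city": ["Memphis, TN", "San Francisco, CA", "Portland, OR",
             "Brownsville, TX", "Seattle, WA"],
    "total_health_score": [29.64, 73.99, 65.66, 21.41, 70.62],
})


def top_and_bottom(df, n, score_col="total_health_score"):
    """Return the n highest- and n lowest-scoring rows, best first."""
    top = df.nlargest(n, score_col)
    # nsmallest returns lowest first; re-sort so the frame reads best to worst
    bottom = df.nsmallest(n, score_col).sort_values(score_col, ascending=False)
    return pd.concat([top, bottom], ignore_index=True)


selected = top_and_bottom(toy, 2)
print(selected)
```

Passing `ignore_index=True` to `pd.concat` also makes a separate `reset_index` step unnecessary.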


## Get Sample Data from Foursquare

In [15]:
address = 'Washington, DC'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

38.8949855 -77.0365708
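
The cells that defined `url` do not appear in this export (they appear to have been removed with the confidential credentials), so the request in the next cell has no visible definition. The following is a minimal sketch of how such a URL could be assembled, assuming the legacy Foursquare v2 `venues/search` endpoint, which the `response.venues` structure of the output suggests. The helper name `build_search_url`, the placeholder credentials, and the `query`, `radius`, `limit`, and version values are illustrative assumptions, not the settings actually used.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumption): substitute your own Foursquare keys.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180604"  # assumed API version date in YYYYMMDD form


def build_search_url(latitude, longitude, query, radius=2500, limit=30):
    """Build a legacy v2 venues/search URL (hypothetical helper)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "ll": f"{latitude},{longitude}",  # search center as "lat,lng"
        "v": VERSION,
        "query": query,      # free-text search term
        "radius": radius,    # search radius in metres (assumed value)
        "limit": limit,      # maximum number of venues (assumed value)
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Coordinates are the Washington, DC values printed above.
url = build_search_url(38.8949855, -77.0365708, "bike trail")
```

Using `urlencode` keeps spaces and commas in the query string properly escaped, which is easy to get wrong when the URL is built by string formatting.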


In [19]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef0484a882fc7001bd497d9'},
 'response': {'venues': [{'id': '504f874fe4b06643340e17b5',
    'name': 'Hains Point Bike Loop',
    'location': {'address': 'Ohio Dr.',
     'lat': 38.87703055070882,
     'lng': -77.0267238038791,
     'labeledLatLngs': [{'label': 'display',
       'lat': 38.87703055070882,
       'lng': -77.0267238038791}],
     'distance': 2173,
     'postalCode': '20024',
     'cc': 'US',
     'city': 'Washington',
     'state': 'D.C.',
     'country': 'United States',
     'formattedAddress': ['Ohio Dr.',
      'Washington, D.C. 20024',
      'United States']},
    'categories': [{'id': '56aa371be4b08b9a8d57355e',
      'name': 'Bike Trail',
      'pluralName': 'Bike Trails',
      'shortName': 'Bike Trail',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/hikingtrail_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1592805791',
    'hasPerk': False},
   {'id': '4ae207f4f964a5

In [20]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
washington_biketrails = pd.json_normalize(venues)
washington_biketrails.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,venuePage.id,location.crossStreet,location.neighborhood
0,504f874fe4b06643340e17b5,Hains Point Bike Loop,"[{'id': '56aa371be4b08b9a8d57355e', 'name': 'B...",v-1592805791,False,Ohio Dr.,38.877031,-77.026724,"[{'label': 'display', 'lat': 38.87703055070882...",2173,20024,US,Washington,D.C.,United States,"[Ohio Dr., Washington, D.C. 20024, United States]",,,
1,4ae207f4f964a520f78921e3,Bike And Roll DC,"[{'id': '4e4c9077bd41f78e849722f9', 'name': 'B...",v-1592805791,False,955 L'Enfant Plaza SW,38.893518,-77.027766,"[{'label': 'display', 'lat': 38.89351844386055...",780,20024,US,Washington,D.C.,United States,"[955 L'Enfant Plaza SW, Washington, D.C. 20024...",75708136.0,,
2,4ff22829e4b013bcfc266bbf,Bike The Sites,"[{'id': '56aa371be4b08b9a8d573520', 'name': 'T...",v-1592805791,False,1100 Pennsylvania Ave NW,38.893586,-77.027916,"[{'label': 'display', 'lat': 38.89358625157055...",765,20004,US,Washington,D.C.,United States,"[1100 Pennsylvania Ave NW, Washington, D.C. 20...",,,
3,5a8ac09047f8767d3767c550,Johnny Love Bike Tours,"[{'id': '56aa371be4b08b9a8d573520', 'name': 'T...",v-1592805791,False,1440 G Street NW,38.897836,-77.032821,"[{'label': 'display', 'lat': 38.8978361, 'lng'...",454,20005,US,Washington,D.C.,United States,"[1440 G Street NW, Washington, D.C. 20005, Uni...",482246980.0,,
4,4db03c831e729fcc563b77c2,Capital Bikeshare - 19th & E St NW,"[{'id': '4e4c9077bd41f78e849722f9', 'name': 'B...",v-1592805791,False,1900 E St NW,38.894672,-77.044189,"[{'label': 'display', 'lat': 38.89467239379883...",660,20415,US,Washington,D.C.,United States,"[1900 E St NW (at 19th St NW), Washington, D.C...",,at 19th St NW,


In [21]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in washington_biketrails.columns if col.startswith('location.')] + ['id']
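# copy so the column assignment below modifies this frame and avoids a SettingWithCopyWarning
dataframe_filtered = washington_biketrails.loc[:, filtered_columns].copy()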

# function that extracts the category of the venue
def get_category_type(row):
    # fall back to the nested 'venue.categories' key when 'categories' is absent
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,Hains Point Bike Loop,Bike Trail,Ohio Dr.,38.877031,-77.026724,"[{'label': 'display', 'lat': 38.87703055070882...",2173,20024.0,US,Washington,D.C.,United States,"[Ohio Dr., Washington, D.C. 20024, United States]",,,504f874fe4b06643340e17b5
1,Bike And Roll DC,Bike Rental / Bike Share,955 L'Enfant Plaza SW,38.893518,-77.027766,"[{'label': 'display', 'lat': 38.89351844386055...",780,20024.0,US,Washington,D.C.,United States,"[955 L'Enfant Plaza SW, Washington, D.C. 20024...",,,4ae207f4f964a520f78921e3
2,Bike The Sites,Tour Provider,1100 Pennsylvania Ave NW,38.893586,-77.027916,"[{'label': 'display', 'lat': 38.89358625157055...",765,20004.0,US,Washington,D.C.,United States,"[1100 Pennsylvania Ave NW, Washington, D.C. 20...",,,4ff22829e4b013bcfc266bbf
3,Johnny Love Bike Tours,Tour Provider,1440 G Street NW,38.897836,-77.032821,"[{'label': 'display', 'lat': 38.8978361, 'lng'...",454,20005.0,US,Washington,D.C.,United States,"[1440 G Street NW, Washington, D.C. 20005, Uni...",,,5a8ac09047f8767d3767c550
4,Capital Bikeshare - 19th & E St NW,Bike Rental / Bike Share,1900 E St NW,38.894672,-77.044189,"[{'label': 'display', 'lat': 38.89467239379883...",660,20415.0,US,Washington,D.C.,United States,"[1900 E St NW (at 19th St NW), Washington, D.C...",at 19th St NW,,4db03c831e729fcc563b77c2
5,Pennsylvania Avenue Bike Lane,Bike Trail,,38.8959,-77.030924,"[{'label': 'display', 'lat': 38.89589978583692...",499,20004.0,US,Washington,D.C.,United States,"[Washington, D.C. 20004, United States]",,,53f9f418498e85885794ad1d
6,Bike Law,Lawyer,1810 Florida Ave NW #1,38.916214,-77.042274,"[{'label': 'display', 'lat': 38.91621398925781...",2414,20009.0,US,Washington,WA,United States,"[1810 Florida Ave NW #1, Washington, WA 20009,...",,,5d4a838c176e1a00086399bb
7,Rock Creek Running Trail,Trail,Rock Creek Trail,38.907834,-77.048961,"[{'label': 'display', 'lat': 38.90783365591662...",1788,20008.0,US,Washington,D.C.,United States,"[Rock Creek Trail, Washington, D.C. 20008, Uni...",,,4bf5176be5eba59318532090
8,The Time Trail Of John Brown,Exhibit,National museum of American History,38.891283,-77.029073,"[{'label': 'display', 'lat': 38.891283, 'lng':...",769,,US,Washington,D.C.,United States,"[National museum of American History, Washingt...",,,4e385791149579ccbaf7003b
9,Capital Bikeshare - Smithsonian-National Mall ...,Bike Rental / Bike Share,Jefferson Dr SW,38.889041,-77.028574,"[{'label': 'display', 'lat': 38.88904148619195...",958,20024.0,US,Washington,D.C.,United States,"[Jefferson Dr SW (at 12th St SW), Washington, ...",at 12th St SW,,4f7ae687e4b0794c2ab12818
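
The result above includes venues that matched the search text but are not healthy-choice activities, for example a lawyer ("Bike Law") and a museum exhibit. One possible way to narrow the result is to keep only rows whose category is on an allow-list. The sketch below uses a few rows copied from the table above; the `HEALTHY_CATEGORIES` set is an assumption chosen for illustration, not a list taken from the study.

```python
import pandas as pd

# A few rows copied from the bike-trail result above (name and category only).
sample = pd.DataFrame({
    "name": ["Hains Point Bike Loop", "Bike Law",
             "Rock Creek Running Trail", "The Time Trail Of John Brown"],
    "categories": ["Bike Trail", "Lawyer", "Trail", "Exhibit"],
})

# Assumed allow-list of categories that count as healthy-choice activities.
HEALTHY_CATEGORIES = {"Bike Trail", "Trail", "Park", "Gym", "Gym / Fitness Center"}

# Keep only rows whose category is on the allow-list.
healthy_only = sample[sample["categories"].isin(HEALTHY_CATEGORIES)].reset_index(drop=True)
print(healthy_only)
```

Whether bike rentals or tour providers belong on the list is a judgment call that depends on how the client defines a healthy-choice activity.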


## Get Location Data for Fitness Centers

In [24]:
results2 = requests.get(url2).json()
results2

{'meta': {'code': 200, 'requestId': '5ef04c29b1cac0001b49e4d3'},
 'response': {'venues': [{'id': '4ff87c19e4b0f1c619140693',
    'name': 'Fitness Center',
    'location': {'address': '800 16th St NW',
     'crossStreet': 'at Hay-Adams',
     'lat': 38.900516510009766,
     'lng': -77.03700256347656,
     'labeledLatLngs': [{'label': 'display',
       'lat': 38.900516510009766,
       'lng': -77.03700256347656}],
     'distance': 616,
     'postalCode': '20006',
     'cc': 'US',
     'city': 'Washington',
     'state': 'D.C.',
     'country': 'United States',
     'formattedAddress': ['800 16th St NW (at Hay-Adams)',
      'Washington, D.C. 20006',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d176941735',
      'name': 'Gym',
      'pluralName': 'Gyms',
      'shortName': 'Gym',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/gym_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1592806566',
    'hasPerk': False},
 

In [25]:
# assign relevant part of JSON to venues
venues2 = results2['response']['venues']

# transform venues into a dataframe
washington_fitness = pd.json_normalize(venues2)
washington_fitness.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress
0,4ff87c19e4b0f1c619140693,Fitness Center,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1592806566,False,800 16th St NW,at Hay-Adams,38.900517,-77.037003,"[{'label': 'display', 'lat': 38.90051651000976...",616,20006.0,US,Washington,D.C.,United States,"[800 16th St NW (at Hay-Adams), Washington, D...."
1,517d2bdad86c8832970bf497,St. Regis Fitness Center,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",v-1592806566,False,923 16th St NW,,38.902239,-77.036026,"[{'label': 'display', 'lat': 38.90223857843994...",808,20006.0,US,Washington,D.C.,United States,"[923 16th St NW, Washington, D.C. 20006, Unite..."
2,51498783e4b09c775e5aae72,Frankin Square Fitness Center,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1592806566,False,"1300 I Street NW, Washington, DC",,38.901308,-77.031509,"[{'label': 'display', 'lat': 38.90130819353104...",829,,US,Washington,D.C.,United States,"[1300 I Street NW, Washington, DC, Washington,..."
3,4c56a988cc96c9b6091a772e,IMF Fitness Center,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",v-1592806566,False,700 19th St NW,,38.89929,-77.04405,"[{'label': 'display', 'lat': 38.89928970531802...",805,20431.0,US,Washington,D.C.,United States,"[700 19th St NW, Washington, D.C. 20431, Unite..."
4,4d7ff460e6d7721e67860fc7,One to One Fitness Center,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1592806566,False,1750 K St NW,,38.902139,-77.040775,"[{'label': 'display', 'lat': 38.902139, 'lng':...",875,20006.0,US,Washington,D.C.,United States,"[1750 K St NW, Washington, D.C. 20006, United ..."


In [26]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns2 = ['name', 'categories'] + [col for col in washington_fitness.columns if col.startswith('location.')] + ['id']
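# copy so the column assignment below modifies this frame and avoids a SettingWithCopyWarning
dataframe_filtered2 = washington_fitness.loc[:, filtered_columns2].copy()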


# filter the category for each row
dataframe_filtered2['categories'] = dataframe_filtered2.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered2.columns = [column.split('.')[-1] for column in dataframe_filtered2.columns]

dataframe_filtered2

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Fitness Center,Gym,800 16th St NW,at Hay-Adams,38.900517,-77.037003,"[{'label': 'display', 'lat': 38.90051651000976...",616,20006.0,US,Washington,D.C.,United States,"[800 16th St NW (at Hay-Adams), Washington, D....",4ff87c19e4b0f1c619140693
1,St. Regis Fitness Center,Gym / Fitness Center,923 16th St NW,,38.902239,-77.036026,"[{'label': 'display', 'lat': 38.90223857843994...",808,20006.0,US,Washington,D.C.,United States,"[923 16th St NW, Washington, D.C. 20006, Unite...",517d2bdad86c8832970bf497
2,Frankin Square Fitness Center,Gym,"1300 I Street NW, Washington, DC",,38.901308,-77.031509,"[{'label': 'display', 'lat': 38.90130819353104...",829,,US,Washington,D.C.,United States,"[1300 I Street NW, Washington, DC, Washington,...",51498783e4b09c775e5aae72
3,IMF Fitness Center,Gym / Fitness Center,700 19th St NW,,38.89929,-77.04405,"[{'label': 'display', 'lat': 38.89928970531802...",805,20431.0,US,Washington,D.C.,United States,"[700 19th St NW, Washington, D.C. 20431, Unite...",4c56a988cc96c9b6091a772e
4,One to One Fitness Center,Gym,1750 K St NW,,38.902139,-77.040775,"[{'label': 'display', 'lat': 38.902139, 'lng':...",875,20006.0,US,Washington,D.C.,United States,"[1750 K St NW, Washington, D.C. 20006, United ...",4d7ff460e6d7721e67860fc7
5,Fitness Center @ 1110 Vermont,Gym,1110 Vermont Ave NW,,38.904339,-77.033279,"[{'label': 'display', 'lat': 38.90433883666992...",1079,20005.0,US,Washington,D.C.,United States,"[1110 Vermont Ave NW, Washington, D.C. 20005, ...",51cdfccf498ec004bb286d35
6,Hamilton Fitness Center,Gym / Fitness Center,,,38.90307,-77.032687,"[{'label': 'display', 'lat': 38.90307, 'lng': ...",960,20005.0,US,Washington,D.C.,United States,"[Washington, D.C. 20005, United States]",5c5037778a4cf50039d1d983
7,Metropolitan Square Fitness Center,Gym / Fitness Center,,,38.897549,-77.033307,"[{'label': 'display', 'lat': 38.897549, 'lng':...",401,20005.0,US,Washington,D.C.,United States,"[Washington, D.C. 20005, United States]",5c1b8726cbcdee002cc6ce6e
8,Franklin Tower Fitness Center,Gym / Fitness Center,1200 K St NW,,38.902132,-77.028647,"[{'label': 'display', 'lat': 38.902132, 'lng':...",1050,20005.0,US,Washington,D.C.,United States,"[1200 K St NW, Washington, D.C. 20005, United ...",4e8b20bbb63423c020fdf971
9,1899 L Street Fitness Center,Gym / Fitness Center,1828 L St NW,at 19th,38.903423,-77.042923,"[{'label': 'display', 'lat': 38.90342330932617...",1088,20036.0,US,Washington,D.C.,United States,"[1828 L St NW (at 19th), Washington, D.C. 2003...",4eaaf31ce5fa45480dde8467
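
The bike-trail and fitness-center cells repeat the same cleaning steps. If more venue types or cities are added, those steps could be wrapped in one function. This is a sketch only: the names `venues_to_frame` and `first_category_name` are hypothetical, and the sample venue is a trimmed copy of the first fitness-center record shown above.

```python
import pandas as pd


def first_category_name(categories):
    """Return the name of the first category, or None if there is none."""
    if isinstance(categories, list) and categories:
        return categories[0]["name"]
    return None


def venues_to_frame(venues):
    """Flatten a list of venue dicts into a frame of name, category, location, id."""
    df = pd.json_normalize(venues)
    keep = (["name", "categories"]
            + [col for col in df.columns if col.startswith("location.")]
            + ["id"])
    df = df.loc[:, keep].copy()
    df["categories"] = df["categories"].apply(first_category_name)
    # keep only the last term of dotted column names, e.g. location.lat -> lat
    df.columns = [col.split(".")[-1] for col in df.columns]
    return df


# Trimmed copy of the first venue in the fitness-center response above.
sample_venues = [{
    "id": "4ff87c19e4b0f1c619140693",
    "name": "Fitness Center",
    "location": {"address": "800 16th St NW",
                 "lat": 38.900516510009766,
                 "lng": -77.03700256347656,
                 "distance": 616},
    "categories": [{"id": "4bf58dd8d48988d176941735", "name": "Gym"}],
}]

fitness = venues_to_frame(sample_venues)
print(fitness)
```

With such a helper, each new query would reduce to one call, for example `venues_to_frame(results2['response']['venues'])`.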


## Create Map of Washington, DC with Sample Bike Trails and Fitness Centers

In [27]:
washington_biketrails_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map for bike trails in Washington, DC

# add a red circle marker to represent Washington, DC
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Washington',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(washington_biketrails_map)

# add the Bike Trails as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(washington_biketrails_map)
    
# add the Fitness Centers as green circle markers
for lat, lng, label in zip(dataframe_filtered2.lat, dataframe_filtered2.lng, dataframe_filtered2.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(washington_biketrails_map)
    
# display map
washington_biketrails_map
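
The map shows the venues visually. The "concentration of nearby venues" criterion from the approach section could also be expressed as a number, for instance by counting venues within a fixed distance of a candidate point. The sketch below is one possible way to do that using the haversine great-circle formula. The 1,000 m radius and the function names are assumptions; the coordinates are copied from the Washington, DC outputs above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(center, points, radius_m):
    """Count the points lying within radius_m metres of center."""
    return sum(
        1 for lat, lng in points
        if haversine_m(center[0], center[1], lat, lng) <= radius_m
    )


center = (38.8949855, -77.0365708)   # Washington, DC geocode from above
venue_points = [
    (38.877031, -77.026724),         # Hains Point Bike Loop
    (38.900517, -77.037003),         # Fitness Center
    (38.902239, -77.036026),         # St. Regis Fitness Center
    (38.901308, -77.031509),         # Frankin Square Fitness Center
]

nearby = count_within(center, venue_points, radius_m=1000)  # assumed radius
print(nearby)
```

The computed distances can be checked against the `distance` column that Foursquare returns; for Hains Point Bike Loop the formula gives a value close to the reported 2173 m.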