Question: If a client decides to open a gym/fitness centre in New York, which neighbourhood(s) is(are) recommended?

Approach:
Request is issued to fourshare to identify all fitness facilities in New York. They are then grouped together by their locations. With the grouping, we can find out the neighborhoods with more fitness facilities.

Recommendation can be made based on the presence of fitness facilities in the neightborhood and the popularity of the type of fitness facilities in the neighborhood. 

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [5]:
neighborhoods_data = newyork_data['features']

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [9]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [10]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [12]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


In [13]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [14]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [15]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


In [16]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [17]:
CLIENT_ID = 'QLBCFH5LGDGNC1AVIBFEEZA00VO51IGNYMZMNURID3TCYZFC' # your Foursquare ID
CLIENT_SECRET = 'KM5TAV45PNUTMLSAHRFNS50RZ1GTEJK1DVDMOGGQFYGJH1WV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QLBCFH5LGDGNC1AVIBFEEZA00VO51IGNYMZMNURID3TCYZFC
CLIENT_SECRET:KM5TAV45PNUTMLSAHRFNS50RZ1GTEJK1DVDMOGGQFYGJH1WV


In [18]:
manhattan_data.loc[0, 'Neighborhood']

'Marble Hill'

In [19]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


In [56]:
#The correct answer is:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

#CategoryID for gym
categor = '4bf58dd8d48988d175941735'

# https://api.foursquare.com/v2/venues/search?ll=40.7,-74&categoryId=
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    categor,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&categoryId=4bf58dd8d48988d175941735&client_id=QLBCFH5LGDGNC1AVIBFEEZA00VO51IGNYMZMNURID3TCYZFC&client_secret=KM5TAV45PNUTMLSAHRFNS50RZ1GTEJK1DVDMOGGQFYGJH1WV&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100'

In [57]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60cee33f17ce8440bcc91f44'},
 'response': {'headerLocation': 'Marble Hill',
  'headerFullLocation': 'Marble Hill, New York',
  'headerLocationGranularity': 'neighborhood',
  'query': 'gym fitness',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 40.88105078329964,
    'lng': -73.90471933917806},
   'sw': {'lat': 40.87205077429964, 'lng': -73.91659997808156}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4a725fa1f964a520f6da1fe3',
       'name': 'TCR The Club of Riverdale',
       'location': {'address': '2600 Netherland Ave',
        'lat': 40.8786283,
        'lng': -73.9145678,
        'labeledLatLngs': [{'label': 'display',
          'lat': 40.8786283,
          'lng': -73.9145678}],
        'distance': 402,
        'pos

In [58]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [63]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,TCR The Club of Riverdale,Gym / Fitness Center,40.878628,-73.914568
1,Blink Fitness,Gym / Fitness Center,40.877271,-73.905595
2,Bikram Yoga,Yoga Studio,40.876844,-73.906204
3,Planet Fitness,Gym / Fitness Center,40.874088,-73.909137
4,Astral Fitness & Wellness Center,Gym,40.876705,-73.906372
5,24 Hour Fitness,Gym / Fitness Center,40.880592,-73.908255
6,Bronx Boxing,Boxing Gym,40.876646,-73.905927


In [99]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.


In [105]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, categor='4bf58dd8d48988d175941735'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            categor,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [106]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [110]:
print(manhattan_venues.shape)
manhattan_venues.head()

(1666, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,TCR The Club of Riverdale,40.878628,-73.914568,Gym / Fitness Center
1,Marble Hill,40.876551,-73.91066,Blink Fitness,40.877271,-73.905595,Gym / Fitness Center
2,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
3,Marble Hill,40.876551,-73.91066,Planet Fitness,40.874088,-73.909137,Gym / Fitness Center
4,Marble Hill,40.876551,-73.91066,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym


In [109]:
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,33,33,33,33,33,33
Carnegie Hill,52,52,52,52,52,52
Central Harlem,12,12,12,12,12,12
Chelsea,45,45,45,45,45,45
Chinatown,19,19,19,19,19,19
Civic Center,93,93,93,93,93,93
Clinton,54,54,54,54,54,54
East Harlem,9,9,9,9,9,9
East Village,31,31,31,31,31,31
Financial District,100,100,100,100,100,100


In [72]:
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 37 uniques categories.


In [111]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Athletics & Sports,Bike Shop,Boxing Gym,Building,Chiropractor,Climbing Gym,Clothing Store,Club House,College Gym,Community Center,Cycle Studio,Deli / Bodega,Doctor's Office,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health & Beauty Service,Hospital,Indoor Play Area,Martial Arts School,Massage Studio,Medical Center,Nutritionist,Office,Outdoor Gym,Park,Pilates Studio,Pool,Residential Building (Apartment / Condo),Spa,Spiritual Center,Tennis Court,Track,Weight Loss Center,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [112]:
manhattan_onehot.shape

(1666, 38)

In [113]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Athletics & Sports,Bike Shop,Boxing Gym,Building,Chiropractor,Climbing Gym,Clothing Store,Club House,College Gym,Community Center,Cycle Studio,Deli / Bodega,Doctor's Office,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health & Beauty Service,Hospital,Indoor Play Area,Martial Arts School,Massage Studio,Medical Center,Nutritionist,Office,Outdoor Gym,Park,Pilates Studio,Pool,Residential Building (Apartment / Condo),Spa,Spiritual Center,Tennis Court,Track,Weight Loss Center,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.575758,0.272727,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.326923,0.365385,0.019231,0.019231,0.0,0.0,0.0,0.057692,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.076923
2,Central Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.416667,0.416667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.111111,0.533333,0.022222,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.155556
4,Chinatown,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.315789,0.421053,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263
5,Civic Center,0.0,0.0,0.021505,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.172043,0.451613,0.010753,0.010753,0.0,0.0,0.0,0.096774,0.010753,0.0,0.0,0.0,0.0,0.0,0.053763,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.129032
6,Clinton,0.0,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.388889,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.018519,0.018519,0.0,0.0,0.018519
7,East Harlem,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.444444,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111
8,East Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.193548,0.290323,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.064516,0.0,0.16129,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.16129
9,Financial District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.48,0.43,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07


In [114]:
manhattan_grouped.shape

(40, 38)

In [115]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                  venue  freq
0                   Gym  0.58
1  Gym / Fitness Center  0.27
2            Boxing Gym  0.06
3             Bike Shop  0.03
4        Gymnastics Gym  0.03


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center  0.37
1                   Gym  0.33
2           Yoga Studio  0.08
3   Martial Arts School  0.06
4    Weight Loss Center  0.04


----Central Harlem----
                  venue  freq
0  Gym / Fitness Center  0.42
1                   Gym  0.42
2          Cycle Studio  0.08
3   Martial Arts School  0.08
4        Medical Center  0.00


----Chelsea----
                  venue  freq
0  Gym / Fitness Center  0.53
1           Yoga Studio  0.16
2                   Gym  0.11
3   Martial Arts School  0.09
4          Cycle Studio  0.04


----Chinatown----
                  venue  freq
0  Gym / Fitness Center  0.42
1                   Gym  0.32
2           Yoga Studio  0.11
3   Martial Arts School  0.11
4            Boxi

In [116]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [117]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Gym,Gym / Fitness Center,Boxing Gym,Gymnastics Gym,Bike Shop,Gym Pool,Community Center,Doctor's Office,Deli / Bodega,Cycle Studio
1,Carnegie Hill,Gym / Fitness Center,Gym,Yoga Studio,Martial Arts School,Weight Loss Center,Pilates Studio,Deli / Bodega,Community Center,Gymnastics Gym,Gym Pool
2,Central Harlem,Gym / Fitness Center,Gym,Martial Arts School,Cycle Studio,Yoga Studio,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,College Gym
3,Chelsea,Gym / Fitness Center,Yoga Studio,Gym,Martial Arts School,Cycle Studio,Pilates Studio,Gym Pool,Weight Loss Center,Climbing Gym,Clothing Store
4,Chinatown,Gym / Fitness Center,Gym,Yoga Studio,Martial Arts School,Boxing Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio


In [118]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 1, 3, 3, 0, 3, 3, 0], dtype=int32)

In [119]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Gym / Fitness Center,Yoga Studio,Boxing Gym,Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,College Gym
1,Manhattan,Chinatown,40.715618,-73.994279,3,Gym / Fitness Center,Gym,Yoga Studio,Martial Arts School,Boxing Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Gym,Gym / Fitness Center,Pilates Studio,College Gym,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,Yoga Studio
3,Manhattan,Inwood,40.867684,-73.92121,3,Pilates Studio,Gym,Gym / Fitness Center,Yoga Studio,Building,Chiropractor,Climbing Gym,Clothing Store,Club House,Health & Beauty Service
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Yoga Studio,Gym,Health & Beauty Service,Gym Pool,Gym / Fitness Center,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,College Gym


In [120]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [121]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Gym,Gym / Fitness Center,Pilates Studio,College Gym,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,Yoga Studio
6,Central Harlem,Gym / Fitness Center,Gym,Martial Arts School,Cycle Studio,Yoga Studio,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,College Gym
9,Yorkville,Gym,Gym / Fitness Center,Pilates Studio,Gymnastics Gym,Martial Arts School,Gym Pool,Boxing Gym,Climbing Gym,Clothing Store,Club House
11,Roosevelt Island,Gym,Gym / Fitness Center,Yoga Studio,Health & Beauty Service,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,College Gym
13,Lincoln Square,Gym / Fitness Center,Gym,Martial Arts School,Gym Pool,Cycle Studio,Pilates Studio,Indoor Play Area,Yoga Studio,Residential Building (Apartment / Condo),Climbing Gym
14,Clinton,Gym / Fitness Center,Gym,Yoga Studio,Track,Tennis Court,Building,Residential Building (Apartment / Condo),Chiropractor,Community Center,Doctor's Office
16,Murray Hill,Gym / Fitness Center,Gym,Martial Arts School,Yoga Studio,Spa,Doctor's Office,Pilates Studio,Track,Cycle Studio,Boxing Gym
21,Tribeca,Gym / Fitness Center,Gym,Gym Pool,Yoga Studio,Pilates Studio,Cycle Studio,Gymnastics Gym,Athletics & Sports,Track,Pool
24,West Village,Gym,Gym / Fitness Center,Yoga Studio,Cycle Studio,Track,College Gym,Gym Pool,Doctor's Office,Deli / Bodega,Community Center
28,Battery Park City,Gym,Gym / Fitness Center,Boxing Gym,Gymnastics Gym,Bike Shop,Gym Pool,Community Center,Doctor's Office,Deli / Bodega,Cycle Studio


In [122]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Gym / Fitness Center,Yoga Studio,Boxing Gym,Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,College Gym
5,Manhattanville,Gym / Fitness Center,Climbing Gym,Yoga Studio,Community Center,Gym Pool,Gym,Doctor's Office,Deli / Bodega,Cycle Studio,College Gym
17,Chelsea,Gym / Fitness Center,Yoga Studio,Gym,Martial Arts School,Cycle Studio,Pilates Studio,Gym Pool,Weight Loss Center,Climbing Gym,Clothing Store
33,Midtown South,Gym / Fitness Center,Gym,Yoga Studio,Boxing Gym,Martial Arts School,Medical Center,Health & Beauty Service,Building,Clothing Store,Club House
37,Stuyvesant Town,Gym / Fitness Center,Yoga Studio,Gym,Health & Beauty Service,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,College Gym
38,Flatiron,Gym / Fitness Center,Gym,Yoga Studio,Athletics & Sports,Tennis Court,Gym Pool,Bike Shop,Boxing Gym,Building,Chiropractor


In [123]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Morningside Heights,Yoga Studio,Park,Medical Center,College Gym,Gym Pool,Building,Chiropractor,Climbing Gym,Clothing Store,Club House


In [124]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Gym / Fitness Center,Gym,Yoga Studio,Martial Arts School,Boxing Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio
3,Inwood,Pilates Studio,Gym,Gym / Fitness Center,Yoga Studio,Building,Chiropractor,Climbing Gym,Clothing Store,Club House,Health & Beauty Service
7,East Harlem,Gym / Fitness Center,Martial Arts School,Yoga Studio,Building,Gym,Community Center,Gym Pool,Doctor's Office,Deli / Bodega,Cycle Studio
8,Upper East Side,Gym / Fitness Center,Gym,Yoga Studio,Doctor's Office,Pilates Studio,Cycle Studio,Martial Arts School,Spa,College Gym,Deli / Bodega
10,Lenox Hill,Gym / Fitness Center,Gym,Yoga Studio,Pilates Studio,Cycle Studio,Martial Arts School,Club House,Spa,Climbing Gym,Clothing Store
12,Upper West Side,Gym / Fitness Center,Gym,Yoga Studio,Pilates Studio,Boxing Gym,Gymnastics Gym,Building,Chiropractor,Climbing Gym,Clothing Store
15,Midtown,Gym / Fitness Center,Gym,Yoga Studio,Pilates Studio,Martial Arts School,Weight Loss Center,Boxing Gym,Chiropractor,Cycle Studio,Hospital
18,Greenwich Village,Gym / Fitness Center,Gym,Yoga Studio,Pilates Studio,Martial Arts School,Boxing Gym,Spa,Medical Center,Cycle Studio,Bike Shop
19,East Village,Gym / Fitness Center,Gym,Pilates Studio,Yoga Studio,Martial Arts School,Outdoor Gym,Weight Loss Center,Track,Bike Shop,Community Center
20,Lower East Side,Gym,Martial Arts School,Yoga Studio,Gym / Fitness Center,Pool,Community Center,College Gym,Doctor's Office,Deli / Bodega,Cycle Studio


In [125]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Hamilton Heights,Yoga Studio,Gym,Health & Beauty Service,Gym Pool,Gym / Fitness Center,Doctor's Office,Deli / Bodega,Cycle Studio,Community Center,College Gym


In [126]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 5, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
