## Applied Data Science Capstone Project

This notebook will be used to code and document the Applied Data Science Capstone Project.
### The Problem: 
The clients, a couple of vegan chefs, are looking for an ideal location to open a vegan restaurant in New York City, particularly in the boroughs of Manhattan or Brooklyn. We will use New York City neighborhoods data and Foursquare location data to analyze the various neighborhoods in these boroughs and to identify an ideal location for a vegan restaurant.

### Data Sources:
#### Neighborhoods Data
New York city has 306 neighborhoods spread out among 5 boroughs. This New York city neighborhoods data will be downloaded from https://geo.nyu.edu/catalog/nyu_2451_34572. 
#### Foursquare Locations Data:
The Foursquare API provides location based experiences with diverse information about venues, users, photos, and check-ins. We will use this API to get information about the venues in the various neighborhoods. The neighborhood coordinates from the neighborhoods dataframe will be used with the Foursquare API to analyze the neighborhoods. 

In [91]:
# import the pandas library as pd
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#import Numpy library as np
import numpy as np

import json # library to handle JSON files
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


print('Libraries Imported')




Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries Imported


## Download and Explore New York City Neighborhoods Data

In [92]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


### Load and explore the data

In [93]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [94]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

All the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [95]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [96]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

### Transform the data into a _pandas_ Dataframe

Create an empty dataframe

In [97]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [98]:
# take a look at the empty dataframe
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [99]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [100]:
# Examine the dataframe
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [101]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Define Foursquare Credentials and Version

In [102]:
CLIENT_ID = 'R52VRKLGEODNTHJDVOND3C1CTBDDC2OCGJE1K0GARU3KM4J4' # your Foursquare ID
CLIENT_SECRET = 'DYGWYPX4YQEYMALZCA0EJOMGSIJXDFI1S443MOVVOHI1EQV0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: R52VRKLGEODNTHJDVOND3C1CTBDDC2OCGJE1K0GARU3KM4J4
CLIENT_SECRET:DYGWYPX4YQEYMALZCA0EJOMGSIJXDFI1S443MOVVOHI1EQV0


In [103]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

### Explore Neighborhoods in each borough

Let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [104]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Use geopy library to get the latitude and longitude values of Manhattan and Brooklyn

In [105]:
address = 'Manhattan, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
manhattan_latitude = location.latitude
manhattan_longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(manhattan_latitude, manhattan_longitude))

address = 'Brooklyn, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
brooklyn_latitude = location.latitude
brooklyn_longitude = location.longitude
print('The geograpical coordinate of Brooklyn are {}, {}.'.format(brooklyn_latitude, brooklyn_longitude))



The geograpical coordinate of Manhattan are 40.7900869, -73.9598295.
The geograpical coordinate of Brooklyn are 40.6501038, -73.9495823.


### Let's create a function to repeat the same process to all the neighborhoods in each borough

In [106]:
def getNearbyVenues(names, latitudes, longitudes, radius=250,LIMIT=100):
    
    #catID = '4bf58dd8d48988d1d3941735,4bf58dd8d48988d102941735,50aa9e744b90af0d42d5de0e,4bf58dd8d48988d1fa941735'
    catID = '4bf58dd8d48988d1d3941735,50aa9e744b90af0d42d5de0e,4bf58dd8d48988d1fa941735,4bf58dd8d48988d102941735,4bf58dd8d48988d175941735,56aa371be4b08b9a8d57355e,4bf58dd8d48988d159941735'
    #4d4b7105d754a06374d81259'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&categoryId={},&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            catID,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new data frame called manhattan_venues

In [107]:
manhattan_venues = getNearbyVenues(names = manhattan_data['Neighborhood'],
                                   latitudes = manhattan_data['Latitude'],
                                   longitudes = manhattan_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Let's check the data frame

In [108]:
manhattan_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Marble Hill Youthmarket,40.874519,-73.910394,Farmers Market
1,Marble Hill,40.876551,-73.91066,Bronx Boxing,40.875671,-73.908355,Boxing Gym
2,Chinatown,40.715618,-73.994279,SKY TING YOGA,40.716469,-73.99502,Yoga Studio
3,Chinatown,40.715618,-73.994279,Manhattan Bridge Pedestrian Path,40.715537,-73.995877,Trail
4,Chinatown,40.715618,-73.994279,Sky Ting Yoga,40.71532,-73.992391,Yoga Studio


Let's chek how many venues were returned for each neighborhood

In [109]:
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,6,6,6,6,6,6
Carnegie Hill,16,16,16,16,16,16
Central Harlem,8,8,8,8,8,8
Chelsea,8,8,8,8,8,8
Chinatown,8,8,8,8,8,8
Civic Center,40,40,40,40,40,40
Clinton,19,19,19,19,19,19
East Harlem,5,5,5,5,5,5
East Village,17,17,17,17,17,17
Financial District,31,31,31,31,31,31


Let's find out how many unique categories can be curated from all the returned venues

In [110]:
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 49 uniques categories.


#### Analyze Each Neighborhood

In [111]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")   

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = manhattan_onehot.columns.tolist()
cols.insert(0, cols.pop(cols.index('Neighborhood')))
manhattan_onehot = manhattan_onehot.reindex(columns= cols)
manhattan_onehot.head()

Unnamed: 0,Neighborhood,Acupuncturist,American Restaurant,Athletics & Sports,Bakery,Beer Garden,Boxing Gym,Building,Café,Chaat Place,Chinese Restaurant,Climbing Gym,Coffee Shop,College Gym,Community Center,Cycle Studio,Deli / Bodega,Doctor's Office,Farmers Market,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health & Beauty Service,Health Food Store,Hotel,Indonesian Restaurant,Juice Bar,Korean Restaurant,Martial Arts Dojo,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Non-Profit,Pharmacy,Pilates Studio,Pool,Residential Building (Apartment / Condo),Salad Place,Spa,Spiritual Center,Supplement Shop,Track,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Weight Loss Center,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Chinatown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Chinatown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,Chinatown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


And let's examine the new dataframe size.

In [112]:
manhattan_onehot.shape

(553, 50)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [113]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()

Let's confirm the new size

In [114]:
manhattan_grouped.shape

(40, 50)

Let's print each neighborhood along with the top 5 most common venues

In [115]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
           venue  freq
0            Gym  0.67
1       Gym Pool  0.17
2          Trail  0.17
3  Acupuncturist  0.00
4           Pool  0.00


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center  0.38
1                   Gym  0.25
2      Community Center  0.06
3         Deli / Bodega  0.06
4    Turkish Restaurant  0.06


----Central Harlem----
                           venue  freq
0           Gym / Fitness Center  0.38
1              Health Food Store  0.12
2                   Cycle Studio  0.12
3  Vegetarian / Vegan Restaurant  0.12
4                            Gym  0.12


----Chelsea----
                           venue  freq
0                            Gym  0.50
1           Gym / Fitness Center  0.25
2  Vegetarian / Vegan Restaurant  0.12
3                 Pilates Studio  0.12
4                  Acupuncturist  0.00


----Chinatown----
                  venue  freq
0           Yoga Studio  0.38
1  Gym / Fitness Center  0.25
2          

#### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [116]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 5 venues for each neighborhood.

In [117]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
manhattan_venues_sorted = pd.DataFrame(columns=columns)
manhattan_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    manhattan_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

manhattan_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Battery Park City,Gym,Gym Pool,Trail,Yoga Studio,Coffee Shop
1,Carnegie Hill,Gym / Fitness Center,Gym,Gym Pool,Turkish Restaurant,Building
2,Central Harlem,Gym / Fitness Center,Health Food Store,Vegetarian / Vegan Restaurant,Gym,Farmers Market
3,Chelsea,Gym,Gym / Fitness Center,Pilates Studio,Vegetarian / Vegan Restaurant,Yoga Studio
4,Chinatown,Yoga Studio,Gym / Fitness Center,Gym,Trail,Farmers Market
5,Civic Center,Gym,Gym / Fitness Center,Yoga Studio,Martial Arts Dojo,Vegetarian / Vegan Restaurant
6,Clinton,Gym / Fitness Center,Gym,Farmers Market,Vegetarian / Vegan Restaurant,Building
7,East Harlem,Gym / Fitness Center,Martial Arts Dojo,Gym,Yoga Studio,College Gym
8,East Village,Vegetarian / Vegan Restaurant,Gym,Gym / Fitness Center,Health Food Store,American Restaurant
9,Financial District,Gym,Gym / Fitness Center,Yoga Studio,College Gym,Weight Loss Center


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 6 clusters

In [118]:
# set number of clusters
kclusters = 6

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,init='random',n_init=100,random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
#print(kmeans.labels_.size)

manhattan_merged = manhattan_data
manhattan_data.shape

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(manhattan_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# drop rows with NaN values
manhattan_merged.dropna(axis=0,how='any',inplace=True)

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_
manhattan_merged = manhattan_merged.reset_index(drop=True)

manhattan_merged.head() # check the last columns!
manhattan_merged.to_csv("manhattan.csv")

Visualize the clusters

In [119]:
# create map
map_clusters = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examine the clusters

In [120]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
1,Chinatown,Yoga Studio,Gym / Fitness Center,Gym,Trail,Farmers Market,0
3,Inwood,Gym,Farmers Market,Pharmacy,Yoga Studio,College Gym,0
6,Central Harlem,Gym / Fitness Center,Health Food Store,Vegetarian / Vegan Restaurant,Gym,Farmers Market,0
9,Yorkville,Gym,Gym / Fitness Center,Boxing Gym,Yoga Studio,College Gym,0
10,Lenox Hill,Gym / Fitness Center,Health Food Store,Gym,Non-Profit,Cycle Studio,0
14,Clinton,Gym / Fitness Center,Gym,Farmers Market,Vegetarian / Vegan Restaurant,Building,0
17,Chelsea,Gym,Gym / Fitness Center,Pilates Studio,Vegetarian / Vegan Restaurant,Yoga Studio,0
23,Soho,Yoga Studio,Gym,Boxing Gym,Cycle Studio,Spa,0
24,West Village,Gym,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Yoga Studio,Trail,0
26,Morningside Heights,Farmers Market,Yoga Studio,Vegetarian / Vegan Restaurant,Track,Gym / Fitness Center,0


In [121]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
7,East Harlem,Gym / Fitness Center,Martial Arts Dojo,Gym,Yoga Studio,College Gym,1
20,Lower East Side,Gym,Pool,Coffee Shop,Gym Pool,Gym / Fitness Center,1


In [122]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
15,Midtown,Gym,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Pilates Studio,Yoga Studio,2
22,Little Italy,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Gym,Yoga Studio,Mexican Restaurant,2
25,Manhattan Valley,Martial Arts Dojo,Gym,Yoga Studio,College Gym,Gym Pool,2


In [123]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
28,Battery Park City,Gym,Gym Pool,Trail,Yoga Studio,Coffee Shop,3
30,Carnegie Hill,Gym / Fitness Center,Gym,Gym Pool,Turkish Restaurant,Building,3


In [124]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
0,Marble Hill,Boxing Gym,Farmers Market,Yoga Studio,College Gym,Gym Pool,4
19,East Village,Vegetarian / Vegan Restaurant,Gym,Gym / Fitness Center,Health Food Store,American Restaurant,4
21,Tribeca,Gym,Yoga Studio,Vegetarian / Vegan Restaurant,Gym / Fitness Center,Cycle Studio,4
37,Stuyvesant Town,Gym / Fitness Center,Yoga Studio,Health & Beauty Service,Gym Pool,Gym,4


In [125]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 5, manhattan_merged.columns[[1] + list(range(4, manhattan_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
2,Washington Heights,Gym,Yoga Studio,Health & Beauty Service,Gym Pool,Gym / Fitness Center,5
4,Hamilton Heights,Yoga Studio,Vegetarian / Vegan Restaurant,Gym / Fitness Center,Farmers Market,Coffee Shop,5
5,Manhattanville,Gym,Climbing Gym,Yoga Studio,College Gym,Gym Pool,5
8,Upper East Side,Gym,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Martial Arts Dojo,Yoga Studio,5
11,Roosevelt Island,Gym / Fitness Center,Yoga Studio,Health & Beauty Service,Gym Pool,Gym,5
12,Upper West Side,Gym / Fitness Center,Gym,Trail,Boxing Gym,Yoga Studio,5
13,Lincoln Square,Gym,Gym / Fitness Center,Cycle Studio,Yoga Studio,Health & Beauty Service,5
16,Murray Hill,Gym,Gym / Fitness Center,Yoga Studio,Vegetarian / Vegan Restaurant,Medical Center,5
18,Greenwich Village,Gym / Fitness Center,Yoga Studio,Juice Bar,Pilates Studio,Vegetarian / Vegan Restaurant,5
27,Gramercy,Gym,Gym / Fitness Center,Yoga Studio,Vegetarian / Vegan Restaurant,Trail,5


Examining the clusters, we see that Cluster 0 and Cluster 5 have neighborhoods that have a good mix of the type of businesses and facilities that we were looking for. So, we can pick a neighborhood in these 2 clusters as location for the vegan restaurant in Manhattan.

Next, we will explore neighborhoods in Brooklyn

In [126]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
#brooklyn_data.head()

In [127]:
brooklyn_venues = getNearbyVenues(names = brooklyn_data['Neighborhood'],
                                   latitudes = brooklyn_data['Latitude'],
                                   longitudes = brooklyn_data['Longitude'])

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [128]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(204, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Brooklyn Market,40.626939,-74.029948,Grocery Store
1,Bay Ridge,40.625801,-74.030621,Crossfit Bridge & Tunnel: Bay Ridge,40.62408,-74.030963,Gym / Fitness Center
2,Bay Ridge,40.625801,-74.030621,Dahn Yoga,40.626374,-74.029975,Yoga Studio
3,Bay Ridge,40.625801,-74.030621,Walk For Choice,40.624844,-74.032347,Trail
4,Bay Ridge,40.625801,-74.030621,Bay Ridge Crossfit,40.624143,-74.030823,Gym / Fitness Center


In [129]:
brooklyn_venues.groupby('Neighborhood').count()
print(brooklyn_venues.groupby('Neighborhood').count().shape)

(48, 6)


In [130]:
print('There are {} uniques categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 28 uniques categories.


In [131]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")   

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = brooklyn_onehot.columns.tolist()
cols.insert(0, cols.pop(cols.index('Neighborhood')))
brooklyn_onehot = brooklyn_onehot.reindex(columns= cols)
brooklyn_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Boxing Gym,Café,Climbing Gym,Cycle Studio,Dance Studio,Farmers Market,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health Food Store,Italian Restaurant,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Office,Park,Pharmacy,Pilates Studio,Residential Building (Apartment / Condo),Supplement Shop,Trail,Vegetarian / Vegan Restaurant,Weight Loss Center,Yoga Studio
0,Bay Ridge,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bay Ridge,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
4,Bay Ridge,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [132]:
brooklyn_onehot.shape

(204, 29)

In [133]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Boxing Gym,Café,Climbing Gym,Cycle Studio,Dance Studio,Farmers Market,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health Food Store,Italian Restaurant,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Office,Park,Pharmacy,Pilates Studio,Residential Building (Apartment / Condo),Supplement Shop,Trail,Vegetarian / Vegan Restaurant,Weight Loss Center,Yoga Studio
0,Bay Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667
1,Bedford Stuyvesant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Boerum Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.25,0.0,0.125
3,Borough Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Brighton Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Brooklyn Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.375
6,Brownsville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
7,Bushwick,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333
8,Carroll Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.111111,0.0,0.222222
9,Clinton Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667


In [134]:
brooklyn_grouped.shape

(48, 29)

In [135]:
num_top_venues = 5

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bay Ridge----
                  venue  freq
0  Gym / Fitness Center  0.33
1           Yoga Studio  0.17
2                 Trail  0.17
3         Grocery Store  0.17
4                   Gym  0.17


----Bedford Stuyvesant----
                  venue  freq
0                   Gym  0.67
1  Gym / Fitness Center  0.33
2   American Restaurant  0.00
3     Martial Arts Dojo  0.00
4    Weight Loss Center  0.00


----Boerum Hill----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.25
1                    Yoga Studio  0.12
2                 Farmers Market  0.12
3                       Pharmacy  0.12
4           Gym / Fitness Center  0.12


----Borough Park----
                           venue  freq
0                 Farmers Market   1.0
1            American Restaurant   0.0
2              Martial Arts Dojo   0.0
3             Weight Loss Center   0.0
4  Vegetarian / Vegan Restaurant   0.0


----Brighton Beach----
                 venue  freq
0       Farmers Market   0

In [136]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brooklyn_venues_sorted = pd.DataFrame(columns=columns)
brooklyn_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    brooklyn_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

brooklyn_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Bay Ridge,Gym / Fitness Center,Yoga Studio,Trail,Grocery Store,Gym
1,Bedford Stuyvesant,Gym,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports
2,Boerum Hill,Vegetarian / Vegan Restaurant,Yoga Studio,Pharmacy,Office,Middle Eastern Restaurant
3,Borough Park,Farmers Market,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym
4,Brighton Beach,Farmers Market,Gym,Yoga Studio,Weight Loss Center,Athletics & Sports
5,Brooklyn Heights,Yoga Studio,Vegetarian / Vegan Restaurant,Pilates Studio,Farmers Market,Medical Center
6,Brownsville,Trail,Farmers Market,Yoga Studio,Gymnastics Gym,Athletics & Sports
7,Bushwick,Yoga Studio,Dance Studio,Gym,Weight Loss Center,Athletics & Sports
8,Carroll Gardens,Yoga Studio,Pilates Studio,Farmers Market,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant
9,Clinton Hill,Yoga Studio,Martial Arts Dojo,Gymnastics Gym,Athletics & Sports,Boxing Gym


In [137]:
# set number of clusters
kclusters = 6

brooklyn_grouped_clustering = brooklyn_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,init='random',n_init=100,random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
#print(kmeans.labels_.size)

brooklyn_merged = brooklyn_data
brooklyn_data.shape

# merge brooklyn_grouped with brooklyn_data to add latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(brooklyn_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
#drop rows with NaN values
brooklyn_merged.dropna(axis=0,how='any',inplace=True)

# add clustering labels
brooklyn_merged['Cluster Labels'] = kmeans.labels_
brooklyn_merged = brooklyn_merged.reset_index(drop=True)

brooklyn_merged.head() # check the last columns!
brooklyn_merged.to_csv("brooklyn.csv")

Visualize the clusters

In [138]:
# create map
map_clusters = folium.Map(location=[brooklyn_latitude, brooklyn_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examine the clusters

In [139]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]] 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
0,Bay Ridge,Gym / Fitness Center,Yoga Studio,Trail,Grocery Store,Gym,0
2,Greenpoint,Gym / Fitness Center,Yoga Studio,Gymnastics Gym,Vegetarian / Vegan Restaurant,Farmers Market,0
5,Manhattan Terrace,Vegetarian / Vegan Restaurant,Yoga Studio,Gymnastics Gym,Athletics & Sports,Boxing Gym,0
7,Windsor Terrace,Vegetarian / Vegan Restaurant,Dance Studio,Yoga Studio,Gymnastics Gym,Athletics & Sports,0
8,Prospect Heights,Yoga Studio,Farmers Market,Gym / Fitness Center,Weight Loss Center,Athletics & Sports,0
10,Williamsburg,Gym,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,0
13,Brooklyn Heights,Yoga Studio,Vegetarian / Vegan Restaurant,Pilates Studio,Farmers Market,Medical Center,0
14,Cobble Hill,Athletics & Sports,Pilates Studio,Yoga Studio,Gymnastics Gym,Boxing Gym,0
21,Starrett City,Weight Loss Center,Gym,Gym / Fitness Center,Gym Pool,Yoga Studio,0
24,Gerritsen Beach,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,0


In [140]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
1,Sunset Park,Supplement Shop,Yoga Studio,Gymnastics Gym,Athletics & Sports,Boxing Gym,1
15,Carroll Gardens,Yoga Studio,Pilates Studio,Farmers Market,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,1
18,Fort Greene,Yoga Studio,Farmers Market,Weight Loss Center,Athletics & Sports,Boxing Gym,1
20,Cypress Hills,Farmers Market,Martial Arts Dojo,Yoga Studio,Gymnastics Gym,Athletics & Sports,1
31,Georgetown,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,1
39,Paerdegat Basin,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,Café,1
44,Dumbo,Gym,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Yoga Studio,Martial Arts Dojo,1
46,Highland Park,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,1


In [141]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
16,Red Hook,Yoga Studio,Gym / Fitness Center,Martial Arts Dojo,Gymnastics Gym,Athletics & Sports,2
29,Prospect Lefferts Gardens,Yoga Studio,Vegetarian / Vegan Restaurant,Gym,Gymnastics Gym,Athletics & Sports,2
47,Erasmus,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,Café,2


In [142]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
9,Brownsville,Trail,Farmers Market,Yoga Studio,Gymnastics Gym,Athletics & Sports,3
17,Gowanus,Yoga Studio,Gym,Martial Arts Dojo,Gymnastics Gym,Athletics & Sports,3
19,Park Slope,Yoga Studio,Gym,Gym / Fitness Center,Weight Loss Center,Athletics & Sports,3
34,South Side,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Gym,Yoga Studio,Pilates Studio,3


In [143]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
3,Gravesend,Martial Arts Dojo,Gym,Gym / Fitness Center,Yoga Studio,Gymnastics Gym,4
4,Brighton Beach,Farmers Market,Gym,Yoga Studio,Weight Loss Center,Athletics & Sports,4
6,Flatbush,Gym,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,4
11,Bushwick,Yoga Studio,Dance Studio,Gym,Weight Loss Center,Athletics & Sports,4
40,Mill Basin,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,4


In [144]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 5, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
12,Bedford Stuyvesant,Gym,Gym / Fitness Center,Yoga Studio,Weight Loss Center,Athletics & Sports,5
22,Borough Park,Farmers Market,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,5
23,Dyker Heights,Gym,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,5
27,Downtown,Gym / Fitness Center,Vegetarian / Vegan Restaurant,Gym,Supplement Shop,Residential Building (Apartment / Condo),5
30,Midwood,Gym,Yoga Studio,Weight Loss Center,Athletics & Sports,Boxing Gym,5
32,East Williamsburg,Vegetarian / Vegan Restaurant,Café,Yoga Studio,Gymnastics Gym,Athletics & Sports,5


Examining the clusters, we see that Cluster 0 and Cluster 5 have neighborhoods that have a good mix of the type of businesses and facilities that we were looking for. So, we can pick a neighborhood in these 3 clusters as location for the vegan restaurant in Brooklyn.