# Introduction and Business Problem Statement


New York City is the most populous city in the United States, and Manhattan is its most densely populated borough. As part of the Coursera Applied Data Science Capstone Project by IBM, we examine Manhattan's drink venues and determine the optimal location for a new one.

Our business stakeholder is interested in opening a food or drink place in Manhattan but is worried about the current pandemic. The stakeholder has therefore asked us to research the market and suggest one type of venue to open in Manhattan. According to the Yelp report in September, there were 32,109 closures as of August 31, with 19,590 restaurants across the nation having permanently shut their doors since March. Yet new restaurants are still opening despite the pandemic. Many studies have found that restaurants that work well for delivery and takeout, including food trucks, bakeries and coffee shops, have been able to keep their closure rates lower than others.

Hence, we suggest that our stakeholder open a drink venue. The aim of this project is to identify an optimal location for a drink venue in New York City during the COVID-19 pandemic. In this report, we focus on all neighbourhoods in Manhattan.


Data
1. NYC Boroughs/Neighborhood Geospatial Dataset 
2. Foursquare venue data through the Foursquare API


Let's get a brief overview of the structure of New York City.


# Section 1. Explore the Manhattan neighborhoods dataset and Foursquare

# 1.1 Manhattan Neighborhoods

In [2]:
# import all the required libraries
import pandas as pd
import numpy as np
import geopy
import requests
from geopy.geocoders import Nominatim
import json # library to handle JSON files
import wget

print('Libraries imported')

Libraries imported


In [3]:
# Import Folium to display maps
import folium
print('Folium Library imported')

Folium Library imported


In [4]:
# Let's download and explore the above mentioned datasets
# Open the json file containing NYC data and display a feature
with open ('C:\\Users\\czhang\\Coursera\\newyork_data.json') as json_data:
    newyork_data = json.load(json_data) 

In [5]:
#newyork_data

In [6]:
# define the dataframe columns
column_names = ['Borough','Neighborhood', 'Latitude', 'Longitude'] 
# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
neighborhoods_data=newyork_data['features']
#neighborhoods_data

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [9]:
## Store only the required fields from the GeoJSON features in a dataframe
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
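The layout that the loop expects from each feature can be illustrated without the data file. The sketch below uses a hand-made feature whose values come from the first row of the resulting dataframe; the exact set of keys in the real file is an assumption, and `feature_to_row` is a hypothetical helper, not part of the notebook.

```python
# Hand-made feature mimicking the fields the loop reads (assumed layout)
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

def feature_to_row(feature):
    """Hypothetical helper: pull the four dataframe columns out of one feature."""
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order is [lon, lat]
    props = feature['properties']
    return {'Borough': props['borough'], 'Neighborhood': props['name'],
            'Latitude': lat, 'Longitude': lon}

row = feature_to_row(sample_feature)
print(row)
```

The point worth noticing is the coordinate order: longitude comes first in GeoJSON, which is why the notebook reads index 1 for latitude and index 0 for longitude.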

In [10]:
#quickly examine the resulting dataframe
#neighborhoods.Borough.unique()
#neighborhoods = neighborhoods['Borough']=='Manhattan'
#neighborhoods.head()

In [11]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [12]:
manhattan_neighborhoods = neighborhoods.loc[neighborhoods['Borough'] == 'Manhattan']
manhattan_neighborhoods = manhattan_neighborhoods.reset_index(drop=True)
manhattan_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [13]:
#neighborhoods.shape
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(manhattan_neighborhoods['Borough'].unique()),
        manhattan_neighborhoods.shape[0]
    )
)

The dataframe has 1 boroughs and 40 neighborhoods.


In [14]:
#use geopy library to get the latitude and longitude values of new york city
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [15]:
#create a map of new york using the above coordinates
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(manhattan_neighborhoods['Latitude'], manhattan_neighborhoods['Longitude'], manhattan_neighborhoods['Borough'], manhattan_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

# 1.2 Foursquare API 

Now we are going to use the Foursquare API to explore the neighborhoods and segment them.

According to the Foursquare API documentation, coffee shops and cafés are listed among the Venue Categories. The category ID for Coffee Shop is 4bf58dd8d48988d1e0931735, which falls under the Restaurant main category. Several other categories also cover related drink services:

Tea Room: 4bf58dd8d48988d1dc931735

Bubble Tea Shop: 52e81612bcbc57f1066b7a0c
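The category IDs above can be kept in one lookup table and turned into a request URL with the standard library. This is only a sketch: `CATEGORY_IDS` and `build_explore_url` are hypothetical names, the credentials are placeholders, and the endpoint and parameter names are the ones the notebook itself uses in its request.

```python
from urllib.parse import urlencode

# Category IDs quoted in the text above
CATEGORY_IDS = {
    'Coffee Shop': '4bf58dd8d48988d1e0931735',
    'Tea Room': '4bf58dd8d48988d1dc931735',
    'Bubble Tea Shop': '52e81612bcbc57f1066b7a0c',
}

def build_explore_url(lat, lng, category, client_id, client_secret,
                      version='20201029', radius=500, limit=100):
    """Hypothetical helper: assemble a venues/explore URL for one location."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'categoryId': CATEGORY_IDS[category].strip(),  # guard against stray spaces
        'limit': limit,
    }
    # urlencode escapes each value, so no manual string formatting is needed
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(40.715618, -73.994279, 'Bubble Tea Shop',
                        client_id='PLACEHOLDER_ID', client_secret='PLACEHOLDER_SECRET')
print(url)
```

Passing the category by name rather than by a global variable makes it harder to query the wrong category by accident.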

In [16]:
## Set up credentials for the Foursquare API
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret (redacted)
VERSION = '20201029' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            categoryId,
            LIMIT)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
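The function above indexes straight into `["response"]['groups'][0]['items']`, which raises an error if a response has no groups. A more defensive parsing step might look like the sketch below. `parse_explore_items` is a hypothetical helper and the payload is hand-made to mimic the nesting the notebook relies on; it is not a real API response.

```python
def parse_explore_items(name, lat, lng, payload):
    """Hypothetical helper: flatten one explore response into row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or [{}]  # tolerate a missing category
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0].get('name')))
    return rows

# Hand-made payload with the same nesting (values from the Chinatown results)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Teado',
               'location': {'lat': 40.717184, 'lng': -73.994883},
               'categories': [{'name': 'Bubble Tea Shop'}]}}]}]}}

found = parse_explore_items('Chinatown', 40.715618, -73.994279, sample)
empty = parse_explore_items('Nowhere', 0.0, 0.0, {'response': {}})
print(found)
print(empty)
```

Separating the parsing from the HTTP call also makes it possible to check the logic without network access or credentials.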

# 1.2.1 Create a Bubble Tea Shop Venues List

In [18]:
## Request and store nearby bubble tea venues for each neighborhood

LIMIT = 100
radius = 500
categoryId = '52e81612bcbc57f1066b7a0c' #bubble tea

bubbletea_venues = getNearbyVenues(names=manhattan_neighborhoods['Neighborhood'],
                                   latitudes=manhattan_neighborhoods['Latitude'],
                                   longitudes=manhattan_neighborhoods['Longitude']
                                  )
print('Neighborhood Venues Downloaded')


Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
Neighborhood Venues Downloaded


In [19]:
#bubbletea_venues.head()
bubbletea_venues.shape

(255, 7)
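To get a feel for what `radius = 500` means on the ground, the great-circle distance between a neighborhood center and a venue can be computed with the haversine formula. The sketch below uses the Chinatown center and the coordinates of one bubble tea venue returned for it; `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean value.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

chinatown = (40.715618, -73.994279)   # neighborhood center
teado = (40.717184, -73.994883)       # one venue returned for Chinatown
distance = haversine_m(*chinatown, *teado)
print(round(distance), distance <= 500)
```

Because neighboring centers in Manhattan can be less than a kilometre apart, 500 m circles may overlap, so the same venue can be attributed to more than one neighborhood.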

# 1.2.2 Create a Coffee Shop Venues List

In [20]:
## Request and store nearby coffee shop venues for each neighborhood

LIMIT = 100
radius = 500
categoryId = '4bf58dd8d48988d1e0931735' #coffee shop

coffee_venues = getNearbyVenues(names=manhattan_neighborhoods['Neighborhood'],
                                   latitudes=manhattan_neighborhoods['Latitude'],
                                   longitudes=manhattan_neighborhoods['Longitude']
                                  )
print('Neighborhood Venues Downloaded')

Neighborhood Venues Downloaded


In [21]:
coffee_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
1,Marble Hill,40.876551,-73.91066,Starbucks,40.873755,-73.908613,Coffee Shop
2,Marble Hill,40.876551,-73.91066,Starbucks,40.873234,-73.90873,Coffee Shop
3,Chinatown,40.715618,-73.994279,Yaya Tea Garden,40.716177,-73.994902,Tea Room
4,Chinatown,40.715618,-73.994279,Little Canal,40.714317,-73.990361,Coffee Shop


# 1.2.3 Create a Tea Shop Venues List

In [22]:
## Request and store nearby tea shop venues for each neighborhood

LIMIT = 100
radius = 500
categoryId = '4bf58dd8d48988d1dc931735' #tea shop

teashop_venues = getNearbyVenues(names=manhattan_neighborhoods['Neighborhood'],
                                   latitudes=manhattan_neighborhoods['Latitude'],
                                   longitudes=manhattan_neighborhoods['Longitude']
                                  )
print('Neighborhood Venues Downloaded')

Neighborhood Venues Downloaded


In [23]:
#teashop_venues.head()
teashop_venues.shape

(149, 7)

# 1.2.4 Combine All Three Venues List Together

In this section, we combine all three venue lists and clean up the result.

In [24]:
manhattan_venues = pd.concat([bubbletea_venues,coffee_venues,teashop_venues] )

In [25]:
manhattan_venues['Venue Category'].unique()

array(['Bubble Tea Shop', 'Tea Room', 'Chinese Restaurant', 'Bakery',
       'Juice Bar', 'Ice Cream Shop', 'Dessert Shop', 'Café',
       'Cha Chaan Teng', 'Dim Sum Restaurant', 'Dumpling Restaurant',
       'Asian Restaurant', 'Frozen Yogurt Shop', 'Japanese Restaurant',
       'Vietnamese Restaurant', 'Noodle House', 'Cafeteria',
       'Coffee Shop', 'Sandwich Place', 'Food Truck',
       'Indian Chinese Restaurant', 'Pharmacy', 'Cocktail Bar',
       'Accessories Store', 'Gourmet Shop'], dtype=object)
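Because the three queries overlap, the same venue can appear in more than one list; Yaya Tea Garden, for instance, shows up in both the bubble tea and the coffee shop results. The notebook does not show a deduplication step, so the following is an optional sketch on a few hand-copied rows, assuming that neighborhood, venue name and coordinates together identify a venue.

```python
import pandas as pd

cols = ['Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
bubbletea = pd.DataFrame([
    ['Chinatown', 'Teado', 40.717184, -73.994883, 'Bubble Tea Shop'],
    ['Chinatown', 'Yaya Tea Garden', 40.716177, -73.994902, 'Tea Room'],
], columns=cols)
coffee = pd.DataFrame([
    ['Chinatown', 'Yaya Tea Garden', 40.716177, -73.994902, 'Tea Room'],
    ['Chinatown', 'Little Canal', 40.714317, -73.990361, 'Coffee Shop'],
], columns=cols)

# Stack the lists, then keep the first occurrence of each venue
combined = pd.concat([bubbletea, coffee], ignore_index=True)
deduped = combined.drop_duplicates(
    subset=['Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude']
).reset_index(drop=True)

print(len(combined), len(deduped))
```

Whether to deduplicate depends on the question being asked: leaving duplicates in inflates the counts of categories that several queries return.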

# Section 2. Analyzing Data

# 2.1 Data Cleansing 

After merging the three venue lists, we can see that many different venue categories are still present. In the following steps, we examine the data and keep only the five most frequent categories (coffee shop, café, tea room, bubble tea shop and bakery).

In [26]:
d1 = manhattan_venues.groupby(['Venue Category']).size().reset_index(name='Count').sort_values(['Count'], ascending = False)
d1.head(5)

Unnamed: 0,Venue Category,Count
9,Coffee Shop,711
5,Café,627
23,Tea Room,233
3,Bubble Tea Shop,147
2,Bakery,21
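The same ranking can be obtained more compactly with `value_counts`, and its index can then drive the filter. The sketch below runs on a tiny made-up series rather than the real data, so the counts are illustrative only.

```python
import pandas as pd

# Made-up category column standing in for manhattan_venues['Venue Category']
categories = pd.Series(['Coffee Shop', 'Café', 'Coffee Shop', 'Tea Room',
                        'Coffee Shop', 'Café', 'Pharmacy'], name='Venue Category')

# value_counts sorts by frequency, so head(n) gives the n most common categories
top = categories.value_counts().head(2)
print(top)

# Keep only the rows whose category is among the top ones
keep = categories[categories.isin(top.index)]
print(len(keep))
```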


Based on the counts above, we keep the five most frequent categories: 'Coffee Shop', 'Café', 'Tea Room', 'Bubble Tea Shop' and 'Bakery'. Note that cafés and bakeries are not beverage-only venues, which should be kept in mind when interpreting the results.

In [28]:
drinks_venues = manhattan_venues.loc[manhattan_venues['Venue Category'].isin(['Tea Room','Coffee Shop','Bubble Tea Shop','Café','Bakery'])]
drinks_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Chinatown,40.715618,-73.994279,Teado,40.717184,-73.994883,Bubble Tea Shop
1,Chinatown,40.715618,-73.994279,Kung Fu Tea (功夫茶),40.717119,-73.994562,Bubble Tea Shop
2,Chinatown,40.715618,-73.994279,Vivi Bubble Tea,40.715353,-73.997424,Bubble Tea Shop
3,Chinatown,40.715618,-73.994279,Ten Ren's Tea Time,40.716323,-73.998039,Bubble Tea Shop
4,Chinatown,40.715618,-73.994279,Yaya Tea Garden,40.716177,-73.994902,Tea Room


Let's show the locations of these venues on Manhattan Island.

In [29]:
#create a map of new york using the above coordinates
map_drinkvenues = folium.Map(location=[latitude, longitude], zoom_start=10)

# marker outline and fill colors for each venue category
category_colors = {
    'Tea Room': ('green', '#78f76d'),
    'Bubble Tea Shop': ('blue', '#3186cc'),
    'Coffee Shop': ('yellow', '#fcba03'),
    'Café': ('red', '#ff3355'),
    'Bakery': ('purple', '#f98fff'),
}

# add markers to map
for lat, lng, neighborhood, category in zip(drinks_venues['Venue Latitude'],
                                            drinks_venues['Venue Longitude'],
                                            drinks_venues['Neighborhood'],
                                            drinks_venues['Venue Category']):
    if category not in category_colors:
        continue
    color, fill_color = category_colors[category]
    label = folium.Popup('{}'.format(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=fill_color,
        fill_opacity=0.7).add_to(map_drinkvenues)
map_drinkvenues

In [30]:
drinks_venues['Venue'].unique()

array(['Teado', 'Kung Fu Tea (功夫茶)', 'Vivi Bubble Tea', ..., 'Tea Magic',
       'Jou Jou Cafe - Financial District',
       'Radiance - Fine Asian Cuisine'], dtype=object)

# 2.2 One Hot Encoding

In [31]:
neighborhoods_onehot = pd.get_dummies(drinks_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neighborhoods_onehot['Neighborhood'] = drinks_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neighborhoods_onehot.columns[-1]] + list(neighborhoods_onehot.columns[:-1])
neighborhoods_onehot = neighborhoods_onehot[fixed_columns]

neighborhoods_onehot.head()

Unnamed: 0,Neighborhood,Bakery,Bubble Tea Shop,Café,Coffee Shop,Tea Room
0,Chinatown,0,1,0,0,0
1,Chinatown,0,1,0,0,0
2,Chinatown,0,1,0,0,0
3,Chinatown,0,1,0,0,0
4,Chinatown,0,0,0,0,1


In [32]:
neighborhoods_onehot.shape

(1739, 6)

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the relative frequency of each category within the neighborhood.

In [33]:
drinks_grouped = neighborhoods_onehot.groupby('Neighborhood').mean().reset_index()
drinks_grouped

Unnamed: 0,Neighborhood,Bakery,Bubble Tea Shop,Café,Coffee Shop,Tea Room
0,Battery Park City,0.0,0.030303,0.363636,0.545455,0.060606
1,Carnegie Hill,0.0,0.027778,0.416667,0.527778,0.027778
2,Central Harlem,0.0,0.0,0.6,0.4,0.0
3,Chelsea,0.0,0.0,0.372549,0.490196,0.137255
4,Chinatown,0.067669,0.233083,0.195489,0.180451,0.323308
5,Civic Center,0.0,0.069767,0.325581,0.55814,0.046512
6,Clinton,0.0,0.054054,0.324324,0.540541,0.081081
7,East Harlem,0.0,0.0,0.714286,0.285714,0.0
8,East Village,0.0,0.092593,0.314815,0.37037,0.222222
9,Financial District,0.0,0.086022,0.290323,0.483871,0.139785
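Averaging one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. The sketch below checks this on made-up data (neighborhoods `A` and `B` are hypothetical) by comparing the group mean against a row-normalised `pd.crosstab`.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Café', 'Tea Room',
                       'Café', 'Café'],
})

# One-hot encode, then average per neighborhood (the approach used above)
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
grouped = onehot.groupby('Neighborhood').mean()

# Equivalent shortcut: contingency table normalised so each row sums to 1
shares = pd.crosstab(venues['Neighborhood'], venues['Venue Category'], normalize='index')

print(grouped)
```

In neighborhood `A`, two of four venues are coffee shops, so its Coffee Shop value is 0.5; every row sums to 1, which is why the values can be read as shares.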


In [87]:
drinks_grouped.shape

(40, 4)

Create a dataframe from the above data that ranks the top 5 venue categories for each neighborhood.

In [34]:
## Return most common venues for each neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
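As a quick check of the function, it can be applied to a single row. The sketch below rebuilds the Chinatown row from the grouped table by hand and asks for its top three categories.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Chinatown frequencies copied from the grouped table
chinatown = pd.Series({
    'Neighborhood': 'Chinatown', 'Bakery': 0.067669, 'Bubble Tea Shop': 0.233083,
    'Café': 0.195489, 'Coffee Shop': 0.180451, 'Tea Room': 0.323308})

top3 = list(return_most_common_venues(chinatown, 3))
print(top3)
```

One caveat: when several categories share the same frequency (for example several zeros), their relative order in the ranking is not meaningful, so lower-ranked columns should be read with care.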

In [35]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # fall back to 'th' once the explicit suffixes run out
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
drinks_venues_sorted = pd.DataFrame(columns=columns)
drinks_venues_sorted['Neighborhood'] = drinks_grouped['Neighborhood']

for ind in np.arange(drinks_grouped.shape[0]):
    drinks_venues_sorted.iloc[ind, 1:] = return_most_common_venues(drinks_grouped.iloc[ind, :], num_top_venues)

drinks_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Battery Park City,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
1,Carnegie Hill,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
2,Central Harlem,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
3,Chelsea,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
4,Chinatown,Tea Room,Bubble Tea Shop,Café,Coffee Shop,Bakery
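The column labels rely on an exception to switch from 'st', 'nd', 'rd' to 'th'. An explicit helper is one alternative; the sketch below, with the hypothetical name `ordinal`, also handles cases such as 11th and 21st that the notebook never reaches with five columns.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 6)]
print(columns)
```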


# 2.3 Clustering Data

We apply K-means clustering, an unsupervised machine learning algorithm, to group the neighborhoods by the relative frequency of the five venue categories. This gives us a better understanding of the similarities and dissimilarities between the neighborhoods and yields more insight.

In [36]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
print('Libraries Imported')

Libraries Imported


In [37]:
# set number of clusters
kclusters = 6

neighborhoods_grouped_clustering = drinks_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoods_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 4, 2, 1, 1, 0, 4, 4])
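To make clear what the clustering step does with the frequency rows, here is a minimal NumPy sketch of the basic K-means loop (Lloyd's algorithm). It is not scikit-learn's implementation, which adds smarter initialisation among other things; the function below and the two-column toy data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # start from k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy rows of (Café share, Coffee Shop share); the values are made up
X = np.array([[0.70, 0.30], [0.65, 0.35], [0.60, 0.40],
              [0.30, 0.70], [0.35, 0.65], [0.40, 0.60]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

The café-heavy rows end up in one cluster and the coffee-heavy rows in the other. The label numbers themselves are arbitrary, which is also true of the cluster labels produced in the notebook.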

In [38]:
# Code to enable display of all rows and columns in df
pd.set_option('display.max_columns', None) 
pd.set_option('display.max_rows', None)

In [39]:
drinks_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Battery Park City,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
1,Carnegie Hill,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
2,Central Harlem,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
3,Chelsea,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
4,Chinatown,Tea Room,Bubble Tea Shop,Café,Coffee Shop,Bakery


In [40]:
#drinks_venues_sorted = drinks_venues_sorted.drop(['Cluster Labels'], axis=1)
#drinks_venues_sorted.head(10)

#add clustering labels
drinks_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
drinks_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,Battery Park City,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
1,1,Carnegie Hill,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
2,0,Central Harlem,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
3,4,Chelsea,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
4,2,Chinatown,Tea Room,Bubble Tea Shop,Café,Coffee Shop,Bakery


In [41]:
neighborhoods_merged = manhattan_neighborhoods

# merge neighborhoods_grouped with neighborhoods_data to add latitude/longitude for each neighborhood
neighborhoods_merged = neighborhoods_merged.join(drinks_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neighborhoods_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Coffee Shop,Tea Room,Café,Bubble Tea Shop,Bakery
1,Manhattan,Chinatown,40.715618,-73.994279,2,Tea Room,Bubble Tea Shop,Café,Coffee Shop,Bakery
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
3,Manhattan,Inwood,40.867684,-73.92121,0,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery


Create a map of the clusters

In [42]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map from neighborhood_merged
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoods_merged['Latitude'], 
                                  neighborhoods_merged['Longitude'], 
                                  neighborhoods_merged['Neighborhood'], 
                                  neighborhoods_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
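The color setup above only needs one distinct color per cluster. As a dependency-free stand-in for the matplotlib colormap, the sketch below builds evenly spaced hues with the standard library; `cluster_palette` is a hypothetical helper and the resulting colors differ from those of `cm.rainbow`.

```python
import colorsys

def cluster_palette(n):
    """Hypothetical helper: n evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(6)
print(palette)
# cluster labels 0..5 can index the palette directly
print(palette[0], palette[5])
```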

# Section 3. Results and Discussion

# 3.1 Explore Data in Each Cluster

Cluster One

In [43]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 0, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Washington Heights,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
3,Inwood,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
6,Central Harlem,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
7,East Harlem,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
13,Lincoln Square,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
36,Tudor City,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery


Cluster Two

In [44]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 1, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
8,Upper East Side,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
10,Lenox Hill,Coffee Shop,Café,Bubble Tea Shop,Tea Room,Bakery
11,Roosevelt Island,Coffee Shop,Café,Bubble Tea Shop,Tea Room,Bakery
12,Upper West Side,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
14,Clinton,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
15,Midtown,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
16,Murray Hill,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
28,Battery Park City,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
30,Carnegie Hill,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
32,Civic Center,Coffee Shop,Café,Bubble Tea Shop,Tea Room,Bakery


Cluster Three

In [45]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 2, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Chinatown,Tea Room,Bubble Tea Shop,Café,Coffee Shop,Bakery
20,Lower East Side,Café,Tea Room,Coffee Shop,Bubble Tea Shop,Bakery
22,Little Italy,Café,Coffee Shop,Bubble Tea Shop,Tea Room,Bakery
27,Gramercy,Café,Coffee Shop,Bubble Tea Shop,Tea Room,Bakery


Cluster Four

In [46]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 3, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Marble Hill,Coffee Shop,Tea Room,Café,Bubble Tea Shop,Bakery
37,Stuyvesant Town,Coffee Shop,Tea Room,Café,Bubble Tea Shop,Bakery


Cluster Five

In [47]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 4, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Hamilton Heights,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
9,Yorkville,Coffee Shop,Tea Room,Café,Bubble Tea Shop,Bakery
17,Chelsea,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
18,Greenwich Village,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
19,East Village,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
23,Soho,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
24,West Village,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
25,Manhattan Valley,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
29,Financial District,Coffee Shop,Café,Tea Room,Bubble Tea Shop,Bakery
31,Noho,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery


Cluster Six

In [48]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 5, 
                         neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Manhattanville,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
21,Tribeca,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
26,Morningside Heights,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
33,Midtown South,Café,Coffee Shop,Bubble Tea Shop,Tea Room,Bakery
35,Turtle Bay,Café,Coffee Shop,Tea Room,Bubble Tea Shop,Bakery
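The six near-identical cells above can also be condensed into one summary per cluster. The sketch below runs on five rows copied from the merged table, so its numbers describe only that small sample; the output column names `neighborhoods` and `top_venue` are hypothetical.

```python
import pandas as pd

# Five rows copied from the merged table (first five Manhattan neighborhoods)
merged = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Chinatown', 'Washington Heights',
                     'Inwood', 'Hamilton Heights'],
    'Cluster Labels': [3, 2, 0, 0, 4],
    '1st Most Common Venue': ['Coffee Shop', 'Tea Room', 'Café', 'Café', 'Coffee Shop'],
})

# Count neighborhoods per cluster and report the most frequent top venue
summary = (merged.groupby('Cluster Labels')
                 .agg(neighborhoods=('Neighborhood', 'count'),
                      top_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0])))
print(summary)
```

A table like this makes it easier to compare clusters side by side before looking at the individual neighborhoods.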
