# New theater location selection in NYC
## INTRODUCTION
### This project investigates potential locations for a new theater investment in NYC. To find candidate places, a clustering technique will be applied to rank venues, in particular to discover where existing theaters are located. The top-ranked area will then be chosen as the target location.
## Data
### Foursquare location data will be used to complete the "New theater location selection in NYC" project. Manhattan (New York County) is the target borough, and its venues are explored through the Foursquare developer API. This data source was chosen because 1) the data are convenient to obtain and 2) the venue information is up to date and precise. For example, a regular Foursquare venue search can be used to explore Manhattan, NY.

### Import libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Download and Explore Dataset

In [5]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [6]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [10]:
newyork_data # the output is too long, so it is not rendered here

In [11]:
neighborhoods_data = newyork_data['features']

In [12]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [13]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [14]:
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


Then let's loop through the data, collect one record per neighborhood, and build the dataframe from those records.

In [17]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough'] # 'borough' field inside the 'properties' dict
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
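The same transformation can also be sketched with `pd.json_normalize`, which flattens the nested dictionaries into dotted column names such as `properties.borough` and `geometry.coordinates`. This is a minimal sketch, assuming pandas 1.0 or later; `features_to_frame` is a hypothetical helper name, and the sample is the Wakefield feature shown above, trimmed to the fields used here.

```python
import pandas as pd

def features_to_frame(features):
    """Flatten GeoJSON-style neighborhood features into Borough/Neighborhood/Latitude/Longitude."""
    flat = pd.json_normalize(features)  # nested keys become dotted column names
    coords = flat['geometry.coordinates']
    return pd.DataFrame({
        'Borough': flat['properties.borough'],
        'Neighborhood': flat['properties.name'],
        'Latitude': coords.map(lambda c: c[1]),   # GeoJSON order is [lon, lat]
        'Longitude': coords.map(lambda c: c[0]),
    })

# the first feature of the downloaded file, trimmed to the fields used here
sample = [{
    'type': 'Feature',
    'id': 'nyu_2451_34572.1',
    'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}]

print(features_to_frame(sample))
```

On the full file this would be called as `features_to_frame(neighborhoods_data)`; it avoids the Python-level loop, at the cost of creating a wide intermediate frame.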

For illustration purposes, let's segment and cluster only the neighborhoods in Manhattan. To do so, we slice the original dataframe into a new dataframe that holds only the Manhattan data.

In [77]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
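The Manhattan slice described above amounts to a boolean mask on the `Borough` column. A minimal sketch, where `select_borough` is a hypothetical helper and the sample data are the five rows displayed above (all in the Bronx, so filtering for Manhattan on them returns nothing):

```python
import pandas as pd

def select_borough(neighborhoods, borough):
    """Return only the rows of one borough, re-indexed from 0."""
    mask = neighborhoods['Borough'] == borough  # boolean Series, one entry per row
    return neighborhoods[mask].reset_index(drop=True)

# the five rows displayed above
head = pd.DataFrame({
    'Borough': ['Bronx'] * 5,
    'Neighborhood': ['Wakefield', 'Co-op City', 'Eastchester', 'Fieldston', 'Riverdale'],
    'Latitude': [40.894705, 40.874294, 40.887556, 40.895437, 40.890834],
    'Longitude': [-73.847201, -73.829939, -73.827806, -73.905643, -73.912585],
})

print(len(select_borough(head, 'Bronx')))      # 5
print(len(select_borough(head, 'Manhattan')))  # 0
```

On the full dataframe, `select_borough(neighborhoods, 'Manhattan')` would produce the Manhattan-only data used for the map.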


Let's get the geographical coordinates of New York City, which serve as the center of the maps and of the Foursquare search below.

In [76]:
address = 'New York City'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of NYC are {}, {}.'.format(latitude, longitude))

The geographical coordinates of NYC are 40.7127281, -74.0060152.


Let's visualize Manhattan and the neighborhoods in it.

In [20]:
# slice out the Manhattan neighborhoods from the full dataframe
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each Manhattan neighborhood
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_manhattan)
    
map_manhattan

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version


### Let's search all theaters in New York City, NY

In [89]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
search_query = 'theater'
radius = 15000 # search radius in meters
LIMIT = 30 # maximum number of venues returned

# build the venue-search URL from the parameters above
url = ('https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}'
       '&ll={},{}&v={}&query={}&radius={}&limit={}').format(
    CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe.head()


| | id | name | categories | referralId | hasPerk | location.address | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress | location.crossStreet | venuePage.id | location.neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4ea89e6702d5b8174e85e65e | 20 Thomas St. (The Flea Theater) | [{'id': '4bf58dd8d48988d137941735', 'name': 'T... | v-1594642768 | False | 20 Thomas St | 40.716161 | -74.005826 | [{'label': 'display', 'lat': 40.71616108888684... | 382 | 10007 | US | New York | NY | United States | [20 Thomas St, New York, NY 10007, United States] | | | |
| 1 | 4bf470e194af2d7f0a9e3a72 | Schaeberle Studio Theater | [{'id': '4bf58dd8d48988d1ac941735', 'name': 'C... | v-1594642768 | False | 41 Park Row | 40.711158 | -74.005322 | [{'label': 'display', 'lat': 40.71115829255644... | 184 | 10038 | US | New York | NY | United States | [41 Park Row (at Pace University), New York, N... | at Pace University | | |
| 2 | 4be070a3652b0f47c7727211 | Theater Bar | [{'id': '4bf58dd8d48988d11e941735', 'name': 'C... | v-1594642768 | False | 114 Franklin St | 40.718688 | -74.0063 | [{'label': 'display', 'lat': 40.71868815941004... | 663 | 10013 | US | New York | NY | United States | [114 Franklin St (btw W Bway & Church St), New... | btw W Bway & Church St | | |
| 3 | 4b032755f964a5200f4d22e3 | iHeartRadio Theater | [{'id': '4bf58dd8d48988d1e5931735', 'name': 'M... | v-1594642768 | False | 32 Avenue of the Americas | 40.720312 | -74.00515 | [{'label': 'display', 'lat': 40.72031234030413... | 847 | 10013 | US | New York | NY | United States | [32 Avenue of the Americas (at Walker St), New... | at Walker St | 32694113.0 | |
| 4 | 4c3d0ec67c1ee21e017e8d71 | Royal Theater Cinema | [{'id': '4bf58dd8d48988d17f941735', 'name': 'M... | v-1594642768 | False | 63 Wall St | 40.705627 | -74.008774 | [{'label': 'display', 'lat': 40.70562744140625... | 824 | 10005 | US | New York | NY | United States | [63 Wall St (Pearl St), New York, NY 10005, Un... | Pearl St | | |
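Building the query string by hand makes it easy for a stray space (for example in `ll`) or a hard-coded value to slip in. A minimal sketch of assembling the same search URL with the standard library's `urllib.parse.urlencode`, assuming the endpoint and parameter names already used in the notebook; `build_search_url` is a hypothetical helper and the credentials are placeholders:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble a venues/search URL; every value comes from the arguments."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',  # "lat,lng" with no space
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "ll" literal instead of encoding it as %2C
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')

url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                       40.7127281, -74.0060152, '20180605', 'theater', 15000, 30)
print(url)
```

An equivalent option is to pass the same dictionary as `requests.get(SEARCH_ENDPOINT, params=params)` and let `requests` do the encoding.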


In [90]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignments below act on an independent dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

| | name | categories | address | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | crossStreet | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 Thomas St. (The Flea Theater) | Theater | 20 Thomas St | 40.716161 | -74.005826 | [{'label': 'display', 'lat': 40.71616108888684... | 382 | 10007.0 | US | New York | NY | United States | [20 Thomas St, New York, NY 10007, United States] | | | 4ea89e6702d5b8174e85e65e |
| 1 | Schaeberle Studio Theater | College Theater | 41 Park Row | 40.711158 | -74.005322 | [{'label': 'display', 'lat': 40.71115829255644... | 184 | 10038.0 | US | New York | NY | United States | [41 Park Row (at Pace University), New York, N... | at Pace University | | 4bf470e194af2d7f0a9e3a72 |
| 2 | Theater Bar | Cocktail Bar | 114 Franklin St | 40.718688 | -74.0063 | [{'label': 'display', 'lat': 40.71868815941004... | 663 | 10013.0 | US | New York | NY | United States | [114 Franklin St (btw W Bway & Church St), New... | btw W Bway & Church St | | 4be070a3652b0f47c7727211 |
| 3 | iHeartRadio Theater | Music Venue | 32 Avenue of the Americas | 40.720312 | -74.00515 | [{'label': 'display', 'lat': 40.72031234030413... | 847 | 10013.0 | US | New York | NY | United States | [32 Avenue of the Americas (at Walker St), New... | at Walker St | | 4b032755f964a5200f4d22e3 |
| 4 | Royal Theater Cinema | Movie Theater | 63 Wall St | 40.705627 | -74.008774 | [{'label': 'display', 'lat': 40.70562744140625... | 824 | 10005.0 | US | New York | NY | United States | [63 Wall St (Pearl St), New York, NY 10005, Un... | Pearl St | | 4c3d0ec67c1ee21e017e8d71 |
| 5 | Theater for the New City | Theater | 155 1st Ave | 40.728612 | -73.984681 | [{'label': 'display', 'lat': 40.72861171634928... | 2523 | 10003.0 | US | New York | NY | United States | [155 1st Ave (Between 9th and 10th Streets), N... | Between 9th and 10th Streets | | 431f7f00f964a5208c271fe3 |
| 6 | Theater District | Neighborhood | | 40.758912 | -73.985715 | | 5418 | 10036.0 | US | New York | NY | United States | [New York, NY 10036, United States] | | | 4edab7350e011b46eed5778a |
| 7 | Peter Jay Sharp Theater | Theater | 235 Bowery | 40.722347 | -73.992844 | [{'label': 'display', 'lat': 40.72234725952148... | 1543 | 10002.0 | US | New York | NY | United States | [235 Bowery, New York, NY 10002, United States] | | | 5d30f415d6746d0008b3918b |
| 8 | LES* Drive-In Movie Theater | Movie Theater | 88 Ludlow | 40.718087 | -73.989058 | [{'label': 'display', 'lat': 40.71808685300118... | 1550 | 10002.0 | US | New York | NY | United States | [88 Ludlow (Ludlow and Broom), New York, NY 10... | Ludlow and Broom | | 51be383f501974b71a897e60 |
| 9 | The Public Theater | Theater | 425 Lafayette St | 40.729169 | -73.99207 | [{'label': 'display', 'lat': 40.72916880904384... | 2175 | 10003.0 | US | New York | NY | United States | [425 Lafayette St (btwn Astor Pl & E 4th St), ... | btwn Astor Pl & E 4th St | | 4a07986bf964a52087731fe3 |
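The `distance` column appears to be the distance in meters from the `ll` search point, which is also the unit of the `radius = 15000` parameter. This is an inference from the values rather than something the notebook states, so here is a small haversine sketch to check it against the first two rows. `haversine_m` is a hypothetical helper, and the spherical Earth radius of 6,371 km is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

center = (40.7127281, -74.0060152)   # search point (geocoded NYC)
flea = (40.716161, -74.005826)       # 20 Thomas St. (The Flea Theater), reported distance 382
schaeberle = (40.711158, -74.005322) # Schaeberle Studio Theater, reported distance 184

print(round(haversine_m(*center, *flea)))        # 382
print(round(haversine_m(*center, *schaeberle)))  # 184
```

Both values agree with the reported distances, which supports reading `distance` as meters from the search center.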


In [91]:
dataframe_filtered.name

0         20 Thomas St. (The Flea Theater)
1                Schaeberle Studio Theater
2                              Theater Bar
3                      iHeartRadio Theater
4                     Royal Theater Cinema
5                 Theater for the New City
6                         Theater District
7                  Peter Jay Sharp Theater
8              LES* Drive-In Movie Theater
9                       The Public Theater
10                            Hulu Theater
11                          Nevada Theater
12                       St Venice Theater
13                       Delacorte Theater
14                        Connelly Theater
15          iPic Theaters at Fulton Market
16                         Pioneer Theater
17        Russian American Musical Theater
18          Public Theater Conference Room
19                       Theater 219 (VPR)
20                     Robert Moss Theater
21                           Galli Theater
22                       Time Inc. Theater
23    IRT (
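The `theater` query matches venue names as well as categories, so the results include a cocktail bar (Theater Bar), a music venue and a neighborhood (Theater District). Filtering on the category name is one way to keep only theater-type venues before drawing conclusions. A minimal sketch, where `keep_theaters` is a hypothetical helper, the sample rows are copied from `dataframe_filtered` above, and keeping movie theaters is a judgment call:

```python
import pandas as pd

def keep_theaters(venues, keyword='Theater'):
    """Keep venues whose category name contains `keyword` (case-sensitive substring match)."""
    mask = venues['categories'].fillna('').str.contains(keyword, regex=False)
    return venues[mask].reset_index(drop=True)

# name/category pairs copied from the first rows of dataframe_filtered
sample = pd.DataFrame({
    'name': ['20 Thomas St. (The Flea Theater)', 'Schaeberle Studio Theater', 'Theater Bar',
             'iHeartRadio Theater', 'Royal Theater Cinema', 'Theater for the New City',
             'Theater District'],
    'categories': ['Theater', 'College Theater', 'Cocktail Bar',
                   'Music Venue', 'Movie Theater', 'Theater', 'Neighborhood'],
})

theaters = keep_theaters(sample)
print(theaters['name'].tolist())
```

Applied to the full results, `keep_theaters(dataframe_filtered)` would drop the bar, the music venue and the neighborhood entry before mapping.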

### Visualize theater locations in Manhattan

In [92]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) 

# add the theater venues as blue circle markers (popup shows the venue category)
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

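### Based on the map, the planned theater should be placed near NYC City Hall, because this area has the highest concentration of the returned theaters and is a likely destination for people who want to enjoy the performing arts. Note that the search was centered on these same coordinates and capped at 30 results, so the concentration may partly reflect the search parameters.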
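The introduction proposes ranking areas with a clustering technique, and the notebook imports `KMeans` for that purpose. A minimal sketch of such a step, assuming only the ten venues displayed in `dataframe_filtered` above, an arbitrary choice of 3 clusters, and "rank" meaning "number of venues in the cluster". Longitude is scaled by cos(latitude) so that east-west and north-south distances are roughly comparable.

```python
import numpy as np
from sklearn.cluster import KMeans

# (lat, lng) of the ten venues listed in dataframe_filtered
coords = np.array([
    [40.716161, -74.005826], [40.711158, -74.005322], [40.718688, -74.0063],
    [40.720312, -74.00515], [40.705627, -74.008774], [40.728612, -73.984681],
    [40.758912, -73.985715], [40.722347, -73.992844], [40.718087, -73.989058],
    [40.729169, -73.99207],
])

# scale longitude so one unit east-west is roughly one unit north-south
mean_lat = np.radians(coords[:, 0].mean())
xy = np.column_stack([coords[:, 0], coords[:, 1] * np.cos(mean_lat)])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(xy)

# rank clusters by how many venues they contain (largest first)
sizes = np.bincount(kmeans.labels_, minlength=3)
ranking = np.argsort(sizes)[::-1]

for rank, label in enumerate(ranking, start=1):
    lat, scaled_lng = kmeans.cluster_centers_[label]
    lng = scaled_lng / np.cos(mean_lat)  # undo the longitude scaling
    print(f'rank {rank}: {sizes[label]} venues, center ({lat:.4f}, {lng:.4f})')
```

On these ten points the largest cluster is the group of downtown venues around Thomas St, Park Row and Wall St. A fuller analysis would cluster all returned venues after filtering out non-theater categories and would justify the number of clusters.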