# Data

According to the project definition, these are the properties we are looking for in our data:

* Richest borough
* Population data of New York City
* Busy entertainment area
* Fewest movie theaters in the borough (i.e., low competition)

**Richest borough**
- Kaggle census data (New York City census tracts, dated 2017-08-04)

In [19]:
import numpy as np
import pandas as pd

In [20]:
nyc_census_data = pd.read_csv("nyc_census_tracts.csv")
nyc_census_data.head()

,CensusTract,County,Borough,TotalPop,Men,Women,Hispanic,White,Black,Native,...,Walk,OtherTransp,WorkAtHome,MeanCommute,Employed,PrivateWork,PublicWork,SelfEmployed,FamilyWork,Unemployment
0,36005000100,Bronx,Bronx,7703,7133,570,29.9,6.1,60.9,0.2,...,,,,,0,,,,,
1,36005000200,Bronx,Bronx,5403,2659,2744,75.8,2.3,16.0,0.0,...,2.9,0.0,0.0,43.0,2308,80.8,16.2,2.9,0.0,7.7
2,36005000400,Bronx,Bronx,5915,2896,3019,62.7,3.6,30.7,0.0,...,1.4,0.5,2.1,45.0,2675,71.7,25.3,2.5,0.6,9.5
3,36005001600,Bronx,Bronx,5879,2558,3321,65.1,1.6,32.4,0.0,...,8.6,1.6,1.7,38.8,2120,75.0,21.3,3.8,0.0,8.7
4,36005001900,Bronx,Bronx,2591,1206,1385,55.4,9.0,29.0,0.0,...,3.0,2.4,6.2,45.4,1083,76.8,15.5,7.7,0.0,19.2
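The census table is at tract level, so ranking boroughs by wealth needs an aggregation step. A minimal sketch, assuming the Kaggle file has `Borough`, `TotalPop` and `IncomePerCap` columns (the head above elides the middle columns, so check the names before relying on them). It weights each tract's per-capita income by its population; the three-row frame is hypothetical toy data, not census values.

```python
import pandas as pd

def richest_borough(census: pd.DataFrame):
    """Return (borough with the highest population-weighted per-capita income, full ranking).

    Assumes columns 'Borough', 'TotalPop' and 'IncomePerCap' (names to verify
    against nyc_census_tracts.csv).
    """
    # drop tracts with missing income or no residents so they do not skew the mean
    valid = census.dropna(subset=['IncomePerCap'])
    valid = valid[valid['TotalPop'] > 0]
    # sum of income * population per borough, divided by the borough's population
    weighted = (valid['IncomePerCap'] * valid['TotalPop']).groupby(valid['Borough']).sum()
    population = valid.groupby('Borough')['TotalPop'].sum()
    ranking = (weighted / population).sort_values(ascending=False)
    return ranking.index[0], ranking

# hypothetical toy tracts, for illustration only
toy = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Manhattan'],
    'TotalPop': [1000, 3000, 2000],
    'IncomePerCap': [20000.0, 30000.0, 60000.0],
})
best, ranking = richest_borough(toy)
print(best)     # Manhattan
print(ranking)
```

Weighting by population keeps a small, very wealthy tract from outweighing a large one; an unweighted `groupby(...).mean()` would treat every tract equally regardless of size.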


- Borough and neighborhood coordinates (using the `newyork_data.json` file)

In [21]:
import json # library to handle JSON files

In [22]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)


In [23]:
neighborhoods_data = newyork_data['features']
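Each element of `newyork_data['features']` is a GeoJSON feature. The sketch below shows the shape the loop below relies on, assuming only the fields it reads (`properties.borough`, `properties.name`, `geometry.coordinates`); the sample values are copied from the Wakefield row in the output further down. GeoJSON orders a point as `[longitude, latitude]`, which is why latitude comes from index 1. The helper name `feature_to_row` is hypothetical.

```python
# one feature, trimmed to the fields the notebook reads (structure assumed)
feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

def feature_to_row(feature: dict) -> dict:
    """Flatten one GeoJSON point feature into a neighborhoods-table row."""
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order is [lon, lat]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

print(feature_to_row(feature))
# {'Borough': 'Bronx', 'Neighborhood': 'Wakefield', 'Latitude': 40.894705, 'Longitude': -73.847201}
```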

In [24]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

,Borough,Neighborhood,Latitude,Longitude


In [25]:
# collect one dict per neighborhood, then build the DataFrame once
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [26]:
neighborhoods.head()

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [27]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
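The summary gives totals only; a per-borough breakdown is one `groupby` away. To stay self-contained, the sketch runs on the five rows from `neighborhoods.head()` (stored as `sample` so it does not overwrite the real frame); in the notebook, the same line on `neighborhoods` gives the split across all 306 rows.

```python
import pandas as pd

# the five rows shown in neighborhoods.head()
sample = pd.DataFrame({
    'Borough': ['Bronx'] * 5,
    'Neighborhood': ['Wakefield', 'Co-op City', 'Eastchester', 'Fieldston', 'Riverdale'],
    'Latitude': [40.894705, 40.874294, 40.887556, 40.895437, 40.890834],
    'Longitude': [-73.847201, -73.829939, -73.827806, -73.905643, -73.912585],
})

# number of neighborhoods in each borough, largest first
per_borough = sample.groupby('Borough').size().sort_values(ascending=False)
print(per_borough)
```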


**Population data of New York City**
- **Folium map**: a library used to visualize interactive geographic plots (example below)

In [28]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library

In [29]:
# Get Latitude and Longitude of New York city
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [30]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)
display(map_newyork)

In [31]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle marker per neighborhood, labeled "Neighborhood, Borough"
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'],
                                           neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

display(map_newyork)

**Busy entertainment area**
- For this data I will use the **Foursquare API**

The movie theater category is listed under **Venue Categories** in the **Foursquare API documentation**. The ID of **Movie Theater** is *4bf58dd8d48988d17f941735*, which falls under the **Arts & Entertainment** main category. It contains several sub-categories:
- **Drive-in Theater, id**: *56aa371be4b08b9a8d5734de*
- **Indie Movie Theater, id**: *4bf58dd8d48988d17e941735*
- **Multiplex, id**: *4bf58dd8d48988d180941735*
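Keeping these IDs in one mapping avoids retyping them in later requests. A minimal sketch; it assumes the v2 `venues/search` endpoint accepts several IDs as one comma-separated `categoryId` value. The search below on the parent ID alone already returns Multiplex and Indie Movie Theater venues, so the joined string is mainly useful for querying a subset, e.g. only multiplexes. The name `MOVIE_THEATER_CATEGORIES` is hypothetical.

```python
# Movie Theater parent category and its sub-categories, as listed above
MOVIE_THEATER_CATEGORIES = {
    'Movie Theater': '4bf58dd8d48988d17f941735',
    'Drive-in Theater': '56aa371be4b08b9a8d5734de',
    'Indie Movie Theater': '4bf58dd8d48988d17e941735',
    'Multiplex': '4bf58dd8d48988d180941735',
}

# comma-separated value for the categoryId query parameter
category_param = ','.join(MOVIE_THEATER_CATEGORIES.values())
print(category_param)
```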

In [1]:
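# Foursquare API credentials (replace the placeholders with your own)
CLIENT_ID = 'your Foursquare ID'
CLIENT_SECRET = 'your Foursquare Secret'
ACCESS_TOKEN = 'your Foursquare Access Token'  # not used by the request below
VERSION = '20180604'  # API version date
LIMIT = 30  # maximum number of venues returned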

In [3]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Get the coordinates of New York City
address = 'New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7127281 -74.0060152


In [4]:
categories = '4bf58dd8d48988d17f941735'  # Movie Theater category ID
radius = 600 # in meters
lat = latitude
lon = longitude

In [5]:
# URL structure for API request
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, categories, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<YOUR_CLIENT_ID>&client_secret=<YOUR_CLIENT_SECRET>&v=20180604&ll=40.7127281,-74.0060152&categoryId=4bf58dd8d48988d17f941735&radius=600&limit=30'
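Formatting the URL by hand works, but any special character in a parameter goes through unescaped, and printing the full URL puts the client secret into the notebook output. A minimal sketch that lets the standard library handle the escaping; `build_search_url` is a hypothetical helper, and the parameter names are the ones used in the URL above. Passing the same dict as `requests.get(base_url, params=params)` achieves the same encoding.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lon,
                     category_id, radius, limit):
    """Build a Foursquare v2 venues/search URL with escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lon}',  # urlencode escapes the comma as %2C
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

url_example = build_search_url('MY_ID', 'MY_SECRET', '20180604', 40.7127281, -74.0060152,
                               '4bf58dd8d48988d17f941735', 600, 30)
print(url_example)
```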

In [8]:
import requests # library to handle requests

# API GET request
results = requests.get(url).json()

In [10]:
# Check the request status (code 200 means success)
results["meta"]

{'code': 200, 'requestId': '609799fb1743541312221533'}
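Inspecting `meta` by eye works once, but a failed request (bad credentials, rate limit) would otherwise surface later as a confusing `KeyError` on `results['response']['venues']`. A minimal sketch of a guard; `extract_venues` is a hypothetical helper, and it reads only the `meta.code` and `response.venues` fields shown in this notebook.

```python
def extract_venues(results: dict) -> list:
    """Return the venue list from a v2 venues/search response, or raise on an API error."""
    meta = results.get('meta', {})
    if meta.get('code') != 200:
        # surface the whole meta block, since the error details live there
        raise RuntimeError(f'Foursquare request failed: {meta}')
    return results.get('response', {}).get('venues', [])

ok = {'meta': {'code': 200, 'requestId': 'abc'}, 'response': {'venues': [{'name': 'A'}]}}
print(extract_venues(ok))  # [{'name': 'A'}]
```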

In [11]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

In [12]:
# transform the venues JSON into a pandas DataFrame
# (pandas.io.json.json_normalize is deprecated; pd.json_normalize is the public API since pandas 1.0)
dataframe = pd.json_normalize(venues)
dataframe.head()

  """


,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,4e820ced8231f06792d47a4f,Blue Bloods Stage,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",v-1620548091,False,80 Centre St,40.715096,-74.001175,"[{'label': 'display', 'lat': 40.71509552001953...",486,10013.0,US,New York,NY,United States,"[80 Centre St, New York, NY 10013, United States]",
1,4cdd16f2cbed60fc34634f9f,Nevada Theater,"[{'id': '4bf58dd8d48988d180941735', 'name': 'M...",v-1620548091,False,Waterside Park,40.714354,-74.005972,"[{'label': 'display', 'lat': 40.714354004484, ...",181,10011.0,US,New York,NY,United States,"[Waterside Park (37 West 17th Street), New Yor...",37 West 17th Street
2,4b2d77bcf964a52051d724e3,embassy theaters,"[{'id': '4bf58dd8d48988d180941735', 'name': 'M...",v-1620548091,False,qmex center,40.714371,-74.005996,"[{'label': 'display', 'lat': 40.714371, 'lng':...",182,,US,New York,NY,United States,"[qmex center (west side highway), New York, NY...",west side highway
3,4ec4f55029c224ea7c2f1d1c,Online Theatre,"[{'id': '4bf58dd8d48988d17e941735', 'name': 'I...",v-1620548091,False,,40.712696,-74.012262,"[{'label': 'display', 'lat': 40.71269589839951...",527,,US,,New York,United States,"[New York, United States]",
4,4d4ac4a094ab2c0f4230250c,photo,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",v-1620548091,False,jl sidodadi,40.715275,-74.004911,"[{'label': 'display', 'lat': 40.71527533835294...",298,,US,pekanbaru,NY,United States,"[jl sidodadi, pekanbaru, NY, United States]",
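The `location.` prefixes come from how `pd.json_normalize` flattens nested dictionaries: each nested key becomes a dotted column name. Lists are not expanded, which is why `categories` still holds a list of dicts and needs the helper in the next cell. A small sketch on one venue trimmed to a few fields (values from the Blue Bloods Stage row):

```python
import pandas as pd

# one venue trimmed to a few fields (values from the Blue Bloods Stage row)
sample_venues = [{
    'id': '4e820ced8231f06792d47a4f',
    'name': 'Blue Bloods Stage',
    'categories': [{'id': '4bf58dd8d48988d17f941735', 'name': 'Movie Theater'}],
    'location': {'address': '80 Centre St', 'lat': 40.715096, 'lng': -74.001175, 'distance': 486},
}]

flat = pd.json_normalize(sample_venues)
# nested dicts become dotted columns; the list under 'categories' stays in one cell
print(sorted(flat.columns))
# ['categories', 'id', 'location.address', 'location.distance', 'location.lat', 'location.lng', 'name']
```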


In [15]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# replace each row's category list with the name of its first category
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only the last term (e.g. 'location.lat' -> 'lat')
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
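`get_category_type` keeps whichever category is listed first. Foursquare v2 category objects typically also carried a `primary` flag; a variant that prefers it, falling back to the first entry and tolerating missing values (e.g. `NaN` after normalization), could look like this. Both the `primary` key and the name `get_primary_category` are assumptions to check against a real response.

```python
def get_primary_category(categories_list):
    """Return the primary category name, else the first one, else None."""
    # missing categories may arrive as NaN or an empty list
    if not isinstance(categories_list, list) or not categories_list:
        return None
    for category in categories_list:
        if category.get('primary'):  # 'primary' flag is assumed, not guaranteed
            return category['name']
    return categories_list[0]['name']

print(get_primary_category([{'name': 'Multiplex'}, {'name': 'Movie Theater', 'primary': True}]))
# Movie Theater
```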


In [17]:
dataframe_filtered.head()

,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,id
0,Blue Bloods Stage,Movie Theater,80 Centre St,40.715096,-74.001175,"[{'label': 'display', 'lat': 40.71509552001953...",486,10013.0,US,New York,NY,United States,"[80 Centre St, New York, NY 10013, United States]",,4e820ced8231f06792d47a4f
1,Nevada Theater,Multiplex,Waterside Park,40.714354,-74.005972,"[{'label': 'display', 'lat': 40.714354004484, ...",181,10011.0,US,New York,NY,United States,"[Waterside Park (37 West 17th Street), New Yor...",37 West 17th Street,4cdd16f2cbed60fc34634f9f
2,embassy theaters,Multiplex,qmex center,40.714371,-74.005996,"[{'label': 'display', 'lat': 40.714371, 'lng':...",182,,US,New York,NY,United States,"[qmex center (west side highway), New York, NY...",west side highway,4b2d77bcf964a52051d724e3
3,Online Theatre,Indie Movie Theater,,40.712696,-74.012262,"[{'label': 'display', 'lat': 40.71269589839951...",527,,US,,New York,United States,"[New York, United States]",,4ec4f55029c224ea7c2f1d1c
4,photo,Movie Theater,jl sidodadi,40.715275,-74.004911,"[{'label': 'display', 'lat': 40.71527533835294...",298,,US,pekanbaru,NY,United States,"[jl sidodadi, pekanbaru, NY, United States]",,4d4ac4a094ab2c0f4230250c
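The `distance` column is the distance in meters from the search point. Recomputing it is a quick sanity check when merging results from several searches or tightening the radius. A minimal haversine sketch, assuming a spherical Earth with mean radius 6,371 km; for the search center and Blue Bloods Stage it comes out close to the 486 m Foursquare reports.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# search center and Blue Bloods Stage, both taken from the outputs above
d = haversine_m(40.7127281, -74.0060152, 40.715096, -74.001175)
print(round(d))
```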


**Fewest number of theaters in the borough**
- For this data I will use the venue category tags from the **Foursquare API**
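Once each theater carries a borough label, the competition criterion reduces to a count. That label could come from running one search per borough or from matching venue coordinates to boroughs. A minimal sketch on hypothetical labeled rows; the `Borough` column is an assumption, since the venue table above has no borough field. Boroughs with no theaters are kept with a count of 0 so they are not silently dropped, and a stable sort makes ties resolve in the order the boroughs are listed.

```python
import pandas as pd

def fewest_theaters(theaters: pd.DataFrame, boroughs):
    """Count theaters per borough; return (borough with the fewest, counts)."""
    counts = (theaters.groupby('Borough').size()
              .reindex(boroughs, fill_value=0)   # keep boroughs with zero theaters
              .sort_values(kind='mergesort'))    # stable: ties keep list order
    return counts.index[0], counts

# hypothetical labeled venues, for illustration only
toy_theaters = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'Borough': ['Manhattan', 'Manhattan', 'Brooklyn', 'Queens'],
})
all_boroughs = ['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island']
least_served, counts = fewest_theaters(toy_theaters, all_boroughs)
print(least_served)  # Bronx (tied with Staten Island at 0)
```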