# Capstone Project Battle of the Neighborhoods Part 2 #

## Introduction ##

For this project, I decided to examine New York City. I have a bit of a sweet tooth, as I know many others do as well, so I decided to see which areas in New York would be best to visit to satisfy that craving. My plan is to map every candy store, ice cream shop, and chocolate shop I can find, then cluster them to divide New York into sweet zones.

## The Data ##

This project will rely on venue data from Foursquare, along with the neighborhood data for New York. I'll search each neighborhood for every sweet-shop category I can find on Foursquare, then map all those locations. Once I have all this data, I'll cluster the venues to find the best areas.

## Methodology ##
The first step is mapping the neighborhoods of New York. Next comes pulling Foursquare venue data for any shop likely to sell sugary food, and then plotting those venues on the map of New York. A visual inspection shows quite a few clusters, but I don't have enough time to see the whole city, so we'll limit ourselves to five clusters that put us in central locations, and try to stay near one of them when we visit. Because we're measuring distances over an area small enough that we don't need to worry about the curvature of the Earth, we'll use k-means to mark out these clusters and find their center points. This forms nice blob-like clusters with centers gravitating towards denser locations.
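The flat-map assumption can be made a little safer by converting degrees to kilometres before clustering, since one degree of longitude is shorter than one degree of latitude at New York's latitude. The following is a minimal sketch, assuming an equirectangular approximation around the geocoded New York City coordinates used in this notebook; `to_local_km` is a hypothetical helper and is not part of the original analysis.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def to_local_km(lat, lon, lat0, lon0):
    """Project latitude/longitude (degrees) to x/y offsets in km from (lat0, lon0).

    Equirectangular approximation: adequate over a city-sized area,
    not for continental distances.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    x = np.radians(lon - lon0) * np.cos(np.radians(lat0)) * EARTH_RADIUS_KM
    y = np.radians(lat - lat0) * EARTH_RADIUS_KM
    return np.column_stack([x, y])


# Reference point: the geocoded New York City coordinates from this notebook.
lat0, lon0 = 40.7308619, -73.9871558

# Hypothetical points: one degree north and one degree east of the reference.
offsets = to_local_km([lat0 + 1.0, lat0], [lon0, lon0 + 1.0], lat0, lon0)
km_per_deg_lat = offsets[0, 1]
km_per_deg_lon = offsets[1, 0]
print(km_per_deg_lat, km_per_deg_lon)
```

Around New York this gives roughly 111 km per degree of latitude but only about 84 km per degree of longitude, so clustering on the kilometre offsets rather than raw degrees would weigh east-west and north-south distances equally.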

## The Results ##

Our analysis shows that the best spot, within the densest cluster, is at NYU Langone Orthopedic Hospital in Manhattan! Who would have known? Well, we might have suspected, seeing as Manhattan has one of the densest concentrations of neighborhoods in the city. The runner-up spots are La Tourette Park in Staten Island, an apartment complex to the east of Washington Cemetery, an apartment complex to the east of Willow Lake, and Public School 57 to the west of Bronx Park. Not very convenient, but the runner-up cluster zones are much more spread out and cover a larger area.
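The notebook does not show how the clusters were ranked. One way to check which cluster holds the most venues is to count venues per cluster label. This is a minimal sketch, assuming `labels` holds the integer cluster label assigned to each venue; the sample labels below are made up for illustration.

```python
import numpy as np

# Hypothetical cluster labels for ten venues, five clusters (0-4).
labels = np.array([0, 2, 2, 1, 2, 4, 3, 2, 0, 1])

# Count how many venues fall in each cluster.
counts = np.bincount(labels, minlength=5)

# Cluster ids ordered from most to fewest venues (ties keep id order).
ranking = np.argsort(-counts, kind="stable")

print(counts)   # venues per cluster
print(ranking)  # cluster ids, most populated first
```

Counting venues is only a proxy for density, since it ignores the area each cluster covers.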

## Discussion ##
Going into this, I considered mapping out each type of store, analyzing each category separately, and then overlaying the results to look for overlapping clusters. That turned out not to be such a great idea, as many Foursquare venues qualify for several categories and the data ended up being largely the same. After all, most shops that sell ice cream probably also sell smoothies and shakes, just like places that sell pies also sell cakes. In a perfect world, every venue would have an easily accessed online menu, in which case we could perform this analysis again looking for a single specific dessert and find the ultimate spot for collecting hot fudge sundaes! Alas, one can only dream.
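The overlap between two category searches can be put into a number with the Jaccard index (shared venues divided by all distinct venues). This is a minimal sketch; the `jaccard` helper and the assignment of venues to categories below are hypothetical and only illustrate the idea.

```python
def jaccard(a, b):
    """Share of items common to both collections, relative to their union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical venue names returned by two separate category searches.
ice_cream = {"Carvel Ice Cream", "Baskin Robbins", "Lollipops Gelato"}
dessert = {"Lollipops Gelato", "Carvel Ice Cream", "Dunkin Donuts"}

overlap = jaccard(ice_cream, dessert)
print(overlap)
```

A value close to 1 would indicate that the two searches return nearly the same shops, in which case separate per-category clustering adds little.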

## Conclusion ##
We have now successfully mapped the sweet treats of New York into five zones. From the center of each zone, you will be able to satisfy your craving with minimal travel time to any shop in it. We have also learned that Manhattan is clearly the best, most delicious part of the Big Apple. Perhaps future studies will cover entire states, to find the best cities to live in, or even entire countries. You are now armed with the proper knowledge to go out and develop tooth cavities in the most efficient manner possible (provided you are in New York City).

In [65]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [66]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
neighborhoods = pd.DataFrame(rows, columns=column_names)
address = 'New York City, NY'

geolocator = Nominatim(user_agent='ny_explorer') # recent geopy versions require a user_agent; this name is a placeholder
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Data downloaded!




The geographical coordinates of New York City are 40.7308619, -73.9871558.


In [67]:
CLIENT_ID = 'YBJPVSL0GVYSJAZ2BDDAOQ4L1GFY353OAKQZ2S2DOFVPTTIG' # your Foursquare ID
CLIENT_SECRET = 'QNNKUWGTQLFEHZVEHG0E2ED3VTB1VC1XJEL5DU1DZTKGCYJR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YBJPVSL0GVYSJAZ2BDDAOQ4L1GFY353OAKQZ2S2DOFVPTTIG
CLIENT_SECRET:QNNKUWGTQLFEHZVEHG0E2ED3VTB1VC1XJEL5DU1DZTKGCYJR


In [68]:
# Sweet shop type list: Creperie, Dessert Shop (has subcategories), Donut Shop, Chocolate Shop (not in food category), Smoothie Shop (not in food category)
desired_catagories = ['52e81612bcbc57f1066b79f2', '4bf58dd8d48988d1d0941735', '4bf58dd8d48988d148941735', '52f2ab2ebcbc57f1066b8b31', '52f2ab2ebcbc57f1066b8b41']
cat_list=','.join(desired_catagories)

In [69]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=50):
    # limit: maximum venues per request; the original LIMIT constant is not
    # defined anywhere in this notebook, so 50 is an assumed default
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit,
            cat_list)
            
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                  'Neighborhood',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
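Indexing `v['categories'][0]` raises an `IndexError` if a venue comes back with an empty category list. The following is a minimal sketch of a more defensive parsing step, assuming the response shape used in `getNearbyVenues`; `parse_venues` and the sample data are hypothetical and not part of the original notebook.

```python
def parse_venues(name, lat, lng, venues):
    """Flatten raw venue dicts into rows, tolerating missing categories."""
    rows = []
    for v in venues:
        categories = v.get('categories') or []
        # Fall back to None when the venue carries no category at all.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, v['name'],
                     v['location']['lat'], v['location']['lng'], category))
    return rows


# Hypothetical response fragment shaped like the 'venues' list used above.
sample = [
    {'name': 'Shop A', 'location': {'lat': 40.89, 'lng': -73.85},
     'categories': [{'name': 'Dessert Shop'}]},
    {'name': 'Shop B', 'location': {'lat': 40.88, 'lng': -73.84},
     'categories': []},
]
rows = parse_venues('Wakefield', 40.894705, -73.847201, sample)
print(rows)
```

Keeping the parsing in its own function also makes it possible to test it without calling the API.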

In [70]:
newyork_sweets_venues = getNearbyVenues(
                                   names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

In [71]:
newyork_sweets_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Dunkin Donuts | 40.890631 | -73.849027 | Donut Shop |
| 2 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 3 | Co-op City | 40.874294 | -73.829939 | Dunkin Donuts | 40.869702 | -73.828826 | Donut Shop |
| 4 | Co-op City | 40.874294 | -73.829939 | Baskin Robbins | 40.870045 | -73.829578 | Ice Cream Shop |


In [72]:
#remove neighborhood data
filter_c=['Venue','Venue Latitude', 'Venue Longitude', 'Venue Category']
ny_venues = newyork_sweets_venues.loc[:,filter_c]
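Because each neighborhood is searched with its own radius, a venue that lies near two neighborhood centers may appear more than once, which would give it extra weight in the clustering. This is a minimal sketch of removing such repeats, assuming duplicates share the same name and coordinates; the demo frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical rows: the same shop returned by two neighboring searches.
ny_venues_demo = pd.DataFrame({
    'Venue': ['Dunkin Donuts', 'Dunkin Donuts', 'Carvel Ice Cream'],
    'Venue Latitude': [40.890631, 40.890631, 40.890487],
    'Venue Longitude': [-73.849027, -73.849027, -73.848568],
    'Venue Category': ['Donut Shop', 'Donut Shop', 'Ice Cream Shop'],
})

# Keep one row per (name, latitude, longitude) combination.
deduped = ny_venues_demo.drop_duplicates(
    subset=['Venue', 'Venue Latitude', 'Venue Longitude']
).reset_index(drop=True)

print(len(ny_venues_demo), len(deduped))
```

Comparing the row counts before and after shows how much double counting the overlapping searches introduced.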

In [73]:
from folium.plugins import MarkerCluster

### Only half of the venues are displayed, because folium has trouble rendering large numbers of markers ###

In [74]:
# add markers to map
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)
marker_cluster = MarkerCluster().add_to(map_newyork)
venue_rows = zip(ny_venues['Venue Latitude'], ny_venues['Venue Longitude'], ny_venues['Venue'], ny_venues['Venue Category'])
for i, (lat, lng, name, category) in enumerate(venue_rows, start=1):
    # plot every second venue only, to keep the map responsive
    if i % 2 == 0:
        label = '{}, {}'.format(name, category)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='red',
            fill=True,
            fill_color='#FFFF00',
            fill_opacity=0.4,
            parse_html=False).add_to(map_newyork)
map_newyork

In [75]:
#Get just location data
venue_loc= ny_venues.drop(columns=['Venue','Venue Category'])

In [76]:
#create clusters
km = KMeans(n_clusters=5, init='k-means++') # five clusters, k-means++ seeding
km.fit(venue_loc)
clusters = km.predict(venue_loc)

### Blue spots are the five best spots for finding a nearby sweet shop in the New York area. ###

By stopping at each of these spots, you will be able to minimize travel time and satisfy that sweet tooth whenever you like.

In [77]:
# cluster_centers_ columns follow venue_loc: latitude first, then longitude
for i, (lat, lng) in enumerate(zip(km.cluster_centers_[:,0], km.cluster_centers_[:,1])):
    label='cluster {}'.format(i)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1.0,
        parse_html=False).add_to(map_newyork)
map_newyork