# Capstone Project - The Battle of the Neighborhoods (Week 1)

## Table of contents
* Introduction: Business Problem
* Data
* Methodology
* Analysis
* Results and Discussion
* Conclusion

## Introduction

1.1. Background

This data science project aims to help a friend who is planning to move to Brooklyn, New York, and wants to open a coffee shop in one of its neighborhoods. As a data scientist, I would like to help by surveying Brooklyn's neighborhoods and suggesting a suitable place to start the business. Each neighborhood needs to be analyzed to find those with few or no existing coffee shops.

1.2. Business Problem

This project aims to identify the most suitable neighborhoods in Brooklyn for a new coffee shop, based on the number of existing coffee shops and on what is trending there. To do so, we explore, segment, and cluster the neighborhoods of Brooklyn and look for locations with few or no coffee shops.

1.3. Audience

This kind of analysis can be useful to anyone exploring options for starting a new business in the US or elsewhere. The same methodology used in this project applies.


## Data

Data Sources

The New York neighborhood dataset covers 5 boroughs and 306 neighborhoods. To segment and explore the neighborhoods, we need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.
This dataset is available at https://geo.nyu.edu/catalog/nyu_2451_34572


The data will be used as follows:

* Use the geopy library to get the latitude and longitude of Brooklyn.
* Slice the original dataframe to create a new dataframe of the Brooklyn neighborhoods with their latitude and longitude.
* Use Foursquare to explore every neighborhood in the dataframe.
* Use the Foursquare and geopy data to map the top 5 venues for the Brooklyn neighborhoods and cluster them into groups.
 


Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Download and Explore Dataset

In [4]:
!wget -q -O 'data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [6]:
with open('data.json') as json_data:
    data = json.load(json_data)
neigh_data = data['features']

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhood = pd.DataFrame(columns=column_names)

In [8]:
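# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for neigh in neigh_data:
    borough = neigh['properties']['borough']
    neigh_name = neigh['properties']['name']

    neigh_latlon = neigh['geometry']['coordinates']  # stored as [longitude, latitude]
    neigh_lat = neigh_latlon[1]
    neigh_lon = neigh_latlon[0]

    rows.append({'Borough': borough, 'Neighborhood': neigh_name, 'Latitude': neigh_lat, 'Longitude': neigh_lon})

neighborhood = pd.DataFrame(rows, columns=column_names)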
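
For reference, the loop above relies on each entry of `data['features']` being a GeoJSON point whose coordinates are stored longitude first. The sketch below uses a hand-written feature (an assumption modelled on the Bay Ridge row, not copied from the file) and a hypothetical helper `feature_to_row` to show the field access in isolation.

```python
# Hand-written stand-in for one element of data['features'] (illustrative only).
sample_feature = {
    "type": "Feature",
    "properties": {"borough": "Brooklyn", "name": "Bay Ridge"},
    "geometry": {"type": "Point",
                 "coordinates": [-74.03062069353813, 40.625801065010656]},
}


def feature_to_row(feature):
    """Flatten one feature into the four columns used by the notebook."""
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
    props = feature["properties"]
    return {"Borough": props["borough"], "Neighborhood": props["name"],
            "Latitude": lat, "Longitude": lon}


row = feature_to_row(sample_feature)
print(row)
```

Unpacking the pair by name makes the longitude-first order explicit, which is easy to get backwards when indexing with `[0]` and `[1]`.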

In [9]:
brooklyn_data = neighborhood[neighborhood['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


In [10]:
address = 'Brooklyn, NY'
geolocator=Nominatim(user_agent='ny_explorer')
location=geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


Folium is a great visualization library. Use it to explore Brooklyn and its neighborhoods.

In [11]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

## Use the Foursquare API to Find Venue Details

In [12]:
CLIENT_ID = 'T4KVRWB54PTAC5SWRSNENFQVDQJ1EY0POYDV1IIYCUF3O2RP' # your Foursquare ID
CLIENT_SECRET = '1PLNALU0ZUQ3OVFW43ZHNBDLFWRN4GULOEPSWFBGCEE3RKYR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: T4KVRWB54PTAC5SWRSNENFQVDQJ1EY0POYDV1IIYCUF3O2RP
CLIENT_SECRET: 1PLNALU0ZUQ3OVFW43ZHNBDLFWRN4GULOEPSWFBGCEE3RKYR
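
The cell above keeps both keys in the notebook and prints the secret in clear text. One common alternative, sketched here as an assumption rather than as this notebook's method, is to read the keys from environment variables and mask them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from the environment instead of the notebook."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set the {missing.args[0]} environment variable first") from None


def mask(secret, visible=4):
    """Show only the first few characters when a value has to be printed."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Stand-in values so the sketch runs anywhere; real keys would come from os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123456",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret-abcdef"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

With this pattern the notebook can be shared without exposing working keys, since the values live outside the file.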


In [13]:
brooklyn_data.loc[0, 'Neighborhood']

'Bay Ridge'

In [14]:
neigh_lat = brooklyn_data.loc[0, 'Latitude'] # neighborhood latitude value
neigh_long = brooklyn_data.loc[0, 'Longitude'] # neighborhood longitude value

neigh_name = brooklyn_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neigh_name, 
                                                               neigh_lat, 
                                                               neigh_long))

Latitude and longitude values of Bay Ridge are 40.625801065010656, -74.03062069353813.


In [15]:
LIMIT = 100 
radius = 500

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neigh_lat, 
    neigh_long, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=T4KVRWB54PTAC5SWRSNENFQVDQJ1EY0POYDV1IIYCUF3O2RP&client_secret=1PLNALU0ZUQ3OVFW43ZHNBDLFWRN4GULOEPSWFBGCEE3RKYR&v=20180605&ll=40.625801065010656,-74.03062069353813&radius=500&limit=100'
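
As a possible variation on the string formatting above, the query string can be assembled from a dictionary with the standard library's `urlencode`, which names every parameter and percent-encodes the values. This is a minimal sketch: `build_explore_url` is a hypothetical helper, and `DEMO_ID` and `DEMO_SECRET` are placeholders rather than real keys.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from named parameters, percent-encoding each value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


demo_url = build_explore_url("DEMO_ID", "DEMO_SECRET", "20180605",
                             40.625801065010656, -74.03062069353813)
print(demo_url)
```

The comma inside `ll` is emitted as `%2C`; a server decoding the query string sees the same `lat,lng` pair as before.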

In [28]:
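def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues within `radius` meters of each neighborhood
    and return them as one dataframe with a row per venue."""
    venues_list=[]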
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
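
`getNearbyVenues` indexes straight into the response, so an error payload or a venue without categories would raise an exception partway through the loop. A more defensive parser might look like the sketch below. `extract_venue_rows` is a hypothetical helper, the payload shape is inferred from the field access in the function above, and both sample payloads are hand-written (the `meta` block in the error case is an assumption).

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore-style response dict into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]  # empty list -> placeholder
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Unknown"),
        ))
    return rows


# Hand-written payloads that mimic the fields read by getNearbyVenues.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Bagel Boy",
               "location": {"lat": 40.627896, "lng": -74.029335},
               "categories": [{"name": "Bagel Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 40.62, "lng": -74.03},
               "categories": []}},
]}]}}
error_payload = {"meta": {"code": 429}, "response": {}}

rows = extract_venue_rows(sample_payload, "Bay Ridge", 40.625801, -74.030621)
print(rows)
print(extract_venue_rows(error_payload, "Bay Ridge", 40.625801, -74.030621))
```

Returning an empty list for a failed request lets the loop continue with the remaining neighborhoods, at the cost of silently under-counting venues for the one that failed.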

In [29]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [30]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(2794, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bay Ridge | 40.625801 | -74.030621 | Pilo Arts Day Spa and Salon | 40.624748 | -74.030591 | Spa |
| 1 | Bay Ridge | 40.625801 | -74.030621 | Bagel Boy | 40.627896 | -74.029335 | Bagel Shop |
| 2 | Bay Ridge | 40.625801 | -74.030621 | Cocoa Grinder | 40.623967 | -74.030863 | Juice Bar |
| 3 | Bay Ridge | 40.625801 | -74.030621 | Pegasus Cafe | 40.623168 | -74.031186 | Breakfast Spot |
| 4 | Bay Ridge | 40.625801 | -74.030621 | Ho' Brah Taco Joint | 40.62296 | -74.031371 | Taco Place |


In [34]:
print('There are {} unique categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 289 unique categories.


In [46]:
brooklyn_coffeeshopes = brooklyn_venues[brooklyn_venues['Venue Category'] == 'Coffee Shop']

In [47]:
print(brooklyn_coffeeshopes.shape)
brooklyn_coffeeshopes.head()

(88, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 15 | Bay Ridge | 40.625801 | -74.030621 | Caffe Café | 40.624946 | -74.030404 | Coffee Shop |
| 61 | Bay Ridge | 40.625801 | -74.030621 | Mocha Mocha Cafe | 40.622699 | -74.028636 | Coffee Shop |
| 168 | Greenpoint | 40.730201 | -73.954241 | Homecoming | 40.729696 | -73.957525 | Coffee Shop |
| 186 | Greenpoint | 40.730201 | -73.954241 | Maman | 40.730427 | -73.958035 | Coffee Shop |
| 189 | Greenpoint | 40.730201 | -73.954241 | odd fox | 40.732673 | -73.95455 | Coffee Shop |


In [49]:
brooklyn_coffeeshopes.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bath Beach | 1 | 1 | 1 | 1 | 1 | 1 |
| Bay Ridge | 2 | 2 | 2 | 2 | 2 | 2 |
| Bedford Stuyvesant | 2 | 2 | 2 | 2 | 2 | 2 |
| Boerum Hill | 4 | 4 | 4 | 4 | 4 | 4 |
| Borough Park | 1 | 1 | 1 | 1 | 1 | 1 |
| Brighton Beach | 1 | 1 | 1 | 1 | 1 | 1 |
| Brooklyn Heights | 1 | 1 | 1 | 1 | 1 | 1 |
| Bushwick | 6 | 6 | 6 | 6 | 6 | 6 |
| Carroll Gardens | 5 | 5 | 5 | 5 | 5 | 5 |
| City Line | 1 | 1 | 1 | 1 | 1 | 1 |
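
Because the count above is computed only from rows whose category is `Coffee Shop`, neighborhoods with no coffee shop in the results never appear in it, although they are exactly what the business problem asks about. The sketch below shows one way to keep them with a count of zero. It is written in plain Python with small hand-written lists so that it runs without the notebook's data; the names and counts are illustrative stand-ins, not results.

```python
from collections import Counter

# Small hand-written stand-ins; in the notebook these would come from
# brooklyn_data['Neighborhood'] and brooklyn_coffeeshopes['Neighborhood'].
all_neighborhoods = ["Bay Ridge", "Bensonhurst", "Sunset Park", "Greenpoint", "Bath Beach"]
coffee_shop_rows = ["Bay Ridge", "Bay Ridge", "Greenpoint", "Greenpoint", "Greenpoint", "Bath Beach"]

counts = Counter(coffee_shop_rows)

# Counter returns 0 for missing keys, so neighborhoods without a match are kept.
shops_per_neighborhood = {name: counts[name] for name in all_neighborhoods}

# Sort from fewest to most coffee shops (ties broken alphabetically).
ranked = sorted(shops_per_neighborhood.items(), key=lambda pair: (pair[1], pair[0]))
for name, n_shops in ranked:
    print(f"{name}: {n_shops}")
```

In pandas, a similar effect could likely be obtained by reindexing the grouped counts on the full list of neighborhoods with `fill_value=0`.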


In [71]:
# average the coordinates per neighborhood; numeric_only skips the text columns
brooklyn_coffeeshopes_grped = brooklyn_coffeeshopes.groupby('Neighborhood').mean(numeric_only=True).reset_index()
#brooklyn_coffeeshopes_grped

In [62]:
# set number of clusters
kclusters = 5

brooklyn_grouped_clustering = brooklyn_coffeeshopes_grped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 3, 0, 4, 4, 0, 3, 0, 1], dtype=int32)

In [70]:
# add clustering labels (skipped if an earlier run of this cell already added them)
if 'Cluster Labels' not in brooklyn_coffeeshopes_grped.columns:
    brooklyn_coffeeshopes_grped.insert(0, 'Cluster Labels', kmeans.labels_)

# merge brooklyn_coffeeshopes_grped with brooklyn_data to add latitude/longitude for each neighborhood;
# the inner join keeps only neighborhoods that have a coffee shop and therefore a cluster label
brooklyn_merged = brooklyn_data.join(
    brooklyn_coffeeshopes_grped.set_index('Neighborhood')[['Cluster Labels']],
    on='Neighborhood', how='inner')

brooklyn_coffeeshopes_grped.head()

|   | Cluster Labels | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| 0 | 2 | Bath Beach | 40.599519 | -73.998752 | 40.595227 | -74.000017 |
| 1 | 2 | Bay Ridge | 40.625801 | -74.030621 | 40.623822 | -74.02952 |
| 2 | 3 | Bedford Stuyvesant | 40.687232 | -73.941785 | 40.685595 | -73.944594 |
| 3 | 0 | Boerum Hill | 40.685683 | -73.983748 | 40.687125 | -73.98137 |
| 4 | 4 | Borough Park | 40.633131 | -73.990498 | 40.631909 | -73.994964 |


In [None]:
#create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results and Discussion

The analysis shows that a few neighborhoods have only one coffee shop within the 500 m search radius, e.g. Bath Beach, Borough Park, and Brighton Beach. These are the places my friend could consider for opening a coffee shop.

Further, once he decides where in the neighborhood he wants to live, we can calculate the distance from his house to the intended coffee shop location, as well as the distance to the existing coffee shops in that area, so that he can settle on a place for his new business.
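
The distance calculation mentioned here could be done with the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. This is a minimal sketch under the assumption that straight-line distance is good enough for a first comparison; `haversine_km` is a hypothetical helper, and walking or driving distance would need a routing service instead.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Bay Ridge centre and one of its coffee shops, both taken from the tables above.
bay_ridge = (40.625801, -74.030621)
caffe_cafe = (40.624946, -74.030404)
print(round(haversine_km(*bay_ridge, *caffe_cafe) * 1000), "m")
```

The same function can be applied to a candidate shop location against every existing coffee shop in the neighborhood to find the nearest competitor.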

## Conclusion

The purpose of this project was to identify areas of Brooklyn with few coffee shops.
The final decision on the optimal coffee shop location will be made by the stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into account additional factors such as the distance from the stakeholder's house and the distance to any existing coffee shops in the same location, across the neighborhoods of Brooklyn.