# Capstone Project - The Battle of the Neighborhoods (Week 1)

## Introduction: Business Problem

In this project we try to find an ideal location for a new restaurant on behalf of stakeholders interested in Denver, United States. We also try to identify the kind of restaurant best suited to different neighborhoods.

Denver is the capital and most populous municipality of the U.S. state of Colorado. With an estimated population of 716,492 in 2018, Denver has been one of the fastest-growing major cities in the United States, so opening a restaurant there is promising because of the large local customer base. Denver is also a jumping-off point for ski resorts in the nearby Rocky Mountains. Every winter a large number of visitors come to Denver on vacation, and those tourists need restaurants too.

For the restaurant location, we will look for areas that are not already crowded with restaurants. We also prefer locations that are close to either downtown or the ski resorts.

We will use data science techniques to identify neighborhoods that are promising based on these criteria. Results and the advantages of each area will be stated after the data analysis.

## Data Source

We will use a regularly spaced grid of locations, centered on the city center, to define neighborhoods.

Data sources we will use are as follows:
- Geocoding (the code below uses the Nominatim geocoder via geopy): obtain the coordinates of the city center and approximate addresses of the centers of candidate areas
- Foursquare API: obtain detailed information on restaurants in every neighborhood

Key data factors that we will look into:
- number of existing restaurants in the neighborhood
- distance to other restaurants in the neighborhood
- distance of neighborhood from city center

## Work Flow

- HTTP requests will be made to the Foursquare API, using the coordinates of the area of interest, to retrieve venue information
- We will use the Foursquare API explore endpoint to collect information on nearby restaurants
- We will use Folium (a Python visualization library) to visualize our neighborhood candidates in Denver
- The unsupervised machine learning algorithm k-means clustering will be applied to group the restaurants in and around the neighborhood by category
- Based on the clusters, we will draw our conclusions and recommendations

## Getting Neighborhood Candidates

First, let's import the libraries we need.

In [1]:
!pip install folium geopy shapely pyproj
import math
import json
import requests
import numpy as np
import pandas as pd
import folium # map rendering
import shapely.geometry
import pyproj # coordinate projections
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# transform JSON into a pandas dataframe
# (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)
from pandas import json_normalize
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# k-means for the clustering stage
from sklearn.cluster import KMeans
# Beautiful Soup for HTML parsing
from urllib.request import urlopen
from bs4 import BeautifulSoup

We will find the latitude and longitude of the Denver city center using the Nominatim geocoder from geopy.

In [2]:
address = 'Denver, United States'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
denver_center = [latitude, longitude]
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Denver, United States are 39.7392364, -104.9848623.


Now let's create a grid of equally spaced area candidates, centered on the city center and within 6 km of it. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on the Folium map.

In [3]:
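def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to UTM x/y in meters.
    # Caution: zone=33 covers central Europe, whereas Denver lies in UTM zone 13,
    # so zone=13 is the appropriate choice for accurate metric distances.
    # The outputs shown in this notebook were produced with zone=33.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy; it must use the same UTM zone.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]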

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Denver center longitude={}, latitude={}'.format(denver_center[1], denver_center[0]))
x, y = lonlat_to_xy(denver_center[1], denver_center[0])
print('Denver center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Denver center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Denver center longitude=-104.9848623, latitude=39.7392364
Denver center UTM X=-4629707.622043283, Y=13449629.704199307
Denver center longitude=-104.98486230000002, latitude=39.73923639999991
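
The UTM zone passed to `pyproj.Proj` should match the longitude of the area being projected. As a quick check, the standard zone number can be derived from the longitude alone. This is a minimal sketch: the helper name `utm_zone` is introduced here for illustration, and the special-case zones around Norway and Svalbard are ignored.

```python
def utm_zone(longitude):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide and numbered eastward starting at 180 degrees west.
    return int((longitude + 180) // 6) % 60 + 1

denver_zone = utm_zone(-104.9848623)  # Denver center longitude from the geocoding step
berlin_zone = utm_zone(13.4)          # approximate longitude of Berlin, for comparison
print(denver_zone, berlin_zone)
```

For Denver this gives zone 13, while zone 33 corresponds to central Europe.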


Here we are going to create a hexagonal grid of cells. We make sure that every cell center is equally distant from all its neighbors by adjusting vertical row spacing.

In [4]:
denver_center_x, denver_center_y = lonlat_to_xy(denver_center[1], denver_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = denver_center_x - 6000
x_step = 600
y_min = denver_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(denver_center_x, denver_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
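
The claim that every cell center is equally distant from its neighbors can be checked without any projection library. The sketch below rebuilds the same layout around an arbitrary origin using only the standard library; the function names `hex_grid` and `nearest_neighbor_distance` are ours, and the parameters mirror the values used above (6000 m radius, 600 m spacing).

```python
import math

def hex_grid(center_x, center_y, radius=6000.0, spacing=600.0):
    """Return (x, y) centers of a hexagonal grid within `radius` of the center."""
    k = math.sqrt(3) / 2                     # row spacing factor for a hexagonal layout
    n_cols = int(2 * radius / spacing) + 1   # 21 columns for the values above
    n_rows = int(n_cols / k)                 # extra rows because rows are closer together
    y_min = center_y - radius - (n_rows * k * spacing - 2 * radius) / 2
    points = []
    for i in range(n_rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0  # shift every other row
        for j in range(n_cols):
            x = center_x - radius + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius + 1:
                points.append((x, y))
    return points

def nearest_neighbor_distance(points, index):
    """Distance from points[index] to its closest other point."""
    px, py = points[index]
    return min(math.hypot(px - qx, py - qy)
               for n, (qx, qy) in enumerate(points) if n != index)

grid = hex_grid(0.0, 0.0)
# Pick the point closest to the origin so that all six neighbors exist.
central = min(range(len(grid)), key=lambda n: math.hypot(*grid[n]))
print(round(nearest_neighbor_distance(grid, central), 6))
```

Because the row spacing is `spacing * sqrt(3) / 2` and alternate rows are shifted by half a step, the diagonal neighbors end up exactly one `spacing` away, the same as the neighbors within a row.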


Let's visualize the Denver city center and neighborhood candidates.

In [5]:
map_denver = folium.Map(location=denver_center, zoom_start=13)
folium.Marker(denver_center, popup='Denver').add_to(map_denver)
# draw each candidate neighborhood as a circle with a 300 m radius
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_denver)
map_denver

We enter our Foursquare API credentials for access.

In [6]:
import os

# Read the credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed here.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605' # Foursquare API version date

print('Credentials loaded.')

Credentials loaded.


In [7]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=39.7392364,-104.9848623&radius=500&limit=100'

Results from Foursquare

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e1566dc949393001b46b990'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Capitol Hill',
  'headerFullLocation': 'Capitol Hill, Denver',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 76,
  'suggestedBounds': {'ne': {'lat': 39.743736404500005,
    'lng': -104.97902117546376},
   'sw': {'lat': 39.7347363955, 'lng': -104.99070342453625}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '53b03e59498e0929747864bc',
       'name': 'Sassafras American Eatery',
       'location': {'address': '320 E Colfax Ave',
        'lat': 39.73994899661985,
        'lng': -104.9827555095705,
        'labele

Let's pull the data from Foursquare into a dataframe for further analysis

In [10]:
def get_category_type(row):
    # return the name of the first category of a venue row, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
# pull the venue items out of the Foursquare response
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten the nested JSON into columns
# keep only the columns we need
filtered_columns = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy()

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues

Unnamed: 0,venue.name,venue.id,venue.categories,venue.location.lat,venue.location.lng
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669
3,Kindness Yoga,506cb2dbe4b09d5b1e02b24d,Yoga Studio,39.736721,-104.984407
4,Good Chemistry - Denver Dispensary,4c61e35eeb82d13aaf2c04d6,Marijuana Dispensary,39.739907,-104.982668
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633
7,Capitol Hill Books,4b98376af964a520db3435e3,Bookstore,39.739979,-104.983472
8,CorePower Yoga,4a870c35f964a520520220e3,Yoga Studio,39.737070,-104.983057
9,Civic Center Park,4a3c82b6f964a5209ba11fe3,Park,39.739370,-104.988776
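
The flattening step above can also be sketched without pandas, which makes the nesting of the response explicit. The payload below is illustrative only: it mirrors the structure of the response shown earlier, keeps just the fields used here, and its last entry is a hypothetical venue added to show the empty-category case. The helper name `flatten_venue` is ours.

```python
# Illustrative payload mirroring the nesting of the Foursquare response;
# most fields are omitted and the last venue is hypothetical.
sample = {
    'response': {'groups': [{'items': [
        {'venue': {'id': '53b03e59498e0929747864bc',
                   'name': 'Sassafras American Eatery',
                   'categories': [{'name': 'Breakfast Spot'}],
                   'location': {'lat': 39.739949, 'lng': -104.982756}}},
        {'venue': {'id': '4a3c82b6f964a5209ba11fe3',
                   'name': 'Civic Center Park',
                   'categories': [{'name': 'Park'}],
                   'location': {'lat': 39.739370, 'lng': -104.988776}}},
        {'venue': {'id': 'hypothetical-id',
                   'name': 'Venue without a category',
                   'categories': [],
                   'location': {'lat': 39.7392, 'lng': -104.9848}}},
    ]}]}
}

def flatten_venue(item):
    """Turn one nested response item into a flat record."""
    venue = item['venue']
    categories = venue.get('categories', [])
    return {
        'name': venue['name'],
        'id': venue['id'],
        # keep only the first category name, as get_category_type does
        'category': categories[0]['name'] if categories else None,
        'lat': venue['location']['lat'],
        'lng': venue['location']['lng'],
    }

rows = [flatten_venue(item) for item in sample['response']['groups'][0]['items']]
for row in rows:
    print(row['name'], '|', row['category'])
```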


In [12]:
#fix the column names so they look relatively normal
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues

Unnamed: 0,name,id,categories,lat,lng
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669
3,Kindness Yoga,506cb2dbe4b09d5b1e02b24d,Yoga Studio,39.736721,-104.984407
4,Good Chemistry - Denver Dispensary,4c61e35eeb82d13aaf2c04d6,Marijuana Dispensary,39.739907,-104.982668
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633
7,Capitol Hill Books,4b98376af964a520db3435e3,Bookstore,39.739979,-104.983472
8,CorePower Yoga,4a870c35f964a520520220e3,Yoga Studio,39.737070,-104.983057
9,Civic Center Park,4a3c82b6f964a5209ba11fe3,Park,39.739370,-104.988776


List the unique venue categories in the dataframe

In [34]:
nearby_venues['categories'].unique()

array(['Breakfast Spot', 'Noodle House', 'Vegetarian / Vegan Restaurant',
       'Yoga Studio', 'Marijuana Dispensary', 'Burger Joint',
       'Middle Eastern Restaurant', 'Bookstore', 'Park', 'Jewelry Store',
       'Mexican Restaurant', 'Art Museum', 'History Museum', 'Museum',
       'Organic Grocery', 'Food Truck', 'Restaurant', 'Sandwich Place',
       'Art Gallery', 'Exhibit', 'Nightclub', 'Bakery', 'Pub',
       'Japanese Restaurant', 'Dance Studio', 'Café', 'Poke Place', 'Gym',
       'Coffee Shop', 'Outdoor Sculpture', 'Historic Site', 'Gastropub',
       'Asian Restaurant', 'Salad Place', 'Bar', 'Thai Restaurant',
       'Dive Bar', 'Diner', 'Pizza Place', 'ATM', 'Hotel', 'Lounge',
       'Convenience Store', 'Rental Car Location', 'Chinese Restaurant',
       'Thrift / Vintage Store', 'Shipping Store', 'Recreation Center',
       'Travel Lounge', 'Playground', 'Paper / Office Supplies Store',
       'Ramen Restaurant', 'Smoke Shop', 'Bike Rental / Bike Share',
       'Cowork

Here we create a list to remove all the places that are not restaurants

In [35]:
removal_list = ['Yoga Studio', 'Marijuana Dispensary','Bookstore', 'Park', 
                'Jewelry Store','Art Museum', 'History Museum', 'Museum',
                'Organic Grocery', 'Art Gallery', 'Exhibit','Dance Studio','Poke Place', 
                'Gym','Outdoor Sculpture', 'Historic Site', 'ATM', 'Hotel', 'Lounge',
                'Convenience Store', 'Rental Car Location', 'Thrift / Vintage Store', 
                'Shipping Store', 'Recreation Center','Travel Lounge', 'Playground', 
                'Paper / Office Supplies Store','Smoke Shop', 'Bike Rental / Bike Share',
                'Coworking Space']

nearby_venues2 = nearby_venues.copy()

#getting a clear dataframe of just restaurants
nearby_venues2 = nearby_venues2[~nearby_venues2['categories'].isin(removal_list)]
nearby_venues2

Unnamed: 0,name,id,categories,lat,lng
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633
11,La Abeja,4bc0b9644cdfc9b675559321,Mexican Restaurant,39.740026,-104.980851
15,Tycoon Ramen & Sushi Bar,56882b46498e46899321ab14,Noodle House,39.739958,-104.982459
17,Quiero Arepas,4e0cc05c7d8bfe35bbc4d86d,Food Truck,39.738654,-104.988764
18,Fork & Spoon,538a2731498e5e2ed87ffe3a,Restaurant,39.74016,-104.98254
19,Sub Culture,4e7e98056da1103ad2492244,Sandwich Place,39.736963,-104.980995


In [36]:
# get a list of venues
venue_id_list = nearby_venues2['id'].tolist()
venue_id_list

['53b03e59498e0929747864bc',
 '4c7d40c2b33a224b957ed781',
 '4a062ed5f964a520cb721fe3',
 '4a0842c3f964a520a8731fe3',
 '4a7a3db4f964a520f9e81fe3',
 '4bc0b9644cdfc9b675559321',
 '56882b46498e46899321ab14',
 '4e0cc05c7d8bfe35bbc4d86d',
 '538a2731498e5e2ed87ffe3a',
 '4e7e98056da1103ad2492244',
 '40e0b100f964a52044031fe3',
 '57e81188498e2d323bd59833',
 '4dddb008b0fbc2c4eef3896a',
 '4b1d4dedf964a5207b0e24e3',
 '56325b77498ebde73e1975e8',
 '4c72b58b4bc4236a26fbcb7a',
 '50ba9a6be4b09ef2d94c3021',
 '55e10513498e44d12474f5f0',
 '4b94231af964a5208f6a34e3',
 '599e325ef96b2c3468d75ec2',
 '4ad4d7f0f964a5202efc20e3',
 '4abd9fe4f964a520338b20e3',
 '514213bf3d7ca5b4135a944a',
 '5bb955ce7dc9e1002c0c68ad',
 '4bfdf6e64cf820a1b24fedf4',
 '51f1daf0498e5c6ef3018963',
 '4ac6d449f964a52067b620e3',
 '4a7e9575f964a52005f21fe3',
 '4faeb10ce4b097c37e642ee9',
 '51017d8fe4b073cacfe81bf3',
 '4b92c9fcf964a520e01b34e3',
 '52abbb1811d28f0e43511c09',
 '4c76772066be6dcb2327c30f',
 '4a5e689df964a52083be1fe3',
 '4bc52c1e5935

In [37]:
# pull the like count for each venue from the API, based on venue ID

like_list = []

for venue_id in venue_id_list:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(venue_url).json()
    like_list.append(result['response']['likes']['count'])
print(like_list)

[90, 75, 432, 74, 43, 16, 25, 13, 18, 50, 89, 6, 121, 24, 7, 8, 67, 19, 15, 7, 35, 14, 14, 6, 78, 5, 14, 84, 3, 6, 0, 33, 1, 33, 14, 7, 5, 7, 6, 2]


In [38]:
#double check that we did not lose any venues based on if likes were available

print(len(like_list))
print(len(venue_id_list))

40
40


## Data Preparation
We are going to bin the like counts into a categorical quality variable so that we can cluster appropriately.

In [39]:
# Make a copy of our initial dataframe just in case anything goes wrong

denver_venues = nearby_venues2.copy()
denver_venues.head()

Unnamed: 0,name,id,categories,lat,lng
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633


In [40]:
# Add in the list of likes

denver_venues['total likes'] = like_list
denver_venues.head()

Unnamed: 0,name,id,categories,lat,lng,total likes
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756,90
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111,75
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669,432
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861,74
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633,43


In [41]:
# Summary statistics of total likes, used to decide on the bins

print(denver_venues['total likes'].max())
print(denver_venues['total likes'].min())
print(denver_venues['total likes'].median())
print(denver_venues['total likes'].mean())

432
0
14.5
39.15


In [42]:
# Decide bin we are going to use
print(np.percentile(denver_venues['total likes'], 25))
print(np.percentile(denver_venues['total likes'], 50))
print(np.percentile(denver_venues['total likes'], 75))

6.75
14.5
44.75


In [43]:
# Define the bins as boolean masks over total likes
# <=24, 25-45, 46-76, >76
# poor, below avg, abv avg, great

poor = denver_venues['total likes'] <= 24
below_avg = (denver_venues['total likes'] > 24) & (denver_venues['total likes'] <= 45)
abv_avg = (denver_venues['total likes'] > 45) & (denver_venues['total likes'] <= 76)
great = denver_venues['total likes'] > 76

denver_venues

Unnamed: 0,name,id,categories,lat,lng,total likes
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756,90
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111,75
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669,432
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861,74
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633,43
11,La Abeja,4bc0b9644cdfc9b675559321,Mexican Restaurant,39.740026,-104.980851,16
15,Tycoon Ramen & Sushi Bar,56882b46498e46899321ab14,Noodle House,39.739958,-104.982459,25
17,Quiero Arepas,4e0cc05c7d8bfe35bbc4d86d,Food Truck,39.738654,-104.988764,13
18,Fork & Spoon,538a2731498e5e2ed87ffe3a,Restaurant,39.74016,-104.98254,18
19,Sub Culture,4e7e98056da1103ad2492244,Sandwich Place,39.736963,-104.980995,50


In [44]:
# Re-categorizing the categories

denver_venues['categories'].unique()

array(['Breakfast Spot', 'Noodle House', 'Vegetarian / Vegan Restaurant',
       'Burger Joint', 'Middle Eastern Restaurant', 'Mexican Restaurant',
       'Food Truck', 'Restaurant', 'Sandwich Place', 'Nightclub',
       'Bakery', 'Pub', 'Japanese Restaurant', 'Café', 'Coffee Shop',
       'Gastropub', 'Asian Restaurant', 'Salad Place', 'Bar',
       'Thai Restaurant', 'Dive Bar', 'Diner', 'Pizza Place',
       'Chinese Restaurant', 'Ramen Restaurant'], dtype=object)

In [45]:
# Create our new categories and create a function to apply those to our existing data

bars = ['Nightclub', 'Pub', 'Gastropub', 'Bar', 'Dive Bar']
other = ['Breakfast Spot', 'Vegetarian / Vegan Restaurant', 'Food Truck', 'Restaurant', 'Sandwich Place', 'Café', 'Salad Place', 'Diner', 'Bakery', 'Coffee Shop']
euro_asia_indian_food = ['Noodle House', 'Middle Eastern Restaurant','Japanese Restaurant', 'Asian Restaurant', 'Thai Restaurant', 'Chinese Restaurant', 'Ramen Restaurant']
mex_southam_food = ['Mexican Restaurant']
american_food = ['Burger Joint']
italian_food = ['Pizza Place']

def conditions2(s):
    if s['categories'] in bars:
        return 'bars'
    if s['categories'] in other:
        return 'other'
    if s['categories'] in euro_asia_indian_food:
        return 'euro asia indian food'
    if s['categories'] in mex_southam_food:
        return 'mex southam food'
    if s['categories'] in american_food:
        return 'american food'
    if s['categories'] in italian_food:
        return 'italian food'

denver_venues['categories_new']=denver_venues.apply(conditions2, axis=1)
denver_venues

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756,90,other
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111,75,euro asia indian food
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669,432,other
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861,74,american food
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633,43,euro asia indian food
11,La Abeja,4bc0b9644cdfc9b675559321,Mexican Restaurant,39.740026,-104.980851,16,mex southam food
15,Tycoon Ramen & Sushi Bar,56882b46498e46899321ab14,Noodle House,39.739958,-104.982459,25,euro asia indian food
17,Quiero Arepas,4e0cc05c7d8bfe35bbc4d86d,Food Truck,39.738654,-104.988764,13,other
18,Fork & Spoon,538a2731498e5e2ed87ffe3a,Restaurant,39.74016,-104.98254,18,other
19,Sub Culture,4e7e98056da1103ad2492244,Sandwich Place,39.736963,-104.980995,50,other
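
The chain of `if` statements in `conditions2` is equivalent to a dictionary lookup. A compact sketch of the same mapping is shown below; the names `GROUPS`, `CATEGORY_TO_GROUP` and `regroup` are introduced here for illustration, and the category lists are copied from the cell above.

```python
GROUPS = {
    'bars': ['Nightclub', 'Pub', 'Gastropub', 'Bar', 'Dive Bar'],
    'other': ['Breakfast Spot', 'Vegetarian / Vegan Restaurant', 'Food Truck',
              'Restaurant', 'Sandwich Place', 'Café', 'Salad Place', 'Diner',
              'Bakery', 'Coffee Shop'],
    'euro asia indian food': ['Noodle House', 'Middle Eastern Restaurant',
                              'Japanese Restaurant', 'Asian Restaurant',
                              'Thai Restaurant', 'Chinese Restaurant',
                              'Ramen Restaurant'],
    'mex southam food': ['Mexican Restaurant'],
    'american food': ['Burger Joint'],
    'italian food': ['Pizza Place'],
}

# Invert the grouping once so that each lookup is a single dictionary access
CATEGORY_TO_GROUP = {category: group
                     for group, members in GROUPS.items()
                     for category in members}

def regroup(category):
    """Return the broad group for a Foursquare category, or None if it is unmapped."""
    return CATEGORY_TO_GROUP.get(category)

print(regroup('Pub'), '|', regroup('Noodle House'), '|', regroup('Yoga Studio'))
```

With pandas this could be applied as `denver_venues['categories'].map(CATEGORY_TO_GROUP)`, which leaves unmapped categories as missing values just as `conditions2` returns `None` for them.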


In [46]:
# one-hot encode the venue category; 'total likes' is numeric and passes through unchanged
denver_onehot = pd.get_dummies(denver_venues[['categories_new', 'total likes']], prefix="", prefix_sep="", dtype=int)

# add the venue name column back to the dataframe
denver_onehot['Name'] = denver_venues['name']

# move the venue name column to the first position
fixed_columns = [denver_onehot.columns[-1]] + list(denver_onehot.columns[:-1])
denver_onehot = denver_onehot[fixed_columns]

denver_onehot.head()

Unnamed: 0,Name,total likes,american food,bars,euro asia indian food,italian food,mex southam food,other
0,Sassafras American Eatery,90,0,0,0,0,0,1
1,Phở-natic,75,0,0,1,0,0,0
2,"City, O' City",432,0,0,0,0,0,1
5,City Grille,74,1,0,0,0,0,0
6,Shish Kabob Grill,43,0,0,1,0,0,0


In [47]:
cluster_df = denver_onehot.drop('Name', axis=1)

k_clusters = 4

# run k-means clustering (n_init=10 matches the default of older scikit-learn releases)
kmeans = KMeans(n_clusters=k_clusters, n_init=10, random_state=0).fit(cluster_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 0, 3, 1, 3, 1, 1, 3], dtype=int32)
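
k-means relies on Euclidean distance, so a feature with a large numeric range can outweigh the 0/1 category columns. The sketch below illustrates the effect on three venues from the table above, using only the standard library. It is an illustration under stated assumptions, not the procedure used in this notebook: min-max scaling is one of several possible choices, and it is applied here to just three values.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Three venues from the table: Sassafras, Phở-natic, City O' City
likes = [90, 75, 432]
# One-hot flags (other, euro asia indian food) for the same venues
onehot = [(1, 0), (0, 1), (1, 0)]

raw = [(l, *flags) for l, flags in zip(likes, onehot)]
scaled = [(l, *flags) for l, flags in zip(min_max_scale(likes), onehot)]

# Venues 0 and 2 share a category; venues 0 and 1 have similar like counts.
print('raw:    same category', euclidean(raw[0], raw[2]),
      '| similar likes', euclidean(raw[0], raw[1]))
print('scaled: same category', euclidean(scaled[0], scaled[2]),
      '| similar likes', euclidean(scaled[0], scaled[1]))
```

On the raw features the two venues with similar like counts are far closer than the two that share a category; after scaling the order reverses. Whether likes or category should carry more weight is a modeling decision, and scaling is the lever that controls it.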

In [48]:
denver_venues['label'] = kmeans.labels_
denver_venues.head()

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new,label
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756,90,other,0
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111,75,euro asia indian food,0
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669,432,other,2
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861,74,american food,0
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633,43,euro asia indian food,3


In [49]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(denver_venues['lat'], denver_venues['lng'], denver_venues['name'], denver_venues['label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Analyzing clusters

In [50]:
denver_venues.loc[denver_venues['label']==0]

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new,label
0,Sassafras American Eatery,53b03e59498e0929747864bc,Breakfast Spot,39.739949,-104.982756,90,other,0
1,Phở-natic,4c7d40c2b33a224b957ed781,Noodle House,39.740081,-104.984111,75,euro asia indian food,0
5,City Grille,4a0842c3f964a520a8731fe3,Burger Joint,39.740165,-104.982861,74,american food,0
22,The Church,40e0b100f964a52044031fe3,Nightclub,39.735245,-104.985891,89,bars,0
24,Prohibition,4dddb008b0fbc2c4eef3896a,Pub,39.739908,-104.981048,121,bars,0
32,Pablo's Coffee,50ba9a6be4b09ef2d94c3021,Coffee Shop,39.73716,-104.980835,67,other,0
43,Pub on Penn,4bfdf6e64cf820a1b24fedf4,Pub,39.736833,-104.98085,78,bars,0
47,Tom's Diner,4a7e9575f964a52005f21fe3,Diner,39.740183,-104.979532,84,other,0


In [51]:
denver_venues.loc[denver_venues['label']==1]

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new,label
11,La Abeja,4bc0b9644cdfc9b675559321,Mexican Restaurant,39.740026,-104.980851,16,mex southam food,1
17,Quiero Arepas,4e0cc05c7d8bfe35bbc4d86d,Food Truck,39.738654,-104.988764,13,other,1
18,Fork & Spoon,538a2731498e5e2ed87ffe3a,Restaurant,39.74016,-104.98254,18,other,1
23,Make Believe Bakery,57e81188498e2d323bd59833,Bakery,39.736855,-104.98437,6,other,1
27,The Spring Cafe,56325b77498ebde73e1975e8,Café,39.738297,-104.983892,7,other,1
30,Tuscany Coffee & Deli,4c72b58b4bc4236a26fbcb7a,Coffee Shop,39.743625,-104.986026,8,other,1
35,Fire,55e10513498e44d12474f5f0,Gastropub,39.735522,-104.987698,19,bars,1
36,Quiznos,4b94231af964a5208f6a34e3,Sandwich Place,39.736565,-104.983921,15,other,1
37,Bourbon Grill,599e325ef96b2c3468d75ec2,Asian Restaurant,39.740138,-104.980051,7,euro asia indian food,1
39,Satellite Bar,4abd9fe4f964a520338b20e3,Bar,39.740097,-104.983265,14,bars,1


In [52]:
denver_venues.loc[denver_venues['label']==2]

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new,label
2,"City, O' City",4a062ed5f964a520cb721fe3,Vegetarian / Vegan Restaurant,39.736724,-104.984669,432,other,2


In [53]:
denver_venues.loc[denver_venues['label']==3]

Unnamed: 0,name,id,categories,lat,lng,total likes,categories_new,label
6,Shish Kabob Grill,4a7a3db4f964a520f9e81fe3,Middle Eastern Restaurant,39.740246,-104.983633,43,euro asia indian food,3
15,Tycoon Ramen & Sushi Bar,56882b46498e46899321ab14,Noodle House,39.739958,-104.982459,25,euro asia indian food,3
19,Sub Culture,4e7e98056da1103ad2492244,Sandwich Place,39.736963,-104.980995,50,other,3
25,Tokyo Joe's,4b1d4dedf964a5207b0e24e3,Japanese Restaurant,39.737657,-104.983505,24,euro asia indian food,3
38,MAD Greens - Inspired Eats,4ad4d7f0f964a5202efc20e3,Salad Place,39.736147,-104.988549,35,other,3
51,Oblio's Cap Hill Tavern,52abbb1811d28f0e43511c09,Pizza Place,39.73557,-104.982422,33,italian food,3
58,Starbucks,4a5e689df964a52083be1fe3,Coffee Shop,39.742794,-104.98704,33,other,3
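
A short summary per cluster makes the four tables above easier to compare. The sketch below uses only the first ten venues, whose labels and like counts are both printed earlier in the notebook, so the figures are illustrative rather than a summary of all 40 venues.

```python
from collections import defaultdict

# First ten cluster labels (from kmeans.labels_[0:10]) and the matching like counts
labels = [0, 0, 2, 0, 3, 1, 3, 1, 1, 3]
likes = [90, 75, 432, 74, 43, 16, 25, 13, 18, 50]

likes_by_cluster = defaultdict(list)
for label, like_count in zip(labels, likes):
    likes_by_cluster[label].append(like_count)

# (min likes, max likes, number of venues) for each cluster label
summary = {label: (min(values), max(values), len(values))
           for label, values in sorted(likes_by_cluster.items())}

for label, (low, high, n) in summary.items():
    print('cluster {}: {} venues, likes {}-{}'.format(label, n, low, high))
```

With the full dataframe, `denver_venues.groupby('label')['total likes'].agg(['min', 'max', 'count'])` would give the same kind of summary for every venue.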


## Results and Discussion

Our analysis focuses on areas with low restaurant density that are fairly close to the Denver city center. We think such areas would be a good choice for opening a restaurant, given their popularity with tourists and the reduced competition.

We first created a hexagonal grid of 364 candidate neighborhoods, with centers 600 m apart and within 6 km of the city center. We then queried Foursquare for venues within 500 m of the city center, filtered out the venues that are not restaurants, bars, or cafés, and collected the number of likes for each of the 40 remaining venues.

In addition, we grouped the venue categories into six broader types and clustered the venues with k-means (k = 4) on like count and type. The resulting clusters separate the venues mainly by like count: in the rows shown above, one cluster holds a single outlier with 432 likes, while the others cover roughly 67 to 121, 24 to 50, and 6 to 19 likes.

The result is a set of zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues. This, of course, does not imply that those zones are actually optimal locations for a new restaurant. The purpose of this analysis was only to provide information on areas close to the Denver city center that are not crowded with existing restaurants. It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. Recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually yield a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion

This project aims to identify ideal places to open a new restaurant in Denver. We focused on areas that are close to the city center but have a low restaurant density.

We used the Foursquare API and Nominatim geocoding (via geopy) to get information on the city center and nearby restaurants. Then we used k-means clustering to create major zones of interest; addresses of these zone centers were also provided for businesses to explore.

We identified four zones of interest and provided our recommendations in terms of zone characteristics and restaurant types. The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.