# The Battle of Neighborhoods

## Introduction

San Francisco, officially the City and County of San Francisco, is a cultural, commercial, and financial center in Northern California. San Francisco is the 16th most populous city in the United States, and the fourth most populous in California. Also, San Francisco is famous for its diverse race and ethinicy as many Chinese immigrants chose to live here from long time ago. As a result, many Asian restaurants have been opened in the city and locals and tourists love this so much.

The purpose of this project is to help people who tend to open their own Chinese restaurant in the city. This project will take factors including restaurant density, rating and tips to provide stragety information in terms of location option to the clients.

## Data and Methodology

The data includes San Francisco Neighborhood zip code and geographic data

We first generate the list of neighborhoods candidate around the Chinatown in the city. We use the information on this website: http://www.healthysf.org/bdi/outcomes/zipmap.html. 
Then we get the longitude and latitude information. In order to get the longitude and latitude data information, we will use uszip module to search the information with zipcode

The second part is for Neighborhood Venue information from Foursquare API. We requested:
1. Chinese Restaurants: 
    1. Location information 
    2. Rating
2. All Venue Categories

We use the Foursquare to get Chinese Restaurant Venue information including information and Rating, getting the average rating in each neighborhood to balance good reputation and strong competition. 

In order to attract more customers by having a diverse neighborhood with many other entertainment places, we then cluster the neighborhood by different venue categories using KNN cluster method.

# Results

### Getting the longitude and latitude data

In [68]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import folium
import matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors

In [7]:
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
address = 'Chinatown, San Francisco, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

37.7943011 -122.4063757


We will then generate the list of neighborhoods candidate around the Chinatown in the city. We use the information on this website: http://www.healthysf.org/bdi/outcomes/zipmap.htm
 

In [8]:
response = requests.get("http://www.healthysf.org/bdi/outcomes/zipmap.htm")
soup = BeautifulSoup(response.text, "lxml")
table = soup.find_all("table")
df = pd.read_html(str(table))
df = pd.DataFrame(df[4])
df.columns = df.iloc[0]
df = df.iloc[1:-1, :-1]
df=df.rename(columns={'Zip Code': 'Zipcode'})
sf_data = df
print(sf_data.head)

<bound method NDFrame.head of 0  Zipcode                             Neighborhood
1    94102  Hayes Valley/Tenderloin/North of Market
2    94103                          South of Market
3    94107                             Potrero Hill
4    94108                                Chinatown
5    94109             Polk/Russian Hill (Nob Hill)
6    94110             Inner Mission/Bernal Heights
7    94112       Ingelside-Excelsior/Crocker-Amazon
8    94114                        Castro/Noe Valley
9    94115               Western Addition/Japantown
10   94116                     Parkside/Forest Hill
11   94117                           Haight-Ashbury
12   94118                           Inner Richmond
13   94121                           Outer Richmond
14   94122                                   Sunset
15   94123                                   Marina
16   94124                    Bayview-Hunters Point
17   94127    St. Francis Wood/Miraloma/West Portal
18   94131                     Twi

In order to get the longitude and latitude data information, we will use uszip to search the information with zipcode

In [80]:
!pip install uszipcode
from uszipcode import SearchEngine

search = SearchEngine(simple_zipcode=True)

latitude = []
longitude = []

for index, row in sf_data.iterrows():
    zipcode = search.by_zipcode(row["Zipcode"]).to_dict()
    latitude.append(zipcode.get("lat"))
    longitude.append(zipcode.get("lng"))

sf_data["Latitude"] = latitude
sf_data["Longitude"] = longitude

sf_data



Unnamed: 0,Zipcode,Neighborhood,Latitude,Longitude
1,94102,Hayes Valley/Tenderloin/North of Market,37.78,-122.42
2,94103,South of Market,37.78,-122.41
3,94107,Potrero Hill,37.77,-122.39
4,94108,Chinatown,37.791,-122.409
5,94109,Polk/Russian Hill (Nob Hill),37.79,-122.42
6,94110,Inner Mission/Bernal Heights,37.75,-122.42
7,94112,Ingelside-Excelsior/Crocker-Amazon,37.72,-122.44
8,94114,Castro/Noe Valley,37.76,-122.44
9,94115,Western Addition/Japantown,37.79,-122.44
10,94116,Parkside/Forest Hill,37.74,-122.48


Let's do a map to visualize the neighborhoods using folium

In [76]:
address = 'San Francisco'

geolocator = Nominatim(user_agent = "san_francisco_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of San Francisco are {}, {}.'.format(latitude, longitude))


The geograpical coordinates of San Francisco are 37.7790262, -122.419906.


In [78]:
sf_map = folium.Map(location = [latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(sf_data['Latitude'], sf_data['Longitude'], sf_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(sf_map)  
    
sf_map

In [11]:
from IPython.display import Image 
from IPython.core.display import HTML 

### Foursquare Restaurant Data

#### Define Foursquare Credentials

In [97]:
print('These information is hide in github')

These information is hide in github


 #### Get the Chinese Restaurants in each neighborhoods

In [13]:
# root categories for Asian Restaurant
ch_food_category='4bf58dd8d48988d145941735'
radius=500
chinese_restaurant_categories=['52af3a5e3cf9994f4e043bea','52af3a723cf9994f4e043bec',
                   '52af3a7c3cf9994f4e043bed','58daa1558bbb0b01f18ec1d3',
                   '52af3a673cf9994f4e043beb','52af3a903cf9994f4e043bee',
                   '4bf58dd8d48988d1f5931735','52af3a9f3cf9994f4e043bef',
                   '52af3aaa3cf9994f4e043bf0','52af3ab53cf9994f4e043bf1',
                   '52af3abe3cf9994f4e043bf2','52af3ac83cf9994f4e043bf3',
                   '52af3ad23cf9994f4e043bf4','52af3add3cf9994f4e043bf5',
                   '52af3af23cf9994f4e043bf7','52af3ae63cf9994f4e043bf6',
                   '52af3afc3cf9994f4e043bf8','52af3b053cf9994f4e043bf9',
                   '52af3b213cf9994f4e043bfa','52af3b293cf9994f4e043bfb',
                   '52af3b343cf9994f4e043bfc','52af3b3b3cf9994f4e043bfd',
                   '52af3b463cf9994f4e043bfe','52af3b633cf9994f4e043c01',
                   '52af3b513cf9994f4e043bff','52af3b593cf9994f4e043c00',
                   '52af3b6e3cf9994f4e043c02','52af3b773cf9994f4e043c03',
                   '52af3b813cf9994f4e043c04','52af3b893cf9994f4e043c05',
                   '52af3b913cf9994f4e043c06','52af3b9a3cf9994f4e043c07',
                   '52af3ba23cf9994f4e043c08']

In [14]:
def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'lunch', 'food']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific


In [15]:
def get_restaurants(name,lats, lons):
    venues_list = []
 
    print('Obtaining venues around candidate locations:', end='')
    for name, lat, lon in zip(name,lats, lons):
        # Using radius=350 to meke sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, ch_food_category, radius, LIMIT)
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   item['venue']['location'],
                   item['venue']['location']['distance']) for item in results]            
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_chinese = is_restaurant(venue_categories, specific_filter=chinese_restaurant_categories)
            if is_res:
                if is_chinese:
                    venues_list.append([(
                        name, 
                        lat, 
                        lng, 
                        venue_id,
                        venue_name, 
                        venue_categories, 
                        venue_latlon, 
                        venue_address, 
                        venue_distance
                     )])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 'Venue_id',
                  'Venue', 'Venue Category',
                  'Venue Latitude&Longitude', 
                  'Venue address', 'Venue Distance'
                  ]
                  
    return(nearby_venues)

In [16]:

chinese_restaurants= get_restaurants(sf_data['Neighborhood'],sf_data['Latitude'], sf_data['Longitude'])

Obtaining venues around candidate locations:

In [17]:
chinese_restaurants

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue_id,Venue,Venue Category,Venue Latitude&Longitude,Venue address,Venue Distance
0,South of Market,37.78,-122.41,5a616f53d69ed07940d7c279,Sizzling Pot King,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.77711002625363, -122.4125721848816)","{'address': '139 8th St', 'lat': 37.7771100262...",393
1,Chinatown,37.791,-122.41,5c5f6c27ff0306002c2ff96b,Dim Sum Corner,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.792627, -122.406181)","{'address': '601 Grant Ave', 'crossStreet': 'C...",307
2,Chinatown,37.791,-122.41,4e7680ef1838f9188a51413f,Szechuan Cuisine Uncle Cafe,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.79419177842302, -122.40683096020591)","{'address': '65 Waverly Pl', 'lat': 37.7941917...",403
3,Chinatown,37.791,-122.41,4b36c6fcf964a520333c25e3,Oriental Pearl,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.79435323360806, -122.40595180711202)","{'address': '760 Clay St', 'lat': 37.794353233...",459
4,Chinatown,37.791,-122.41,4eb9ce33722edc0eaef08d3e,Yummy Dim Sum,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.794362, -122.40771399999998)","{'address': '930 Stockton St', 'lat': 37.79436...",390
5,Chinatown,37.791,-122.41,4e4ce09dbd413c4cc66ce392,Chinese Seafood Dim Sum,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.793717, -122.405992)","{'address': '708 Grant Ave', 'lat': 37.793717,...",401
6,Chinatown,37.791,-122.41,4ac9186af964a52095be20e3,Blossom Bakery,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.79452314205239, -122.40680407220638)","{'address': '133 Waverly Pl', 'crossStreet': '...",437
7,Chinatown,37.791,-122.41,5aff2c8b18d43b00396daf2f,Spice Kitchen,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.7895966, -122.4075812)","{'address': '432 Sutter St', 'crossStreet': 'S...",199
8,Chinatown,37.791,-122.41,54347c1e498e72e113993c01,Hunan House,"[(Hunan Restaurant, 52af3afc3cf9994f4e043bf8)]","(37.79517630000001, -122.407011)","{'address': '826 Washington St', 'lat': 37.795...",496
9,Chinatown,37.791,-122.41,52db40d011d268b3e4037631,Hunan Cuisine,"[(Hunan Restaurant, 52af3afc3cf9994f4e043bf8)]","(37.79481887817383, -122.40677642822266)","{'address': '150 Waverly Pl', 'lat': 37.794818...",467


#### Get the rating information for each venue

In [25]:
def get_venue_rating(venue_ids):
    venues_list=[]
    for venue_id in venue_ids:
        print(venue_id)
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(
            venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)
        result = requests.get(url).json()
        try:
            rating= result['response']['venue']['rating']
        except:
            rating=None            
        venues_list.append((venue_id, rating))
        print(venues_list)
    venue_rating_df=pd.DataFrame(venues_list,columns=['Venue_id','Rating'])
    return venue_rating_df

In [26]:
rating_df=get_venue_rating(chinese_restaurants['Venue_id'])
rating_df

5a616f53d69ed07940d7c279
[('5a616f53d69ed07940d7c279', 6.2)]
5c5f6c27ff0306002c2ff96b
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9)]
4e7680ef1838f9188a51413f
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7)]
4b36c6fcf964a520333c25e3
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7), ('4b36c6fcf964a520333c25e3', 6.4)]
4eb9ce33722edc0eaef08d3e
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7), ('4b36c6fcf964a520333c25e3', 6.4), ('4eb9ce33722edc0eaef08d3e', 6.5)]
4e4ce09dbd413c4cc66ce392
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7), ('4b36c6fcf964a520333c25e3', 6.4), ('4eb9ce33722edc0eaef08d3e', 6.5), ('4e4ce09dbd413c4cc66ce392', None)]
4ac9186af964a52095be20e3
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef18

[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7), ('4b36c6fcf964a520333c25e3', 6.4), ('4eb9ce33722edc0eaef08d3e', 6.5), ('4e4ce09dbd413c4cc66ce392', None), ('4ac9186af964a52095be20e3', None), ('5aff2c8b18d43b00396daf2f', None), ('54347c1e498e72e113993c01', 6.3), ('52db40d011d268b3e4037631', None), ('4648c04df964a520a8461fe3', 6.2), ('4ceec3b83b03f04de88d3bdc', 8.4), ('51b9fdee498ec3a4b2580261', 8.1), ('4db2fbf65da3a76f441e692d', None), ('58423840876ade32f7faa72a', None), ('531a929c498ea455ec6cd3a8', None), ('589631b7730a926561597378', None), ('56722103498ef0287b6edbdb', None), ('3fd66200f964a52005ed1ee3', None), ('4b5b53cef964a520e3f428e3', None), ('4a47c4fcf964a52011aa1fe3', None), ('4a70fc10f964a520ebd81fe3', None)]
4ae8d83cf964a520d2b221e3
[('5a616f53d69ed07940d7c279', 6.2), ('5c5f6c27ff0306002c2ff96b', 6.9), ('4e7680ef1838f9188a51413f', 6.7), ('4b36c6fcf964a520333c25e3', 6.4), ('4eb9ce33722edc0eaef08d3e', 6.5), ('4e4ce09dbd413

Unnamed: 0,Venue_id,Rating
0,5a616f53d69ed07940d7c279,6.2
1,5c5f6c27ff0306002c2ff96b,6.9
2,4e7680ef1838f9188a51413f,6.7
3,4b36c6fcf964a520333c25e3,6.4
4,4eb9ce33722edc0eaef08d3e,6.5
5,4e4ce09dbd413c4cc66ce392,
6,4ac9186af964a52095be20e3,
7,5aff2c8b18d43b00396daf2f,
8,54347c1e498e72e113993c01,6.3
9,52db40d011d268b3e4037631,


#### Merge back to the venue dataframe

In [35]:
chinese_restaurants=chinese_restaurants.merge(rating_df, on='Venue_id')
chinese_restaurants

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue_id,Venue,Venue Category,Venue Latitude&Longitude,Venue address,Venue Distance,Rating_x,Rating_y
0,South of Market,37.78,-122.41,5a616f53d69ed07940d7c279,Sizzling Pot King,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.77711002625363, -122.4125721848816)","{'address': '139 8th St', 'lat': 37.7771100262...",393,6.2,6.2
1,Chinatown,37.791,-122.41,5c5f6c27ff0306002c2ff96b,Dim Sum Corner,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.792627, -122.406181)","{'address': '601 Grant Ave', 'crossStreet': 'C...",307,6.9,6.9
2,Chinatown,37.791,-122.41,4e7680ef1838f9188a51413f,Szechuan Cuisine Uncle Cafe,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.79419177842302, -122.40683096020591)","{'address': '65 Waverly Pl', 'lat': 37.7941917...",403,6.7,6.7
3,Chinatown,37.791,-122.41,4b36c6fcf964a520333c25e3,Oriental Pearl,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.79435323360806, -122.40595180711202)","{'address': '760 Clay St', 'lat': 37.794353233...",459,6.4,6.4
4,Chinatown,37.791,-122.41,4eb9ce33722edc0eaef08d3e,Yummy Dim Sum,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.794362, -122.40771399999998)","{'address': '930 Stockton St', 'lat': 37.79436...",390,6.5,6.5
5,Chinatown,37.791,-122.41,4e4ce09dbd413c4cc66ce392,Chinese Seafood Dim Sum,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.793717, -122.405992)","{'address': '708 Grant Ave', 'lat': 37.793717,...",401,,
6,Chinatown,37.791,-122.41,4ac9186af964a52095be20e3,Blossom Bakery,"[(Dim Sum Restaurant, 4bf58dd8d48988d1f5931735)]","(37.79452314205239, -122.40680407220638)","{'address': '133 Waverly Pl', 'crossStreet': '...",437,,
7,Chinatown,37.791,-122.41,5aff2c8b18d43b00396daf2f,Spice Kitchen,"[(Szechuan Restaurant, 52af3b773cf9994f4e043c03)]","(37.7895966, -122.4075812)","{'address': '432 Sutter St', 'crossStreet': 'S...",199,,
8,Chinatown,37.791,-122.41,54347c1e498e72e113993c01,Hunan House,"[(Hunan Restaurant, 52af3afc3cf9994f4e043bf8)]","(37.79517630000001, -122.407011)","{'address': '826 Washington St', 'lat': 37.795...",496,6.3,6.3
9,Chinatown,37.791,-122.41,52db40d011d268b3e4037631,Hunan Cuisine,"[(Hunan Restaurant, 52af3afc3cf9994f4e043bf8)]","(37.79481887817383, -122.40677642822266)","{'address': '150 Waverly Pl', 'lat': 37.794818...",467,,


In [34]:
mean_rating_df = chinese_restaurants.groupby("Neighborhood")['Rating'].mean()
mean_rating_df = mean_rating_df.reset_index().sort_values(by=['Rating'],ascending=False)
mean_rating_df

Unnamed: 0,Neighborhood,Rating
2,Ingelside-Excelsior/Crocker-Amazon,8.4
0,Castro/Noe Valley,8.1
1,Chinatown,6.56
5,Polk/Russian Hill (Nob Hill),6.2
6,South of Market,6.2
3,Inner Richmond,
4,Parkside/Forest Hill,
7,Sunset,


From the above information, we can know that Ingelside-Excelsior/Crocker-Amazon neighborhood had the best average chinese restaurant rating and to our surprise, Chinatown ranks only the third. It is important for a new restaurant to avoid strong competition but opening in an area which is famous for Chinese food is also important. 

# Clustering neighborhoods

When trying to find a location for opening a restaurant, neighborhoods entertainment diversity might also be a good factor to take into considerations as people nowadays tend to find some after-meal activities.

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [56]:
sf_venues = getNearbyVenues(names = sf_data['Neighborhood'],
                                   latitudes = sf_data['Latitude'],
                                   longitudes = sf_data['Longitude']
                                  )
                                  
sf_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,War Memorial Opera House,37.778601,-122.420816,Opera House
1,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,Louise M. Davies Symphony Hall,37.777976,-122.420157,Concert Hall
2,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,Herbst Theater,37.779548,-122.420953,Concert Hall
3,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,San Francisco Ballet,37.77858,-122.420798,Dance Studio
4,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,"Books, Inc.",37.781614,-122.420531,Bookstore


In [86]:
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']])

# add neighborhood column back to dataframe
sf_onehot['Neighborhood'] = sf_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sf_onehot.columns[-1]] + list(sf_onehot.columns[:-1])
sf_onehot = sf_onehot[fixed_columns]

sf_grouped = sf_onehot.groupby('Neighborhood').mean().reset_index()
sf_grouped.head()

Unnamed: 0,Neighborhood,Venue Category_ATM,Venue Category_Accessories Store,Venue Category_Adult Boutique,Venue Category_American Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,...,Venue Category_Trail,Venue Category_Turkish Restaurant,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Video Store,Venue Category_Vietnamese Restaurant,Venue Category_Whisky Bar,Venue Category_Wine Bar,Venue Category_Wine Shop,Venue Category_Yoga Studio
0,Bayview-Hunters Point,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Castro/Noe Valley,0.018519,0.018519,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,...,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.018519,0.037037
2,Chinatown,0.0,0.0,0.0,0.022472,0.011236,0.0,0.0,0.0,0.011236,...,0.0,0.0,0.0,0.0,0.0,0.022472,0.011236,0.0,0.0,0.011236
3,Haight-Ashbury,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037
4,Hayes Valley/Tenderloin/North of Market,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,...,0.0,0.0,0.0375,0.0,0.0,0.0625,0.0,0.0375,0.0125,0.0


In [87]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [88]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        # append 'st', 'nd', 'rd' to the top 3 venues
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = sf_grouped['Neighborhood']

for ind in np.arange(sf_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayview-Hunters Point,Venue Category_Motorcycle Shop,Venue Category_Marijuana Dispensary,Venue Category_Liquor Store,Venue Category_Coffee Shop,Venue Category_Yoga Studio,Venue Category_French Restaurant,Venue Category_Food Truck,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Flower Shop
1,Castro/Noe Valley,Venue Category_Park,Venue Category_Yoga Studio,Venue Category_Coffee Shop,Venue Category_Pharmacy,Venue Category_Gay Bar,Venue Category_Playground,Venue Category_Wine Bar,Venue Category_Historic Site,Venue Category_Pizza Place,Venue Category_Pet Store
2,Chinatown,Venue Category_Coffee Shop,Venue Category_Boutique,Venue Category_Hotel,Venue Category_Chinese Restaurant,Venue Category_Sushi Restaurant,Venue Category_Cosmetics Shop,Venue Category_Jewelry Store,Venue Category_Spa,Venue Category_Bubble Tea Shop,Venue Category_Gym / Fitness Center
3,Haight-Ashbury,Venue Category_Grocery Store,Venue Category_Bakery,Venue Category_Coffee Shop,Venue Category_Yoga Studio,Venue Category_Ice Cream Shop,Venue Category_Restaurant,Venue Category_Recreation Center,Venue Category_Pizza Place,Venue Category_Pharmacy,Venue Category_Park
4,Hayes Valley/Tenderloin/North of Market,Venue Category_Vietnamese Restaurant,Venue Category_Coffee Shop,Venue Category_Sandwich Place,Venue Category_Wine Bar,Venue Category_French Restaurant,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Beer Bar,Venue Category_Hotel,Venue Category_Theater,Venue Category_Sushi Restaurant


In [89]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5
sf_grouped_clustering = sf_grouped.drop('Neighborhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(sf_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 0, 0, 0, 0, 0, 3, 0], dtype=int32)

In [95]:

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sf_merged = sf_data
sf_merged = sf_merged.merge(neighborhoods_venues_sorted, on = 'Neighborhood')

sf_merged.head()

ValueError: cannot insert Cluster Labels, already exists

In [94]:
sf_merged[['Neighborhood','1st Most Common Venue']]

Unnamed: 0,Neighborhood,1st Most Common Venue
0,Hayes Valley/Tenderloin/North of Market,Venue Category_Vietnamese Restaurant
1,South of Market,Venue Category_Coffee Shop
2,Potrero Hill,Venue Category_Food Truck
3,Chinatown,Venue Category_Coffee Shop
4,Polk/Russian Hill (Nob Hill),Venue Category_Massage Studio
5,Inner Mission/Bernal Heights,Venue Category_Mexican Restaurant
6,Ingelside-Excelsior/Crocker-Amazon,Venue Category_Pizza Place
7,Castro/Noe Valley,Venue Category_Park
8,Western Addition/Japantown,Venue Category_Park
9,Parkside/Forest Hill,Venue Category_Chinese Restaurant


In [96]:
sf_merged

Unnamed: 0,Zipcode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,94102,Hayes Valley/Tenderloin/North of Market,37.78,-122.42,0,Venue Category_Vietnamese Restaurant,Venue Category_Coffee Shop,Venue Category_Sandwich Place,Venue Category_Wine Bar,Venue Category_French Restaurant,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Beer Bar,Venue Category_Hotel,Venue Category_Theater,Venue Category_Sushi Restaurant
1,94103,South of Market,37.78,-122.41,0,Venue Category_Coffee Shop,Venue Category_Theater,Venue Category_Sandwich Place,Venue Category_Bakery,Venue Category_Marijuana Dispensary,Venue Category_Vietnamese Restaurant,Venue Category_American Restaurant,Venue Category_Pizza Place,Venue Category_Mexican Restaurant,Venue Category_Shoe Store
2,94107,Potrero Hill,37.77,-122.39,0,Venue Category_Food Truck,Venue Category_Coffee Shop,Venue Category_Gym,Venue Category_Pharmacy,Venue Category_Café,Venue Category_Park,Venue Category_Dog Run,Venue Category_Street Food Gathering,Venue Category_Light Rail Station,Venue Category_Pizza Place
3,94108,Chinatown,37.791,-122.409,0,Venue Category_Coffee Shop,Venue Category_Boutique,Venue Category_Hotel,Venue Category_Chinese Restaurant,Venue Category_Sushi Restaurant,Venue Category_Cosmetics Shop,Venue Category_Jewelry Store,Venue Category_Spa,Venue Category_Bubble Tea Shop,Venue Category_Gym / Fitness Center
4,94109,Polk/Russian Hill (Nob Hill),37.79,-122.42,0,Venue Category_Massage Studio,Venue Category_Grocery Store,Venue Category_Sushi Restaurant,Venue Category_Diner,Venue Category_Donut Shop,Venue Category_Mexican Restaurant,Venue Category_Thai Restaurant,Venue Category_Steakhouse,Venue Category_Gym / Fitness Center,Venue Category_Wine Bar
5,94110,Inner Mission/Bernal Heights,37.75,-122.42,0,Venue Category_Mexican Restaurant,Venue Category_Bakery,Venue Category_Italian Restaurant,Venue Category_Deli / Bodega,Venue Category_Art Gallery,Venue Category_Grocery Store,Venue Category_Burrito Place,Venue Category_Café,Venue Category_Market,Venue Category_Massage Studio
6,94112,Ingelside-Excelsior/Crocker-Amazon,37.72,-122.44,0,Venue Category_Pizza Place,Venue Category_Chinese Restaurant,Venue Category_Bus Station,Venue Category_Bar,Venue Category_Pharmacy,Venue Category_Café,Venue Category_Coffee Shop,Venue Category_Sandwich Place,Venue Category_Mexican Restaurant,Venue Category_Deli / Bodega
7,94114,Castro/Noe Valley,37.76,-122.44,0,Venue Category_Park,Venue Category_Yoga Studio,Venue Category_Coffee Shop,Venue Category_Pharmacy,Venue Category_Gay Bar,Venue Category_Playground,Venue Category_Wine Bar,Venue Category_Historic Site,Venue Category_Pizza Place,Venue Category_Pet Store
8,94115,Western Addition/Japantown,37.79,-122.44,0,Venue Category_Park,Venue Category_Spa,Venue Category_Sushi Restaurant,Venue Category_Chinese Restaurant,Venue Category_Coffee Shop,Venue Category_Sports Bar,Venue Category_Bakery,Venue Category_Playground,Venue Category_Bubble Tea Shop,Venue Category_Burrito Place
9,94116,Parkside/Forest Hill,37.74,-122.48,0,Venue Category_Chinese Restaurant,Venue Category_Café,Venue Category_Light Rail Station,Venue Category_Park,Venue Category_Pizza Place,Venue Category_Sandwich Place,Venue Category_Bakery,Venue Category_Pub,Venue Category_Bubble Tea Shop,Venue Category_Snack Place


In [93]:
from IPython.display import HTML, display
import numpy as np
# create map
map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i * x) ** 2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sf_merged['Latitude'], sf_merged['Longitude'], sf_merged['Neighborhood'], sf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster - 1],
        fill = True,
        fill_color = rainbow[cluster - 1],
        fill_opacity = 0.7).add_to(map_clusters)
       
map_clusters

# Conclusion

Based on the common venue categories shown in the previous slide, we can know that Ingelside-Excelsior/Crocker-Amazon area is a common place for pizza restaurant but the 2nd common place is Chinese Restaurants. While Chinatown has a lot of coffee shop. Taking all of these into consideration, Chinatown might still be a good place to start a new Chinese restaurant. As long as the food are in good quality, the new venue should stand out with high rating in this famous Chinese food area. 