# The Battle of Neighborhoods
## Where is the best location to open an Italian Restaurant in San Francisco?

Table of Contents
* [1. Introduction/Business Problem](#introduction)
* [2. Data](#data)
* [3. Methodology](#method)
* [4. Analysis](#analysis)
* [5. Results](#results)
* [6. Discussion](#discussion)
* [7. Conclusions](#conclusions)

### 1. Introduction/Business Problem <a name="introduction"></a>

In this project we analyze San Francisco neighborhoods to find the best location for a new Italian restaurant. Our stakeholders are Italian chefs moving to San Francisco.
San Francisco already has many restaurants, so we are looking for a neighborhood with few Italian restaurants but plenty of other businesses, whose workers are likely to eat nearby.

### 2. Data <a name="data"></a>

1) First we install and import the packages and modules we need

In [None]:
!pip install requests

In [None]:
!pip install pandas

In [1]:
from bs4 import BeautifulSoup # this module helps with web scraping
import requests  # this module helps us download a web page
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage; it ships with scikit-learn (!pip install scikit-learn)
from sklearn.cluster import KMeans

In [2]:
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

2) We download data about San Francisco neighborhoods, with Postal Codes and Population

In [3]:
url = 'http://www.healthysf.org/bdi/outcomes/zipmap.htm'
response = requests.get(url).text
# parse the downloaded HTML with BeautifulSoup
soup = BeautifulSoup(response, 'html5lib')

In [4]:
table = soup.find_all('tbody')[3]

table_contents=[]
for row in table.find_all("tr"):
    cell = {}
    col = row.find_all("td")
    
    # collapse newlines and runs of spaces left over from the HTML layout
    cell['PostalCode'] = ' '.join(col[0].get_text().split())
    cell['Neighborhood'] = ' '.join(col[1].get_text().split())
    cell['Population'] = ' '.join(col[2].get_text().split())
    table_contents.append(cell)
    
df=pd.DataFrame(table_contents)
df.head()

Unnamed: 0,PostalCode,Neighborhood,Population
0,Zip Code,Neighborhood,Population (Census 2000)
1,94102,Hayes Valley/Tenderloin/North of...,28991
2,94103,South of Market,23016
3,94107,Potrero Hill,17368
4,94108,Chinatown,13716


In [5]:
# drop the first row (the table header) and the last row, which hold no neighborhood data
sf_data = df.drop([0,22], axis=0).reset_index(drop=True)
sf_data.head()

Unnamed: 0,PostalCode,Neighborhood,Population
0,94102,Hayes Valley/Tenderloin/North of...,28991
1,94103,South of Market,23016
2,94107,Potrero Hill,17368
3,94108,Chinatown,13716
4,94109,Polk/Russian Hill (Nob Hill),56322
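
Dropping rows by position (0 and 22) relies on the page layout staying the same. As a hedged alternative, the sketch below keeps only the rows whose postal code parses as a number. It runs on a small made-up frame: the two data rows reuse values shown above, while the footer row is invented for illustration.

```python
import pandas as pd

# Toy stand-in for the scraped table: a header row, two data rows and a
# footer row (the footer text and total are invented for this example).
df = pd.DataFrame({
    'PostalCode': ['Zip Code', '94102', '94103', 'Total'],
    'Neighborhood': ['Neighborhood', 'Hayes Valley', 'South of Market', ''],
    'Population': ['Population (Census 2000)', '28991', '23016', '52007'],
})

# Non-numeric postal codes become NaN and can then be filtered out.
codes = pd.to_numeric(df['PostalCode'], errors='coerce')
sf_data = df[codes.notna()].reset_index(drop=True)

# Convert the remaining text columns to integers.
sf_data['PostalCode'] = sf_data['PostalCode'].astype(int)
sf_data['Population'] = sf_data['Population'].astype(int)
print(sf_data)
```

This variant does not need to know how many rows the table has, at the cost of silently discarding any data row with a malformed postal code.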


4) We download a CSV file with the latitude and longitude of each ZIP code in San Francisco

In [6]:
path = 'https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&q=san+francisco&timezone=Europe/Berlin&lang=en&use_labels_for_header=true&csv_separator=%3B'
df_ll=pd.read_csv(path,delimiter=';')
df_ll.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,94122,San Francisco,CA,37.75838,-122.48478,-8,1,"37.75838,-122.48478"
1,94141,San Francisco,CA,37.784827,-122.727802,-8,1,"37.784827,-122.727802"
2,94110,San Francisco,CA,37.74873,-122.41545,-8,1,"37.74873,-122.41545"
3,94146,San Francisco,CA,37.784827,-122.727802,-8,1,"37.784827,-122.727802"
4,94165,San Francisco,CA,37.784827,-122.727802,-8,1,"37.784827,-122.727802"
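
In the preview above, ZIP codes 94141, 94146 and 94165 share exactly the same coordinates, which suggests a placeholder value rather than a real centroid (this interpretation is an assumption, not something the dataset states). A minimal sketch for flagging such rows, rebuilt from the five preview rows:

```python
import pandas as pd

# The five rows shown in the preview above.
df_ll = pd.DataFrame({
    'Zip': [94122, 94141, 94110, 94146, 94165],
    'Latitude': [37.75838, 37.784827, 37.74873, 37.784827, 37.784827],
    'Longitude': [-122.48478, -122.727802, -122.41545, -122.727802, -122.727802],
})

# keep=False marks every member of a duplicated group, not only the repeats.
shared = df_ll.duplicated(subset=['Latitude', 'Longitude'], keep=False)
print(df_ll[shared])
```

None of the flagged ZIP codes appear in the neighborhood table used here, so the check is only a sanity test before joining the two sources.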


5) We create a new dataframe with Neighborhoods, Latitude and Longitude

In [7]:
# change data type of Postal Code to integer
sf_data['PostalCode'] = sf_data['PostalCode'].astype(int)

# look up the coordinates of each postal code; rows are collected in a list
# because DataFrame.append was removed in pandas 2.0
rows = []
for i in range(len(sf_data)):
    # retrieve postal code and the matching row of the coordinates file
    postal_code = sf_data.loc[i, 'PostalCode']
    match = df_ll.loc[df_ll["Zip"] == postal_code]
    rows.append({"PostalCode": postal_code,
                 "Neighborhood": sf_data.loc[i, 'Neighborhood'],
                 "Latitude": match['Latitude'].item(),
                 "Longitude": match['Longitude'].item()})
df_coord = pd.DataFrame(rows, columns=["PostalCode", "Neighborhood", "Latitude", "Longitude"])
    
df_coord.head()

Unnamed: 0,PostalCode,Neighborhood,Latitude,Longitude
0,94102,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915
1,94103,South of Market,37.772329,-122.41087
2,94107,Potrero Hill,37.766529,-122.39577
3,94108,Chinatown,37.792678,-122.40793
4,94109,Polk/Russian Hill (Nob Hill),37.792778,-122.42188
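
The same lookup can also be written as a join. The sketch below is one possible alternative, not the notebook's method; it uses two tiny frames whose values are taken from the tables above.

```python
import pandas as pd

sf_data = pd.DataFrame({
    'PostalCode': [94102, 94103],
    'Neighborhood': ['Hayes Valley/Tenderloin/North of Market', 'South of Market'],
})
df_ll = pd.DataFrame({
    'Zip': [94103, 94102, 94122],
    'Latitude': [37.772329, 37.779329, 37.75838],
    'Longitude': [-122.41087, -122.41915, -122.48478],
})

# A left join keeps every neighborhood, even if a ZIP code has no match
# (its coordinates would then be NaN instead of raising an error).
df_coord = (
    sf_data.merge(df_ll[['Zip', 'Latitude', 'Longitude']],
                  left_on='PostalCode', right_on='Zip', how='left')
           .drop(columns='Zip')
)
print(df_coord)
```

Unlike `.item()`, which raises when a ZIP code is missing or duplicated, the join leaves it to the caller to check for NaN or repeated rows afterwards.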


6) We use Foursquare to see how many Italian restaurants there are in each neighborhood, and later cluster the neighborhoods

In [8]:
import os
# read the Foursquare credentials from environment variables (names chosen here) instead of hard-coding them
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
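
The nested lookups in `getNearbyVenues` can be tried without calling the API. The sketch below uses a hypothetical payload that only mirrors the keys the function reads; the first venue reuses values from this notebook's output and the second is invented to show a venue without categories, a case where `categories[0]` would raise an `IndexError`. The helper name `extract_venues` is also an assumption.

```python
# Hypothetical payload shaped like the keys read by getNearbyVenues.
payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Asian Art Museum",
                   "location": {"lat": 37.780178, "lng": -122.416505},
                   "categories": [{"name": "Art Museum"}]}},
        {"venue": {"name": "Uncategorized Place",  # invented entry
                   "location": {"lat": 37.78, "lng": -122.41},
                   "categories": []}},
    ]}]}
}

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing categories."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of failing on [0].
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows

rows = extract_venues(payload)
print(rows)
```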

In [12]:
sf_venues = getNearbyVenues(names=df_coord['Neighborhood'],
                                   latitudes=df_coord['Latitude'],
                                   longitudes=df_coord['Longitude']
                                  )
sf_venues.head()

Hayes Valley/Tenderloin/North of Market
South of Market
Potrero Hill
Chinatown
Polk/Russian Hill (Nob Hill)
Inner Mission/Bernal Heights
Ingelside-Excelsior/Crocker-Amazon
Castro/Noe Valley
Western Addition/Japantown
Parkside/Forest Hill
Haight-Ashbury
Inner Richmond
Outer Richmond
Sunset
Marina
Bayview-Hunters Point
St. Francis Wood/Miraloma/West Portal
Twin Peaks-Glen Park
Lake Merced
North Beach/Chinatown
Visitacion Valley/Sunnydale


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Louise M. Davies Symphony Hall,37.777976,-122.420157,Concert Hall
1,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,War Memorial Opera House,37.778601,-122.420816,Opera House
2,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Herbst Theater,37.779548,-122.420953,Concert Hall
3,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,San Francisco Ballet,37.77858,-122.420798,Dance Studio
4,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Asian Art Museum,37.780178,-122.416505,Art Museum


In [13]:
print('There are {} unique categories.'.format(len(sf_venues['Venue Category'].unique())))

There are 238 unique categories.


### 3. Methodology <a name="method"></a>

Our objective is to find the ideal location for an Italian restaurant.<br> We want to discard neighborhoods that already have many Italian restaurants. We also consider neighborhoods with too many food halls a poor choice, because people looking for a place to eat would have many options and would be less likely to choose our restaurant.<br>
We will also consider the population of each neighborhood, since more residents mean more potential customers for the restaurant.
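
The two venue-based criteria can be expressed as per-neighborhood numbers. The following is only a sketch on invented data: the neighborhood names `A` and `B` are placeholders, and treating every category whose name contains "Restaurant" as a food venue is an assumption made for this example.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Italian Restaurant', 'Thai Restaurant', 'Park',
                       'Coffee Shop', 'Italian Restaurant', 'Italian Restaurant'],
})

# Assumption: a category is a food venue if its name contains "Restaurant".
is_food = venues['Venue Category'].str.contains('Restaurant')
is_italian = venues['Venue Category'] == 'Italian Restaurant'

summary = pd.DataFrame({
    # number of Italian restaurants per neighborhood
    'italian_count': is_italian.groupby(venues['Neighborhood']).sum(),
    # share of all venues that are food venues
    'food_share': is_food.groupby(venues['Neighborhood']).mean(),
})
print(summary)
```

A neighborhood like `B`, with several Italian restaurants and a high food share, would be a weak candidate under the criteria described above.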

### 4. Analysis <a name="analysis"></a>

1) We select the venue category Italian Restaurant to see which neighborhoods already have many Italian restaurants

In [14]:
# Italian Restaurants
sf_ita_venues = sf_venues[sf_venues['Venue Category'] == 'Italian Restaurant']
sf_ita_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
58,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,a Mano,37.776917,-122.423856,Italian Restaurant
150,South of Market,37.772329,-122.41087,Rocco's Cafe,37.776106,-122.408536,Italian Restaurant
270,Chinatown,37.792678,-122.40793,Venticello,37.794303,-122.413119,Italian Restaurant
324,Polk/Russian Hill (Nob Hill),37.792778,-122.42188,Ristorante Milano,37.795428,-122.419065,Italian Restaurant
335,Polk/Russian Hill (Nob Hill),37.792778,-122.42188,Seven Hills,37.795331,-122.418291,Italian Restaurant
496,Castro/Noe Valley,37.758434,-122.43512,Poesia Osteria Italiana,37.761012,-122.434347,Italian Restaurant
559,Western Addition/Japantown,37.786129,-122.43736,SPQR,37.787287,-122.433606,Italian Restaurant
606,Western Addition/Japantown,37.786129,-122.43736,Florio,37.787496,-122.433622,Italian Restaurant
644,Parkside/Forest Hill,37.743381,-122.48578,Ristorante Marcello,37.742597,-122.488683,Italian Restaurant
726,Inner Richmond,37.782029,-122.46158,Bella Trattoria Italiana,37.781315,-122.46104,Italian Restaurant


In [15]:
sf_ita_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Castro/Noe Valley,1,1,1,1,1,1
Chinatown,1,1,1,1,1,1
Hayes Valley/Tenderloin/North of Market,1,1,1,1,1,1
Inner Richmond,1,1,1,1,1,1
Marina,6,6,6,6,6,6
North Beach/Chinatown,8,8,8,8,8,8
Parkside/Forest Hill,1,1,1,1,1,1
Polk/Russian Hill (Nob Hill),2,2,2,2,2,2
South of Market,1,1,1,1,1,1
Western Addition/Japantown,2,2,2,2,2,2
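
Neighborhoods without any Italian restaurant do not appear in a `groupby` result at all. A minimal sketch of a count that also lists them with 0, using three neighborhoods from this notebook (Sunset has no row in the table above):

```python
import pandas as pd

all_neighborhoods = ['Marina', 'Chinatown', 'Sunset']
# One row per Italian restaurant, as in sf_ita_venues.
ita = pd.DataFrame({'Neighborhood': ['Marina'] * 6 + ['Chinatown']})

# size() counts rows per group; reindex adds the missing neighborhoods as 0.
ita_counts = (ita.groupby('Neighborhood').size()
                 .reindex(all_neighborhoods, fill_value=0))
print(ita_counts)
```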


Marina and North Beach/Chinatown already have 6 and 8 Italian restaurants, respectively. Polk/Russian Hill and Western Addition/Japantown have 2 each. We will exclude these four neighborhoods and choose among those with 0 or 1 Italian restaurants.
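
The exclusion rule can also be derived from the counts rather than typed by hand. A small sketch, assuming a threshold of at most one Italian restaurant and using some of the counts from the table above (`MAX_ITALIAN` is a name introduced here):

```python
import pandas as pd

# Counts taken from the grouped table, keyed by neighborhood name.
ita_counts = pd.Series({
    'Castro/Noe Valley': 1, 'Chinatown': 1, 'Marina': 6,
    'North Beach/Chinatown': 8, 'Polk/Russian Hill (Nob Hill)': 2,
    'Western Addition/Japantown': 2,
})

MAX_ITALIAN = 1  # keep neighborhoods with at most one Italian restaurant
to_exclude = sorted(ita_counts[ita_counts > MAX_ITALIAN].index)
print(to_exclude)
```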

2) We add the postal code back and remove the neighborhoods with more than one Italian restaurant from the dataframe

In [16]:
df_coord_b = df_coord.drop('Neighborhood',axis=1)
df_coord_b.rename(columns={'Latitude':'Neighborhood Latitude','Longitude':'Neighborhood Longitude'},inplace=True)
sf_ita_venues = pd.merge(left=sf_ita_venues,right=df_coord_b, left_on=['Neighborhood Latitude','Neighborhood Longitude'],
                 right_on=['Neighborhood Latitude','Neighborhood Longitude'])
sf_ita_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,PostalCode
0,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,a Mano,37.776917,-122.423856,Italian Restaurant,94102
1,South of Market,37.772329,-122.41087,Rocco's Cafe,37.776106,-122.408536,Italian Restaurant,94103
2,Chinatown,37.792678,-122.40793,Venticello,37.794303,-122.413119,Italian Restaurant,94108
3,Polk/Russian Hill (Nob Hill),37.792778,-122.42188,Ristorante Milano,37.795428,-122.419065,Italian Restaurant,94109
4,Polk/Russian Hill (Nob Hill),37.792778,-122.42188,Seven Hills,37.795331,-122.418291,Italian Restaurant,94109
5,Castro/Noe Valley,37.758434,-122.43512,Poesia Osteria Italiana,37.761012,-122.434347,Italian Restaurant,94114
6,Western Addition/Japantown,37.786129,-122.43736,SPQR,37.787287,-122.433606,Italian Restaurant,94115
7,Western Addition/Japantown,37.786129,-122.43736,Florio,37.787496,-122.433622,Italian Restaurant,94115
8,Parkside/Forest Hill,37.743381,-122.48578,Ristorante Marcello,37.742597,-122.488683,Italian Restaurant,94116
9,Inner Richmond,37.782029,-122.46158,Bella Trattoria Italiana,37.781315,-122.46104,Italian Restaurant,94118
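
The merge above matches rows on latitude and longitude. That works here because both frames hold identical values, but equality on floating-point keys can miss rows if one side is rounded. A hedged alternative is to map the postal code by neighborhood name, which assumes the names are unique in `df_coord`:

```python
import pandas as pd

df_coord = pd.DataFrame({
    'PostalCode': [94108, 94109],
    'Neighborhood': ['Chinatown', 'Polk/Russian Hill (Nob Hill)'],
})
ita = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Polk/Russian Hill (Nob Hill)',
                     'Polk/Russian Hill (Nob Hill)'],
    'Venue': ['Venticello', 'Ristorante Milano', 'Seven Hills'],
})

# Build a name -> postal code lookup and apply it to every venue row.
code_by_name = df_coord.set_index('Neighborhood')['PostalCode']
ita['PostalCode'] = ita['Neighborhood'].map(code_by_name)
print(ita)
```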


In [17]:
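# postal codes to exclude: Marina, North Beach/Chinatown, Polk/Russian Hill, Western Addition/Japantown
neigh_to_exclude=[94123,94133,94109,94115]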

df_coord_b = df_coord.drop('Neighborhood',axis=1)
df_coord_b.rename(columns={'Latitude':'Neighborhood Latitude','Longitude':'Neighborhood Longitude'},inplace=True)
sf_venues = pd.merge(left=sf_venues,right=df_coord_b, left_on=['Neighborhood Latitude','Neighborhood Longitude'],
                 right_on=['Neighborhood Latitude','Neighborhood Longitude'])

In [18]:
sf_venues = sf_venues[~sf_venues.PostalCode.isin(neigh_to_exclude)].reset_index()
sf_venues

Unnamed: 0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,PostalCode
0,0,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Louise M. Davies Symphony Hall,37.777976,-122.420157,Concert Hall,94102
1,1,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,War Memorial Opera House,37.778601,-122.420816,Opera House,94102
2,2,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Herbst Theater,37.779548,-122.420953,Concert Hall,94102
3,3,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,San Francisco Ballet,37.77858,-122.420798,Dance Studio,94102
4,4,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Asian Art Museum,37.780178,-122.416505,Art Museum,94102
5,5,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,War Memorial Court,37.779042,-122.420971,Park,94102
6,6,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Siam Orchid Traditional Thai Massage,37.777111,-122.417967,Massage Studio,94102
7,7,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,War Memorial Veterans Building,37.779664,-122.420334,Theater,94102
8,8,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Philz Coffee,37.781266,-122.416901,Coffee Shop,94102
9,9,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,Urban Bowls,37.778139,-122.422168,Poke Place,94102


3) We use one-hot encoding, then group rows by neighborhood and take the mean of each category's indicator, which gives the frequency of occurrence of each category in each neighborhood

In [19]:
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sf_onehot['Neighborhood'] = sf_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sf_onehot.columns[-1]] + list(sf_onehot.columns[:-1])
sf_onehot = sf_onehot[fixed_columns]

sf_grouped = sf_onehot.groupby('Neighborhood').mean().reset_index()
sf_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Café,Camera Store,Candy Store,Cantonese Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distillery,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Entertainment Service,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hill,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Luggage Store,Marijuana Dispensary,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Music School,Music Store,Music Venue,New American Restaurant,Nightclub,Office,Opera House,Optical Shop,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Restaurant,Rock Club,Roof Deck,Russian Restaurant,Salon / Barbershop,Sandwich Place,Sausage Shop,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,Bayview-Hunters Point,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Castro/Noe Valley,0.0,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.049383,0.0,0.0,0.0,0.012346,0.024691,0.0,0.0,0.012346,0.0,0.024691,0.0,0.012346,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.08642,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.012346,0.012346,0.0,0.0,0.0,0.0,0.012346,0.024691,0.012346,0.0,0.012346,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.012346,0.012346,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.012346,0.0,0.012346,0.024691,0.024691,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.024691,0.0,0.012346,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.049383,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.012346,0.024691
2,Chinatown,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.0625,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.0,0.03125,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.052083,0.0,0.020833,0.020833,0.010417,0.0625,0.0,0.010417,0.010417,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.0,0.0,0.0625,0.0,0.0,0.010417,0.0,0.010417,0.0,0.010417,0.010417,0.010417,0.0,0.010417,0.0,0.010417,0.0,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.010417,0.0,0.010417,0.020833,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.020833,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.020833,0.010417,0.010417,0.0,0.0,0.010417
3,Haight-Ashbury,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.02381,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.142857,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.0,0.02381,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0
4,Hayes Valley/Tenderloin/North of...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.031915,0.010638,0.021277,0.0,0.010638,0.021277,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.021277,0.042553,0.0,0.0,0.021277,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.031915,0.010638,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.042553,0.010638,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.010638,0.0,0.010638,0.010638,0.010638,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.010638,0.0,0.0,0.010638,0.010638,0.021277,0.0,0.0,0.0,0.010638,0.0,0.0,0.031915,0.0,0.0,0.021277,0.0,0.0,0.0,0.010638,0.0,0.021277,0.010638,0.010638,0.0,0.0,0.021277,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.010638,0.031915,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.031915,0.010638,0.0,0.031915,0.010638,0.0
5,Ingelside-Excelsior/Crocker-Amazon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.058824,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
6,Inner Mission/Bernal Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.085106,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Inner Richmond,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.015385,0.0,0.015385,0.0,0.046154,0.0,0.0,0.0,0.015385,0.015385,0.0,0.015385,0.030769,0.015385,0.046154,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.046154,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.015385,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.030769,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.015385,0.015385,0.061538,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.030769,0.0,0.015385,0.030769,0.015385
8,Lake Merced,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.018519,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.018519,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.037037,0.0,0.0,0.055556,0.0,0.018519,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0
9,Outer Richmond,0.0,0.0,0.0,0.0,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.042553,0.021277,0.106383,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085106,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.06383,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0
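
To see why the mean of one-hot columns is a frequency, it helps to run the same steps on a tiny invented frame (neighborhoods `A` and `B` are placeholders). In the Bayview-Hunters Point row above, for example, 0.047619 equals 1/21, which is consistent with one venue out of 21.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Park', 'Gym', 'Park', 'Park'],
})

# One indicator column per category, stored as 0.0 / 1.0.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in each category.
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

Each row of `grouped` sums to 1, so neighborhoods with very different venue counts become comparable.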


In [20]:
num_top_venues = 5

for hood in sf_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sf_grouped[sf_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bayview-Hunters Point----
                             venue  freq
0  Southern / Soul Food Restaurant  0.19
1                      Coffee Shop  0.10
2               Mexican Restaurant  0.10
3                           Bakery  0.10
4                         Pharmacy  0.05


----Castro/Noe Valley----
             venue  freq
0          Gay Bar  0.09
1  Thai Restaurant  0.05
2      Coffee Shop  0.05
3      Yoga Studio  0.02
4   Scenic Lookout  0.02


----Chinatown----
                venue  freq
0         Coffee Shop  0.06
1               Hotel  0.06
2              Bakery  0.06
3  Chinese Restaurant  0.05
4     Bubble Tea Shop  0.03


----Haight-Ashbury----
                    venue  freq
0             Coffee Shop  0.14
1                    Park  0.05
2             Pizza Place  0.05
3  Thrift / Vintage Store  0.05
4                  Bakery  0.05


----Hayes Valley/Tenderloin/North of Market----
                           venue

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
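
The same ranking can be obtained with `Series.nlargest`. The sketch below is an alternative formulation, not the helper used in this notebook; the row is a shortened, illustrative version of a neighborhood row with only four categories.

```python
import pandas as pd

# Shortened example row: the name followed by a few category frequencies.
row = pd.Series({'Neighborhood': 'Haight-Ashbury', 'Coffee Shop': 0.142857,
                 'Park': 0.047619, 'ATM': 0.0, 'Accessories Store': 0.02381})

# Skip the name, cast to float, and take the largest frequencies.
freqs = row.iloc[1:].astype(float)
top = freqs.nlargest(3).index.tolist()
print(top)
```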

4) Display top 10 venues of each neighborhood

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every rank up to 10 takes 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sf_grouped['Neighborhood']

for ind in np.arange(sf_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayview-Hunters Point,Southern / Soul Food Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Thrift / Vintage Store,Bus Station,Garden,Light Rail Station,Taco Place,Gym
1,Castro/Noe Valley,Gay Bar,Coffee Shop,Thai Restaurant,Yoga Studio,Indian Restaurant,Clothing Store,Deli / Bodega,Café,Playground,Pizza Place
2,Chinatown,Coffee Shop,Bakery,Hotel,Chinese Restaurant,Dim Sum Restaurant,Bubble Tea Shop,Sushi Restaurant,Szechuan Restaurant,Clothing Store,Steakhouse
3,Haight-Ashbury,Coffee Shop,Park,Thrift / Vintage Store,Pizza Place,Bakery,Boutique,Breakfast Spot,Gastropub,Liquor Store,Dog Run
4,Hayes Valley/Tenderloin/North of...,Coffee Shop,Hotel,Pizza Place,Theater,Beer Bar,French Restaurant,Vegetarian / Vegan Restaurant,Café,Wine Bar,Cocktail Bar
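
The column names rely on English ordinal suffixes. The lookup used above is correct for ranks 1 to 10; a more general helper, shown here as a sketch with the hypothetical name `ordinal`, also handles cases such as 11th or 22nd.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 and 13 take 'th' despite their last digit.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Rebuild the column names used for the top-10 table.
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[:4])
```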


5) Finally, we group the neighborhoods into 7 __clusters__ to see which one best suits the location of the restaurant

In [23]:
# set number of clusters
kclusters = 7
sf_grouped_clustering = sf_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([6, 1, 4, 4, 4, 4, 4, 1, 4, 1])
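
A quick way to check how evenly the neighborhoods are spread across clusters is to count the labels. The sketch below uses only the ten labels printed above, not the full `kmeans.labels_` array, so the counts are partial.

```python
import numpy as np

kclusters = 7
# the first ten labels shown in the output above
labels = np.array([6, 1, 4, 4, 4, 4, 4, 1, 4, 1])

# number of neighborhoods per cluster id (index = cluster label)
sizes = np.bincount(labels, minlength=kclusters)
for cluster_id, size in enumerate(sizes):
    print(cluster_id, size)
```

Running the same count on the full label array would show at a glance which clusters contain a single neighborhood.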

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sf_merged = df_coord[~df_coord.PostalCode.isin(neigh_to_exclude)].reset_index()

# join the neighborhood coordinates with the cluster labels and top venues of each neighborhood
sf_merged = sf_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [25]:
#sf_merged['Cluster Labels'] = sf_merged['Cluster Labels'].astype(int)

sf_merged.head()

Unnamed: 0,index,PostalCode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,94102,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,4,Coffee Shop,Hotel,Pizza Place,Theater,Beer Bar,French Restaurant,Vegetarian / Vegan Restaurant,Café,Wine Bar,Cocktail Bar
1,1,94103,South of Market,37.772329,-122.41087,4,Nightclub,Food Truck,Motorcycle Shop,Cocktail Bar,Gay Bar,Thai Restaurant,Wine Bar,Bar,Restaurant,Coffee Shop
2,2,94107,Potrero Hill,37.766529,-122.39577,4,Coffee Shop,Breakfast Spot,Park,Deli / Bodega,Café,Wine Shop,Mexican Restaurant,French Restaurant,Bubble Tea Shop,Pet Store
3,3,94108,Chinatown,37.792678,-122.40793,4,Coffee Shop,Bakery,Hotel,Chinese Restaurant,Dim Sum Restaurant,Bubble Tea Shop,Sushi Restaurant,Szechuan Restaurant,Clothing Store,Steakhouse
4,5,94110,Inner Mission/Bernal Heights,37.74873,-122.41545,4,Mexican Restaurant,Pizza Place,Park,Dive Bar,Gym / Fitness Center,Coffee Shop,Grocery Store,Deli / Bodega,Bookstore,Food & Drink Shop


In [26]:
sf_data_r=sf_data.drop('Neighborhood',axis=1)

sf_merged = pd.merge(left=sf_merged,right=sf_data_r,how='left', left_on='PostalCode',
                 right_on='PostalCode')
sf_merged.head()

Unnamed: 0,index,PostalCode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
0,0,94102,Hayes Valley/Tenderloin/North of...,37.779329,-122.41915,4,Coffee Shop,Hotel,Pizza Place,Theater,Beer Bar,French Restaurant,Vegetarian / Vegan Restaurant,Café,Wine Bar,Cocktail Bar,28991
1,1,94103,South of Market,37.772329,-122.41087,4,Nightclub,Food Truck,Motorcycle Shop,Cocktail Bar,Gay Bar,Thai Restaurant,Wine Bar,Bar,Restaurant,Coffee Shop,23016
2,2,94107,Potrero Hill,37.766529,-122.39577,4,Coffee Shop,Breakfast Spot,Park,Deli / Bodega,Café,Wine Shop,Mexican Restaurant,French Restaurant,Bubble Tea Shop,Pet Store,17368
3,3,94108,Chinatown,37.792678,-122.40793,4,Coffee Shop,Bakery,Hotel,Chinese Restaurant,Dim Sum Restaurant,Bubble Tea Shop,Sushi Restaurant,Szechuan Restaurant,Clothing Store,Steakhouse,13716
4,5,94110,Inner Mission/Bernal Heights,37.74873,-122.41545,4,Mexican Restaurant,Pizza Place,Park,Dive Bar,Gym / Fitness Center,Coffee Shop,Grocery Store,Deli / Bodega,Bookstore,Food & Drink Shop,74633
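
Because the merge uses `how='left'`, a postal code without a match in the population table would be kept with a missing `Population` instead of being dropped. The sketch below shows that behavior on toy data; the postal code 99999 and its neighborhood name are hypothetical, and the two real rows reuse values from the output above.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': [94102, 94103, 99999],  # 99999 is a made-up code
    'Neighborhood': ['Hayes Valley/Tenderloin/North of...', 'South of Market', 'Hypothetical'],
})
right = pd.DataFrame({
    'PostalCode': [94102, 94103],
    'Population': [28991, 23016],
})

# left merge: every row of `left` survives, unmatched rows get NaN
merged = pd.merge(left=left, right=right, how='left', on='PostalCode')

# rows that found no population value
missing = merged[merged['Population'].isna()]
print(missing)
```

Checking for such rows after the merge is a cheap way to confirm that every neighborhood received a population value.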


6) We show the results on a map of San Francisco, assigning a color to each cluster

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sf_merged['Latitude'], sf_merged['Longitude'], sf_merged['Neighborhood'], sf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

7) We analyze neighborhoods from the first cluster

In [28]:
sf_merged.loc[sf_merged['Cluster Labels'] == 0, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
16,94134,0,Trail,Garden,Baseball Field,Park,Donut Shop,Fast Food Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,40134


8) We analyze neighborhoods from the second cluster

In [29]:
sf_merged.loc[sf_merged['Cluster Labels'] == 1, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
6,94114,1,Gay Bar,Coffee Shop,Thai Restaurant,Yoga Studio,Indian Restaurant,Clothing Store,Deli / Bodega,Café,Playground,Pizza Place,30574
7,94116,1,Chinese Restaurant,Dumpling Restaurant,Café,Light Rail Station,Bubble Tea Shop,Sandwich Place,Korean Restaurant,Sushi Restaurant,Dance Studio,Burrito Place,42958
9,94118,1,Thai Restaurant,Bakery,Chinese Restaurant,Burmese Restaurant,Café,Pizza Place,Vietnamese Restaurant,Wine Shop,Japanese Restaurant,Bubble Tea Shop,38939
10,94121,1,Café,Convenience Store,Chinese Restaurant,Pizza Place,American Restaurant,Pharmacy,Dessert Shop,Japanese Restaurant,Grocery Store,Bus Station,42473


9) We analyze neighborhoods from the third cluster

In [30]:
sf_merged.loc[sf_merged['Cluster Labels'] == 2, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
11,94122,2,Chinese Restaurant,Playground,Café,Pharmacy,Train Station,Shoe Store,Yoga Studio,Farmers Market,Food Court,Food & Drink Shop,55492


10) We analyze neighborhoods from the fourth cluster

In [32]:
sf_merged.loc[sf_merged['Cluster Labels'] == 3, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
13,94127,3,Trail,Bus Line,Yoga Studio,Farmers Market,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,Flower Shop,20624


11) We analyze neighborhoods from the fifth cluster

In [33]:
sf_merged.loc[sf_merged['Cluster Labels'] == 4, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
0,94102,4,Coffee Shop,Hotel,Pizza Place,Theater,Beer Bar,French Restaurant,Vegetarian / Vegan Restaurant,Café,Wine Bar,Cocktail Bar,28991
1,94103,4,Nightclub,Food Truck,Motorcycle Shop,Cocktail Bar,Gay Bar,Thai Restaurant,Wine Bar,Bar,Restaurant,Coffee Shop,23016
2,94107,4,Coffee Shop,Breakfast Spot,Park,Deli / Bodega,Café,Wine Shop,Mexican Restaurant,French Restaurant,Bubble Tea Shop,Pet Store,17368
3,94108,4,Coffee Shop,Bakery,Hotel,Chinese Restaurant,Dim Sum Restaurant,Bubble Tea Shop,Sushi Restaurant,Szechuan Restaurant,Clothing Store,Steakhouse,13716
4,94110,4,Mexican Restaurant,Pizza Place,Park,Dive Bar,Gym / Fitness Center,Coffee Shop,Grocery Store,Deli / Bodega,Bookstore,Food & Drink Shop,74633
5,94112,4,Pizza Place,Mexican Restaurant,Vietnamese Restaurant,Sandwich Place,Bus Station,Dessert Shop,Restaurant,Filipino Restaurant,Hot Dog Joint,Metro Station,73104
8,94117,4,Coffee Shop,Park,Thrift / Vintage Store,Pizza Place,Bakery,Boutique,Breakfast Spot,Gastropub,Liquor Store,Dog Run,38738
15,94132,4,Coffee Shop,Juice Bar,Pizza Place,Food Truck,Cosmetics Shop,Clothing Store,Sandwich Place,Lingerie Store,Candy Store,Bakery,26291


12) We analyze neighborhoods from the sixth cluster

In [34]:
sf_merged.loc[sf_merged['Cluster Labels'] == 5, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
14,94131,5,Park,Trail,Dim Sum Restaurant,Shopping Mall,Coffee Shop,Grocery Store,Salon / Barbershop,Korean Restaurant,Pharmacy,Playground,27897


13) We analyze neighborhoods from the seventh cluster

In [35]:
sf_merged.loc[sf_merged['Cluster Labels'] == 6, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
12,94124,6,Southern / Soul Food Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Thrift / Vintage Store,Bus Station,Garden,Light Rail Station,Taco Place,Gym,33170


### 5. Results <a name="results"></a>

After eliminating neighborhoods with more than one Italian restaurant, we analyzed the venues and population of each remaining neighborhood and grouped the neighborhoods into 7 clusters by venue category.

### 6. Discussion <a name="discussion"></a>

Most neighborhoods fall into clusters 1 and 4. Although some of these neighborhoods are highly populated, their most common venues are restaurants, pizza places, food trucks and coffee shops, all of which serve food, so these would be competitive areas for a new restaurant. We also discard cluster 6: its most common venues are restaurants and, apart from vintage stores and gyms, there are few stores, so not many people would look there for a place to eat. The same goes for clusters 3 and 5: they are not highly populated, and we consider that there are already enough restaurants for the number of shops and residents.<br>
We keep clusters 0 and 2, both of which have a population above 40,000.<br>
The neighborhood in cluster 2 (Sunset), with 55,492 people, has Chinese restaurants as its most common venue, but since that cuisine is very different from Italian, the two are unlikely to compete for the same customers. From the 2nd to the 8th most common venue, the only food-related venue is the café; the rest are playgrounds, pharmacies, shoe stores, yoga studios and a farmers market, so there are probably many people working there, taking their kids to the playground, or going to the farmers market or a yoga studio. Train stations are also among the most common venues, so the restaurant would be easy to reach even for people who do not live within walking distance.<br>
By contrast, the neighborhood in cluster 0 (Visitation Valley/Sunnydale) is less populated (40,134 people). Its first 3 most common venues are not food-related, which suggests a shortage of restaurants, but we see no other businesses among its 10 most common venues: only trails, gardens, baseball fields and parks at the top, and food courts and food trucks in the second half.
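
The population screen described above can be reproduced with a short filter. This is a sketch that covers only the five single-neighborhood clusters, using the populations shown in the cluster tables of the Analysis section. Cluster 6 also has a population below the threshold, although the text discards it for other reasons.

```python
import pandas as pd

# single-neighborhood clusters and their populations, copied from the tables above
singletons = pd.DataFrame({
    'Cluster Labels': [0, 2, 3, 5, 6],
    'PostalCode': [94134, 94122, 94127, 94131, 94124],
    'Population': [40134, 55492, 20624, 27897, 33170],
})

# keep only clusters with more than 40,000 residents
min_population = 40000
kept = singletons[singletons['Population'] > min_population]
print(kept)
```

Only clusters 0 and 2 pass the threshold, which matches the two candidates compared in the discussion.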

### 7. Conclusions <a name="conclusions"></a>

We consider the Sunset neighborhood the best place to open an Italian restaurant: it is highly populated, has other businesses that are not food-related, and has no other Italian restaurants. <br>
We will explain to our client that Visitation Valley/Sunnydale is also an option to consider if the location suits them better, since it has few restaurants. However, it may be harder to run a restaurant there than in Sunset, since there would probably be fewer people looking for a restaurant.