# Introduction

### The goal of this project is to provide suggestions on selecting neighborhood for opening an Asian restaurant in Amsterdam. 
### The audience of this project is people who want to open an Asian restaurant in Amsterdam. They will find the outcome of this project useful because neighborhood will determine the passenger flow and the environment, which is crucial for opening an restaurant. Therefore, finding similar neighborhoods as the neighborhood of top-rated restaurant will be helpful. 
### To do that, first, I obtained the target neighborhood where top rated Asian restaurant is located. Second, I clustered all neighborhoods of Amsterdam by their nearby venues. Finally, neighborhoods that are in the same group with the target neighborhood are recommended choices of neighborhood for opening an Asian restaurant in Amsterdam. 

# Procedure and Data Source
1. Get the neighborhoods of Amsterdam from ___wikipedia___. 
2. Get the latitudes and longitudes of these neighborhoods through ___geopy___. 
3. Find Asian restaurants near Amsterdam (within 5000m) through ___Foursquare___. 
4. Find the highest rated one among these restaurants through ___Foursquare___ and the neighborhood it is located in. 
5. Get the nearby venues of the neighborhoods of Amsterdam through ___Foursquare___. 
6. Cluster the neighborhoods based on their nearby venues using ___KMeans___. 
7. Find the neighborhoods that are in the same cluster as the top rated Asian restaurant. 
8. Visualize the clustered neighborhoods of Amsterdam on map using ___Folium___. 



In [1]:
import pandas as pd
import numpy as np
import json
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.0 MB

The following NEW packages will be 

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### 1. Get the neighborhoods of Amsterdam

In [4]:
## The Source
#get the table of neighborhoods in Amsterdam from wikipedia
url = 'https://en.wikipedia.org/wiki/Template:Neighborhoods_of_Amsterdam'
results = requests.get(url)
html = results.text
ams = pd.read_html(html)
ams[0].iloc[:, 1]

0    Binnenstad (Oude Zijde - Nieuwe Zijde) Grachte...
1    Geuzenveld (De Eendracht) Nieuw Sloten Oostoev...
2    Banne Buiksloot Buiksloot Buikslotermeer Flora...
3    IJburg (Haveneiland - Rieteilanden - Steigerei...
4    Admiralenbuurt Bos en Lommer (Kolenkitbuurt - ...
5    Apollobuurt Buitenveldert Hoofddorppleinbuurt ...
6    Bijlmer Bullewijk Driemond Gaasperdam Holendre...
7            Teleport Westelijk Havengebied (Ruigoord)
8    Former boroughs: De Baarsjes - Geuzenveld-Slot...
Name: vteNeighbourhoods of Amsterdam.1, dtype: object

#### However, it is too difficult for me to preprocess this data by code. So I preprocessed it in excel and saved as a .csv file. Then I readed it into DataFrame

In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
ams_data = pd.read_csv(body)
ams_data.head()

Unnamed: 0,Neighborhoods
0,Binnenstad
1,Grachtengordel
2,Haarlemmerbuurt
3,Jodenbuurt
4,Jordaan


### 2. Get latitudes and longitudes of the neighborhoods in Amsterdam

In [7]:
#Create a function that get the latitude and longitude of neighborhoods through geopy 
#And record those who cannot be found latitude and longitude as 0 and print "'xx' cannot be found"
def get_loc_info(names):
    loc_info = []
    for name in names:
        addr = '{}, Netherlands'.format(name)
        geolocator = Nominatim(user_agent = 'foursquare_agent')
        loc = geolocator.geocode(addr)
        try:
            lat = loc.latitude
            lng = loc.longitude
            loc_info.append([name, lat, lng])
        except:
            loc_info.append([name, 0, 0])
            print(str(name) + ' -- cannot be found')
    return loc_info

#### Because of the limitation of geopy "service not available" error, I run the code separately on different segment of the list, and add them up at the end to get the latitudes and longitudes of the whole neighborhoods list. 

In [8]:
#get the latitude and longitude of neighborhoods from index 0 to 15
loc_info_1 = get_loc_info(ams_data['Neighborhoods'][:16])

Jodenbuurt -- cannot be found
Westelijke Eilanden -- cannot be found


In [10]:
#get the latitude and longitude of neighborhoods from index 16 to 29
loc_info_2 = get_loc_info(ams_data['Neighborhoods'][16:30])

Oud Osdorp -- cannot be found


In [11]:
#get the latitude and longitude of neighborhoods from index 30 to 45
loc_info_3 = get_loc_info(ams_data['Neighborhoods'][30:46])

Nieuwendammerdijk en Buiksloterdijk -- cannot be found
Admiralenbuurt -- cannot be found
ChassZbuurt -- cannot be found


In [12]:
#get the latitude and longitude of neighborhoods from index 46 to 61
loc_info_4 = get_loc_info(ams_data['Neighborhoods'][46:62])

Postjesbuurt -- cannot be found
Trompbuurt -- cannot be found
Hoofddorppleinbuurt -- cannot be found


In [13]:
#get the latitude and longitude of neighborhoods from index 62 to 76
loc_info_5 = get_loc_info(ams_data['Neighborhoods'][62:77])

Prinses Irenebuurt -- cannot be found
Vondelparkbuurt -- cannot be found
Westelijk Havengebied -- cannot be found


In [14]:
#combine the sublists into one list
loc_info = loc_info_1 + loc_info_2 + loc_info_3 + loc_info_4 + loc_info_5
#Transform loc_info (list) into dataframe 
neigh_df = pd.DataFrame(loc_info, columns = ['Neighborhoods', 'Latitude', 'Longitude'])
neigh_df.shape

(77, 3)

In [15]:
#Delete neighborhoods(rows) who cannot be found location information (recorded as 0)
neigh_df = neigh_df[neigh_df['Longitude'] != 0]
neigh_df.shape

(65, 3)

### 3. find Asian restaurants in Amsterdam

In [16]:
# The code was removed by Watson Studio for sharing.

In [17]:
#radius is set to 5000, because larger range is needed
VERSION = '20200125' # Foursquare API version
LIMIT = 100
search_query = 'Asian'
radius = 5000

In [18]:
#get latitude and longitude of Amsterdam
address = 'Amsterdam, North Holland, the Netherlands'
geolocator = Nominatim(user_agent = 'foursquare_agent')
location = geolocator.geocode(address)
ams_latitude = location.latitude
ams_longitude = location.longitude
print(ams_latitude, ams_longitude)

52.3727598 4.8936041


In [38]:
#search Asian restaurants in Amsterdam
url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, ams_latitude, ams_longitude, search_query, radius, LIMIT)
results = requests.get(url).json()['response']['venues']

In [21]:
#create a list named 'venues_list' containing names, IDs, latitudes and longitudes of Asian venues
venues_list = []
venues_list.append([(v['name'],
                  v['id'],
                  v['location']['lat'], 
                  v['location']['lng']) for v in results])
venues_list

[[('Asian Kitchen',
   '4a27086ef964a520b0901fe3',
   52.366616660477,
   4.893176328253022),
  ('Asian Spirit (Eberhardt)',
   '4eda278729c2b9122816f23c',
   52.372197233081,
   4.894600326971349),
  ('The Asian Carribean', '4a27db29f964a52056921fe3', 52.373423, 4.895536),
  ('Asian Supermarket',
   '4d7f4bd2e7e1721e811eef0b',
   52.36241161823273,
   4.864116311073303),
  ('Asian Food Market',
   '4f253130e4b006e5c2eb7979',
   52.37248370353377,
   4.8995486559957575),
  ('Asian Food Festival',
   '50cdcd6ae4b05e62d00d8f58',
   52.3724250793457,
   4.900631904602051),
  ('Asia Nails & Massage',
   '4c1a0fd1838020a10095e661',
   52.373781521929345,
   4.8824636616219035),
  ('Asian King',
   '4ecd30b346907179cbd53dfc',
   52.36642185649256,
   4.891932538532796),
  ('Asian Beauty Salon',
   '4b697583f964a52030a32be3',
   52.36636712000001,
   4.897131),
  ('New Asian', '4de8e4a1d1648c97963d8924', 52.369072, 4.888828),
  ('Asian Taste Restaurant',
   '4a2704a1f964a520cc851fe3',
   52.3

In [22]:
#transform venues_list(list) into dataframe
venues = pd.DataFrame(item for venue_list in venues_list for item in venue_list)
venues.columns = ['Name', 'Id', 'Latitude', 'Longitude']
venues.head()

Unnamed: 0,Name,Id,Latitude,Longitude
0,Asian Kitchen,4a27086ef964a520b0901fe3,52.366617,4.893176
1,Asian Spirit (Eberhardt),4eda278729c2b9122816f23c,52.372197,4.8946
2,The Asian Carribean,4a27db29f964a52056921fe3,52.373423,4.895536
3,Asian Supermarket,4d7f4bd2e7e1721e811eef0b,52.362412,4.864116
4,Asian Food Market,4f253130e4b006e5c2eb7979,52.372484,4.899549


### 4. Find the restaurant with highest rating and the neighborhood it is located in

In [23]:
#define a function that intake IDs of venues and return a list of ratings of venues
#those without ratings will be recorded as 0
def find_ratings(IDs): 
    ratings = []
    for venue_id in IDs:
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
        results = requests.get(url).json()
        try:
            ratings.append([venue_id, 
                            results['response']['venue']['rating'], 
                            results['response']['venue']['location']['lat'], 
                            results['response']['venue']['location']['lng']])
        except:
            ratings.append([venue_id, 
                            0, 
                            results['response']['venue']['location']['lat'], 
                            results['response']['venue']['location']['lng']])
    return ratings
        

In [24]:
#apply the function on Asian venues
ratings = find_ratings(venues['Id'])
ratings

[['4a27086ef964a520b0901fe3', 8.2, 52.366616660477, 4.893176328253022],
 ['4eda278729c2b9122816f23c', 0, 52.372197233081, 4.894600326971349],
 ['4a27db29f964a52056921fe3', 0, 52.373423, 4.895536],
 ['4d7f4bd2e7e1721e811eef0b', 0, 52.36241161823273, 4.864116311073303],
 ['4f253130e4b006e5c2eb7979', 0, 52.37248370353377, 4.8995486559957575],
 ['50cdcd6ae4b05e62d00d8f58', 0, 52.3724250793457, 4.900631904602051],
 ['4c1a0fd1838020a10095e661', 0, 52.373781521929345, 4.8824636616219035],
 ['4ecd30b346907179cbd53dfc', 0, 52.36642185649256, 4.891932538532796],
 ['4b697583f964a52030a32be3', 0, 52.36636712000001, 4.897131],
 ['4de8e4a1d1648c97963d8924', 0, 52.369072, 4.888828],
 ['4a2704a1f964a520cc851fe3', 5.8, 52.36076210657005, 4.827015078474296],
 ['58d2bc6015fb4351252b0ab5', 0, 52.380648, 4.890836],
 ['509fd20ee4b0872dbe035f6b', 0, 52.36229, 4.89889],
 ['58038a5a38faaf1a8fa8e962', 6.2, 52.35702, 4.8989835],
 ['4f95744ce4b0f08d418b6910', 0, 52.36643866152377, 4.867361964343783],
 ['5ac35fb1a

In [25]:
#transform ratings(list) into dataframe
ratings_df = pd.DataFrame(ratings, columns = ['Id', 'Rating', 'Latitude', 'Longitude'])
#find the venue with highest rating
top_rated = ratings_df.sort_values(by = 'Rating', ascending = False).head(1)
top_rated

Unnamed: 0,Id,Rating,Latitude,Longitude
25,4a270782f964a520088e1fe3,8.5,52.356416,4.878249


In [26]:
#Get the neighborhood of this top-rated venue by finding the closest distance between this venue and the neighbothood
distance = []
#calculate the distance between this venue and every neighborhood and get a list of distances
for name, lat, lng in zip(neigh_df['Neighborhoods'], neigh_df['Latitude'], neigh_df['Longitude']): 
    dis = np.sqrt((float(lat) - float(top_rated['Latitude']))**2 + (float(lng) - float(top_rated['Longitude']))**2)
    distance.append([name, dis])
#transform this distance(list) into dist(dataframe)
dist = pd.DataFrame(distance, columns = ['Neighborhoods', 'Distance'])
#get the closest neighborhood's name
best_neigh = dist.sort_values(by = 'Distance', ascending = True).loc[0, 'Neighborhoods']
best_neigh

'Binnenstad'

### 5. Get the nearby venues of neighborhoods of Amsterdam

In [27]:
#define a function that get nearby venues of neighborhoods
def get_venues(neighborhoods, latitudes, longitudes): 
    venues_list = []
    for name, lat, lng in zip(neighborhoods, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        results1 = requests.get(url).json()#['response']['groups'][0]['items']
        results = results1['response']['groups'][0]['items']
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']
            ) for v in results])
    nearby_venues = pd.DataFrame(item for venue_list in venues_list for item in venue_list)
    nearby_venues.columns = ['Neighborhoods', 'Latitude', 'Longitude', 'Venue_name', 'Venue_lat', 'Venue_lng', 'Category']
    return nearby_venues

In [28]:
nearby_venues = get_venues(neigh_df['Neighborhoods'], neigh_df['Latitude'], neigh_df['Longitude'])
nearby_venues.head()

Unnamed: 0,Neighborhoods,Latitude,Longitude,Venue_name,Venue_lat,Venue_lng,Category
0,Binnenstad,50.849271,5.688756,Boekhandel Dominicanen,50.850129,5.689706,Bookstore
1,Binnenstad,50.849271,5.688756,Vrijthof,50.849295,5.689001,Plaza
2,Binnenstad,50.849271,5.688756,Onze Lieve Vrouweplein,50.847734,5.692877,Plaza
3,Binnenstad,50.849271,5.688756,Il Bacaro,50.848495,5.689203,Italian Restaurant
4,Binnenstad,50.849271,5.688756,With Love Burrito,50.850991,5.690527,Burrito Place


In [29]:
#get dummy variables for venue categories
features = pd.get_dummies(nearby_venues[['Category']], prefix = "", prefix_sep = "")
#insert neighborhoods column to dummy variables
features.insert(0, 'Neighborhoods', nearby_venues['Neighborhoods'])
#calculate values of categories for each neighborhood
features = features.groupby('Neighborhoods').mean().reset_index()
features.head()

Unnamed: 0,Neighborhoods,Adult Boutique,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brasserie,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Butcher,Café,Camera Store,Campground,Canal,Candy Store,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Distillery,Doner Restaurant,Drugstore,Dutch Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food Court,Food Service,Food Stand,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Friterie,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hockey Field,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Lighthouse,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Venue,Nightclub,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outlet Mall,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Café,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Restaurant,Rock Club,Salad Place,Sandwich Place,Satay Restaurant,Scandinavian Restaurant,Scenic Lookout,Science Museum,Scottish Restaurant,Seafood Restaurant,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Spa,Spanish Restaurant,Spiritual Center,Sporting Goods Shop,Sports Club,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tour Provider,Track,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Apollobuurt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.03,0.0
1,Banne Buiksloot,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bijlmer,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.12,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Binnenstad,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.06,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
4,Bos en Lommer,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.03,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.09,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0


### 6. Cluster the neighborhoods based on their nearby venues

In [30]:
#import k-means for cluster stage
from sklearn.cluster import KMeans

In [31]:
#cluster the neighborhoods into 5 groups
kclusters = 5
km = KMeans(n_clusters = kclusters, random_state = 0).fit(features.drop('Neighborhoods', 1))

In [32]:
#insert labels of neighborhoods
features.insert(1, 'Labels', km.labels_)
#extract 'Neighborhoods' and 'Labels' columns into a new dataframe 'labels' 
labels = features[['Neighborhoods', 'Labels']]
labels.head()

Unnamed: 0,Neighborhoods,Labels
0,Apollobuurt,4
1,Banne Buiksloot,1
2,Bijlmer,0
3,Binnenstad,1
4,Bos en Lommer,4


In [33]:
#insert location information into "labels" dataframe
labels = labels.join(neigh_df.set_index('Neighborhoods'), on = 'Neighborhoods', how = 'left')
labels.head()

Unnamed: 0,Neighborhoods,Labels,Latitude,Longitude
0,Apollobuurt,4,52.348073,4.875559
1,Banne Buiksloot,1,52.408505,4.918843
2,Bijlmer,0,52.317033,4.964991
3,Binnenstad,1,50.849271,5.688756
4,Bos en Lommer,4,52.378521,4.848738


### 7. Find the neighborhoods that are in the same cluster as the top_rated restaurant

In [34]:
#find the cluster of the neighborhood of best-rated asian restaurant
best_cluster = labels[labels['Neighborhoods'] == best_neigh].iloc[0,1]
#find the neighborhoods in the same cluster of the neighborhood of the best-rated asian restaurant
best_cluster_neighborhoods = labels[labels['Labels'] == best_cluster]
best_cluster_neighborhoods
#the following listed neighborhoods are neighborhoods suggested for opening a new asian restaurant!

Unnamed: 0,Neighborhoods,Labels,Latitude,Longitude
1,Banne Buiksloot,1,52.408505,4.918843
3,Binnenstad,1,50.849271,5.688756
19,Ijburg,1,52.354994,4.997157
23,Kadoelen,1,52.416447,4.899304
29,Nieuwendam,1,52.394472,4.953412
36,Oostzanerwerf,1,52.419487,4.889508
43,Rapenburg,1,52.155884,4.487615
55,Tuindorp Oostzaan,1,52.412796,4.888052
62,Willemspark,1,52.086459,4.305179


### 8. Visualize the clustered neighborhoods of Amsterdam on map

In [35]:
#create map
ams_map = folium.Map(location = [ams_latitude, ams_longitude], zoom_start = 12)

In [36]:
#set up colors for clusters
import matplotlib.cm as cm
import matplotlib.colors as colors
color_array = cm.rainbow(np.linspace(1,0,kclusters))
rainbow = [colors.rgb2hex(i) for i in color_array]

In [37]:
#add markers to ams_map
for lat, lng, cluster, name in zip(labels['Latitude'], labels['Longitude'], labels['Labels'], labels['Neighborhoods']):
    label = folium.Popup(str(name) + ' cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, lng], 
        radius = 5, 
        popup = label, 
        color = rainbow[cluster-1], 
        fill = True, 
        fill_color = rainbow[cluster-1], 
        fill_opacity = 0.8
    ).add_to(ams_map)
ams_map