# BERLIN // Where to open a new restaurant: neigborhood analysis

# 1 - Introduction

## 1.1 Discussion of the problem

After COVID pandemic more problematic waves have passsed and once vaccination process is fully advanced, a well known spanish hospitality company is planning to open a new hotel in Berlin. What will be the best place to do it?

## 1.2 Discussion of the background

During 2020 tourism was one the main industries affected by lockdown decisions around the world. According to UN data, during 2020 international arrivals are estimated to have dropped to 381 million, down from 1.461 billion $ in 2019 — a 74% decline. In countries whose economies are heavily reliant on tourism as the suth of Europe (Italy, Portugal, Greece or Spain), the precipitous drop in visitors was, and remains, devastating.

Berlin was not left out of this huge crisis. It is the capital and the biggest city of Germany, the second most populous city in the European Union, Berlin has nearly 3,6 million residents from more than 190 countries with a population density of 4,200 people per km², the city is divided into 12 boroughs, 95 neighborhoods. 
Also it is considered a top European destination – ranked third after London and Paris.

During 2020 even though the world is facing the Coronavirus crisis, Berlin welcomed almost 5 millions tourists in the whole year 2020, which represents a decrease of 65% of the same period in 2019.
At the beggining of 2021, between January and April 400,000 tourists have visited Berlin, and it is expected these figures could rise as vaccination process improves and frontiers are widely opened.
Actually there are 635 accommodation establishments classified as "hotels" (includes hotels, guesthouses and bed & breakfast properties) in Berlin.

In order to face this issue, we can solve this problem by creating a map and information chart that shows the real distribution of hotels in Berlin and clustering each area according to the density of the place.
We will need to find a method to use Foursquare location data where machine learning to help us make decisions for hte spanish hospitality company. 

In this project, I will try to use Foursquare location data and clustering methods to divide regions into different groups based on their hotel location information.

# 2 - Data description: how it helps to solve the problem

For this project, data needed is as follows:

**1 - Berlin neigborood data: list of Boroughs and neighborhoods and their latitudes and longitudes.**
<ul>
<li> Data source: https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin </li>
<li> Description: We will discard the Berlin area (district) table through Wikipedia. Then using geocoder class of the Geopy to get coordinates (lattitude and longited) of these 12 main areas. </li>
</ul>
    
**2 - Hotels in each neighborhood in Berlin:**

<ul>
<li> Data source: Foursquare API </li>
<li> Description: By using this API, we will obtain all venues in each community. We can filter these places to get only hotels. </li>
</ul>

# 3 - Methodology

## 3.1 Getting information from Berlin's neighborhood

First of all, we get information about boroughs and neighborhood of Berlin scrapped from Wikipedia

In [1]:
!pip install bs4
from bs4 import BeautifulSoup
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation



In [2]:
!wget -O berlin.html https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin

--2021-07-19 16:42:59--  https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin
Resolving en.wikipedia.org (en.wikipedia.org)... 208.80.154.224, 2620:0:861:ed1a::1
Connecting to en.wikipedia.org (en.wikipedia.org)|208.80.154.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 207108 (202K) [text/html]
Saving to: ‘berlin.html’


2021-07-19 16:43:00 (616 KB/s) - ‘berlin.html’ saved [207108/207108]



Parse the html file

In [3]:
with open('berlin.html','r') as berlin_html:
    soup_berlin = BeautifulSoup(berlin_html, 'html.parser')

Create a dataframe with the list of neighbourhoods from the html file

In [4]:
df_berlin = []
for tr in soup_berlin.find_all('tr'):
    row = tr.text.replace('(','').replace(')','')
    row = row.split('\n')
    row = list(filter(lambda s: s != '', row)) # delete empty strings from list
    row = list(map(lambda s: s.strip(), row)) # remove leading and trailing spaces from strings in list
                 
    if row[0][0:4].isdigit():
        row = row[0].split(' ', 1)
        df_berlin.append(row)

df_berlin = pd.DataFrame(df_berlin)
df_berlin.columns = ['neighborhood_id', 'neighborhood']

Get list of boroughs in ID order and add to each neighbourhood

In [5]:
boroughs = []
for dt in soup_berlin.find_all('dt'):
    boroughs.append(dt.text[5:])

# add borough 
borough = []
for lid in df_berlin.neighborhood_id:
    borough.append(boroughs[int(lid)//100-1])
    
df_berlin['borough'] = borough
df_berlin['city'] = 'Berlin'

df_berlin

Unnamed: 0,neighborhood_id,neighborhood,borough,city
0,0101,Mitte,Mitte,Berlin
1,0102,Moabit,Mitte,Berlin
2,0103,Hansaviertel,Mitte,Berlin
3,0104,Tiergarten,Mitte,Berlin
4,0105,Wedding,Mitte,Berlin
...,...,...,...,...
91,1207,Waidmannslust,Reinickendorf,Berlin
92,1208,Lübars,Reinickendorf,Berlin
93,1209,Wittenau,Reinickendorf,Berlin
94,1210,Märkisches Viertel,Reinickendorf,Berlin


## 3.2 Adding coordinates for each neighborhood

Now the aim is to add coordinates details for each of 96 neighborhood. We will use Geopy client detailes as follows. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>be_explorer</em>, as shown below.

In [6]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



Getting information about Berlin coordinates:

In [7]:
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent="be_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Berlin are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Berlin are 52.5170365, 13.3888599.


Now details about latitude and longitude for all neighborhoods

In [8]:
geolocator = Nominatim(user_agent="be_explorer")

df_berlin['neighborhood_coord']= df_berlin['neighborhood'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df_berlin[['Latitude', 'Longitude']] = df_berlin['neighborhood_coord'].apply(pd.Series)

df_berlin

Unnamed: 0,neighborhood_id,neighborhood,borough,city,neighborhood_coord,Latitude,Longitude
0,0101,Mitte,Mitte,Berlin,"(39.98020495, -7.905590887431517)",39.980205,-7.905591
1,0102,Moabit,Mitte,Berlin,"(52.5301017, 13.3425422)",52.530102,13.342542
2,0103,Hansaviertel,Mitte,Berlin,"(52.5191234, 13.3418725)",52.519123,13.341872
3,0104,Tiergarten,Mitte,Berlin,"(50.3409222, 6.956329)",50.340922,6.956329
4,0105,Wedding,Mitte,Berlin,"(52.550123, 13.34197)",52.550123,13.341970
...,...,...,...,...,...,...,...
91,1207,Waidmannslust,Reinickendorf,Berlin,"(52.6080354, 13.3225327)",52.608035,13.322533
92,1208,Lübars,Reinickendorf,Berlin,"(52.6146467, 13.3530197)",52.614647,13.353020
93,1209,Wittenau,Reinickendorf,Berlin,"(52.5912366, 13.3233195)",52.591237,13.323320
94,1210,Märkisches Viertel,Reinickendorf,Berlin,"(52.5993123, 13.3565324)",52.599312,13.356532


In [9]:
df_berlin.drop(['neighborhood_coord'], axis=1, inplace=True)
df_berlin

Unnamed: 0,neighborhood_id,neighborhood,borough,city,Latitude,Longitude
0,0101,Mitte,Mitte,Berlin,39.980205,-7.905591
1,0102,Moabit,Mitte,Berlin,52.530102,13.342542
2,0103,Hansaviertel,Mitte,Berlin,52.519123,13.341872
3,0104,Tiergarten,Mitte,Berlin,50.340922,6.956329
4,0105,Wedding,Mitte,Berlin,52.550123,13.341970
...,...,...,...,...,...,...
91,1207,Waidmannslust,Reinickendorf,Berlin,52.608035,13.322533
92,1208,Lübars,Reinickendorf,Berlin,52.614647,13.353020
93,1209,Wittenau,Reinickendorf,Berlin,52.591237,13.323320
94,1210,Märkisches Viertel,Reinickendorf,Berlin,52.599312,13.356532


Create a map of berlin with neighborhoods details.

In [10]:
# create map of Cologne using latitude and longitude 
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
for lat, lng, label in zip(df_berlin['Latitude'], df_berlin['Longitude'], df_berlin['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_berlin) 
    
map_berlin

## 3.3 Data analysis using Foursquare API

The aim of this part is getting details using data exploraty analysis in order to extract valuable information and insights about all these 96 different neigborhoods. The aim is getting rich information which could help us to make the rights decisions.

### 3.3.1 First of all we will use Foursquare API to explore the neighborhoods of Berlin and segment them.

In [11]:
CLIENT_ID = 'BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM' # your Foursquare ID
CLIENT_SECRET = 'MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM
CLIENT_SECRET:MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0


In [12]:
df_berlin.loc[0, 'neighborhood']

'Mitte'

In [13]:
neighborhood_latitude = df_berlin.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_berlin.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_berlin.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Mitte are 39.98020495, -7.905590887431517.


In [14]:
# Defining parameters to Foursqaue API

LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM&client_secret=MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0&v=20180605&ll=39.98020495,-7.905590887431517&radius=1000&limit=100'

In [15]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60f5ac149be13106f6c47d17'},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': 39.98920495900001,
    'lng': -7.893867544095733},
   'sw': {'lat': 39.97120494099999, 'lng': -7.917314230767301}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5b9642f665211f002c799be8',
       'name': 'Yoga Evolution Retreats',
       'location': {'address': 'Yoga Evolution Retreats',
        'crossStreet': 'Quinta Do Bacelo',
        'lat': 39.979904,
        'lng': -7.9148088,
        'labeledLatLngs': [{'label': 'display',
          'lat': 39.979904,
          'lng': -7.9148088}],
        'distance': 787,
        'posta

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,Yoga Evolution Retreats,Yoga Studio,39.979904,-7.914809


In [19]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

1 venues were returned by Foursquare.


In [20]:
print ('{} unique categories in Mitte.'.format(nearby_venues['categories'].value_counts().shape[0]))

1 unique categories in Mitte.


In [21]:
print (nearby_venues['categories'].value_counts()[0:15])

Yoga Studio    1
Name: categories, dtype: int64


In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000, LIMIT=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    
    return(nearby_venues)

In [23]:
berlin_venues = getNearbyVenues(names=df_berlin['neighborhood'],
                                   latitudes=df_berlin['Latitude'],
                                   longitudes=df_berlin['Longitude']
                                  )

Mitte
Moabit
Hansaviertel
Tiergarten
Wedding
Gesundbrunnen
Friedrichshain
Kreuzberg
Prenzlauer Berg
Weißensee
Blankenburg
Heinersdorf
Karow
Stadtrandsiedlung Malchow
Pankow
Blankenfelde
Buch
Französisch Buchholz
Niederschönhausen
Rosenthal
Wilhelmsruh
Charlottenburg
Wilmersdorf
Schmargendorf
Grunewald
Westend
Charlottenburg-Nord
Halensee
Spandau
Haselhorst
Siemensstadt
Staaken
Gatow
Kladow
Hakenfelde
Falkenhagener Feld
Wilhelmstadt
Steglitz
Lichterfelde
Lankwitz
Zehlendorf
Dahlem
Nikolassee
Wannsee
Schöneberg
Friedenau
Tempelhof
Mariendorf
Marienfelde
Lichtenrade
Neukölln
Britz
Buckow
Rudow
Gropiusstadt
Alt-Treptow
Plänterwald
Baumschulenweg
Johannisthal
Niederschöneweide
Altglienicke
Adlershof
Bohnsdorf
Oberschöneweide
Köpenick
Friedrichshagen
Rahnsdorf
Grünau
Müggelheim
Schmöckwitz
Marzahn
Biesdorf
Kaulsdorf
Mahlsdorf
Hellersdorf
Friedrichsfelde
Karlshorst
Lichtenberg
Falkenberg
Malchow
Wartenberg
Neu-Hohenschönhausen
Alt-Hohenschönhausen
Fennpfuhl
Rummelsburg
Reinickendorf
Tegel
Kon

In [24]:
berlin_venues.shape

(6229, 7)

In [31]:
# Create a Data-Frame out of it to concentrate only on Restaurants
Berlin_Venues_only_restaurants = berlin_venues[berlin_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop=True)
Berlin_Venues_only_restaurants.index = np.arange(1, len(Berlin_Venues_only_restaurants )+1)

In [32]:
print (Berlin_Venues_only_restaurants['Venue Category'].value_counts())

Italian Restaurant                 240
German Restaurant                  185
Greek Restaurant                    86
Vietnamese Restaurant               74
Restaurant                          69
Fast Food Restaurant                57
Indian Restaurant                   53
Falafel Restaurant                  51
Chinese Restaurant                  46
Doner Restaurant                    43
Thai Restaurant                     42
Asian Restaurant                    41
Sushi Restaurant                    40
Seafood Restaurant                  30
Korean Restaurant                   29
Vegetarian / Vegan Restaurant       27
Argentinian Restaurant              26
Turkish Restaurant                  25
Mexican Restaurant                  24
Middle Eastern Restaurant           23
Japanese Restaurant                 20
Eastern European Restaurant         20
French Restaurant                   12
Modern European Restaurant           9
American Restaurant                  8
Mediterranean Restaurant 

In [33]:
Berlin_Venues_only_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Moabit,52.530102,13.342542,Güllü Lahmacun,52.532217,13.350061,Turkish Restaurant
2,Moabit,52.530102,13.342542,Sapori Di Casa,52.527425,13.351519,Italian Restaurant
3,Moabit,52.530102,13.342542,Favorit Gemüse Kebap,52.526737,13.335998,Doner Restaurant
4,Moabit,52.530102,13.342542,Hoan Kiem,52.527047,13.33897,Vietnamese Restaurant
5,Moabit,52.530102,13.342542,Recep Usta Köfteci,52.527139,13.331251,Turkish Restaurant


### 3.3.2 Analysis of the neighbourhoods.

In [34]:
Berlin_Venues_restaurants = Berlin_Venues_only_restaurants.groupby(['Neighborhood'])['Venue Category'].apply(lambda x: x[x.str.contains('Restaurant')].count())

In [35]:
Berlin_Venues_restaurants

Neighborhood
Adlershof               21
Alt-Hohenschönhausen    10
Alt-Treptow             17
Altglienicke            11
Baumschulenweg          17
                        ..
Wilhelmsruh             27
Wilhelmstadt            17
Wilmersdorf             26
Wittenau                31
Zehlendorf              20
Name: Venue Category, Length: 89, dtype: int64

In [37]:
Berlin_Venues_restaurant_df  = Berlin_Venues_restaurants.to_frame().reset_index()
Berlin_Venues_restaurant_df.columns = ['Neighborhood', 'Number of Restaurant']
Berlin_Venues_restaurant_df.index = np.arange(1, len(Berlin_Venues_restaurant_df)+1)
list_rest_no =Berlin_Venues_restaurant_df['Number of Restaurant'].to_list()
list_dist =Berlin_Venues_restaurant_df['Neighborhood'].to_list()

In [43]:
# one hot encoding
Berlin_onehot = pd.get_dummies(Berlin_Venues_only_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Berlin_onehot['Neighborhood'] = Berlin_Venues_only_restaurants['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Berlin_onehot.columns[-1]] + list(Berlin_onehot.columns[:-1])
Berlin_onehot = Berlin_onehot[fixed_columns]

Berlin_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,...,Swiss Restaurant,Syrian Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Home Cooking Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yemeni Restaurant
1,Moabit,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,Moabit,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Moabit,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Moabit,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
5,Moabit,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0


### 3.3.3 Grouping by neighbourhoods and showing the mean of the frequency of occurrence for each category of restaurants.

In [44]:
Berlin_grouped = Berlin_onehot.groupby('Neighborhood').mean().reset_index()
Berlin_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,...,Swiss Restaurant,Syrian Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Home Cooking Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yemeni Restaurant
0,Adlershof,0.0,0.000000,0.000000,0.047619,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0
1,Alt-Hohenschönhausen,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.300000,0.0
2,Alt-Treptow,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.058824,0.117647,0.0,0.000000,0.117647,0.058824,0.0
3,Altglienicke,0.0,0.000000,0.000000,0.090909,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0
4,Baumschulenweg,0.0,0.000000,0.000000,0.058824,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.058824,0.117647,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84,Wilhelmsruh,0.0,0.037037,0.037037,0.000000,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.037037,0.0,0.037037,0.000000,0.000000,0.0
85,Wilhelmstadt,0.0,0.000000,0.058824,0.000000,0.058824,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.058824,0.0
86,Wilmersdorf,0.0,0.000000,0.000000,0.076923,0.000000,0.0,0.0,0.0,0.038462,...,0.0,0.0,0.038462,0.000000,0.000000,0.0,0.000000,0.000000,0.115385,0.0
87,Wittenau,0.0,0.032258,0.064516,0.000000,0.000000,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.000000,0.032258,0.0,0.000000,0.000000,0.000000,0.0


In [45]:
Berlin_grouped.shape

(89, 58)

### 3.3.4 Print the neighbourhoods with their respective top 10 most common venues.

In [46]:
num_top_venues = 10

for hood in Berlin_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Berlin_grouped[Berlin_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adlershof----
                       venue  freq
0          German Restaurant  0.29
1           Greek Restaurant  0.19
2         Italian Restaurant  0.14
3           Sushi Restaurant  0.14
4          Korean Restaurant  0.05
5                 Restaurant  0.05
6           Asian Restaurant  0.05
7  Middle Eastern Restaurant  0.05
8          Indian Restaurant  0.05
9         Mexican Restaurant  0.00


----Alt-Hohenschönhausen----
                        venue  freq
0       Vietnamese Restaurant   0.3
1           German Restaurant   0.3
2           Indian Restaurant   0.2
3          Italian Restaurant   0.1
4            Greek Restaurant   0.1
5          African Restaurant   0.0
6                  Restaurant   0.0
7          Mexican Restaurant   0.0
8   Middle Eastern Restaurant   0.0
9  Modern European Restaurant   0.0


----Alt-Treptow----
                           venue  freq
0             Falafel Restaurant  0.24
1             Italian Restaurant  0.12
2  Vegetarian / Vegan Restauran

### 3.3.5 Creating a pandas dataframe.

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Berlin_grouped['Neighborhood']

for ind in np.arange(Berlin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Berlin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(20)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adlershof,German Restaurant,Greek Restaurant,Italian Restaurant,Sushi Restaurant,Korean Restaurant,Restaurant,Asian Restaurant,Middle Eastern Restaurant,Indian Restaurant,Mexican Restaurant
1,Alt-Hohenschönhausen,Vietnamese Restaurant,German Restaurant,Indian Restaurant,Italian Restaurant,Greek Restaurant,African Restaurant,Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant
2,Alt-Treptow,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Dumpling Restaurant,Russian Restaurant,Lebanese Restaurant,German Restaurant,Tapas Restaurant,Middle Eastern Restaurant
3,Altglienicke,Italian Restaurant,Greek Restaurant,Korean Restaurant,Asian Restaurant,German Restaurant,Sushi Restaurant,Restaurant,Russian Restaurant,Mexican Restaurant,Middle Eastern Restaurant
4,Baumschulenweg,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Doner Restaurant,Sushi Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant
5,Biesdorf,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
6,Blankenburg,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
7,Blankenfelde,Mexican Restaurant,Greek Restaurant,German Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
8,Bohnsdorf,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
9,Borsigwalde,German Restaurant,Italian Restaurant,Restaurant,Eastern European Restaurant,Argentinian Restaurant,Indian Restaurant,Seafood Restaurant,Doner Restaurant,Japanese Restaurant,Fast Food Restaurant


## 3.4 Clustering the neighbourhoods with k-means.

In [51]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [97]:
# set number of clusters (5)
kclusters = 5

Berlin_grouped_clustering = Berlin_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Berlin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]



array([2, 2, 1, 1, 1, 4, 4, 2, 0, 1], dtype=int32)

In [98]:
Berlin_merged = df_berlin
Berlin_merged.head(10)

Unnamed: 0,neighborhood_id,Neighborhood,borough,city,Latitude,Longitude
0,101,Mitte,Mitte,Berlin,39.980205,-7.905591
1,102,Moabit,Mitte,Berlin,52.530102,13.342542
2,103,Hansaviertel,Mitte,Berlin,52.519123,13.341872
3,104,Tiergarten,Mitte,Berlin,50.340922,6.956329
4,105,Wedding,Mitte,Berlin,52.550123,13.34197
5,106,Gesundbrunnen,Mitte,Berlin,52.55092,13.384846
6,201,Friedrichshain,Friedrichshain-Kreuzberg,Berlin,52.512215,13.45029
7,202,Kreuzberg,Friedrichshain-Kreuzberg,Berlin,52.497644,13.411914
8,301,Prenzlauer Berg,Pankow,Berlin,52.539847,13.428565
9,302,Weißensee,Pankow,Berlin,52.554619,13.463002


In [99]:
neighborhoods_venues_sorted.head(20)

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Adlershof,German Restaurant,Greek Restaurant,Italian Restaurant,Sushi Restaurant,Korean Restaurant,Restaurant,Asian Restaurant,Middle Eastern Restaurant,Indian Restaurant,Mexican Restaurant
1,2,Alt-Hohenschönhausen,Vietnamese Restaurant,German Restaurant,Indian Restaurant,Italian Restaurant,Greek Restaurant,African Restaurant,Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant
2,1,Alt-Treptow,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Dumpling Restaurant,Russian Restaurant,Lebanese Restaurant,German Restaurant,Tapas Restaurant,Middle Eastern Restaurant
3,1,Altglienicke,Italian Restaurant,Greek Restaurant,Korean Restaurant,Asian Restaurant,German Restaurant,Sushi Restaurant,Restaurant,Russian Restaurant,Mexican Restaurant,Middle Eastern Restaurant
4,1,Baumschulenweg,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Doner Restaurant,Sushi Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant
5,4,Biesdorf,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
6,4,Blankenburg,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
7,2,Blankenfelde,Mexican Restaurant,Greek Restaurant,German Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
8,0,Bohnsdorf,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
9,1,Borsigwalde,German Restaurant,Italian Restaurant,Restaurant,Eastern European Restaurant,Argentinian Restaurant,Indian Restaurant,Seafood Restaurant,Doner Restaurant,Japanese Restaurant,Fast Food Restaurant


In [100]:
neighborhoods_venues_sorted_w_clusters = neighborhoods_venues_sorted

In [101]:
neighborhoods_venues_sorted_w_clusters.head(9)

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Adlershof,German Restaurant,Greek Restaurant,Italian Restaurant,Sushi Restaurant,Korean Restaurant,Restaurant,Asian Restaurant,Middle Eastern Restaurant,Indian Restaurant,Mexican Restaurant
1,2,Alt-Hohenschönhausen,Vietnamese Restaurant,German Restaurant,Indian Restaurant,Italian Restaurant,Greek Restaurant,African Restaurant,Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant
2,1,Alt-Treptow,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Dumpling Restaurant,Russian Restaurant,Lebanese Restaurant,German Restaurant,Tapas Restaurant,Middle Eastern Restaurant
3,1,Altglienicke,Italian Restaurant,Greek Restaurant,Korean Restaurant,Asian Restaurant,German Restaurant,Sushi Restaurant,Restaurant,Russian Restaurant,Mexican Restaurant,Middle Eastern Restaurant
4,1,Baumschulenweg,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Doner Restaurant,Sushi Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant
5,4,Biesdorf,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
6,4,Blankenburg,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
7,2,Blankenfelde,Mexican Restaurant,Greek Restaurant,German Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
8,0,Bohnsdorf,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant


In [102]:
# add clustering labels
neighborhoods_venues_sorted_w_clusters.insert(0, 'Cluster Labels', kmeans.labels_)

ValueError: cannot insert Cluster Labels, already exists

In [103]:
neighborhoods_venues_sorted_w_clusters.head(23)

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Adlershof,German Restaurant,Greek Restaurant,Italian Restaurant,Sushi Restaurant,Korean Restaurant,Restaurant,Asian Restaurant,Middle Eastern Restaurant,Indian Restaurant,Mexican Restaurant
1,2,Alt-Hohenschönhausen,Vietnamese Restaurant,German Restaurant,Indian Restaurant,Italian Restaurant,Greek Restaurant,African Restaurant,Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant
2,1,Alt-Treptow,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Dumpling Restaurant,Russian Restaurant,Lebanese Restaurant,German Restaurant,Tapas Restaurant,Middle Eastern Restaurant
3,1,Altglienicke,Italian Restaurant,Greek Restaurant,Korean Restaurant,Asian Restaurant,German Restaurant,Sushi Restaurant,Restaurant,Russian Restaurant,Mexican Restaurant,Middle Eastern Restaurant
4,1,Baumschulenweg,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Doner Restaurant,Sushi Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant
5,4,Biesdorf,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
6,4,Blankenburg,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
7,2,Blankenfelde,Mexican Restaurant,Greek Restaurant,German Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
8,0,Bohnsdorf,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
9,1,Borsigwalde,German Restaurant,Italian Restaurant,Restaurant,Eastern European Restaurant,Argentinian Restaurant,Indian Restaurant,Seafood Restaurant,Doner Restaurant,Japanese Restaurant,Fast Food Restaurant


In [104]:
Berlin_merged.rename(columns={'neighborhood':'Neighborhood'}, inplace=True)

Berlin_merged = Berlin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Berlin_merged.head(10)

Unnamed: 0,neighborhood_id,Neighborhood,borough,city,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,101,Mitte,Mitte,Berlin,39.980205,-7.905591,,,,,,,,,,,
1,102,Moabit,Mitte,Berlin,52.530102,13.342542,1.0,Vietnamese Restaurant,Turkish Restaurant,Chinese Restaurant,Kebab Restaurant,Schnitzel Restaurant,Falafel Restaurant,Indian Restaurant,Italian Restaurant,Doner Restaurant,African Restaurant
2,103,Hansaviertel,Mitte,Berlin,52.519123,13.341872,1.0,Schnitzel Restaurant,Italian Restaurant,French Restaurant,Mediterranean Restaurant,Seafood Restaurant,Doner Restaurant,German Restaurant,Indian Restaurant,Kebab Restaurant,Turkish Restaurant
3,104,Tiergarten,Mitte,Berlin,50.340922,6.956329,2.0,German Restaurant,Restaurant,Eastern European Restaurant,African Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
4,105,Wedding,Mitte,Berlin,52.550123,13.34197,1.0,Turkish Restaurant,Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,African Restaurant,Falafel Restaurant,Vietnamese Restaurant,Modern European Restaurant
5,106,Gesundbrunnen,Mitte,Berlin,52.55092,13.384846,1.0,Vegetarian / Vegan Restaurant,Seafood Restaurant,African Restaurant,Modern European Restaurant,Falafel Restaurant,Doner Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Thai Restaurant
6,201,Friedrichshain,Friedrichshain-Kreuzberg,Berlin,52.512215,13.45029,1.0,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Thai Restaurant,Russian Restaurant,German Restaurant,Chinese Restaurant,Syrian Restaurant,Modern European Restaurant
7,202,Kreuzberg,Friedrichshain-Kreuzberg,Berlin,52.497644,13.411914,1.0,Italian Restaurant,Turkish Restaurant,Vietnamese Restaurant,African Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Seafood Restaurant,Ramen Restaurant,Mediterranean Restaurant
8,301,Prenzlauer Berg,Pankow,Berlin,52.539847,13.428565,1.0,Vietnamese Restaurant,Falafel Restaurant,Japanese Restaurant,German Restaurant,Doner Restaurant,Mexican Restaurant,Italian Restaurant,Israeli Restaurant,Indian Restaurant,French Restaurant
9,302,Weißensee,Pankow,Berlin,52.554619,13.463002,2.0,Indian Restaurant,German Restaurant,Greek Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Dim Sum Restaurant,Restaurant,African Restaurant,Middle Eastern Restaurant


Finally, let's visualize the resulting clusters.

In [105]:
# Delete those Neighborhood not classified = NaN in Cluster Labels

Berlin_merged = Berlin_merged[Berlin_merged['Cluster Labels'].notna()]
Berlin_merged.head(10)

Unnamed: 0,neighborhood_id,Neighborhood,borough,city,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,102,Moabit,Mitte,Berlin,52.530102,13.342542,1.0,Vietnamese Restaurant,Turkish Restaurant,Chinese Restaurant,Kebab Restaurant,Schnitzel Restaurant,Falafel Restaurant,Indian Restaurant,Italian Restaurant,Doner Restaurant,African Restaurant
2,103,Hansaviertel,Mitte,Berlin,52.519123,13.341872,1.0,Schnitzel Restaurant,Italian Restaurant,French Restaurant,Mediterranean Restaurant,Seafood Restaurant,Doner Restaurant,German Restaurant,Indian Restaurant,Kebab Restaurant,Turkish Restaurant
3,104,Tiergarten,Mitte,Berlin,50.340922,6.956329,2.0,German Restaurant,Restaurant,Eastern European Restaurant,African Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
4,105,Wedding,Mitte,Berlin,52.550123,13.34197,1.0,Turkish Restaurant,Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,African Restaurant,Falafel Restaurant,Vietnamese Restaurant,Modern European Restaurant
5,106,Gesundbrunnen,Mitte,Berlin,52.55092,13.384846,1.0,Vegetarian / Vegan Restaurant,Seafood Restaurant,African Restaurant,Modern European Restaurant,Falafel Restaurant,Doner Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Thai Restaurant
6,201,Friedrichshain,Friedrichshain-Kreuzberg,Berlin,52.512215,13.45029,1.0,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Thai Restaurant,Russian Restaurant,German Restaurant,Chinese Restaurant,Syrian Restaurant,Modern European Restaurant
7,202,Kreuzberg,Friedrichshain-Kreuzberg,Berlin,52.497644,13.411914,1.0,Italian Restaurant,Turkish Restaurant,Vietnamese Restaurant,African Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Seafood Restaurant,Ramen Restaurant,Mediterranean Restaurant
8,301,Prenzlauer Berg,Pankow,Berlin,52.539847,13.428565,1.0,Vietnamese Restaurant,Falafel Restaurant,Japanese Restaurant,German Restaurant,Doner Restaurant,Mexican Restaurant,Italian Restaurant,Israeli Restaurant,Indian Restaurant,French Restaurant
9,302,Weißensee,Pankow,Berlin,52.554619,13.463002,2.0,Indian Restaurant,German Restaurant,Greek Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Dim Sum Restaurant,Restaurant,African Restaurant,Middle Eastern Restaurant
10,303,Blankenburg,Pankow,Berlin,51.790268,10.955199,4.0,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant


In [106]:
# Convert into integer Cluster Labels column

Berlin_merged['Cluster Labels'] = Berlin_merged['Cluster Labels'].astype(int)

In [107]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Berlin_merged['Latitude'], Berlin_merged['Longitude'], Berlin_merged['Neighborhood'], Berlin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 3.4.1 Exploring clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

**CLUSTER 1**

In [108]:
Berlin_merged.loc[Berlin_merged['Cluster Labels'] == 0, Berlin_merged.columns[[1] + list(range(5, Berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Stadtrandsiedlung Malchow,13.463285,0,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
17,Französisch Buchholz,13.42811,0,Greek Restaurant,Asian Restaurant,Mexican Restaurant,Doner Restaurant,African Restaurant,Schnitzel Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
31,Staaken,13.143367,0,Fast Food Restaurant,Turkish Restaurant,Chinese Restaurant,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant
62,Bohnsdorf,13.570665,0,Greek Restaurant,Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
70,Marzahn,13.563142,0,Greek Restaurant,Fast Food Restaurant,Mexican Restaurant,Asian Restaurant,Italian Restaurant,German Restaurant,Doner Restaurant,Russian Restaurant,Middle Eastern Restaurant,Modern European Restaurant
73,Mahlsdorf,13.613162,0,Greek Restaurant,Fast Food Restaurant,Italian Restaurant,Chinese Restaurant,African Restaurant,Russian Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
74,Hellersdorf,13.604774,0,Fast Food Restaurant,Mexican Restaurant,Asian Restaurant,Italian Restaurant,Greek Restaurant,Doner Restaurant,African Restaurant,Russian Restaurant,Middle Eastern Restaurant,Modern European Restaurant
80,Wartenberg,13.517582,0,Greek Restaurant,Fast Food Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant


**CLUSTER 2**

In [109]:
Berlin_merged.loc[Berlin_merged['Cluster Labels'] == 1, Berlin_merged.columns[[1] + list(range(5, Berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Moabit,13.342542,1,Vietnamese Restaurant,Turkish Restaurant,Chinese Restaurant,Kebab Restaurant,Schnitzel Restaurant,Falafel Restaurant,Indian Restaurant,Italian Restaurant,Doner Restaurant,African Restaurant
2,Hansaviertel,13.341872,1,Schnitzel Restaurant,Italian Restaurant,French Restaurant,Mediterranean Restaurant,Seafood Restaurant,Doner Restaurant,German Restaurant,Indian Restaurant,Kebab Restaurant,Turkish Restaurant
4,Wedding,13.34197,1,Turkish Restaurant,Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,African Restaurant,Falafel Restaurant,Vietnamese Restaurant,Modern European Restaurant
5,Gesundbrunnen,13.384846,1,Vegetarian / Vegan Restaurant,Seafood Restaurant,African Restaurant,Modern European Restaurant,Falafel Restaurant,Doner Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Thai Restaurant
6,Friedrichshain,13.45029,1,Falafel Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Thai Restaurant,Russian Restaurant,German Restaurant,Chinese Restaurant,Syrian Restaurant,Modern European Restaurant
7,Kreuzberg,13.411914,1,Italian Restaurant,Turkish Restaurant,Vietnamese Restaurant,African Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Seafood Restaurant,Ramen Restaurant,Mediterranean Restaurant
8,Prenzlauer Berg,13.428565,1,Vietnamese Restaurant,Falafel Restaurant,Japanese Restaurant,German Restaurant,Doner Restaurant,Mexican Restaurant,Italian Restaurant,Israeli Restaurant,Indian Restaurant,French Restaurant
11,Heinersdorf,13.437015,1,Italian Restaurant,Vietnamese Restaurant,Chinese Restaurant,German Restaurant,Indian Restaurant,Falafel Restaurant,Japanese Restaurant,Dim Sum Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant
14,Pankow,13.435316,1,Italian Restaurant,Greek Restaurant,Mexican Restaurant,Asian Restaurant,Thai Restaurant,Doner Restaurant,Restaurant,African Restaurant,Russian Restaurant,Middle Eastern Restaurant
18,Niederschönhausen,13.401397,1,Mexican Restaurant,Asian Restaurant,Modern European Restaurant,Italian Restaurant,Greek Restaurant,German Restaurant,African Restaurant,Russian Restaurant,Middle Eastern Restaurant,New American Restaurant


**CLUSTER 3**

In [110]:
Berlin_merged.loc[Berlin_merged['Cluster Labels'] == 2, Berlin_merged.columns[[1] + list(range(5, Berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Tiergarten,6.956329,2,German Restaurant,Restaurant,Eastern European Restaurant,African Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
9,Weißensee,13.463002,2,Indian Restaurant,German Restaurant,Greek Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Dim Sum Restaurant,Restaurant,African Restaurant,Middle Eastern Restaurant
15,Blankenfelde,13.388447,2,Mexican Restaurant,Greek Restaurant,German Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
34,Hakenfelde,13.196769,2,German Restaurant,Restaurant,Vietnamese Restaurant,Greek Restaurant,Italian Restaurant,Fast Food Restaurant,Argentinian Restaurant,Asian Restaurant,Turkish Restaurant,Doner Restaurant
38,Lichterfelde,13.313864,2,German Restaurant,Sushi Restaurant,Chinese Restaurant,Italian Restaurant,Asian Restaurant,Caucasian Restaurant,Fast Food Restaurant,Restaurant,Eastern European Restaurant,Greek Restaurant
42,Nikolassee,13.198145,2,German Restaurant,Italian Restaurant,Restaurant,Asian Restaurant,Indian Restaurant,Chinese Restaurant,Fast Food Restaurant,African Restaurant,Russian Restaurant,Middle Eastern Restaurant
43,Wannsee,13.158937,2,German Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Restaurant,Fast Food Restaurant,Indian Restaurant,Italian Restaurant,African Restaurant,Portuguese Restaurant
56,Plänterwald,13.478808,2,German Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Italian Restaurant,Tapas Restaurant,Dumpling Restaurant,Falafel Restaurant,Thai Restaurant,Modern European Restaurant,Restaurant
61,Adlershof,13.54755,2,German Restaurant,Greek Restaurant,Italian Restaurant,Sushi Restaurant,Korean Restaurant,Restaurant,Asian Restaurant,Middle Eastern Restaurant,Indian Restaurant,Mexican Restaurant
64,Köpenick,13.576413,2,German Restaurant,Italian Restaurant,Greek Restaurant,Middle Eastern Restaurant,Indian Restaurant,Sushi Restaurant,African Restaurant,Restaurant,Mexican Restaurant,Modern European Restaurant


**CLUSTER 4**

In [111]:
Berlin_merged.loc[Berlin_merged['Cluster Labels'] == 3, Berlin_merged.columns[[1] + list(range(5, Berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Karow,13.486276,3,Italian Restaurant,Fast Food Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
52,Buckow,14.076153,3,Italian Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
72,Kaulsdorf,13.58099,3,Italian Restaurant,Greek Restaurant,Fast Food Restaurant,African Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant


**CLUSTER 5**

In [112]:
Berlin_merged.loc[Berlin_merged['Cluster Labels'] == 4, Berlin_merged.columns[[1] + list(range(5, Berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Blankenburg,10.955199,4,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
32,Gatow,13.180134,4,German Restaurant,Italian Restaurant,Seafood Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
33,Kladow,13.140052,4,German Restaurant,Italian Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
71,Biesdorf,6.305603,4,German Restaurant,African Restaurant,Russian Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant,Persian Restaurant
88,Heiligensee,13.229579,4,German Restaurant,Italian Restaurant,Restaurant,African Restaurant,Russian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,New American Restaurant
