<h1>Final Project Introduction</h1>
<h2>The Battle of Restaurants in Munich</h2>

## Table of contents
* [Introduction](#introduction)
* [Business Problem and Audience](#business)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

<h3>Introduction</h3><a name="introduction"></a>

Lifelong learning matters to me. Because I could no longer pursue some of my usual hobbies during the COVID-19 pandemic, I used the time to familiarize myself with machine learning. I have always been interested in data, and I am convinced that it is becoming more and more important for business decisions. To that end, I took various online courses, watched instructional videos, and worked through exercises. To go deeper, I signed up for the "IBM Data Science Certificate". In this course I became familiar with Python programming in general, with various libraries (numpy, pandas, ...), and with the IBM Cloud itself.

The "Capstone Project" brings together what I learned over the past weeks and months. To complete this final task, I created a Jupyter notebook with the associated code.
I chose Munich for the analysis because I consider it well suited to this kind of study and because it is not far from where I live.

<h3>Business Problem and Audience</h3><a name="business"></a>

Munich is a well-known city that attracts millions of visitors every year. Probably its most popular event is the Oktoberfest, which is visited by about 6 million people from all over the world. As a tourist, I know the problem of looking for a restaurant in a foreign city: the first challenge is deciding which part of town to go to at all. And of course it should not be just any restaurant, but one that matches your own preferences.

With this exercise I would like to examine the districts of Munich and divide them into categories. Where is there a high density of German food, where is there a lot of Mediterranean food, and where is there fast food if it has to be quick?

The target audience of this exercise is tourists from near and far.

<h3>Data</h3><a name="data"></a>

The following data was used for this project.

- Wikipedia Data - Munich Neighborhoods https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens
- Geographical Data (Longitude and Latitude)
- Foursquare Data

<h4>Wikipedia Data - Munich Neighborhoods</h4>

The Munich neighborhoods data was extracted from Wikipedia with Beautiful Soup.
This data includes the following:
- Nr.
- Neighbourhood
- Area
- Inhabitants
- Density (Inhabitants/km²)
- Foreigners

For this project only the neighbourhood names were used; the other attribute columns (area, inhabitants, density, foreigners) were dropped.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url_path = 'https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens'
html_text = requests.get(url_path).text
soup = BeautifulSoup(html_text, 'html.parser')  # name the parser explicitly to avoid a bs4 warning
wiki_tables = soup.find_all('table', {'class': 'wikitable sortable'})
first_table = wiki_tables[0].find_all("tr")
# Extracting the text from the table cells
table_list = []

for tr in first_table:
    td = tr.find_all('td')
    row = [ele.text.strip() for ele in td]
    table_list.append(row)
    
df = pd.DataFrame(table_list, columns=['Nr.', 'Stadtbezirk', 'Fläche(km²)', 'Einwohner', 'Dichte(Einw./km²)', 'Ausländer(%)'])
df = df.dropna(how='all').reset_index(drop=True)
df = df.drop(['Fläche(km²)', 'Einwohner', 'Dichte(Einw./km²)', 'Ausländer(%)'], axis = 1)
df.head()

Unnamed: 0,Nr.,Stadtbezirk
0,1,Altstadt-Lehel
1,2,Ludwigsvorstadt-Isarvorstadt
2,3,Maxvorstadt
3,4,Schwabing-West
4,5,Au-Haidhausen


The above dataframe shows the data extracted from Wikipedia. These were further enriched with longitude and latitude data and combined with the Foursquare data.

In [2]:
df.shape

(26, 2)

This resulted in a dataframe with the 25 districts of Munich. The last row is Munich itself, which is why there are 26 entries.
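If the city-wide row should be handled separately from the 25 districts, one option is to split the frame by label. The following is a minimal sketch on toy data; the frame `toy` and the names `CITY_ROW`, `districts`, and `city` are illustrative assumptions and do not appear in the notebook.

```python
import pandas as pd

# Toy frame with the same layout as the scraped table:
# district rows plus one city-wide summary row at the end.
toy = pd.DataFrame({
    "Nr.": ["1", "2", None],
    "Stadtbezirk": [
        "Altstadt-Lehel",
        "Ludwigsvorstadt-Isarvorstadt",
        "Landeshauptstadt München",
    ],
})

CITY_ROW = "Landeshauptstadt München"  # label of the summary row (assumed constant name)

# Boolean mask that is True only for the city-wide row
is_city = toy["Stadtbezirk"] == CITY_ROW

districts = toy[~is_city].reset_index(drop=True)  # only the districts
city = toy[is_city].reset_index(drop=True)        # only the summary row
```

Keeping the two frames apart makes it explicit whether a later step works on districts only or on the city as a whole.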

<h4>Geographical Data</h4>

To add the latitude and longitude of each district to the data, I used geopy.

In [3]:
import numpy as np
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

# lists that collect one coordinate per district
longitude = []
latitude = []

# Nominatim requires a non-empty user_agent that identifies the application
geolocator = Nominatim(user_agent="MunichGeoData")

# function to find the coordinates of a given district
def findGeocode(city, retries=3):
    # retry after a timeout, but only a limited number of times
    for _ in range(retries):
        try:
            return geolocator.geocode(city)
        except GeocoderTimedOut:
            continue
    return None

# each value of the Stadtbezirk column is geocoded exactly once
for i in df["Stadtbezirk"]:
    loc = findGeocode(i)

    if loc is not None:
        # store the returned coordinates in the two lists
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)
    else:
        # no coordinates found: insert NaN to mark the missing value
        latitude.append(np.nan)
        longitude.append(np.nan)

In [4]:
# now add this column to dataframe
df["Longitude"] = longitude
df["Latitude"] = latitude
  
df

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude
0,1.0,Altstadt-Lehel,11.574582,48.137828
1,2.0,Ludwigsvorstadt-Isarvorstadt,11.573366,48.13034
2,3.0,Maxvorstadt,11.562418,48.151092
3,4.0,Schwabing-West,11.569873,48.168271
4,5.0,Au-Haidhausen,11.598334,48.130274
5,6.0,Sendling,11.539083,48.118012
6,7.0,Sendling-Westpark,11.519333,48.118031
7,8.0,Schwanthalerhöhe,11.541057,48.133782
8,9.0,Neuhausen-Nymphenburg,11.531517,48.154222
9,10.0,Moosach,11.875678,48.031726


In [5]:
df_munich = df[df['Stadtbezirk'] == 'Landeshauptstadt München']
type(df_munich)

longitude = df_munich['Longitude'].values[0]
latitude =  df_munich['Latitude'].values[0]

df_munich.head()

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude
25,,Landeshauptstadt München,11.596432,48.183699


In [6]:
print('The Longitude and Latitude of Munich is', longitude, 'and', latitude)

The Longitude and Latitude of Munich is 11.5964316 and 48.1836994


The longitude and latitude of Moosach were not resolved correctly, so I corrected them manually.

In [7]:
df.loc[df.Stadtbezirk == 'Moosach', 'Longitude'] = 11.5057
df.loc[df.Stadtbezirk == 'Moosach', 'Latitude'] = 48.1799
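
Misplaced coordinates like those of Moosach can also be detected automatically. The sketch below flags rows that fall outside a rough bounding box around Munich and shows how a query could be made less ambiguous by appending the city name. The bounding-box values are approximate assumptions, and `qualified_query` and `outside_munich` are hypothetical helpers that are not part of the notebook.

```python
import pandas as pd

# Approximate bounding box around the city of Munich
# (assumed values, for illustration only).
LAT_MIN, LAT_MAX = 48.06, 48.25
LON_MIN, LON_MAX = 11.36, 11.73

def qualified_query(district, city="München"):
    """Build a less ambiguous geocoder query such as 'Moosach, München'."""
    return f"{district}, {city}"

def outside_munich(frame):
    """Return the rows whose coordinates fall outside the assumed bounding box."""
    inside = (
        frame["Latitude"].between(LAT_MIN, LAT_MAX)
        & frame["Longitude"].between(LON_MIN, LON_MAX)
    )
    return frame[~inside]

# Two rows taken from the geocoding output above
coords = pd.DataFrame({
    "Stadtbezirk": ["Altstadt-Lehel", "Moosach"],
    "Longitude": [11.574582, 11.875678],
    "Latitude": [48.137828, 48.031726],
})

suspicious = outside_munich(coords)
```

With the values from the table above, only Moosach is flagged, which matches the row that was corrected manually.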

In [8]:
import folium # map rendering library
# create map of Munich using latitude and longitude values
map_munich = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Stadtbezirk']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

<h4>Foursquare Data</h4>

Additionally, Foursquare data was used to identify different venues from Munich and to assign them to the districts of Munich. 

In [9]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [10]:
neighborhood_latitude = df_munich.loc[25, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_munich.loc[25, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_munich.loc[25, 'Stadtbezirk'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Landeshauptstadt München are 48.1836994, 11.5964316.


In [11]:
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=48.1836994,11.5964316&radius=500&limit=100'
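
Because the request URL contains the API credentials, it is safer not to display it unmodified. The following sketch builds the same kind of URL with `urllib.parse.urlencode` and masks the credentials before printing. The helper names `build_explore_url` and `redact_credentials` are assumptions made for this illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Assemble the explore URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode takes care of escaping special characters such as the comma
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

def redact_credentials(url):
    """Return a copy of the URL with the credential values masked."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key in ("client_id", "client_secret") else value)
        for key, value in parse_qsl(parts.query)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Placeholder credentials, not real ones
url = build_explore_url(48.1836994, 11.5964316, "my-id", "my-secret")
safe_url = redact_credentials(url)
print(safe_url)
```

The unredacted `url` would still be passed to `requests.get`; only the printed version is masked.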

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6123f87b735e082b36366799'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Alte Heide - Hirschau',
  'headerFullLocation': 'Alte Heide - Hirschau, Munich',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 19,
  'suggestedBounds': {'ne': {'lat': 48.188199404500004,
    'lng': 11.603168216906269},
   'sw': {'lat': 48.1791993955, 'lng': 11.589694983093732}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '53b6e112498eef48c838b15f',
       'name': "Grillin' me softly",
       'location': {'address': 'Täglich wechselnder Standort',
        'crossStreet': 'Do: Max-Diamand-Str. 7',
        'lat': 48.182678687078166,
        'lng': 11.5

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [14]:
import json # library to handle JSON files

from pandas import json_normalize  # the pandas.io.json import path is deprecated

In [15]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Grillin' me softly,Food Truck,48.182679,11.595554
1,Dolzer Masskonfektionäre,Clothing Store,48.183407,11.592058
2,Leonardi,Cafeteria,48.181568,11.596857
3,Bite Delite,Café,48.182051,11.597206
4,hasia - Asian Food & Drink,Asian Restaurant,48.182621,11.594412


In [16]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

19 venues were returned by Foursquare.


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Stadtbezirk', 
                  'Stadtbezirk Latitude', 
                  'Stadtbezirk Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
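
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an error for a venue without any category. The sketch below separates the flattening step into a helper that tolerates missing fields, so it can be tried on sample data without calling the API. The helper `items_to_rows` and the sample items are hypothetical; the column names follow the function above.

```python
import pandas as pd

def items_to_rows(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into one dict per venue."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append({
            "Stadtbezirk": name,
            "Stadtbezirk Latitude": lat,
            "Stadtbezirk Longitude": lng,
            "Venue": venue.get("name"),
            "Venue Latitude": location.get("lat"),
            "Venue Longitude": location.get("lng"),
            # use None when the venue has no category at all
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return rows

# Made-up items that mimic the structure of the API response
sample_items = [
    {"venue": {"name": "Venue A",
               "location": {"lat": 48.1, "lng": 11.5},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "Venue B",
               "location": {"lat": 48.2, "lng": 11.6},
               "categories": []}},
]

venues_df = pd.DataFrame(items_to_rows("Example district", 48.15, 11.55, sample_items))
```

A venue without a category then shows up with a missing value instead of stopping the whole loop.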

In [18]:
munich_venues = getNearbyVenues(
                                    names=df['Stadtbezirk'],
                                    latitudes=df['Latitude'],
                                    longitudes=df['Longitude']
                                  )

Altstadt-Lehel
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Schwabing-West
Au-Haidhausen
Sendling
Sendling-Westpark
Schwanthalerhöhe
Neuhausen-Nymphenburg
Moosach
Milbertshofen-Am Hart
Schwabing-Freimann
Bogenhausen
Berg am Laim
Trudering-Riem
Ramersdorf-Perlach
Obergiesing-Fasangarten
Untergiesing-Harlaching
Thalkirchen-Obersendling-Forstenried-Fürstenried-Solln
Hadern
Pasing-Obermenzing
Aubing-Lochhausen-Langwied
Allach-Untermenzing
Feldmoching-Hasenbergl
Laim
Landeshauptstadt München


In [19]:
print(munich_venues.shape)
munich_venues.head()

munich_venues = munich_venues[munich_venues['Venue Category'].str.contains('Restaurant')]

(669, 7)
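
The filter above keeps only venues whose category contains the word "Restaurant". As a sketch of what that implies, the toy example below applies the same filter and lists the categories that get excluded. The frame `toy_venues` is made up for illustration; note that food venues such as cafés or food trucks do not match the pattern.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Venue": ["A", "B", "C", "D"],
    "Venue Category": ["Italian Restaurant", "Food Truck", "Café", None],
})

# na=False treats a missing category as "no match"
# instead of propagating NaN into the boolean mask.
is_restaurant = toy_venues["Venue Category"].str.contains("Restaurant", na=False)

restaurants = toy_venues[is_restaurant]

# Categories that the filter drops, with their counts
excluded = toy_venues.loc[~is_restaurant, "Venue Category"].value_counts()
print(excluded)
```

Inspecting the excluded categories is a quick way to check whether the filter matches the intended definition of a restaurant.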


In [20]:
munich_venues.groupby('Stadtbezirk').count()

Stadtbezirk,Stadtbezirk Latitude,Stadtbezirk Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allach-Untermenzing,1,1,1,1,1,1
Altstadt-Lehel,23,23,23,23,23,23
Au-Haidhausen,43,43,43,43,43,43
Bogenhausen,3,3,3,3,3,3
Feldmoching-Hasenbergl,1,1,1,1,1,1
Hadern,2,2,2,2,2,2
Landeshauptstadt München,5,5,5,5,5,5
Ludwigsvorstadt-Isarvorstadt,28,28,28,28,28,28
Maxvorstadt,13,13,13,13,13,13
Milbertshofen-Am Hart,4,4,4,4,4,4


In [21]:
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 35 unique categories.


In [22]:
print (nearby_venues['categories'].value_counts()[0:10])

Supermarket           2
Italian Restaurant    2
Bus Stop              2
Food Truck            1
Clothing Store        1
Cafeteria             1
Café                  1
Asian Restaurant      1
Hotel                 1
Drugstore             1
Name: categories, dtype: int64


In [23]:
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Grillin' me softly,Food Truck,48.182679,11.595554
1,Dolzer Masskonfektionäre,Clothing Store,48.183407,11.592058
2,Leonardi,Cafeteria,48.181568,11.596857
3,Bite Delite,Café,48.182051,11.597206
4,hasia - Asian Food & Drink,Asian Restaurant,48.182621,11.594412
5,Suite Novotel Parkstadt Schwabing,Hotel,48.179846,11.59355
6,dm-drogerie markt,Drugstore,48.182808,11.594108
7,REWE,Supermarket,48.182975,11.593756
8,Coffee Fellows,Coffee Shop,48.183399,11.594659
9,Parkstadt-Center,Shopping Mall,48.182576,11.594717


In [24]:
munich_venues

Unnamed: 0,Stadtbezirk,Stadtbezirk Latitude,Stadtbezirk Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,Altstadt-Lehel,48.137828,11.574582,Augustiner Klosterwirt,48.138649,11.572527,German Restaurant
13,Altstadt-Lehel,48.137828,11.574582,Andechser am Dom,48.138302,11.573778,Bavarian Restaurant
20,Altstadt-Lehel,48.137828,11.574582,Restaurant Dallmayr,48.138489,11.576791,German Restaurant
24,Altstadt-Lehel,48.137828,11.574582,Nürnberger Bratwurst Glöckl am Dom,48.138191,11.574165,Bavarian Restaurant
38,Altstadt-Lehel,48.137828,11.574582,Leger am Dom,48.138262,11.572932,Restaurant
...,...,...,...,...,...,...,...
654,Landeshauptstadt München,48.183699,11.596432,hasia - Asian Food & Drink,48.182621,11.594412,Asian Restaurant
660,Landeshauptstadt München,48.183699,11.596432,mammaminuti,48.183397,11.594629,Italian Restaurant
661,Landeshauptstadt München,48.183699,11.596432,Gasthaus Domagk,48.183590,11.598015,Restaurant
664,Landeshauptstadt München,48.183699,11.596432,Vitello,48.183173,11.594046,Modern European Restaurant


Now we have a dataframe munich_venues that contains all Foursquare data and the corresponding city district. Thus, the data collection and preparation phase is completed and we can proceed with the analysis.

<h3>Methodology</h3><a name="methodology"></a>

In the analysis, we will focus on analyzing and categorizing the prepared data, and then on dividing the individual neighborhoods into categories.

First, we use one-hot encoding to convert the different restaurant types into columns.
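
As a small illustration of this step, the sketch below one-hot encodes a toy table and averages the indicator columns per district, which yields the share of each restaurant type. The districts "A" and "B" and their venues are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "Stadtbezirk": ["A", "A", "A", "B"],
    "Venue Category": [
        "Italian Restaurant",
        "Italian Restaurant",
        "German Restaurant",
        "Greek Restaurant",
    ],
})

# One indicator column per restaurant type (1.0 if the venue has that type)
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Stadtbezirk", toy["Stadtbezirk"])

# The mean of the indicators per district is the share of each type
profile = onehot.groupby("Stadtbezirk").mean()
print(profile)
```

Each row of `profile` sums to 1, so districts with very different numbers of restaurants become comparable.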

In the next step, we use k-means clustering to divide the data into different categories. The result is presented in a map and in a table form.
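
To make the clustering step more concrete, here is a minimal sketch of the k-means idea (Lloyd's algorithm) written in plain NumPy. It is not the scikit-learn implementation used in the analysis, and the function name `kmeans` and the toy frequency profiles are assumptions for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every center, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy frequency profiles with columns (Italian, German, Greek)
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

labels, centers = kmeans(X, k=2)
```

Districts with similar restaurant profiles end up with the same label; the label numbers themselves carry no meaning.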

<h3>Analysis</h3><a name="analysis"></a>

In [25]:
# one hot encoding
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
munich_onehot['Stadtbezirk'] = munich_venues['Stadtbezirk'] 

# move neighborhood column to the first column
fixed_columns = [munich_onehot.columns[-1]] + list(munich_onehot.columns[:-1])
munich_onehot = munich_onehot[fixed_columns]

munich_onehot.head()

Unnamed: 0,Stadtbezirk,Afghan Restaurant,American Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,...,Ramen Restaurant,Restaurant,Seafood Restaurant,Spanish Restaurant,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
2,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13,Altstadt-Lehel,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
20,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
24,Altstadt-Lehel,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
38,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [26]:
munich_onehot.shape

(202, 36)

In [27]:
munich_grouped = munich_onehot.groupby('Stadtbezirk').mean().reset_index()
munich_grouped

Unnamed: 0,Stadtbezirk,Afghan Restaurant,American Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,...,Ramen Restaurant,Restaurant,Seafood Restaurant,Spanish Restaurant,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Allach-Untermenzing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altstadt-Lehel,0.0,0.0,0.0,0.0,0.304348,0.0,0.0,0.0,0.043478,...,0.0,0.217391,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0
2,Au-Haidhausen,0.0,0.0,0.0,0.0,0.046512,0.023256,0.0,0.023256,0.0,...,0.0,0.023256,0.023256,0.046512,0.023256,0.023256,0.046512,0.046512,0.046512,0.0
3,Bogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Feldmoching-Hasenbergl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Hadern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
6,Landeshauptstadt München,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ludwigsvorstadt-Isarvorstadt,0.071429,0.035714,0.107143,0.0,0.071429,0.0,0.035714,0.0,0.0,...,0.0,0.071429,0.0,0.0,0.035714,0.035714,0.0,0.0,0.071429,0.107143
8,Maxvorstadt,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,...,0.076923,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.153846
9,Milbertshofen-Am Hart,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0


In [28]:
num_top_venues = 5

for hood in munich_grouped['Stadtbezirk']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['Stadtbezirk'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach-Untermenzing----
                 venue  freq
0   Italian Restaurant   1.0
1           Restaurant   0.0
2  Japanese Restaurant   0.0
3    Jewish Restaurant   0.0
4     Kebab Restaurant   0.0


----Altstadt-Lehel----
                           venue  freq
0            Bavarian Restaurant  0.30
1                     Restaurant  0.22
2              German Restaurant  0.17
3             Italian Restaurant  0.13
4  Vegetarian / Vegan Restaurant  0.04


----Au-Haidhausen----
                 venue  freq
0   Italian Restaurant  0.21
1    German Restaurant  0.14
2    Indian Restaurant  0.12
3    French Restaurant  0.07
4  Bavarian Restaurant  0.05


----Bogenhausen----
                 venue  freq
0   Italian Restaurant  0.67
1     Greek Restaurant  0.33
2   Israeli Restaurant  0.00
3  Japanese Restaurant  0.00
4    Jewish Restaurant  0.00


----Feldmoching-Hasenbergl----
                 venue  freq
0     Greek Restaurant   1.0
1    Afghan Restaurant   0.0
2   Israeli Restaurant   

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
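
For districts with only one or two restaurants, most categories have a frequency of 0, so a fixed-length top-N list gets filled with categories that do not occur there at all. A possible variant, sketched below, returns only categories with a non-zero frequency. The function `top_nonzero_categories` and the toy rows are hypothetical.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues categories that actually occur in the district."""
    freqs = row.iloc[1:].astype(float)  # skip the district name in the first position
    present = freqs[freqs > 0].sort_values(ascending=False)
    return list(present.index[:num_top_venues])

single = pd.Series({
    "Stadtbezirk": "Example 1",
    "Greek Restaurant": 0.0,
    "Italian Restaurant": 1.0,
    "Sushi Restaurant": 0.0,
})

mixed = pd.Series({
    "Stadtbezirk": "Example 2",
    "German Restaurant": 0.3,
    "Greek Restaurant": 0.2,
    "Italian Restaurant": 0.5,
})

print(top_nonzero_categories(single, 10))
print(top_nonzero_categories(mixed, 2))
```

The resulting lists have different lengths per district, so they fit a long-format table better than the fixed "1st ... 10th Most Common Venue" columns.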

In [30]:
import numpy as np

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Stadtbezirk']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Stadtbezirk_venues_sorted = pd.DataFrame(columns=columns)
Stadtbezirk_venues_sorted['Stadtbezirk'] = munich_grouped['Stadtbezirk']

for ind in np.arange(munich_grouped.shape[0]):
    Stadtbezirk_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

Stadtbezirk_venues_sorted.head()

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,Italian Restaurant,Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant,Israeli Restaurant
1,Altstadt-Lehel,Bavarian Restaurant,Restaurant,German Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Falafel Restaurant,Seafood Restaurant,Jewish Restaurant,Kebab Restaurant
2,Au-Haidhausen,Italian Restaurant,German Restaurant,Indian Restaurant,French Restaurant,Bavarian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Turkish Restaurant,Seafood Restaurant
3,Bogenhausen,Italian Restaurant,Greek Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant
4,Feldmoching-Hasenbergl,Greek Restaurant,Afghan Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Restaurant


In [32]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [33]:
# set number of clusters
kclusters = 5

munich_grouped_clustering = munich_grouped.drop('Stadtbezirk', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([3, 0, 0, 3, 4, 0, 0, 0, 0, 4], dtype=int32)

In [34]:
df.head()

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude
0,1,Altstadt-Lehel,11.574582,48.137828
1,2,Ludwigsvorstadt-Isarvorstadt,11.573366,48.13034
2,3,Maxvorstadt,11.562418,48.151092
3,4,Schwabing-West,11.569873,48.168271
4,5,Au-Haidhausen,11.598334,48.130274


In [35]:
# add clustering labels
Stadtbezirk_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = df

# join the sorted venue table onto df to add the cluster label and top venues for each district
munich_merged = munich_merged.join(Stadtbezirk_venues_sorted.set_index('Stadtbezirk'), on='Stadtbezirk')

munich_merged.head() # check the last columns!

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Altstadt-Lehel,11.574582,48.137828,0.0,Bavarian Restaurant,Restaurant,German Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Falafel Restaurant,Seafood Restaurant,Jewish Restaurant,Kebab Restaurant
1,2,Ludwigsvorstadt-Isarvorstadt,11.573366,48.13034,0.0,Italian Restaurant,Vietnamese Restaurant,Asian Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Afghan Restaurant,Bavarian Restaurant,Indian Restaurant,Fast Food Restaurant
2,3,Maxvorstadt,11.562418,48.151092,0.0,Vietnamese Restaurant,German Restaurant,Falafel Restaurant,Sushi Restaurant,Restaurant,Ramen Restaurant,Israeli Restaurant,Grilled Meat Restaurant,Indian Restaurant,Dim Sum Restaurant
3,4,Schwabing-West,11.569873,48.168271,0.0,Vietnamese Restaurant,Italian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Austrian Restaurant,Turkish Restaurant,Tapas Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant
4,5,Au-Haidhausen,11.598334,48.130274,0.0,Italian Restaurant,German Restaurant,Indian Restaurant,French Restaurant,Bavarian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Turkish Restaurant,Seafood Restaurant


In [36]:
munich_merged['Cluster Labels'].isnull().values.any()
munich_merged['Cluster Labels'].isnull().sum()
munich_merged.shape

(26, 15)

In [37]:
munich_merged = munich_merged.dropna()

# converting 'Cluster Labels' from float to int
munich_merged['Cluster Labels'] = munich_merged['Cluster Labels'].astype(int)
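
The float labels come from the join: a district without any restaurant in the Foursquare results has no cluster label, and the resulting NaN forces the column to float. The toy example below sketches this effect and the clean-up; the frames `districts` and `labels` are made up for illustration.

```python
import pandas as pd

districts = pd.DataFrame({"Stadtbezirk": ["A", "B", "C"]})
labels = pd.DataFrame({"Stadtbezirk": ["A", "C"], "Cluster Labels": [0, 3]})

# Left join: district "B" has no match, so its label becomes NaN
# and the whole column is stored as float64.
merged = districts.join(labels.set_index("Stadtbezirk"), on="Stadtbezirk")

# Districts that would be dropped from the map
missing = merged[merged["Cluster Labels"].isna()]

# Drop unlabeled districts, then convert the labels back to integers
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
```

Listing `missing` before dropping rows makes it visible which districts disappear from the final clusters.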

In [38]:
munich_merged.head()

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Altstadt-Lehel,11.574582,48.137828,0,Bavarian Restaurant,Restaurant,German Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Falafel Restaurant,Seafood Restaurant,Jewish Restaurant,Kebab Restaurant
1,2,Ludwigsvorstadt-Isarvorstadt,11.573366,48.13034,0,Italian Restaurant,Vietnamese Restaurant,Asian Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Afghan Restaurant,Bavarian Restaurant,Indian Restaurant,Fast Food Restaurant
2,3,Maxvorstadt,11.562418,48.151092,0,Vietnamese Restaurant,German Restaurant,Falafel Restaurant,Sushi Restaurant,Restaurant,Ramen Restaurant,Israeli Restaurant,Grilled Meat Restaurant,Indian Restaurant,Dim Sum Restaurant
3,4,Schwabing-West,11.569873,48.168271,0,Vietnamese Restaurant,Italian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Austrian Restaurant,Turkish Restaurant,Tapas Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant
4,5,Au-Haidhausen,11.598334,48.130274,0,Italian Restaurant,German Restaurant,Indian Restaurant,French Restaurant,Bavarian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Turkish Restaurant,Seafood Restaurant


In [39]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['Latitude'], munich_merged['Longitude'], munich_merged['Stadtbezirk'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Cluster 1 - The All-Round Food Cluster**

In [41]:
munich_merged.loc[munich_merged['Cluster Labels'] == 0, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt-Lehel,Bavarian Restaurant,Restaurant,German Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Falafel Restaurant,Seafood Restaurant,Jewish Restaurant,Kebab Restaurant
1,Ludwigsvorstadt-Isarvorstadt,Italian Restaurant,Vietnamese Restaurant,Asian Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Afghan Restaurant,Bavarian Restaurant,Indian Restaurant,Fast Food Restaurant
2,Maxvorstadt,Vietnamese Restaurant,German Restaurant,Falafel Restaurant,Sushi Restaurant,Restaurant,Ramen Restaurant,Israeli Restaurant,Grilled Meat Restaurant,Indian Restaurant,Dim Sum Restaurant
3,Schwabing-West,Vietnamese Restaurant,Italian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Austrian Restaurant,Turkish Restaurant,Tapas Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant
4,Au-Haidhausen,Italian Restaurant,German Restaurant,Indian Restaurant,French Restaurant,Bavarian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Turkish Restaurant,Seafood Restaurant
5,Sendling,German Restaurant,Vietnamese Restaurant,Doner Restaurant,Turkish Restaurant,Spanish Restaurant,Restaurant,Indian Restaurant,Austrian Restaurant,Dim Sum Restaurant,Dumpling Restaurant
7,Schwanthalerhöhe,Italian Restaurant,Asian Restaurant,German Restaurant,Sushi Restaurant,Doner Restaurant,Middle Eastern Restaurant,French Restaurant,Vietnamese Restaurant,Thai Restaurant,Chinese Restaurant
8,Neuhausen-Nymphenburg,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Vietnamese Restaurant,Bavarian Restaurant,Greek Restaurant,Turkish Restaurant,Spanish Restaurant,Restaurant,Ramen Restaurant
9,Moosach,Fast Food Restaurant,American Restaurant,German Restaurant,Italian Restaurant,Afghan Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant
17,Untergiesing-Harlaching,German Restaurant,Vietnamese Restaurant,Italian Restaurant,Kebab Restaurant,Greek Restaurant,Restaurant,Sushi Restaurant,Spanish Restaurant,Seafood Restaurant,Tapas Restaurant


**Cluster 2 - The Mediterranean Food Cluster with Fast Food alternatives**

In [42]:
munich_merged.loc[munich_merged['Cluster Labels'] == 1, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Sendling-Westpark,Italian Restaurant,Fast Food Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant
11,Schwabing-Freimann,Fast Food Restaurant,Greek Restaurant,Afghan Restaurant,Italian Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Restaurant


**Cluster 3 - The German Focused Food Cluster with foreign alternatives**

In [43]:
munich_merged.loc[munich_merged['Cluster Labels'] == 2, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Ramersdorf-Perlach,German Restaurant,Italian Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant
16,Obergiesing-Fasangarten,German Restaurant,Afghan Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Restaurant


**Cluster 4 - The Italian Dominated Exotic Food Cluster**

In [44]:
munich_merged.loc[munich_merged['Cluster Labels'] == 3, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Bogenhausen,Italian Restaurant,Greek Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant
20,Pasing-Obermenzing,Italian Restaurant,Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant,Israeli Restaurant
22,Allach-Untermenzing,Italian Restaurant,Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Afghan Restaurant,Israeli Restaurant


**Cluster 5 - Greek Dominated Food Cluster**

In [45]:
munich_merged.loc[munich_merged['Cluster Labels'] == 4, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]

Unnamed: 0,Stadtbezirk,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Milbertshofen-Am Hart,Greek Restaurant,Thai Restaurant,German Restaurant,Afghan Restaurant,Ramen Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Restaurant
23,Feldmoching-Hasenbergl,Greek Restaurant,Afghan Restaurant,Israeli Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Ramen Restaurant,Restaurant


These five clusters categorize the range of restaurants across Munich's districts.

<h3>Results and Discussion</h3><a name="results"></a>

I am always surprised by how freely available data can be used to derive new insights.
The tools are available for free and only need to be applied. I am convinced that these methods and tools will become much more important in the coming years, and that many more use cases can be implemented with this knowledge.

<h3>Conclusion</h3><a name="conclusion"></a>

The goal of dividing the individual districts into different "food clusters" was achieved. Tourists can thus choose the districts that best suit their preferences. This is a good example of a data science use case that can be implemented with freely available technologies.