# The Battle of the Neighbourhoods - full

# Introduction


Munich is one of the populous cities of Germany. It is the capital of Bayern and one of the diverse cities in Germany. Munich is the home of Bayern Munich football club and also tourist hub of the beautiful alps dominant landscape. Moreover, yearly organizing city of famous Oktoberfest. It is also a global hub of business and commerce. Munich is also a city of two world famous university in the world and thousands of students from the different part of the world come here to study and research purposes.

# Business Problem

As the number of residents increases every year, finding right place to live is always very difficult here, so is the finding of a good restaurants. Its very important to know that, whenever you are trying to move to new places, how is the new neighbourhood. What type of restaurants or supermarkets are around there? 

# Data and Methodology

I will be using https://www.muenchen.de/int/en/living/postal-codes.html for the postal code data and district name of Munich city to solve the task.
To get the latitude and longitude data I will use python geopy library, where only name of the neighbourhood is required to find the latitude and longitude values for the given address.

After scraping the website, data will be stored in the data frames. In some districts, they have multiple postal codes! This is due to the largeness of all districts and to further divide them for getting a better localization. So as first step, each postal code gets it’s own entry in a new pandas data frame in order to get more detailed information about the venues being in a small radius around the centre of each postal code. 

As next step the available top 100 venues shall be fetched for each postal code. For this task, an API call to the Foursquare API is performed. The Foursquare API offers location data from all over the world for business purpose as well as for developers. I will also use Foursquare API. As Foursquare API offers location data from all over the world for business purpose as well as for developers.

In total there are 200 unique venue categories available. As next step the data has to be prepared for the clustering algorithm. In general, the K-Means algorithm only works with numerical data an we have categorical data. In order to be able to apply the K-Means algorithm, the Venue Categories first have to get one-hot encoded. Additionally, the one-hot encoded data frame gets grouped by the districts in order to have one row and therefore one cluster for each district.

# Target Audience

Main audience of the project is the residents of the Munich city, specially the new residents, who have very little knowledge about the city or new neighbourhoods.

# Result
To help the users to find the appropriate place to live or very similiar neighborhoods like they are already enjoying, KMeans clustering algorithm is used to cluster the neighborhoods of every district in Munich according to its venues.

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests
import pandas as pd
import numpy as np
import random

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

from IPython.display import Image 
from IPython.core.display import HTML 


from IPython.display import display_html
import pandas as pd
import numpy as np

from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Collecting beautifulsoup4
[?25l  Downloading https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl (115kB)
[K     |████████████████████████████████| 122kB 18.8MB/s eta 0:00:01
[?25hCollecting soupsieve>1.2; python_version >= "3.0" (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/36/69/d82d04022f02733bf9a72bc3b96332d360c0c5307096d76f6bb7489f7e57/soupsieve-2.2.1-py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.3 soupsieve-2.2.1
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.12.5  |  

Importing geographical coordinates of Munich.

In [2]:
address = 'Munich'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_munich = location.latitude
longitude_munich = location.longitude
print('The geograpical coordinate of Munich are {}, {}.'.format(latitude_munich, longitude_munich))

The geograpical coordinate of Munich are 48.1371079, 11.5753822.


To get the postal codes of Munich, I used official website. In some districts, they have multiple postal codes!

In [3]:
url = 'https://www.muenchen.de/int/en/living/postal-codes.html'
munich_data_list = pd.read_html(url)
munich_data = munich_data_list[0]
munich_data

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,"80995, 80997, 80999, 81247, 81249"
1,Altstadt-Lehel,"80331, 80333, 80335, 80336, 80469, 80538, 80539"
2,Au-Haidhausen,"81541, 81543, 81667, 81669, 81671, 81675, 81677"
3,Aubing-Lochhausen-Langwied,"81243, 81245, 81249"
4,Berg am Laim,"81671, 81673, 81735, 81825"
5,Bogenhausen,"81675, 81677, 81679, 81925, 81927, 81929"
6,Feldmoching-Hasenbergl,"80933, 80935, 80995"
7,Hadern,"80689, 81375, 81377"
8,Laim,"80686, 80687, 80689"
9,Ludwigsvorstadt-Isarvorstadt,"80335, 80336, 80337, 80469"


Spliting the the postal codes in reference to districts

In [4]:
munich_data_cleaned = pd.DataFrame(columns=['District', 'Postal Code'])
munich_data_cleaned.head()

Unnamed: 0,District,Postal Code


In [5]:
items = []
for idx, codes in enumerate(munich_data['Postal Code']):
    code_list = codes.split(',')
    district = munich_data['District'][idx]
    for element in code_list:
        element = element.replace(' ', '')
        items.append({'District': district, 'Postal Code': element})

In [6]:
munich_data_cleaned = munich_data_cleaned.append(items)
munich_data_cleaned.head()

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,80995
1,Allach-Untermenzing,80997
2,Allach-Untermenzing,80999
3,Allach-Untermenzing,81247
4,Allach-Untermenzing,81249


Using Foursquare API to fetch all latitude and longitude values for each Postal Code

In [7]:
CLIENT_ID = 'DAES5YNOYVYBHMSO1CDGO1OHREASFOJDMEMBCJF03ZBTOMK2'
CLIENT_SECRET = 'KDJGSO0DGEX4MSW1EAYIY4BXDY3ETFUAHS2BIUBCSZOYJBPE'
VERSION = '20200410'
LIMIT = 100

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: DAES5YNOYVYBHMSO1CDGO1OHREASFOJDMEMBCJF03ZBTOMK2
CLIENT_SECRET:KDJGSO0DGEX4MSW1EAYIY4BXDY3ETFUAHS2BIUBCSZOYJBPE


creating new dataframe additionally containing the latitude and longitude values of each district and postal code mapping

In [8]:
munich_data_ll = pd.DataFrame(columns=['District', 'Postal Code', 'Latitude', 'Longitude'])

items = []
for idx, district in enumerate(munich_data_cleaned['District']):
    code = munich_data_cleaned['Postal Code'][idx]
    address = district + ', ' + code

    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    items.append({'District': district, 
                  'Postal Code': code,
                  'Latitude': latitude,
                  'Longitude': longitude})

In [9]:
munich_data_ll = munich_data_ll.append(items)
munich_data_ll.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude
0,Allach-Untermenzing,80995,48.195157,11.462973
1,Allach-Untermenzing,80997,48.195157,11.462973
2,Allach-Untermenzing,80999,48.195157,11.462973
3,Allach-Untermenzing,81247,48.195157,11.462973
4,Allach-Untermenzing,81249,48.195157,11.462973


# Data Visualization
Creating a map of all districts in Munich using latitude and longitude

In [10]:
map_munich = folium.Map(location=[munich_data_ll["Latitude"].iloc[0], munich_data_ll["Longitude"].iloc[0]], zoom_start=11)

for lat, lng, district in zip(munich_data_ll['Latitude'], munich_data_ll['Longitude'], munich_data_ll['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

# Neighborhood Exploration
Exploring all neighborhood in Munich by fetching venues in the near of each district with the help of the foursquare API.

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [12]:
munich_venues = getNearbyVenues(names=munich_data_ll['District'],
                                   latitudes=munich_data_ll['Latitude'],
                                   longitudes=munich_data_ll['Longitude']
                                  )

Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Berg am Laim
Berg am Laim
Berg am Laim
Berg am Laim
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Hadern
Hadern
Hadern
Laim
Laim
Laim
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Moosach
Moosach
Moosach
Moosach
Moosach
Neuhausen-Nymphenburg
Neuhausen-Nym

Shape of the new dataframe

In [13]:
munich_venues.shape

(3213, 7)

In [14]:
munich_venues.head()

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allach-Untermenzing,48.195157,11.462973,Bäckerei Schuhmair,48.197175,11.459016,Bakery
1,Allach-Untermenzing,48.195157,11.462973,Sport Bittl,48.191447,11.466553,Sporting Goods Shop
2,Allach-Untermenzing,48.195157,11.462973,dm-drogerie markt,48.194118,11.46564,Drugstore
3,Allach-Untermenzing,48.195157,11.462973,Sicilia,48.193331,11.459387,Italian Restaurant
4,Allach-Untermenzing,48.195157,11.462973,Lidl,48.194428,11.465612,Supermarket


In [15]:
munich_venues.groupby('District').count()

Unnamed: 0_level_0,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allach-Untermenzing,45,45,45,45,45,45
Altstadt-Lehel,700,700,700,700,700,700
Au-Haidhausen,252,252,252,252,252,252
Berg am Laim,30,30,30,30,30,30
Bogenhausen,71,71,71,71,71,71
Feldmoching-Hasenbergl,6,6,6,6,6,6
Hadern,33,33,33,33,33,33
Laim,66,66,66,66,66,66
Ludwigsvorstadt-Isarvorstadt,172,172,172,172,172,172
Maxvorstadt,387,387,387,387,387,387


In [16]:
print('There are {} uniques categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 165 uniques categories.


Analyze every District

In [17]:
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

munich_onehot.insert(0, 'District', munich_data_ll['District'])
munich_onehot.head()

Unnamed: 0,District,Afghan Restaurant,American Restaurant,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The frequency frequency of occurrence of each category in the neighborhood 

In [18]:
munich_grouped = munich_onehot.groupby('District').mean().reset_index()
munich_grouped.head(10)

Unnamed: 0,District,Afghan Restaurant,American Restaurant,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,Allach-Untermenzing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altstadt-Lehel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Au-Haidhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aubing-Lochhausen-Langwied,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berg am Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Feldmoching-Hasenbergl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hadern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ludwigsvorstadt-Isarvorstadt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
munich_grouped.shape

(25, 166)

Let's print each neighborhood along with the top 5 most common venues

In [20]:
num_top_venues = 5

for hood in munich_grouped['District']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach-Untermenzing----
                 venue  freq
0               Bakery   0.2
1            Drugstore   0.2
2          Supermarket   0.2
3  Sporting Goods Shop   0.2
4   Italian Restaurant   0.2


----Altstadt-Lehel----
             venue  freq
0        Drugstore  0.29
1           Bakery  0.14
2      Supermarket  0.14
3  Automotive Shop  0.14
4       Playground  0.14


----Au-Haidhausen----
                venue  freq
0         Supermarket  0.29
1              Bakery  0.14
2  Italian Restaurant  0.14
3           Drugstore  0.14
4          Playground  0.14


----Aubing-Lochhausen-Langwied----
                 venue  freq
0   Italian Restaurant  0.33
1  Sporting Goods Shop  0.33
2            Drugstore  0.33
3                 Park  0.00
4               Museum  0.00


----Berg am Laim----
               venue  freq
0        Supermarket  0.50
1          Drugstore  0.25
2    Automotive Shop  0.25
3  Afghan Restaurant  0.00
4               Park  0.00


----Bogenhausen----
             

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's find the new dataframe and display the top 10 venues for each neighborhood.

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = munich_grouped['District']

for ind in np.arange(munich_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
1,Altstadt-Lehel,Drugstore,Sporting Goods Shop,Playground,Supermarket,Automotive Shop,Bakery,English Restaurant,Food,Fish Market,Fast Food Restaurant
2,Au-Haidhausen,Supermarket,Playground,Italian Restaurant,Automotive Shop,Drugstore,Bakery,Yoga Studio,English Restaurant,Food,Fish Market
3,Aubing-Lochhausen-Langwied,Drugstore,Italian Restaurant,Sporting Goods Shop,Electronics Store,Food,Fish Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
4,Berg am Laim,Supermarket,Drugstore,Automotive Shop,Yoga Studio,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,Farmers Market


# Clustering Neighborhoods
As we have an overview about the data and made some first explorations, let's cluster the neighborhoods in order to get an idea about the types of neighborhoods and which district seems to be similar to which other districts.

In [23]:
num_clusters = 5

X = munich_grouped.drop('District', 1)

kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(X)

In [24]:
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = munich_data_ll

munich_merged = munich_merged.join(district_venues_sorted.set_index('District'), on='District')

munich_merged.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,80995,48.195157,11.462973,2,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
1,Allach-Untermenzing,80997,48.195157,11.462973,2,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
2,Allach-Untermenzing,80999,48.195157,11.462973,2,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
3,Allach-Untermenzing,81247,48.195157,11.462973,2,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
4,Allach-Untermenzing,81249,48.195157,11.462973,2,Sporting Goods Shop,Italian Restaurant,Supermarket,Drugstore,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant


Finally, visualization of the resulting clusters

In [25]:
map_clusters = folium.Map(location=[latitude_munich, longitude_munich], zoom_start=11)

indian_red = '#CD5C5C'
blue = '#2980B9'
purple = '#5B2C6F'
gold = '#F1C40F'
green = '#239B56'
x = np.arange(num_clusters)
rainbow = [indian_red, blue, purple, gold, green]

markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['Latitude'], munich_merged['Longitude'], munich_merged['District'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examining the Clusters
Each cluster shall be examined according to its most frequent venues and the cluster names shall be given accordingly.

Lets examine the green cluster (number zero)

In [26]:
cluster0 = munich_merged.loc[munich_merged['Cluster Labels'] == 0, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster0['1st Most Common Venue'].value_counts()

Hotel                23
Plaza                 9
Café                  6
Afghan Restaurant     5
Organic Grocery       4
Cupcake Shop          3
Name: 1st Most Common Venue, dtype: int64

Lets examine the indian red cluster (number one)

In [27]:
cluster1 = munich_merged.loc[munich_merged['Cluster Labels'] == 1, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster1['1st Most Common Venue'].value_counts()

Café    2
Name: 1st Most Common Venue, dtype: int64

Lets examine the blue cluster (number two)

In [28]:
cluster2 = munich_merged.loc[munich_merged['Cluster Labels'] == 2, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster2['1st Most Common Venue'].value_counts()

Sporting Goods Shop    11
Drugstore               3
Italian Restaurant      3
Bakery                  3
Name: 1st Most Common Venue, dtype: int64

Lets examine the purple cluster (number three)

In [29]:
cluster3 = munich_merged.loc[munich_merged['Cluster Labels'] == 3, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster3['1st Most Common Venue'].value_counts()

German Restaurant    7
Irish Pub            5
Café                 5
Food Court           5
Department Store     4
Steakhouse           4
Name: 1st Most Common Venue, dtype: int64

Lets examine the yellow cluster (number four)

In [30]:
cluster4 = munich_merged.loc[munich_merged['Cluster Labels'] == 4, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster4['1st Most Common Venue'].value_counts()

Supermarket    14
Drugstore       7
Playground      4
Name: 1st Most Common Venue, dtype: int64

# Results and Discussion
As we can see from the above data visualization, the green cluster in Munich are the most common clusters. Data suggest that Munich has lots of similar districts in it. Though the green clusters are almost centre of the city compared to other clusters which are well distributed in the city. So, it will be very much easy for the users to learn more about the districts where they want to move next. Decision making would be more easier by using it.
As we have also find out more about the clusters, we now know which cluster got the most common venues. So that you can understand the neighbourhoods more easily. From the result, we can now call each cluster by their frequent venues and can predict the activities around. As result indicates, we can call the green cluster ‘Tourist cluster’, as it has hotels as most frequent venues and also plazas, we can easily estimate the activities in this neighbourhoods. You can probably find lots of advertisement in Airbnb for the short time stay.
Blue clusters are very good place to live, as you can find most frequently Restaurants and Bakery. Though I know most of the German love to live in the purple cluster where you can find German Restaurants, Irish Pubs and Café. We can easily call this cluster ‘German cluster’. Though my favourite cluster is yellow cluster as you can see supermarkets are very frequent and lots of playground are around.

# Conclusion
The purpose of this project was to compare the most popular venue categories and their respective venue types in Munich. So that a person from Munich  will be able to make an educated decision as to which part of the city they would like to move next based on their own personal preferences.