<h1>Final Project Introduction</h1>
<h2>The Battle of Restaurants in Munich</h2>

<h3>Introduction</h3>

Lifelong learning matters to me. Because I could no longer pursue some of my usual hobbies during the COVID-19 pandemic, I used the time to familiarize myself with machine learning. I have always been interested in data, and I am convinced that it is becoming more and more important for making business decisions. To that end, I took various online courses, watched instructional videos, and worked through exercises. To get even more involved in the topic, I signed up for the "IBM Data Science Certificate". In this course I became familiar with Python programming in general, with various libraries (numpy, pandas...), and with the IBM Cloud itself.

The "Capstone Project" brings together what I learned over the past weeks and months. To complete this last task, I created a Jupyter notebook with the associated code.
For the analysis I chose Munich because, from my point of view, it is well suited to such an analysis and it is not far from where I live.

<h3>Business Problem / Audience</h3>

Munich is a well-known city that attracts millions of visitors every year. Probably its most popular event is the Oktoberfest, which is visited by about 6 million people from all over the world. As a tourist, I know the problem: when you are looking for a restaurant in a foreign city, you first face the challenge of deciding which part of town to go to at all. And of course it should not be just any restaurant, but one that matches your own preferences.

With this exercise I would like to examine the districts of Munich and divide them into categories. Where do you find a high density of German food, where do you find a lot of Mediterranean food, and, if it has to be quick, where do you find fast food?

The audience for this exercise is tourists from near and far.

<h3>Data</h3>

The following data was used for this project.

- Wikipedia Data - Munich Neighborhoods https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens
- Geographical Data (Longitude and Latitude)
- Foursquare Data

<h4>Wikipedia Data - Munich Neighborhoods</h4>

The Munich neighborhoods data was extracted from Wikipedia with Beautiful Soup.
This data includes the following:
- Nr.
- Neighbourhood
- Area
- Inhabitants
- Density (Inhabitants/km²)
- Foreigners

For this project only the neighbourhood names were used; the rest of the data was dropped.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url_path = 'https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens'
html_text = requests.get(url_path).text
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html_text, 'html.parser')
wiki_tables = soup.find_all('table', {'class': 'wikitable sortable'})
first_table = wiki_tables[0].find_all("tr")
# Extracting the text from the table cells
table_list = []

for tr in first_table:
    td = tr.find_all('td')
    row = [ele.text.strip() for ele in td]
    table_list.append(row)
    
df = pd.DataFrame(table_list, columns=['Nr.', 'Stadtbezirk', 'Fläche(km²)', 'Einwohner', 'Dichte(Einw./km²)', 'Ausländer(%)'])
# the header row has no <td> cells and becomes an all-empty row; drop it
df = df.dropna(how='all').reset_index(drop=True)
df = df.drop(['Fläche(km²)', 'Einwohner', 'Dichte(Einw./km²)', 'Ausländer(%)'], axis = 1)
df.head()

Unnamed: 0,Nr.,Stadtbezirk
0,1,Altstadt-Lehel
1,2,Ludwigsvorstadt-Isarvorstadt
2,3,Maxvorstadt
3,4,Schwabing-West
4,5,Au-Haidhausen


The dataframe above shows the data extracted from Wikipedia. It was then enriched with latitude and longitude data and combined with the Foursquare data.
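
To illustrate why the `dropna(how='all')` step is needed, here is a minimal offline sketch. It swaps Beautiful Soup for the standard library's `html.parser` so that it runs without a network connection, and it uses a made-up two-row table. The names `TableRowParser` and `SAMPLE_HTML` are illustrative and not part of the notebook. The header row contains only `<th>` cells, so collecting `<td>` text yields an empty row that has to be discarded.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False
        self._cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_td = True
            self._cell = []

    def handle_data(self, data):
        # only text inside <td> is kept; <th> text is ignored
        if self._in_td:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._row.append("".join(self._cell).strip())
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# made-up sample that mimics the structure of the Wikipedia table
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Nr.</th><th>Stadtbezirk</th></tr>
  <tr><td>1</td><td>Altstadt-Lehel</td></tr>
  <tr><td>2</td><td>Ludwigsvorstadt-Isarvorstadt</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
all_rows = parser.rows

# the header row produced an empty list; keep only rows with data
district_rows = [row for row in all_rows if row]
print(all_rows[0])
print(district_rows)
```

In the notebook the same effect comes from building the dataframe first and then dropping the all-empty row.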

In [2]:
df.shape

(26, 2)

This resulted in a dataframe with 26 rows: the 25 districts of Munich plus one row for the city as a whole (Landeshauptstadt München).

<h4>Geographical Data</h4>

To add coordinates (latitude and longitude) to each district, I used geopy.

In [3]:
import numpy as np
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

# lists that collect the latitude and longitude of each district
longitude = []
latitude = []

# Nominatim requires a non-empty user_agent (your app name)
geolocator = Nominatim(user_agent="MunichGeoData")

# function to find the coordinates of a given place name
def findGeocode(city):
    # try again when the geocoder times out
    try:
        return geolocator.geocode(city)
    except GeocoderTimedOut:
        return findGeocode(city)

# geocode each value of the Stadtbezirk column
for i in df["Stadtbezirk"]:
    # look each name up only once and reuse the result
    loc = findGeocode(i)

    if loc is not None:
        # store the coordinates in the two separate lists
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)
    # if no coordinates are found, insert NaN to mark the missing value
    else:
        latitude.append(np.nan)
        longitude.append(np.nan)
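
The recursive retry in `findGeocode` has no upper bound, so a geocoder that keeps timing out would recurse until Python's recursion limit is hit. A minimal sketch of a bounded retry is shown below. It does not call geopy. `TransientError` stands in for `GeocoderTimedOut`, and `call_with_retries` and `flaky_lookup` are hypothetical helpers invented for this illustration.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout such as geopy's GeocoderTimedOut (assumption)."""


def call_with_retries(func, arg, max_attempts=3, delay_seconds=0.0):
    """Call func(arg); retry on TransientError, give up with None after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(arg)
        except TransientError:
            if attempt == max_attempts:
                return None
            time.sleep(delay_seconds)


# fake geocoder that times out on the first call and succeeds afterwards
calls = {"count": 0}


def flaky_lookup(name):
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("timed out")
    return (48.137828, 11.574582)


result = call_with_retries(flaky_lookup, "Altstadt-Lehel")
print(result, calls["count"])
```

Returning `None` after the last attempt matches how the loop above already treats a missing result: the district gets `NaN` coordinates.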

In [4]:
# now add these columns to the dataframe
df["Longitude"] = longitude
df["Latitude"] = latitude
  
df

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude
0,1.0,Altstadt-Lehel,11.574582,48.137828
1,2.0,Ludwigsvorstadt-Isarvorstadt,11.573366,48.13034
2,3.0,Maxvorstadt,11.562418,48.151092
3,4.0,Schwabing-West,11.569873,48.168271
4,5.0,Au-Haidhausen,11.598334,48.130274
5,6.0,Sendling,11.539083,48.118012
6,7.0,Sendling-Westpark,11.519333,48.118031
7,8.0,Schwanthalerhöhe,11.541057,48.133782
8,9.0,Neuhausen-Nymphenburg,11.531517,48.154222
9,10.0,Moosach,11.875678,48.031726


In [5]:
df_munich = df[df['Stadtbezirk'] == 'Landeshauptstadt München']
type(df_munich)

longitude = df_munich['Longitude'].values[0]
latitude =  df_munich['Latitude'].values[0]

df_munich.head()

Unnamed: 0,Nr.,Stadtbezirk,Longitude,Latitude
25,,Landeshauptstadt München,11.596432,48.183699


In [6]:
print('The Longitude and Latitude of Munich is', longitude, 'and', latitude)

The Longitude and Latitude of Munich is 11.5964316 and 48.1836994


In [7]:
# read the same value by position (iloc) and by index label (loc)

df_munich.iloc[0]['Longitude']
df_munich.loc[25]['Longitude']

11.5964316

The longitude and latitude of Moosach were not recognized correctly, so I corrected them manually.

In [8]:
df.loc[df.Stadtbezirk == 'Moosach', 'Longitude'] = 11.5057
df.loc[df.Stadtbezirk == 'Moosach', 'Latitude'] = 48.1799
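
A wrong geocoding result like this one can also be spotted automatically. The sketch below flags districts whose coordinates lie unusually far from a reference point, using the haversine great-circle distance. The coordinates are copied from the table above. The 15 km threshold, the choice of Altstadt-Lehel as reference, and the names `haversine_km` and `MAX_DISTANCE_KM` are assumptions made for this illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# reference point: Altstadt-Lehel (latitude, longitude) from the table above
reference = (48.137828, 11.574582)

# geocoded values as returned before the manual correction
geocoded = {
    "Sendling": (48.118012, 11.539083),
    "Neuhausen-Nymphenburg": (48.154222, 11.531517),
    "Moosach": (48.031726, 11.875678),
}

MAX_DISTANCE_KM = 15.0  # assumed plausibility threshold for a Munich district

suspicious = [
    name
    for name, (lat, lon) in geocoded.items()
    if haversine_km(reference[0], reference[1], lat, lon) > MAX_DISTANCE_KM
]
print(suspicious)
```

With the corrected values (48.1799, 11.5057), Moosach falls inside the threshold again.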

In [9]:
import folium # map rendering library
# create map of Munich using latitude and longitude values
map_munich = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Stadtbezirk']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

<h4>Foursquare Data</h4>

Additionally, Foursquare data was used to identify venues in Munich and to assign them to the districts of Munich.

In [10]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [11]:
df_munich.loc[25, 'Stadtbezirk']
df_munich.iloc[0]['Stadtbezirk']

'Landeshauptstadt München'

In [12]:
neighborhood_latitude = df_munich.loc[25, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_munich.loc[25, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_munich.loc[25, 'Stadtbezirk'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Landeshauptstadt München are 48.1836994, 11.5964316.


In [13]:
# build the explore request for venues around the chosen coordinates
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=48.1836994,11.5964316&radius=500&limit=100'
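
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a minimal sketch and not the notebook's method. The endpoint and parameter names are taken from the URL above, while `build_explore_url` and the placeholder credentials `YOUR_ID` and `YOUR_SECRET` are made up for illustration.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the explore URL for one coordinate pair (illustrative helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


example_url = build_explore_url(
    'YOUR_ID', 'YOUR_SECRET', '20180605', 48.1836994, 11.5964316, 500, 100)
print(example_url)
```

Keeping the credentials out of displayed URLs and saved cell outputs is worth the extra care, because anyone who can read the notebook can otherwise reuse them. `requests.get` also accepts such a dictionary directly through its `params` argument.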

In [14]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6123c6d0fe575949620ef1be'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Alte Heide - Hirschau',
  'headerFullLocation': 'Alte Heide - Hirschau, Munich',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 19,
  'suggestedBounds': {'ne': {'lat': 48.188199404500004,
    'lng': 11.603168216906269},
   'sw': {'lat': 48.1791993955, 'lng': 11.589694983093732}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '53b6e112498eef48c838b15f',
       'name': "Grillin' me softly",
       'location': {'address': 'Täglich wechselnder Standort',
        'crossStreet': 'Do: Max-Diamand-Str. 7',
        'lat': 48.182678687078166,
        'lng': 11.5

In [15]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [16]:
import json # library to handle JSON files

from pandas import json_normalize  # the pandas.io.json import path is deprecated

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Grillin' me softly,Food Truck,48.182679,11.595554
1,Dolzer Masskonfektionäre,Clothing Store,48.183407,11.592058
2,Leonardi,Cafeteria,48.181568,11.596857
3,Bite Delite,Café,48.182051,11.597206
4,hasia - Asian Food & Drink,Asian Restaurant,48.182621,11.594412
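
What the flattening and the category extraction do can be shown without pandas. The sketch below walks the nested `items` structure by hand and keeps the same four fields. The first sample item reuses values from the table above; the second item, with an empty category list, is hypothetical and only shows the `None` case. `first_category_name` and `flatten_item` are illustrative names.

```python
def first_category_name(venue):
    """Return the name of the first category, or None if the list is empty."""
    categories = venue.get('categories', [])
    return categories[0]['name'] if categories else None


def flatten_item(item):
    """Reduce one Foursquare 'items' entry to name, category, lat and lng."""
    venue = item['venue']
    return {
        'name': venue['name'],
        'categories': first_category_name(venue),
        'lat': venue['location']['lat'],
        'lng': venue['location']['lng'],
    }


sample_items = [
    {'venue': {'name': "Grillin' me softly",
               'categories': [{'name': 'Food Truck'}],
               'location': {'lat': 48.182679, 'lng': 11.595554}}},
    # hypothetical venue without any category
    {'venue': {'name': 'Hypothetical venue',
               'categories': [],
               'location': {'lat': 48.18, 'lng': 11.59}}},
]

rows = [flatten_item(item) for item in sample_items]
for row in rows:
    print(row)
```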


In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

19 venues were returned by Foursquare.


In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Stadtbezirk', 
                  'Stadtbezirk Latitude', 
                  'Stadtbezirk Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [20]:
munich_venues = getNearbyVenues(
                                    names=df['Stadtbezirk'],
                                    latitudes=df['Latitude'],
                                    longitudes=df['Longitude']
                                  )

Altstadt-Lehel
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Schwabing-West
Au-Haidhausen
Sendling
Sendling-Westpark
Schwanthalerhöhe
Neuhausen-Nymphenburg
Moosach
Milbertshofen-Am Hart
Schwabing-Freimann
Bogenhausen
Berg am Laim
Trudering-Riem
Ramersdorf-Perlach
Obergiesing-Fasangarten
Untergiesing-Harlaching
Thalkirchen-Obersendling-Forstenried-Fürstenried-Solln
Hadern
Pasing-Obermenzing
Aubing-Lochhausen-Langwied
Allach-Untermenzing
Feldmoching-Hasenbergl
Laim
Landeshauptstadt München


In [21]:
print(munich_venues.shape)
munich_venues.head()

# keep only venues whose category name contains 'Restaurant'
munich_venues = munich_venues[munich_venues['Venue Category'].str.contains('Restaurant')]

(669, 7)
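
The filter keeps a venue only if its category name contains the word "Restaurant". A small sketch with plain Python strings shows what that means for category names that appear in this notebook. The list `sample_categories` is a hand-picked sample, not the full Foursquare result.

```python
# category names that occur in the outputs of this notebook
sample_categories = [
    'Food Truck',
    'Clothing Store',
    'Cafeteria',
    'Café',
    'Asian Restaurant',
    'Italian Restaurant',
    'Restaurant',
    'Supermarket',
]

# same rule as the str.contains('Restaurant') filter: a case-sensitive substring match
kept = [name for name in sample_categories if 'Restaurant' in name]
excluded = [name for name in sample_categories if 'Restaurant' not in name]

print(kept)
print(excluded)
```

Places to eat such as "Food Truck", "Cafeteria" and "Café" are excluded by this rule, which is worth keeping in mind when reading the per-district counts.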


In [22]:
munich_venues.groupby('Stadtbezirk').count()

Stadtbezirk,Stadtbezirk Latitude,Stadtbezirk Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allach-Untermenzing,1,1,1,1,1,1
Altstadt-Lehel,23,23,23,23,23,23
Au-Haidhausen,43,43,43,43,43,43
Bogenhausen,3,3,3,3,3,3
Feldmoching-Hasenbergl,1,1,1,1,1,1
Hadern,2,2,2,2,2,2
Landeshauptstadt München,5,5,5,5,5,5
Ludwigsvorstadt-Isarvorstadt,28,28,28,28,28,28
Maxvorstadt,13,13,13,13,13,13
Milbertshofen-Am Hart,4,4,4,4,4,4


In [23]:
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 35 unique categories.


In [24]:
print(nearby_venues['categories'].value_counts()[0:10])

Supermarket           2
Italian Restaurant    2
Bus Stop              2
Food Truck            1
Clothing Store        1
Cafeteria             1
Café                  1
Asian Restaurant      1
Hotel                 1
Drugstore             1
Name: categories, dtype: int64


In [25]:
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Grillin' me softly,Food Truck,48.182679,11.595554
1,Dolzer Masskonfektionäre,Clothing Store,48.183407,11.592058
2,Leonardi,Cafeteria,48.181568,11.596857
3,Bite Delite,Café,48.182051,11.597206
4,hasia - Asian Food & Drink,Asian Restaurant,48.182621,11.594412
5,Suite Novotel Parkstadt Schwabing,Hotel,48.179846,11.59355
6,dm-drogerie markt,Drugstore,48.182808,11.594108
7,REWE,Supermarket,48.182975,11.593756
8,Coffee Fellows,Coffee Shop,48.183399,11.594659
9,Parkstadt-Center,Shopping Mall,48.182576,11.594717


In [26]:
munich_venues

Unnamed: 0,Stadtbezirk,Stadtbezirk Latitude,Stadtbezirk Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,Altstadt-Lehel,48.137828,11.574582,Augustiner Klosterwirt,48.138649,11.572527,German Restaurant
13,Altstadt-Lehel,48.137828,11.574582,Andechser am Dom,48.138302,11.573778,Bavarian Restaurant
20,Altstadt-Lehel,48.137828,11.574582,Restaurant Dallmayr,48.138489,11.576791,German Restaurant
24,Altstadt-Lehel,48.137828,11.574582,Nürnberger Bratwurst Glöckl am Dom,48.138191,11.574165,Bavarian Restaurant
38,Altstadt-Lehel,48.137828,11.574582,Leger am Dom,48.138262,11.572932,Restaurant
...,...,...,...,...,...,...,...
654,Landeshauptstadt München,48.183699,11.596432,hasia - Asian Food & Drink,48.182621,11.594412,Asian Restaurant
660,Landeshauptstadt München,48.183699,11.596432,mammaminuti,48.183397,11.594629,Italian Restaurant
661,Landeshauptstadt München,48.183699,11.596432,Gasthaus Domagk,48.183590,11.598015,Restaurant
664,Landeshauptstadt München,48.183699,11.596432,Vitello,48.183173,11.594046,Modern European Restaurant
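
To move from this venue list towards the goal of describing each district by its kind of food, the categories can be counted per district. The sketch below does this with the standard library for the nine rows visible above only, so the numbers are a sample and not the full result. `sample_rows`, `counts_by_district` and `shares` are illustrative names.

```python
from collections import Counter, defaultdict

# (Stadtbezirk, Venue Category) pairs copied from the rows shown above
sample_rows = [
    ('Altstadt-Lehel', 'German Restaurant'),
    ('Altstadt-Lehel', 'Bavarian Restaurant'),
    ('Altstadt-Lehel', 'German Restaurant'),
    ('Altstadt-Lehel', 'Bavarian Restaurant'),
    ('Altstadt-Lehel', 'Restaurant'),
    ('Landeshauptstadt München', 'Asian Restaurant'),
    ('Landeshauptstadt München', 'Italian Restaurant'),
    ('Landeshauptstadt München', 'Restaurant'),
    ('Landeshauptstadt München', 'Modern European Restaurant'),
]

# count how often each category occurs in each district
counts_by_district = defaultdict(Counter)
for district, category in sample_rows:
    counts_by_district[district][category] += 1

# turn the counts into shares so districts of different size are comparable
shares = {
    district: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for district, counter in counts_by_district.items()
}

for district, district_shares in shares.items():
    print(district, district_shares)
```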
