# BERLIN // Where to open a new hotel: neighborhood analysis

# 1 - Introduction

## 1.1 Discussion of the problem

Once the most severe waves of the COVID-19 pandemic have passed and vaccination is well advanced, a well-known Spanish hospitality company plans to open a new hotel in Berlin. Where is the best place to open it?

## 1.2 Discussion of the background

During 2020, tourism was one of the main industries affected by lockdown decisions around the world. According to UN data, international arrivals in 2020 are estimated to have dropped to 381 million, down from 1.461 billion in 2019, a 74% decline. In countries whose economies rely heavily on tourism, such as those of southern Europe (Italy, Portugal, Greece or Spain), the precipitous drop in visitors was, and remains, devastating.

Berlin was not spared by this crisis. It is the capital and largest city of Germany and the second most populous city in the European Union, with nearly 3.6 million residents from more than 190 countries and a population density of 4,200 people per km². The city is divided into 12 boroughs and 95 neighborhoods.
It is also considered a top European destination, ranked third after London and Paris.

Even though the world was facing the coronavirus crisis, Berlin welcomed almost 5 million tourists in 2020, a decrease of 65% compared with 2019.
At the beginning of 2021, between January and April, 400,000 tourists visited Berlin, and these figures are expected to rise as vaccination progresses and borders reopen.
There are currently 635 accommodation establishments classified as "hotels" (a category that includes hotels, guesthouses and bed & breakfast properties) in Berlin.

To address this question, we will build a map and an information chart that show the actual distribution of hotels in Berlin, and cluster the areas according to their hotel density.
We need a method that combines Foursquare location data with machine learning to support the decision of the Spanish hospitality company.

In this project, I will use Foursquare location data and clustering methods to divide the neighborhoods into groups based on their hotel location information.

# 2 - Data description: how it helps to solve the problem

For this project, the following data is needed:

**1 - Berlin neighborhood data: list of boroughs and neighborhoods with their latitudes and longitudes.**
<ul>
<li> Data source: https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin </li>
<li> Description: We will scrape the table of Berlin boroughs and neighborhoods from Wikipedia, then use the Nominatim geocoder from Geopy to get the coordinates (latitude and longitude) of each neighborhood. </li>
</ul>
    
**2 - Hotels in each neighborhood in Berlin:**

<ul>
<li> Data source: Foursquare API </li>
<li> Description: Using this API, we will obtain all venues in each neighborhood and then filter them to keep only hotels. </li>
</ul>

# 3 - Methodology

## 3.1 Getting information from Berlin's neighborhood

First, we get the boroughs and neighborhoods of Berlin by scraping Wikipedia.

In [1]:
!pip install bs4
from bs4 import BeautifulSoup
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1272 sha256=157d0b23716fcc2649f352ee90dff5657f583695061f2114153bdc022b8a89f7
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/0a/9e/ba/20e5bbc1afef3a491f0b3bb74d508f99403aabe76eda2167ca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


In [2]:
!wget -O berlin.html https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin

--2021-07-16 08:24:23--  https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin
Resolving en.wikipedia.org (en.wikipedia.org)... 208.80.154.224, 2620:0:861:ed1a::1
Connecting to en.wikipedia.org (en.wikipedia.org)|208.80.154.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 207057 (202K) [text/html]
Saving to: ‘berlin.html’


2021-07-16 08:24:23 (567 KB/s) - ‘berlin.html’ saved [207057/207057]



Parse the html file

In [3]:
with open('berlin.html', 'r', encoding='utf-8') as berlin_html: # explicit encoding keeps names such as Lübars intact
    soup_berlin = BeautifulSoup(berlin_html, 'html.parser')

Create a dataframe with the list of neighbourhoods from the html file

In [4]:
rows = []
for tr in soup_berlin.find_all('tr'):
    row = tr.text.replace('(','').replace(')','')
    row = row.split('\n')
    row = list(filter(lambda s: s != '', row)) # delete empty strings from list
    row = list(map(lambda s: s.strip(), row)) # remove leading and trailing spaces from strings in list

    # keep only rows that start with a four-digit neighborhood ID (skips empty rows too)
    if row and row[0][0:4].isdigit():
        row = row[0].split(' ', 1) # split into ID and neighborhood name
        rows.append(row)

df_berlin = pd.DataFrame(rows, columns=['neighborhood_id', 'neighborhood'])

Get list of boroughs in ID order and add to each neighbourhood

In [5]:
boroughs = []
for dt in soup_berlin.find_all('dt'):
    boroughs.append(dt.text[5:])

# add borough 
borough = []
for lid in df_berlin.neighborhood_id:
    borough.append(boroughs[int(lid)//100-1])
    
df_berlin['borough'] = borough
df_berlin['city'] = 'Berlin'

df_berlin

| | neighborhood_id | neighborhood | borough | city |
|---|---|---|---|---|
| 0 | 0101 | Mitte | Mitte | Berlin |
| 1 | 0102 | Moabit | Mitte | Berlin |
| 2 | 0103 | Hansaviertel | Mitte | Berlin |
| 3 | 0104 | Tiergarten | Mitte | Berlin |
| 4 | 0105 | Wedding | Mitte | Berlin |
| ... | ... | ... | ... | ... |
| 91 | 1207 | Waidmannslust | Reinickendorf | Berlin |
| 92 | 1208 | Lübars | Reinickendorf | Berlin |
| 93 | 1209 | Wittenau | Reinickendorf | Berlin |
| 94 | 1210 | Märkisches Viertel | Reinickendorf | Berlin |
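The borough lookup in cell 5 relies on the layout of the IDs: the first two digits of each four-digit neighborhood ID give the borough number, so integer division by 100 recovers it. A minimal sketch of that arithmetic, using IDs from the output above (the helper name `borough_index` is illustrative and not part of the notebook):

```python
def borough_index(neighborhood_id: str) -> int:
    """Return the zero-based borough index encoded in a four-digit ID."""
    # "0101" -> 101 -> 101 // 100 = 1 -> index 0 in the boroughs list
    return int(neighborhood_id) // 100 - 1


for nid in ["0101", "0105", "1207", "1210"]:
    print(nid, "->", borough_index(nid))
```

This only works as long as the `boroughs` list is in the same order as the borough numbers on the Wikipedia page, which is what the notebook assumes.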


## 3.2 Adding coordinates for each neighborhood

Now the aim is to add coordinates for each of the 95 neighborhoods. We will use the Geopy client as detailed below. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>be_explorer</em>, as shown below.

In [6]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |           1_llvm           5 KB  conda-forge
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB  conda-forge
    _pytorch_select-0.2        |            gpu_0           2 KB
    absl-py-0.13.0          

Getting information about Berlin coordinates:

In [8]:
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent="be_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Berlin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Berlin are 52.5170365, 13.3888599.


Now we get the latitude and longitude of every neighborhood:

In [9]:
geolocator = Nominatim(user_agent="be_explorer")

df_berlin['neighborhood_coord']= df_berlin['neighborhood'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df_berlin[['Latitude', 'Longitude']] = df_berlin['neighborhood_coord'].apply(pd.Series)

df_berlin

| | neighborhood_id | neighborhood | borough | city | neighborhood_coord | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 0 | 0101 | Mitte | Mitte | Berlin | (39.98020495, -7.905590887431517) | 39.980205 | -7.905591 |
| 1 | 0102 | Moabit | Mitte | Berlin | (52.5301017, 13.3425422) | 52.530102 | 13.342542 |
| 2 | 0103 | Hansaviertel | Mitte | Berlin | (52.5191234, 13.3418725) | 52.519123 | 13.341872 |
| 3 | 0104 | Tiergarten | Mitte | Berlin | (50.3409222, 6.956329) | 50.340922 | 6.956329 |
| 4 | 0105 | Wedding | Mitte | Berlin | (52.550123, 13.34197) | 52.550123 | 13.341970 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | 1207 | Waidmannslust | Reinickendorf | Berlin | (52.6080354, 13.3225327) | 52.608035 | 13.322533 |
| 92 | 1208 | Lübars | Reinickendorf | Berlin | (52.6146467, 13.3530197) | 52.614647 | 13.353020 |
| 93 | 1209 | Wittenau | Reinickendorf | Berlin | (52.5912366, 13.3233195) | 52.591237 | 13.323320 |
| 94 | 1210 | Märkisches Viertel | Reinickendorf | Berlin | (52.5993123, 13.3565324) | 52.599312 | 13.356532 |
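Note that the output above places Mitte at 39.98, -7.91 and Tiergarten at 50.34, 6.96, both far from Berlin's own coordinates (52.52, 13.39). The query in cell 9 passes only the bare neighborhood name, so the geocoder is free to match a place with the same name elsewhere. One possible fix, sketched here under the assumption that appending the city to the query is enough to disambiguate, is to qualify every name before geocoding. The helper name `geocode_in_city` is hypothetical; the `geocode` argument is any callable with the behaviour of `geolocator.geocode`, which returns `None` when nothing matches.

```python
from typing import Callable, Optional, Tuple


def geocode_in_city(
    name: str,
    geocode: Callable[[str], object],
    city: str = "Berlin, Germany",
) -> Tuple[Optional[float], Optional[float]]:
    """Geocode `name` qualified by `city`; return (None, None) if nothing is found."""
    location = geocode(f"{name}, {city}")  # e.g. "Mitte, Berlin, Germany"
    if location is None:  # no match: avoid an AttributeError on .latitude
        return (None, None)
    return (location.latitude, location.longitude)


# Usage with the notebook's geolocator (needs network access, so left commented out):
# coords = df_berlin['neighborhood'].apply(lambda n: geocode_in_city(n, geolocator.geocode))
# df_berlin[['Latitude', 'Longitude']] = coords.apply(pd.Series)
```

Because the public Nominatim service asks clients to keep requests slow, wrapping `geolocator.geocode` in geopy's `RateLimiter` before passing it in may also be worthwhile.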


In [10]:
df_berlin.drop(['neighborhood_coord'], axis=1, inplace=True)
df_berlin

| | neighborhood_id | neighborhood | borough | city | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 0101 | Mitte | Mitte | Berlin | 39.980205 | -7.905591 |
| 1 | 0102 | Moabit | Mitte | Berlin | 52.530102 | 13.342542 |
| 2 | 0103 | Hansaviertel | Mitte | Berlin | 52.519123 | 13.341872 |
| 3 | 0104 | Tiergarten | Mitte | Berlin | 50.340922 | 6.956329 |
| 4 | 0105 | Wedding | Mitte | Berlin | 52.550123 | 13.341970 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | 1207 | Waidmannslust | Reinickendorf | Berlin | 52.608035 | 13.322533 |
| 92 | 1208 | Lübars | Reinickendorf | Berlin | 52.614647 | 13.353020 |
| 93 | 1209 | Wittenau | Reinickendorf | Berlin | 52.591237 | 13.323320 |
| 94 | 1210 | Märkisches Viertel | Reinickendorf | Berlin | 52.599312 | 13.356532 |
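Before mapping, a quick sanity check can flag coordinates that cannot belong to Berlin. The sketch below uses a generously rounded bounding box around the city; the box limits are approximate values assumed for illustration, not taken from the notebook, and the sample rows are copied from the output above.

```python
# Approximate bounding box for Berlin (assumed, rounded generously)
LAT_MIN, LAT_MAX = 52.3, 52.7
LON_MIN, LON_MAX = 13.0, 13.8


def in_berlin_box(lat: float, lon: float) -> bool:
    """Return True if the point lies inside the rough Berlin bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# (neighborhood, latitude, longitude) rows taken from the dataframe output
rows = [
    ("Mitte", 39.980205, -7.905591),
    ("Moabit", 52.530102, 13.342542),
    ("Tiergarten", 50.340922, 6.956329),
    ("Wedding", 52.550123, 13.341970),
]

suspect = [name for name, lat, lon in rows if not in_berlin_box(lat, lon)]
print(suspect)  # ['Mitte', 'Tiergarten']
```

Applied to the full dataframe, the same test would list every neighborhood whose geocoding result needs to be repeated with a more specific query.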


Create a map of Berlin with a marker for each neighborhood.

In [11]:
# create map of Berlin using latitude and longitude
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
for lat, lng, label in zip(df_berlin['Latitude'], df_berlin['Longitude'], df_berlin['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_berlin) 
    
map_berlin

## 3.3 Data analysis using Foursquare API

The aim of this part is to use exploratory data analysis to extract valuable information and insights about these 95 neighborhoods, which will help us make the right decisions.

First of all we will use Foursquare API to explore the neighborhoods of Berlin and segment them.

In [12]:
CLIENT_ID = 'BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM' # your Foursquare ID
CLIENT_SECRET = 'MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM
CLIENT_SECRET:MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0
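Hard-coding and printing the Foursquare ID and secret leaves them visible to anyone who reads the notebook. A common alternative, sketched here as one option rather than the notebook's method, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (left commented out because it needs the variables to be set):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('Credentials loaded.')
```

The `env` parameter defaults to `os.environ` and exists only so the function can be exercised with a plain dictionary.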


In [14]:
df_berlin.loc[0, 'neighborhood']

'Mitte'

In [15]:
neighborhood_latitude = df_berlin.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_berlin.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_berlin.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Mitte are 39.98020495, -7.905590887431517.


In [18]:
# Defining parameters for the Foursquare API

LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=BR3G0GSMYNJNDMJMI4VBRWOC3JY0ETEQZEH2FQ4QH0XXDLZM&client_secret=MOU5CFO22YMABCZ3XBNZAFWEPLBU5LPVIPX3YXYCCUPFKNG0&v=20180605&ll=39.98020495,-7.905590887431517&radius=1000&limit=100'
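The same request URL can be assembled from a parameter dictionary, which keeps each parameter next to its name and handles escaping. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, the placeholder strings `"ID"` and `"SECRET"` stand in for real credentials, and the coordinates are those of Moabit from the dataframe.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Build a venues/explore URL with the same parameters as the notebook cell."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude and longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in ll readable instead of encoding it as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")


url_example = build_explore_url("ID", "SECRET", "20180605", 52.5301017, 13.3425422)
print(url_example)
```

With `requests`, an equivalent option is to pass the dictionary through the `params` argument of `requests.get` and let the library build the query string.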

In [19]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60f14fb67263604df534aaed'},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': 39.98920495900001,
    'lng': -7.893867544095733},
   'sw': {'lat': 39.97120494099999, 'lng': -7.917314230767301}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5b9642f665211f002c799be8',
       'name': 'Yoga Evolution Retreats',
       'location': {'address': 'Yoga Evolution Retreats',
        'crossStreet': 'Quinta Do Bacelo',
        'lat': 39.979904,
        'lng': -7.9148088,
        'labeledLatLngs': [{'label': 'display',
          'lat': 39.979904,
          'lng': -7.9148088}],
        'distance': 787,
        'posta

In [35]:
# function that extracts the category of the venue
def get_category_type(row):
    # the key is 'categories' for a raw venue and 'venue.categories' after flattening
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [47]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests
import pandas as pd # library for data analysis
json_normalize = pd.json_normalize # transforms JSON into a pandas dataframe; replaces the deprecated pandas.io.json import

In [48]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



ImportError: cannot import name 'AggFuncType' from 'pandas._typing' (/opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages/pandas/_typing.py)
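The flattening and filtering done in cell 48 can also be sketched with plain dictionaries, which is useful for checking the logic independently of the pandas installation. The sample items below are hand-written and hypothetical: they only imitate the shape of `results['response']['groups'][0]['items']`, and the category name `"Hotel"` used for filtering is an assumption about how Foursquare labels hotels.

```python
def flatten_venues(items):
    """Turn Foursquare 'items' into rows of name, category, lat and lng."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "name": venue["name"],
            # first category name, or None when the venue has no category
            "category": categories[0]["name"] if categories else None,
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


# Hypothetical sample data, not real API output
sample_items = [
    {"venue": {"name": "Example Hotel",
               "location": {"lat": 52.53, "lng": 13.34},
               "categories": [{"name": "Hotel"}]}},
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 52.52, "lng": 13.35},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 52.51, "lng": 13.36},
               "categories": []}},
]

venues_flat = flatten_venues(sample_items)
hotels = [row for row in venues_flat if row["category"] == "Hotel"]
print(hotels)
```

Keeping only the rows whose category is a hotel is the filtering step described in the data section.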