# Capstone Project - The Battle of the Neighborhoods - Part 1 (Week 4)

## 1. Introduction <a name="introduction"></a>

Stuttgart is the sixth-largest city in Germany, and its population is growing rapidly. It already offers a wide variety of restaurants for every taste, so starting a restaurant business in this area is not an easy task.

Our stakeholder wants to open a beer restaurant with mid-to-high prices in the city of Stuttgart.
Choosing a location is one of the most stressful and contested parts of starting such a business, because many criteria have to be satisfied to achieve the highest revenue.
Here are some of them:
- the density of other restaurants
- the density of beer restaurants specifically
- the population density around the location
- the purchasing power of the population around the location
- ...

In this project, we perform a basic analysis and try to find the most suitable borough for the beer restaurant according to these criteria. Many additional factors matter as well, such as the distance to parking or to the main streets, but they can be analyzed after the borough has been chosen and are therefore outside the scope of this project.

## 2. Data <a name="data"></a>

### Data description

Based on the criteria listed above, the following data will be used in our analysis:
- the number of restaurants within a certain radius of each borough (Foursquare API)
- the net income per person in each borough. Since the restaurant will have mid-to-high prices, it is important to consider the purchasing power of the population. Source: Sozialmonitoring der Landeshauptstadt Stuttgart (https://statistik.stuttgart.de/statistiken/sozialmonitoring/atlas/Stadtbezirke/out/atlas.html)
- the population and the population density of each borough. Source: Statistikatlas Stuttgart (https://statistik.stuttgart.de/statistiken/statistikatlas/atlas/atlas.html?indikator=i0&select=00)
- the population aged 18 and over, since the potential guests of our beer restaurant are adults. Source: Statistikatlas Stuttgart (https://statistik.stuttgart.de/statistiken/statistikatlas/atlas/atlas.html?indikator=i0&select=00)
- the coordinates of each borough. Source: OpenStreetMap (https://nominatim.openstreetmap.org/details.php?place_id=17476218)
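
To make the first data source more concrete, here is a minimal sketch of how a restaurant query around one borough center might be prepared. It assumes the legacy Foursquare v2 `venues/explore` endpoint. The helper name `build_explore_request`, the placeholder credentials, the version date, the 1000 m radius, the limit and the `section` filter are illustrative assumptions, not values taken from this project. The function only builds the request, so it can be inspected without network access.

```python
# Sketch only: endpoint, credentials and default values are assumptions.
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_request(lat, lng, client_id, client_secret,
                          radius=1000, limit=50, version="20180605"):
    """Return (url, params) for a food-venue search around one point."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (assumed)
        "ll": f"{lat},{lng}",            # "latitude,longitude" of the center
        "radius": radius,                # search radius in meters (assumed)
        "limit": limit,                  # maximum number of venues (assumed)
        "section": "food",               # restrict results to food venues
    }
    return FOURSQUARE_EXPLORE_URL, params


# Coordinates of Bad Cannstatt as listed in the borough table of this notebook.
url, params = build_explore_request(48.807109, 9.221557,
                                    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
print(params["ll"], params["radius"])
```

The returned pair could then be passed to `requests.get(url, params=params)`. The shape of the response depends on the API version available to you, so parsing is left out here.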





### Data Preparation

First, let's import all the libraries that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Load the coordinates of the boroughs into a dataframe.

In [2]:
df = pd.read_csv('stuttgart_bezirke.txt')
print(df.shape)
df.head()

(23, 3)


|   | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | Bad Cannstatt | 48.807109 | 9.221557 |
| 1 | Birkach | 48.728574 | 9.203406 |
| 2 | Botnang | 48.778495 | 9.129532 |
| 3 | Degerloch | 48.744052 | 9.180481 |
| 4 | Feuerbach | 48.803635 | 9.149803 |
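
The coordinates are read from a prepared file here. As a rough sketch of how such a file could be produced from borough names, the helper below takes any geocoding callable and collects latitude and longitude per borough. The function name `geocode_boroughs`, the query format and the stand-in geocoder are assumptions made for illustration. The stand-in returns the Bad Cannstatt coordinates from the table above, so the example runs without network access.

```python
from collections import namedtuple


def geocode_boroughs(names, geocode, city="Stuttgart, Germany"):
    """Look up each borough with `geocode` and return a list of row dicts.

    `geocode` is any callable that takes a query string and returns an
    object with `.latitude` and `.longitude`, or None if nothing is found.
    """
    rows = []
    for name in names:
        location = geocode(f"{name}, {city}")
        rows.append({
            "Borough": name,
            "Latitude": None if location is None else location.latitude,
            "Longitude": None if location is None else location.longitude,
        })
    return rows


# Stand-in geocoder for the example (hypothetical, no network access).
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
KNOWN = {"Bad Cannstatt, Stuttgart, Germany": FakeLocation(48.807109, 9.221557)}


def fake_geocode(query):
    return KNOWN.get(query)


rows = geocode_boroughs(["Bad Cannstatt", "Nowhere"], fake_geocode)
print(rows)
```

With geopy, the callable could be `Nominatim(user_agent="stuttgart_capstone").geocode`, where the `user_agent` string is a made-up example. Boroughs that cannot be resolved keep `None` coordinates and should be checked by hand.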


Load the population and population density of each borough into a new dataframe.

In [3]:
df_pop = pd.read_excel('Bezirke_bevolkerung.xlsx')
df_pop.rename(columns = {'Bezirk':'Borough', 'Bevölkerungsdichte':'Population Density','Einwohner':'Population'}, inplace = True)
df_pop.head()

|   | Borough | Population Density | Population |
|---|---|---|---|
| 0 | Mitte | 6322 | 24060 |
| 1 | Nord | 4094 | 27903 |
| 2 | Ost | 5371 | 48526 |
| 3 | Süd | 4652 | 44601 |
| 4 | West | 7558 | 52214 |


Load the percentage of residents under 18 in each borough into a new dataframe.

In [4]:
df_18 = pd.read_excel('Bezirke_bevolkerung_unter_18.xlsx')
df_18.rename(columns = {'Bezirk':'Borough', '% Anteil unter 18-Jährige':'Percentage under 18 years old'}, inplace = True)
df_18.head()

|   | Borough | Percentage under 18 years old |
|---|---|---|
| 0 | Mitte | 10.2 |
| 1 | Nord | 15.5 |
| 2 | Ost | 14.1 |
| 3 | Süd | 13.7 |
| 4 | West | 12.6 |


Now let's combine the dataframes **df_pop** and **df_18** to estimate the adult population and the adult population density of each borough, and store the result back in **df_pop**.

In [5]:
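# scale density and population by the adult share (1 - percentage under 18 / 100);
# rows are matched by index, so df_pop and df_18 must list the boroughs in the same order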
df_pop['Population Density'] = round(df_pop['Population Density']*(1-0.01*df_18['Percentage under 18 years old']))
df_pop['Population'] = round(df_pop['Population']*(1-0.01*df_18['Percentage under 18 years old']))
# change the floats to integers
df_pop[['Population Density','Population']] = df_pop[['Population Density','Population']].astype('int64')
df_pop.head()

|   | Borough | Population Density | Population |
|---|---|---|---|
| 0 | Mitte | 5677 | 21606 |
| 1 | Nord | 3459 | 23578 |
| 2 | Ost | 4614 | 41684 |
| 3 | Süd | 4015 | 38491 |
| 4 | West | 6606 | 45635 |
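
The calculation above multiplies the two dataframes row by row, which gives correct results only if both list the boroughs in the same order. A slightly more defensive variant, sketched here with the five preview rows from this notebook, joins the tables on the `Borough` column first. The names `df_pop_raw`, `df_under18` and `df_adults` are illustrative, and the second table is shuffled on purpose to show that the row order no longer matters.

```python
import pandas as pd

# Preview rows copied from the outputs shown in this notebook.
df_pop_raw = pd.DataFrame({
    "Borough": ["Mitte", "Nord", "Ost", "Süd", "West"],
    "Population Density": [6322, 4094, 5371, 4652, 7558],
    "Population": [24060, 27903, 48526, 44601, 52214],
})
# Same percentages as above, deliberately in a different row order.
df_under18 = pd.DataFrame({
    "Borough": ["West", "Süd", "Ost", "Nord", "Mitte"],
    "Percentage under 18 years old": [12.6, 13.7, 14.1, 15.5, 10.2],
})

# Match rows by borough name; validate raises if a borough appears twice.
merged = df_pop_raw.merge(df_under18, on="Borough", how="inner",
                          validate="one_to_one")

# Share of residents aged 18 and over in each borough.
adult_share = 1 - merged["Percentage under 18 years old"] / 100

df_adults = pd.DataFrame({
    "Borough": merged["Borough"],
    "Population Density": (merged["Population Density"] * adult_share)
        .round().astype("int64"),
    "Population": (merged["Population"] * adult_share)
        .round().astype("int64"),
})
print(df_adults)
```

For Mitte this gives 24060 × (1 − 0.102) ≈ 21606 adults, the same value as in the table above.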


In [6]:
df_pop.dtypes

Borough               object
Population Density     int64
Population             int64
dtype: object

Load the **Net income per person** data for each borough into a new dataframe.

In [7]:
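# net income per person, taken from the 2011 social monitoring report
url = 'https://statistik.stuttgart.de/statistiken/sozialmonitoring/atlas/Stadtbezirke/out/Profil/report_Stadtbezirke_i20_2011.html'
# read_html returns a list with one dataframe per HTML table found on the page
list_df = pd.read_html(url, header = 0)
# the second table on the page is used as the income data
df_income = list_df[1]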

In [8]:
df_income.rename(columns = {'Steuerpflichtiges Einkommen - 2011':'Borough', 'Steuerpflichtiges Einkommen - 2011.1':'Net income per person'}, inplace = True)
df_income.head()

|   | Borough | Net income per person |
|---|---|---|
| 0 | Nord | 114.4 |
| 1 | Degerloch | 114.0 |
| 2 | Birkach | 111.9 |
| 3 | Sillenbuch | 109.5 |
| 4 | Vaihingen | 107.7 |
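
The income table lists the boroughs in a different order than the population tables, so any later combination of the two has to match rows by borough name. The following is a minimal sketch of such a join with a check for unmatched boroughs. It uses only the five-row previews shown in this notebook, which is why most boroughs stay unmatched here. The names `df_pop_head`, `df_income_head` and `combined` are illustrative.

```python
import pandas as pd

# Five-row previews copied from the outputs in this notebook.
df_pop_head = pd.DataFrame({
    "Borough": ["Mitte", "Nord", "Ost", "Süd", "West"],
    "Population": [21606, 23578, 41684, 38491, 45635],
})
df_income_head = pd.DataFrame({
    "Borough": ["Nord", "Degerloch", "Birkach", "Sillenbuch", "Vaihingen"],
    "Net income per person": [114.4, 114.0, 111.9, 109.5, 107.7],
})

# Left join keeps every borough of the population table; the indicator
# column "_merge" records whether an income row was found for it.
combined = df_pop_head.merge(df_income_head, on="Borough", how="left",
                             indicator=True)

# Boroughs without an income value, for example because of spelling differences.
missing = combined.loc[combined["_merge"] == "left_only", "Borough"].tolist()
print(combined)
print("No income data for:", missing)
```

With the complete tables, the `missing` list would be expected to be empty. Any name that shows up there points to a spelling difference between the two sources.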
