# Applied Data Science Capstone Project #

## 1. Introduction ##

### 1.1. Background ###

Istanbul is the largest city in Turkey and the country's economic, cultural, and historical heart. It is a transcontinental city in Eurasia: its commercial and historical centre lies on the European side, and about a third of its population lives on the Asian side. With a population of 15.07 million as of December 31, 2018, the city forms the largest urban agglomeration in Europe as well as the largest in the Middle East, and is the sixth-largest city proper in the world. Istanbul's area of 2,063 square miles is coterminous with Istanbul Province, of which the city is the administrative capital.

Istanbul has a population density of 6,530 people per square mile and is divided into 39 districts.

Because of its historical significance and its location at the meeting point of Europe and Asia, Istanbul is also known for great food. This makes the city a desirable destination for tourists and for food-related business opportunities.

### 1.2. Problem ###

Istanbul already has a lot of restaurants. How should we decide the best location for another one?

In this project, I will analyze the boroughs of Istanbul to find the best location for opening a new restaurant.

### 1.3. Interest ###

The results of this project can be helpful for investors, employees, suppliers, and food enthusiasts.

## 2. Data Acquisition and Cleaning ##

### 2.1. Data sources ###

- [Second-level Administrative Divisions, Turkey, 2015](https://geo.nyu.edu/catalog/stanford-nj696zj1674) from the Spatial Data Repository of NYU: a GeoJSON file containing the geometry objects of the districts of each province of Turkey. For this project, I will extract the rows for Istanbul and find the center point of each borough (district).

- [Foursquare API](https://developer.foursquare.com/): used to find the restaurants in each borough.
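
As a rough illustration of how the Foursquare API could be queried for each borough center, the sketch below only builds a request URL and sends nothing. It assumes the legacy v2 `venues/explore` endpoint and its `ll`, `radius`, `limit`, and `section` parameters; the helper name `build_explore_url`, the default radius, the version date, and the placeholder credentials are hypothetical and are not taken from this project.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=1000, limit=50, version="20190101"):
    """Build (but do not send) a request URL for food venues near a point."""
    params = {
        "client_id": client_id,          # placeholder credentials, supplied by the caller
        "client_secret": client_secret,
        "v": version,                    # API version date in YYYYMMDD form (assumed value)
        "ll": f"{lat},{lng}",            # "latitude,longitude" of the borough center
        "radius": radius_m,              # search radius in meters (assumed default)
        "limit": limit,                  # maximum number of venues to return
        "section": "food",               # restrict recommendations to food venues
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Hypothetical coordinates, used only to show the shape of the URL.
url = build_explore_url(41.0, 29.0, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be fetched once per borough center; the right radius depends on borough size and is left open here.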

### 2.2. Data Acquisition and Cleaning ###

Install and import required libraries

In [1]:
# Install geopandas first so that the import below succeeds
!pip install geopandas
import pandas as pd
import geopandas as gpd



Read the GeoJSON file into a GeoDataFrame

In [2]:
df = gpd.read_file("https://geo.nyu.edu/download/file/stanford-nj696zj1674-geojson.json")

Look at the first 5 rows of the dataframe

In [3]:
df.head()

|   | id | id_0 | iso | name_0 | id_1 | name_1 | id_2 | name_2 | hasc_2 | ccn_2 | cca_2 | type_2 | engtype_2 | nl_name_2 | varname_2 | geometry |
|---|----|------|-----|--------|------|--------|------|--------|--------|-------|-------|--------|-----------|-----------|-----------|----------|
| 0 | nj696zj1674.1 | 235 | TUR | Turkey | 1 | Çanakkale | 1 | Çan | TR.CK.CA | 0 |  | District | District |  |  | (POLYGON ((26.98407936 39.86386108, 26.9731197... |
| 1 | nj696zj1674.2 | 235 | TUR | Turkey | 1 | Çanakkale | 2 | Ayvacık | TR.CK.AY | 0 |  | District | District |  |  | (POLYGON ((26.44069481 39.51625061, 26.4404163... |
| 2 | nj696zj1674.3 | 235 | TUR | Turkey | 1 | Çanakkale | 3 | Bayramiç | TR.CK.BA | 0 |  | District | District |  |  | (POLYGON ((26.4226017 39.69835663, 26.42402458... |
| 3 | nj696zj1674.4 | 235 | TUR | Turkey | 1 | Çanakkale | 4 | Biga | TR.CK.BI | 0 |  | District | District |  |  | (POLYGON ((27.57680511 40.31188583, 27.5807285... |
| 4 | nj696zj1674.5 | 235 | TUR | Turkey | 1 | Çanakkale | 5 | Bozcaada | TR.CK.BO | 0 |  | District | District |  |  | (POLYGON ((26.06097221 39.9406929, 26.06097221... |


Select only the necessary columns into df2

In [4]:
df2 = df[['name_1', 'name_2', 'geometry']]

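Keep only the rows whose province name (`name_1`) is 'Istanbul'

In [5]:
# .copy() makes df2 an independent frame, so the in-place edits below do not trigger SettingWithCopyWarning
df2 = df2[df2['name_1'] == 'Istanbul'].copy()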

Reset index

In [6]:
df2.reset_index(inplace=True, drop=True)
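
To make the filtering step concrete without downloading the file, here is a minimal standard-library sketch of the same idea applied to raw GeoJSON. The property names `name_1` and `name_2` follow the dataset; the toy coordinates and the helper name `features_for_province` are made up for illustration.

```python
# Toy FeatureCollection: property names follow the dataset, coordinates are made up.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name_1": "Çanakkale", "name_2": "Biga"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"name_1": "Istanbul", "name_2": "Adalar"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 0]]]}},
        {"type": "Feature",
         "properties": {"name_1": "Istanbul", "name_2": "Şile"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[4, 0], [5, 0], [5, 1], [4, 0]]]}},
    ],
}


def features_for_province(collection, province):
    """Return the features whose first-level name (name_1) equals `province`."""
    return [
        feature for feature in collection["features"]
        if feature["properties"].get("name_1") == province
    ]


istanbul = features_for_province(feature_collection, "Istanbul")
boroughs = [feature["properties"]["name_2"] for feature in istanbul]
print(len(istanbul), boroughs)
```

With the real file, the number of filtered rows could be compared against the 39 districts mentioned in the Background section as a quick sanity check, keeping in mind that the 2015 dataset may not match the current district list exactly.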

Take a look at the dataframe

In [7]:
df2.head()

|   | name_1 | name_2 | geometry |
|---|--------|--------|----------|
| 0 | Istanbul | Çatalca | (POLYGON ((28.54563522 41.38847351, 28.5436115... |
| 1 | Istanbul | Çekmekoy | (POLYGON ((29.17944717 41.02779007, 29.1788482... |
| 2 | Istanbul | Adalar | (POLYGON ((29.0518055 40.91485977, 29.0518055 ... |
| 3 | Istanbul | Ümraniye | (POLYGON ((29.17944717 41.02779007, 29.1770515... |
| 4 | Istanbul | Üsküdar | (POLYGON ((28.20846939 41.06986237, 28.2070827... |


Since all remaining rows belong to Istanbul, drop the "name_1" column

In [8]:
df2.drop(['name_1'], axis=1, inplace=True)

Rename the "name_2" column to "Borough"

In [9]:
df2.rename(columns= {'name_2': 'Borough'}, inplace=True)

Take another look at the dataframe

In [10]:
df2.head()

|   | Borough | geometry |
|---|---------|----------|
| 0 | Çatalca | (POLYGON ((28.54563522 41.38847351, 28.5436115... |
| 1 | Çekmekoy | (POLYGON ((29.17944717 41.02779007, 29.1788482... |
| 2 | Adalar | (POLYGON ((29.0518055 40.91485977, 29.0518055 ... |
| 3 | Ümraniye | (POLYGON ((29.17944717 41.02779007, 29.1770515... |
| 4 | Üsküdar | (POLYGON ((28.20846939 41.06986237, 28.2070827... |


Delete the first dataframe to save memory

In [11]:
del df

Find the center (centroid) of each borough

In [12]:
df2['Center'] = df2['geometry'].centroid
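
For readers curious about what a polygon centroid is, the following is a minimal pure-Python sketch of the area-weighted centroid of a single ring, computed with the shoelace formula. It is an illustration under simplifying assumptions, not the library's implementation: it handles one simple ring without holes, treats longitude and latitude as planar x and y coordinates, and the function name `polygon_centroid` is hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    ring = list(ring)
    # Close the ring if the first vertex is not repeated at the end.
    if ring[0] != ring[-1]:
        ring.append(ring[0])

    area2 = 0.0      # twice the signed area of the polygon
    cx = cy = 0.0    # accumulated centroid numerators
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# A 4 x 2 rectangle with made-up coordinates; its centroid is its middle point.
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```

Because the coordinates here are unprojected degrees, centroids obtained this way are planar approximations. For borough-sized areas that is usually adequate for picking a search center, but reprojecting to a suitable projected coordinate system first would be the more careful choice.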

Find the latitude and longitude of each borough center

In [13]:
df2['Longitude'] = df2['Center'].apply(lambda p: p.x)
df2['Latitude'] = df2['Center'].apply(lambda p: p.y)

Look at the dataframe again

In [14]:
df2

|   | Borough | geometry | Center | Longitude | Latitude |
|---|---------|----------|--------|-----------|----------|
| 0 | Çatalca | (POLYGON ((28.54563522 41.38847351, 28.5436115... | POINT (28.40395065280459 41.3036808013895) | 28.403951 | 41.303681 |
| 1 | Çekmekoy | (POLYGON ((29.17944717 41.02779007, 29.1788482... | POINT (29.27865428644358 41.08152978930723) | 29.278654 | 41.08153 |
| 2 | Adalar | (POLYGON ((29.0518055 40.91485977, 29.0518055 ... | POINT (29.09589217657206 40.87084018625993) | 29.095892 | 40.87084 |
| 3 | Ümraniye | (POLYGON ((29.17944717 41.02779007, 29.1770515... | POINT (29.12329130594628 41.03728585521165) | 29.123291 | 41.037286 |
| 4 | Üsküdar | (POLYGON ((28.20846939 41.06986237, 28.2070827... | POINT (28.28742452864804 41.15882206752408) | 28.287425 | 41.158822 |
| 5 | Arnavutkoy | (POLYGON ((28.83302498 41.13262177, 28.8330497... | POINT (28.6924409915991 41.22317279934558) | 28.692441 | 41.223173 |
| 6 | Atasehir | (POLYGON ((29.06557083 41.01100922, 29.0655708... | POINT (29.11740618068368 41.00336549589552) | 29.117406 | 41.003365 |
| 7 | Avcılar | (POLYGON ((28.76737785 40.992836, 28.76796532 ... | POINT (28.72650253033136 41.02166473235334) | 28.726503 | 41.021665 |
| 8 | Şişli | (POLYGON ((29.00967216 41.10829926, 29.0085907... | POINT (28.98859883919462 41.10578935974578) | 28.988599 | 41.105789 |
| 9 | Şile | (POLYGON ((29.36600304 41.04473114, 29.3568592... | POINT (29.58191381839693 41.10711500169273) | 29.581914 | 41.107115 |
