# Battle of the Neighborhoods


### Introduction

This project is for anyone planning to open a coffee house in Ankara, Turkey, and it suggests the best locations for one. Ankara is the capital of Turkey, with a population of about 5 million. Coffee has deep roots in Turkish culture, and it is becoming more popular because of its affordable prices. Coffee consumption increased by 13%, and average consumption was 1.1 kg per person in 2018. This report explores which neighborhoods of Ankara have the most, as well as the best, coffee houses. It also answers the questions “Where should I open a coffee house?” and “Where should I stay if I want a tasty coffee?”

### Data

The districts of Ankara are obtained from https://en.wikipedia.org/wiki/Ankara_Province

Latitude and Longitude values are obtained by using "geocoder".

All location-related data will be obtained using the Foursquare API and Python libraries.


In [12]:
!pip install geocoder
!pip install bs4
import requests
import pandas as pd
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
import geocoder



In [13]:
wiki_link = 'https://en.wikipedia.org/wiki/Ankara_Province'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wiki_page = requests.get(wiki_link, headers = headers)
wiki_page

<Response [200]>

In [14]:
soup = BeautifulSoup(wiki_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody

In [15]:
rows = table.find_all('tr')

In [16]:
columns = [i.text.replace('\n', '') for i in rows[0].find_all('th')]
columns

['District', 'Population (2017)', 'Area (km²)', 'Density (per km²)']

In [17]:
df_ankara = pd.DataFrame(columns = columns)

In [18]:
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')

    # Strip newlines and non-breaking spaces from every cell.
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]

    # Store only rows whose cell count matches the header.
    # (DataFrame.append was removed in pandas 2.0; use pd.concat there.)
    if len(values) == len(columns):
        df_ankara = df_ankara.append(pd.Series(values, index=columns), ignore_index=True)

In [19]:
df_ankara.head()

| | District | Population (2017) | Area (km²) | Density (per km²) |
|---|---|---|---|---|
| 0 | Akyurt | 32.863 | 369.0 | 89.0 |
| 1 | Altındağ | 371.366 | 123.0 | 3.019 |
| 2 | Ayaş | 12.289 | 1.041 | 12.0 |
| 3 | Bala | 21.682 | 1.851 | 12.0 |
| 4 | Beypazarı | 48.476 | 1.697 | 29.0 |


In [20]:
def get_latlng(district):
    """Return [lat, lng] for a district of Ankara using the ArcGIS geocoder."""
    lat_lng_coords = None

    # Retry until the geocoder answers; this never ends if the name cannot be resolved.
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Ankara, Turkey'.format(district))
        lat_lng_coords = g.latlng
    return lat_lng_coords

districts = df_ankara['District']
coordinates = [get_latlng(district) for district in districts.tolist()]

# Work on a copy so that df_ankara keeps its original columns.
df_ankara_loc = df_ankara.copy()

df_ankara_coordinates = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])
df_ankara_loc['Latitude'] = df_ankara_coordinates['Latitude']
df_ankara_loc['Longitude'] = df_ankara_coordinates['Longitude']

# Keep only the district name and its coordinates.
df_ankara_loc.drop(columns=["Population (2017)", "Density (per km²)", "Area (km²)"], inplace=True)
df_ankara_loc.head()

| | District | Latitude | Longitude |
|---|---|---|---|
| 0 | Akyurt | 40.13082 | 33.08719 |
| 1 | Altındağ | 39.94171 | 32.85445 |
| 2 | Ayaş | 40.01516 | 32.3327 |
| 3 | Bala | 39.55391 | 33.12352 |
| 4 | Beypazarı | 40.16811 | 31.92052 |
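
The `while` loop in `get_latlng` keeps calling the geocoder until it returns coordinates, so a name the service cannot resolve would stall the notebook. A minimal sketch of a bounded variant follows. The names `get_latlng_bounded`, `lookup`, `max_attempts`, and `delay_seconds` are hypothetical and introduced only for this example; `lookup` stands in for a call such as `geocoder.arcgis(query).latlng`.

```python
import time


def get_latlng_bounded(query, lookup, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is a hypothetical stand-in for the geocoder call; it should
    return [lat, lng] on success and None on failure.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None


# Fake lookup that fails twice before answering, to exercise the retry path.
calls = {"n": 0}


def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [40.13082, 33.08719]


result = get_latlng_bounded("Akyurt, Ankara, Turkey", flaky_lookup, delay_seconds=0)
print(result)
```

A caller can then decide what to do with a `None` result, for example skipping the district or logging it for manual lookup.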


In [21]:
import numpy as np
import json 
from geopy.geocoders import Nominatim 

# requests is already imported above; pandas.io.json.json_normalize is deprecated since pandas 1.0.
from pandas import json_normalize

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print("Libraries imported")

Libraries imported


In [22]:
from geopy.geocoders import Nominatim 

address = "Çankaya, Ankara"

geolocator = Nominatim(user_agent = "Ankara_explorer")

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print("The geographical coordinates of Ankara are {}, {}.".format(latitude, longitude))

The geographical coordinates of Ankara are 39.9207893, 32.8540412.


In [23]:
map_ankara = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, label in zip(df_ankara_loc["Latitude"], df_ankara_loc["Longitude"], df_ankara_loc["District"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_ankara)  
    
map_ankara

In [24]:
CLIENT_ID = "FBMIJHTR42BI0I5JUWFNREL3NTHT553IQUQ3AM1NGMQDDSHJ"
CLIENT_SECRET = "OVH1MFLGAQG4O4IKBKKHQLZJTDBL2QNFSV2R1NKA4YUBINQC" 
VERSION = "20180605"

In [25]:
df_ankara_loc.loc[0, "District"]

'Akyurt'

In [26]:
neighborhood_latitude = df_ankara_loc.loc[0, "Latitude"]
neighborhood_longitude = df_ankara_loc.loc[0, "Longitude"] 

neighborhood_name = df_ankara_loc.loc[0, "District"] 

print("Latitude and longitude values of the neighborhood {} are {}, {}.".format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude values of the neighborhood Akyurt are 40.13082000000003, 33.08719000000008.


In [27]:
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=FBMIJHTR42BI0I5JUWFNREL3NTHT553IQUQ3AM1NGMQDDSHJ&client_secret=OVH1MFLGAQG4O4IKBKKHQLZJTDBL2QNFSV2R1NKA4YUBINQC&v=20180605&ll=40.13082000000003,33.08719000000008&radius=500&limit=100'
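
The request URL above is assembled with string formatting and carries the client ID and secret in plain text. One way to keep the keys out of the notebook and let the standard library handle query-string encoding is sketched below. The environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions made for this example; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius=500, limit=100, version="20180605"):
    """Build the explore URL with credentials read from the environment."""
    params = {
        # Hypothetical variable names; placeholders are used if they are unset.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


example_url = build_explore_url(40.13082, 33.08719)
print(example_url)
```

The resulting string can be passed to `requests.get` exactly like the hand-formatted `url`.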

In [28]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6120b8dfe6d1c6746b3d24ee'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Akyurt',
  'headerFullLocation': 'Akyurt, Ankara',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 17,
  'suggestedBounds': {'ne': {'lat': 40.13532000450003,
    'lng': 33.093064640939495},
   'sw': {'lat': 40.126319995500026, 'lng': 33.08131535906066}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5399e2be498ee5525aaee721',
       'name': 'Akyurt kültür parkı',
       'location': {'lat': 40.1331231812071,
        'lng': 33.08507121328348,
        'labeledLatLngs': [{'label': 'display',
          'lat': 40.1331231812071,
          'lng': 33.0850712132

In [29]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [30]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON


filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]


nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)


nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(20)


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Akyurt kültür parkı | Park | 40.133123 | 33.085071 |
| 1 | Meşhur Köfteci Bodur | Turkish Restaurant | 40.130719 | 33.086776 |
| 2 | Yurtalan Restaurant | Kebab Restaurant | 40.132672 | 33.084306 |
| 3 | Beyazıt Sofrası | Steakhouse | 40.130542 | 33.081901 |
| 4 | Onur pide ve kebap salonu | Kebab Restaurant | 40.130807 | 33.086422 |
| 5 | Yurtalan | Turkish Restaurant | 40.132743 | 33.084385 |
| 6 | Akyurt Sofrası | Turkish Restaurant | 40.130854 | 33.086709 |
| 7 | Benliler Supermarket | Convenience Store | 40.130252 | 33.081423 |
| 8 | Yurtalan Restaurant | Doner Restaurant | 40.132641 | 33.084492 |
| 9 | şişman pasta&cafe | Café | 40.13082 | 33.085025 |
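
To make the flattening step concrete, here is a small plain-Python sketch of what the cell extracts from each item. The `sample_item` dictionary is a hand-made stand-in: its name, category, and coordinates are taken from the first venue shown above, and its nesting is assumed from the keys the notebook code accesses. The helper names `first_category_name` and `flatten_item` are hypothetical.

```python
def first_category_name(categories):
    """Return the first category name, or None when the list is empty."""
    if not categories:
        return None
    return categories[0]["name"]


def flatten_item(item):
    """Pick the four fields kept by the notebook from one explore item."""
    venue = item["venue"]
    return {
        "name": venue["name"],
        "categories": first_category_name(venue.get("categories", [])),
        "lat": venue["location"]["lat"],
        "lng": venue["location"]["lng"],
    }


# Hand-made stand-in for one entry of results['response']['groups'][0]['items'].
sample_item = {
    "venue": {
        "name": "Akyurt kültür parkı",
        "location": {"lat": 40.1331231812071, "lng": 33.08507121328348},
        "categories": [{"name": "Park"}],
    }
}

row = flatten_item(sample_item)
print(row)
```

Returning `None` for an empty category list mirrors `get_category_type` and avoids an `IndexError` on venues without a category.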


In [31]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

17 venues were returned by Foursquare.


In [32]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [33]:
ankara_venues = getNearbyVenues(names=df_ankara_loc['District'],
                                   latitudes=df_ankara_loc['Latitude'],
                                   longitudes=df_ankara_loc['Longitude']
                                  )

Akyurt
Altındağ
Ayaş
Bala
Beypazarı
Çamlıdere
Çankaya
Çubuk
Elmadağ
Etimesgut
Evren
Gölbaşı
Güdül
Haymana
Kahramankazan
Kalecik
Keçiören
Kızılcahamam
Mamak
Nallıhan
Polatlı
Pursaklar
Sincan
Şereflikoçhisar
Yenimahalle
Urban (9 districts)Altındağ, Çankaya, Etimesgut, Gölbaşı, Keçiören, Mamak, Pursaklar, Sincan, Yenimahalle
TOTAL
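
The last two names printed above, `Urban (9 districts)…` and `TOTAL`, look like summary rows from the Wikipedia table rather than districts, yet they are queried and later grouped as if they were places. A hedged sketch of filtering them out before calling `getNearbyVenues` follows. The prefix-based rule and the helper name `is_summary_row` are assumptions for illustration, and the rule is shown on a plain list instead of the DataFrame.

```python
def is_summary_row(name):
    """Heuristic: treat 'TOTAL' and 'Urban (...' rows as table summaries."""
    cleaned = name.strip()
    return cleaned.upper() == "TOTAL" or cleaned.startswith("Urban (")


# The tail of the printed list, copied from the output above.
names = [
    "Sincan",
    "Şereflikoçhisar",
    "Yenimahalle",
    "Urban (9 districts)Altındağ, Çankaya, Etimesgut, Gölbaşı, "
    "Keçiören, Mamak, Pursaklar, Sincan, Yenimahalle",
    "TOTAL",
]

districts_only = [name for name in names if not is_summary_row(name)]
print(districts_only)
```

With pandas, the same rule could be applied as a boolean mask, for example `df_ankara_loc[~df_ankara_loc['District'].map(is_summary_row)]`.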


In [34]:
print(ankara_venues.shape)
ankara_venues.head(250)

(1674, 7)


| | District | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Akyurt | 40.13082 | 33.08719 | Akyurt kültür parkı | 40.133123 | 33.085071 | Park |
| 1 | Akyurt | 40.13082 | 33.08719 | Yurtalan Restaurant | 40.132672 | 33.084306 | Kebab Restaurant |
| 2 | Akyurt | 40.13082 | 33.08719 | Beyazıt Sofrası | 40.130542 | 33.081901 | Steakhouse |
| 3 | Akyurt | 40.13082 | 33.08719 | Meşhur Köfteci Bodur | 40.130719 | 33.086776 | Turkish Restaurant |
| 4 | Akyurt | 40.13082 | 33.08719 | Onur pide ve kebap salonu | 40.130807 | 33.086422 | Kebab Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | Beypazarı | 40.16811 | 31.92052 | Sağlık Meslek Spor Salonu | 40.173256 | 31.933203 | Trail |
| 246 | Beypazarı | 40.16811 | 31.92052 | Tolunay Özaka Sağlık Meslek Lisesi | 40.173212 | 31.933535 | Volleyball Court |
| 247 | Beypazarı | 40.16811 | 31.92052 | optimum | 40.157738 | 31.913723 | Shopping Mall |
| 248 | Beypazarı | 40.16811 | 31.92052 | akyazı | 40.172203 | 31.936853 | Residential Building (Apartment / Condo) |


In [35]:
ankara_venues.groupby("District").count()

| District | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Akyurt | 28 | 28 | 28 | 28 | 28 | 28 |
| Altındağ | 100 | 100 | 100 | 100 | 100 | 100 |
| Ayaş | 36 | 36 | 36 | 36 | 36 | 36 |
| Bala | 13 | 13 | 13 | 13 | 13 | 13 |
| Beypazarı | 72 | 72 | 72 | 72 | 72 | 72 |
| Elmadağ | 53 | 53 | 53 | 53 | 53 | 53 |
| Etimesgut | 100 | 100 | 100 | 100 | 100 | 100 |
| Evren | 5 | 5 | 5 | 5 | 5 | 5 |
| Gölbaşı | 100 | 100 | 100 | 100 | 100 | 100 |
| Güdül | 10 | 10 | 10 | 10 | 10 | 10 |


In [36]:
print('There are {} unique venue categories.'.format(len(ankara_venues['Venue Category'].unique())))

There are 225 unique venue categories.


In [37]:
ankara_onehot = pd.get_dummies(ankara_venues[['Venue Category']], prefix="", prefix_sep="")

ankara_onehot['Neighborhood'] = ankara_venues['District'] 

fixed_columns = [ankara_onehot.columns[-1]] + list(ankara_onehot.columns[:-1])
ankara_onehot = ankara_onehot[fixed_columns]

ankara_onehot

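| | Neighborhood | ATM | Advertising Agency | Airport Lounge | Antique Shop | Aquarium | Arcade | Art Gallery | Art Museum | Arts & Crafts Store | ... | Turkish Home Cooking Restaurant | Turkish Restaurant | Used Bookstore | Video Game Store | Vineyard | Volleyball Court | Water Park | Wedding Hall | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akyurt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Akyurt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Akyurt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Akyurt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Akyurt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1669 | TOTAL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1670 | TOTAL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1671 | TOTAL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1672 | TOTAL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |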


In [38]:
ankara_onehot.shape

(1674, 226)

In [39]:
ankara_grouped = ankara_onehot.groupby('Neighborhood').mean().reset_index()
ankara_grouped

| | Neighborhood | ATM | Advertising Agency | Airport Lounge | Antique Shop | Aquarium | Arcade | Art Gallery | Art Museum | Arts & Crafts Store | ... | Turkish Home Cooking Restaurant | Turkish Restaurant | Used Bookstore | Video Game Store | Vineyard | Volleyball Court | Water Park | Wedding Hall | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akyurt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.107143 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Altındağ | 0.0 | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.02 | 0.02 | 0.0 | ... | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Ayaş | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.138889 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Bala | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.076923 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.153846 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Beypazarı | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.013889 | 0.0 | 0.0 | 0.013889 | ... | 0.013889 | 0.083333 | 0.0 | 0.0 | 0.0 | 0.013889 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Elmadağ | 0.018868 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.018868 | 0.0 | 0.0 | 0.0 | 0.0 | 0.018868 | 0.0 | 0.0 | 0.0 |
| 6 | Etimesgut | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.09 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 7 | Evren | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Gölbaşı | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 |
| 9 | Güdül | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [40]:
ankara_grouped.shape


(27, 226)
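
Taking the mean of one-hot columns per district yields the share of that district's venues falling in each category. The plain-Python sketch below reproduces that calculation with `collections.Counter` on a few made-up rows. The data and the helper `category_shares` are illustrative only and do not come from the Foursquare results.

```python
from collections import Counter, defaultdict


def category_shares(rows):
    """Map each district to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for district, category in rows:
        counts[district][category] += 1
    shares = {}
    for district, counter in counts.items():
        total = sum(counter.values())
        shares[district] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up (district, category) pairs for illustration.
toy_rows = [
    ("A", "Café"),
    ("A", "Café"),
    ("A", "Park"),
    ("A", "Hotel"),
    ("B", "Park"),
    ("B", "Café"),
]

shares = category_shares(toy_rows)
print(shares["A"])
```

The same reading applies to the grouped table: Akyurt's value of 0.107143 for Turkish Restaurant is consistent with 3 of its 28 venues.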

In [41]:
num_top_venues = 10

for hood in ankara_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ankara_grouped[ankara_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Akyurt----
                venue  freq
0                Farm  0.14
1  Turkish Restaurant  0.11
2                Café  0.07
3    Kebab Restaurant  0.07
4                Park  0.07
5   Convenience Store  0.04
6              Garden  0.04
7   Electronics Store  0.04
8             Stadium  0.04
9    Botanical Garden  0.04


----Altındağ----
                venue  freq
0                Café  0.10
1      History Museum  0.07
2       Jewelry Store  0.05
3               Hotel  0.05
4  Turkish Restaurant  0.04
5       Historic Site  0.04
6          Restaurant  0.04
7             Theater  0.04
8         Coffee Shop  0.03
9        Antique Shop  0.03


----Ayaş----
                     venue  freq
0       Turkish Restaurant  0.14
1                     Park  0.08
2                     Lake  0.06
3               Restaurant  0.06
4                 Tea Room  0.06
5            Deli / Bodega  0.03
6  Fruit & Vegetable Store  0.03
7               Food Court  0.03
8  Comfort Food Restaurant  0.03
9    

In [42]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [43]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
ankara_venues_sorted = pd.DataFrame(columns=columns)
ankara_venues_sorted['Neighborhood'] = ankara_grouped['Neighborhood']

for ind in np.arange(ankara_grouped.shape[0]):
    ankara_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ankara_grouped.iloc[ind, :], num_top_venues)

ankara_venues_sorted.head(27)


| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akyurt | Farm | Turkish Restaurant | Café | Kebab Restaurant | Park | Health & Beauty Service | Stadium | Garden | Basketball Stadium | Big Box Store |
| 1 | Altındağ | Café | History Museum | Jewelry Store | Hotel | Historic Site | Restaurant | Turkish Restaurant | Theater | Antique Shop | Furniture / Home Store |
| 2 | Ayaş | Turkish Restaurant | Park | Tea Room | Lake | Restaurant | Plaza | Shopping Mall | Café | Coffee Shop | Comfort Food Restaurant |
| 3 | Bala | Turkish Restaurant | Business Service | Electronics Store | Café | Mountain | Steakhouse | Park | Bakery | Big Box Store | Arcade |
| 4 | Beypazarı | Turkish Restaurant | History Museum | Restaurant | Tea Room | Scenic Lookout | Motel | Bakery | Garden | Hotel | Bed & Breakfast |
| 5 | Elmadağ | Café | Park | Campground | Bar | Kebab Restaurant | Dessert Shop | Convenience Store | Hobby Shop | Jewelry Store | Kofte Place |
| 6 | Etimesgut | Turkish Restaurant | Café | Restaurant | Bagel Shop | Steakhouse | Arcade | Snack Place | Shopping Mall | Kebab Restaurant | Dessert Shop |
| 7 | Evren | Park | Grocery Store | Steakhouse | River | Yoga Studio | Football Stadium | Food Truck | Food Court | Food & Drink Shop | Flower Shop |
| 8 | Gölbaşı | Café | Turkish Restaurant | Park | Restaurant | Coffee Shop | Arcade | Breakfast Spot | Dessert Shop | Middle Eastern Restaurant | Steakhouse |
| 9 | Güdül | Plaza | Department Store | Turkish Restaurant | Furniture / Home Store | Athletics & Sports | Electronics Store | Castle | Botanical Garden | Diner | Fish & Chips Shop |
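
The column labels rely on the `indicators` list for the first three positions and fall back to "th" for the rest, which is correct for `num_top_venues = 10`. With a larger value, positions such as 21 would need "st" again. A small sketch of a general suffix helper follows; `ordinal` and `labels` are hypothetical names, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 always take 'th'.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Rebuild the same column labels as the cell above for ten venues.
labels = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(labels[:4])
```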


In [44]:
from sklearn.cluster import KMeans


In [46]:
ks = 3

ankara_grouped_clustering = ankara_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=ks, random_state=0).fit(ankara_grouped_clustering)

kmeans.labels_[1:10]

array([1, 1, 1, 1, 1, 1, 2, 1, 0], dtype=int32)
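
Note that `kmeans.labels_[1:10]` shows the labels at positions 1 through 9, so the label of the first district is not printed. A short sketch of tallying cluster sizes follows. It uses only the nine labels visible in the output above; with the fitted model at hand, the full `kmeans.labels_` array would be passed instead.

```python
from collections import Counter

# The nine labels printed above (positions 1..9 of kmeans.labels_).
printed_labels = [1, 1, 1, 1, 1, 1, 2, 1, 0]

cluster_sizes = Counter(printed_labels)
for cluster, size in sorted(cluster_sizes.items()):
    print("cluster {}: {} districts".format(cluster, size))
```

A strongly uneven split like this one is a reasonable prompt to inspect which districts fall into the small clusters before interpreting them.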