# Coffee Shop in Baku

<h2>Introduction</h2>

A group of friends has decided to pool some funding and open a new coffee shop. When it came to choosing the most suitable location, however, the funders' opinions split. Some wanted the busiest suburb, where there are already many similar places; others wanted a suburb with few or no existing coffee shops. Still others proposed the most business-heavy suburb, so that there would always be traffic around.

While it is impossible to predict with certainty whether a new coffee shop will succeed, this project aims to profile the candidate suburbs for the new coffee shop and to give the decision-makers the insights they need.

<h2>Data Sources</h2>

The data used in this project comes from two separate sources:

1) http://www.mapcrow.info/ - this website provides an easy way to extract suburb names from the complicated OpenStreetMap data source. By parsing this site we can extract the names and geographical coordinates of all the city's suburbs without having to go through the large database of objects in the OSM project.

2) Foursquare - this is a required step. Foursquare data will be used to find popular venues in each suburb and to determine the predominant venue category for each.

In [28]:
# Importing modules to work with

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import re

#geocoder is unused and kept only for illustration; geopy provides the Nominatim geocoder used further down
import geocoder
import geopy

#This is for the visualizations
import folium

In [2]:
#Downloading the Mapcrow page and parsing its HTML
url = 'http://www.mapcrow.info/Baku-AZ-suburbs'
r = requests.get(url).text
soup = BeautifulSoup(r, 'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="Neighborhoods,suburbs, Baku neighborhoods, Baku suburbs, neighborhoods" name="keywords"/>
  <meta content="Neighborhoods or suburbs in Baku" name="description"/>
  <title>
   Neighborhoods and Suburbs -- Baku
  </title>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="http://www.mapcrow.info/colscols.css" rel="stylesheet"/>
  <link href="http://www.mapcrow.info/apple-touch-icon.png" rel="apple-touch-icon"/>
  <link href="http://www.mapcrow.info/apple-touch-icon-152x152.png" rel="apple-touch-icon" sizes="152x152"/>
  <link href="http://www.mapcrow.info/apple-touch-icon-167x167.png" rel="apple-touch-icon" sizes="167x167"/>
  <link href="http://www.mapcrow.info/apple-touch-icon-180x180.png" rel="apple-touch-icon" sizes="180x180"/>
  <link href="http://www.mapcrow.info/icon-hires.png" rel="icon" sizes

In [51]:
#Here we verify that the information we need is stored in the onclick attribute of the button elements
button = [b.get('onclick') for b in soup.find_all('button')]
print(button)

["maparea('40.4082172','49.8079787','1-ci Mikrorayon'); return false;", "maparea('40.4090360','49.8187820','3-ci Mikrorayon'); return false;", "maparea('40.4164879','49.8116795','4-cü Mikrorayon'); return false;", "maparea('40.3411911','49.8076646','Badamdar'); return false;", "maparea('40.3516731','49.8319920','Bail'); return false;", "maparea('40.4228331','49.9624408','Bakikhanov'); return false;", "maparea('40.4187325','49.8497934','Dərnəgül'); return false;", "maparea('40.3935465','49.8954830','Keşlə'); return false;", "maparea('40.3953725','49.8794118','Montin'); return false;", "maparea('40.3988273','49.9794018','Qaraçuxur'); return false;", "maparea('40.4310813','49.8362475','Rəsulzadə'); return false;", "maparea('40.4333944','49.7624363','Sulutəpə'); return false;", "maparea('40.4203738','49.7665269','Xocəsən'); return false;", "maparea('40.3794440','49.8019244','Yasamal'); return false;", "maparea('40.4302954','50.0374178','Yeni Suraxanı'); return false;", "maparea('40.6238164
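The lookup above relies on every `<button>` carrying an `onclick` attribute; a button without one would put `None` into the list and break the later string replacements. As a minimal sketch of the same extraction with that case filtered out, the block below uses only the standard library's `html.parser`. The `ButtonOnclickCollector` class and the `sample_html` fragment are assumptions made for illustration, not the real page markup.

```python
from html.parser import HTMLParser


class ButtonOnclickCollector(HTMLParser):
    """Collect the onclick attribute of every <button> element."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            # attrs is a list of (name, value) pairs
            value = dict(attrs).get("onclick")
            if value is not None:
                self.onclicks.append(value)


# Hypothetical fragment shaped like the page's buttons (not the real markup)
sample_html = """
<button onclick="maparea('40.3411911','49.8076646','Badamdar'); return false;">Badamdar</button>
<button onclick="maparea('40.3516731','49.8319920','Bail'); return false;">Bail</button>
<button>no handler</button>
"""

collector = ButtonOnclickCollector()
collector.feed(sample_html)
print(collector.onclicks)
```

With BeautifulSoup the equivalent guard would be to skip entries where `b.get('onclick')` is `None`.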

In [52]:
#We extract the data we require from the button class
button = [m.replace("maparea(",'') for m in button]
button = [m.replace("); return false;",'') for m in button]
#button = [m.replace("\"",'') for m in button]
button

["'40.4082172','49.8079787','1-ci Mikrorayon'",
 "'40.4090360','49.8187820','3-ci Mikrorayon'",
 "'40.4164879','49.8116795','4-cü Mikrorayon'",
 "'40.3411911','49.8076646','Badamdar'",
 "'40.3516731','49.8319920','Bail'",
 "'40.4228331','49.9624408','Bakikhanov'",
 "'40.4187325','49.8497934','Dərnəgül'",
 "'40.3935465','49.8954830','Keşlə'",
 "'40.3953725','49.8794118','Montin'",
 "'40.3988273','49.9794018','Qaraçuxur'",
 "'40.4310813','49.8362475','Rəsulzadə'",
 "'40.4333944','49.7624363','Sulutəpə'",
 "'40.4203738','49.7665269','Xocəsən'",
 "'40.3794440','49.8019244','Yasamal'",
 "'40.4302954','50.0374178','Yeni Suraxanı'",
 "'40.6238164','49.5576092','Zeynalabdin Tağıyev'"]

In [63]:
#This step cleans up the data extracted from the HTML.
#Latitude and longitude need to be floats, and names should be stripped of quotes for better readability.

df = pd.DataFrame((m.split(',') for m in button), columns=['lat','lon','hood'])
#astype returns a new Series and has no inplace option, so the result is assigned back
df['lat'] = df['lat'].str.replace("'", '').astype('float')
df['lon'] = df['lon'].str.replace("'", '').astype('float')
df['hood'] = df['hood'].str.replace("'", '')
df

,lat,lon,hood
0,40.408217,49.807979,1-ci Mikrorayon
1,40.409036,49.818782,3-ci Mikrorayon
2,40.416488,49.811679,4-cü Mikrorayon
3,40.341191,49.807665,Badamdar
4,40.351673,49.831992,Bail
5,40.422833,49.962441,Bakikhanov
6,40.418732,49.849793,Dərnəgül
7,40.393546,49.895483,Keşlə
8,40.395373,49.879412,Montin
9,40.398827,49.979402,Qaraçuxur


In [64]:
#Defining the address to geocode and finding where Baku is
address = 'Baku, Azerbaijan'

geolocator = geopy.Nominatim(user_agent="qaqulik")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Baku are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Baku are 40.3754434, 49.8326748.


In [67]:
# create map of Baku using latitude and longitude values
baku = folium.Map(location=[latitude, longitude], zoom_start=12)

#add markers to the map
#each marker represents a neighborhood of Baku

for lat, lng, hood in zip(df['lat'], df['lon'], df['hood']):
    label = folium.Popup(hood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(baku)

baku

In [68]:
#Defining API credentials
CLIENT_ID = 'VR0U1WLKIP3ZMFOBPHGN1KHDPULY4AICGYLTLMVZOLTUBAWL' # your Foursquare ID
CLIENT_SECRET = '03URU1P3WMPEEJHKEMHH5RH55TSKP3MGF4XQM3NN3BRF3HEX' # your Foursquare Secret
VERSION = '20200220' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: VR0U1WLKIP3ZMFOBPHGN1KHDPULY4AICGYLTLMVZOLTUBAWL
CLIENT_SECRET:03URU1P3WMPEEJHKEMHH5RH55TSKP3MGF4XQM3NN3BRF3HEX
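The cell above hard-codes the credentials and prints them into the notebook output, so anyone who can read the notebook can read the secret. A common alternative, sketched here under assumptions, is to read the values from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical and chosen only for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(value, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with placeholder values instead of real credentials
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In the notebook itself the call would be `load_foursquare_credentials()` with no argument, so that the real environment is used.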


In [69]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list sits under 'categories' or, in flattened results, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [76]:
#Borrowing and slightly modifying the function from the lab

def getNearbyVenues(names, latitudes, longitudes, radius=3000, limit=200):
    """Collect the venues that Foursquare returns around each suburb centre into one DataFrame."""

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-suburb lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Hood',
                             'Hood Latitude',
                             'Hood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
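The list comprehension in `getNearbyVenues` assumes that every response contains `response.groups[0].items` and that every venue has at least one category; a rate-limited or empty response, or a venue with no category, would raise an exception halfway through the loop. The sketch below separates the parsing into a small function that tolerates those cases. `extract_venue_rows` and `sample_payload` are illustrative names, and the payload is hand-written to mirror only the keys the notebook reads, not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore response into row tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Venues without a category get None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload mirroring only the keys used above (not real API output)
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A", "location": {"lat": 40.40, "lng": 49.81},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Venue B", "location": {"lat": 40.41, "lng": 49.82},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows(sample_payload, "Badamdar", 40.3411911, 49.8076646)
print(rows)

# A payload without a "response" key yields no rows instead of an exception
print(extract_venue_rows({"meta": {"code": 429}}, "Badamdar", 40.3411911, 49.8076646))
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without calling the API.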

In [77]:
baku_venues = getNearbyVenues(names=df['hood'],
                              latitudes=df['lat'],
                              longitudes=df['lon'])

1-ci Mikrorayon
3-ci Mikrorayon
4-cü Mikrorayon
Badamdar
Bail
Bakikhanov
Dərnəgül
Keşlə
Montin
Qaraçuxur
Rəsulzadə
Sulutəpə
Xocəsən
Yasamal
Yeni Suraxanı
Zeynalabdin Tağıyev
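With the default `radius=3000`, the search circles of neighbouring suburbs can overlap, so the same venues may be counted for more than one suburb. That may be one reason adjacent suburbs end up with similar venue profiles. As a rough check, the sketch below computes the great-circle (haversine) distance between a few of the scraped centres and compares it with twice the radius. `haversine_m` and `EARTH_RADIUS_M` are names introduced for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Centres taken from the scraped list
centres = {
    "1-ci Mikrorayon": (40.4082172, 49.8079787),
    "3-ci Mikrorayon": (40.4090360, 49.8187820),
    "Zeynalabdin Tağıyev": (40.6238164, 49.5576092),
}

radius = 3000  # same default as getNearbyVenues
names = sorted(centres)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = haversine_m(*centres[a], *centres[b])
        overlap = d < 2 * radius  # two circles of equal radius overlap below 2 * radius
        print("{} - {}: {:.0f} m, circles overlap: {}".format(a, b, d, overlap))
```

The first two centres are roughly 900 m apart, so their 3000 m circles share most of their area, while Zeynalabdin Tağıyev lies tens of kilometres away from both.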


In [80]:
baku_venues.head()

,Hood,Hood Latitude,Hood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1-ci Mikrorayon,40.408217,49.807979,Edem Fitness & Spa,40.407561,49.811484,Gym Pool
1,1-ci Mikrorayon,40.408217,49.807979,Bravo Hipermarket 20 Yanvar,40.401658,49.810682,Food & Drink Shop
2,1-ci Mikrorayon,40.408217,49.807979,Beyaz Lotus Spa Merkezi,40.402687,49.806551,Spa
3,1-ci Mikrorayon,40.408217,49.807979,Atlantis,40.402555,49.806106,Restaurant
4,1-ci Mikrorayon,40.408217,49.807979,Qafqaz Baku City Hotel Fitness&SPA Club 16,40.396232,49.816437,Gym


In [78]:
baku_venues.groupby('Hood').count()

Hood,Hood Latitude,Hood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1-ci Mikrorayon,100,100,100,100,100,100
3-ci Mikrorayon,100,100,100,100,100,100
4-cü Mikrorayon,100,100,100,100,100,100
Badamdar,90,90,90,90,90,90
Bail,100,100,100,100,100,100
Bakikhanov,70,70,70,70,70,70
Dərnəgül,100,100,100,100,100,100
Keşlə,95,95,95,95,95,95
Montin,100,100,100,100,100,100
Qaraçuxur,60,60,60,60,60,60


In [79]:
print('There are {} unique categories.'.format(len(baku_venues['Venue Category'].unique())))

There are 150 unique categories.


In [81]:
#One hot encoding the categories of the venues
baku_onehot = pd.get_dummies(baku_venues[['Venue Category']], prefix="", prefix_sep="")
#And adding back the identifying columns
baku_onehot[['Hood', 'Hood Latitude', 'Hood Longitude']] = baku_venues[['Hood', 'Hood Latitude', 'Hood Longitude']]
#Then we rearrange the columns so that the columns with id info will be in front
cols = baku_onehot.columns.tolist()
cols = cols[-3:] + cols[:-3]
baku_onehot = baku_onehot[cols]
baku_onehot.head()

,Hood,Hood Latitude,Hood Longitude,Airport,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Asian Restaurant,...,Theme Park Ride / Attraction,Theme Restaurant,Track Stadium,Trail,Train Station,Turkish Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store
0,1-ci Mikrorayon,40.408217,49.807979,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1-ci Mikrorayon,40.408217,49.807979,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1-ci Mikrorayon,40.408217,49.807979,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1-ci Mikrorayon,40.408217,49.807979,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1-ci Mikrorayon,40.408217,49.807979,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [82]:
#Now let's group the categories by the Neighborhood
baku_grouped = baku_onehot.groupby(['Hood', 'Hood Latitude', 'Hood Longitude']).mean().reset_index()
baku_grouped

,Hood,Hood Latitude,Hood Longitude,Airport,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Asian Restaurant,...,Theme Park Ride / Attraction,Theme Restaurant,Track Stadium,Trail,Train Station,Turkish Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store
0,1-ci Mikrorayon,40.408217,49.807979,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
1,3-ci Mikrorayon,40.409036,49.818782,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
2,4-cü Mikrorayon,40.416488,49.811679,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0
3,Badamdar,40.341191,49.807665,0.0,0.0,0.0,0.0,0.011111,0.0,0.011111,...,0.022222,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.011111
4,Bail,40.351673,49.831992,0.0,0.01,0.0,0.0,0.02,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0
5,Bakikhanov,40.422833,49.962441,0.014286,0.0,0.0,0.014286,0.0,0.0,0.014286,...,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.014286,0.014286
6,Dərnəgül,40.418732,49.849793,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
7,Keşlə,40.393546,49.895483,0.0,0.0,0.010526,0.0,0.010526,0.0,0.010526,...,0.0,0.0,0.010526,0.010526,0.0,0.052632,0.0,0.0,0.0,0.0
8,Montin,40.395373,49.879412,0.0,0.0,0.0,0.0,0.01,0.01,0.01,...,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0
9,Qaraçuxur,40.398827,49.979402,0.0,0.0,0.0,0.016667,0.0,0.0,0.016667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
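Taking the mean of the one-hot columns per suburb gives the share of that suburb's venues that fall into each category. This is why Badamdar, with 90 venues, shows values such as 0.011111 (1/90) and Bakikhanov, with 70 venues, shows 0.014286 (1/70). The sketch below reproduces the same calculation in plain Python on toy data, so the arithmetic can be checked by hand; the suburb names and categories in `venues` are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy (hood, category) pairs for illustration only, not the Foursquare results
venues = [
    ("Hood A", "Restaurant"),
    ("Hood A", "Restaurant"),
    ("Hood A", "Café"),
    ("Hood A", "Park"),
    ("Hood B", "Café"),
    ("Hood B", "Hotel"),
]

# Count how often each category occurs in each hood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Dividing each count by the hood's total reproduces the mean of the 0/1 columns
frequencies = {
    hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for hood, counter in counts.items()
}
print(frequencies)
```

Because the values are shares rather than counts, suburbs with fewer returned venues are still comparable with those that hit the 100-venue ceiling.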


In [96]:
#This function returns top n venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
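Many categories share exactly the same frequency (for example, several at 0.01 in one suburb), and `sort_values` with its default sorting algorithm does not promise a particular order among equal values. If a reproducible ranking matters, one option is to break ties explicitly. The sketch below does so alphabetically on a plain dictionary; `top_categories` and the values in `row` are illustrative assumptions, not part of the notebook.

```python
def top_categories(frequencies, n):
    """Return the n most frequent category names, ties broken alphabetically."""
    # Sort by descending frequency first, then by name for equal frequencies
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


# Toy frequencies for one suburb (illustrative values only)
row = {"Restaurant": 0.10, "Hotel": 0.05, "Park": 0.05, "Café": 0.04, "Spa": 0.05}
print(top_categories(row, 3))
```

In pandas, a comparable effect could be obtained by passing `kind='stable'` to `sort_values` after sorting the index alphabetically.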

In [97]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Hood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
baku_venues_sorted = pd.DataFrame(columns=columns)
baku_venues_sorted['Hood'] = baku_grouped['Hood']

for ind in np.arange(baku_grouped.shape[0]):
    baku_venues_sorted.iloc[ind, 1:] = return_most_common_venues(baku_grouped.iloc[ind, :], num_top_venues)

baku_venues_sorted.head()

,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1-ci Mikrorayon,Restaurant,Lounge,Park,Hotel,Middle Eastern Restaurant,Comfort Food Restaurant,Spa,Athletics & Sports,Grocery Store,Gym
1,3-ci Mikrorayon,Restaurant,Lounge,Hotel,Park,Comfort Food Restaurant,Gym / Fitness Center,Middle Eastern Restaurant,Spa,Café,Kebab Restaurant
2,4-cü Mikrorayon,Restaurant,Hotel,Park,Lounge,Turkish Restaurant,Comfort Food Restaurant,Café,Middle Eastern Restaurant,Spa,Supermarket
3,Badamdar,Restaurant,Hotel,Park,Café,Lounge,Tea Room,Eastern European Restaurant,Theme Park Ride / Attraction,Dessert Shop,Office
4,Bail,Park,Restaurant,Café,Coffee Shop,Plaza,Hotel,Caucasian Restaurant,Historic Site,Middle Eastern Restaurant,Spa
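The column names above are built by looking up the suffix in the three-element `indicators` list and falling back to `th` when the index runs past its end. That works for `num_top_venues = 10`, but it would label a 21st column as `21th`. A small helper that handles any positive integer is sketched below; `ordinal` is a name introduced for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) are exceptions to the last-digit rule
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Hood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```

For ten columns this produces the same headers as the loop in the notebook.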


In [98]:
baku_venues_sorted

,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1-ci Mikrorayon,Restaurant,Lounge,Park,Hotel,Middle Eastern Restaurant,Comfort Food Restaurant,Spa,Athletics & Sports,Grocery Store,Gym
1,3-ci Mikrorayon,Restaurant,Lounge,Hotel,Park,Comfort Food Restaurant,Gym / Fitness Center,Middle Eastern Restaurant,Spa,Café,Kebab Restaurant
2,4-cü Mikrorayon,Restaurant,Hotel,Park,Lounge,Turkish Restaurant,Comfort Food Restaurant,Café,Middle Eastern Restaurant,Spa,Supermarket
3,Badamdar,Restaurant,Hotel,Park,Café,Lounge,Tea Room,Eastern European Restaurant,Theme Park Ride / Attraction,Dessert Shop,Office
4,Bail,Park,Restaurant,Café,Coffee Shop,Plaza,Hotel,Caucasian Restaurant,Historic Site,Middle Eastern Restaurant,Spa
5,Bakikhanov,Restaurant,Department Store,Café,Hotel,Fast Food Restaurant,Tea Room,Pub,Soccer Stadium,Farmers Market,Women's Store
6,Dərnəgül,Restaurant,Hotel,Turkish Restaurant,Park,Steakhouse,Comfort Food Restaurant,Eastern European Restaurant,Lounge,Café,Middle Eastern Restaurant
7,Keşlə,Restaurant,Hotel,Turkish Restaurant,Lounge,Eastern European Restaurant,Tea Room,Coffee Shop,Café,Brewery,Playground
8,Montin,Hotel,Restaurant,Café,Park,Lounge,Coffee Shop,Tea Room,Turkish Restaurant,Middle Eastern Restaurant,Gym / Fitness Center
9,Qaraçuxur,Café,Restaurant,Comfort Food Restaurant,Department Store,Fast Food Restaurant,Grocery Store,Park,Tea Room,Diner,Pub
