# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The goal of this project is to find the most promising kind of venue (the one with the highest probability of strong demand) to open in the downtown of a capital city. More specifically, this report is aimed at people interested in opening a business in the center of **Athens, Greece**.

To maximize the chances of success, venue data from the centers of seven big cities will be compared with data from downtown Athens to conclude **what type of venue is recommended to open**. Despite extensive globalization, there is a chance that downtown Athens is missing something that is popular elsewhere.

The data will be collected via the Foursquare API. To improve the odds of a useful result, the **venues of every city center** will be examined. The cities to be explored are **Cape Town, Moscow, Paris, Rio de Janeiro, San Francisco, Seoul, Sydney**.

## Data <a name="data"></a>

The following data will be needed to extract/generate the required information:
* the number of venues, their type and location in every downtown
* the number of tips per venue. A venue that is frequently tipped indicates that people are interested in it and want to share their experience with other users, so we will use tips per category as a feature for the final recommendation. Because Venue Tips is a premium endpoint (it is not free), we limit this lookup to 7 venues per city.

The above data will be obtained from the **Foursquare API**.

**To Explore venues**
> `https://api.foursquare.com/v2/venues/`**explore**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&limit=`**LIMIT**
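
The URL template above can also be assembled from a parameter dictionary, which avoids mistakes in a long format string. The following is a minimal sketch using only the standard library; the function name `build_explore_url` and the placeholder credentials are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, version, limit, radius=None):
    """Return the explore URL for one city center (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "limit": limit,
    }
    if radius is not None:
        params["radius"] = radius  # search radius in meters
    # safe="," keeps the comma in "lat,lng" readable instead of "%2C"
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"


# Placeholder credentials; real values come from a Foursquare developer account.
url = build_explore_url("CLIENT_ID", "CLIENT_SECRET",
                        -22.9043934, -43.1830653, "20200702", 50, radius=4500)
print(url)
```

Building the query from a dictionary also makes it easy to add or drop optional parameters such as `radius` per request.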

### City Centers

Let's first define the address for every city center.

* **Athens**: Ermou street, Athens, Greece
* **Cape Town**: Cape Town City Centre, Cape Town, South Africa
* **Moscow**: Red Square, Moscow, Russia
* **Paris**: 1st arrondissement of Paris, Paris, France
* **Rio de Janeiro**: Centro, Rio de Janeiro, Brasil
* **San Francisco**: Union Square, San Francisco, CA, USA
* **Seoul**: Gangnam, Seoul, South Korea
* **Sydney**: Circular Quay, Sydney NSW, Australia
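
The addresses above can be kept in one mapping and geocoded in a single loop instead of one cell per city. The sketch below assumes a geocoding callable that returns an object with `latitude` and `longitude` attributes (or `None` when nothing is found), which is how geopy's `Nominatim.geocode` behaves. The names `CITY_CENTERS` and `geocode_centers` are hypothetical.

```python
CITY_CENTERS = {
    "Athens": "Ermou street, Athens, Greece",
    "Cape Town": "Cape Town City Centre, Cape Town, South Africa",
    "Moscow": "Red Square, Moscow, Russia",
    "Paris": "1st arrondissement of Paris, Paris, France",
    "Rio de Janeiro": "Centro, Rio de Janeiro, Brasil",
    "San Francisco": "Union Square, San Francisco, CA, USA",
    "Seoul": "Gangnam, Seoul, South Korea",
    "Sydney": "Circular Quay, Sydney NSW, Australia",
}


def geocode_centers(addresses, geocode):
    """Map each city to (latitude, longitude) using the given geocode callable."""
    coords = {}
    for city, address in addresses.items():
        location = geocode(address)
        if location is None:
            # The geocoder could not resolve the address
            raise ValueError(f"Could not geocode {city!r}: {address!r}")
        coords[city] = (location.latitude, location.longitude)
    return coords
```

With geopy installed, a call could look like `geocode_centers(CITY_CENTERS, Nominatim(user_agent="foursquare_agent").geocode)`. Passing the geocoder in as an argument keeps the helper testable without network access.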

### Import necessary Libraries

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a JSON file into a pandas dataframe
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed
Libraries imported.


### Define Foursquare Credentials and Version

In [2]:
CLIENT_ID = 'OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV' # your Foursquare ID
CLIENT_SECRET = 'XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG' # your Foursquare Secret
VERSION = '20200702'
LIMIT = 50
radius = 4500 # Radius from the center to create the downtown area
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV
CLIENT_SECRET:XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG


### Converting every city-center address to latitude and longitude coordinates

To create an instance of the geocoder, we need to define a user_agent. We will name our agent <em>foursquare_agent</em>, as shown in the city sections below. First, we define a helper function that extracts the category of a venue.

In [3]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # flattened explore results use the 'venue.' prefix
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Rio de Janeiro, Brazil

In [4]:
addr_Rio = 'Centro, Rio de Janeiro, Brasil'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Rio)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: -22.9043934 -43.1830653
Radius from city's center, to determine downtown area, is: 4500 meters
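
To make the role of the radius concrete, the great-circle distance between the center and a venue can be computed with the haversine formula and compared against the 4500 m limit. This is a minimal sketch, not part of the original notebook. The helper name `haversine_m` is hypothetical, the mean Earth radius of 6,371 km is an assumed constant, and the venue coordinates are those of the first Rio venue returned by the explore request further down.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


center = (-22.9043934, -43.1830653)  # geocoded center of Rio
venue = (-22.902379, -43.179329)     # Hamburgueria SA
distance = haversine_m(*center, *venue)
print(round(distance), distance <= 4500)
```

The result is close to the `distance` value of 443 m that Foursquare reports for this venue, so the venue lies well inside the downtown radius.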


Define a URL

In [5]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=-22.9043934,-43.1830653&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [6]:
results = requests.get(url).json()
'There are {} venues in a radius of {} m from the downtown`s center. '.format(len(results['response']['groups'][0]['items']), radius)

'There are 50 venues in a radius of 4500 m from the downtown`s center. '

Get relevant part of JSON

In [7]:
items = results['response']['groups'][0]['items']
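
Indexing straight into `results['response']['groups'][0]['items']` raises an error if the request fails or the quota is exhausted. The sketch below is a more defensive variant. It assumes the v2 response envelope carries a `meta.code` field with the HTTP-style status, and the function name `extract_items` is hypothetical.

```python
def extract_items(payload):
    """Return the recommended items of an explore response, or [] if there are none."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError(f"Foursquare request failed with meta code {code}")
    groups = payload.get("response", {}).get("groups", [])
    # The explore endpoint puts its recommendations in the first group
    return groups[0].get("items", []) if groups else []


# Tiny hand-made payload that mimics the structure used in this notebook
sample = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Starbucks"}}]}]},
}
print(len(extract_items(sample)))
```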

Process JSON and convert it to a clean dataframe

In [8]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.venuePage.id,venue.location.neighborhood
0,e-0-57fbcc8a498e8cafef9efce4-0,0,"[{'summary': 'This spot is popular', 'type': '...",57fbcc8a498e8cafef9efce4,Hamburgueria SA,"R. Miguel Couto, 42",R. da Alfândega,-22.902379,-43.179329,"[{'label': 'display', 'lat': -22.9023792035566...",...,BR,Rio de Janeiro,RJ,Brasil,"[R. Miguel Couto, 42 (R. da Alfândega), Rio de...","[{'id': '4bf58dd8d48988d16c941735', 'name': 'B...",0,[],,
1,e-0-51b74a30498eba88fb25605b-1,0,"[{'summary': 'This spot is popular', 'type': '...",51b74a30498eba88fb25605b,L'Atelier du Cuisinier,"Rua Teófilo Otoni, 97",,-22.900518,-43.180231,"[{'label': 'display', 'lat': -22.9005180236074...",...,BR,Rio de Janeiro,RJ,Brasil,"[Rua Teófilo Otoni, 97, Rio de Janeiro, RJ, 20...","[{'id': '4bf58dd8d48988d10c941735', 'name': 'F...",0,[],,
2,e-0-546b5817498e8875ea325b57-2,0,"[{'summary': 'This spot is popular', 'type': '...",546b5817498e8875ea325b57,Starbucks,"R. Miguel Couto, 7",,-22.903484,-43.178457,"[{'label': 'display', 'lat': -22.9034838760111...",...,BR,Rio de Janeiro,RJ,Brasil,"[R. Miguel Couto, 7, Rio de Janeiro, RJ, 20070...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],,
3,e-0-4eef53ae93add02fce44d093-3,0,"[{'summary': 'This spot is popular', 'type': '...",4eef53ae93add02fce44d093,Starbucks,"R. Gonçalves Dias, 51",,-22.904947,-43.178935,"[{'label': 'display', 'lat': -22.9049465047897...",...,BR,Rio de Janeiro,RJ,Brasil,"[R. Gonçalves Dias, 51, Rio de Janeiro, RJ, 20...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],,
4,e-0-4b058721f964a5202e8122e3-4,0,"[{'summary': 'This spot is popular', 'type': '...",4b058721f964a5202e8122e3,Rio Scenarium,"R. do Lavradio, 20",,-22.908255,-43.183938,"[{'label': 'display', 'lat': -22.9082552015834...",...,BR,Rio de Janeiro,RJ,Brasil,"[R. do Lavradio, 20, Rio de Janeiro, RJ, 20230...","[{'id': '4bf58dd8d48988d1e5931735', 'name': 'M...",0,[],,


In [9]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the assignment below does not write to a view

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Hamburgueria SA,Burger Joint,"R. Miguel Couto, 42",R. da Alfândega,-22.902379,-43.179329,"[{'label': 'display', 'lat': -22.9023792035566...",443,20070-030,BR,Rio de Janeiro,RJ,Brasil,"[R. Miguel Couto, 42 (R. da Alfândega), Rio de...",,57fbcc8a498e8cafef9efce4
1,L'Atelier du Cuisinier,French Restaurant,"Rua Teófilo Otoni, 97",,-22.900518,-43.180231,"[{'label': 'display', 'lat': -22.9005180236074...",520,20090-070,BR,Rio de Janeiro,RJ,Brasil,"[Rua Teófilo Otoni, 97, Rio de Janeiro, RJ, 20...",,51b74a30498eba88fb25605b
2,Starbucks,Coffee Shop,"R. Miguel Couto, 7",,-22.903484,-43.178457,"[{'label': 'display', 'lat': -22.9034838760111...",483,20070-030,BR,Rio de Janeiro,RJ,Brasil,"[R. Miguel Couto, 7, Rio de Janeiro, RJ, 20070...",,546b5817498e8875ea325b57
3,Starbucks,Coffee Shop,"R. Gonçalves Dias, 51",,-22.904947,-43.178935,"[{'label': 'display', 'lat': -22.9049465047897...",427,20050-030,BR,Rio de Janeiro,RJ,Brasil,"[R. Gonçalves Dias, 51, Rio de Janeiro, RJ, 20...",,4eef53ae93add02fce44d093
4,Rio Scenarium,Music Venue,"R. do Lavradio, 20",,-22.908255,-43.183938,"[{'label': 'display', 'lat': -22.9082552015834...",439,20230-070,BR,Rio de Janeiro,RJ,Brasil,"[R. do Lavradio, 20, Rio de Janeiro, RJ, 20230...",,4b058721f964a5202e8122e3


In [10]:
rio_ven = dataframe_filtered['categories'].value_counts()
rio_ven

Coffee Shop                  5
Bookstore                    5
Historic Site                4
Theater                      4
Art Museum                   3
Church                       3
Music Venue                  3
Middle Eastern Restaurant    2
Tram Station                 2
Chocolate Shop               1
Italian Restaurant           1
Japanese Restaurant          1
Pedestrian Plaza             1
Burger Joint                 1
Outdoor Sculpture            1
Plaza                        1
Peruvian Restaurant          1
History Museum               1
Brazilian Restaurant         1
Concert Hall                 1
French Restaurant            1
Flea Market                  1
Supermarket                  1
Garden                       1
Gym / Fitness Center         1
Monument / Landmark          1
Hostel                       1
Science Museum               1
Name: categories, dtype: int64

Let's create a dataframe that will contain the venue types for every city.

In [11]:
ven_types = pd.DataFrame(rio_ven)
ven_types.columns = ['Rio']
ven_types.index.name = 'Venue Type'
ven_types

| Venue Type | Rio |
|---|---|
| Coffee Shop | 5 |
| Bookstore | 5 |
| Historic Site | 4 |
| Theater | 4 |
| Art Museum | 3 |
| Church | 3 |
| Music Venue | 3 |
| Middle Eastern Restaurant | 2 |
| Tram Station | 2 |
| Chocolate Shop | 1 |
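
As more cities are added, the per-city counts can be combined in one step with `pd.concat` along the columns, which performs an outer join on the venue type. This is a sketch of an alternative to merging city by city, not the notebook's own method. The counts are a small subset of the Rio and Seoul `value_counts()` outputs, and replacing missing values with 0 is an assumption about how absent categories should be treated.

```python
import pandas as pd

# Small subsets of the per-city value_counts() results
rio = pd.Series({"Coffee Shop": 5, "Bookstore": 5, "Theater": 4}, name="Rio")
seoul = pd.Series({"Coffee Shop": 7, "Bakery": 5, "Bookstore": 1}, name="Seoul")

# Outer join on the index: every venue type seen in any city gets a row
ven_types_sketch = pd.concat([rio, seoul], axis=1)
ven_types_sketch.index.name = "Venue Type"

# A category missing from a city means zero venues of that type were returned
ven_types_sketch = ven_types_sketch.fillna(0).astype(int)
print(ven_types_sketch)
```

Each series name becomes a column name, so no separate renaming step is needed.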


Let's create a dataframe with the number of TIPS from Rio.

In [12]:
cat_tips1 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips1.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips1.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [13]:
cat_tips1

|   | Tips | Category |
|---|---|---|
| 0 | 36.0 | Burger Joint |
| 1 | 21.0 | French Restaurant |
| 2 | 52.0 | Coffee Shop |
| 3 | 352.0 | Coffee Shop |
| 4 | 534.0 | Music Venue |
| 5 | 15.0 | Bookstore |
| 6 | 53.0 | Bookstore |
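
Since the final feature is tips per category, the per-venue counts can be collected and then summed by category. The sketch below separates the network call from the bookkeeping: `fetch_venue` is any callable that returns the parsed venue-details JSON for a venue id. Both function names are hypothetical, and skipping venues without a tip count is an assumption about how failed lookups should be handled.

```python
def collect_tip_counts(venues, fetch_venue, max_venues=7):
    """Return [{'Category': ..., 'Tips': ...}] for the first max_venues (id, category) pairs."""
    rows = []
    for venue_id, category in list(venues)[:max_venues]:
        details = fetch_venue(venue_id)
        tips = details.get("response", {}).get("venue", {}).get("tips", {}).get("count")
        if tips is None:
            continue  # no tip data, e.g. the premium quota is exhausted
        rows.append({"Category": category, "Tips": tips})
    return rows


def tips_per_category(rows):
    """Sum the tip counts of all venues that share a category."""
    totals = {}
    for row in rows:
        totals[row["Category"]] = totals.get(row["Category"], 0) + row["Tips"]
    return totals


# The seven Rio venues from the table above
rio_rows = [
    {"Category": "Burger Joint", "Tips": 36},
    {"Category": "French Restaurant", "Tips": 21},
    {"Category": "Coffee Shop", "Tips": 52},
    {"Category": "Coffee Shop", "Tips": 352},
    {"Category": "Music Venue", "Tips": 534},
    {"Category": "Bookstore", "Tips": 15},
    {"Category": "Bookstore", "Tips": 53},
]
print(tips_per_category(rio_rows))
```

In the notebook, `fetch_venue` could be `lambda venue_id: requests.get(url_for(venue_id)).json()`, where `url_for` is a stand-in for the venue-details URL built in the loop above.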


### Seoul, South Korea

In [14]:
addr_Seoul = 'Gangnam, Seoul, South Korea'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Seoul)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: 37.4976977 127.0276828
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [15]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=37.4976977,127.0276828&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [16]:
results = requests.get(url).json()
'There are {} venues in a radius of {} m from the downtown`s center. '.format(len(results['response']['groups'][0]['items']), radius)

'There are 50 venues in a radius of 4500 m from the downtown`s center. '

Get relevant part of JSON

In [17]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [18]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.cc,venue.location.neighborhood,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet
0,e-0-599bf6559de23b461b6d4029-0,0,"[{'summary': 'This spot is popular', 'type': '...",599bf6559de23b461b6d4029,CUCHARA (쿠차라),서초구 서초대로74길 11,37.497337,127.026569,"[{'label': 'display', 'lat': 37.49733713145521...",106,...,KR,서초2동,서울특별시,서울특별시,대한민국,"[서초구 서초대로74길 11, 서초2동, 서초구, 서울특별시, 06620, 대한민국]","[{'id': '4bf58dd8d48988d1c1941735', 'name': 'M...",0,[],
1,e-0-599ae767123a1921c39b8a5d-1,0,"[{'summary': 'This spot is popular', 'type': '...",599ae767123a1921c39b8a5d,Starbucks Reserve (스타벅스 리저브),강남구 강남대로 390,37.497843,127.02875,"[{'label': 'display', 'lat': 37.49784342127571...",95,...,KR,역삼1동,서울특별시,서울특별시,대한민국,"[강남구 강남대로 390 (강남R점), 역삼1동, 강남구, 서울특별시, 06232,...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],강남R점
2,e-0-5a2105c0c5b11c3dd7c465a4-2,0,"[{'summary': 'This spot is popular', 'type': '...",5a2105c0c5b11c3dd7c465a4,Paul Bassett (폴바셋),서초구 서초대로 411,37.497881,127.025702,"[{'label': 'display', 'lat': 37.49788138576333...",176,...,KR,서초4동,서울특별시,서울특별시,대한민국,"[서초구 서초대로 411 (GT타워점), 서초4동, 서울특별시, 서울특별시, 066...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],GT타워점
3,e-0-57772f2e498e9a080e16b6e2-3,0,"[{'summary': 'This spot is popular', 'type': '...",57772f2e498e9a080e16b6e2,KAKAO Friends Flagship Store (카카오프렌즈),서초구 강남대로 429,37.500894,127.026339,"[{'label': 'display', 'lat': 37.50089415291258...",375,...,KR,역삼1동,서울특별시,서울특별시,대한민국,"[서초구 강남대로 429 (강남플래그십스토어), 역삼1동, 서초구, 서울특별시, 0...","[{'id': '52f2ab2ebcbc57f1066b8b1b', 'name': 'S...",0,[],강남플래그십스토어
4,e-0-53a25c5b498ee8842b348739-4,0,"[{'summary': 'This spot is popular', 'type': '...",53a25c5b498ee8842b348739,NIKE (나이키),강남구 강남대로 446,37.502416,127.025836,"[{'label': 'display', 'lat': 37.502416, 'lng':...",549,...,KR,,서울특별시,서울특별시,대한민국,"[강남구 강남대로 446 (강남 플래그십 스토어), 역삼1동, 서울특별시, 서울특별...","[{'id': '4bf58dd8d48988d1f2941735', 'name': 'S...",0,[],강남 플래그십 스토어


In [19]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the assignment below does not write to a view

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,crossStreet,id
0,CUCHARA (쿠차라),Mexican Restaurant,서초구 서초대로74길 11,37.497337,127.026569,"[{'label': 'display', 'lat': 37.49733713145521...",106,06620,KR,서초2동,서울특별시,서울특별시,대한민국,"[서초구 서초대로74길 11, 서초2동, 서초구, 서울특별시, 06620, 대한민국]",,599bf6559de23b461b6d4029
1,Starbucks Reserve (스타벅스 리저브),Coffee Shop,강남구 강남대로 390,37.497843,127.02875,"[{'label': 'display', 'lat': 37.49784342127571...",95,06232,KR,역삼1동,서울특별시,서울특별시,대한민국,"[강남구 강남대로 390 (강남R점), 역삼1동, 강남구, 서울특별시, 06232,...",강남R점,599ae767123a1921c39b8a5d
2,Paul Bassett (폴바셋),Coffee Shop,서초구 서초대로 411,37.497881,127.025702,"[{'label': 'display', 'lat': 37.49788138576333...",176,06615,KR,서초4동,서울특별시,서울특별시,대한민국,"[서초구 서초대로 411 (GT타워점), 서초4동, 서울특별시, 서울특별시, 066...",GT타워점,5a2105c0c5b11c3dd7c465a4
3,KAKAO Friends Flagship Store (카카오프렌즈),Souvenir Shop,서초구 강남대로 429,37.500894,127.026339,"[{'label': 'display', 'lat': 37.50089415291258...",375,06612,KR,역삼1동,서울특별시,서울특별시,대한민국,"[서초구 강남대로 429 (강남플래그십스토어), 역삼1동, 서초구, 서울특별시, 0...",강남플래그십스토어,57772f2e498e9a080e16b6e2
4,NIKE (나이키),Sporting Goods Shop,강남구 강남대로 446,37.502416,127.025836,"[{'label': 'display', 'lat': 37.502416, 'lng':...",549,135-931,KR,,서울특별시,서울특별시,대한민국,"[강남구 강남대로 446 (강남 플래그십 스토어), 역삼1동, 서울특별시, 서울특별...",강남 플래그십 스토어,53a25c5b498ee8842b348739


Let's count Seoul's venues by type.

In [20]:
seoul_ven = dataframe_filtered['categories'].value_counts()
seoul_ven

Coffee Shop             7
Bakery                  5
BBQ Joint               5
Sake Bar                4
Korean Restaurant       3
Noodle House            3
Japanese Restaurant     2
Hotel                   2
Burger Joint            2
Italian Restaurant      1
Gym / Fitness Center    1
Dive Bar                1
Supermarket             1
Sporting Goods Shop     1
Seafood Restaurant      1
Lounge                  1
Chinese Restaurant      1
Video Game Store        1
Café                    1
Janguh Restaurant       1
Trail                   1
Souvenir Shop           1
Mexican Restaurant      1
Bookstore               1
Gym                     1
Dance Studio            1
Name: categories, dtype: int64

In [21]:
seoul_ven = pd.DataFrame(seoul_ven)
seoul_ven.columns = ['Seoul']
seoul_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=seoul_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')

In [115]:
ven_types

| Venue Type | Seoul | Rio |
|---|---|---|
| Coffee Shop | 6.0 | 5.0 |
| BBQ Joint | 5.0 | NaN |
| Korean Restaurant | 3.0 | NaN |
| Sake Bar | 3.0 | NaN |
| Bakery | 3.0 | NaN |
| Japanese Restaurant | 2.0 | 1.0 |
| Noodle House | 2.0 | NaN |
| Burger Joint | 2.0 | 1.0 |
| Hotel | 2.0 | NaN |
| Dive Bar | 1.0 | NaN |


Let's create a dataframe with the number of TIPS from Seoul.

In [22]:
cat_tips2 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips2.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips2.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [23]:
cat_tips2

|   | Tips | Category |
|---|---|---|
| 0 | 27.0 | Mexican Restaurant |
| 1 | 7.0 | Coffee Shop |
| 2 | 2.0 | Coffee Shop |
| 3 | 13.0 | Souvenir Shop |
| 4 | 8.0 | Sporting Goods Shop |
| 5 | 3.0 | Trail |
| 6 | 41.0 | Gym |


### San Francisco, USA

In [24]:
addr_SanFrancisco = 'Union Square, San Francisco, CA, USA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_SanFrancisco)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: 37.7879363 -122.40751740318035
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [25]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=37.7879363,-122.40751740318035&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [26]:
results = requests.get(url).json()
'There are {} venues in a radius of {} m from the downtown`s center. '.format(len(results['response']['groups'][0]['items']), radius)

'There are 50 venues in a radius of 4500 m from the downtown`s center. '

Get relevant part of JSON

In [27]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [28]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.photos.count,venue.photos.groups,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name,venue.location.neighborhood,venue.venuePage.id
0,e-0-4b4bd8caf964a5207ba926e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4b4bd8caf964a5207ba926e3,The Archive,315 Sutter St,btwn Stockton & Grant Ave.,37.789494,-122.405766,"[{'label': 'display', 'lat': 37.78949409500821...",...,0,[],,,,,,,,
1,e-0-528d4fe211d2543b7663f4fd-1,0,"[{'summary': 'This spot is popular', 'type': '...",528d4fe211d2543b7663f4fd,Saint Laurent,108 Geary St,,37.787774,-122.405412,"[{'label': 'display', 'lat': 37.78777380886315...",...,0,[],,,,,,,,
2,e-0-551cfcaf498e23f2c0115449-2,0,"[{'summary': 'This spot is popular', 'type': '...",551cfcaf498e23f2c0115449,Maison Margiela,134 Maiden Ln,,37.788261,-122.405765,"[{'label': 'display', 'lat': 37.78826107452542...",...,0,[],,,,,,,,
3,e-0-4a6f3531f964a5209cd51fe3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4a6f3531f964a5209cd51fe3,The Olympic Club,524 Post St,at Taylor St,37.788181,-122.411067,"[{'label': 'display', 'lat': 37.78818105686167...",...,0,[],,,,,,,,
4,e-0-58cc4d2fe0adac17bbf4838e-4,0,"[{'summary': 'This spot is popular', 'type': '...",58cc4d2fe0adac17bbf4838e,Pushkin,380 Bush St,at Kearny St,37.790943,-122.403877,"[{'label': 'display', 'lat': 37.79094301071348...",...,0,[],2135211.0,https://www.grubhub.com/restaurant/pushkin-380...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,


In [29]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the assignment below does not write to a view

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,The Archive,Men's Store,315 Sutter St,btwn Stockton & Grant Ave.,37.789494,-122.405766,"[{'label': 'display', 'lat': 37.78949409500821...",231,94108,US,San Francisco,CA,United States,"[315 Sutter St (btwn Stockton & Grant Ave.), S...",,4b4bd8caf964a5207ba926e3
1,Saint Laurent,Boutique,108 Geary St,,37.787774,-122.405412,"[{'label': 'display', 'lat': 37.78777380886315...",186,94108,US,San Francisco,CA,United States,"[108 Geary St, San Francisco, CA 94108, United...",,528d4fe211d2543b7663f4fd
2,Maison Margiela,Boutique,134 Maiden Ln,,37.788261,-122.405765,"[{'label': 'display', 'lat': 37.78826107452542...",158,94108,US,San Francisco,CA,United States,"[134 Maiden Ln, San Francisco, CA 94108, Unite...",,551cfcaf498e23f2c0115449
3,The Olympic Club,Gym / Fitness Center,524 Post St,at Taylor St,37.788181,-122.411067,"[{'label': 'display', 'lat': 37.78818105686167...",313,94102,US,San Francisco,CA,United States,"[524 Post St (at Taylor St), San Francisco, CA...",,4a6f3531f964a5209cd51fe3
4,Pushkin,Russian Restaurant,380 Bush St,at Kearny St,37.790943,-122.403877,"[{'label': 'display', 'lat': 37.79094301071348...",463,94104,US,San Francisco,CA,United States,"[380 Bush St (at Kearny St), San Francisco, CA...",,58cc4d2fe0adac17bbf4838e


Let's count San Francisco's venues by type.

In [30]:
san_francisco_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
san_francisco_ven.columns = ['San Francisco']
san_francisco_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=san_francisco_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')

In [34]:
ven_types.head()

| Venue Type | San Francisco | Seoul | Rio |
|---|---|---|---|
| Coffee Shop | 3.0 | 7.0 | 5.0 |
| Art Museum | 3.0 | NaN | 3.0 |
| Park | 3.0 | NaN | NaN |
| Pizza Place | 3.0 | NaN | NaN |
| Theater | 2.0 | NaN | 4.0 |


Let's create a dataframe with the number of TIPS from San Francisco.

In [35]:
cat_tips3 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips3.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips3.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [36]:
cat_tips3

|   | Tips | Category |
|---|---|---|
| 0 | 4.0 | Men's Store |
| 1 | 0.0 | Boutique |
| 2 | 1.0 | Boutique |
| 3 | 8.0 | Gym / Fitness Center |
| 4 | 18.0 | Russian Restaurant |
| 5 | 7.0 | Wine Shop |
| 6 | 155.0 | Garden |


### Sydney, Australia

In [37]:
addr_Sydney = 'Circular Quay, Sydney NSW, Australia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Sydney)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: -33.86153 151.21005289845323
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [38]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=-33.86153,151.21005289845323&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [39]:
results = requests.get(url).json()
'There are {} venues in a radius of {} m from the downtown`s center. '.format(len(results['response']['groups'][0]['items']), radius)

'There are 50 venues in a radius of 4500 m from the downtown`s center. '

Get relevant part of JSON

In [40]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [41]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.venuePage.id,venue.location.neighborhood,venue.location.crossStreet
0,e-0-4b058762f964a520648f22e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4b058762f964a520648f22e3,The Opera House to the Botanic Gardens Walk,Macquarie St.,-33.860914,151.213221,"[{'label': 'display', 'lat': -33.8609141115099...",300,...,Sydney,NSW,Australia,"[Macquarie St., Sydney NSW 2000, Australia]","[{'id': '4bf58dd8d48988d159941735', 'name': 'T...",0,[],,,
1,e-0-50d0ddd7e4b0707f741b55f1-1,0,"[{'summary': 'This spot is popular', 'type': '...",50d0ddd7e4b0707f741b55f1,Cabrito Coffee Traders,"Ground Floor, 10-14 Bulletin Place",-33.862516,151.209324,"[{'label': 'display', 'lat': -33.8625161266321...",128,...,Sydney,NSW,Australia,"[Ground Floor, 10-14 Bulletin Place, Sydney NS...","[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",0,[],44233971.0,,
2,e-0-4b058760f964a520988e22e3-2,0,"[{'summary': 'This spot is popular', 'type': '...",4b058760f964a520988e22e3,Opera Bar,"Sydney Opera House, Macquarie Street",-33.858409,151.213976,"[{'label': 'display', 'lat': -33.8584089, 'lng...",502,...,Sydney,NSW,Australia,"[Sydney Opera House, Macquarie Street, Sydney ...","[{'id': '4bf58dd8d48988d11e941735', 'name': 'C...",0,[],,,
3,e-0-4be371a4d27a20a1ae5a925b-3,0,"[{'summary': 'This spot is popular', 'type': '...",4be371a4d27a20a1ae5a925b,BridgeClimb Sydney,3 Cumberland St.,-33.857518,151.207832,"[{'label': 'display', 'lat': -33.8575178783097...",491,...,The Rocks,NSW,Australia,"[3 Cumberland St., The Rocks NSW 2000, Australia]","[{'id': '56aa371be4b08b9a8d573520', 'name': 'T...",0,[],,,
4,e-0-4e3dd3ecd22d102e8547e4cc-4,0,"[{'summary': 'This spot is popular', 'type': '...",4e3dd3ecd22d102e8547e4cc,The Tea Cosy,33 George Street,-33.857413,151.208561,"[{'label': 'display', 'lat': -33.8574126897852...",478,...,The Rocks,NSW,Australia,"[33 George Street, The Rocks NSW 2000, Australia]","[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",0,[],,The Rocks,


In [42]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the assignment below does not write to a view

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,crossStreet,id
0,The Opera House to the Botanic Gardens Walk,Trail,Macquarie St.,-33.860914,151.213221,"[{'label': 'display', 'lat': -33.8609141115099...",300,2000,AU,Sydney,NSW,Australia,"[Macquarie St., Sydney NSW 2000, Australia]",,,4b058762f964a520648f22e3
1,Cabrito Coffee Traders,Café,"Ground Floor, 10-14 Bulletin Place",-33.862516,151.209324,"[{'label': 'display', 'lat': -33.8625161266321...",128,2000,AU,Sydney,NSW,Australia,"[Ground Floor, 10-14 Bulletin Place, Sydney NS...",,,50d0ddd7e4b0707f741b55f1
2,Opera Bar,Cocktail Bar,"Sydney Opera House, Macquarie Street",-33.858409,151.213976,"[{'label': 'display', 'lat': -33.8584089, 'lng...",502,2000,AU,Sydney,NSW,Australia,"[Sydney Opera House, Macquarie Street, Sydney ...",,,4b058760f964a520988e22e3
3,BridgeClimb Sydney,Tour Provider,3 Cumberland St.,-33.857518,151.207832,"[{'label': 'display', 'lat': -33.8575178783097...",491,2000,AU,The Rocks,NSW,Australia,"[3 Cumberland St., The Rocks NSW 2000, Australia]",,,4be371a4d27a20a1ae5a925b
4,The Tea Cosy,Café,33 George Street,-33.857413,151.208561,"[{'label': 'display', 'lat': -33.8574126897852...",478,2000,AU,The Rocks,NSW,Australia,"[33 George Street, The Rocks NSW 2000, Australia]",The Rocks,,4e3dd3ecd22d102e8547e4cc


Let's count how many venues of each type Sydney has.

In [43]:
sydney_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
sydney_ven.columns = ['Sydney']
sydney_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=sydney_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')

In [44]:
ven_types

Unnamed: 0_level_0,Sydney,San Francisco,Seoul,Rio
Venue Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Park,4.0,3.0,,
Café,4.0,1.0,1.0,
Pub,3.0,,,
Theater,3.0,2.0,,4.0
Cocktail Bar,2.0,,,
...,...,...,...,...
Peruvian Restaurant,,,,1.0
Brazilian Restaurant,,,,1.0
Flea Market,,,,1.0
Hostel,,,,1.0
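
The city-by-city outer merges above could also be written as one column-wise concatenation. The following is a minimal sketch on toy data, not the notebook's own method: `toy_categories` and its values are made up and stand in for each city's `dataframe_filtered['categories']` column.

```python
import pandas as pd

# Toy category lists (hypothetical) standing in for each city's
# dataframe_filtered['categories'] column
toy_categories = {
    'Sydney': ['Park', 'Park', 'Café', 'Pub'],
    'Moscow': ['Plaza', 'Park', 'Café', 'Café'],
}

# One value_counts() Series per city
per_city_counts = {
    city: pd.Series(cats).value_counts()
    for city, cats in toy_categories.items()
}

# Column-wise concat aligns on the category labels (an outer join);
# a category missing from a city becomes NaN, as in ven_types
toy_ven_types = pd.concat(per_city_counts, axis=1)
toy_ven_types.index.name = 'Venue Type'
print(toy_ven_types)
```

With this layout the dictionary keys become the column names, so adding a city only means adding one more entry.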


Let's create a dataframe with the number of TIPS from Sydney.

In [45]:
cat_tips4 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips4.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips4.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [46]:
cat_tips4

Unnamed: 0,Tips,Category
0,11.0,Trail
1,84.0,Café
2,254.0,Cocktail Bar
3,46.0,Tour Provider
4,30.0,Café
5,111.0,Art Museum
6,3.0,Café
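
The same tip-collection loop is repeated for every city, so it may be worth wrapping it in a function. Below is a minimal sketch under stated assumptions: `collect_tip_counts`, `fetch_details` and `fake_fetch` are hypothetical names that do not appear in the notebook, and the network call is replaced by a stand-in so the example runs offline.

```python
import pandas as pd


def collect_tip_counts(venues, fetch_details, limit=7):
    """Return the tip count and category of the first `limit` venues.

    venues        -- DataFrame with 'id' and 'categories' columns
    fetch_details -- hypothetical callable: venue id -> parsed JSON dict
    """
    rows = []
    for _, venue in venues.head(limit).iterrows():
        details = fetch_details(venue['id'])
        rows.append({
            'Tips': details['response']['venue']['tips']['count'],
            'Category': venue['categories'],
        })
    return pd.DataFrame(rows, columns=['Tips', 'Category'])


def fake_fetch(venue_id):
    """Stand-in for the API call; the counts are made up."""
    fake_counts = {'a1': 11, 'b2': 84}
    return {'response': {'venue': {'tips': {'count': fake_counts[venue_id]}}}}


toy_venues = pd.DataFrame({'id': ['a1', 'b2'], 'categories': ['Trail', 'Café']})
toy_tips = collect_tip_counts(toy_venues, fake_fetch)
print(toy_tips)
```

In the notebook, `fetch_details` would wrap the `requests.get(url).json()` call for the venue details URL, and `limit` would stay at 7 to respect the premium-call quota.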


### Moscow, Russia

In [47]:
addr_Moscow = 'Red Square, Moscow, Russia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Moscow)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: 55.7536283 37.62137960067377
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [48]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=55.7536283,37.62137960067377&v=20200702&radius=4500&limit=50'
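
Displaying the assembled URL also echoes the client ID and secret into the cell output. One way to avoid that is to build the query string from named parameters and mask the secret before printing. This is a sketch only: `build_explore_url`, `redact` and the `DEMO_*` credentials are hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, latitude, longitude,
                      version, radius, limit):
    """Assemble the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" unescaped, as in the hand-built URL
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


def redact(url, secret):
    """Mask the secret so the URL can be displayed safely."""
    return url.replace(secret, '***')


# Placeholder credentials, for illustration only
demo_url = build_explore_url('DEMO_ID', 'DEMO_SECRET',
                             55.7536283, 37.62137960067377,
                             '20200702', 4500, 50)
print(redact(demo_url, 'DEMO_SECRET'))
```

The unredacted `demo_url` is what would be passed to `requests.get`; only the redacted form needs to appear in the output.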

Send GET request and examine results

In [49]:
results = requests.get(url).json()
"There are {} venues in a radius of {} m from the downtown's center. ".format(len(results['response']['groups'][0]['items']), radius)

"There are 50 venues in a radius of 4500 m from the downtown's center. "

Get relevant part of JSON

In [50]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [51]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.neighborhood,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.venuePage.id
0,e-0-4bb3345942959c74d79d212c-0,0,"[{'summary': 'This spot is popular', 'type': '...",4bb3345942959c74d79d212c,Red Square (Красная площадь),Красная пл.,55.753595,37.621031,"[{'label': 'display', 'lat': 55.753595, 'lng':...",22,...,Красная площадь,Москва,Москва,Россия,"[Красная пл., 109012, Москва, Россия]","[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",0,[],,
1,e-0-4bee5d152c082d7f2b5d3042-1,0,"[{'summary': 'This spot is popular', 'type': '...",4bee5d152c082d7f2b5d3042,St. Basil's Cathedral (Храм Василия Блаженного),Красная пл.,55.752524,37.62311,"[{'label': 'display', 'lat': 55.75252441045641...",163,...,,Москва,Москва,Россия,"[Красная пл. (пл. Васильевский Спуск), 109012,...","[{'id': '4bf58dd8d48988d132941735', 'name': 'C...",0,[],пл. Васильевский Спуск,
2,e-0-4e27dd77aeb75df8caa65347-2,0,"[{'summary': 'This spot is popular', 'type': '...",4e27dd77aeb75df8caa65347,Dior,"Красная пл., 3",55.754835,37.62082,"[{'label': 'display', 'lat': 55.75483478480466...",138,...,,Москва,Москва,Россия,"[Красная пл., 3, Москва, Россия]","[{'id': '4bf58dd8d48988d104951735', 'name': 'B...",0,[],,
3,e-0-4bfbb199565f76b04ccf05db-3,0,"[{'summary': 'This spot is popular', 'type': '...",4bfbb199565f76b04ccf05db,The Kremlin (Кремль),Красная пл.,55.751999,37.617734,"[{'label': 'display', 'lat': 55.751999, 'lng':...",291,...,,Москва,Москва,Россия,"[Красная пл., 101000, Москва, Россия]","[{'id': '4bf58dd8d48988d126941735', 'name': 'G...",0,[],,
4,e-0-4d4069bec5eaa1cd8a6fa150-4,0,"[{'summary': 'This spot is popular', 'type': '...",4d4069bec5eaa1cd8a6fa150,Nikolskaya Street (Никольская улица),Никольская ул.,55.757629,37.623115,"[{'label': 'display', 'lat': 55.75762907948795...",458,...,,Москва,Москва,Россия,"[Никольская ул., 109012, Москва, Россия]","[{'id': '52e81612bcbc57f1066b7a25', 'name': 'P...",0,[],,


In [52]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,crossStreet,id
0,Red Square (Красная площадь),Plaza,Красная пл.,55.753595,37.621031,"[{'label': 'display', 'lat': 55.753595, 'lng':...",22,109012.0,RU,Красная площадь,Москва,Москва,Россия,"[Красная пл., 109012, Москва, Россия]",,4bb3345942959c74d79d212c
1,St. Basil's Cathedral (Храм Василия Блаженного),Church,Красная пл.,55.752524,37.62311,"[{'label': 'display', 'lat': 55.75252441045641...",163,109012.0,RU,,Москва,Москва,Россия,"[Красная пл. (пл. Васильевский Спуск), 109012,...",пл. Васильевский Спуск,4bee5d152c082d7f2b5d3042
2,Dior,Boutique,"Красная пл., 3",55.754835,37.62082,"[{'label': 'display', 'lat': 55.75483478480466...",138,,RU,,Москва,Москва,Россия,"[Красная пл., 3, Москва, Россия]",,4e27dd77aeb75df8caa65347
3,The Kremlin (Кремль),Government Building,Красная пл.,55.751999,37.617734,"[{'label': 'display', 'lat': 55.751999, 'lng':...",291,101000.0,RU,,Москва,Москва,Россия,"[Красная пл., 101000, Москва, Россия]",,4bfbb199565f76b04ccf05db
4,Nikolskaya Street (Никольская улица),Pedestrian Plaza,Никольская ул.,55.757629,37.623115,"[{'label': 'display', 'lat': 55.75762907948795...",458,109012.0,RU,,Москва,Москва,Россия,"[Никольская ул., 109012, Москва, Россия]",,4d4069bec5eaa1cd8a6fa150


Let's count how many venues of each type Moscow has.

In [53]:
moscow_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
moscow_ven.columns = ['Moscow']
moscow_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=moscow_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')

In [55]:
ven_types.head()

Unnamed: 0_level_0,Moscow,Sydney,San Francisco,Seoul,Rio
Venue Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Plaza,5.0,1.0,,,1.0
Hotel,4.0,2.0,1.0,2.0,
Coffee Shop,3.0,1.0,3.0,7.0,5.0
Yoga Studio,2.0,,1.0,,
Art Museum,2.0,1.0,3.0,,3.0


Let's create a dataframe with the number of TIPS from Moscow.

In [56]:
cat_tips5 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips5.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips5.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [57]:
cat_tips5

Unnamed: 0,Tips,Category
0,923.0,Plaza
1,132.0,Church
2,10.0,Boutique
3,225.0,Government Building
4,113.0,Pedestrian Plaza
5,248.0,Park
6,26.0,Jewelry Store


### Cape Town, South Africa

In [58]:
addr_CapeTown = 'Cape Town City Centre, Cape Town, South Africa'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_CapeTown)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: -33.9224221 18.4263523
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [59]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=-33.9224221,18.4263523&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [60]:
results = requests.get(url).json()
"There are {} venues in a radius of {} m from the downtown's center. ".format(len(results['response']['groups'][0]['items']), radius)

"There are 50 venues in a radius of 4500 m from the downtown's center. "

Get relevant part of JSON

In [61]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [62]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.location.neighborhood,venue.venuePage.id
0,e-0-4b6b08e2f964a5202aee2be3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4b6b08e2f964a5202aee2be3,City Hall,Grand Parade,-33.925185,18.423783,"[{'label': 'display', 'lat': -33.9251846837303...",388,...,iKapa,Western Cape,iNingizimu Afrika,"[Grand Parade, iKapa, 8001, iNingizimu Afrika]","[{'id': '4bf58dd8d48988d129941735', 'name': 'C...",0,[],,,
1,e-0-4b5342ebf964a520b69427e3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4b5342ebf964a520b69427e3,Artscape Theatre,D F Malan St,-33.919537,18.429297,"[{'label': 'display', 'lat': -33.9195370489033...",420,...,iKapa,Western Cape,iNingizimu Afrika,"[D F Malan St (Hertzog Blvd), iKapa, 8000, iNi...","[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",0,[],Hertzog Blvd,,
2,e-0-4bcede8d29d4b7132923a9dc-2,0,"[{'summary': 'This spot is popular', 'type': '...",4bcede8d29d4b7132923a9dc,Fugard Theatre,Harrington St,-33.927179,18.424411,"[{'label': 'display', 'lat': -33.9271788077195...",559,...,iKapa,Western Cape,iNingizimu Afrika,"[Harrington St (Caledon St), iKapa, 8001, iNin...","[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",0,[],Caledon St,,
3,e-0-4fa0c477e4b0c33ec64258bf-3,0,"[{'summary': 'This spot is popular', 'type': '...",4fa0c477e4b0c33ec64258bf,Truth Coffee HQ,36 Buitenkant St,-33.928286,18.422795,"[{'label': 'display', 'lat': -33.9282856873653...",730,...,iKapa,Western Cape,iNingizimu Afrika,[36 Buitenkant St (between Barrack & Commercia...,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",0,[],between Barrack & Commercial,,
4,e-0-5766b8aecd100453bf08c5ab-4,0,"[{'summary': 'This spot is popular', 'type': '...",5766b8aecd100453bf08c5ab,Virgin Active Foreshore,,-33.917849,18.430692,"[{'label': 'display', 'lat': -33.9178492367140...",647,...,iKapa,Western Cape,iNingizimu Afrika,"[iKapa, iNingizimu Afrika]","[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",0,[],,,


In [63]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,City Hall,City Hall,Grand Parade,-33.925185,18.423783,"[{'label': 'display', 'lat': -33.9251846837303...",388,8001.0,ZA,iKapa,Western Cape,iNingizimu Afrika,"[Grand Parade, iKapa, 8001, iNingizimu Afrika]",,,4b6b08e2f964a5202aee2be3
1,Artscape Theatre,Theater,D F Malan St,-33.919537,18.429297,"[{'label': 'display', 'lat': -33.9195370489033...",420,8000.0,ZA,iKapa,Western Cape,iNingizimu Afrika,"[D F Malan St (Hertzog Blvd), iKapa, 8000, iNi...",Hertzog Blvd,,4b5342ebf964a520b69427e3
2,Fugard Theatre,Theater,Harrington St,-33.927179,18.424411,"[{'label': 'display', 'lat': -33.9271788077195...",559,8001.0,ZA,iKapa,Western Cape,iNingizimu Afrika,"[Harrington St (Caledon St), iKapa, 8001, iNin...",Caledon St,,4bcede8d29d4b7132923a9dc
3,Truth Coffee HQ,Café,36 Buitenkant St,-33.928286,18.422795,"[{'label': 'display', 'lat': -33.9282856873653...",730,8005.0,ZA,iKapa,Western Cape,iNingizimu Afrika,[36 Buitenkant St (between Barrack & Commercia...,between Barrack & Commercial,,4fa0c477e4b0c33ec64258bf
4,Virgin Active Foreshore,Gym / Fitness Center,,-33.917849,18.430692,"[{'label': 'display', 'lat': -33.9178492367140...",647,,ZA,iKapa,Western Cape,iNingizimu Afrika,"[iKapa, iNingizimu Afrika]",,,5766b8aecd100453bf08c5ab


Let's count how many venues of each type Cape Town has.

In [64]:
capetown_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
capetown_ven.columns = ['Cape Town']
capetown_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=capetown_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')
ven_types.head(50)

Unnamed: 0_level_0,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio
Venue Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Coffee Shop,9.0,3.0,1.0,3.0,7.0,5.0
Hotel,4.0,4.0,2.0,1.0,2.0,
Bakery,3.0,,1.0,,5.0,
Theater,3.0,1.0,3.0,2.0,,4.0
Seafood Restaurant,2.0,,,,1.0,
Waterfront,2.0,,,,,
Café,2.0,,4.0,1.0,1.0,
Shopping Mall,2.0,,1.0,,,
Market,1.0,,,1.0,,
Bagel Shop,1.0,,,,,


Let's create a dataframe with the number of TIPS from Cape Town.

In [65]:
cat_tips6 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips6.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips6.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [66]:
cat_tips6

Unnamed: 0,Tips,Category
0,13.0,City Hall
1,18.0,Theater
2,13.0,Theater
3,204.0,Café
4,4.0,Gym / Fitness Center
5,19.0,Coffee Shop
6,25.0,Bar


### Paris, France

In [67]:
addr_Paris = '1st arrondissement of Paris, Paris, France'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Paris)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius from city's center, to determine downtown area, is:", radius, "meters")

Coordinates: 48.8646144 2.334396
Radius from city's center, to determine downtown area, is: 4500 meters


Define a URL

In [68]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=48.8646144,2.334396&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [69]:
results = requests.get(url).json()
"There are {} venues in a radius of {} m from the downtown's center. ".format(len(results['response']['groups'][0]['items']), radius)

"There are 50 venues in a radius of 4500 m from the downtown's center. "

Get relevant part of JSON

In [70]:
items = results['response']['groups'][0]['items']

Process JSON and convert it to a clean dataframe

In [71]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.location.neighborhood,venue.venuePage.id
0,e-0-4ba8b650f964a520f5e839e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4ba8b650f964a520f5e839e3,Jardin du Palais Royal,Palais Royal,48.864941,2.337728,"[{'label': 'display', 'lat': 48.86494061245833...",246,...,Paris,Île-de-France,France,"[Palais Royal, 75001 Paris, France]","[{'id': '4bf58dd8d48988d15a941735', 'name': 'G...",0,[],,,
1,e-0-4adcda09f964a520ed3321e3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4adcda09f964a520ed3321e3,Palais Royal,Place du Palais Royal,48.863236,2.337127,"[{'label': 'display', 'lat': 48.86323576771446...",252,...,Paris,Île-de-France,France,"[Place du Palais Royal, 75001 Paris, France]","[{'id': '4deefb944765f83613cdba6e', 'name': 'H...",0,[],,,
2,e-0-4f6dabf5003944083fe0002e-2,0,"[{'summary': 'This spot is popular', 'type': '...",4f6dabf5003944083fe0002e,Vestige de la Forteresse du Louvre,Palais du Louvre,48.861577,2.333508,"[{'label': 'display', 'lat': 48.86157701632968...",344,...,Paris,Île-de-France,France,"[Palais du Louvre, 75001 Paris, France]","[{'id': '4deefb944765f83613cdba6e', 'name': 'H...",0,[],,,
3,e-0-4b071505f964a520dcf622e3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4b071505f964a520dcf622e3,Place du Palais Royal,Place du Palais Royal,48.862523,2.336688,"[{'label': 'display', 'lat': 48.86252338167934...",286,...,Paris,Île-de-France,France,"[Place du Palais Royal, 75001 Paris, France]","[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",0,[],,,
4,e-0-4adcda10f964a520af3521e3-4,0,"[{'summary': 'This spot is popular', 'type': '...",4adcda10f964a520af3521e3,Musée du Louvre,Rue de Rivoli,48.860847,2.33644,"[{'label': 'display', 'lat': 48.86084691113991...",445,...,Paris,Île-de-France,France,"[Rue de Rivoli (Place du Carrousel), 75001 Par...","[{'id': '4bf58dd8d48988d18f941735', 'name': 'A...",0,[],Place du Carrousel,Le Louvre,


In [72]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,Jardin du Palais Royal,Garden,Palais Royal,48.864941,2.337728,"[{'label': 'display', 'lat': 48.86494061245833...",246,75001,FR,Paris,Île-de-France,France,"[Palais Royal, 75001 Paris, France]",,,4ba8b650f964a520f5e839e3
1,Palais Royal,Historic Site,Place du Palais Royal,48.863236,2.337127,"[{'label': 'display', 'lat': 48.86323576771446...",252,75001,FR,Paris,Île-de-France,France,"[Place du Palais Royal, 75001 Paris, France]",,,4adcda09f964a520ed3321e3
2,Vestige de la Forteresse du Louvre,Historic Site,Palais du Louvre,48.861577,2.333508,"[{'label': 'display', 'lat': 48.86157701632968...",344,75001,FR,Paris,Île-de-France,France,"[Palais du Louvre, 75001 Paris, France]",,,4f6dabf5003944083fe0002e
3,Place du Palais Royal,Plaza,Place du Palais Royal,48.862523,2.336688,"[{'label': 'display', 'lat': 48.86252338167934...",286,75001,FR,Paris,Île-de-France,France,"[Place du Palais Royal, 75001 Paris, France]",,,4b071505f964a520dcf622e3
4,Musée du Louvre,Art Museum,Rue de Rivoli,48.860847,2.33644,"[{'label': 'display', 'lat': 48.86084691113991...",445,75001,FR,Paris,Île-de-France,France,"[Rue de Rivoli (Place du Carrousel), 75001 Par...",Place du Carrousel,Le Louvre,4adcda10f964a520af3521e3


Let's count how many venues of each type Paris has.

In [73]:
paris_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
paris_ven.columns = ['Paris']
paris_ven.index.name = 'Venue Type'

# Add them to the ven_types dataframe
ven_types = pd.merge(left=paris_ven, right=ven_types, how='outer', left_on='Venue Type', right_on='Venue Type')
ven_types.head(50)

Unnamed: 0_level_0,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio
Venue Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Plaza,8.0,,5.0,1.0,,,1.0
Hotel,6.0,4.0,4.0,2.0,1.0,2.0,
Historic Site,3.0,,,,,,4.0
Garden,2.0,,,1.0,1.0,,1.0
Art Museum,2.0,,2.0,1.0,3.0,,3.0
Bookstore,2.0,1.0,2.0,1.0,,1.0,5.0
Bridge,1.0,,,1.0,,,
Corsican Restaurant,1.0,,,,,,
Ice Cream Shop,1.0,1.0,1.0,1.0,1.0,,
Lounge,1.0,,,,,1.0,


Let's create a dataframe with the number of TIPS from Paris.

In [74]:
cat_tips7 = pd.DataFrame()

for row in dataframe_filtered.index.values.tolist()[0:7]: # Limit to 7 venues because venue tips is a premium endpoint
    venue_id = dataframe_filtered.loc[row, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    
    # Send GET request for result
    result = requests.get(url).json()
    
    # Get the number of TIPS
    cat_tips7.loc[row, 'Tips'] = result['response']['venue']['tips']['count']
    cat_tips7.loc[row, 'Category'] = dataframe_filtered.loc[row, 'categories']

In [75]:
cat_tips7

Unnamed: 0,Tips,Category
0,97.0,Garden
1,31.0,Historic Site
2,3.0,Historic Site
3,10.0,Plaza
4,2268.0,Art Museum
5,15.0,Bookstore
6,27.0,Historic Site


### Athens, Greece

In [76]:
addr_Athens = 'Ermou street, Athens, Greece'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(addr_Athens)
latitude = location.latitude
longitude = location.longitude
print("Coordinates:", latitude, longitude)
print("Radius, from city's center to determine downtown area, is:", radius, "meters")

Coordinates: 37.9764533 23.7281421
Radius, from city's center to determine downtown area, is: 4500 meters


Define a URL

In [77]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=OKAJYH2AGQSQZB1NTE1YFEODZ1TGV3UN3N4XQGB5WQRXFFQV&client_secret=XRGKGGB4XWMJU0DRE2VPO2LJMQGTFV40NPL22XDJLTFBOFPG&ll=37.9764533,23.7281421&v=20200702&radius=4500&limit=50'

Send GET request and examine results

In [78]:
results = requests.get(url).json()
"There are {} venues in a radius of {} m from the downtown's center. ".format(len(results['response']['groups'][0]['items']), radius)

"There are 50 venues in a radius of 4500 m from the downtown's center. "

Get relevant part of JSON

In [79]:
items = results['response']['groups'][0]['items']

In [80]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head(5)


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.neighborhood,venue.venuePage.id
0,e-0-5335a5fd498e6694ffbf5d0e-0,0,"[{'summary': 'This spot is popular', 'type': '...",5335a5fd498e6694ffbf5d0e,Feyrouz,Καρόρη 23,Αγάθωνος,37.978112,23.727798,"[{'label': 'display', 'lat': 37.97811208203773...",...,GR,Αθήνα,Αττική,Ελλάδα,"[Καρόρη 23 (Αγάθωνος), 105 51 Αθήνα, Αττική, Ε...","[{'id': '4bf58dd8d48988d115941735', 'name': 'M...",0,[],,
1,e-0-539eaa70498ee5525acfe39b-1,0,"[{'summary': 'This spot is popular', 'type': '...",539eaa70498ee5525acfe39b,Πριγκιπώ,Κολοκοτρώνη 34,Ρόμβης,37.977653,23.730065,"[{'label': 'display', 'lat': 37.97765306946321...",...,GR,Αθήνα,Αττική,Ελλάδα,"[Κολοκοτρώνη 34 (Ρόμβης), 105 62 Αττική, Αττικ...","[{'id': '4bf58dd8d48988d111951735', 'name': 'J...",0,[],,
2,e-0-5a7b74fca8eb6032a443d0dd-2,0,"[{'summary': 'This spot is popular', 'type': '...",5a7b74fca8eb6032a443d0dd,Smak,Ρόμβης 21,,37.97742,23.730072,"[{'label': 'display', 'lat': 37.97742029621175...",...,GR,Αθήνα,Αττική,Ελλάδα,"[Ρόμβης 21, 105 60 Αθήνα, Αττική, Ελλάδα]","[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",0,[],Monastiraki,478074669.0
3,e-0-51a0b3d7498e40837d593d93-3,0,"[{'summary': 'This spot is popular', 'type': '...",51a0b3d7498e40837d593d93,Falafellas,Αιόλου 51,,37.978444,23.728052,"[{'label': 'display', 'lat': 37.97844416706925...",...,GR,Αθήνα,Αττική,Ελλάδα,"[Αιόλου 51, 105 51 Αθήνα, Αττική, Ελλάδα]","[{'id': '4bf58dd8d48988d10b941735', 'name': 'F...",0,[],Ψυρρή,
4,e-0-562fe4dd498ec23804102768-4,0,"[{'summary': 'This spot is popular', 'type': '...",562fe4dd498ec23804102768,Kuko's The Bar,Καλαμιώτου 4,Καλαμιώτου,37.976678,23.728876,"[{'label': 'display', 'lat': 37.97667793391077...",...,GR,Αθήνα,Αττική,Ελλάδα,"[Καλαμιώτου 4 (Καλαμιώτου), 105 63 Αθήνα, Αττι...","[{'id': '4bf58dd8d48988d116941735', 'name': 'B...",0,[],Κεντρο,355622453.0


In [81]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(5)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Feyrouz,Middle Eastern Restaurant,Καρόρη 23,Αγάθωνος,37.978112,23.727798,"[{'label': 'display', 'lat': 37.97811208203773...",187,105 51,GR,Αθήνα,Αττική,Ελλάδα,"[Καρόρη 23 (Αγάθωνος), 105 51 Αθήνα, Αττική, Ε...",,5335a5fd498e6694ffbf5d0e
1,Πριγκιπώ,Jewelry Store,Κολοκοτρώνη 34,Ρόμβης,37.977653,23.730065,"[{'label': 'display', 'lat': 37.97765306946321...",215,105 62,GR,Αθήνα,Αττική,Ελλάδα,"[Κολοκοτρώνη 34 (Ρόμβης), 105 62 Αττική, Αττικ...",,539eaa70498ee5525acfe39b
2,Smak,Fast Food Restaurant,Ρόμβης 21,,37.97742,23.730072,"[{'label': 'display', 'lat': 37.97742029621175...",200,105 60,GR,Αθήνα,Αττική,Ελλάδα,"[Ρόμβης 21, 105 60 Αθήνα, Αττική, Ελλάδα]",Monastiraki,5a7b74fca8eb6032a443d0dd
3,Falafellas,Falafel Restaurant,Αιόλου 51,,37.978444,23.728052,"[{'label': 'display', 'lat': 37.97844416706925...",221,105 51,GR,Αθήνα,Αττική,Ελλάδα,"[Αιόλου 51, 105 51 Αθήνα, Αττική, Ελλάδα]",Ψυρρή,51a0b3d7498e40837d593d93
4,Kuko's The Bar,Bar,Καλαμιώτου 4,Καλαμιώτου,37.976678,23.728876,"[{'label': 'display', 'lat': 37.97667793391077...",69,105 63,GR,Αθήνα,Αττική,Ελλάδα,"[Καλαμιώτου 4 (Καλαμιώτου), 105 63 Αθήνα, Αττι...",Κεντρο,562fe4dd498ec23804102768


**Let's count how many venues of each type Athens has and store the counts in a dataframe**

In [82]:
athens_ven = pd.DataFrame(dataframe_filtered['categories'].value_counts())
athens_ven.columns = ['Athens']
athens_ven.index.name = 'Venue Type'

* At this point, we have gathered the venue types within a radius of ~4.5 km around each city center.
* Each city's data is limited to 50 venues.
* We will keep only the categories that appear in at least 2 of the 7 comparison cities. 'Cleaning' the data this way avoids local outliers.

In [83]:
athens_ven

Unnamed: 0_level_0,Athens
Venue Type,Unnamed: 1_level_1
Bar,7
Café,6
Historic Site,4
Coffee Shop,4
Cocktail Bar,4
Dessert Shop,3
Pizza Place,2
Wine Bar,2
Falafel Restaurant,2
Whisky Bar,1


Concatenate the category tips from the cat_tips dataframes (1 to 7) into one dataframe. Then we will group by category and take the average number of TIPS per category.

In [84]:
category_tips = pd.concat([cat_tips1, cat_tips2 , cat_tips3, cat_tips4 , cat_tips5, cat_tips6, cat_tips7], ignore_index=True)

In [86]:
category_tips.columns = ['Tips', 'Venue Type']

In [88]:
category_tips.head()

Unnamed: 0,Tips,Venue Type
0,36.0,Burger Joint
1,21.0,French Restaurant
2,52.0,Coffee Shop
3,352.0,Coffee Shop
4,534.0,Music Venue


In [89]:
category_tips = category_tips.groupby(['Venue Type']).mean()

In [90]:
category_tips.sort_values(['Tips'], ascending=False, inplace=True)

In [91]:
category_tips

Unnamed: 0_level_0,Tips
Venue Type,Unnamed: 1_level_1
Art Museum,1189.5
Music Venue,534.0
Plaza,466.5
Cocktail Bar,254.0
Park,248.0
Government Building,225.0
Church,132.0
Garden,126.0
Pedestrian Plaza,113.0
Coffee Shop,86.4
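
The concatenate-then-average step above can be checked on a small example. This sketch uses two made-up tables, `tips_a` and `tips_b`, with the same columns as the `cat_tips` dataframes. The values are illustrative only.

```python
import pandas as pd

# Toy per-city tip tables (hypothetical values), shaped like cat_tips1..cat_tips7
tips_a = pd.DataFrame({'Tips': [52.0, 534.0],
                       'Category': ['Coffee Shop', 'Music Venue']})
tips_b = pd.DataFrame({'Tips': [352.0, 10.0],
                       'Category': ['Coffee Shop', 'Plaza']})

# Stack the tables and rename the columns as in the notebook
toy = pd.concat([tips_a, tips_b], ignore_index=True)
toy.columns = ['Tips', 'Venue Type']

# Average tips per venue type, highest first
toy_mean = (toy.groupby('Venue Type')['Tips']
               .mean()
               .sort_values(ascending=False))
print(toy_mean)
```

Because at most 7 venues per city are queried, many categories are averaged over one or two venues, so a single very popular venue can dominate its category's mean.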


## Methodology <a name="methodology"></a>

In this project we focus on detecting venue types that are uncommon in Athens but common in most of the other 7 cities. We limit the analysis to an area of ~4.5 km around each city center and to 50 venues per city center. For tip counts we query only 7 venues per city because of the API's endpoint restrictions (a limit of 50 premium calls per day).

In the first step we collected the required data: the location and type (category) of the venues returned (up to 50) within 4.5 km of the city center, for every city. We also collected the number of tips for the first 7 venues per city.

The second step is to calculate and explore the most common venue types. We will then rescale the data with min-max normalization.
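
Min-max normalization maps each value to $x' = (x - x_{min}) / (x_{max} - x_{min})$, so the smallest value becomes 0 and the largest becomes 1. The sketch below assumes the scaling is done per column, which the text does not specify. `min_max_normalize` and the toy counts are hypothetical.

```python
import pandas as pd


def min_max_normalize(frame):
    """Scale every column of `frame` to the [0, 1] range."""
    col_min = frame.min()
    col_range = frame.max() - col_min
    # A constant column has zero range; use 1 instead so it maps to 0
    col_range = col_range.replace(0, 1)
    return (frame - col_min) / col_range


# Toy venue counts (made up) for two cities
toy_counts = pd.DataFrame(
    {'Athens': [7.0, 6.0, 4.0], 'Paris': [1.0, 3.0, 5.0]},
    index=['Bar', 'Café', 'Historic Site'],
)
toy_scaled = min_max_normalize(toy_counts)
print(toy_scaled)
```

Missing values (NaN) are skipped by `min()` and `max()` and stay NaN after scaling, so they would need to be filled beforehand if a category absent from a city should count as 0.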

Finally, we will apply a simple **Multiple-Criteria Decision Analysis (MCDA)** to determine, from our data, the type of venue with the best odds of success in downtown Athens.
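
One simple form of MCDA is the weighted-sum model: each alternative gets a score equal to the weighted sum of its normalized criteria. The sketch below only illustrates that idea. The criteria names, weights and values are all assumptions made up for the example and are not the ones used in this analysis.

```python
import pandas as pd

# Hypothetical, already normalized criteria (0 = worst, 1 = best); values are made up
criteria = pd.DataFrame(
    {
        'popularity_abroad': [1.0, 0.4, 0.2],
        'scarcity_in_athens': [0.0, 1.0, 0.5],
        'avg_tips': [0.1, 0.6, 1.0],
    },
    index=['Type A', 'Type B', 'Type C'],
)

# Hypothetical weights expressing the importance of each criterion; they sum to 1
weights = pd.Series({
    'popularity_abroad': 0.4,
    'scarcity_in_athens': 0.4,
    'avg_tips': 0.2,
})

# Weighted-sum score per alternative, best first
scores = criteria.dot(weights).sort_values(ascending=False)
print(scores)
```

The ranking depends directly on the chosen weights, so it is worth checking how stable the top alternative is when the weights change.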

## Analysis <a name="analysis"></a>

In this basic exploratory data analysis we will derive some useful information from our data.

Let's drop the categories that appear in only one city center (outliers).

In [92]:
# Count the cities in which each venue type appears (non-null cells per row)
city_counts = ven_types.notnull().sum(axis=1)
# Keep only the venue types found in at least two city centers
ven_types_temp = ven_types[city_counts >= 2].copy()
ven_types_temp.head(15)

Venue Type,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio
Plaza,8.0,,5.0,1.0,,,1.0
Hotel,6.0,4.0,4.0,2.0,1.0,2.0,
Historic Site,3.0,,,,,,4.0
Garden,2.0,,,1.0,1.0,,1.0
Art Museum,2.0,,2.0,1.0,3.0,,3.0
Bookstore,2.0,1.0,2.0,1.0,,1.0,5.0
Bridge,1.0,,,1.0,,,
Ice Cream Shop,1.0,1.0,1.0,1.0,1.0,,
Lounge,1.0,,,,,1.0,
Seafood Restaurant,1.0,2.0,,,,1.0,


We will merge the results from Athens (`athens_ven`) with the venue types of the other cities to build the main dataframe (`df`).

In [93]:
# The outer merge keeps venue types found only in Athens or only in the other cities
df = pd.merge(left=athens_ven, right=ven_types_temp, how='outer', on='Venue Type')

In [96]:
df.head(50)

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio
Bar,7.0,,,,,,,
Café,6.0,,2.0,,4.0,1.0,1.0,
Historic Site,4.0,3.0,,,,,,4.0
Coffee Shop,4.0,1.0,9.0,3.0,1.0,3.0,7.0,5.0
Cocktail Bar,4.0,,1.0,,2.0,,,
Dessert Shop,3.0,,,,,,,
Pizza Place,2.0,,,1.0,1.0,3.0,,
Wine Bar,2.0,1.0,,,,2.0,,
Falafel Restaurant,2.0,,,,,,,
Whisky Bar,1.0,,,,,,,


Let's combine similar venue types to simplify our dataframe.

In [97]:
# Sum the venue types whose name contains 'Café' or 'Coffee' into a single row called 'Café'
df.loc['Café', :] = df[df.index.str.contains('Café') | df.index.str.contains('Coffee')].sum(axis=0)

# Drop the 'Coffee Shop' row, which is now included in 'Café'
df.drop('Coffee Shop', axis=0, inplace=True)

In [98]:
# Sum the venue types whose name contains 'Gym' or 'Fitness' into a single row called 'Gym'
df.loc['Gym', :] = df[df.index.str.contains('Gym') | df.index.str.contains('Fitness')].sum(axis=0)

# Drop the 'Gym / Fitness Center' row, which is now included in 'Gym'
df.drop('Gym / Fitness Center', axis=0, inplace=True)

In [101]:
df.head()

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio
Bar,7.0,,,,,,,
Café,10.0,1.0,11.0,3.0,5.0,4.0,8.0,5.0
Historic Site,4.0,3.0,,,,,,4.0
Cocktail Bar,4.0,,1.0,,2.0,,,
Dessert Shop,3.0,,,,,,,


**We will add a `Most Common` column that counts the number of cities in which each venue type appears, and later a column with the sum of venues for every category.**

In [102]:
# A missing count means the venue type was not found in that city
df = df.fillna(0)
# Count the cities (Athens included) in which each venue type appears
df['Most Common'] = (df != 0).sum(axis=1)

In [104]:
df.head(50)

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio,Most Common
Bar,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
Café,10.0,1.0,11.0,3.0,5.0,4.0,8.0,5.0,8
Historic Site,4.0,3.0,0.0,0.0,0.0,0.0,0.0,4.0,3
Cocktail Bar,4.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,3
Dessert Shop,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
Pizza Place,2.0,0.0,0.0,1.0,1.0,3.0,0.0,0.0,4
Wine Bar,2.0,1.0,0.0,0.0,0.0,2.0,0.0,0.0,3
Falafel Restaurant,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
Whisky Bar,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
Jewelry Store,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
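
As a small illustration of what `Most Common` holds, the sketch below counts the non-zero city entries for two rows of the table above. It uses plain Python lists rather than the dataframe, and the names `city_counts` and `most_common` are hypothetical.

```python
# Venue counts per city (Athens, Paris, Cape Town, Moscow, Sydney,
# San Francisco, Seoul, Rio), copied from the table above
city_counts = {
    "Pizza Place": [2.0, 0.0, 0.0, 1.0, 1.0, 3.0, 0.0, 0.0],
    "Wine Bar": [2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0],
}

# 'Most Common' is the number of cities with at least one such venue
most_common = {
    venue: sum(1 for count in counts if count != 0)
    for venue, counts in city_counts.items()
}
print(most_common)  # {'Pizza Place': 4, 'Wine Bar': 3}
```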


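We will add a `Sum` column with the total number of venues for every venue type. The code below sums only Cape Town, Moscow, Sydney, San Francisco and Seoul; Athens, Paris and Rio are not included.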

In [105]:
df['Sum'] = df['Cape Town'] + df['Moscow'] + df['Sydney'] + df['San Francisco'] + df['Seoul']

**Sort the dataframe in descending order of the Most Common (venue type) and Sum. In that way, we will see the most popular venue types.**

In [106]:
df.sort_values(['Most Common', 'Sum'], ascending=False, inplace=True)
df.head(15)

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio,Most Common,Sum
Café,10.0,1.0,11.0,3.0,5.0,4.0,8.0,5.0,8,31.0
Bookstore,1.0,2.0,1.0,2.0,1.0,0.0,1.0,5.0,7,5.0
Hotel,0.0,6.0,4.0,4.0,2.0,1.0,2.0,0.0,6,13.0
Theater,0.0,1.0,3.0,1.0,3.0,2.0,0.0,4.0,6,9.0
Ice Cream Shop,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,6,4.0
Plaza,1.0,8.0,0.0,5.0,1.0,0.0,0.0,1.0,5,6.0
Art Museum,0.0,2.0,0.0,2.0,1.0,3.0,0.0,3.0,5,6.0
Concert Hall,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,5,3.0
Park,0.0,0.0,1.0,1.0,4.0,3.0,0.0,0.0,4,9.0
Pizza Place,2.0,0.0,0.0,1.0,1.0,3.0,0.0,0.0,4,5.0


**We will merge the results from Category Tips too.**

In [108]:
df = pd.merge(left=df, right=category_tips, how='outer', left_on='Venue Type', right_on='Venue Type')

**We will remove from the dataframe the venue types that exist in fewer than 3 of the 8 cities.**

In [109]:
df = df.drop(df[df['Most Common'] < 3].index)

In [110]:
df = df.dropna()
df

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio,Most Common,Sum,Tips
Café,10.0,1.0,11.0,3.0,5.0,4.0,8.0,5.0,8.0,31.0,80.25
Bookstore,1.0,2.0,1.0,2.0,1.0,0.0,1.0,5.0,7.0,5.0,27.666667
Theater,0.0,1.0,3.0,1.0,3.0,2.0,0.0,4.0,6.0,9.0,15.5
Plaza,1.0,8.0,0.0,5.0,1.0,0.0,0.0,1.0,5.0,6.0,466.5
Art Museum,0.0,2.0,0.0,2.0,1.0,3.0,0.0,3.0,5.0,6.0,1189.5
Park,0.0,0.0,1.0,1.0,4.0,3.0,0.0,0.0,4.0,9.0,248.0
Gym,0.0,0.0,2.0,0.0,0.0,1.0,2.0,1.0,4.0,5.0,41.0
Garden,0.0,2.0,0.0,0.0,1.0,1.0,0.0,1.0,4.0,2.0,126.0
Church,0.0,1.0,0.0,1.0,0.0,1.0,0.0,3.0,4.0,2.0,132.0
Cocktail Bar,4.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,3.0,3.0,254.0


We will apply the same techniques used in **Content-Based Recommendation Systems** to recommend the venue type with the highest prospects of success according to our data from the Foursquare API.

We're going to use a simple **MCDA**. First, we assign a weight to every feature (cities, Most Common, Tips and Sum). Then we normalize the values of the dataframe. Finally, we multiply the dataframe's normalized values by the weights and sum the products across each row, which gives one score per venue type. This operation is a dot product between a matrix and a vector, so we can accomplish it simply by calling the pandas `dot` function.
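
Written out, and assuming the notation below (which is introduced here for illustration and is not part of the notebook), the score of venue type $i$ is the weighted sum of its min-max normalized features:

$$
\tilde{x}_{ij} = \frac{x_{ij} - \min_k x_{kj}}{\max_k x_{kj} - \min_k x_{kj}}, \qquad s_i = \sum_{j=1}^{11} w_j \, \tilde{x}_{ij}
$$

Here $x_{ij}$ is the value of feature $j$ (one of the 8 city counts, Most Common, Sum or Tips) for venue type $i$, $w_j$ is the weight of that feature, and the minimum and maximum are taken over all venue types $k$.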

The weights sum to 1:

* assign a negative weight to **Athens**, because we want to favor venue types that are uncommon there.
* assign a large weight to **Most Common**, which promotes venue types that are widespread across the cities; together with the negative weight for Athens, this favors venue types that are common in the 7 cities but not in Athens.
* assign a large weight to **Tips**, on the principle that a higher number of tips indicates that people are interested in the venue and want to share their experience with other users, and therefore indicates higher popularity.
* split the remaining weight among the **other cities and Sum**, which we consider less important.

In [132]:
weights = np.array([-1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.85, 0.05, 0.75]) 

In [133]:
weights

array([-1.  ,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.85,
        0.05,  0.75])
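
As a quick sanity check, the sketch below confirms that the weights above sum to 1 and labels each weight with its column. The column order is taken from the dataframe shown earlier; the names `columns` and `weights_list` are hypothetical, and plain Python is used so that the check does not depend on NumPy.

```python
import math

# Column order of the dataframe that the weights are applied to
columns = ["Athens", "Paris", "Cape Town", "Moscow", "Sydney",
           "San Francisco", "Seoul", "Rio", "Most Common", "Sum", "Tips"]
weights_list = [-1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.85, 0.05, 0.75]

# There must be exactly one weight per column
assert len(columns) == len(weights_list)

# The weights add up to 1 (compared with a tolerance because of float rounding)
total = sum(weights_list)
print(math.isclose(total, 1.0))

for name, weight in zip(columns, weights_list):
    print(f"{name:>13}: {weight:+.2f}")
```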

### We apply *Min-Max normalization*, which is one of the most common ways to normalize data.
* for every feature, the minimum value of that feature gets transformed into a 0
* the maximum value gets transformed into a 1 
* and every other value gets transformed into a decimal between 0 and 1.
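
The rule above can be checked by hand on the Tips column. The sketch below is plain Python rather than the scikit-learn call used in the notebook; the helper `min_max` is a hypothetical name, and the values are the ten Tips entries of the dataframe shown earlier.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all equal (otherwise the range would be zero).
    """
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]

# Tips per venue type, in the row order of the dataframe
venue_types = ["Café", "Bookstore", "Theater", "Plaza", "Art Museum",
               "Park", "Gym", "Garden", "Church", "Cocktail Bar"]
tips = [80.25, 27.666667, 15.5, 466.5, 1189.5, 248.0, 41.0, 126.0, 132.0, 254.0]

scaled_tips = min_max(tips)
for venue, scaled in zip(venue_types, scaled_tips):
    print(f"{venue}: {scaled:.6f}")
```

Theater (the minimum, 15.5 tips) maps to 0 and Art Museum (the maximum, 1189.5 tips) maps to 1; every other venue type falls in between.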

In [134]:
from sklearn import preprocessing

x = df.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)

In [135]:
df = pd.DataFrame(x_scaled, index=df.index, columns=df.columns)
df

Venue Type,Athens,Paris,Cape Town,Moscow,Sydney,San Francisco,Seoul,Rio,Most Common,Sum,Tips
Café,1.0,0.125,1.0,0.6,1.0,1.0,1.0,1.0,1.0,1.0,0.055153
Bookstore,0.1,0.25,0.090909,0.4,0.2,0.0,0.125,1.0,0.8,0.16129,0.010363
Theater,0.0,0.125,0.272727,0.2,0.6,0.5,0.0,0.8,0.6,0.290323,0.0
Plaza,0.1,1.0,0.0,1.0,0.2,0.0,0.0,0.2,0.4,0.193548,0.384157
Art Museum,0.0,0.25,0.0,0.4,0.2,0.75,0.0,0.6,0.4,0.193548,1.0
Park,0.0,0.0,0.090909,0.2,0.8,0.75,0.0,0.0,0.2,0.290323,0.198041
Gym,0.0,0.0,0.181818,0.0,0.0,0.25,0.25,0.2,0.2,0.16129,0.021721
Garden,0.0,0.25,0.0,0.0,0.2,0.25,0.0,0.2,0.2,0.064516,0.094123
Church,0.0,0.125,0.0,0.2,0.0,0.25,0.0,0.6,0.2,0.064516,0.099233
Cocktail Bar,0.4,0.0,0.090909,0.0,0.4,0.0,0.0,0.0,0.0,0.096774,0.203152


### We will use the *dot* function to multiply our dataframe by the weights vector, and then sort the venue types by the resulting scores.

In [136]:
results = pd.DataFrame(df.dot(weights), columns = ['Scores'])
results.sort_values(['Scores'], ascending=False, inplace=True)
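
To see what the dot product computes for a single row, the sketch below recomputes two scores in plain Python from the normalized values shown earlier. It is a hand check rather than the notebook's code, and the names `weights_list`, `normalized_rows` and `score` are hypothetical.

```python
# Weights in column order: Athens, Paris, Cape Town, Moscow, Sydney,
# San Francisco, Seoul, Rio, Most Common, Sum, Tips
weights_list = [-1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.85, 0.05, 0.75]

# Normalized rows copied from the min-max scaled dataframe
normalized_rows = {
    "Art Museum": [0.0, 0.25, 0.0, 0.4, 0.2, 0.75, 0.0, 0.6, 0.4, 0.193548, 1.0],
    "Café": [1.0, 0.125, 1.0, 0.6, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.055153],
}

def score(row, weights):
    """Weighted sum of one row, i.e. one entry of df.dot(weights)."""
    return sum(value * weight for value, weight in zip(row, weights))

for venue, row in normalized_rows.items():
    print(f"{venue}: {score(row, weights_list):.6f}")
```

Café is the most widespread venue type, but its normalized Athens value of 1.0 is multiplied by the weight of -1.0, which pulls its score well below that of Art Museum.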

Finally, let's get the results from our analysis.

In [137]:
results

Venue Type,Scores
Art Museum,1.209677
Bookstore,0.699133
Plaza,0.657795
Theater,0.649402
Park,0.425092
Church,0.306401
Garden,0.288818
Gym,0.238446
Café,0.227615
Pedestrian Plaza,0.101763


To conclude our analysis, we will keep the venue types with a score higher than 0.5.

In [138]:
print("Suggested venue types for downtown Athens are the following: ")
results[(results["Scores"] > 0.5)]

Suggested venue types for downtown Athens are the following: 


Venue Type,Scores
Art Museum,1.209677
Bookstore,0.699133
Plaza,0.657795
Theater,0.649402


## Results and Discussion <a name="results"></a>

The analysis shows that, although Athens has a broad diversity of venues, there is potential for venue types that are not yet widespread in its center. The four categories that score more than 0.5 are Art Museum, Bookstore, Plaza and Theater. **Art Museum stands out with a score of about 1.21, roughly 1.7 times that of the second category (Bookstore, about 0.70).**

After considering the venue categories of Athens, the number of tips that each venue has received was calculated. This rests on the assumption that the higher the number of tips, the more popular the venue.

Finally, weights were set for the features (cities, number of different cities in which a venue type is present, sum of venues, number of tips) to calculate the final score for every category. For the final recommendation, the top four results were proposed. It is worth mentioning that those categories are not necessarily optimal. The purpose of this analysis was to suggest a venue type that is known and successful worldwide but missing or uncommon in Athens. Further examination of the data's integrity is required for more accurate results (e.g. the data show zero hotels in downtown Athens, which is wrong).

## Conclusion <a name="conclusion"></a>

The scope of this project was to recommend a venue type that is common in many of the dataset's cities and uncommon in Athens. This recommendation is based on the principle that a venue type that is popular in the majority of the dataset's cities will be similarly desirable, recognizable and profitable in Athens. Before a final decision, the correctness of this principle and the integrity of the data must be carefully examined.