In [123]:
# imports
import pandas as pd
import numpy as np
import os  # used to read the API keys from environment variables
import requests  # used to call the APIs
from unidecode import unidecode  # transliterates accented characters to plain ASCII

# Foursquare

Send a request to Foursquare with a small radius (1000m) for all the bike stations in your city of choice. 

In [180]:
FOURSQUARE_KEY = os.getenv('FOURSQUARE_API_KEY')
YELP_KEY = os.getenv('YELP_API_KEY')
# confirm both API keys were loaded from the environment without printing the secrets themselves
print(FOURSQUARE_KEY is not None)
print(YELP_KEY is not None)

True
True


In [62]:
def get_venues_fs(latitude, longitude, radius, api_key, categories, limit):
    """
    Get amenities and POIs from a Foursquare Places search.
    Args:
        latitude (float): latitude for query (must be combined with longitude)
        longitude (float): longitude for query (must be combined with latitude)
        radius (int): search radius in metres around the coordinates
        api_key (str): Foursquare API key to use for the query (loaded in the cell above)
        categories (str): Foursquare category IDs, separated by commas. If None, no category filter is applied
        limit (int): maximum number of results to return

    Returns:
        dict: the parsed JSON body of the response.
    """
    url = "https://api.foursquare.com/v3/places/search"
    
    headers = {
        "Accept": "application/json",
        "Authorization": api_key
    }
    
    params = {
        "ll": f"{latitude},{longitude}",
        "radius": radius,
        "categories": categories,
        "limit": limit
    }
    
    response = requests.get(url, headers=headers, params=params)
    
    response.raise_for_status()  # raises requests.HTTPError on any 4xx/5xx status
    return response.json()
        

**Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)**

In [125]:
# testing
# latitudes to test: original 43.664467, new 43.601162, other 43.657763
# longitudes to test: original -79.414783, new -79.504160, other -79.389165

categories = '10035,13003,13065,16000' # performing arts venues, bars, restaurants, landmarks & outdoors (see the notes in the following cell)

res = get_venues_fs(latitude=43.657763, longitude=-79.389165, radius=800, api_key=FOURSQUARE_KEY, categories=categories, limit=50)

print(len(res['results']))
print("\n")
print(res['results'][3]) # main body of a specific venue result
print("\n")

print(res['results'][3]['name'])  # this is how you access the name, we could loop through to get names

print(res['results'][3]['location']['address']) # accessing the address for the dataframe

print(res['results'][3]['categories'][0]['name']) # accessing the first general category of the venue, most only have one
print("\n")

for venue in res['results']:
    name = venue['name']
    address = venue['location'].get('address', 'Address not available')
    first_category = venue['categories'][0]['name']
    category_names = [category['name'] for category in venue['categories']] # there are multiple categories for some venues
    print(f"{name}, {address}, {first_category}, {category_names}")

50


{'fsq_id': '557231e3498e540f05f3083c', 'categories': [{'id': 13032, 'name': 'Cafe, Coffee, and Tea House', 'short_name': 'Cafe, Coffee, and Tea House', 'plural_name': 'Cafes, Coffee, and Tea Houses', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_', 'suffix': '.png'}}, {'id': 13043, 'name': 'Donut Shop', 'short_name': 'Donuts', 'plural_name': 'Donut Shops', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/donuts_', 'suffix': '.png'}}, {'id': 13065, 'name': 'Restaurant', 'short_name': 'Restaurant', 'plural_name': 'Restaurants', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_', 'suffix': '.png'}}], 'chains': [{'id': 'd5719cc0-d890-0132-61d3-7a163eb2a6fc', 'name': 'Tim Hortons'}], 'closed_bucket': 'VeryLikelyOpen', 'distance': 326, 'geocodes': {'main': {'latitude': 43.658539, 'longitude': -79.385251}, 'roof': {'latitude': 43.658539, 'longitude': -79.385251}}, 'link': '/v3/places/557231e3498e540f05f3083c', 'location':

**Notes on the test API Call and justification of parameters:**
- Downtown Toronto is densely packed with both bike stations and the amenities I will be searching for. The default radius is 5 km, which is too large to capture the relationship between bike/dock availability and nearby amenities. A more appropriate radius is **1 km or even 800 m**.
- Foursquare category IDs of interest: 10035 (Performing Arts Venue), 12013 (College and University amenities), 13003 (Bars), 13065 (Restaurants)
- Other candidates: 16000 (Landmarks & Outdoors), 17057 (Food & Beverage Retail), 18000 (Sports & Recreation)
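
The category string passed to `get_venues_fs` can be built from a small lookup table instead of being typed by hand, which keeps each ID next to its name. This is a minimal sketch, assuming the four IDs used in the test call; `FSQ_CATEGORIES` and `build_category_param` are hypothetical names introduced here, not part of the Foursquare API.

```python
# Hypothetical lookup of the Foursquare category IDs used in the test call.
# The names are the ones listed in the notes above.
FSQ_CATEGORIES = {
    10035: "Performing Arts Venue",
    13003: "Bars",
    13065: "Restaurants",
    16000: "Landmarks & Outdoors",
}


def build_category_param(category_ids):
    """Join category IDs into one comma-separated string."""
    return ",".join(str(category_id) for category_id in category_ids)


# Iterating over the dict yields its keys (the IDs) in insertion order.
categories = build_category_param(FSQ_CATEGORIES)
print(categories)
```

Adding or dropping a candidate category then only means editing the dictionary.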

**Notes on the information to return in dataframe**
- Name of Venue
- Address
- Venue categories: I am spot checking a few different bike stations because I want to capture the different terms that fall under more general categories, e.g. **bars**, **restaurants**
- The latitude and longitude used to request the results belong to the bike dock
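
The fields listed above could be pulled out of each result with a small helper that tolerates missing keys. This is a rough sketch, not the notebook's final method: `parse_venue`, the fallback strings and the `station_*` column names are hypothetical, and the sample venue is invented for illustration.

```python
def parse_venue(venue, station_latitude, station_longitude):
    """Flatten one Foursquare result into the fields listed above."""
    categories = venue.get("categories") or []
    return {
        "name": venue.get("name"),
        "address": venue.get("location", {}).get("address", "Address not available"),
        # Use the first category as the general one; fall back if none is given.
        "category": categories[0]["name"] if categories else "Category not available",
        # Coordinates of the bike dock that was used for the search.
        "station_latitude": station_latitude,
        "station_longitude": station_longitude,
    }


# Invented venue, shaped like one entry of res['results'] but without an address.
sample_venue = {
    "name": "Example Cafe",
    "categories": [{"id": 13032, "name": "Cafe, Coffee, and Tea House"}],
    "location": {},
}
row = parse_venue(sample_venue, 43.657763, -79.389165)
print(row)
```

Keeping the station coordinates on every row makes it possible to group the venues back to their bike dock later.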

The unified dataframe has one row per bike dock/station, so for each station this data will contribute the number of bars, restaurants, performing arts venues and outdoor venues nearby.

**Below: using a single response to test how to extract and clean up the information we want for the real dataframe**

**Put your parsed results into a DataFrame**

In [91]:
search_categories = '10035,13003,13065,16000' # performing arts venues, bars, restaurants, landmarks & outdoors (see the notes above)

res = get_venues_fs(latitude=43.664467, longitude=-79.414783, radius=800, api_key=FOURSQUARE_KEY, categories=search_categories, limit=50)

name_list = []
address_list = []
category_list = []

for venue in res['results']:
    name_list.append(venue['name']) 
    address_list.append(venue['location'].get('address', 'Address not available')) 
    category_list.append(venue['categories'][0]['name']) 


test_venue_dictionary = {
    'name': name_list,
    'address': address_list,
    'category': category_list
}

test_venue_df = pd.DataFrame(test_venue_dictionary)
test_venue_df
# csv_file_path = '../data/test_venue_df.csv' # the file path to put new saves into the data directory
# test_venue_df.to_csv(csv_file_path, index=False)

Unnamed: 0,name,address,category
0,Ninetails Coffee Bar,651 Bloor St W,Coffee Shop
1,Snakes & Lattes,600 Bloor St W,Café
2,Rustle & Still Café,605 Bloor St W,Café
3,Korean Village Restaurant,628 Bloor St W,BBQ Joint
4,Buk Chang Dong Soon Tofu,691 Bloor St W,Korean Restaurant
5,Insomnia Restaurant and Lounge,563 Bloor St W,Restaurant
6,Basecamp Climbing,677 Bloor St W,Rock Climbing Spot
7,Christie Pits Park,750 Bloor St W,Park
8,Sam James Coffee Bar,297 Harbord St,Café
9,Banjara Indian Cuisine,796 Bloor St W,Indian Restaurant


I have saved a sample response in CSV format so it can be reloaded without making another API call.

How I want to manipulate the data:
- The table above holds the results for a single bike station; with 820 stations we will make 820 such calls.
- For each station I'd like to record the number of cafes, bars/restaurants/pubs, parks/playgrounds and live venues.
- Each returned dataframe, such as the one above, therefore reduces to one count per search category.
- To do this I need to catch general terms, e.g. pub/bar/restaurant (whether Irish pub, Greek restaurant or BBQ joint), via regex.
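
The reduction described in this list, from one station's venue categories to one count per group, might look roughly like this. It is a sketch under assumptions: the keyword lists are short illustrative ones, `CATEGORY_PATTERNS`, `normalise` and `count_categories` are hypothetical names, and the standard-library `unicodedata` module stands in for `unidecode` so the example has no third-party dependencies.

```python
import re
import unicodedata

# Illustrative patterns only; the real keyword lists would need tuning.
CATEGORY_PATTERNS = {
    "bar/restaurant": re.compile(r"\b(?:bar|restaurant|pub|bbq)\b"),
    "cafe": re.compile(r"\b(?:cafe|coffee)\b"),
    "park": re.compile(r"\b(?:park|playground)\b"),
    "live_venue": re.compile(r"\b(?:concert|music|theater)\b"),
}


def normalise(text):
    """Lowercase a category and strip accents so that 'Café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def count_categories(categories):
    """Count how many venue categories fall into each group for one station."""
    counts = {group: 0 for group in CATEGORY_PATTERNS}
    for category in categories:
        cleaned = normalise(category)
        for group, pattern in CATEGORY_PATTERNS.items():
            if pattern.search(cleaned):
                counts[group] += 1
    return counts


# The ten categories from the sample table above.
sample = ["Coffee Shop", "Café", "Café", "BBQ Joint", "Korean Restaurant",
          "Restaurant", "Rock Climbing Spot", "Park", "Café", "Indian Restaurant"]
station_counts = count_categories(sample)
print(station_counts)
```

Each station then contributes a single dictionary of counts, which maps directly onto one row of the unified dataframe.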

In [150]:
data_test = pd.read_csv('../data/test_venue_df.csv')
data_test.head(10)

Unnamed: 0,name,address,category
0,Ninetails Coffee Bar,651 Bloor St W,Coffee Shop
1,Snakes & Lattes,600 Bloor St W,Café
2,Rustle & Still Café,605 Bloor St W,Café
3,Korean Village Restaurant,628 Bloor St W,BBQ Joint
4,Buk Chang Dong Soon Tofu,691 Bloor St W,Korean Restaurant
5,Insomnia Restaurant and Lounge,563 Bloor St W,Restaurant
6,Basecamp Climbing,677 Bloor St W,Rock Climbing Spot
7,Christie Pits Park,750 Bloor St W,Park
8,Sam James Coffee Bar,297 Harbord St,Café
9,Banjara Indian Cuisine,796 Bloor St W,Indian Restaurant


In [152]:
# need to clean up the category to lowercase and remove accents
data_test['category'] = data_test['category'].str.lower().apply(unidecode)
data_test.head(10)

Unnamed: 0,name,address,category
0,Ninetails Coffee Bar,651 Bloor St W,coffee shop
1,Snakes & Lattes,600 Bloor St W,cafe
2,Rustle & Still Café,605 Bloor St W,cafe
3,Korean Village Restaurant,628 Bloor St W,bbq joint
4,Buk Chang Dong Soon Tofu,691 Bloor St W,korean restaurant
5,Insomnia Restaurant and Lounge,563 Bloor St W,restaurant
6,Basecamp Climbing,677 Bloor St W,rock climbing spot
7,Christie Pits Park,750 Bloor St W,park
8,Sam James Coffee Bar,297 Harbord St,cafe
9,Banjara Indian Cuisine,796 Bloor St W,indian restaurant


In [162]:
# Define regex patterns to search within the category column. These are loose 'contains' matches.
# Non-capturing groups (?:...) avoid the match-group UserWarning that pandas raises from str.contains.
bar_restaurant_pattern = r'\b(?:bar|restaurant|lounge|bbq|pub|grill|burger|chicken|diner|pizzeria|tavern|night club|nightclub)\b'
cafe_pattern = r'\b(?:cafe|coffee|coffee shop|tea|bakery|donut|deli)\b'
park_pattern = r'\b(?:park|playground|monument|plaza)\b'
live_venue_pattern = r'\b(?:concert|music venue|music|comedy|live|theater)\b'

In [156]:
# Apply regex patterns to create new columns
data_test['bar/restaurant'] = data_test['category'].str.contains(bar_restaurant_pattern, case=False, regex=True).astype(int)
data_test['cafe'] = data_test['category'].str.contains(cafe_pattern, case=False, regex=True).astype(int)
data_test['park'] = data_test['category'].str.contains(park_pattern, case=False, regex=True).astype(int)
data_test['live_venue'] = data_test['category'].str.contains(live_venue_pattern, case=False, regex=True).astype(int)
data_test



Unnamed: 0,name,address,category,bar/restaurant,cafe,park,live_venue
0,Ninetails Coffee Bar,651 Bloor St W,coffee shop,0,1,0,0
1,Snakes & Lattes,600 Bloor St W,cafe,0,1,0,0
2,Rustle & Still Café,605 Bloor St W,cafe,0,1,0,0
3,Korean Village Restaurant,628 Bloor St W,bbq joint,1,0,0,0
4,Buk Chang Dong Soon Tofu,691 Bloor St W,korean restaurant,1,0,0,0
5,Insomnia Restaurant and Lounge,563 Bloor St W,restaurant,1,0,0,0
6,Basecamp Climbing,677 Bloor St W,rock climbing spot,0,0,0,0
7,Christie Pits Park,750 Bloor St W,park,0,0,1,0
8,Sam James Coffee Bar,297 Harbord St,cafe,0,1,0,0
9,Banjara Indian Cuisine,796 Bloor St W,indian restaurant,1,0,0,0


In [164]:
data_test['bar/restaurant'].sum()  # not displayed: a notebook cell only shows the value of its last expression
data_test['park'].sum()

8

In [166]:
# csv_file_path = '../data/test_venue_df_cleaned.csv' # the file path to put this cleaned table as a sample into the repository
# data_test.to_csv(csv_file_path, index=False)

# Yelp

**Send a request to Yelp with a small radius (1000m) for all the bike stations in your city of choice.**

My notes on the Yelp Fusion API (https://docs.developer.yelp.com/docs/fusion-intro):
- It offers several types of search, including search for events and lookup by business ID; see https://docs.developer.yelp.com/docs/resources-categories
- For the closest comparison with Foursquare we will use the standard business search: https://docs.developer.yelp.com/reference/v3_business_search
- The starter API package includes the review count and the average star rating
- The first 300 calls are free, and 1000 API calls cost 7.99
- We would need at least 820 API calls, one for each bike station
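
A quick back-of-the-envelope estimate of what 820 calls would cost, using only the figures in the notes above. The billing model is an assumption: the notes do not say whether paid calls are sold in whole blocks of 1000 or billed pro rata, nor whether the free allowance can be combined with paid calls, so both readings are shown.

```python
import math

N_STATIONS = 820        # one call per bike station
FREE_CALLS = 300        # free allowance from the notes above
PRICE_PER_1000 = 7.99   # price of 1000 calls from the notes above

# Assumes the free allowance is used up before any paid calls.
billable_calls = max(N_STATIONS - FREE_CALLS, 0)

# Assumption 1: paid calls are sold in whole blocks of 1000.
cost_in_blocks = math.ceil(billable_calls / 1000) * PRICE_PER_1000

# Assumption 2: paid calls are billed pro rata.
cost_pro_rata = billable_calls / 1000 * PRICE_PER_1000

print(billable_calls, cost_in_blocks, round(cost_pro_rata, 2))
```

Either way, a single pass over all stations stays within one paid block, but repeated test runs would eat into the allowance quickly.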

In [198]:
def get_venues_yelp(latitude, longitude, radius, yelp_key, categories, limit):
    """
    Get amenities and POIs from a Yelp business search.
    Args:
        latitude (float): latitude for query (must be combined with longitude)
        longitude (float): longitude for query (must be combined with latitude)
        radius (int): search radius in metres around the coordinates
        yelp_key (str): Yelp API key to use for the query (loaded in the Foursquare section above)
        categories (str): Yelp category aliases, which are descriptive (e.g. "bars,restaurants") unlike Foursquare's 5-digit codes. Separate aliases with commas
        limit (int): maximum number of results to return
    Returns:
        dict: the parsed JSON body of the response.
    """
    url = "https://api.yelp.com/v3/businesses/search"
    
    headers = {
        "Accept": "application/json",
        "Authorization": "Bearer " + yelp_key  # Yelp expects the key as a Bearer token
    }
    
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius,
        "categories": categories,
        "limit": limit
    }
    
    response = requests.get(url, headers=headers, params=params)
    
    response.raise_for_status()  # raises requests.HTTPError on any 4xx/5xx status
    return response.json()

In [200]:
# testing

yelp_categories = 'cafes,bars,restaurants,parks,musicvenues' # cafes, bars, restaurants, parks, live music venues

yelp_res = get_venues_yelp(latitude=43.664467, longitude=-79.414783, radius=800, yelp_key=YELP_KEY, categories=yelp_categories, limit=50)

test_response = yelp_res

In [202]:
test_response

{'businesses': [{'id': 'RrqcZLX05djKo1GVLMXMDQ',
   'alias': 'korean-village-restaurant-toronto',
   'name': 'Korean Village Restaurant',
   'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/I-0wVgAyB2OWoSI-nELiUA/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/korean-village-restaurant-toronto?adjust_creative=fJ3m_5J6aNJaibvcOfXcdA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=fJ3m_5J6aNJaibvcOfXcdA',
   'review_count': 452,
   'categories': [{'alias': 'korean', 'title': 'Korean'}],
   'rating': 4.0,
   'coordinates': {'latitude': 43.66461, 'longitude': -79.41444},
   'transactions': [],
   'price': '$$',
   'location': {'address1': '628 Bloor Street W',
    'address2': '',
    'address3': '',
    'city': 'Toronto',
    'zip_code': 'M6G 1K7',
    'country': 'CA',
    'state': 'ON',
    'display_address': ['628 Bloor Street W',
     'Toronto, ON M6G 1K7',
     'Canada']},
   'phone': '+14165360290',
   'display_phone': '+1 416-536-0290',
 

Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)

Put your parsed results into a DataFrame
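
One way these two steps might look for the Yelp response, using only fields visible in the output above (`name`, `location.address1`, `categories[].title`, `rating`, `review_count`, `price`). This is a sketch: `parse_yelp_business`, `parse_yelp_response` and the fallback strings are hypothetical, and the sample is a trimmed copy of the first business shown above.

```python
def parse_yelp_business(business):
    """Flatten one Yelp business into a single flat row."""
    categories = business.get("categories") or []
    location = business.get("location") or {}
    return {
        "name": business.get("name"),
        "address": location.get("address1") or "Address not available",
        # Use the first category title; fall back if the list is empty.
        "category": categories[0]["title"] if categories else "Category not available",
        "rating": business.get("rating"),
        "review_count": business.get("review_count"),
        "price": business.get("price"),
    }


def parse_yelp_response(response_json):
    """Turn a business search response into a list of flat rows."""
    return [parse_yelp_business(b) for b in response_json.get("businesses", [])]


# Trimmed copy of the first business in the response shown above.
sample_response = {
    "businesses": [{
        "name": "Korean Village Restaurant",
        "review_count": 452,
        "categories": [{"alias": "korean", "title": "Korean"}],
        "rating": 4.0,
        "price": "$$",
        "location": {"address1": "628 Bloor Street W"},
    }]
}
rows = parse_yelp_response(sample_response)
print(rows)
```

Because `rows` is a list of dictionaries with identical keys, passing it to `pd.DataFrame(rows)` gives a DataFrame with one column per field.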

# Comparing Results

Which API provided you with more complete data? Provide an explanation. 

The Yelp API did not work thus far, but its responses offer more granular items, such as:
- The reviews of the establishment
- Price level
- Attributes, e.g. whether the business offers private quotes, is disability/wheelchair accessible, or can be reserved via Yelp

Aside from the average review score, few of these features seem useful for this project. Further advanced features are available on premium-tier pricing:
- Ambience of the establishment
- Pet-friendly establishments
- "Liked by", i.e. which specific demographics liked the establishment


Get the top 10 restaurants according to their rating
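
A possible way to do this once the results have been parsed into rows with a `rating` field. This is a sketch under assumptions: `top_rated` is a hypothetical helper, the rows are assumed to contain restaurants only, breaking ties by `review_count` is a design choice rather than anything the task specifies, and the sample rows are made up for illustration.

```python
def top_rated(rows, n=10):
    """Return the n highest-rated rows, using review count to break ties."""
    # Rows without a rating cannot be ranked, so they are skipped.
    rated = [row for row in rows if row.get("rating") is not None]
    ranked = sorted(
        rated,
        key=lambda row: (row["rating"], row.get("review_count") or 0),
        reverse=True,
    )
    return ranked[:n]


# Made-up rows, for illustration only.
sample_rows = [
    {"name": "A", "rating": 4.0, "review_count": 10},
    {"name": "B", "rating": 4.5, "review_count": 3},
    {"name": "C", "rating": 4.0, "review_count": 200},
    {"name": "D", "rating": None, "review_count": 5},
]
for row in top_rated(sample_rows):
    print(row["name"], row["rating"], row["review_count"])
```

Using the review count as a tie-breaker favours a 4.0 rating backed by many reviews over the same rating from only a handful.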