Join the data from Part 1 with the data from Part 2 to create a new dataframe.
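The join can be sketched as follows. This assumes that Part 2 produces one row per (station, venue), with a hypothetical `station_ll` column whose values match the `ll` column of the bike station table. It also assumes the station table does not already have an `n_pois` column. A left join keeps every station, and stations with no venues in range get a count of 0.

```python
import pandas as pd


def join_station_venues(stations: pd.DataFrame, venues: pd.DataFrame) -> pd.DataFrame:
    """Attach a per-station POI count (Part 2) to the bike station table (Part 1).

    Assumes `venues` has one row per (station, venue) and a hypothetical
    `station_ll` column whose values match `stations["ll"]`.
    """
    # count venues per station and rename the key so it lines up with the station table
    counts = (
        venues.groupby("station_ll")
        .size()
        .rename("n_pois")
        .reset_index()
        .rename(columns={"station_ll": "ll"})
    )
    # left join keeps every station; validate raises if either side repeats an ll
    joined = stations.merge(counts, on="ll", how="left", validate="one_to_one")
    # stations without a match get NaN from the left join; count them as 0 POIs
    joined["n_pois"] = joined["n_pois"].fillna(0).astype(int)
    return joined


# usage with made-up values
stations = pd.DataFrame({"name": ["A", "B", "C"], "ll": ["43.1,-79.1", "43.2,-79.2", "43.3,-79.3"]})
venues = pd.DataFrame({"station_ll": ["43.1,-79.1", "43.1,-79.1", "43.2,-79.2"], "fsq_id": ["x", "y", "z"]})
print(join_station_venues(stations, venues))
```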

In [60]:
# imports
import pandas as pd
import numpy as np
import os # use this to access your environment variables
import requests # this will be used to call the APIs

FOURSQUARE_KEY = os.getenv('FOURSQUARE_API_KEY')
YELP_KEY = os.getenv('YELP_API_KEY')

# optionally print the keys (or check they are not None) to confirm both were loaded from the environment variables set in Terminal

In [128]:
bike_stations = pd.read_csv("../data/bike_stations_sat_3pm_clean.csv")

**This bike_stations data comes from the CityBikes API call made on Saturday at 3pm**

In [130]:
bike_stations

| Unnamed: 0 | id | name | latitude | longitude | free_bikes | empty_slots | bike_availability | ll |
|---|---|---|---|---|---|---|---|---|
| 0 | fb337bbed72e2be090071e199899b2be | Queen St E / Woodward Ave | 43.665269 | -79.319796 | 18 | 1 | 94.74 | 43.665269,-79.319796 |
| 1 | 4ff88d5880e71aa40d34cfe5d09b0ca7 | Primrose Ave / Davenport Rd | 43.671420 | -79.445947 | 1 | 14 | 6.67 | 43.67142,-79.445947 |
| 2 | a09c67c0b419654d907c9134b108e328 | Queen St E / Rhodes Ave | 43.666224 | -79.317693 | 12 | 11 | 52.17 | 43.666224,-79.317693 |
| 3 | d6a9daee68070a8b106cfb598d81308c | Bond St / Queen St E | 43.653236 | -79.376716 | 5 | 32 | 13.51 | 43.653236,-79.376716 |
| 4 | 8f8af40d9388c8a3962559e8681d3db7 | Church St / Alexander St | 43.663722 | -79.380288 | 3 | 32 | 8.57 | 43.663722,-79.380288 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 821 | 9be5f078a1ed47fc11cd3cee45260f63 | Kennedy Rd/Ranstone Gdns (Jack Goodlad Park) | 43.741906 | -79.271819 | 1 | 10 | 9.09 | 43.741906,-79.271819 |
| 822 | 4ae37f3bddfb819954a15143d277dbd9 | Eglinton Ave E / Brimley Rd | 43.736953 | -79.247984 | 8 | 11 | 42.11 | 43.736953,-79.247984 |
| 823 | e7968ab22d9a15db0673f463144428eb | College Park South | 43.659457 | -79.382365 | 7 | 11 | 38.89 | 43.659457,-79.382365 |
| 824 | 62acc308c0f93ff09d28e06c73afc3ec | 165 McRae Dr | 43.705875 | -79.368006 | 4 | 11 | 26.67 | 43.705875,-79.368006 |


In [104]:
# load a saved sample API response (the uncleaned version is needed) to test parsing POI categories and the number of POIs
venue_df = pd.read_csv("../data/test_venue_df.csv")

In [153]:
sample_bike_df = bike_stations.sample(n=5)

In [155]:
sample_bike_df

| Unnamed: 0 | id | name | latitude | longitude | free_bikes | empty_slots | bike_availability | ll |
|---|---|---|---|---|---|---|---|---|
| 599 | 3a73375ca8c3f3fc85b7b32ec080be5e | Greenwood Ave / Sammon Ave | 43.686987 | -79.334782 | 1 | 10 | 9.09 | 43.686987,-79.334782 |
| 91 | d06ed2c21f6ff4e4cbc77b64e65bd801 | Crawford St / Queen St W | 43.645665 | -79.415345 | 15 | 3 | 83.33 | 43.645665,-79.415345 |
| 389 | fb2618cee3456166ccc52c3ceeae13c6 | Moore Park | 43.693256 | -79.383238 | 0 | 14 | 0.0 | 43.693256,-79.383238 |
| 296 | b7461d8effda846e2f78dfbee059c433 | Lake Shore Blvd W / Thirty Ninth Street | 43.592742 | -79.54033 | 19 | 0 | 100.0 | 43.592742,-79.54033 |
| 584 | 84e8e3827807916c9c3ac9b60404346d | Firvalley Ct / Warden Ave | 43.703211 | -79.278715 | 2 | 16 | 11.11 | 43.703211,-79.278715 |


**Strategy: work through the steps below on the stations in sample_bike_df, then confirm they also work for the full city bike dataset (825 rows)**:

1. Define a get_venues_fs function that calls the Foursquare API and accepts a concatenated "latitude,longitude" string, like the ll column of bike_stations and sample_bike_df.
2. Initialize the bike station dataframe with new columns for the total number of POIs and the number of POIs in certain categories.
3. Loop through the bike stations, calling get_venues_fs with each station's ll value to retrieve nearby venues, and fill in the columns created in step 2.
4. Parse the returned results for the number of POIs overall and the number of POIs per category. The API returns at most 50 results per call, so the search radius is set to 800m to keep a count capped at 50 meaningful.
5. The result should be a dataframe containing 1) the original ll used in the call, which can be used to join the bike_stations table later if needed, and 2) the number of POIs and the number of establishments in each POI category, to train the model.
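Step 4 could look roughly like the sketch below. It assumes that the v3 place-search JSON has a `results` list whose items carry a `categories` list of `{"id": ...}` dicts. It also uses a hypothetical rule that the first two digits of a category ID identify its top-level group (10 for Arts and Entertainment, 13 for Dining and Drinking, 16 for Landmarks and Outdoors), mapped onto the n_live, n_bar_restaurant and n_park columns. Both assumptions should be checked against a real response, such as the one saved in test_venue_df.csv, before relying on them.

```python
# Hypothetical mapping from the first two digits of a Foursquare category ID
# to this notebook's count columns (an assumption, not a documented API rule).
PREFIX_TO_COLUMN = {"10": "n_live", "13": "n_bar_restaurant", "16": "n_park"}


def count_pois(response_json: dict) -> dict:
    """Count all POIs, and POIs per category group, in one place-search response."""
    results = response_json.get("results", [])
    counts = {"n_pois": len(results), "n_bar_restaurant": 0, "n_live": 0, "n_park": 0}
    for place in results:
        # a place can list several categories; count it at most once per group
        groups = {
            PREFIX_TO_COLUMN.get(str(category.get("id", ""))[:2])
            for category in place.get("categories", [])
        }
        groups.discard(None)  # categories outside the three groups are ignored
        for column in groups:
            counts[column] += 1
    return counts


# usage with a made-up response (IDs taken from the notebook's categories string)
fake_response = {
    "results": [
        {"fsq_id": "a", "categories": [{"id": 13003}]},
        {"fsq_id": "b", "categories": [{"id": 13065}, {"id": 13003}]},
        {"fsq_id": "c", "categories": [{"id": 16000}]},
        {"fsq_id": "d", "categories": [{"id": 10035}]},
    ]
}
print(count_pois(fake_response))
```

Counting each place at most once per group keeps a venue tagged as both a bar and a restaurant from being double counted in n_bar_restaurant.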

Notes:

**I am doing step 2 here and not in yelp_foursquare_EDA to avoid bringing city_bikes data into the yelp_foursquare notebook, so any cross-referencing of city bikes with venue data will happen here**.

**Some bike stations will likely retrieve establishments that another station has already retrieved (e.g. in Downtown Toronto, where several stations are within 800m of each other). To handle this, I am also retrieving each establishment's fsq_id so duplicates can be found and removed; the fsq_id column is then dropped during cleaning, before the venue data is joined with a cleaned bike_stations df.**
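The duplicate check described above can be sketched with pandas `duplicated` and `drop_duplicates` on the `fsq_id` column. The `station_ll` column name and the data are hypothetical. Note that `keep="first"` credits a shared venue to whichever station returned it first, so any per-station counts that should include shared venues would need to be computed before this step.

```python
import pandas as pd


def drop_duplicate_venues(venues):
    """Return (deduplicated venues, number of duplicate rows removed), keyed on fsq_id."""
    n_dupes = int(venues.duplicated(subset="fsq_id").sum())
    unique = venues.drop_duplicates(subset="fsq_id", keep="first").reset_index(drop=True)
    return unique, n_dupes


# made-up example: venue "aaa" was returned for two nearby stations
venues = pd.DataFrame({
    "station_ll": ["43.65,-79.38", "43.65,-79.38", "43.66,-79.38"],
    "fsq_id": ["aaa", "bbb", "aaa"],
})
unique_venues, n_dupes = drop_duplicate_venues(venues)
print(n_dupes)  # 1
print(unique_venues)
```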

In [71]:
# Step 1 from Strategy: define the Foursquare API call

def get_venues_fs(ll, radius, api_key, categories=None, limit=50):
    """
    Get amenities and POIs near a point from the Foursquare Places API (v3 place search).

    Args:
        ll (str): "latitude,longitude" string for the query, e.g. a value from the ll column of bike_stations
        radius (int): search radius in metres
        api_key (str): Foursquare API key (loaded from the environment above)
        categories (str): comma-separated Foursquare category IDs; if None, no category filter is sent
        limit (int): maximum number of results to return (at most 50)

    Returns:
        dict: the parsed JSON body of the response.

    Raises:
        requests.HTTPError: if the API returns an error status.
    """
    url = "https://api.foursquare.com/v3/places/search"

    headers = {
        "Accept": "application/json",
        "Authorization": api_key
    }

    params = {
        "ll": ll,
        "radius": radius,
        "categories": categories,  # requests leaves out parameters whose value is None
        "limit": limit
    }

    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of silently returning None
    return response.json()

categories = '10035,13003,13065,16000' # Category codes - bars, restaurants, live shows, outdoors
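To check what a call to get_venues_fs will send without spending API quota, the same URL, headers and parameters can be assembled with `requests.Request(...).prepare()` and inspected. This is a sketch; the helper name `preview_fs_request` and the dummy key are illustrative. It also shows that requests URL-encodes the commas in `ll` and `categories` as `%2C` and leaves out any parameter whose value is `None`.

```python
import requests


def preview_fs_request(ll, radius, api_key, categories=None, limit=50):
    """Build (but do not send) the same GET request that get_venues_fs makes."""
    request = requests.Request(
        "GET",
        "https://api.foursquare.com/v3/places/search",
        headers={"Accept": "application/json", "Authorization": api_key},
        params={"ll": ll, "radius": radius, "categories": categories, "limit": limit},
    )
    return request.prepare()  # PreparedRequest: nothing goes over the network


prepared = preview_fs_request("43.653264,-79.382458", 800, "dummy-key", "10035,13003,13065,16000")
print(prepared.url)
print(prepared.headers)
```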


In [157]:
# Step 2: add the columns that the Foursquare API data will fill in during step 3

sample_bike_df['n_pois'] = 0
sample_bike_df['n_bar_restaurant'] = 0
sample_bike_df['n_live'] = 0
sample_bike_df['n_park'] = 0
sample_bike_df

| Unnamed: 0 | id | name | latitude | longitude | free_bikes | empty_slots | bike_availability | ll | n_pois | n_bar_restaurant | n_live | n_park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 599 | 3a73375ca8c3f3fc85b7b32ec080be5e | Greenwood Ave / Sammon Ave | 43.686987 | -79.334782 | 1 | 10 | 9.09 | 43.686987,-79.334782 | 0 | 0 | 0 | 0 |
| 91 | d06ed2c21f6ff4e4cbc77b64e65bd801 | Crawford St / Queen St W | 43.645665 | -79.415345 | 15 | 3 | 83.33 | 43.645665,-79.415345 | 0 | 0 | 0 | 0 |
| 389 | fb2618cee3456166ccc52c3ceeae13c6 | Moore Park | 43.693256 | -79.383238 | 0 | 14 | 0.0 | 43.693256,-79.383238 | 0 | 0 | 0 | 0 |
| 296 | b7461d8effda846e2f78dfbee059c433 | Lake Shore Blvd W / Thirty Ninth Street | 43.592742 | -79.54033 | 19 | 0 | 100.0 | 43.592742,-79.54033 | 0 | 0 | 0 | 0 |
| 584 | 84e8e3827807916c9c3ac9b60404346d | Firvalley Ct / Warden Ave | 43.703211 | -79.278715 | 2 | 16 | 11.11 | 43.703211,-79.278715 | 0 | 0 | 0 | 0 |


In [151]:
# Step 3: loop through the stations and call the Foursquare API for each one

# lists for the model features: total POIs, bars/restaurants, live venues and parks per station
n_poi_list = []
n_barresto_list = []
n_live_list = []
n_park_list = []

# lists for the details of the venues returned by the API calls
fsq_id_list = []
ll_list = []
name_list = []
address_list = []
category_list = []

# first check that each station's ll value can be read correctly before making any API calls
for index, row in sample_bike_df.iterrows():
    print(f"{row['name']}, {row['ll']}")

    # res = get_venues_fs(ll=row['ll'], radius=800, api_key=FOURSQUARE_KEY, categories=categories, limit=50)

Bay St / Albert St, 43.653264,-79.382458
Lynn Williams St / Pirandello St , 43.639125,-79.414232
Humber River Hospital , 43.725379,-79.487197
Scott St / The Esplanade, 43.646597,-79.375309
Howard St / Sherbourne St, 43.671258,-79.376367
Joe Shuster Way / King St W , 43.640397,-79.423742
Ellesmere Rd / Markham Rd, 43.776829,-79.2305
Yonge St / St Clair Ave, 43.688505,-79.394005
Granby St / Church St - SMART, 43.66146,-79.378946
Howard Park Ave / Dundas St W - SMART, 43.6521,-79.4486
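Once the ll values check out, the commented-out call could be enabled and the venue lists filled. The sketch below assumes each result has `fsq_id`, `name`, `location.formatted_address` and `categories` fields. It takes the API call as a `fetch` argument so it can be tried offline with canned responses; in the notebook, `fetch` would be something like `lambda ll: get_venues_fs(ll=ll, radius=800, api_key=FOURSQUARE_KEY, categories=categories, limit=50)`. The `station_ll` column name is hypothetical.

```python
import pandas as pd


def collect_venues(stations, fetch):
    """Call fetch(ll) for every station and return one row per (station, venue)."""
    rows = []
    for _, station in stations.iterrows():
        response = fetch(station["ll"])
        for place in response.get("results", []):
            rows.append({
                "station_ll": station["ll"],  # station key, used later for joining
                "fsq_id": place.get("fsq_id"),
                "name": place.get("name"),
                "address": place.get("location", {}).get("formatted_address"),
                # a place can have several categories; keep their names in one string
                "category": ", ".join(c.get("name", "") for c in place.get("categories", [])),
            })
    return pd.DataFrame(rows, columns=["station_ll", "fsq_id", "name", "address", "category"])


# offline usage with canned, made-up responses keyed by station ll
canned = {
    "43.1,-79.1": {"results": [
        {"fsq_id": "x", "name": "Bar X", "location": {"formatted_address": "1 A St"},
         "categories": [{"id": 13003, "name": "Bar"}]},
    ]},
    "43.2,-79.2": {"results": []},
}
stations = pd.DataFrame({"name": ["A", "B"], "ll": ["43.1,-79.1", "43.2,-79.2"]})
venues = collect_venues(stations, fetch=lambda ll: canned[ll])
print(venues)
print(venues.groupby("station_ll").size())  # POIs per station; stations with none are absent
```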


In [None]:
# TODO: flatten the nested API results with pd.json_normalize
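A sketch of the planned `pd.json_normalize` step, run on a made-up response shaped like the v3 place-search output (the field names are assumptions to check against a real response). Normalizing `results` turns nested dicts into dotted columns such as `location.formatted_address`, while `record_path="categories"` with `meta=["fsq_id"]` gives one row per (venue, category), ready for counting.

```python
import pandas as pd

# made-up response; every result includes a "categories" list, which record_path requires
sample_response = {
    "results": [
        {"fsq_id": "aaa", "name": "Example Bar", "distance": 120,
         "location": {"formatted_address": "1 Example St"},
         "categories": [{"id": 13003, "name": "Bar"}]},
        {"fsq_id": "bbb", "name": "Example Park", "distance": 450,
         "location": {"formatted_address": "2 Example Ave"},
         "categories": [{"id": 16000, "name": "Landmarks and Outdoors"}]},
    ]
}

# one row per venue; nested dicts become dotted column names
venues_flat = pd.json_normalize(sample_response["results"])

# one row per (venue, category), tagged with the venue's fsq_id
venue_categories = pd.json_normalize(
    sample_response["results"],
    record_path="categories",
    meta=["fsq_id"],
    record_prefix="category.",
)
print(venues_flat[["fsq_id", "name", "location.formatted_address"]])
print(venue_categories)
```

If some results lack a `categories` key, the `record_path` call raises a `KeyError`, so real responses may need a default empty list filled in first.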

Provide a visualization that you used as part of your EDA process. Explain the initial pattern or relationship you discovered through this visualization.

# Database

Put all your results in an SQLite3 database (remember, SQLite stores its databases as files on your local machine, so make sure to create your database in your project's data/ directory!)
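A minimal sketch of writing a results dataframe to SQLite with the standard-library `sqlite3` module and `DataFrame.to_sql`, then reading it back to confirm. The database path `../data/project.db` and the helper names are placeholders; the path follows the notebook's existing `../data/` convention.

```python
import sqlite3
from pathlib import Path

import pandas as pd


def save_table(df, table, db_path="../data/project.db"):
    """Write df to `table` in the SQLite file at db_path, replacing any existing table."""
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)  # make sure data/ exists
    conn = sqlite3.connect(db_path)  # creates the database file if it does not exist
    try:
        # index=False avoids storing the pandas index as an extra column
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()


def load_table(table, db_path="../data/project.db"):
    """Read a whole table back into a dataframe."""
    conn = sqlite3.connect(db_path)
    try:
        # the table name is quoted as an identifier; it should still come from trusted code
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    finally:
        conn.close()
```

In the notebook this might be called as `save_table(bike_stations, "bike_stations")`, followed by `load_table("bike_stations").shape` to confirm the row count survived the round trip.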

Look at the data before and after the join to validate your data.
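One way to look at the data before and after the join is to record row counts, count unmatched keys with `merge(..., indicator=True)`, and let `validate="one_to_one"` raise a `MergeError` if either side has duplicate keys. A sketch, assuming both tables share the `ll` key; the helper name and report fields are illustrative.

```python
import pandas as pd


def join_with_checks(left, right, key="ll"):
    """Left-join right onto left and return (joined, report) with basic validation counts."""
    joined = left.merge(right, on=key, how="left", validate="one_to_one", indicator=True)
    # columns the join added (including any _x/_y suffixed overlaps)
    new_cols = [c for c in joined.columns if c not in left.columns and c != "_merge"]
    report = {
        "rows_before": len(left),
        "rows_after": len(joined),  # a left one-to-one join should not change this
        "unmatched_left": int((joined["_merge"] == "left_only").sum()),
        "nulls_per_new_column": joined[new_cols].isna().sum().to_dict(),
    }
    return joined.drop(columns="_merge"), report


# usage with made-up values: station "c" has no venue counts
stations = pd.DataFrame({"ll": ["a", "b", "c"], "free_bikes": [1, 2, 3]})
counts = pd.DataFrame({"ll": ["a", "b"], "n_pois": [5, 0]})
joined, report = join_with_checks(stations, counts)
print(report)
```

Keeping the `_merge` indicator only long enough to build the report leaves the joined dataframe clean for saving.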