## Data

To determine the best location for Agios Coffee, the following key factors are considered:
* the number of coffee shops and cafes in a neighborhood
* the number of, and distance to, activities (e.g., beaches, museums, theaters)

The data comes from the following sources:
* Foursquare API, for the number and location of coffee shops/cafes and of activities
* Socrata Open Data API (SODA), specifically the USC LA data set, which provides neighborhood information

## Neighborhood Data
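The neighborhood data from SODA provides the vertices of the polygon that forms each neighborhood of LA, as well as the geometric center of the neighborhood. The following code retrieves this data and saves it to a pandas DataFrame. Note that in the sample record below the 'latitude' field holds a longitude value (about -118) and the 'longitude' field holds a latitude value (about 34). The code in this section keeps the source labels and compensates for the swap when it builds the Foursquare requests.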

In [1]:
import requests

url = "https://usc.data.socrata.com/resource/9utn-waje.json"
response = requests.get(url).json()
response[1]

{'set': 'L.A. County Neighborhoods (Current)',
 'slug': 'adams-normandie',
 'the_geom': {'type': 'MultiPolygon',
  'coordinates': [[[[-118.30900800000012, 34.03741099912408],
     [-118.30040800000013, 34.0373119991241],
     [-118.29150800000001, 34.03681199912407],
     [-118.29140800000012, 34.025511999124234],
     [-118.305408, 34.025711999124255],
     [-118.30900800000012, 34.025611999124216],
     [-118.30900800000012, 34.03741099912408]]]]},
 'kind': 'L.A. County Neighborhood (Current)',
 'external_i': 'adams-normandie',
 'name': 'Adams-Normandie',
 'display_na': 'Adams-Normandie L.A. County Neighborhood (Current)',
 'sqmi': '0.805350187789',
 'type': 'segment-of-a-city',
 'latitude': '-118.30020800000011',
 'longitude': '34.031461499124156',
 'location': 'POINT(34.031461499124156 -118.30020800000011)'}
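
The sample record suggests that the 'latitude' and 'longitude' labels are swapped in this data set, since a valid latitude can never fall outside the range -90 to 90. A minimal sketch of a range check that returns the center in (latitude, longitude) order is shown below. The helper name `center_lat_lng` is hypothetical and not part of the original notebook, and the check only works where the longitude has a magnitude above 90, as it does for LA.

```python
def center_lat_lng(record):
    """Return (latitude, longitude) as floats, un-swapping the fields if needed.

    Assumes `record` is one dictionary from the SODA response, with
    'latitude' and 'longitude' stored as strings.
    """
    lat = float(record['latitude'])
    lng = float(record['longitude'])
    # A valid latitude lies in [-90, 90]; a value outside that range
    # means the two fields were stored under each other's label.
    if abs(lat) > 90 and abs(lng) <= 90:
        lat, lng = lng, lat
    return lat, lng


# Values copied from the sample record above.
sample = {'name': 'Adams-Normandie',
          'latitude': '-118.30020800000011',
          'longitude': '34.031461499124156'}
print(center_lat_lng(sample))
```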

In [2]:
import pandas as pd

# define column names
column_names = ['Neighborhood', 'Latitude', 'Longitude']

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude


In [3]:
# collect one row per neighborhood, then build the data frame in a single step
# (DataFrame.append was removed in pandas 2.0)
# note: the source labels are swapped, so 'latitude' holds the longitude value and vice versa
rows = []
for data in response:
    rows.append({'Neighborhood': data['name'],
                 'Latitude': data['latitude'],
                 'Longitude': data['longitude']})

neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Acton,-118.16981019229348,34.49735523924085
1,Adams-Normandie,-118.30020800000013,34.03146149912416
2,Agoura Hills,-118.75988450000015,34.146736499122795
3,Agua Dulce,-118.3171036690717,34.50492699979684
4,Alhambra,-118.1365120000002,34.08553899912357


Foursquare API Credentials (removed for publishing)

In [4]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues requested per call
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'
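
One way to keep credentials out of a published notebook is to read them from environment variables. The sketch below assumes hypothetical variable names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`); neither name is defined by Foursquare, and `load_credential` is not part of the original notebook.

```python
import os


def load_credential(var_name):
    """Read a credential from the environment, failing loudly if it is missing."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return value


# Hypothetical variable names, chosen for this example only.
# Uncomment once the variables are set in the shell that starts the notebook.
# CLIENT_ID = load_credential('FOURSQUARE_CLIENT_ID')
# CLIENT_SECRET = load_credential('FOURSQUARE_CLIENT_SECRET')
```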

The following function collects coffee shop and coffee roaster data within a given radius of each neighborhood center.

In [5]:
# radius is in meters; 1700 m is roughly 1.06 miles

def getNearbyCoffee(names, latitudes, longitudes, radius=1700):

    # category IDs for coffee shops and coffee roasters, respectively
    category_ids = ('4bf58dd8d48988d1e0931735', '5e18993feee47d000759b256')

    coffee_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        for catID in category_ids:

            # create the API request URL
            # note: the source labels are swapped ('Latitude' holds the longitude),
            # so lng is passed first to give Foursquare ll=<latitude>,<longitude>
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
                CLIENT_ID,
                CLIENT_SECRET,
                VERSION,
                lng,
                lat,
                radius,
                LIMIT,
                catID)

            # make the GET request
            results = requests.get(url).json()['response']['groups'][0]['items']

            # keep only the relevant information for each nearby venue
            coffee_list.extend(
                (name,
                 lat,
                 lng,
                 v['venue']['name'],
                 v['venue']['location']['lat'],
                 v['venue']['location']['lng']) for v in results)

    # build the data frame once, after all requests have completed
    nearby_coffee = pd.DataFrame(coffee_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude'])

    return nearby_coffee
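
The request in the function above indexes directly into `['response']['groups'][0]['items']`, which raises an error whenever a response carries no groups. A defensive variant could look like the following sketch. `extract_items` is a hypothetical helper, and the payload shape is inferred only from the indexing used above.

```python
def extract_items(payload):
    """Return the list of venue items from an explore response, or [] if absent."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])


# Hand-written payloads that mimic the structure indexed in the function above.
ok_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Ignatius Cafe',
               'location': {'lat': 34.031772, 'lng': -118.293006}}}]}]}}
empty_payload = {'response': {}}

print(len(extract_items(ok_payload)))     # 1
print(len(extract_items(empty_payload)))  # 0
```

Returning an empty list lets the surrounding loop continue with the next neighborhood instead of stopping on the first response without results.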

Call the previous function, store the result in a data frame, and save the data frame to a CSV file.

In [6]:
# call the Foursquare API to get all coffee shops and coffee roasters
LA_coffee = getNearbyCoffee(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'])

In [7]:
LA_coffee.to_csv('LA_coffee.csv',index=False)

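The following code removes duplicate rows. Because drop_duplicates() compares every column, a venue returned for more than one neighborhood is still listed once per neighborhood.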

In [8]:
LA_coffee_scrub = LA_coffee.drop_duplicates().reset_index(drop=True)
LA_coffee_scrub

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Acton,-118.16981019229348,34.497355239240846,Perkin' Up Coffee House,34.490303,-118.160742
1,Acton,-118.16981019229348,34.497355239240846,The Rustic Cafe & Bakery,34.490163,-118.160684
2,Adams-Normandie,-118.30020800000011,34.031461499124156,Ignatius Cafe,34.031772,-118.293006
3,Adams-Normandie,-118.30020800000011,34.031461499124156,With Love Market & Cafe,34.038540,-118.291786
4,Adams-Normandie,-118.30020800000011,34.031461499124156,Blu Elefant Café,34.039827,-118.303951
...,...,...,...,...,...,...
3196,Woodland Hills,-118.61521650000006,34.159408692550485,Starbucks,34.168600,-118.615833
3197,Woodland Hills,-118.61521650000006,34.159408692550485,Starbucks,34.168614,-118.615454
3198,Woodland Hills,-118.61521650000006,34.159408692550485,The Coffee Bean & Tea Leaf,34.168214,-118.603009
3199,Woodland Hills,-118.61521650000006,34.159408692550485,Starbucks,34.157034,-118.605543
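
Because each request covers a 1700 m radius, nearby neighborhood centers can return the same venue, and comparing all columns keeps one row per neighborhood. If each venue should be counted only once overall, the comparison can be restricted to the venue columns. The sketch below uses a small hand-made frame with placeholder neighborhood names rather than the real results.

```python
import pandas as pd

# Toy data: the same venue reported for two different neighborhoods.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'B'],
    'Venue': ['Ignatius Cafe', 'Ignatius Cafe', 'Blu Elefant Café'],
    'Venue Latitude': [34.031772, 34.031772, 34.039827],
    'Venue Longitude': [-118.293006, -118.293006, -118.303951],
})

# Comparing every column keeps all three rows, because the neighborhoods differ.
all_columns = toy.drop_duplicates().reset_index(drop=True)

# Comparing only the venue columns keeps the first row for each venue.
venue_columns = ['Venue', 'Venue Latitude', 'Venue Longitude']
unique_venues = toy.drop_duplicates(subset=venue_columns).reset_index(drop=True)

print(len(all_columns), len(unique_venues))  # 3 2
```

Which variant is appropriate depends on the question: per-neighborhood counts need the row kept for every neighborhood, while a city-wide venue count does not.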


This function returns the arts and entertainment venues within a given radius of each neighborhood center.

In [9]:
def getNearbyEntertainment(names, latitudes, longitudes, radius=1700):

    # category ID for arts and entertainment
    catID = '4d4b7104d754a06370d81259'

    venue_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        # note: the source labels are swapped ('Latitude' holds the longitude),
        # so lng is passed first to give Foursquare ll=<latitude>,<longitude>
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lng,
            lat,
            radius,
            LIMIT,
            catID)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venue_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in results)

    # build the data frame once, after all requests have completed
    nearby_venues = pd.DataFrame(venue_list, columns=['Neighborhood',
                                                      'Neighborhood Latitude',
                                                      'Neighborhood Longitude',
                                                      'Venue',
                                                      'Venue Latitude',
                                                      'Venue Longitude',
                                                      'Venue Category'])

    return nearby_venues
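
Both functions build the same query string by hand. As an alternative sketch, the parameters can be collected in a dictionary and encoded with `urllib.parse.urlencode` from the standard library (`requests.get` also accepts such a dictionary through its `params` argument). The helper name `explore_url` is hypothetical; the endpoint and parameter names are the ones used in the URLs above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def explore_url(lat, lng, category_id, client_id, client_secret,
                version='20180605', radius=1700, limit=100):
    """Build an explore URL; `lat` and `lng` must be a true latitude and longitude."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',   # urlencode escapes the comma as %2C
        'radius': radius,
        'limit': limit,
        'categoryId': category_id,
    }
    return f'{EXPLORE_ENDPOINT}?{urlencode(params)}'


# 'ID' and 'SECRET' are placeholders, not real credentials.
print(explore_url(34.0315, -118.3002, '4d4b7104d754a06370d81259',
                  client_id='ID', client_secret='SECRET'))
```

Naming the two coordinates explicitly also makes the swapped source labels harder to overlook than positional `format` arguments do.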

Run the function and show the first rows of the resulting data frame.

In [10]:
LA_venues = getNearbyEntertainment(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'])
LA_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Acton,-118.16981019229348,34.49735523924085,Dancin In Acton,34.490303,-118.160742,Arts & Entertainment
1,Adams-Normandie,-118.30020800000013,34.03146149912416,Haunted Play: Delusion,34.03586,-118.306248,Indie Theater
2,Adams-Normandie,-118.30020800000013,34.03146149912416,The Ray Stark Family Theatre (SCA 108),34.023434,-118.286181,Movie Theater
3,Adams-Normandie,-118.30020800000013,34.03146149912416,Korean National Association Memorial Hall,34.025176,-118.296893,Museum
4,Adams-Normandie,-118.30020800000013,34.03146149912416,Hannon Theater Company,34.044132,-118.299148,Theater


In [11]:
LA_venues.to_csv('LA_venues.csv',index=False)
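
Since distance to activities is one of the key factors, the saved coordinates can be turned into distances with the haversine formula. The following is a minimal sketch and not part of the original notebook. It assumes a mean Earth radius of 6371 km and expects true latitude/longitude pairs, so the swapped neighborhood columns would have to be passed in reverse order.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Adams-Normandie center to Ignatius Cafe, using values from the outputs above
# (center given here in true latitude, longitude order).
center_lat, center_lng = 34.031461499124156, -118.30020800000011
print(round(haversine_km(center_lat, center_lng, 34.031772, -118.293006), 2))
```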