# Capstone Project - Battle of the Neighborhoods
___
This file contributes to the final Coursera Capstone Project as part of the IBM Data Science Professional Certificate program.

I will review each section of the assignment as reference.

#### Week 1 – Part 1
Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.
This submission will eventually become your Introduction/Business Problem section in your final report. 

#### Week 1 – Part 2
Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.
This submission will eventually become your Data section in your final report.



## Introduction
___
Relocating for work can be intimidating, even more so when you have to relocate to a country you have never been to before and must take your partner or family into account as well. It’s an overwhelming life event that makes you start thinking: **what should I do, where do I begin and how do I approach this?**

The idea for this Capstone Project is to show how leveraging location data from FourSquare and other data sources can assist you with making decisions when having to relocate. 
This solution is targeted at any individual who must relocate to another country and needs to identify an ideal neighborhood to live in whilst taking the family's needs into account.

To solve the problem I will apply a systematic data science methodology to the scenario, where I will:

1. Understand the problem and identify our approach - *This will be captured in our scenario description*
2. Identify the required data
3. Collect and understand the data – *I will make use of web scraping and data files from open data sources. I will look at and understand what information is available in the data and identify what we need to use to solve our problem*
4. Data preparation – *I will clean and manipulate the data to ensure that I am only working with information relevant to solve the problem*
5. Data analysis – *I will develop graphs and charts to assist me in understanding and visualizing the data*
6. Model the data – *I will incorporate machine learning to assist in modeling the data*
7. Model evaluation - *I will then evaluate the model to ensure it's useful for the target audience*


#### Scenario Description

In this scenario the individual needs to relocate to Boston, Massachusetts, United States where he will be working. He needs to take his partner (who is a chef) into consideration as she will be relocating with him and will also have to look for a new job. He also needs to take his child into consideration when it comes to finding a school.  

He starts off by listing his and his family's basic wants and needs to identify how to approach the problem and to identify what data is required. 

|Basic Needs| Family Wants|
|-----------|-------------|
|Security|To live in a safe environment|
|Close to working locations| Must have loads of restaurants in the area, Close to Museum|
|Public Elementary school|Neighborhood close to school|

The individual, who works at The Museum of Science, will have to identify a neighborhood to live in. It should be in a safe environment and within reasonable distance of his work, but also in an area with many restaurants, to give his partner a wide range of job opportunities, and with a public Elementary school in the vicinity for his child.

Now that the problem has been defined, our approach will be to predict the best neighborhood to live in based on the data gathered.

## Data
___ 
#### Data requirements
From the basic wants and needs list in the scenario description, I can see that I require the following data:
1. Crime data
2. School data
3. FourSquare data

I will thus make use of open source data from Analyze Boston which is the City of Boston's open data hub. From here I can obtain the crime data as well as school data. This will assist me in identifying safer neighborhoods to live in and where the options are for Elementary schools.

I'll also make use of the FourSquare API to query for geographical data on restaurants as well as to identify top recommendations for the restaurants. It will be ideal to stay in or close to a neighborhood that has many restaurants, including top recommended ones, as potential work options for the partner. I'll also make use of FourSquare to identify the location of the museum so that it can be taken into consideration when choosing a neighborhood.

For each dataset used in this study I'll follow a similar approach to first explain where the data comes from, what is contained in the data and how I prepared the data. Thereafter I will capture in the methodology how I analysed and visualised the data.

### Boston Crime Data
___

The dataset can be downloaded from the [Boston data portal]( https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system). This is a dataset containing records for crime incident reports as provided by the Boston Police Department and includes types of incidents as well as when and where it occurred.

This data, when analysed, will allow me to identify lower crime neighborhoods and to consider these as possible options when selecting a neighborhood.

I first install all libraries that will be required to read the data.
The data is then extracted and read into a pandas dataframe.

In [1]:
#Libraries to import
!pip install requests
!pip install beautifulsoup4
!pip install lxml
!pip install xlrd

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 

import requests
from bs4 import BeautifulSoup


My dataset source is identified and read into a dataframe.

In [3]:
#Datasets

CD='https://data.boston.gov/dataset/6220d948-eae2-4e4b-8723-2dc8e67722a3/resource/12cb3883-56f5-47de-afa5-3b1cf61b257b/download/tmp9z6400g0.csv'

In [4]:
dfCD = pd.read_csv(CD)
dfCD.head()


Unnamed: 0,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,TESTTEST2,423,,ASSAULT - AGGRAVATED,External,,0,2019-10-16 00:00:00,2019,10,Wednesday,0,,RIVERVIEW DR,,,"(0.00000000, 0.00000000)"
1,I92102201,3301,,VERBAL DISPUTE,E13,583.0,0,2019-12-20 03:08:00,2019,12,Friday,3,,DAY ST,42.325122,-71.107779,"(42.32512200, -71.10777900)"
2,I92097173,3115,,INVESTIGATE PERSON,C11,355.0,0,2019-10-23 00:00:00,2019,10,Wednesday,0,,GIBSON ST,42.297555,-71.059709,"(42.29755500, -71.05970900)"
3,I92094519,3126,,WARRANT ARREST - OUTSIDE OF BOSTON WARRANT,D14,765.0,0,2019-11-22 07:50:00,2019,11,Friday,7,,BROOKS ST,42.35512,-71.162678,"(42.35512000, -71.16267800)"
4,I92089785,3005,,SICK ASSIST,E13,574.0,0,2019-11-05 18:00:00,2019,11,Tuesday,18,,WASHINGTON ST,42.309718,-71.104294,"(42.30971800, -71.10429400)"
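
Because only a few of these columns are needed later, one option is to ask pandas to load just those columns at read time. The sketch below is an illustration only: it uses an in-memory sample (two abbreviated rows based on the output above) in place of the real CSV, and the choice of `dtype` values is an assumption. `usecols` and `dtype` are standard `pd.read_csv` parameters.

```python
import io
import pandas as pd

# Two abbreviated sample rows based on the output above; stands in for the real CSV.
sample_csv = io.StringIO(
    "INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,"
    "DISTRICT,YEAR,MONTH,STREET,Lat,Long\n"
    "I92102201,3301,,VERBAL DISPUTE,E13,2019,12,DAY ST,42.325122,-71.107779\n"
    "I92097173,3115,,INVESTIGATE PERSON,C11,2019,10,GIBSON ST,42.297555,-71.059709\n"
)

# Columns to keep; everything else is skipped while parsing.
wanted = ["OFFENSE_CODE_GROUP", "OFFENSE_DESCRIPTION", "YEAR", "MONTH",
          "STREET", "Lat", "Long"]

sample_df = pd.read_csv(
    sample_csv,
    usecols=wanted,
    # Explicit dtypes for text columns (an assumption for this sketch).
    dtype={"OFFENSE_CODE_GROUP": "object", "STREET": "object"},
)
print(sample_df.shape)
```

Reading fewer columns also reduces memory use, which can matter for a file with several hundred thousand rows.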


From the dataframe you can see there is information such as incident number, type of offense, district, date and location. Not all of these attributes are required. Only the following columns are kept in a new dataframe, since we only need to know the type of offense and when and where it occurred so that we can monitor crime trends in the neighborhoods.
- Offense Code Group
- Offense Description
- Year
- Month
- Street
- Lat
- Long
- Location

An example of the Crime dataframe with only the required data can be seen below. The columns that we won't be using were removed.

In [None]:
#dfCD.columns.values
dfCD=dfCD[[ 'OFFENSE_CODE_GROUP','OFFENSE_DESCRIPTION','YEAR', 'MONTH', 'STREET', 'Lat', 'Long', 'Location']]
dfCD

The data is then processed and all rows containing missing values are removed. 

In [11]:
dfCD2 = dfCD.dropna()
dfCD2

Unnamed: 0,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,YEAR,MONTH,STREET,Lat,Long,Location
15,Auto Theft,AUTO THEFT,2019,10,LINCOLN ST,42.259518,-71.121563,"(42.25951765, -71.12156299)"
16,Auto Theft,AUTO THEFT,2019,10,METROPOLITAN AVE,42.262092,-71.116710,"(42.26209214, -71.11670964)"
17,Auto Theft,AUTO THEFT - LEASED/RENTED VEHICLE,2019,10,ALLSTON ST,42.352375,-71.135096,"(42.35237455, -71.13509584)"
18,Auto Theft,AUTO THEFT,2019,10,SAINT JAMES AVE,42.349476,-71.076402,"(42.34947586, -71.07640150)"
19,Auto Theft,AUTO THEFT - LEASED/RENTED VEHICLE,2019,10,N MEAD ST,42.381846,-71.066551,"(42.38184582, -71.06655134)"
...,...,...,...,...,...,...,...,...
426854,Warrant Arrests,WARRANT ARREST,2019,5,NEW SUDBURY ST,42.361839,-71.059765,"(42.36183857, -71.05976489)"
426855,Warrant Arrests,WARRANT ARREST,2019,5,NEW SUDBURY ST,42.361839,-71.059765,"(42.36183857, -71.05976489)"
426856,Warrant Arrests,WARRANT ARREST,2019,5,NEW SUDBURY ST,42.361839,-71.059765,"(42.36183857, -71.05976489)"
426857,Warrant Arrests,WARRANT ARREST,2019,5,NEW SUDBURY ST,42.361839,-71.059765,"(42.36183857, -71.05976489)"
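
Called without arguments, `dropna()` drops every row that has a missing value in any column, which is why rows 0 to 4 shown earlier (they have no `OFFENSE_CODE_GROUP`) no longer appear. A small sketch of that behaviour, and of the `subset` parameter that limits the check to chosen columns, using made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "OFFENSE_CODE_GROUP": ["Larceny", np.nan, "Robbery"],
    "STREET": ["DAY ST", "GIBSON ST", np.nan],
    "Lat": [42.32, 42.29, 42.35],
})

# Default: a row is dropped if ANY column is missing.
any_missing_dropped = toy.dropna()

# subset: a row is dropped only if the listed column is missing.
group_required = toy.dropna(subset=["OFFENSE_CODE_GROUP"])

print(len(any_missing_dropped), len(group_required))
```

Which variant is appropriate depends on whether a record with, say, a missing street name but valid coordinates should still count towards the crime totals.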


I then obtain a list of all the crime groups so that I can change my dataframe to only contain crimes considered serious. This is really up to each individual to decide on what is considered serious or not.

In [12]:
CrimeGroupIndex = dfCD2.OFFENSE_CODE_GROUP.unique()
CrimeGroupIndex

array(['Auto Theft', 'Investigate Property', 'Investigate Person',
       'Vandalism', 'Verbal Disputes', 'Motor Vehicle Accident Response',
       'Aggravated Assault', 'Residential Burglary', 'Larceny',
       'Firearm Violations', 'Medical Assistance', 'Simple Assault',
       'Missing Person Reported', 'Robbery', 'Property Lost',
       'Violations', 'Firearm Discovery', 'Warrant Arrests', 'Other',
       'Ballistics', 'Towed', 'Drug Violation', 'Fire Related Reports',
       'Fraud', 'Disorderly Conduct', 'Larceny From Motor Vehicle',
       'Police Service Incidents', 'Missing Person Located', 'Harassment',
       'Property Found', 'Liquor Violation', 'Property Related Damage',
       'Confidence Games', 'Commercial Burglary',
       'Recovered Stolen Property', 'Homicide', 'Other Burglary',
       'Assembly or Gathering Violations', 'Counterfeiting',
       'Prisoner Related Incidents', 'License Plate Related Incidents',
       'Restraining Order Violations', 'Search Warrants',


Offenses not considered serious crimes were removed from the dataset, e.g. Investigate Person, Medical Assistance, Investigate Property, Warrant Arrests, Animal Incidents, Violations etc. 

In [13]:
CrimeGroups=['Auto Theft', 'Vandalism', 'Aggravated Assault', 'Residential Burglary', 'Larceny','Simple Assault',
             'Robbery', 'Fraud', 'Larceny From Motor Vehicle','Commercial Burglary','Homicide', 
              'Other Burglary','Prostitution','Explosives', 'Arson','Manslaughter',
             'HUMAN TRAFFICKING','HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE','Burglary - No Property Taken']

dfCDgroup=dfCD2[dfCD2.OFFENSE_CODE_GROUP.isin(CrimeGroups)]
dfCDgroup

Unnamed: 0,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,YEAR,MONTH,STREET,Lat,Long,Location
15,Auto Theft,AUTO THEFT,2019,10,LINCOLN ST,42.259518,-71.121563,"(42.25951765, -71.12156299)"
16,Auto Theft,AUTO THEFT,2019,10,METROPOLITAN AVE,42.262092,-71.116710,"(42.26209214, -71.11670964)"
17,Auto Theft,AUTO THEFT - LEASED/RENTED VEHICLE,2019,10,ALLSTON ST,42.352375,-71.135096,"(42.35237455, -71.13509584)"
18,Auto Theft,AUTO THEFT,2019,10,SAINT JAMES AVE,42.349476,-71.076402,"(42.34947586, -71.07640150)"
19,Auto Theft,AUTO THEFT - LEASED/RENTED VEHICLE,2019,10,N MEAD ST,42.381846,-71.066551,"(42.38184582, -71.06655134)"
...,...,...,...,...,...,...,...,...
426836,Larceny,LARCENY IN A BUILDING UNDER $50,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426837,Larceny,LARCENY IN A BUILDING UNDER $50,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426843,Simple Assault,ASSAULT & BATTERY,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426844,Simple Assault,ASSAULT & BATTERY,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
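
Note that `isin` silently ignores list entries that never occur in the column, so a misspelt or differently cased group name would go unnoticed. A minimal sketch, on made-up rows and with illustrative variable names, of filtering with `isin` and then listing the requested groups that matched nothing:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "OFFENSE_CODE_GROUP": ["Auto Theft", "Towed", "Larceny", "Medical Assistance"],
})
serious = ["Auto Theft", "Larceny", "Arson"]

# Keep only rows whose group is in the list of serious crimes.
filtered = toy[toy["OFFENSE_CODE_GROUP"].isin(serious)]

# Requested groups that do not occur in the data at all.
unmatched = sorted(set(serious) - set(toy["OFFENSE_CODE_GROUP"].unique()))

print(filtered["OFFENSE_CODE_GROUP"].tolist())
print(unmatched)
```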


I also renamed all related incidents to a single category, e.g. Other Burglary and Residential Burglary become just Burglary, and Simple Assault becomes Assault.

Lastly, I removed the columns that I won't need any further, such as the Offense Description.

In [14]:
replace={'Residential Burglary':'Burglary','Commercial Burglary':'Burglary','Other Burglary':'Burglary','Burglary - No Property Taken':'Burglary',
         'Aggravated Assault':'Assault','Simple Assault':'Assault','Larceny From Motor Vehicle':'Larceny','HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE':'HUMAN TRAFFICKING'}

dfCDgroup=dfCDgroup.replace({"OFFENSE_CODE_GROUP":replace})
dfCDgroup=dfCDgroup.drop(['OFFENSE_DESCRIPTION'],axis=1)
dfCDgroup

Unnamed: 0,OFFENSE_CODE_GROUP,YEAR,MONTH,STREET,Lat,Long,Location
15,Auto Theft,2019,10,LINCOLN ST,42.259518,-71.121563,"(42.25951765, -71.12156299)"
16,Auto Theft,2019,10,METROPOLITAN AVE,42.262092,-71.116710,"(42.26209214, -71.11670964)"
17,Auto Theft,2019,10,ALLSTON ST,42.352375,-71.135096,"(42.35237455, -71.13509584)"
18,Auto Theft,2019,10,SAINT JAMES AVE,42.349476,-71.076402,"(42.34947586, -71.07640150)"
19,Auto Theft,2019,10,N MEAD ST,42.381846,-71.066551,"(42.38184582, -71.06655134)"
...,...,...,...,...,...,...,...
426836,Larceny,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426837,Larceny,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426843,Assault,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
426844,Assault,2018,12,BROOKLEDGE ST,42.309563,-71.089902,"(42.30956305, -71.08990197)"
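
The renaming relies on the nested-dictionary form of `DataFrame.replace`, where the outer key names the column and the inner dictionary maps old values to new ones. A small sketch on made-up rows, followed by a `value_counts` check that the categories were merged as intended:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "OFFENSE_CODE_GROUP": ["Simple Assault", "Aggravated Assault",
                           "Other Burglary", "Larceny"],
})
mapping = {"Simple Assault": "Assault", "Aggravated Assault": "Assault",
           "Other Burglary": "Burglary"}

# Outer key = column name, inner dict = old value -> new value.
merged = toy.replace({"OFFENSE_CODE_GROUP": mapping})

# Count rows per merged category as a sanity check.
counts = merged["OFFENSE_CODE_GROUP"].value_counts()
print(counts)
```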


My crime data is now ready to be analyzed.

### Boston School Data
___

The dataset can be downloaded from the [Boston data portal](https://data.boston.gov/dataset/buildbps-facilities-and-educational-data-for-boston-public-schools). This dataset gives general information about each school building, the type of school and its location. Much of the dataset relates to school building investments, but it also contains more information than the standard public school dataset, some of which we might be interested in.

The data is first extracted and read into a pandas dataframe.

In [54]:

SD='https://data.boston.gov/dataset/b2c5a9d3-609d-49ec-906c-b0e850a8d62a/resource/dd3e5406-0f4c-47ad-9c3d-da1213ccb868/download/buildbps.xlsx'
dfSD = pd.read_excel(SD)
dfSD.head()

Unnamed: 0,SMMA_Identifier,SMMA_Only_For_Map,BPS_School_Name,BPS_Historical_Name,SMMA_Abbreviated_Name,BPS_Address,BRA_Neighborhood,SMMA_latitude,SMMA_longitude,SMMA_Typology,...,SMMA_EA_K8_Adequacy_Cafeteria,SMMA_EA_K8_Adequacy_Stage,SMMA_EA_K8_Adequacy_Medical,SMMA_EA_K8_Adequacy_Administration,SMMA_EA_K8_Adequacy_Custodial,SMMA_EA_K8_Adequacy_Network,SMMA_EA_K8_Adequacy_Other_1,SMMA_EA_K8_Adequacy_Other_2,SMMA_EA_K8_Adequacy_Other_3,SMMA_EA_K8_Overall_EFE_spaces
0,031,,"Adams, Samuel Elementary",Adams,Adams,"165 Webster St East Boston, MA 02128",East Boston,42.365553,-71.034917,Elementary School,...,,,,,,,,,,
1,078,,"Alighieri, Dante Montessori School",Alighieri,Alighieri,"37 Gove Street East Boston, MA 02128",East Boston,42.371565,-71.037608,Elementary School,...,,,,,,,,,,
2,045A,,Another Course to College*,Taft,ACC*,"20 Warren Street Brighton, MA 02135",Allston,42.350354,-71.145582,High School,...,,,,,,,,,,
3,012,,Baldwin Early Learning Pilot Academy,Baldwin ELC,Baldwin,"121 Corey Rd Brighton, MA 02135",Brighton,42.342037,-71.140529,Early Learning,...,,,,,,,,,,
4,087,,"Bates, Phineas Elementary",Bates,Bates,"426 Beech St Roslindale, MA 02131",Roslindale,42.277663,-71.135353,Elementary School,...,,,,,,,,,,


At first glance the information in the dataset seems quite cryptic. However, there are PDF documents on the website that describe the data keys, and I'll make use of those files to extract and rename the columns that I need. 
I first extract the information with regards to the dataset.

In [55]:
print (dfSD.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 141 entries, 0 to 140
Columns: 251 entries, SMMA_Identifier to SMMA_EA_K8_Overall_EFE_spaces
dtypes: datetime64[ns](1), float64(35), int64(4), object(211)
memory usage: 276.6+ KB
None


From the information I can see that there are 141 data entries and 251 columns.


The following describes the data keys and associated information we require:
- **BPS_School_Name:** School Name
- **BPS_Address:** School Address
- **BRA_Neighborhood:** School Neighborhood
- **SMMA_latitude:** Latitude
- **SMMA_longitude:** Longitude
- **SMMA_Typology:** Type of School


Next I'll create a new dataframe that contains only the above mentioned data columns. Column prefixes specify the source and category of information for each field but I'll remove that information when cleaning up the dataset and give the columns more meaningful headers.

In [56]:
#Dataframe to only contain mentioned columns
#(.copy() avoids pandas' SettingWithCopyWarning when the selection is renamed in place)
dfSD=dfSD[['BPS_School_Name','BPS_Address','BRA_Neighborhood',    
           'SMMA_latitude','SMMA_longitude','SMMA_Typology']].copy()
#Rename Columns
dfSD.rename(columns={'BPS_School_Name':'School_Name','BPS_Address':'Address','BRA_Neighborhood':'Neighborhood','SMMA_latitude':'Latitude','SMMA_longitude':'Longitude','SMMA_Typology':'Type'}, inplace=True)

dfSD

Unnamed: 0,School_Name,Address,Neighborhood,Latitude,Longitude,Type
0,"Adams, Samuel Elementary","165 Webster St East Boston, MA 02128",East Boston,42.365553,-71.034917,Elementary School
1,"Alighieri, Dante Montessori School","37 Gove Street East Boston, MA 02128",East Boston,42.371565,-71.037608,Elementary School
2,Another Course to College*,"20 Warren Street Brighton, MA 02135",Allston,42.350354,-71.145582,High School
3,Baldwin Early Learning Pilot Academy,"121 Corey Rd Brighton, MA 02135",Brighton,42.342037,-71.140529,Early Learning
4,"Bates, Phineas Elementary","426 Beech St Roslindale, MA 02131",Roslindale,42.277663,-71.135353,Elementary School
...,...,...,...,...,...,...
136,West Roxbury Academy*,"1205 VFW Pkwy, West Roxbury, MA 02132",West Roxbury,42.282206,-71.174549,High School
137,West Zone Early Learning Center*,"200 Heath St Jamaica Plain, MA 02130",Jamaica Plain,42.326026,-71.106821,Early Learning
138,"Winship, F. Lyman Elementary","54 Dighton St Brighton, MA 02135",Brighton,42.347723,-71.154965,Elementary School
139,"Winthrop, John Elementary","35 Brookford St Dorchester, MA 02125",Roxbury,42.318387,-71.075341,Elementary School


I then filter the data to only include Elementary Schools.

In [76]:
dfSD2=dfSD.loc[dfSD['Type']=="Elementary School"]
print("The new dataframe size is:", dfSD2.shape)
dfSD2.head()


The new dataframe size is: (48, 6)


Unnamed: 0,School_Name,Address,Neighborhood,Latitude,Longitude,Type
0,"Adams, Samuel Elementary","165 Webster St East Boston, MA 02128",East Boston,42.365553,-71.034917,Elementary School
1,"Alighieri, Dante Montessori School","37 Gove Street East Boston, MA 02128",East Boston,42.371565,-71.037608,Elementary School
4,"Bates, Phineas Elementary","426 Beech St Roslindale, MA 02131",Roslindale,42.277663,-71.135353,Elementary School
5,"Beethoven, Ludwig Van Elementary","5125 Washington St West Roxbury, MA 02132",West Roxbury,42.26352,-71.155824,Elementary School
6,"Blackstone, William Elementary","380 Shawmut Ave Boston, MA 02118",South End,42.341012,-71.072056,Elementary School


I can see now that from my original dataset containing 141 schools I now only have 48 Elementary schools to work with.
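
Since the goal is to compare neighborhoods, a natural next check is how many Elementary schools each neighborhood has. A minimal sketch of the same boolean filter followed by a per-neighborhood count, using four rows copied from the output above (the variable names are illustrative):

```python
import pandas as pd

# Four rows copied from the school output above.
schools = pd.DataFrame({
    "School_Name": ["Adams, Samuel Elementary",
                    "Alighieri, Dante Montessori School",
                    "Another Course to College*",
                    "Bates, Phineas Elementary"],
    "Neighborhood": ["East Boston", "East Boston", "Allston", "Roslindale"],
    "Type": ["Elementary School", "Elementary School", "High School",
             "Elementary School"],
})

# Same filter as in the cell above.
elementary = schools.loc[schools["Type"] == "Elementary School"]

# Number of Elementary schools per neighborhood.
per_neighborhood = elementary.groupby("Neighborhood").size()
print(per_neighborhood)
```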

My school dataset is now ready to be analyzed.

### FourSquare Data
___

Now I can extract data on restaurants from FourSquare.
I'll query the FourSquare API for the top restaurants in Boston and get geographical data about the restaurants.


In [80]:
#My FourSquare credentials (placeholders: substitute your own and keep them out of shared notebooks)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
#print('My FourSquare credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

#Import libraries

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
# (in pandas >= 1.0 the same function is available as pd.json_normalize)
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


I require the location of the Museum as well as the locations of restaurants in its vicinity, as it would be ideal for both partners to work close to one another. I first obtain the Museum's location data based on the address.

In [99]:
address = '1 Science Park, Boston, MA 02114 '

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The Museum's coordinates are:",latitude, longitude)

The Museum's coordinates are: 42.3667323 -71.0677425
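
Note that `geocode` returns `None` when no match is found, in which case `location.latitude` raises an `AttributeError`. A hedged sketch of a small wrapper, here called `geocode_or_raise` (a hypothetical helper, not part of geopy), exercised with a stand-in geocoder so that it runs without network access:

```python
from types import SimpleNamespace


def geocode_or_raise(geolocator, address):
    """Return (latitude, longitude) or raise ValueError if nothing was found."""
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for: {address!r}")
    return location.latitude, location.longitude


class FakeGeocoder:
    """Stand-in for geopy's Nominatim; it only knows a single address."""

    def geocode(self, address):
        if address.strip() == "1 Science Park, Boston, MA 02114":
            return SimpleNamespace(latitude=42.3667323, longitude=-71.0677425)
        return None


print(geocode_or_raise(FakeGeocoder(), "1 Science Park, Boston, MA 02114 "))
```

In the notebook, the `Nominatim` instance created above would be passed in place of `FakeGeocoder()`.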


I then define a query to search for restaurants within a 50km radius from the museum and transform the data into a pandas dataframe. The location range can be changed according to the needs of the individual.

In [100]:
search_query = 'Restaurant'
radius = 50000

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
#results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
print (dataframe.shape)
dataframe.head()


(30, 25)


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,...,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id,location.neighborhood
0,586d78f9de0cbc0b7f87e131,Monument Restaurant & Tavern,"[{'id': '4bf58dd8d48988d155941735', 'name': 'G...",v-1582544679,False,251 Main St,School,42.376865,-71.066053,"[{'label': 'display', 'lat': 42.37686494887256...",...,United States,"[251 Main St (School), Charlestown, MA 02129, ...",1321005.0,https://www.grubhub.com/restaurant/monument-re...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
1,4b50fb2bf964a520c53b27e3,Ninety Nine Restaurant,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",v-1582544679,False,29-31 Austin St,,42.374881,-71.067056,"[{'label': 'display', 'lat': 42.374881, 'lng':...",...,United States,"[29-31 Austin St, Charlestown, MA 02129, Unite...",1790957.0,https://www.grubhub.com/restaurant/ninety-nine...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
2,4b1afebdf964a5200cf623e3,Billy Tse's Restaurant,"[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",v-1582544679,False,240 Commercial St,,42.363832,-71.051115,"[{'label': 'display', 'lat': 42.36383163608474...",...,United States,"[240 Commercial St, Boston, MA 02109, United S...",317519.0,https://www.grubhub.com/restaurant/billy-tse-2...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,52131703.0,
3,4b57b367f964a520af3c28e3,Last Corner Restaurant,"[{'id': '4bf58dd8d48988d147941735', 'name': 'D...",v-1582544679,False,49 High Street,Chute St,42.37657,-71.063172,"[{'label': 'display', 'lat': 42.37657, 'lng': ...",...,United States,"[49 High Street (Chute St), Boston, MA 01867, ...",,,,,,,,
4,4c91546ab641236ab6a68079,Q Restaurant,"[{'id': '52af0bd33cf9994f4e043bdd', 'name': 'H...",v-1582544679,False,660 Washington St,at Beach St.,42.351707,-71.062715,"[{'label': 'display', 'lat': 42.35170662953259...",...,United States,"[660 Washington St (at Beach St.), Boston, MA ...",,,,,,,466191906.0,
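
The request URL in the cell above is assembled with `str.format`, which does not escape special characters such as spaces in the query. One alternative is to let the standard library encode the parameters. In this sketch the parameter names match the URL above, `build_search_url` is a hypothetical helper, and the credential values are placeholders:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version,
                     query, radius, limit):
    """Build the venue search URL with properly encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,   # metres
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder credentials; the query contains a space to show the encoding.
url = build_search_url("YOUR_ID", "YOUR_SECRET", 42.3667323, -71.0677425,
                       "20180604", "Thai Restaurant", 50000, 30)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` performs an equivalent encoding.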


I then clean up the dataframe to only include information on the restaurants and their locations. I also remove any venues whose city is not Boston.

In [101]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

df=dataframe_filtered.drop(['crossStreet'],axis=1)
df=df.loc[df['city']=="Boston"]
df

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
2,Billy Tse's Restaurant,Sushi Restaurant,240 Commercial St,42.363832,-71.051115,"[{'label': 'display', 'lat': 42.36383163608474...",1405,2109,US,Boston,MA,United States,"[240 Commercial St, Boston, MA 02109, United S...",,4b1afebdf964a5200cf623e3
3,Last Corner Restaurant,Diner,49 High Street,42.37657,-71.063172,"[{'label': 'display', 'lat': 42.37657, 'lng': ...",1157,1867,US,Boston,MA,United States,"[49 High Street (Chute St), Boston, MA 01867, ...",,4b57b367f964a520af3c28e3
4,Q Restaurant,Hotpot Restaurant,660 Washington St,42.351707,-71.062715,"[{'label': 'display', 'lat': 42.35170662953259...",1723,2111,US,Boston,MA,United States,"[660 Washington St (at Beach St.), Boston, MA ...",,4c91546ab641236ab6a68079
5,Primo's Restaurant,Pizza Place,28 Myrtle St,42.359324,-71.065583,"[{'label': 'display', 'lat': 42.35932373996034...",843,2114,US,Boston,MA,United States,"[28 Myrtle St, Boston, MA 02114, United States]",,4aa91a1af964a520fe5120e3
6,Great Taste Bakery & Restaurant,Bakery,31 Beach St,42.351291,-71.060165,"[{'label': 'display', 'lat': 42.35129067813932...",1828,2111,US,Boston,MA,United States,"[31 Beach St, Boston, MA 02111, United States]",,4ae310cef964a520399021e3
8,Pulcinella Mozzarella Bar and Restaurant,Italian Restaurant,78 Salem St,42.363693,-71.055872,"[{'label': 'display', 'lat': 42.36369287757674...",1033,2113,US,Boston,MA,United States,"[78 Salem St, Boston, MA 02113, United States]",,502bd747e4b082dc2d00820d
9,Moon Villa Restaurant,Chinese Restaurant,19 Edinboro St,42.351884,-71.059554,"[{'label': 'display', 'lat': 42.35188354712937...",1784,2111,US,Boston,MA,United States,"[19 Edinboro St (Near Beach St.), Boston, MA 0...",,4bb21286f964a52094b63ce3
10,Thornton's Restaurant & Cafe,Diner,150 Huntington Ave,42.345288,-71.08201,"[{'label': 'display', 'lat': 42.34528762816119...",2660,2115,US,Boston,MA,United States,"[150 Huntington Ave (at W Newton St), Boston, ...",,4aec58d2f964a52035c621e3
11,Montien Boston - Thai Restaurant,Thai Restaurant,63 Stuart St,42.351094,-71.064498,"[{'label': 'display', 'lat': 42.35109416020406...",1761,2116,US,Boston,MA,United States,"[63 Stuart St (Tremont St), Boston, MA 02116, ...",,4a04e5aff964a5203e721fe3
12,International Restaurant & Pub,Restaurant,184 High St,42.357344,-71.052514,,1631,2110,US,Boston,MA,United States,"[184 High St, Boston, MA 02110, United States]",,40b28c80f964a520bff71ee3


My FourSquare data is now ready to be analysed.
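
To relate the three datasets to one another, distances between the museum and a school or crime location can be computed from the latitude and longitude columns. A minimal sketch using the haversine formula, assuming a mean Earth radius of 6371 km; the coordinates are those of the museum and of the first school in the outputs above, and `haversine_km` is a hypothetical helper:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


museum = (42.3667323, -71.0677425)          # from the geocoding output
adams_elementary = (42.365553, -71.034917)  # first row of the school data
distance = haversine_km(*museum, *adams_elementary)
print(round(distance, 2))
```

This gives a straight-line distance only; actual travel distance along streets will be longer.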