# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we identify the places that people are better off avoiding.

Many venues have very low ratings, so we detect them and highlight them to users, helping them avoid a bad experience. We consider every type of place, be it a restaurant, a park, a club, and so on.

We use data science techniques to list the worst places based on their ratings. The rating of each place is shown clearly, so end users can note the place and avoid it.

All the end user needs to do is set the city center of interest, and we highlight the places to stay away from.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing places whose ratings are below 6.0
* distance of the place from city center
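
The second factor, distance from the city center, can be computed from coordinates alone. The following is a minimal sketch using the haversine formula. The function name `haversine_m` and the mean Earth radius constant are assumptions made for illustration and are not part of the original analysis.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# City center and first venue coordinates, taken from the sample run in this notebook
center = (40.7637566, -73.9796239)
venue = (40.763927, -73.979503)
print(round(haversine_m(*center, *venue)))
```

The result is in meters, the same unit the Foursquare response uses in its `distance` column.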

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
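
One possible way to build such a grid is sketched below. It is only an illustration: the helper `make_grid`, the 500 m spacing and the meters-per-degree constant are assumptions, and the conversion from meters to degrees is a flat-earth approximation that is reasonable only over short distances.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # rough average, assumed for this sketch

def make_grid(center_lat, center_lon, spacing_m=500, steps=2):
    """Return (lat, lon) points on a square grid of (2*steps+1)**2 points around the center."""
    dlat = spacing_m / METERS_PER_DEGREE_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    dlon = spacing_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(center_lat)))
    return [
        (center_lat + i * dlat, center_lon + j * dlon)
        for i in range(-steps, steps + 1)
        for j in range(-steps, steps + 1)
    ]

grid = make_grid(40.7637566, -73.9796239, spacing_m=500, steps=1)
print(len(grid))  # 9 points: a 3 x 3 grid
```

Each grid point could then serve as the center of its own venue search.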

The following data sources are needed to extract or generate the required information:
* the rating, type and location of each place are obtained using the **Foursquare API**
* to start the analysis, we look up the latitude and longitude of the New York city center using Google. For a new analysis, the user only needs to specify a new latitude and longitude.

## Methodology <a name="methodology"></a>

In this project we focus on detecting low-rated places in an area of a specified city (in this sample, the New York city center). We consider all kinds of venues. The minimum rating value is set by the user (6.0 in this sample), and the search radius is 500 meters.

The first step was to plot a map of the specified address, marking it with a blue spot.

In the second step we collected data on the places near the specified address using the Foursquare API. With each venue's id we were able to access its details, such as name, latitude, longitude and rating.

In the third and final step, with all the venues and their ratings, we filtered the venues by the minimum rating value specified by the user (6.0 in this sample). With the filtered result we plotted a new map, marking in red the places the user should avoid.

#### Installing Necessary libraries

In [None]:
!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge folium=0.5.0 --yes

#### Importing Necessary Libraries

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# pd.json_normalize (used below) transforms a JSON file into a pandas dataframe

import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


# Foursquare Connection

In [15]:
CLIENT_ID = 'PPM4QIGUKUD143FWO2JMHPNJULBEBEMC5Y3SPLKT42UTGAVM' # your Foursquare ID
CLIENT_SECRET = 'CKQTVC4V1XFUOM23AMADFERICMWVRC5KOMO1FX2W5UDPQK1S' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JQJQW40C5FYJHG3OXKKT0KCKYWPVQM4GXLJ453Z5YHA5ZISH
CLIENT_SECRET:BOONQ5BMUHZ03WYNFU2REQ115M0OGNA2XAH3BFENFTNN4SRU


#### Getting Address Latitude and Longitude

##### * If you want to search for a new city, just define a new valid address below

In [3]:
address = '131 W 55th St, New York, NY' 

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7637566 -73.9796239
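
Note that `geolocator.geocode` returns `None` when it cannot resolve an address, in which case `location.latitude` would fail with an `AttributeError`. The sketch below shows one way to guard against that. The helper `resolve_coordinates` and the stub `fake_geocode` are hypothetical names introduced only for this illustration; in the notebook, `geolocator.geocode` would be passed in instead of the stub.

```python
from types import SimpleNamespace

def resolve_coordinates(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing was found.

    `geocode` is any callable that returns an object with `.latitude` and
    `.longitude` attributes, or None when the address is unknown.
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude

def fake_geocode(address):
    """Hypothetical stand-in for geolocator.geocode, so the sketch runs offline."""
    known = {
        '131 W 55th St, New York, NY': SimpleNamespace(
            latitude=40.7637566, longitude=-73.9796239
        )
    }
    return known.get(address)

latitude, longitude = resolve_coordinates(fake_geocode, '131 W 55th St, New York, NY')
print(latitude, longitude)
```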


In [4]:
print(location)

131, West 55th Street, Midtown, Manhattan, Manhattan Community Board 5, New York County, New York, 10019, United States of America


##### * If you want to define a new value for minimumRating, just define a new float value below

In [None]:
minimumRating = 6.0

# Displaying specified city map, marking the city center as blue

In [5]:
definedMap = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around the specified address

# add a blue circle marker to represent the city center
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='blue',
    popup='Center',
    fill = True,
    fill_color = 'blue',
    fill_opacity = 0.6
).add_to(definedMap)

# display map
definedMap

# Accessing Foursquare Venues

In [16]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=JQJQW40C5FYJHG3OXKKT0KCKYWPVQM4GXLJ453Z5YHA5ZISH&client_secret=BOONQ5BMUHZ03WYNFU2REQ115M0OGNA2XAH3BFENFTNN4SRU&ll=40.7637566,-73.9796239&v=20180604&radius=500&limit=30'
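
As an alternative to a long `format` call, the query string can be assembled from a dictionary, which keeps each parameter next to its name. This is a minimal sketch using only the standard library; the function name `build_search_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, latitude, longitude, version, radius, limit):
    """Build the venue search URL with the same parameters as the cell above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter unescaped
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')

# 'YOUR_ID' and 'YOUR_SECRET' are placeholders, not real credentials
print(build_search_url('YOUR_ID', 'YOUR_SECRET', 40.7637566, -73.9796239, '20180604', 500, 30))
```

Special characters in any value are escaped automatically, which plain string formatting does not do.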

In [17]:
results = requests.get(url).json()

Get the relevant part of the JSON response and transform it into a pandas DataFrame

In [18]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,...,location.country,location.formattedAddress,venuePage.id,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,location.neighborhood
0,4b53367af964a5208e9227e3,New York City Center,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",v-1592787725,False,131 W 55th St,btwn 6th & 7th Ave,40.763927,-73.979503,"[{'label': 'display', 'lat': 40.76392698771681...",...,United States,"[131 W 55th St (btwn 6th & 7th Ave), New York,...",35937794.0,,,,,,,
1,4b8edbebf964a520473b33e3,CitySpire,"[{'id': '4bf58dd8d48988d130941735', 'name': 'B...",v-1592787725,False,156 W 56th St,,40.763312,-73.979472,"[{'label': 'display', 'lat': 40.76331174213796...",...,United States,"[156 W 56th St, New York, NY 10019, United Sta...",,,,,,,,
2,4aee1ed1f964a52048d221e3,New York City Center Stage I,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",v-1592787725,False,131 W 55th St,btw 6th & 7th Ave,40.763932,-73.979399,"[{'label': 'display', 'lat': 40.76393200000000...",...,United States,"[131 W 55th St (btw 6th & 7th Ave), New York, ...",,,,,,,,
3,516eb849e4b042332cd7a5b3,6 1/2 Avenue,"[{'id': '52e81612bcbc57f1066b7a25', 'name': 'P...",v-1592787725,False,6 1/2 Ave,btwn W 51st & W 57th Sts,40.763458,-73.9801,"[{'label': 'display', 'lat': 40.76345848310408...",...,United States,"[6 1/2 Ave (btwn W 51st & W 57th Sts), New Yor...",,,,,,,,
4,580b8fa5d67c230d46aa5551,Black Tap,"[{'id': '4bf58dd8d48988d16c941735', 'name': 'B...",v-1592787725,False,136 W 55th St,,40.763548,-73.979763,"[{'label': 'display', 'lat': 40.76354829292842...",...,United States,"[136 W 55th St, New York, NY 10019, United Sta...",,2034711.0,https://www.seamless.com/menu/black-tap-midtow...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,


#### Define relevant information and filter it

In [19]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so later column assignments do not touch the original dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,New York City Center,Theater,131 W 55th St,btwn 6th & 7th Ave,40.763927,-73.979503,"[{'label': 'display', 'lat': 40.76392698771681...",21,10019.0,US,New York,NY,United States,"[131 W 55th St (btwn 6th & 7th Ave), New York,...",,4b53367af964a5208e9227e3
1,CitySpire,Building,156 W 56th St,,40.763312,-73.979472,"[{'label': 'display', 'lat': 40.76331174213796...",51,10019.0,US,New York,NY,United States,"[156 W 56th St, New York, NY 10019, United Sta...",,4b8edbebf964a520473b33e3
2,New York City Center Stage I,Theater,131 W 55th St,btw 6th & 7th Ave,40.763932,-73.979399,"[{'label': 'display', 'lat': 40.76393200000000...",27,,US,New York,NY,United States,"[131 W 55th St (btw 6th & 7th Ave), New York, ...",,4aee1ed1f964a52048d221e3
3,6 1/2 Avenue,Pedestrian Plaza,6 1/2 Ave,btwn W 51st & W 57th Sts,40.763458,-73.9801,"[{'label': 'display', 'lat': 40.76345848310408...",52,10019.0,US,New York,NY,United States,"[6 1/2 Ave (btwn W 51st & W 57th Sts), New Yor...",,516eb849e4b042332cd7a5b3
4,Black Tap,Burger Joint,136 W 55th St,,40.763548,-73.979763,"[{'label': 'display', 'lat': 40.76354829292842...",25,10019.0,US,New York,NY,United States,"[136 W 55th St, New York, NY 10019, United Sta...",,580b8fa5d67c230d46aa5551


#### We can access a venue's id like this

In [20]:
dataframe_filtered.id[0]

'4b53367af964a5208e9227e3'

# Exploring the Venues data

In [21]:
venue_id = dataframe_filtered.id[0]
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url

'https://api.foursquare.com/v2/venues/4b53367af964a5208e9227e3?client_id=JQJQW40C5FYJHG3OXKKT0KCKYWPVQM4GXLJ453Z5YHA5ZISH&client_secret=BOONQ5BMUHZ03WYNFU2REQ115M0OGNA2XAH3BFENFTNN4SRU&v=20180604'

In [22]:
#Getting the result
result = requests.get(url).json()
print(result['response']['venue'].keys())

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'url', 'likes', 'dislike', 'ok', 'rating', 'ratingColor', 'ratingSignals', 'allowMenuUrlEdit', 'beenHere', 'specials', 'photos', 'venuePage', 'reasons', 'description', 'page', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'popular', 'seasonalHours', 'pageUpdates', 'inbox', 'attributes', 'bestPhoto', 'colors'])


In [25]:
rating = result['response']['venue']['rating']
print(rating)

9.0


In [None]:
dataframe_filtered.id[4]

In [24]:
ratingList = []

# Extracting all the ratings values of the venues 

In [26]:
for ind in dataframe_filtered.index:
    venue_id = dataframe_filtered.id[ind]
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        rating = result['response']['venue']['rating']
    except KeyError:
        rating = 11.0 # placeholder value for venues that have no rating yet
    ratingList.append(rating)
    
print(ratingList)

[9.0, 11.0, 5.2, 11.0, 7.6, 7.3, 8.1, 7.5, 8.3, 5.9, 8.9, 11.0, 6.6, 11.0, 8.5, 8.8, 11.0, 8.9, 11.0, 11.0, 11.0, 8.1, 11.0, 6.2, 11.0, 11.0, 11.0, 11.0, 11.0, 11.0]
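
The rating lookup in the loop above can also be written without exception handling. The following is a small sketch under the assumption that the response is a nested dictionary shaped like the one printed earlier; the helper name `extract_rating` is hypothetical, and it returns `None` rather than the 11.0 placeholder for venues without a rating.

```python
def extract_rating(result):
    """Return the venue rating from a venue-details response, or None if it is absent."""
    venue = result.get('response', {}).get('venue', {})
    return venue.get('rating')

print(extract_rating({'response': {'venue': {'rating': 9.0}}}))        # 9.0
print(extract_rating({'response': {'venue': {'name': 'CitySpire'}}}))  # None
print(extract_rating({'response': {}}))                                # None
```

Using `None` keeps unrated venues distinguishable from real ratings, at the cost of an explicit check before any numeric comparison.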


In [33]:
#Rating dataframe
dfRating = pd.DataFrame(ratingList, columns = ['Rating'])
#Dataframe filtered receives the dfRating
dataframe_filtered['Rating'] = dfRating
dataframe_filtered.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id,Rating
0,New York City Center,Theater,131 W 55th St,btwn 6th & 7th Ave,40.763927,-73.979503,"[{'label': 'display', 'lat': 40.76392698771681...",21,10019.0,US,New York,NY,United States,"[131 W 55th St (btwn 6th & 7th Ave), New York,...",,4b53367af964a5208e9227e3,9.0
1,CitySpire,Building,156 W 56th St,,40.763312,-73.979472,"[{'label': 'display', 'lat': 40.76331174213796...",51,10019.0,US,New York,NY,United States,"[156 W 56th St, New York, NY 10019, United Sta...",,4b8edbebf964a520473b33e3,11.0
2,New York City Center Stage I,Theater,131 W 55th St,btw 6th & 7th Ave,40.763932,-73.979399,"[{'label': 'display', 'lat': 40.76393200000000...",27,,US,New York,NY,United States,"[131 W 55th St (btw 6th & 7th Ave), New York, ...",,4aee1ed1f964a52048d221e3,5.2
3,6 1/2 Avenue,Pedestrian Plaza,6 1/2 Ave,btwn W 51st & W 57th Sts,40.763458,-73.9801,"[{'label': 'display', 'lat': 40.76345848310408...",52,10019.0,US,New York,NY,United States,"[6 1/2 Ave (btwn W 51st & W 57th Sts), New Yor...",,516eb849e4b042332cd7a5b3,11.0
4,Black Tap,Burger Joint,136 W 55th St,,40.763548,-73.979763,"[{'label': 'display', 'lat': 40.76354829292842...",25,10019.0,US,New York,NY,United States,"[136 W 55th St, New York, NY 10019, United Sta...",,580b8fa5d67c230d46aa5551,7.6


##### Places not rated yet are assigned a rating of 11.0.
- I do this because an unrated place cannot be classified as a bad experience, so it is not relevant for this purpose.
- These places are also removed in the next line of code, where we filter the data based on the minimum rating.

# Filtering the data based on the minimum rating value

In [35]:
dfLowRated = dataframe_filtered[dataframe_filtered.Rating < minimumRating]
dfLowRated.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id,Rating
2,New York City Center Stage I,Theater,131 W 55th St,btw 6th & 7th Ave,40.763932,-73.979399,"[{'label': 'display', 'lat': 40.76393200000000...",27,,US,New York,NY,United States,"[131 W 55th St (btw 6th & 7th Ave), New York, ...",,4aee1ed1f964a52048d221e3,5.2
9,Hertz,Rental Car Location,"126 West 55th Street,",,40.7635,-73.979538,"[{'label': 'routing', 'lat': 40.763324, 'lng':...",29,10019.0,US,New York,NY,United States,"[126 West 55th Street,, New York, NY 10019, Un...",,4bf6bfdeb1a7a5933ea5d65b,5.9


# Create a map of the specified city:
- city center is the blue spot;
- places to be avoided are highlighted in red

In [47]:
definedMap = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around the specified address

# add a blue circle marker to represent the city center
folium.features.CircleMarker(
    [latitude, longitude],
    radius=5,
    color='blue',
    popup='Center',
    fill = True,
    fill_color = 'blue',
    fill_opacity = 0.6
).add_to(definedMap)

# add the low-rated places as red circle markers
for lat, lng, name in zip(dfLowRated.lat, dfLowRated.lng, dfLowRated.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=10,
        color='red',
        popup=name,
        fill = True,
        fill_color='red',
        fill_opacity=0.2
    ).add_to(definedMap)

# display map
definedMap

## Results and Discussion <a name="results"></a>

The analysis is flexible and customizable: the user can set a new address as well as the minimum rating value used by the analysis. For example, a user who considers a rating of 7.5 to be bad can set the minimum rating to 7.5, and all places rated below 7.5 will be highlighted in red on the map, meaning that the user should avoid them.
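
To illustrate the effect of the threshold, the sketch below applies two different minimum ratings to the ratings collected in the sample run, with unrated venues written as `None` instead of 11.0. The helper `low_rated_indices` is a hypothetical name used only here; the notebook itself filters the dataframe directly.

```python
def low_rated_indices(ratings, minimum_rating):
    """Return the positions of venues whose rating is below the threshold."""
    return [i for i, r in enumerate(ratings) if r is not None and r < minimum_rating]

# Ratings from the sample run; None marks a venue without a rating
ratings = [
    9.0, None, 5.2, None, 7.6, 7.3, 8.1, 7.5, 8.3, 5.9,
    8.9, None, 6.6, None, 8.5, 8.8, None, 8.9, None, None,
    None, 8.1, None, 6.2, None, None, None, None, None, None,
]

print(low_rated_indices(ratings, 6.0))  # [2, 9]
print(low_rated_indices(ratings, 7.5))  # [2, 5, 9, 12, 23]
```

Raising the threshold from 6.0 to 7.5 grows the list of flagged venues from two to five in this sample.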

When extracting the Foursquare data, we use a limit of 30 records, because API usage is limited.
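
Since every venue requires its own details request, one way to stay within a limited API budget is to remember responses that were already fetched. This is only a sketch of the idea: the class `VenueCache` and the stub `fake_fetch` are hypothetical, and in the notebook the fetch function would wrap the `requests.get(url).json()` call.

```python
class VenueCache:
    """Keep venue details in memory so each venue id is requested at most once."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that takes a venue id and returns its details
        self._store = {}

    def get(self, venue_id):
        if venue_id not in self._store:
            self._store[venue_id] = self._fetch(venue_id)
        return self._store[venue_id]

calls = []

def fake_fetch(venue_id):
    """Hypothetical stand-in for the network request; records each call."""
    calls.append(venue_id)
    return {'id': venue_id}

cache = VenueCache(fake_fetch)
cache.get('4b53367af964a5208e9227e3')
cache.get('4b53367af964a5208e9227e3')  # served from memory, no second request
print(len(calls))  # 1
```

The cache only lives as long as the notebook session, so re-running the analysis in a new session would request the details again.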

A search radius of 500 meters is also applied when extracting the venue results.

The red circles on the map will vary depending on the address and the minimum rating entered by the user. The blue spot on the map is always the address entered by the user.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify low-rated places close to the city center, so the user can avoid a bad experience. All kinds of places are considered. By extracting the venues from the Foursquare data, we were able to access their ratings and select only the venues with low ratings.

With all the low-rated places in hand, we highlighted on the map, with red circles, the places to be avoided.

The sample uses the New York city center, but it can be replaced with any other place the user wants; the only thing needed is to define a new address at the beginning of the code.

With the map and visual information, the user can now avoid a bad experience.