# The Battle of Neighborhoods
___

## Introduction/Business Problem

#### Where to Go When You Want to Eat Pizza?
Let's say a tourist wants to eat nothing but pizza during a trip to the US, so he wants to stay somewhere with a high density of pizza places around him. The problem we aim to solve is to analyze the locations of pizza places in major US cities and find the best spot for our tourist, so that he can have good pizza.

### Data

I will use the Foursquare API to collect data about the locations of pizza places in 5 major US cities: New York, NY; San Francisco, CA; Jersey City, NJ; Boston, MA; and Chicago, IL. These are all large US cities, and I expect them to have some of the best pizza places in the country.

### Method

The first step is to assess which city has the highest density of pizza places. The Foursquare API's "near" query parameter returns venues in a given city, and I set the categoryId parameter so that only Pizza Places are returned. Note that Foursquare limits us to a maximum of 100 venues per query.

This request is repeated for each of the 5 cities. For each city, I get the top 100 venues, save them, and plot them on a map for visual inspection.

Next, to get an indicator of the density of pizza places, I calculate the centroid of the venues, i.e. their mean latitude and longitude. I then calculate the mean Euclidean distance from each venue to that centroid; a smaller value means the venues are packed more tightly.
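
The density indicator described above can be sketched on its own, independent of the Foursquare data. This is a minimal illustration, not the notebook's exact code: the function name `mean_distance_to_centroid` and the sample coordinates are made up for the example.

```python
import numpy as np

def mean_distance_to_centroid(coords):
    """Return (centroid, mean Euclidean distance from each point to it).

    coords: array-like of shape (n, 2) holding (lat, lng) pairs in degrees.
    """
    pts = np.asarray(coords, dtype=float)
    centroid = pts.mean(axis=0)                      # mean latitude and longitude
    dists = np.linalg.norm(pts - centroid, axis=1)   # one distance per venue
    return centroid, dists.mean()

# Hypothetical coordinates, for illustration only
tight = [(40.0, -74.0), (40.0, -74.02), (40.02, -74.0), (40.02, -74.02)]
spread = [(40.0, -74.0), (40.0, -74.2), (40.2, -74.0), (40.2, -74.2)]

_, tight_score = mean_distance_to_centroid(tight)
_, spread_score = mean_distance_to_centroid(spread)
print(tight_score < spread_score)  # the tighter cluster gets the smaller score
```

A lower score means the venues sit closer to their common center, which is the sense in which the cities are ranked later on.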

In [2]:
!conda install -c conda-forge folium --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.3.1               |             py_0          25 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    folium-0.10.1              |             py_0          59 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    branca:          0.3.1-py_0        conda-forge
    folium:          

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [4]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

In [5]:
LIMIT = 100 # Foursquare returns at most 100 venues per query
PIZZA_CATEGORY_ID = "4bf58dd8d48988d1ca941735" # Foursquare category ID for Pizza Place
cities = ["New York, NY", 'Chicago, IL', 'San Francisco, CA', 'Jersey City, NJ', 'Boston, MA']
results = {}
for city in cities:
    # Let requests build and URL-encode the query string (city names contain spaces and commas)
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': PIZZA_CATEGORY_ID,
    }
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

In [6]:
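df_venues = {}
for city in cities:
    # Flatten the nested JSON items into a table with one row per venue
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # Keep only the name, address and coordinates, then give the columns shorter names
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']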

### The Foursquare API only gives us the top 100 venues.

Let's first check out the density of pizza places in each city on a map.

In [7]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of pizza places in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

Total number of pizza places in New York, NY =  301
Showing Top 100
Total number of pizza places in Chicago, IL =  228
Showing Top 100
Total number of pizza places in San Francisco, CA =  170
Showing Top 100
Total number of pizza places in Jersey City, NJ =  127
Showing Top 100
Total number of pizza places in Boston, MA =  187
Showing Top 100


In [8]:
maps[cities[0]]   # New York, NY

In [9]:
maps[cities[1]]   # Chicago, IL

In [10]:
maps[cities[2]]   #  San Francisco, CA

In [11]:
maps[cities[3]]     #  Jersey City, NJ

In [12]:
maps[cities[4]]    #   Boston, MA

#### We can see that New York and Jersey City have the densest concentrations of pizza places. Better still, the two cities are just across the Hudson River from each other.
#### However, let's get a concrete measure of this density.
#### For this, let's use some basic statistics. I will compute the mean location of the pizza places, which will be close to most of them if they are densely packed and far from them if not.
#### Next, I take the average distance from the venues to the mean coordinates.

In [13]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

New York, NY
Mean Distance from Mean coordinates
0.021835814404059482
Chicago, IL
Mean Distance from Mean coordinates
0.058164853901147166
San Francisco, CA
Mean Distance from Mean coordinates
0.02880622957662293
Jersey City, NJ
Mean Distance from Mean coordinates
0.02873806186883841
Boston, MA
Mean Distance from Mean coordinates
0.03455742252354211
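
The scores above are in degrees, and one degree of longitude covers less ground than one degree of latitude away from the equator. As a hedged alternative (not what this notebook computes), the same measure can be approximated in kilometres with an equirectangular projection around the centroid. The function name `mean_distance_km` is hypothetical, and the Earth radius is the usual 6371 km approximation.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation

def mean_distance_km(coords):
    """Approximate mean distance (km) from each (lat, lng) point to the centroid.

    Uses an equirectangular projection, which is reasonable for points
    spread over a single city but not over long distances.
    """
    pts = np.radians(np.asarray(coords, dtype=float))
    lat0, lng0 = pts.mean(axis=0)
    dy = pts[:, 0] - lat0                      # north-south offset in radians
    dx = (pts[:, 1] - lng0) * np.cos(lat0)     # east-west offset, shrunk by latitude
    return EARTH_RADIUS_KM * np.hypot(dx, dy).mean()

# Two points one degree of longitude apart, at the equator and at 60 degrees north
print(mean_distance_km([(0.0, 0.0), (0.0, 1.0)]))
print(mean_distance_km([(60.0, 0.0), (60.0, 1.0)]))
```

Within this notebook it could be called as `mean_distance_km(df_venues[city][['Lat', 'Lng']].values)` for each city, which would make the scores comparable across cities at different latitudes.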


In [14]:
maps[cities[0]]  # New York

In [15]:
maps[cities[1]]    # Chicago

In [16]:
maps[cities[2]]  # San Francisco, CA

In [17]:
maps[cities[3]]  # Jersey City

In [18]:
maps[cities[4]]  # Boston, MA

We now see that New York is his best option, with the smallest mean distance.
The next best place is Jersey City, which is just across the river. Our tourist's best bet would be to book a hotel near the mean coordinate to surround himself with the 100 pizza places there!
Another observation is that Jersey City has one pizza place that is really far away from the rest, which may be inflating its score enough for it to be beaten by New York. So let's remove it and calculate the score again.

In [19]:
city = 'Jersey City, NJ'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))# Ignore the biggest distance

Jersey City, NJ
Mean Distance from Mean coordinates
0.02100140074824543


That puts Jersey City in first place, with a mean distance of 0.0210 against New York's 0.0218, which makes our tourist happy.
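
The outlier removal above was applied to Jersey City only. A small helper could apply the same trimming to every city, so the comparison treats them alike. This is a sketch under assumptions: `trimmed_mean_distance` and its `drop` parameter are hypothetical names, and, as in the cell above, the centroid is still computed from all venues before the farthest ones are dropped.

```python
import numpy as np

def trimmed_mean_distance(coords, drop=1):
    """Mean distance to the centroid, ignoring the `drop` farthest points.

    coords: array-like of shape (n, 2) with (lat, lng) pairs; needs n > drop.
    """
    pts = np.asarray(coords, dtype=float)
    # Distances to the centroid of all points, sorted from nearest to farthest
    dists = np.sort(np.linalg.norm(pts - pts.mean(axis=0), axis=1))
    kept = dists[:len(dists) - drop] if drop else dists
    return kept.mean()

# Hypothetical points: four clustered venues and one far-away outlier
pts = [(0, 0), (0, 2), (2, 0), (2, 2), (11, 11)]
print(trimmed_mean_distance(pts, drop=1) < trimmed_mean_distance(pts, drop=0))
```

In this notebook it could be run as `trimmed_mean_distance(df_venues[city][['Lat', 'Lng']].values)` inside a loop over `cities`; whether Jersey City stays ahead under that treatment is not something this notebook has checked.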