# The Battle of Neighborhoods 

#### Introduction

Toronto and New York are both large, busy, and highly diverse cities, and each is the financial capital of its country. A tour company wants to know which of the two is the better destination for Asian tourists. These tourists would rather walk than take taxis, so the places they visit should be close to each other. The company has narrowed its options to two cities, Toronto and New York, and the final decision will be based on the number of Asian restaurants found around each city.

#### Data
The data used in this experiment is collected from multiple sources. For New York City, we get the data from the New York University Spatial Data Repository, where it is provided as a JSON file that is ready to use. For Toronto, on the other hand, we get the data from Wikipedia, so it must be cleaned and prepared before use. Finally, we use the Foursquare API to get insights about the venues people visit in a specific area.

#### Methodology
First, we gather the data from the sources mentioned above. We start with New York and then collect the Toronto dataset.

#### New York City
The New York City data comes ready to use from New York University.

In [20]:
# Importing libraries
import json  # parse the downloaded GeoJSON file

import numpy as np
import pandas as pd
import requests  # HTTP requests (data download and Foursquare API)

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude/longitude
from pandas import json_normalize  # public import path since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering

print('Libraries imported.')

Libraries imported.


In [46]:
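# Downloading the data
import requests
response = requests.get("https://cocl.us/new_york_dataset")
response.raise_for_status()  # stop early if the download failed
# the context manager closes the file even if the write fails
with open("newyork_data.json", "w") as download:
    download.write(response.text)
print('Data downloaded!')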

Data downloaded!


In [47]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
newyork_neighborhoods_data = newyork_data['features']

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in newyork_neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

newyork_neighborhoods = pd.DataFrame(rows, columns=column_names)
newyork_neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
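
Each entry in `features` is a GeoJSON point whose coordinates are ordered `[longitude, latitude]`, which is why the loop above reads index 1 for the latitude. The following is a minimal sketch of that conversion on a single hand-made feature. The sample dictionary is illustrative: its values are copied from the first output row, and it contains only the keys the notebook reads. `feature_to_row` is a hypothetical helper name.

```python
# Illustrative feature with only the keys the notebook reads;
# the real file contains more properties per feature.
sample_feature = {
    "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}


def feature_to_row(feature):
    """Convert one GeoJSON point feature into a flat dictionary."""
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order: lon, lat
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


row = feature_to_row(sample_feature)
print(row)
```

A list of such dictionaries can be passed directly to `pd.DataFrame` to build the table in one step.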


In [48]:
address = 'New York, US'

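# recent geopy versions require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent="ny_explorer")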
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinate of New York is {}, {}.'.format(latitude, longitude))

The coordinate of New York is 40.7308619, -73.9871558.


Using Folium, we visualize the New York neighborhoods gathered so far.

In [49]:
# create map of New York using latitude and longitude values
map_nyc = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(newyork_neighborhoods['Latitude'], newyork_neighborhoods['Longitude'], newyork_neighborhoods['Borough'], newyork_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)  
    
map_nyc

Now that we have gathered the required data, we use the Foursquare API to get insights into each neighborhood.

We define a function that uses the Foursquare API to get up to 25 venues within a given radius (100 meters by default) of a point given by its latitude and longitude.

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            25)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))
    
    return nearby_venues
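
`getNearbyVenues` indexes straight into the response, so a venue without any category, or a response without groups, would raise an exception. The sketch below shows one more defensive way to parse the same fields. `parse_explore_items` is a hypothetical helper, not part of the original notebook, and the sample payload is hand-written to mimic only the fields that the function above reads. It is not a real API reply.

```python
def parse_explore_items(payload, name, lat, lng):
    """Return one tuple per venue, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder when a venue has no category.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-written payload shaped like the fields used above (not real API output).
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 40.0, "lng": -73.0},
               "categories": [{"name": "Thai Restaurant"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 40.1, "lng": -73.1},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample_payload, "Sample", 40.0, -73.0)
empty = parse_explore_items({"response": {}}, "Sample", 40.0, -73.0)
print(rows, empty)
```

The `"Unknown"` placeholder is an assumption made for this sketch. Dropping such venues would be an equally reasonable choice for this analysis, since they could never match the restaurant filter.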

To use the Foursquare API, we define the required client ID, client secret, and API version.

In [28]:
CLIENT_ID = '1YQNCKG5DNFNL11GXKOY1S1SO0OGAY1BNGOZCPMXOIX2G3YP' # your Foursquare ID
CLIENT_SECRET = '1G1AZRCCJYL4THB0QMP3FYKXXVH1PWGGY4PXGHK50LXVHWCD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
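
Hard-coding credentials in a notebook exposes them to anyone who can read it. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are illustrative choices and not names required by Foursquare. `load_foursquare_credentials` is a hypothetical helper.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the credentials from environment variables instead of the notebook."""
    # FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are assumed variable names.
    client_id = environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret


# Demonstration with a stand-in mapping; real code would rely on os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, the two returned values would be assigned to `CLIENT_ID` and `CLIENT_SECRET`.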

We get the venues around each neighborhood.

In [29]:
newyork_venues = getNearbyVenues(names=newyork_neighborhoods['Neighborhood'],
                                   latitudes=newyork_neighborhoods['Latitude'],
                                   longitudes=newyork_neighborhoods['Longitude']
                                  )

Found 865 venues in 306 neighborhoods.


In [30]:
newyork_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Fieldston | 40.895437 | -73.905643 | nicksemlerSPA | 40.894942 | -73.905475 | Spa |
| 1 | Kingsbridge | 40.881687 | -73.902818 | Garden Gourmet Market | 40.88135 | -73.903389 | Gourmet Shop |
| 2 | Kingsbridge | 40.881687 | -73.902818 | MyUnique | 40.881966 | -73.903584 | Thrift / Vintage Store |
| 3 | Kingsbridge | 40.881687 | -73.902818 | Dollar Tree | 40.881715 | -73.903187 | Discount Store |
| 4 | Kingsbridge | 40.881687 | -73.902818 | Sleepy's Riverdale | 40.88158 | -73.903277 | Mattress Store |


In [10]:
newyork_venues.shape

(865, 7)
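
The default `radius=100` means each returned venue should lie within roughly 100 meters of the neighborhood's center, which fits the requirement that places be within walking distance. As a sanity check that is not part of the original analysis, the distance between a neighborhood and one of its venues can be computed with the haversine formula. The sketch uses a mean Earth radius of 6,371 km, and `haversine_m` is a hypothetical helper name.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Fieldston's center and its first venue, taken from the table above.
distance = haversine_m(40.895437, -73.905643, 40.894942, -73.905475)
print(round(distance), "meters")
```

Applying the same function to every row of `newyork_venues` would show how tightly the venues cluster around each center.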

In [31]:
newyork_resturants = newyork_venues[newyork_venues['Venue Category'].str.contains('Thai|Japanese|Sushi|Chinese|Indian|Vietnamese|Asian|Middle Eastern|Korean')]
newyork_resturants_grouped = newyork_resturants.groupby(['Venue Category']).count()
newyork_resturants_grouped = newyork_resturants_grouped.sort_values('Neighborhood', ascending=False)
newyork_resturants_grouped

| Venue Category | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Chinese Restaurant | 23 | 23 | 23 | 23 | 23 | 23 |
| Japanese Restaurant | 8 | 8 | 8 | 8 | 8 | 8 |
| Sushi Restaurant | 7 | 7 | 7 | 7 | 7 | 7 |
| Thai Restaurant | 7 | 7 | 7 | 7 | 7 | 7 |
| Indian Restaurant | 6 | 6 | 6 | 6 | 6 | 6 |
| Korean Restaurant | 6 | 6 | 6 | 6 | 6 | 6 |
| Vietnamese Restaurant | 4 | 4 | 4 | 4 | 4 | 4 |
| Asian Restaurant | 2 | 2 | 2 | 2 | 2 | 2 |
| Middle Eastern Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |


In [32]:
newyork_resturants_grouped['Neighborhood'].sum()

64

So the number of Asian restaurants found in New York is 64.
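
The count above comes from keeping every venue whose category contains one of the listed keywords and then counting per category. The plain-Python sketch below mirrors that logic with `re` and `collections.Counter` in place of pandas. The category list is made up for illustration, and real values come from Foursquare.

```python
import re
from collections import Counter

# Same alternation pattern as in the notebook cell above.
PATTERN = re.compile(
    "Thai|Japanese|Sushi|Chinese|Indian|Vietnamese|Asian|Middle Eastern|Korean")

# Made-up categories for illustration; real values come from Foursquare.
sample_categories = [
    "Chinese Restaurant", "Spa", "Sushi Restaurant",
    "Chinese Restaurant", "Discount Store", "Korean Restaurant",
]

# Keep categories containing any keyword, then count per category.
matches = [c for c in sample_categories if PATTERN.search(c)]
counts = Counter(matches)
total = sum(counts.values())
print(counts.most_common(), total)
```

Because the pattern is a substring match, any category that merely contains one of the keywords is counted, so the keyword list is worth reviewing against the categories that actually occur.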

#### Toronto
The data for Toronto comes from Wikipedia. Because the source is an HTML page, it needs some scraping.

In [33]:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [34]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)

In [35]:
soup = BeautifulSoup(response.text, 'html.parser')

# parse the postal code table: one (PostalCode, Borough, Neighborhood) row per <tr>
table = soup.find("table", {"class": "wikitable sortable"})
rows = []
for row in table.find_all("tr")[1:]:  # skip the header row
    cells = [cell.text.strip() for cell in row.find_all('td')]
    postal_code, borough, neighborhood = cells[0], cells[1], cells[2]
    # a neighborhood that is 'Not assigned' takes the borough name
    if neighborhood.lower() == 'not assigned':
        neighborhood = borough
    rows.append([postal_code, borough, neighborhood])
canada_df = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])

# drop postal codes without a borough, then merge the neighborhoods of each postal code
canada_df = canada_df[canada_df['Borough'] != 'Not assigned']
f = {'Borough': 'first', 'Neighborhood': lambda x: ', '.join(x)}
groupdf = canada_df.groupby(['PostalCode']).agg(f).reset_index()
groupdf.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
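
The grouping step drops postal codes whose borough is "Not assigned" and joins the neighborhoods that share a postal code into one comma-separated string. The plain-Python sketch below reproduces that logic on a few rows. The first three rows are consistent with the output above, while the `X0X` row is a made-up placeholder for an unassigned code.

```python
# Illustrative rows in (PostalCode, Borough, Neighborhood) form.
sample_rows = [
    ("M1B", "Scarborough", "Rouge"),
    ("M1B", "Scarborough", "Malvern"),
    ("M1G", "Scarborough", "Woburn"),
    ("X0X", "Not assigned", "Not assigned"),  # placeholder for an unassigned code
]

grouped = {}
for postal_code, borough, neighborhood in sample_rows:
    if borough == "Not assigned":
        continue  # unassigned postal codes are dropped
    entry = grouped.setdefault(postal_code, {"Borough": borough, "Neighborhoods": []})
    entry["Neighborhoods"].append(neighborhood)

# Join the neighborhoods of each postal code into one comma-separated string.
result = {code: (e["Borough"], ", ".join(e["Neighborhoods"]))
          for code, e in grouped.items()}
print(result)
```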


In [36]:
dataframe = pd.read_csv('http://cocl.us/Geospatial_data')
dataframe = dataframe.rename(columns={'Postal Code':'PostalCode'})
dataframe.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


For this experiment we are interested in Toronto, so we keep only the boroughs whose name contains "Toronto".

In [37]:
newdf = pd.merge(groupdf, dataframe, on='PostalCode', how='outer')
# na=False treats rows without a borough (possible after an outer merge) as non-matching
canada_locs = newdf[newdf['Borough'].str.contains('Toronto', na=False)].reset_index(drop=True)
canada_locs = canada_locs[['Borough','Neighborhood','Latitude','Longitude']]
canada_locs.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 |
| 2 | East Toronto | The Beaches West, India Bazaar | 43.668999 | -79.315572 |
| 3 | East Toronto | Studio District | 43.659526 | -79.340923 |
| 4 | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |


Next, we get the coordinates of Toronto using the Nominatim geocoder from geopy.

In [38]:
address = 'Toronto, CA'

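# recent geopy versions require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent="toronto_explorer")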
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinate of Toronto is {}, {}.'.format(latitude, longitude))

The coordinate of Toronto is 43.653963, -79.387207.


Using Folium, we visualize the Toronto neighborhoods gathered so far.

In [39]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(canada_locs['Latitude'], canada_locs['Longitude'], canada_locs['Borough'], canada_locs['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Now that we have gathered the required data, we use the Foursquare API to get insights into each neighborhood.

We get the venues around each neighborhood.

In [40]:
toronto_venues = getNearbyVenues(names=canada_locs['Neighborhood'],
                                   latitudes=canada_locs['Latitude'],
                                   longitudes=canada_locs['Longitude']
                                  )

Found 101 venues in 38 neighborhoods.


In [41]:
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | Davisville | 43.704324 | -79.38879 | Thobors Boulangerie Patisserie Café | 43.704514 | -79.388616 | Café |
| 2 | Davisville | 43.704324 | -79.38879 | Jules Cafe Patisserie | 43.704138 | -79.388413 | Dessert Shop |
| 3 | Davisville | 43.704324 | -79.38879 | XO Gelato | 43.705177 | -79.388793 | Dessert Shop |
| 4 | Davisville | 43.704324 | -79.38879 | Zee Grill | 43.704985 | -79.388476 | Seafood Restaurant |


In [42]:
toronto_venues.shape

(101, 7)

In [43]:
toronto_resturants = toronto_venues[toronto_venues['Venue Category'].str.contains('Thai|Japanese|Sushi|Chinese|Indian|Vietnamese|Asian|Middle Eastern|Korean')]
toronto_resturants_grouped = toronto_resturants.groupby(['Venue Category']).count()
toronto_resturants_grouped = toronto_resturants_grouped.sort_values('Neighborhood', ascending=False)
toronto_resturants_grouped

| Venue Category | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Japanese Restaurant | 4 | 4 | 4 | 4 | 4 | 4 |
| Sushi Restaurant | 3 | 3 | 3 | 3 | 3 | 3 |
| Asian Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |
| Thai Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |


In [44]:
toronto_resturants_grouped['Neighborhood'].sum()

9

Compared with the 64 Asian restaurants found in New York, only 9 were found in Toronto, so New York would be the better place to visit.
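
The comparison above rests on raw counts, while the two samples differ considerably in size (306 neighborhoods and 865 venues for New York, 38 neighborhoods and 101 venues for Toronto). As an optional companion check that is not part of the original analysis, the same counts can be expressed per neighborhood and as a share of all venues found. All numbers in the sketch are taken from the outputs above.

```python
# Counts reported in the outputs above.
results = {
    "New York": {"asian_restaurants": 64, "venues": 865, "neighborhoods": 306},
    "Toronto": {"asian_restaurants": 9, "venues": 101, "neighborhoods": 38},
}

summary = {}
for city, r in results.items():
    summary[city] = {
        # Asian restaurants found per neighborhood queried
        "per_neighborhood": r["asian_restaurants"] / r["neighborhoods"],
        # Asian restaurants as a fraction of all venues found
        "share_of_venues": r["asian_restaurants"] / r["venues"],
    }
    print(city, {k: round(v, 3) for k, v in summary[city].items()})
```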