# Battle of restaurants between Downtown Toronto and Manhattan

##### T. Rashid
Capstone Project



IBM Data Science Professional Certificate

## 1. Introduction

A great restaurant district offers a diverse selection of food. Manhattan and Toronto have long been food lovers' cities, and both are known for the diversity of their food and culture. This project analyzes the restaurants of Downtown Toronto and Manhattan by classifying them into seven main categories. It explores the restaurants in each city to rank cuisines from most to least common, using K-means clustering, and displays the results on a geographical map.

This research may help someone who wants to open a restaurant in either city decide what kinds of restaurants are most common there. It is not meant to be the sole deciding factor, but it can give a prospective owner an idea of how restaurants serving different cuisines are clustered across the neighborhoods of two food capitals.


## 2. Data Preparation

We use data from the Foursquare API on restaurants, including each restaurant's location and cuisine category. We also use CSV files that provide the neighborhood names along with their latitudes and longitudes.

We rely on the 'Venue Category' field available in the Foursquare database. Because there are so many cuisines, we narrow them down to seven main categories: American, Latino, Euro, Asian, Casual, Middle Eastern and Other. We then analyze the data using segmentation and K-means clustering and visualize it on a geographical map to get a better idea of how the restaurants are clustered in the neighborhoods.

Some assumptions were made about the restaurant data. For example, Indian, Afghan, Japanese and Chinese cuisines are assigned to the 'Asian' category because their countries of origin are in Asia, even though the cuisines are very different from each other. Similarly, pizza could be considered Italian and tacos 'Latino', but both are placed in the 'Casual' category.

The analysis is only as good as the data provided: restaurants that are not available through the Foursquare API are not included.
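The segmentation and clustering step described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the notebook's actual pipeline: the toy `restaurants` frame, the choice of one-hot encoding followed by a per-neighborhood mean, and `n_clusters=2` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: one row per restaurant, with its neighborhood
# and the cuisine group it was assigned to.
restaurants = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "categories_class": ["asian", "asian", "euro", "casual", "casual",
                         "asian", "asian", "american"],
})

# Segmentation: one-hot encode the cuisine group, then average per
# neighborhood to get the share of each cuisine group.
onehot = pd.get_dummies(restaurants["categories_class"], dtype=float)
onehot.insert(0, "Neighborhood", restaurants["Neighborhood"])
profile = onehot.groupby("Neighborhood").mean()

# Clustering: group neighborhoods with similar cuisine profiles.
# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(profile)
profile["Cluster"] = kmeans.labels_
print(profile)
```

In this toy data, neighborhoods A and C are both dominated by Asian restaurants, so they would be expected to share a cluster, while the all-casual neighborhood B stands apart.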


#### Toronto Data

Download all the dependencies

In [1]:
#!pip install beautifulsoup4
#!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import json

#!conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 


from IPython.display import display_html

# transforming json file into a pandas dataframe
# (pandas >= 1.0 exposes json_normalize at the top level)
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [2]:
#conda update -n base -c defaults conda

#### Code to scrape the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M which consists of 3 columns Postal Code, Borough and Neighbourhood

In [3]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup=BeautifulSoup(source.text,'lxml')
tab = str(soup.table)
display_html(tab,raw=True)

Postal Code,Community,Neighbourhood
M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
M8A,Not assigned,Not assigned
M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
M1B,Scarborough,"Malvern, Rouge"


HTML table converted to Dataframe

In [4]:
df = pd.read_html(tab)
df1=df[0]
df1.head()

Unnamed: 0,Postal Code,Community,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


#### Only process the cells that have an assigned borough. Ignore cells with a borough that is 'Not assigned'. Combining the neighbourhoods with same Postal code. Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough

In [5]:
# Deleting rows where Borough is 'Not assigned'
df2 = df1[df1.Community != 'Not assigned']

# Combining the neighbourhoods with same Postal code
df3 = df2.groupby(['Postal Code','Community'], sort=False).agg(', '.join)
df3.reset_index(inplace=True)

# Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough
df3['Neighbourhood'] = np.where(df3['Neighbourhood'] == 'Not assigned',df3['Community'], df3['Neighbourhood'])

df3.head()

Unnamed: 0,Postal Code,Community,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
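To make the two cleaning steps above easier to follow, here is a small sketch on hypothetical rows in which one postal code is split across two lines and another has no neighbourhood assigned. The toy values are invented for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical rows: M5A is split across two lines, M7A has no neighbourhood.
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M7A"],
    "Community": ["Downtown Toronto", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Not assigned"],
})

# Combine neighbourhoods that share a postal code into one comma-separated string.
combined = (
    toy.groupby(["Postal Code", "Community"], sort=False)["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)

# Rows whose neighbourhood is 'Not assigned' take the borough name instead.
combined["Neighbourhood"] = combined["Neighbourhood"].where(
    combined["Neighbourhood"] != "Not assigned", combined["Community"]
)
print(combined)
```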


#### Download CSV file that has the geographical coordinates of each postal code

In [6]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Merging Borough and Neighbourhood on Postal Code to find Latitude and Longitude

In [7]:
df4 = pd.merge(df3,lat_lon,on='Postal Code')
df4.head()

Unnamed: 0,Postal Code,Community,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
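`pd.merge` performs an inner join by default, so a postal code with no match in the coordinates file would be dropped without any message. One way to check for this, sketched here on hypothetical data (the `M9Z` row is invented), is a left join with `indicator=True`:

```python
import pandas as pd

# Hypothetical inputs: 'M9Z' has no entry in the coordinates table.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Community": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every borough row; the _merge column records whether
# a matching coordinates row was found.
checked = pd.merge(boroughs, coords, on="Postal Code", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list would indicate that every postal code received coordinates.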


#### Use geopy library to get the latitude and longitude values of Toronto

In [8]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [9]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df4['Latitude'], df4['Longitude'], df4['Community'], df4['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Pulling all the rows from the dataframe whose Borough contains 'Toronto'.

In [10]:
df5 = df4[df4['Community'].str.contains('Toronto',regex=False)]
df5.reset_index(drop=True, inplace=True)
df5
#df5=df4
#df5.head(50)

Unnamed: 0,Postal Code,Community,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


#### Using Folium as Visualization Library to visualize above data

In [11]:
# create map of Toronto using latitude and longitude values
map_tor_borough = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(df5['Latitude'], df5['Longitude'], df5['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tor_borough)  
    
map_tor_borough

#### Define Foursquare Credentials and Version

In [12]:
CLIENT_ID = 'P1U1IXAMY3VXBUF1PWZHLYHVW23FCG1DMDLPNZOQLVZHGTPI' # your Foursquare ID
CLIENT_SECRET = 'AKZERJCTMQKIDUT4ONFFSMBLWBVYTAVX2CWUV4OSW2TIB4XK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: P1U1IXAMY3VXBUF1PWZHLYHVW23FCG1DMDLPNZOQLVZHGTPI
CLIENT_SECRET:AKZERJCTMQKIDUT4ONFFSMBLWBVYTAVX2CWUV4OSW2TIB4XK
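Hard-coding and printing credentials makes them visible to anyone who can read the notebook. A possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical, chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Usage with a stand-in mapping instead of the real environment:
demo_env = {"FOURSQUARE_CLIENT_ID": "abc", "FOURSQUARE_CLIENT_SECRET": "xyz12345"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_SECRET:", mask(client_secret))
```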


Get the first neighborhood's name.

In [13]:
df5.loc[0, 'Neighbourhood']

'Regent Park, Harbourfront'

Get the neighborhood's latitude and longitude values.

In [14]:
neighborhood_latitude = df5.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df5.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df5.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


#### Now, let's get the top 100 venues within a radius of 500 meters.

In [15]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    #neighborhood_latitude, 
    #neighborhood_longitude, 
    latitude, 
    longitude,
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=P1U1IXAMY3VXBUF1PWZHLYHVW23FCG1DMDLPNZOQLVZHGTPI&client_secret=AKZERJCTMQKIDUT4ONFFSMBLWBVYTAVX2CWUV4OSW2TIB4XK&v=20180605&ll=43.6534817,-79.3839347&radius=500&limit=100'
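As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary of parameters with the standard library. This is only a sketch: `build_explore_url` is a hypothetical helper, and `MY_ID` and `MY_SECRET` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value readable instead of %2C.
    return ("https://api.foursquare.com/v2/venues/explore?"
            + urlencode(params, safe=","))


# Placeholder credentials; the coordinates are the Toronto values used above.
demo_url = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                             43.6534817, -79.3839347)
print(demo_url)
```

Keeping the parameters in a dictionary makes it harder to pass them in the wrong order, which is easy to do with a long positional `format` call.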

Send the GET request and examine the results

In [16]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f5d907a790f6c4720d07f93'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 91,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng'

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']


Now we are ready to clean the json and structure it into a pandas dataframe.

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.id']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng,id
0,Downtown Toronto,Neighborhood,43.653232,-79.385296,5227bb01498e17bf485e6202
1,Nathan Phillips Square,Plaza,43.65227,-79.383516,4ad4c05ef964a520a6f620e3
2,Japango,Sushi Restaurant,43.655268,-79.385165,4ae7b27df964a52068ad21e3
3,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198,537773d1498e74a75bb75c1e
4,Poke Guys,Poke Place,43.654895,-79.385052,57bcd3b7498e652a678d0378


Let's create a function to repeat the same process to all the neighborhoods

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            v['venue']['id'],
            
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                             'id',
                  'Venue Category'
                   ]
    
    return(nearby_venues)
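The function above indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. A more defensive version of the row extraction might look like the sketch below. `venue_rows` is a hypothetical helper, and the sample items are invented to mimic the response structure shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples.

    Venues without any category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     venue["id"],
                     category))
    return rows


# Invented sample items that follow the structure of the response above.
sample_items = [
    {"venue": {"id": "1", "name": "Sample Sushi",
               "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Sushi Restaurant"}]}},
    {"venue": {"id": "2", "name": "Uncategorized Spot",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]
sample_rows = venue_rows("Demo Neighborhood", 43.65, -79.38, sample_items)
print(sample_rows)
```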

Run the above function on each neighborhood and create a new dataframe 

In [20]:
toronto_venues = getNearbyVenues(names=df5['Neighbourhood'],
                                   latitudes=df5['Latitude'],
                                   longitudes=df5['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

#### Filter the food and drink related places only using some keywords e.g. restaurants, coffee, taco, pizza, bar etc.

In [21]:
toronto_venues_cat = toronto_venues[toronto_venues['Venue Category'].str.contains('taco|pizza|Restaurant|sandwich|steakhouse|salad|Burger|breakfast|bistro|BBQ|Hot Dog|Fried Chicken',regex=True)]
toronto_venues_cat.reset_index(drop=True, inplace=True)
toronto_venues_cat

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,id,Venue Category
0,"Regent Park, Harbourfront",43.654260,-79.360636,Impact Kitchen,43.656369,-79.356980,5612b1cc498e3dd742af0dc8,Restaurant
1,"Regent Park, Harbourfront",43.654260,-79.360636,El Catrin,43.650601,-79.358920,51ddecee498e1ffd34185d2f,Mexican Restaurant
2,"Regent Park, Harbourfront",43.654260,-79.360636,Cluny Bistro & Boulangerie,43.650565,-79.357843,53a22c92498ec91fda7ce133,French Restaurant
3,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Nando's,43.661728,-79.386391,52d884c5498ecf5c7cafe5ab,Portuguese Restaurant
4,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Mercatto,43.660391,-79.387664,4a8355bff964a520d3fa1fe3,Italian Restaurant
...,...,...,...,...,...,...,...,...
399,Church and Wellesley,43.665860,-79.383160,Kokoni Izakaya,43.664181,-79.380258,4c531b60a724e21e029e3af4,Japanese Restaurant
400,Church and Wellesley,43.665860,-79.383160,Asahi Sushi,43.669874,-79.382943,4af0b965f964a52094de21e3,Sushi Restaurant
401,Church and Wellesley,43.665860,-79.383160,A&W,43.666415,-79.378235,590d735f6eda0206a58dbfd5,Fast Food Restaurant
402,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,The Green Wood,43.664728,-79.324117,58659c5703e29a1f502e034c,Restaurant
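One detail worth checking in the filter above: `str.contains` is case-sensitive by default, so lowercase keywords such as `taco` or `pizza` do not match capitalized category names such as 'Taco Place' or 'Pizza Place'. The sketch below shows the difference on a hypothetical series. Passing `case=False` to the real filter would change the set of venues and therefore the counts reported in this notebook.

```python
import pandas as pd

# Hypothetical category names in the capitalization Foursquare uses.
categories = pd.Series(["Pizza Place", "Taco Place", "Sushi Restaurant", "Park"])
pattern = "taco|pizza|Restaurant"

# Default behavior: the lowercase keywords miss the capitalized names.
case_sensitive = categories[categories.str.contains(pattern, regex=True)]

# case=False matches regardless of capitalization.
case_insensitive = categories[categories.str.contains(pattern, case=False, regex=True)]

print(case_sensitive.tolist())
print(case_insensitive.tolist())
```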


In [22]:
toronto_venues_cat['Venue Category'].unique()

array(['Restaurant', 'Mexican Restaurant', 'French Restaurant',
       'Portuguese Restaurant', 'Italian Restaurant', 'Sushi Restaurant',
       'Fried Chicken Joint', 'Chinese Restaurant', 'Thai Restaurant',
       'Burger Joint', 'Ramen Restaurant', 'New American Restaurant',
       'Japanese Restaurant', 'Fast Food Restaurant',
       'Modern European Restaurant', 'Seafood Restaurant',
       'Middle Eastern Restaurant', 'Ethiopian Restaurant',
       'American Restaurant', 'BBQ Joint', 'Latin American Restaurant',
       'Vegetarian / Vegan Restaurant', 'German Restaurant',
       'Comfort Food Restaurant', 'Asian Restaurant',
       'Moroccan Restaurant', 'Belgian Restaurant', 'Greek Restaurant',
       'Eastern European Restaurant', 'Falafel Restaurant',
       'Indian Restaurant', 'Korean Restaurant', 'Colombian Restaurant',
       'Mediterranean Restaurant', 'Brazilian Restaurant',
       'Gluten-free Restaurant', 'Vietnamese Restaurant',
       'Cuban Restaurant', 'Malay Resta

Group the categories of restaurants to perform analysis

In [49]:
# Group the types of restaurants into cuisines so that the analysis generates better results

euro = ['French Restaurant','Swiss Restaurant', 'Czech Restaurant','Austrian Restaurant',  'Belgian Restaurant','German Restaurant',
        'Eastern European Restaurant','Scandinavian Restaurant', 'Souvlaki Shop', 'Molecular Gastronomy Restaurant', 
        'Modern European Restaurant','Italian Restaurant', 'Portuguese Restaurant',  'Greek Restaurant']

middle_eastern = ['Persian Restaurant','Israeli Restaurant', 'Kosher Restaurant','Jewish Restaurant',
                    'Lebanese Restaurant',  'Falafel Restaurant','Moroccan Restaurant',
                    'Mediterranean Restaurant','Kebab Restaurant', 'Turkish Restaurant', 'Middle Eastern Restaurant']


# named 'latino' to match the lookup in conditions() below
latino = ['Mexican Restaurant','Venezuelan Restaurant','Argentinian Restaurant', 'Arepa Restaurant', 'Empanada Restaurant','South American Restaurant',
          'Paella Restaurant', 'Peruvian Restaurant','Tapas Restaurant', 'Spanish Restaurant','Caribbean Restaurant','Cuban Restaurant',
          'Latin American Restaurant', 'Brazilian Restaurant','Colombian Restaurant',  'Tex-Mex Restaurant']

asian = ['Ramen Restaurant','Soba Restaurant','Japanese Restaurant','Szechuan Restaurant','Himalayan Restaurant', 'Tibetan Restaurant',
         'South Indian Restaurant',   'North Indian Restaurant','Cantonese Restaurant', 'Shanghai Restaurant','Hotpot Restaurant',
         'Asian Restaurant','Malay Restaurant',  'Afghan Restaurant','Sushi Restaurant', 'Vietnamese Restaurant','Thai Restaurant', 
         'Poke Place', 'Sri Lankan Restaurant','Indian Restaurant', 
         'Japanese Curry Restaurant', 'Dumpling Restaurant',
         'Indonesian Restaurant', 'Udon Restaurant','Taiwanese Restaurant','Korean Restaurant', 'Noodle House',
         'Filipino Restaurant', 'Dim Sum Restaurant','Chinese Restaurant',
         'Yoshoku Restaurant']

casual = [ 'Doner Restaurant','Sandwich Place', 'Food Truck',
          'Frozen Yogurt Shop', 'Deli / Bodega', 'Dessert Shop',
          'Hot Dog Joint', 'Burger Joint', 'Breakfast Spot', 'Fondue Restaurant', 'Fast Food Restaurant','Pizza Place', 'Taco Place','Fried Chicken Joint']

american = ['Southern / Soul Food Restaurant','Theme Restaurant','Comfort Food Restaurant',  'Food & Drink Shop', 
            'Restaurant', 'American Restaurant', 'BBQ Joint', 'Theme Restaurant', 'New American Restaurant',
            'Vegetarian / Vegan Restaurant', 'Restaurant','Gluten-free Restaurant','Hawaiian Restaurant','Seafood Restaurant', 'Cajun / Creole Restaurant']

other = ['African Restaurant','Australian Restaurant', 'Ethiopian Restaurant', 'Russian Restaurant']

def conditions(m):
    if m['Venue Category'] in euro:
        return 'euro'
    if m['Venue Category'] in middle_eastern:
        return 'middle_eastern'
    if m['Venue Category'] in latino:
        return 'latino'
    if m['Venue Category'] in asian:
        return 'asian'
    if m['Venue Category'] in casual:
        return 'casual'
    if m['Venue Category'] in american:
        return 'american'
    if m['Venue Category'] in other:
        return 'other'



toronto_venues_cat['categories_class']=toronto_venues_cat.apply(conditions, axis=1)
toronto_venues_cat

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,id,Venue Category,categories_class
0,"Regent Park, Harbourfront",43.654260,-79.360636,Impact Kitchen,43.656369,-79.356980,5612b1cc498e3dd742af0dc8,Restaurant,american
1,"Regent Park, Harbourfront",43.654260,-79.360636,El Catrin,43.650601,-79.358920,51ddecee498e1ffd34185d2f,Mexican Restaurant,latino
2,"Regent Park, Harbourfront",43.654260,-79.360636,Cluny Bistro & Boulangerie,43.650565,-79.357843,53a22c92498ec91fda7ce133,French Restaurant,euro
3,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Nando's,43.661728,-79.386391,52d884c5498ecf5c7cafe5ab,Portuguese Restaurant,euro
4,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Mercatto,43.660391,-79.387664,4a8355bff964a520d3fa1fe3,Italian Restaurant,euro
...,...,...,...,...,...,...,...,...,...
399,Church and Wellesley,43.665860,-79.383160,Kokoni Izakaya,43.664181,-79.380258,4c531b60a724e21e029e3af4,Japanese Restaurant,asian
400,Church and Wellesley,43.665860,-79.383160,Asahi Sushi,43.669874,-79.382943,4af0b965f964a52094de21e3,Sushi Restaurant,asian
401,Church and Wellesley,43.665860,-79.383160,A&W,43.666415,-79.378235,590d735f6eda0206a58dbfd5,Fast Food Restaurant,casual
402,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,The Green Wood,43.664728,-79.324117,58659c5703e29a1f502e034c,Restaurant,american


#### Ensuring that all the restaurants and diners have been assigned to a pre-defined category

In [50]:
toronto_venues_cat['categories_class'].unique()

array(['american', 'latino', 'euro', 'asian', 'casual', 'middle_eastern',
       'other'], dtype=object)
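The chain of `if` statements used for the grouping can also be expressed as a single dictionary lookup, which makes it easy to list any category that was not assigned to a group. The sketch below uses shortened, hypothetical group lists and an invented 'Zoo' category. It illustrates the idea and is not a drop-in replacement for the lists above.

```python
import pandas as pd

# Shortened, hypothetical group lists for illustration.
groups = {
    "euro": ["French Restaurant", "Italian Restaurant"],
    "asian": ["Sushi Restaurant", "Thai Restaurant"],
    "casual": ["Burger Joint", "Pizza Place"],
}

# Invert to category -> group. setdefault keeps the first group seen,
# mirroring the order of the if-chain when a name appears in two lists.
category_to_group = {}
for group, names in groups.items():
    for category in names:
        category_to_group.setdefault(category, group)

venues = pd.DataFrame(
    {"Venue Category": ["Italian Restaurant", "Pizza Place", "Zoo"]}
)
venues["categories_class"] = venues["Venue Category"].map(category_to_group)

# Categories without a group show up as NaN and can be listed explicitly.
unmapped = venues.loc[venues["categories_class"].isna(), "Venue Category"].tolist()
print(unmapped)
```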

#### New York Data

Download New York data

In [25]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Next, let's load the data.

In [26]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

#### Transform the data into a *pandas* dataframe
Notice how all the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [27]:
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [28]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [29]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [30]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's get the geographical coordinates of Manhattan.

In [31]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


Let's visualize neighborhoods of Manhattan.

In [32]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Now, let's get the top 100 venues that are in Manhattan within a radius of 500 meters.

In [33]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=P1U1IXAMY3VXBUF1PWZHLYHVW23FCG1DMDLPNZOQLVZHGTPI&client_secret=AKZERJCTMQKIDUT4ONFFSMBLWBVYTAVX2CWUV4OSW2TIB4XK&v=20180605&ll=40.7896239,-73.9598939&radius=500&limit=100'

Send the GET request and examine the results

In [34]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f5d90c361e1700286a00643'},
 'response': {'headerLocation': 'Central Park',
  'headerFullLocation': 'Central Park, New York',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 30,
  'suggestedBounds': {'ne': {'lat': 40.794123904500005,
    'lng': -73.95396136384342},
   'sw': {'lat': 40.7851238955, 'lng': -73.96582643615658}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4a78425df964a52053e51fe3',
       'name': 'Central Park Tennis Center',
       'location': {'address': 'Central Park West at 96th St',
        'lat': 40.78931319964619,
        'lng': -73.96186241658044,
        'labeledLatLngs': [{'label': 'display',
          'lat': 40.78931319964619,
          'lng': -73.96186241658044}],
        'distance': 169,


In [35]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
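To see what the function returns, here is a small sketch that calls it on three hand-made rows. The rows are hypothetical stand-ins for the flattened Foursquare records (plain dictionaries instead of DataFrame rows), and the function is repeated so that the example runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical rows for illustration only
flat_row = {'venue.categories': [{'name': 'Tennis Court'}]}
plain_row = {'categories': [{'name': 'Park'}, {'name': 'Field'}]}
empty_row = {'categories': []}

print(get_category_type(flat_row))   # Tennis Court
print(get_category_type(plain_row))  # Park (only the first category is used)
print(get_category_type(empty_row))  # None
```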

Now we are ready to clean the json and structure it into a pandas dataframe.

In [36]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.id']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng,id
0,Central Park Tennis Center,Tennis Court,40.789313,-73.961862,4a78425df964a52053e51fe3
1,North Meadow,Park,40.792027,-73.959853,4a5a4eb2f964a52021ba1fe3
2,East Meadow,Field,40.79016,-73.955498,4ba233dbf964a5206fe337e3
3,Central Park - North Meadow Recreation Center,Playground,40.790939,-73.960304,4bc27efd461576b047917d32
4,Central Park - Woodman's Gate,Park,40.787786,-73.955924,4c841c2ed8086dcb246f8652


Let's create a function that repeats the same process for all the neighborhoods in Manhattan.

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['id'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'id',
                             'Venue Category']

    return nearby_venues
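The function above assumes that every response contains at least one group and that every venue has at least one category. As a minimal sketch under those stated concerns, the parsing step could be isolated in a helper that tolerates both gaps. The name `extract_venue_rows` and the second venue in the sample payload are hypothetical; the first venue reuses values from the response shown earlier.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one 'explore' response into row tuples for a neighborhood (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['id'],
                     category))
    return rows


# Sample payload for illustration: the second venue is made up
sample_payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'id': '4a78425df964a52053e51fe3',
                           'name': 'Central Park Tennis Center',
                           'location': {'lat': 40.78931319964619, 'lng': -73.96186241658044},
                           'categories': [{'name': 'Tennis Court'}]}},
                {'venue': {'id': 'hypothetical-id',
                           'name': 'Uncategorized Venue',
                           'location': {'lat': 40.79, 'lng': -73.96},
                           'categories': []}},
            ]
        }]
    }
}

rows = extract_venue_rows('Central Park', 40.7896239, -73.9598939, sample_payload)
for row in rows:
    print(row)
```

Separating the parsing from the HTTP call also makes it possible to check the logic on a saved response without spending API quota.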

Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues.

In [38]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

manhattan_venues.head()

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,id,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,4b4429abf964a52037f225e3,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,4baf59e8f964a520a6f93be3,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,4b79cc46f964a520c5122fe3,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,55f81cd2498ee903149fcc64,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,4b5357adf964a520319827e3,Donut Shop


Let's keep only the venues whose category indicates a restaurant or another place to eat.

In [39]:
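# keep food-related venues only (str.contains is case-sensitive, so lowercase terms such as 'pizza' do not match 'Pizza Place')
manhattan_venues_cat = manhattan_venues[manhattan_venues['Venue Category'].str.contains('taco|pizza|Restaurant|sandwich|steakhouse|salad|Burger|breakfast|bistro|BBQ|Hot Dog|Fried Chicken',regex=True)].copy()
manhattan_venues_cat.reset_index(drop=True, inplace=True)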
manhattan_venues_cat

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,id,Venue Category
0,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,4b9c9c6af964a520b27236e3,Seafood Restaurant
1,Chinatown,40.715618,-73.994279,Spicy Village,40.717010,-73.993530,4db3374590a0843f295fb69b,Chinese Restaurant
2,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,5521c2ff498ebe2368634187,Greek Restaurant
3,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,4a96bf8ff964a520ce2620e3,Chinese Restaurant
4,Chinatown,40.715618,-73.994279,Xi'an Famous Foods,40.715232,-73.997263,5894c9a15e56b417cf79e553,Chinese Restaurant
...,...,...,...,...,...,...,...,...
943,Hudson Yards,40.756658,-74.000111,Spanish Diner,40.752394,-74.001491,5c98f037db1d81002ca94beb,Spanish Restaurant
944,Hudson Yards,40.756658,-74.000111,Il Punto Ristorante,40.756079,-73.994594,4abad7b8f964a520208320e3,Italian Restaurant
945,Hudson Yards,40.756658,-74.000111,Thai Select,40.754867,-73.995007,49bc064ff964a5200a541fe3,Thai Restaurant
946,Hudson Yards,40.756658,-74.000111,Treadwell,40.759964,-73.996284,5bb17b9531ac6c0039f150cf,Restaurant
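The filter in this step relies on `str.contains`, which matches case-sensitively by default. The sketch below uses Python's `re` module, on which the pandas method is based, to show how the same pattern treats a few category names taken from this notebook. It is an illustration only; in pandas, passing `case=False` to `str.contains` should have the corresponding effect.

```python
import re

pattern = ('taco|pizza|Restaurant|sandwich|steakhouse|salad|Burger|'
           'breakfast|bistro|BBQ|Hot Dog|Fried Chicken')

categories = ['Pizza Place', 'Taco Place', 'Burger Joint',
              'Chinese Restaurant', 'Yoga Studio']

# Case-sensitive search: lowercase terms only match lowercase text
case_sensitive = [c for c in categories if re.search(pattern, c)]

# Case-insensitive search: 'pizza' now matches 'Pizza Place' as well
case_insensitive = [c for c in categories
                    if re.search(pattern, c, flags=re.IGNORECASE)]

print(case_sensitive)    # ['Burger Joint', 'Chinese Restaurant']
print(case_insensitive)  # ['Pizza Place', 'Taco Place', 'Burger Joint', 'Chinese Restaurant']
```

Whether the pizza and taco places should be included is a modelling choice; the point is only that the lowercase terms in the pattern have no effect unless the match ignores case.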


In [40]:
manhattan_venues_cat['Venue Category'].unique()

array(['Seafood Restaurant', 'Chinese Restaurant', 'Greek Restaurant',
       'American Restaurant', 'New American Restaurant',
       'Hotpot Restaurant', 'Spanish Restaurant', 'Asian Restaurant',
       'Thai Restaurant', 'Malay Restaurant', 'Italian Restaurant',
       'Vietnamese Restaurant', 'Mexican Restaurant',
       'Taiwanese Restaurant', 'Dim Sum Restaurant',
       'Shanghai Restaurant', 'Austrian Restaurant',
       'Vegetarian / Vegan Restaurant', 'Dumpling Restaurant',
       'Cantonese Restaurant', 'Restaurant', 'Ramen Restaurant',
       'Burger Joint', 'Tapas Restaurant', 'Indian Restaurant',
       'Latin American Restaurant', 'Caribbean Restaurant',
       'Sushi Restaurant', 'Arepa Restaurant', 'Empanada Restaurant',
       'Fast Food Restaurant', 'Japanese Restaurant',
       'Mediterranean Restaurant', 'BBQ Joint',
       'Japanese Curry Restaurant', 'Falafel Restaurant',
       'Cuban Restaurant', 'French Restaurant', 'African Restaurant',
       'Ethiopian Rest

Group the categories of restaurants to perform analysis

In [51]:
# Group the types of restaurants into cuisines so that the analysis generates better results

euro = ['French Restaurant','Swiss Restaurant', 'Czech Restaurant','Austrian Restaurant',  'Belgian Restaurant','German Restaurant',
        'Eastern European Restaurant','Scandinavian Restaurant', 'Souvlaki Shop', 'Molecular Gastronomy Restaurant', 
        'Modern European Restaurant','Italian Restaurant', 'Portuguese Restaurant',  'Greek Restaurant']

middle_eastern = ['Persian Restaurant','Israeli Restaurant', 'Kosher Restaurant','Jewish Restaurant',
                    'Lebanese Restaurant',  'Falafel Restaurant','Moroccan Restaurant',
                    'Mediterranean Restaurant','Kebab Restaurant', 'Turkish Restaurant', 'Middle Eastern Restaurant']


latino = ['Mexican Restaurant','Venezuelan Restaurant','Argentinian Restaurant', 'Arepa Restaurant', 'Empanada Restaurant','South American Restaurant',
          'Paella Restaurant', 'Peruvian Restaurant','Tapas Restaurant', 'Spanish Restaurant','Caribbean Restaurant','Cuban Restaurant',
          'Latin American Restaurant', 'Brazilian Restaurant','Colombian Restaurant',  'Tex-Mex Restaurant']

asian = ['Ramen Restaurant','Soba Restaurant','Japanese Restaurant','Szechuan Restaurant','Himalayan Restaurant', 'Tibetan Restaurant','South Indian Restaurant',
         'North Indian Restaurant','Cantonese Restaurant', 'Shanghai Restaurant','Hotpot Restaurant','Asian Restaurant','Malay Restaurant',  
         'Afghan Restaurant','Sushi Restaurant', 'Vietnamese Restaurant',
         'Thai Restaurant', 'Poke Place', 'Sri Lankan Restaurant','Indian Restaurant', 
         'Japanese Curry Restaurant', 'Dumpling Restaurant',
         'Indonesian Restaurant', 'Udon Restaurant','Taiwanese Restaurant','Korean Restaurant', 'Noodle House',
         'Filipino Restaurant', 'Dim Sum Restaurant','Chinese Restaurant',
         'Yoshoku Restaurant']

casual = [ 'Doner Restaurant','Sandwich Place', 'Food Truck',
          'Frozen Yogurt Shop', 'Deli / Bodega', 'Dessert Shop',
          'Hot Dog Joint', 'Burger Joint', 'Breakfast Spot', 'Fondue Restaurant', 'Fast Food Restaurant','Pizza Place', 'Taco Place','Fried Chicken Joint']

american = ['Southern / Soul Food Restaurant','Theme Restaurant','Comfort Food Restaurant',  'Food & Drink Shop', 
            'Restaurant', 'American Restaurant', 'BBQ Joint', 'New American Restaurant',
            'Vegetarian / Vegan Restaurant','Gluten-free Restaurant','Hawaiian Restaurant','Seafood Restaurant', 'Cajun / Creole Restaurant']

other = ['African Restaurant','Australian Restaurant', 'Ethiopian Restaurant', 'Russian Restaurant']

def conditions(m):
    if m['Venue Category'] in euro:
        return 'euro'
    if m['Venue Category'] in middle_eastern:
        return 'middle_eastern'
    if m['Venue Category'] in latino:
        return 'latino'
    if m['Venue Category'] in asian:
        return 'asian'
    if m['Venue Category'] in casual:
        return 'casual'
    if m['Venue Category'] in american:
        return 'american'
    if m['Venue Category'] in other:
        return 'other'


manhattan_venues_cat['categories_class']=manhattan_venues_cat.apply(conditions, axis=1)
manhattan_venues_cat



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,id,Venue Category,categories_class
0,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,4b9c9c6af964a520b27236e3,Seafood Restaurant,american
1,Chinatown,40.715618,-73.994279,Spicy Village,40.717010,-73.993530,4db3374590a0843f295fb69b,Chinese Restaurant,asian
2,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,5521c2ff498ebe2368634187,Greek Restaurant,euro
3,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,4a96bf8ff964a520ce2620e3,Chinese Restaurant,asian
4,Chinatown,40.715618,-73.994279,Xi'an Famous Foods,40.715232,-73.997263,5894c9a15e56b417cf79e553,Chinese Restaurant,asian
...,...,...,...,...,...,...,...,...,...
943,Hudson Yards,40.756658,-74.000111,Spanish Diner,40.752394,-74.001491,5c98f037db1d81002ca94beb,Spanish Restaurant,latino
944,Hudson Yards,40.756658,-74.000111,Il Punto Ristorante,40.756079,-73.994594,4abad7b8f964a520208320e3,Italian Restaurant,euro
945,Hudson Yards,40.756658,-74.000111,Thai Select,40.754867,-73.995007,49bc064ff964a5200a541fe3,Thai Restaurant,asian
946,Hudson Yards,40.756658,-74.000111,Treadwell,40.759964,-73.996284,5bb17b9531ac6c0039f150cf,Restaurant,american
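The chain of `if` statements in `conditions` can also be expressed as a single lookup table. The sketch below is one possible way to do that, using abbreviated group lists for illustration (the full lists are the ones defined in the cell above). `cuisine_groups` and `category_to_class` are hypothetical names, and 'Falafel Restaurant' is deliberately repeated in two groups to show how a category that appears twice is resolved.

```python
# Abbreviated groups for illustration only
cuisine_groups = {
    'euro': ['French Restaurant', 'Italian Restaurant', 'Greek Restaurant'],
    'middle_eastern': ['Falafel Restaurant', 'Mediterranean Restaurant'],
    'latino': ['Mexican Restaurant', 'Spanish Restaurant'],
    'asian': ['Chinese Restaurant', 'Thai Restaurant', 'Falafel Restaurant'],
    'casual': ['Pizza Place', 'Burger Joint'],
    'american': ['Restaurant', 'Seafood Restaurant'],
    'other': ['Ethiopian Restaurant'],
}

# Invert the groups into a category -> class lookup
category_to_class = {}
for cuisine, categories in cuisine_groups.items():
    for category in categories:
        # setdefault keeps the first group that claims a category,
        # mirroring the order in which the if-chain checks the lists
        category_to_class.setdefault(category, cuisine)

for category in ['Seafood Restaurant', 'Spanish Restaurant', 'Falafel Restaurant']:
    print(category, '->', category_to_class[category])
```

With pandas, such a dictionary could be applied in one step with `manhattan_venues_cat['Venue Category'].map(category_to_class)`, which leaves `NaN` for any category that is missing from the lookup.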


#### Ensuring that all the restaurants and diners have been assigned to a pre-defined category

In [52]:
manhattan_venues_cat['categories_class'].unique()

array(['american', 'asian', 'euro', 'latino', 'casual', 'middle_eastern',
       'other'], dtype=object)

In [48]:
# list the venue categories left without a class, if any
v1 = manhattan_venues_cat[manhattan_venues_cat['categories_class'].isnull()]
v1['Venue Category'].unique()

array([], dtype=object)
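The empty array above means that no category was left without a class. The same check can be sketched without pandas, which may help when the lists are extended for another city. The helper `find_unassigned`, the small lookup, and the 'Ukrainian Restaurant' entry are hypothetical and serve only as an illustration.

```python
def find_unassigned(venue_categories, category_to_class):
    """Return the sorted, de-duplicated categories that have no cuisine class."""
    return sorted({c for c in venue_categories if c not in category_to_class})


# Small lookup and observations for illustration only
category_to_class = {
    'Chinese Restaurant': 'asian',
    'Greek Restaurant': 'euro',
    'Restaurant': 'american',
}
observed = ['Chinese Restaurant', 'Greek Restaurant', 'Restaurant', 'Chinese Restaurant']

print(find_unassigned(observed, category_to_class))
# []
print(find_unassigned(observed + ['Ukrainian Restaurant'], category_to_class))
# ['Ukrainian Restaurant']
```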