# The Battle of Neighborhoods (Week 1)

## ➤ Introduction / Business Problem

Lets say, some investor is looking to open a new restaurant in Toronto, Canada. Where should he open it? What type of restaurant should it be? Why?

## ➤ Data description (with data samples examples)

Will use:

- Dataframe of the neighborhoods in Toronto (from wikipedia)

<strong> This data is needed for getting neibourhoods and their geographical locations</strong>

- Foursquare location data

<strong> This data is needed for analysing and clustering venues in neibourhoods</strong>

In [3]:
# example of Dataframe of the neighborhoods in Toronto (from wikipedia) 

import bs4 as bs
import urllib.request

import pandas as pd
import numpy as np

source = urllib.request.urlopen('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').read()
soup = bs.BeautifulSoup(source,'lxml')

table = soup.find('table')
table_rows = table.find_all('tr')

list_rows = []

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    list_rows.append(row)
#   print(row)
    

df = pd.DataFrame(list_rows, columns = ['PostalCode','Borough','Neighborhood'])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,,,
1,M1A,Not assigned,Not assigned\n
2,M2A,Not assigned,Not assigned\n
3,M3A,North York,Parkwoods\n
4,M4A,North York,Victoria Village\n


In [4]:
# example of Foursquare location data

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

#print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

In [7]:
CLIENT_ID = '21EFHPTJZVPQQ4I5DOI0OVK144XB5HVY11HBKITGP1QLRYZV' # Foursquare ID
CLIENT_SECRET = 'GZ5ECR3ECO3WU4DI0L0M0ASQBFIBNP0JNVCO55V2DGHAYZML' # Foursquare Secret
VERSION = '20191228' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 800 # defining radius
neighborhood_latitude = 43.653963  # neighborhood latitude value
neighborhood_longitude = -79.387207 # neighborhood longitude value

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json() #getting venues data from Foursquare


# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
 

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Cafe Plenty,Café,43.654571,-79.38945
2,Japango,Sushi Restaurant,43.655268,-79.385165
3,Rolltation,Japanese Restaurant,43.654918,-79.387424
4,Sansotei Ramen 三草亭,Ramen Restaurant,43.655157,-79.386501
