# The Battle of Neighborhoods: Toronto vs. New York

## 1. Introduction/Business Problem
In this project, I will compare Toronto and New York.
Although data science has many use cases, this project is limited to a business case. The Foursquare API provides a lot of information about the venues in a given area, and we can use that information for business projects.
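To make the API step concrete, the sketch below only builds a request URL for Foursquare's legacy v2 `venues/explore` endpoint, which returns recommended venues around a latitude/longitude within a radius. The credentials, the version date `20180605`, and the helper name `explore_url` are placeholders and assumptions; the v2 API may no longer accept new keys, and no request is actually sent.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumption): replace with your own Foursquare keys.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # API version date; an assumed value


def explore_url(lat, lng, radius=500, limit=100):
    """Build a v2 venues/explore URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",  # urlencode escapes the comma as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Example coordinates in Scarborough, Toronto (Rouge/Malvern in the table below)
url = explore_url(43.806686, -79.194353)
print(url)
```

The JSON response could then be fetched with `requests.get(url).json()` once real credentials are in place.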

First, we define the audience: this project is meant to help people who want to open a shop or cafe.
We can use the data to check whether that decision is sound. The following questions guide the analysis:
1. Will a cafe be popular in both cities, or is it better to open one in Toronto or in New York?
2. In which neighborhood should we open the cafe?
3. Will a second cafe work in a neighborhood that already has one?
4. Would another type of shop be a better choice?
5. Are the cafes located near other venues?
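Question 5 needs a working definition of "near". One common choice, assumed here rather than taken from the original analysis, is great-circle (haversine) distance compared against a fixed radius. The 500 m default and the helper names `haversine_m` and `is_near` are hypothetical; the example points are the Woburn and Cedarbrae coordinates from the Toronto table further down.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_near(cafe, venue, radius_m=500):
    """True if `venue` lies within `radius_m` metres of `cafe`; both are (lat, lon) tuples."""
    return haversine_m(*cafe, *venue) <= radius_m


woburn = (43.770992, -79.216917)
cedarbrae = (43.773136, -79.239476)
d = haversine_m(*woburn, *cedarbrae)
print(round(d), is_near(woburn, cedarbrae))
```

The two neighborhood centroids are roughly 1.8 km apart, so with a 500 m radius they would not count as "near".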

## 2. Data
To support the analysis, we need some data.
From previous notebooks and projects, we already have neighborhood data for Toronto and New York.
Using the Foursquare API, we can extract the most popular venues in each neighborhood within a certain radius.
This lets us check whether a neighborhood already has a cafe, and whether a cafe is likely to work well or a restaurant would work better.
To answer this, we need to compare how many cafes are available in both cities, and how venues are distributed within each neighborhood.
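The comparison described above can be sketched with pandas, assuming the Foursquare results have been collected into one DataFrame with hypothetical `City`, `Neighborhood` and `Venue Category` columns. The rows below are made-up placeholders, not real Foursquare data, and the `"Café"` category label is an assumption.

```python
import pandas as pd

# Hypothetical venue rows standing in for Foursquare results (illustrative only).
venues = pd.DataFrame({
    "City": ["Toronto", "Toronto", "Toronto",
             "New York", "New York", "New York", "New York"],
    "Neighborhood": ["Woburn", "Woburn", "Cedarbrae",
                     "Wakefield", "Wakefield", "Co-op City", "Co-op City"],
    "Venue Category": ["Café", "Pizza Place", "Restaurant",
                       "Café", "Bakery", "Café", "Café"],
})

is_cafe = venues["Venue Category"].eq("Café")

# 1) Number of cafes per city, and the share of all venues they represent
cafes_per_city = venues[is_cafe].groupby("City").size()
cafe_share = is_cafe.groupby(venues["City"]).mean()

# 2) Count of each venue category within each (city, neighborhood)
category_counts = pd.crosstab([venues["City"], venues["Neighborhood"]],
                              venues["Venue Category"])

# 3) Neighborhoods that already have at least one cafe
has_cafe = category_counts["Café"] > 0

print(cafes_per_city, cafe_share, has_cafe, sep="\n\n")
```

Normalizing by the total number of venues (`cafe_share`) matters because the two cities return different numbers of venues per neighborhood.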

In [1]:
```python
# import libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

import json

from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

# import k-means
from sklearn.cluster import KMeans

import folium

print('All libraries imported!')
```

All libraries imported!


In [2]:
```python
# Scrape Toronto postcodes and neighborhoods from Wikipedia into a DataFrame
html_source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html_source, 'lxml')

table = soup.find('table')

# Each row's text is split on newlines; items 1-3 hold Postcode, Borough, Neighbourhood
# (this assumes the page's older one-postcode-per-row table layout)
tablelist = []
for element in table.find_all('tr'):
    tablelist.append(element.text.split('\n')[1:4])

toronto_df = pd.DataFrame(tablelist)
toronto_df.columns = toronto_df.iloc[0].values  # the first row is the header
toronto_df = toronto_df[1:]
toronto_df.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True)

# drop rows whose borough is not assigned
toronto_df.drop(toronto_df[toronto_df['Borough'] == 'Not assigned'].index, inplace=True)

# combine neighborhoods that share a postcode into one comma-separated row
toronto_df = toronto_df.groupby(['Postcode', 'Borough'])['Neighborhood'].apply(','.join).reset_index()

# load latitude/longitude for each postcode from a CSV file
geospatial_data = pd.read_csv('Geospatial_data.csv')
geospatial_data.rename(columns={'Postal Code': 'Postcode'}, inplace=True)

# and merge the coordinates into the dataframe
toronto_df = pd.merge(toronto_df, geospatial_data, on='Postcode')

# drop the Postcode column
toronto_df.drop('Postcode', axis=1, inplace=True)

toronto_df.head()
```

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
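The filter, groupby-join and merge steps in the cell above are easier to follow on a few rows. The sketch below uses toy data shaped like the scraped table; the `M9Z` row and the two-row coordinate table are hypothetical stand-ins for the real Wikipedia page and `Geospatial_data.csv`.

```python
import pandas as pd

# Toy rows in the shape produced by the scraping step (hypothetical values).
raw = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough", "Not assigned"],
    "Neighborhood": ["Rouge", "Malvern", "Highland Creek", "Not assigned"],
})
# Stand-in for Geospatial_data.csv after renaming 'Postal Code' to 'Postcode'
coords = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Drop unassigned boroughs, then join neighborhood names per postcode
df = raw[raw["Borough"] != "Not assigned"]
df = df.groupby(["Postcode", "Borough"])["Neighborhood"].apply(",".join).reset_index()

# Attach coordinates by postcode (inner join), then drop the key
df = df.merge(coords, on="Postcode").drop(columns="Postcode")
print(df)
```

Within each group, `groupby` keeps the original row order, which is why the result reads `Rouge,Malvern` like the first row of the real output.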


In [3]:
```python
# Load New York neighborhood data from a JSON (GeoJSON-style) file
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # coordinates are stored as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the DataFrame once (DataFrame.append was removed in pandas 2.0)
newyork_df = pd.DataFrame(rows, columns=column_names)
newyork_df.head()
```

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
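`json_normalize` is imported at the top of the notebook but never used. As an alternative to the explicit loop, it can flatten the features directly. This sketch assumes each feature carries the `properties.borough`, `properties.name` and `geometry.coordinates` keys that the loop reads; the two features are toy examples built from the output table above.

```python
import pandas as pd

# Two toy features in the shape assumed for newyork_data.json
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.829939, 40.874294]},
     "properties": {"name": "Co-op City", "borough": "Bronx"}},
]

flat = pd.json_normalize(features)  # nested keys become dotted column names
coords = flat["geometry.coordinates"]  # lists are kept as-is, not flattened
ny = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    # positions are [longitude, latitude], so latitude is index 1
    "Latitude": coords.apply(lambda c: c[1]),
    "Longitude": coords.apply(lambda c: c[0]),
})
print(ny)
```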


# Part 2: The Battle of Neighborhoods

In [4]:
```python
# map of Toronto
toronto_addr = 'Toronto, ON'

to_locator = Nominatim(user_agent="to_explorer")
location = to_locator.geocode(toronto_addr)
latitude = location.latitude
longitude = location.longitude

# zoom_start (not zoom_axis) sets the initial zoom; 10 is folium's default
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each neighborhood
for lat, lng, borough, name in zip(toronto_df['Latitude'], toronto_df['Longitude'],
                                   toronto_df['Borough'], toronto_df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(name, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```
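Geocoding `'Toronto, ON'` requires a network call to Nominatim. A possible alternative, not what the original cell does, is to derive the view from the neighborhood coordinates themselves: their mean as the map center, or their min/max as bounds for folium's `Map.fit_bounds`. The helper names `map_center` and `map_bounds` are hypothetical, and the sample frame reuses the five Toronto rows shown above.

```python
import pandas as pd


def map_center(df):
    """Return [lat, lon] at the mean of the neighborhood coordinates in `df`."""
    return [df["Latitude"].mean(), df["Longitude"].mean()]


def map_bounds(df):
    """Return [[south, west], [north, east]] enclosing every neighborhood in `df`."""
    return [[df["Latitude"].min(), df["Longitude"].min()],
            [df["Latitude"].max(), df["Longitude"].max()]]


# The five Toronto rows from the table above
sample = pd.DataFrame({
    "Latitude": [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
    "Longitude": [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
})
center = map_center(sample)
bounds = map_bounds(sample)
print(center, bounds)
```

For example, `folium.Map(location=map_center(toronto_df), zoom_start=10)` followed by `fit_bounds(map_bounds(toronto_df))` on that map would frame every marker without geocoding; the same helpers apply unchanged to `newyork_df`.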

In [5]:
```python
# map of New York
newyork_addr = 'New York, NY'

ny_locator = Nominatim(user_agent="ny_explorer")
location = ny_locator.geocode(newyork_addr)
latitude = location.latitude
longitude = location.longitude

# zoom_start (not zoom_axis) sets the initial zoom; 10 is folium's default
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each neighborhood
for lat, lng, borough, name in zip(newyork_df['Latitude'], newyork_df['Longitude'],
                                   newyork_df['Borough'], newyork_df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(name, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork
```