## The Battle of the Neighborhood

### Problem Background

##### Since 2000, China has been New York’s leading growth market for exports. The New York Metropolitan Region is home to more than half of the 32 largest Chinese companies with offices in the United States. These companies represent a broad array of industries including shipping, steel, energy and manufacturing firms, and services. Many have chosen to open headquarters in New York in anticipation of eventual listing on the respective New York stock exchanges and entering U.S. capital markets. New York City currently boasts seven Chinese daily newspapers, two Chinese language television stations, and the largest Chinese neighborhood in the United States. New York area airports provide 12 daily flights to Hong Kong and five to Beijing, the most flights out of the eastern half of the United States. 

### Problem Description

##### This competitive market pushes business owners to think strategically before opening a store. In this case, a Chinese business owner wishes to open a store selling high-end products that cater specifically to Chinese immigrants. However, most of New York City's Chinatown neighborhoods are populated by middle-class or lower-income immigrants. To ensure the products sell, it is crucial to select a location that offers the best return on investment (ROI).



## Solution Methodology

In [None]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

# library for displaying images
from IPython.display import Image

# function that flattens nested JSON into a pandas dataframe
# (top-level import for pandas >= 1.0; pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print ('Folium installed')
print ('Libraries imported')



In [None]:
address = 'New York City'

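# recent geopy releases require a user_agent; "ny_explorer" is an arbitrary placeholder name
geolocator = Nominatim(user_agent="ny_explorer")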
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

In [None]:
search_query = '"Chinese"'
radius = 10000
LIMIT=100


In [None]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
print(url)
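
`CLIENT_ID`, `CLIENT_SECRET` and `VERSION` are used in the cell above but are not defined anywhere in this notebook. The following is a minimal sketch of one way to supply them, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION`, the helper `build_search_url`, and the fallback version date are all hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; adjust to however the credentials are stored.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
# The v parameter is a date in YYYYMMDD form; this fallback is only a placeholder.
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")


def build_search_url(client_id, client_secret, version, lat, lng,
                     query, radius=10000, limit=100):
    """Build the venue search URL with properly escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Example call with made-up coordinates and dummy credentials.
example_url = build_search_url("my_id", "my_secret", VERSION, 40.7, -74.0, "Chinese")
print(example_url)
```

Letting `urlencode` assemble the query string avoids problems with spaces or quotes in the search term, and keeping the credentials out of the notebook makes it safer to share.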

In [None]:
# Get the json output for the search query "Chinese"
result = requests.get(url).json()  
result

In [None]:
# assigning relevant part of JSON to venues
venues = result['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

In [None]:
# Obtaining only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so later column assignments do not warn

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered
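
To make the category step easier to follow, here is a small sketch of the same "take the first category name" logic applied to plain Python dictionaries instead of dataframe rows. The helper `first_category_name` and the two sample venue records are hypothetical and only mimic the shape of the entries in `result['response']['venues']`.

```python
def first_category_name(categories):
    """Return the name of the first category, or None when the list is empty."""
    if not categories:
        return None
    return categories[0]["name"]


# Hypothetical records; real API responses carry many more fields.
sample_venues = [
    {"name": "Venue A",
     "categories": [{"name": "Chinese Restaurant"}, {"name": "Dim Sum Restaurant"}]},
    {"name": "Venue B", "categories": []},
]

labels = [first_category_name(venue["categories"]) for venue in sample_venues]
print(labels)
```

Only the first listed category is kept, so a venue tagged with several categories is reduced to a single label, and a venue with no categories gets `None`.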

In [None]:
dataframe_filtered.describe()


In [None]:
dataframe_cleaned = dataframe_filtered[dataframe_filtered['address'].notnull()]  # drop records with no address
dataframe_ny = dataframe_cleaned[dataframe_cleaned.state == 'NY']   # keep only venues in New York State
df_withpostcode = dataframe_ny[dataframe_ny['postalCode'].notnull()]  # drop records with no postal code
df_withpostcode

In [None]:
# define the dataframe columns
column_names = ['postalcode', 'Latitude', 'Longitude'] 

# instantiate the dataframe
df_postcode = pd.DataFrame(columns=column_names)
df_postcode

In [None]:
#Filling data for each row
df_postcode = df_withpostcode[['postalCode','lat','lng']]
df_postcode.head()

In [None]:
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))


In [None]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=13)

# add one marker per venue, labelled with its postal code
for lat, lng, postalCode in zip(df_postcode['lat'], df_postcode['lng'], df_postcode['postalCode']):
    label = '{}'.format(postalCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [None]:
#Filling data for each row
df_postcode = df_withpostcode[['postalCode','lat','lng']]
df_postcode.head()

In [None]:
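# keep only the first three digits of each ZIP code (assign returns a new dataframe, avoiding SettingWithCopyWarning)
df_postcode = df_postcode.assign(postalCode=df_postcode['postalCode'].str[:3])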
df_postcode.head()

In [None]:
# count the venues that share each three-digit ZIP prefix
df_postcode = df_postcode.groupby('postalCode').count()
df_postcode
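
Note that `groupby('postalCode').count()` reports the number of non-null values in each remaining column, so `lat` and `lng` show the same tally whenever no coordinates are missing. As a cross-check, the same per-prefix count can be sketched with the standard library alone. The ZIP codes below are illustrative sample values, not results from the Foursquare query.

```python
from collections import Counter

# Illustrative ZIP codes only; None stands in for a venue without a postal code.
sample_zip_codes = ["10013", "10002", "10038", "11354", "11355", None]

# Truncate each code to its first three digits and tally the prefixes.
prefix_counts = Counter(code[:3] for code in sample_zip_codes if code)

for prefix, count in prefix_counts.most_common():
    print(prefix, count)
```

Prefixes with the highest counts mark the areas where Chinese-related venues are most concentrated in the sample.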