# Capstone Project - Chinese Restaurant in Vancouver
#### by Cheng Ximen
---

### Table of Contents

1. [Introduction](#1.-Introduction)
1. [Data](#2.-Data)
1. [Analysis](#3.-Analysis)
1. [Conclusion](#4.-Conclusion)
---



### 1. Introduction

**Background**: Vancouver is a large and populous city. It is also one of the cities with the largest Chinese immigrant populations. Even though food in Canada is famous for its diversity, there is still an abundant market for new immigrants to bring in regional Chinese cuisine. Starting a restaurant is a difficult task and requires analyzing potential locations first. The main obstacle is predicting whether a new Chinese restaurant will be popular. To do so, one needs to analyze the neighbourhoods and the types of food venues there.

**Problem**: In this report, we try to find an optimal location to open a new Chinese restaurant in Vancouver. Since there are many restaurants in Vancouver, we will look for:

1. Locations that are less crowded with restaurants.
1. Locations with no Chinese restaurants in the vicinity.
1. Locations with a high Chinese resident population in the neighborhood.

We will use data science tools and techniques to identify a few of the most promising neighborhoods based on the above criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best final location.

**Target audience**: People looking to open a new Chinese restaurant in the city of Vancouver will be interested in this analysis, as it will help them pinpoint specific locations where a Chinese restaurant has a promising future.

---

### 2. Data

1. We can get the Vancouver Census Local Area Profiles data at this [link](https://opendata.vancouver.ca/explore/dataset/census-local-area-profiles-2016/information/). This dataset contains numerous profile features for each neighborhood in Vancouver. We won't need all of them, only those related to our decision-making process.
1. We'll also get the Vancouver Local Area Boundary data at this [link](https://opendata.vancouver.ca/explore/dataset/local-area-boundary/information/). This dataset contains the geolocation data of the neighborhoods, which will be useful for mapping with folium.
1. We will fetch the restaurant data for each neighbourhood using the Foursquare API.
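The Foursquare step comes down to walking a nested JSON payload (`response` → `groups[0]` → `items` → `venue`) and flattening each venue into one row. The sketch below shows that flattening on a hand-made, heavily trimmed mock payload; the payload, the neighborhood name and the coordinates are hypothetical and only contain the keys this notebook reads. The `"Unknown"` fallback for venues without a category is an added assumption, not something the original code does.

```python
# Hypothetical, trimmed payload shaped like the fields the notebook reads;
# real Foursquare responses carry many more keys.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Venue A",
                               "location": {"lat": 49.251, "lng": -123.101},
                               "categories": [{"name": "Chinese Restaurant"}]}},
                    {"venue": {"name": "Venue B",
                               "location": {"lat": 49.252, "lng": -123.102},
                               "categories": [{"name": "Coffee Shop"}]}},
                ]
            }
        ]
    }
}


def flatten_items(neighborhood, lat, lng, payload):
    """Turn one explore-style payload into flat rows:
    (neighborhood, nbhd lat, nbhd lng, venue, venue lat, venue lng, category)."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # assumption: label uncategorised venues instead of raising IndexError
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


rows = flatten_items("Neighborhood X", 49.25, -123.10, sample_response)
for row in rows:
    print(row)
```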

In [221]:
# load libraries
import pandas as pd
import numpy as np
import requests
import folium
from folium import plugins
pd.set_option('display.max_rows', None)

In [72]:
# load dataset
local_df = pd.read_csv('local-area-boundary.csv', sep=';')
census_df = pd.read_csv('CensusLocalAreaProfiles2016.csv',encoding='latin-1',header=4).dropna()
# data format: split the "lat,lon" string in geo_point_2d into numeric coordinate columns
coords = local_df['geo_point_2d'].str.split(',', expand=True).astype(float)
local_df['Latitude'] = coords[0]
local_df['Longtitude'] = coords[1]
local_df.drop(columns='geo_point_2d',inplace=True)

In [83]:
# load restaurant data
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 50  # the v2 explore endpoint returns at most 50 venues per request
radius = 1500  # search radius in metres around each neighborhood centre

def getNearbyVenues(names, latitudes, longitudes):
    """Call the Foursquare explore endpoint around each neighborhood centre
    and return one row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # request parameters; requests builds and encodes the query string
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()  # stop on HTTP error responses
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'])
            for v in results)

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues

Load Venues Data

In [89]:
venues_df = getNearbyVenues(names=local_df['Name'],
                                   latitudes=local_df['Latitude'],
                                   longitudes=local_df['Longtitude']
                                  )

Dunbar-Southlands
Kerrisdale
Killarney
Kitsilano
South Cambie
Victoria-Fraserview
Kensington-Cedar Cottage
Mount Pleasant
Oakridge
Renfrew-Collingwood
Sunset
West Point Grey
Arbutus-Ridge
Downtown
Fairview
Grandview-Woodland
Hastings-Sunrise
Marpole
Riley Park
Shaughnessy
Strathcona
West End


### 3. Analysis

Since we are trying to open a Chinese restaurant, it makes sense to first check the Chinese population density of each neighborhood. Even though Chinese cuisine is now popular among people of various backgrounds, I expect the largest share of customers to still be Chinese immigrants. To estimate the Chinese population, I use the number of residents whose mother tongue is Chinese.
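The measure used below is simply the Chinese mother-tongue count divided by the total mother-tongue population of each neighborhood, which is what the `portion` column computes. A minimal sketch with made-up numbers (the area names and counts are hypothetical, not census figures):

```python
# Hypothetical counts for illustration only (not census figures).
mother_tongue_total = {"Area A": 40000, "Area B": 25000, "Area C": 60000}
chinese_mother_tongue = {"Area A": 4000, "Area B": 10000, "Area C": 15000}

# share of residents whose mother tongue is Chinese, per neighbourhood
portion = {area: chinese_mother_tongue[area] / mother_tongue_total[area]
           for area in mother_tongue_total}

for area, share in portion.items():
    print(f"{area}: {share:.1%}")
```

With these inputs the shares are 10%, 40% and 25% respectively.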

In [144]:
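# hard-coded row labels in this CSV: 222 = mother-tongue total (the denominator),
# 456 = 'Chinese languages' mother-tongue count
mother_tongue_ss = census_df.loc[222,:]
Chinese_ss = census_df.loc[456,:]
Chinese_df = pd.concat([mother_tongue_ss, Chinese_ss], axis=1)
Chinese_df = Chinese_df.iloc[1:]
# reformat data set: use the row holding the variable labels as column headers
header = Chinese_df.iloc[0].values.copy()
header[0] = 'Total'
Chinese_df.columns = header
Chinese_df = Chinese_df.iloc[1:]
Chinese_df = Chinese_df.astype('float')
# share of residents whose mother tongue is Chinese
Chinese_df['portion'] = Chinese_df.iloc[:,1]/Chinese_df.iloc[:,0]
# strip padding whitespace so the names match the GeoJSON neighborhood names
Chinese_df.index = Chinese_df.index.str.strip()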

Now let's plot the data on a map to see what it looks like. It's worth looking at both the percentage and the absolute number of Chinese speakers in each neighborhood.
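Why look at both? A small neighborhood can have a high share but few speakers, while a large one can have a moderate share but many more potential customers, so the two maps can rank areas differently. A minimal sketch with hypothetical figures:

```python
# Hypothetical figures: name -> (total population, Chinese mother-tongue population)
areas = {
    "Area A": (20000, 9000),   # small area, high share
    "Area B": (80000, 24000),  # large area, moderate share, most speakers
    "Area C": (50000, 5000),
}

# rank by percentage (share) and by absolute count
by_share = sorted(areas, key=lambda a: areas[a][1] / areas[a][0], reverse=True)
by_count = sorted(areas, key=lambda a: areas[a][1], reverse=True)

print("ranked by share:", by_share)
print("ranked by count:", by_count)
```

Here Area A leads by share (45%) while Area B leads by count (24,000 speakers).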

In [225]:
geo = r'local-area-boundary.geojson'

latitude = 49.2827
longitude = -123.1207
m = folium.Map(location=[latitude, longitude], zoom_start=11.5)

folium.Choropleth(
    geo_data=geo,
    name='choropleth',
    data=Chinese_df.iloc[0:22,:].reset_index(),
    columns=['index', 'portion'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Chinese Language Percentage'
).add_to(m)

m

In [243]:
geo = r'local-area-boundary.geojson'

latitude = 49.2827
longitude = -123.1207
m = folium.Map(location=[latitude, longitude], zoom_start=11.5)

folium.Choropleth(
    geo_data=geo,
    name='choropleth',
    data=Chinese_df.iloc[0:22,:].reset_index(),
    columns=['index', '          Chinese languages'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Chinese Language Speakers (count)'
).add_to(m)

m

Even though the percentage and absolute maps differ somewhat, it's pretty clear that the Chinese population is relatively low in the central and northern parts of Vancouver. I believe we can filter out that area in our initial analysis of Chinese population density. It's also easy to see which areas might have the largest potential demand for a Chinese restaurant. Now that we have a sense of the demand, let's also look at the supply side, because we want to avoid areas where competition is already too high. This is where the venue data we collected from Foursquare comes into the picture.
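On the supply side, the heat maps below show where restaurants cluster; the same information can be summarised as counts per neighborhood. A minimal sketch on hypothetical rows laid out like `venues_df` (in pandas this would be a filter followed by `groupby('Neighborhood').size()`):

```python
from collections import Counter

# Hypothetical rows in the same column order as venues_df:
# (Neighborhood, Nbhd Lat, Nbhd Lng, Venue, Venue Lat, Venue Lng, Venue Category)
venue_rows = [
    ("Area A", 49.20, -123.10, "V1", 49.201, -123.101, "Chinese Restaurant"),
    ("Area A", 49.20, -123.10, "V2", 49.202, -123.102, "Sushi Restaurant"),
    ("Area B", 49.25, -123.05, "V3", 49.251, -123.051, "Chinese Restaurant"),
    ("Area B", 49.25, -123.05, "V4", 49.252, -123.052, "Chinese Restaurant"),
    ("Area C", 49.28, -123.12, "V5", 49.281, -123.121, "Coffee Shop"),
]

# direct competition: exact category match, like the Chinese_Restaurant filter
chinese_supply = Counter(row[0] for row in venue_rows if row[6] == "Chinese Restaurant")
# broader competition: any category containing "Restaurant"
all_supply = Counter(row[0] for row in venue_rows if "Restaurant" in row[6])

print(chinese_supply)
print(all_supply)
```

One caveat: because each neighborhood is queried with a fixed radius around its centre, neighboring search circles may overlap, so the same venue can be counted under more than one neighborhood.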

In [246]:
# create chinese restaurant sub dataset.
Chinese_Restaurant = venues_df[venues_df['Venue Category']=='Chinese Restaurant']
Chinese_Restaurant_Arr = Chinese_Restaurant[['Venue Latitude','Venue Longitude']].to_numpy()

In [247]:
geo = r'local-area-boundary.geojson'

latitude = 49.2827
longitude = -123.1207
m = folium.Map(location=[latitude, longitude], zoom_start=11.5)

folium.Choropleth(
    geo_data=geo,
    name='choropleth',
    data=Chinese_df.iloc[0:22,:].reset_index(),
    columns=['index', 'portion'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name='Chinese Language Percentage'
).add_to(m)

plugins.HeatMap(Chinese_Restaurant_Arr, radius=15).add_to(m)
m

To be honest, the number of Chinese restaurants we collected from Foursquare is much lower than I expected. But according to the data at hand, Oakridge, Victoria-Fraserview and Renfrew-Collingwood all seem to be great candidates: they have a high density of Chinese population but not many Chinese restaurants in the region. It's also a good idea to look at the density of all restaurant categories in the Vancouver region.
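The heat map answers "are there Chinese restaurants nearby?" visually; the "no Chinese restaurants in the vicinity" criterion can also be checked numerically by counting competitors within a fixed distance of a candidate site. A minimal sketch using the haversine great-circle distance; the 500 m radius and all coordinates are hypothetical choices, not values from this analysis:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def competitors_within(candidate, restaurants, radius_m=500):
    """Count restaurant coordinates lying within radius_m metres of the candidate site."""
    lat, lng = candidate
    return sum(1 for r_lat, r_lng in restaurants
               if haversine_m(lat, lng, r_lat, r_lng) <= radius_m)


# Hypothetical coordinates, for illustration only.
candidate_site = (49.2300, -123.1200)
chinese_restaurants = [(49.2310, -123.1200), (49.2500, -123.1000)]
print(competitors_within(candidate_site, chinese_restaurants))
```

The first restaurant is about 110 m north of the candidate and counts; the second is over 2 km away and does not, so this prints `1`.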

In [249]:
# pick out restaurant category subset
all_Restaurant = venues_df[venues_df['Venue Category'].str.contains('Restaurant', na=False)]
all_Restaurant_Arr = all_Restaurant[['Venue Latitude','Venue Longitude']].to_numpy()

In [250]:
geo = r'local-area-boundary.geojson'

latitude = 49.2827
longitude = -123.1207
m = folium.Map(location=[latitude, longitude], zoom_start=11.5)

folium.Choropleth(
    geo_data=geo,
    name='choropleth',
    data=Chinese_df.iloc[0:22,:].reset_index(),
    columns=['index', 'portion'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name='Chinese Language Percentage'
).add_to(m)

plugins.HeatMap(all_Restaurant_Arr, radius=15).add_to(m)
m

It seems competition is highest in the central and northern parts of Vancouver, but interestingly that's the area with the lowest Chinese population density. I think that's good news for investors interested in opening a new Chinese restaurant, because they can reach their core customers while avoiding the areas where restaurant competition is highest.
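The trade-off between demand and competition is read off the maps here; it could also be made explicit with a simple score. A minimal sketch, assuming hypothetical inputs and hypothetical weights (neither comes from this notebook): each factor is min-max scaled to [0, 1], demand adds to the score, and both kinds of competition subtract from it.

```python
# Hypothetical per-neighbourhood inputs (not results from this notebook):
# share = Chinese mother-tongue share, chinese = Chinese restaurants, total = all restaurants
areas = {
    "Area A": {"share": 0.45, "chinese": 1, "total": 6},
    "Area B": {"share": 0.30, "chinese": 4, "total": 20},
    "Area C": {"share": 0.10, "chinese": 0, "total": 30},
}

W_DEMAND, W_CHINESE, W_TOTAL = 1.0, 0.5, 0.5  # hypothetical weights


def min_max(values):
    """Rescale a dict of numbers to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


demand = min_max({a: d["share"] for a, d in areas.items()})
chinese_comp = min_max({a: d["chinese"] for a, d in areas.items()})
total_comp = min_max({a: d["total"] for a, d in areas.items()})

# higher demand raises the score; more competitors lower it
score = {a: W_DEMAND * demand[a] - W_CHINESE * chinese_comp[a] - W_TOTAL * total_comp[a]
         for a in areas}
ranking = sorted(score, key=score.get, reverse=True)
print(ranking)
```

Changing the weights shifts the ranking, which mirrors the choice in the conclusion between a safer low-competition area and a larger but more contested customer base.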

### 4. Conclusion

Based on the above analysis, I believe Oakridge and Victoria-Fraserview are good places to open a Chinese restaurant if the investor is looking for a place with a Chinese customer base but not too much competition. If the investor is really confident about the restaurant's flavour, I'd suggest opening in Renfrew-Collingwood, where the Chinese customer base is the largest but the competition is also higher than in the other two neighbourhoods mentioned above.