#### Assignment: Capstone Project - The Battle of Neighborhoods
### Week 5 Report
**_Choosing the first location for our Coworking Space in Vietnam_**
- Build a dataframe of neighborhoods in Hanoi, Ho Chi Minh City, and Danang (Vietnam)
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a Coworking Space
***
### 1. Import libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas 1.0+)
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [4]:
CLIENT_ID = 'UGTYYYNFMKBLPPM1XYWHXZSRCICUMHURHKSF0UMBD2DHUQBS' # your Foursquare ID
CLIENT_SECRET = 'E20F4Z1VOQD001KZYG5GA3F24SUJUWFABON3EWAT52UC2DFO' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: UGTYYYNFMKBLPPM1XYWHXZSRCICUMHURHKSF0UMBD2DHUQBS


In [5]:
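# Query the Foursquare explore endpoint for Coworking Spaces in each city
LIMIT = 500 # number of venues requested; the API returns at most 100 per request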
cities = ['Hanoi, VN', 'Ho Chi Minh City, Vietnam', 'Danang, VN']
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "4bf58dd8d48988d174941735") # Coworking Space CATEGORY ID
    results[city] = requests.get(url).json()
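
The city names above contain spaces and commas, so it can be safer to percent-encode the query string rather than format it by hand. The following is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credential values are placeholders, not real keys. It only builds the URL and does not call the API.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumptions); substitute your own Foursquare keys.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"
COWORKING_CATEGORY_ID = "4bf58dd8d48988d174941735"  # category ID used in this notebook


def build_explore_url(city, limit=100):
    """Return a venues/explore URL with every parameter percent-encoded."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "near": city,
        "limit": limit,
        "categoryId": COWORKING_CATEGORY_ID,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("Ho Chi Minh City, Vietnam")
print(url)
```

Passing the same dictionary as `params=` to `requests.get` should have a similar effect, since `requests` encodes query parameters itself.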

In [6]:
df_venues={}
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
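
To illustrate what the flattening step does, here is a small sketch on an invented payload shaped like the `items` list read above. The venue names and coordinates are made up, and it assumes pandas 1.0 or later, where `pd.json_normalize` is available. It uses `reindex` so that a column missing from the response (for example, when no venue has an address) becomes NaN instead of raising a `KeyError`.

```python
import pandas as pd

# Invented sample data mimicking the nested structure of the API response items.
items = [
    {"venue": {"name": "Space A",
               "location": {"address": "1 Example St", "lat": 21.03, "lng": 105.85}}},
    {"venue": {"name": "Space B",
               "location": {"lat": 21.01, "lng": 105.80}}},  # no address field
]

# Nested keys become dotted column names such as 'venue.location.lat'.
flat = pd.json_normalize(items)

wanted = ["venue.name", "venue.location.address",
          "venue.location.lat", "venue.location.lng"]
venues = flat.reindex(columns=wanted)  # missing columns are filled with NaN
venues.columns = ["Name", "Address", "Lat", "Lng"]
print(venues)
```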

The Foursquare API only gives us the nearest 100 venues in each city.

Let's first inspect their densities visually.

In [7]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Coworking Space in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

Total number of Coworking Space in Hanoi, VN =  20
Showing Top 100
Total number of Coworking Space in Ho Chi Minh City, Vietnam =  43
Showing Top 100
Total number of Coworking Space in Danang, VN =  7
Showing Top 100


In [8]:
maps[cities[0]]

In [9]:
maps[cities[1]]

In [10]:
maps[cities[2]]

We can see that Hanoi and Ho Chi Minh City have the densest concentrations of Coworking Spaces.

However, let's get a concrete measure of this density.

For this I will use some basic statistics. I will compute the mean location of the Coworking Spaces in each city; it should be close to most of them if they are densely packed and far from them if they are not.

Next I will take the average distance from the venues to that mean coordinate.
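
The measure used in this notebook is a Euclidean distance in degrees of latitude and longitude. A degree of longitude covers less ground the farther a city is from the equator, so values for cities at different latitudes are not strictly comparable. As a hedged alternative, the sketch below computes the same mean distance in kilometers with the haversine formula. The function names and the sample coordinates are illustrative assumptions, not part of the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_to_centroid_km(points):
    """Average haversine distance from each point to the mean coordinate."""
    mean_lat = sum(lat for lat, _ in points) / len(points)
    mean_lng = sum(lng for _, lng in points) / len(points)
    total = sum(haversine_km(lat, lng, mean_lat, mean_lng) for lat, lng in points)
    return total / len(points)


# Invented sample coordinates, for illustration only.
sample = [(21.02, 105.84), (21.03, 105.85), (21.04, 105.83)]
print(round(mean_distance_to_centroid_km(sample), 3))
```

With the notebook's data, the list of points could be taken from `df_venues[city][['Lat', 'Lng']].values`.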

In [11]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

Hanoi, VN
Mean Distance from Mean coordinates
0.030603459874529875
Ho Chi Minh City, Vietnam
Mean Distance from Mean coordinates
0.023799319058783974
Danang, VN
Mean Distance from Mean coordinates
0.012595609731376937


In [12]:
maps[cities[0]]

In [13]:
maps[cities[1]]

In [14]:
maps[cities[2]]

### We now see that Hanoi, VN is the best option.

Another observation is that one Coworking Space in Hanoi lies very far from the others, which could distort the city's score.
So let's remove it and calculate the score again.

In [15]:
city = 'Hanoi, VN'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))# Ignore the biggest distance

Hanoi, VN
Mean Distance from Mean coordinates
0.028866831785587955
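
Note that the cell above drops the largest distance but keeps the mean coordinate that was computed with the outlier still included. A possible variant, sketched below under that observation, removes the farthest venue first and then recomputes the mean coordinate before averaging. The helper names and the sample points are hypothetical, and distances are plain Euclidean distances in degrees, as in the notebook (`math.dist` needs Python 3.8+).

```python
import math


def centroid(points):
    """Mean coordinate of a list of (lat, lng) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_distance(points, center):
    """Average Euclidean distance (in degrees) from the points to center."""
    return sum(math.dist(p, center) for p in points) / len(points)


def drop_farthest(points):
    """Return a copy of points without the one farthest from the centroid."""
    c = centroid(points)
    idx = max(range(len(points)), key=lambda i: math.dist(points[i], c))
    return points[:idx] + points[idx + 1:]


# Invented sample: four clustered points and one distant outlier.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
trimmed = drop_farthest(sample)

print(mean_distance(sample, centroid(sample)))    # with the outlier
print(mean_distance(trimmed, centroid(trimmed)))  # outlier removed, centroid recomputed
```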


That puts Hanoi, VN back in first place as the city in which to open our Coworking Space.

And with that, I conclude my notebook.