# Capstone Project - The Battle of Neighborhoods (Week 2)

### Description of the problem

Bubble tea, a popular drink in Asia, has attracted a growing number of customers in the European market in recent years. Many bubble tea brands have opened stores in France, especially in Paris, which is one of the world's top tourist destinations and receives visitors from around the world each year.

One well-known bubble tea brand wants to enter this promising market in Paris and is looking for a good location for its shop. The criteria for the decision are:
1. Not far from the center of Paris
2. Not far from famous attractions
3. Few existing bubble tea shops nearby

We will analyze which neighborhood is the best place for the brand to open its store.

The audience for this project could be investors, bubble tea shop owners, or anyone who wants to research this market.

### Description of the data

First, we compile a list of popular attractions in Paris. Then we use the Foursquare API to retrieve the bubble tea shops in Paris. After that, we cluster those shops by their GPS coordinates (latitudes and longitudes), using the attractions as the initial centroids. Finally, we identify which cluster has the fewest existing shops and which attraction it is near.

### Methodology

In [62]:
#!pip install geopandas
#!pip install geopy

In [63]:
import pandas as pd
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0

from geopy.geocoders import Nominatim

import folium

import requests

from sklearn.cluster import KMeans
import numpy as np 

import matplotlib.cm as cm
import matplotlib.colors as colors

We list some famous landmarks in Paris.

In [64]:
List = {'Attractions':['Tour Eiffel','Louvre museum','Arc de Triomphe','Sacre-Coeur','Le Jardin du Luxembourg','Notre-Dame de Paris']}

In [65]:
df = pd.DataFrame.from_dict(List)
df

Unnamed: 0,Attractions
0,Tour Eiffel
1,Louvre museum
2,Arc de Triomphe
3,Sacre-Coeur
4,Le Jardin du Luxembourg
5,Notre-Dame de Paris


Using a geocoder, we obtain the coordinates (latitudes and longitudes) of those attractions.

In [66]:
lst_lat = []
lst_lon = []
# Create the geocoder once and reuse it for every attraction
locator = Nominatim(user_agent="myGeocoder")
for row in df['Attractions']:
    location = locator.geocode(row)
    lst_lat.append(location.latitude)
    lst_lon.append(location.longitude)
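
A geocoder lookup can come back empty (geopy returns `None` when nothing matches), in which case reading `.latitude` would raise an error. The sketch below is not part of the original notebook: the helper name `geocode_all` is hypothetical, and a stub stands in for the real geocoder so the example runs offline.

```python
from types import SimpleNamespace


def geocode_all(names, geocode):
    """Return ({name: (lat, lon)}, [names that could not be geocoded]).

    `geocode` is any callable that takes a place name and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    """
    found, missing = {}, []
    for name in names:
        location = geocode(name)
        if location is None:
            missing.append(name)  # keep track instead of crashing
            continue
        found[name] = (location.latitude, location.longitude)
    return found, missing


# Stub geocoder for illustration only: it knows a single place.
def fake_geocode(name):
    if name == "Tour Eiffel":
        return SimpleNamespace(latitude=48.85826, longitude=2.294499)
    return None


found, missing = geocode_all(["Tour Eiffel", "Not a real place"], fake_geocode)
print(found)
print(missing)
```

With the real service, the callable would be `Nominatim(user_agent="myGeocoder").geocode`, as in the cell above.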

In [67]:
df['Latitude']=lst_lat
df['Longitude']=lst_lon

In [68]:
df

Unnamed: 0,Attractions,Latitude,Longitude
0,Tour Eiffel,48.85826,2.294499
1,Louvre museum,48.861147,2.338028
2,Arc de Triomphe,48.873779,2.295037
3,Sacre-Coeur,48.886806,2.343015
4,Le Jardin du Luxembourg,48.846723,2.336413
5,Notre-Dame de Paris,48.852937,2.35005


The coordinates of the center of Paris are:

In [69]:
paris_center=[48.864716,2.349014]

We plot those attractions on a map.

In [70]:
map_paris = folium.Map(location=paris_center, zoom_start=13)
for lat_att, lng_att, label_att in zip(df['Latitude'], df['Longitude'], df['Attractions']):
    folium.Marker([lat_att, lng_att], popup=label_att).add_to(map_paris)
map_paris

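Next, we find the center point of those attractions, which we will use as the search origin for nearby bubble tea shops. Each coordinate is converted to a 3D unit vector, the vectors are averaged, and the mean vector is converted back to a latitude and longitude.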

In [71]:
import math

x = 0.0
y = 0.0
z = 0.0

for i, coord in df.iterrows():
    latitude = math.radians(coord.Latitude)
    longitude = math.radians(coord.Longitude)

    x += math.cos(latitude) * math.cos(longitude)
    y += math.cos(latitude) * math.sin(longitude)
    z += math.sin(latitude)

total = len(df)

x = x / total
y = y / total
z = z / total

central_longitude = math.atan2(y, x)
central_square_root = math.sqrt(x * x + y * y)
central_latitude = math.atan2(z, central_square_root)

mean_location = {
    'latitude': math.degrees(central_latitude),
    'longitude': math.degrees(central_longitude)
    }

mean_location

{'latitude': 48.863277588502456, 'longitude': 2.3261744551090393}
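
Criteria 1 and 2 are both about distance, so it can help to express them in metres. The following is a minimal sketch, not part of the original notebook, that uses the haversine formula with only the standard library. The function name `haversine_m` and the Earth radius constant are assumptions chosen for illustration; the coordinates are copied from the outputs above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


paris_center = (48.864716, 2.349014)
mean_point = (48.863277588502456, 2.3261744551090393)
attractions = {
    "Tour Eiffel": (48.85826, 2.294499),
    "Louvre museum": (48.861147, 2.338028),
    "Arc de Triomphe": (48.873779, 2.295037),
    "Sacre-Coeur": (48.886806, 2.343015),
    "Le Jardin du Luxembourg": (48.846723, 2.336413),
    "Notre-Dame de Paris": (48.852937, 2.35005),
}

# Distance between the computed mean point and the chosen city center
offset_m = haversine_m(*mean_point, *paris_center)
print(f"mean point to center: {offset_m:.0f} m")

# Distance from each attraction to the city center (criterion 1)
for name, (lat, lon) in attractions.items():
    print(f"{name}: {haversine_m(lat, lon, *paris_center):.0f} m")
```

The mean point and the city center are well within the 5000 m search radius of each other, so the search origin is a reasonable proxy for "central Paris".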

In [72]:
lat_mean_loc = mean_location.get('latitude')
lng_mean_loc = mean_location.get('longitude')

After getting the center point, we use the Foursquare API to list all the bubble tea shops within a 5,000 m radius.

In [73]:
client_id = 'P0W2UCI31FPMCPIKMJSM4QV3ZS3OEEBMOEUFY2J0U0F1ESYE' # your Foursquare ID
client_secret = 'HSWPYXPZRZMRD1UAKI30WMXHTBSNJHX055X3Y5ZAROPBQR3W' # your Foursquare Secret
version = '20200711' # Foursquare API version

In [74]:
categoryId = '52e81612bcbc57f1066b7a0c' # bubble tea shop
radius = 5000
LIMIT = 100

In [75]:
# create the API request URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
    client_id, client_secret, lat_mean_loc, lng_mean_loc, version, categoryId, radius, LIMIT)
            
#make the GET request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f1307150af06e78b9cf5e08'},
 'response': {'venues': [{'id': '5841c5b38ee56078594b3924',
    'name': 'Momen’Tea',
    'location': {'address': "5 avenue de l'Opéra",
     'lat': 48.86494065438193,
     'lng': 2.334791630066141,
     'labeledLatLngs': [{'label': 'display',
       'lat': 48.86494065438193,
       'lng': 2.334791630066141}],
     'distance': 657,
     'postalCode': '75001',
     'cc': 'FR',
     'neighborhood': 'Palais-Royal',
     'city': 'Paris',
     'state': 'Île-de-France',
     'country': 'France',
     'formattedAddress': ["5 avenue de l'Opéra", '75001 Paris', 'France']},
    'categories': [{'id': '52e81612bcbc57f1066b7a0c',
      'name': 'Bubble Tea Shop',
      'pluralName': 'Bubble Tea Shops',
      'shortName': 'Bubble Tea',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bubble_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1595082552',
    'hasPerk': False},
   {'id': '5a8c11
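
As an alternative to formatting the URL by hand, the query string can be built from a dictionary, which handles escaping and keeps credentials out of the notebook. This is a sketch under assumptions: the helper `build_search_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, while the endpoint and parameter names are the ones used in the request above.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, category_id, radius=5000, limit=100, version="20200711"):
    """Build the venue search URL from a parameter dictionary."""
    params = {
        # Hypothetical environment variable names; empty if not set
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{lat},{lng}",
        "v": version,
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(48.863277588502456, 2.3261744551090393, "52e81612bcbc57f1066b7a0c")
query = parse_qs(urlsplit(url).query)  # parse it back to inspect the parameters
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, the same dictionary can be passed directly as `requests.get(SEARCH_ENDPOINT, params=params)`.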

In [76]:
venues = results['response']['venues']
    
nearby_venues = json_normalize(venues) # flatten JSON

nearby_venues


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.neighborhood,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,5841c5b38ee56078594b3924,Momen’Tea,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,5 avenue de l'Opéra,48.864941,2.334792,"[{'label': 'display', 'lat': 48.86494065438193...",657,75001.0,FR,Palais-Royal,Paris,Île-de-France,France,"[5 avenue de l'Opéra, 75001 Paris, France]",
1,5a8c1166e1f22874f4fc9714,Laïzé 來座 (Laïzé),"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,19 rue de Montmorency,48.863077,2.355073,"[{'label': 'display', 'lat': 48.86307722773456...",2116,75003.0,FR,Rambuteau,Paris,Île-de-France,France,"[19 rue de Montmorency, 75003 Paris, France]",
2,5b0add16419a9e002cafd160,The Alley,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,55 rue des Petits Champs,48.867163,2.334782,"[{'label': 'display', 'lat': 48.86716306970955...",764,75001.0,FR,,Paris,Île-de-France,France,"[55 rue des Petits Champs, 75001 Paris, France]",
3,5ec408840ac2020008c89ab0,Meng Salon de Thé,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,35 bis rue du Roi de Sicile,48.856643,2.356479,"[{'label': 'display', 'lat': 48.8566430396361,...",2339,75004.0,FR,,Paris,Île-de-France,France,"[35 bis rue du Roi de Sicile, 75004 Paris, Fra...",
4,5bfaa273c03635002c8d12e2,Tree Tea,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,8 rue Buffault,48.875071,2.341695,"[{'label': 'display', 'lat': 48.875071, 'lng':...",1736,75009.0,FR,,Paris,Île-de-France,France,"[8 rue Buffault, 75009 Paris, France]",
5,5ddfd5ecb298290008f18bb3,The Alley,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,13 Boulevard Haussmann,48.872562,2.335503,"[{'label': 'display', 'lat': 48.872562, 'lng':...",1238,75009.0,FR,,Paris,Île-de-France,France,"[13 Boulevard Haussmann, 75009 Paris, France]",
6,5ca0e90e088158002c0b9e34,Thé Point,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,,48.859543,2.377089,"[{'label': 'display', 'lat': 48.859543, 'lng':...",3751,75011.0,FR,,Paris,Île-de-France,France,"[75011 Paris, France]",
7,5ad1ed5d3149b90df0a3ab0c,Chatime,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,19 quai des Grands Augustins,48.85406,2.343399,"[{'label': 'display', 'lat': 48.85406, 'lng': ...",1626,75006.0,FR,,Paris,Île-de-France,France,"[19 quai des Grands Augustins, 75006 Paris, Fr...",
8,5a60f10e0c9f317921ca8c84,Chatime,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,89 rue Beaubourg,48.864214,2.354923,"[{'label': 'display', 'lat': 48.86421365202117...",2107,75004.0,FR,,Paris,Île-de-France,France,"[89 rue Beaubourg, 75004 Paris, France]",
9,5e0a0f5f26886900085fdfeb,CoCo Fresh Tea & Juice,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595082552,False,6 rue Dante,48.851275,2.346439,"[{'label': 'display', 'lat': 48.85127481320342...",1997,75005.0,FR,,Paris,Île-de-France,France,"[6 rue Dante, 75005 Paris, France]",


In [77]:
# filter columns
filtered_columns = ['name', 'location.lat', 'location.lng']
bubble_tea = nearby_venues[filtered_columns].copy()  # copy so that adding columns later does not trigger SettingWithCopyWarning
bubble_tea.shape

(47, 3)

In [78]:
bubble_tea.head()

Unnamed: 0,name,location.lat,location.lng
0,Momen’Tea,48.864941,2.334792
1,Laïzé 來座 (Laïzé),48.863077,2.355073
2,The Alley,48.867163,2.334782
3,Meng Salon de Thé,48.856643,2.356479
4,Tree Tea,48.875071,2.341695
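
To make clear where the `location.lat` and `location.lng` columns come from, here is a plain-Python sketch of the same extraction without pandas. The helper name `extract_shops` is hypothetical, and the sample venue is a shortened copy of the first entry in the API response shown earlier.

```python
def extract_shops(venues):
    """Pull the name and coordinates out of each nested venue record."""
    rows = []
    for venue in venues:
        location = venue.get("location", {})  # nested dict inside each venue
        rows.append({
            "name": venue.get("name"),
            "lat": location.get("lat"),
            "lng": location.get("lng"),
        })
    return rows


# Shortened copy of the first venue in the response
sample_venues = [
    {
        "id": "5841c5b38ee56078594b3924",
        "name": "Momen’Tea",
        "location": {
            "address": "5 avenue de l'Opéra",
            "lat": 48.86494065438193,
            "lng": 2.334791630066141,
            "city": "Paris",
        },
    }
]

shops = extract_shops(sample_venues)
print(shops)
```

`json_normalize` does the same flattening for every nested key at once, joining the levels with a dot, which is why the DataFrame columns are named `location.lat` and `location.lng`.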


We got 47 bubble tea shops in total. Let's plot them on the map below.

In [79]:
map_paris = folium.Map(location=paris_center, zoom_start=13)
for lat_att, lng_att, label_att in zip(df['Latitude'], df['Longitude'], df['Attractions']):
    folium.Marker([lat_att, lng_att], popup=label_att).add_to(map_paris)
for lat_r, lon_r in zip (bubble_tea['location.lat'],bubble_tea['location.lng']):
    folium.Circle([lat_r,lon_r],radius=30, color = 'blue', fill=True).add_to(map_paris)

map_paris

The next step is to cluster those bubble tea shops by their coordinates (latitudes and longitudes), using the attractions as the initial centroids.

In [80]:
X=bubble_tea.loc[:,['location.lat','location.lng']]
X.head()

Unnamed: 0,location.lat,location.lng
0,48.864941,2.334792
1,48.863077,2.355073
2,48.867163,2.334782
3,48.856643,2.356479
4,48.875071,2.341695


In [81]:
id_n=6
centroids = df[['Latitude','Longitude']].to_numpy()
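# Seed the centroids with the attraction coordinates; n_init=1 because the initial centers are given explicitly
kmeans = KMeans(n_clusters=id_n, init=centroids, n_init=1, max_iter=1).fit(X)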
id_label=kmeans.labels_
id_label


array([1, 5, 1, 5, 3, 1, 5, 4, 5, 4, 5, 5, 2, 5, 1, 5, 5, 5, 1, 5, 1, 5,
       5, 1, 3, 1, 3, 2, 5, 5, 4, 5, 1, 2, 5, 3, 3, 5, 5, 1, 1, 1, 1, 0,
       1, 4, 5])
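
Because the attractions seed the centroids and only one iteration is run, the labels are close in spirit to assigning each shop to its nearest attraction. The sketch below makes that assignment explicit with a haversine distance; it is an alternative illustration, not the notebook's method, and KMeans itself measures plain Euclidean distance on the raw latitude and longitude values. The function names are hypothetical; the coordinates are copied from the tables above.

```python
import math

# (name, latitude, longitude), in the same order as the attractions DataFrame
ATTRACTIONS = [
    ("Tour Eiffel", 48.85826, 2.294499),
    ("Louvre museum", 48.861147, 2.338028),
    ("Arc de Triomphe", 48.873779, 2.295037),
    ("Sacre-Coeur", 48.886806, 2.343015),
    ("Le Jardin du Luxembourg", 48.846723, 2.336413),
    ("Notre-Dame de Paris", 48.852937, 2.35005),
]

# First five shops from the bubble_tea table
SHOPS = [
    ("Momen’Tea", 48.864941, 2.334792),
    ("Laïzé 來座 (Laïzé)", 48.863077, 2.355073),
    ("The Alley", 48.867163, 2.334782),
    ("Meng Salon de Thé", 48.856643, 2.356479),
    ("Tree Tea", 48.875071, 2.341695),
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (mean Earth radius of 6,371 km assumed)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def nearest_attraction(lat, lon):
    """Return the index of the attraction closest to the given point."""
    distances = [haversine_m(lat, lon, a_lat, a_lon) for _, a_lat, a_lon in ATTRACTIONS]
    return min(range(len(distances)), key=distances.__getitem__)


labels = [nearest_attraction(lat, lon) for _, lat, lon in SHOPS]
for (shop, _, _), label in zip(SHOPS, labels):
    print(f"{shop}: cluster {label} ({ATTRACTIONS[label][0]})")
```

The resulting indices can be compared against the first five KMeans labels printed above as a sanity check on the clustering.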

In [82]:
bubble_tea['Cluster']=id_label
bubble_tea.head()


Unnamed: 0,name,location.lat,location.lng,Cluster
0,Momen’Tea,48.864941,2.334792,1
1,Laïzé 來座 (Laïzé),48.863077,2.355073,5
2,The Alley,48.867163,2.334782,1
3,Meng Salon de Thé,48.856643,2.356479,5
4,Tree Tea,48.875071,2.341695,3


In [83]:
bubble_tea['Cluster'].value_counts()

5    20
1    14
3     5
4     4
2     3
0     1
Name: Cluster, dtype: int64
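
Since the centroids were taken from the attractions DataFrame in row order, cluster index `i` corresponds to attraction row `i`. The following sketch, added for illustration, maps the counts back to attraction names using the label array printed above; the variable names are assumptions.

```python
from collections import Counter

# Same order as the rows of the attractions DataFrame
ATTRACTION_NAMES = [
    "Tour Eiffel",
    "Louvre museum",
    "Arc de Triomphe",
    "Sacre-Coeur",
    "Le Jardin du Luxembourg",
    "Notre-Dame de Paris",
]

# Cluster labels copied from the KMeans output above
labels = [
    1, 5, 1, 5, 3, 1, 5, 4, 5, 4, 5, 5, 2, 5, 1, 5, 5, 5, 1, 5, 1, 5,
    5, 1, 3, 1, 3, 2, 5, 5, 4, 5, 1, 2, 5, 3, 3, 5, 5, 1, 1, 1, 1, 0,
    1, 4, 5,
]

counts = Counter(labels)
# List clusters from most to fewest shops, with the seeding attraction
for cluster, n_shops in counts.most_common():
    print(f"Cluster {cluster} ({ATTRACTION_NAMES[cluster]}): {n_shops} shops")
```

Reading the table this way shows that the two smallest clusters are the ones seeded at Arc de Triomphe and Tour Eiffel.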

We now have our clusters. Let's view them on the map.

In [84]:
# create map
map_clusters = folium.Map(location=paris_center, zoom_start=13)

# set color scheme for the clusters
x = np.arange(id_n)
ys = [i + x + (i*x)**2 for i in range(id_n)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bubble_tea['location.lat'], bubble_tea['location.lng'], bubble_tea['name'], bubble_tea['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color='white',
        fill_opacity=1).add_to(map_clusters)

for lat_att, lng_att, label_att in zip(df['Latitude'], df['Longitude'], df['Attractions']):
    folium.Marker([lat_att, lng_att], popup=label_att).add_to(map_clusters)       
map_clusters

### Result

There are 20 bubble tea shops near Notre-Dame de Paris (Cluster 5, orange).
<br>
There are 14 bubble tea shops near the Louvre Museum (Cluster 1, purple).
<br>
There are 5 bubble tea shops near Sacre-Coeur (Cluster 3, light blue).
<br>
There are 4 bubble tea shops near Le Jardin du Luxembourg (Cluster 4, green).
<br>
There are 3 bubble tea shops near Tour Eiffel and Arc de Triomphe (Cluster 2, seeded at Arc de Triomphe, blue).
<br>
There is only 1 bubble tea shop in the remaining cluster (Cluster 0, seeded at Tour Eiffel, red), and it is far away from any attraction.

### Discussion

According to the results above, many bubble tea shops already exist in the center of Paris, especially near Notre-Dame de Paris and the Louvre Museum. However, there are few bubble tea shops around Arc de Triomphe and Tour Eiffel.

### Conclusion

In conclusion, the investor should look for a location around __Arc de Triomphe or Tour Eiffel__. Both are among the most famous monuments in Paris and top destinations for tourists. In addition, few bubble tea shops have opened there so far, so these locations should be considered.