# New Indian Restaurant in Paris

### Capstone Project - The Battle of the Neighborhoods ( _Applied Data Science Capstone by IBM/Coursera_ )
#### Tarun Kedia

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we look for good locations to open a restaurant. Specifically, we target the city of love, **Paris**, where we wish to set up an **Indian restaurant**.

Paris is one of the central hubs of world tourism and has many great restaurants. We therefore look for **locations that are not crowded with restaurants** and narrow these down to areas **with no Indian restaurants**. Another important preference is to keep the restaurant **close to the iconic Louvre Museum**. This location is preferable not only because of its importance but also because it lies at the geographic centre of the city.

## Data <a name="data"></a>

The data collection and pre-processing depend on these factors from our problem statement:
* Number of restaurants currently present in a given neighborhood
* Number of Indian restaurants in the neighborhood
* Distance from each neighborhood to the nearest Indian restaurant
* Distance of the neighborhood from the Louvre Museum

To obtain our neighborhoods, we use a regularly spaced grid of locations centered on the Louvre. The following data sources are needed to extract or generate the required information:
1. Centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **TomTom API reverse geocoding**
2. The number of restaurants in every neighborhood, with their type and location, will be obtained using the **Foursquare API**
3. Coordinates of the Louvre will be obtained using **geopy's Nominatim geocoder**

In [1]:
#  @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='ccd16d9d-837d-47a7-b9e7-a385042eca04', project_access_token='p-792b47c3ccf02a753c03337621f8a869b7ada66c')
pc = project.project_context
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [2]:
import requests 
import pandas as pd 
import numpy as np 
import random 
from bs4 import BeautifulSoup  

# !conda install -c conda-forge geocoder --yes
import geocoder

# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

from IPython.display import Image 
from IPython.core.display import HTML 
    
from pandas.io.json import json_normalize

# !conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [3]:
address = 'Louvre Museum, Paris'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
lat = location.latitude
lon = location.longitude
lourve_loc = [lat,lon]
print('The geographical coordinates of the Louvre Museum are {}, {}.'.format(lat, lon))

The geographical coordinates of the Louvre Museum are 48.8611473, 2.33802768704666.


We convert the coordinates to a 2D Cartesian system so that distances can be calculated directly in meters rather than in degrees.

To do this, we define functions that convert between the **WGS84 spherical coordinate system** (latitude/longitude in degrees) and the **UTM Cartesian coordinate system** (X/Y coordinates in meters).

In [4]:
#  !pip install shapely
import shapely.geometry

#  !pip install pyproj
import pyproj

import math

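def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM X/Y (meters).
    # NOTE: zone=33 is the setting that produced the outputs shown in this notebook;
    # Paris itself lies in UTM zone 31.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: UTM X/Y (meters) back to longitude/latitude (degrees).
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]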

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Louvre\'s longitude={}, latitude={}'.format(lourve_loc[1], lourve_loc[0]))
x, y = lonlat_to_xy(lourve_loc[1], lourve_loc[0])
print('Louvre\'s UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Louvre\'s longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Louvre's longitude=2.33802768704666, latitude=48.8611473
Louvre's UTM X=-427634.674999763, Y=5489808.378198432
Louvre's longitude=2.338027687046659, latitude=48.8611473
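To illustrate why raw degrees are awkward for distance work, here is a minimal sketch using only the standard library. It assumes a spherical Earth with a mean radius of 6,371,000 m; the helper name `metres_per_degree` is hypothetical and not part of the notebook. It shows that one degree of longitude covers noticeably less ground than one degree of latitude at the Louvre's latitude, so a grid that is regular in degrees would not be regular in meters.

```python
import math

EARTH_RADIUS_M = 6371000  # assumption: mean Earth radius, spherical approximation

def metres_per_degree(latitude_deg):
    """Approximate ground length of one degree of latitude and of longitude."""
    lat_m = math.pi / 180 * EARTH_RADIUS_M
    # Meridians converge towards the poles, so longitude degrees shrink with cos(latitude)
    lon_m = lat_m * math.cos(math.radians(latitude_deg))
    return lat_m, lon_m

lat_m, lon_m = metres_per_degree(48.8611473)  # latitude of the Louvre from the cell above
print('1 degree of latitude  ~ {:.0f} m'.format(lat_m))
print('1 degree of longitude ~ {:.0f} m'.format(lon_m))
```

Working in a projected X/Y system avoids this asymmetry, which is why the rest of the notebook measures everything in meters.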


Now we create a grid of equally spaced candidate areas, centered on the Louvre and within ~6 km of it. Our neighborhoods are defined as circular areas with a radius of 300 meters, so the neighborhood centers are 600 meters apart.

We create a **hexagonal grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.

In [5]:
lourve_x, lourve_y = lonlat_to_xy(lourve_loc[1], lourve_loc[0]) # Louvre (grid center) in Cartesian coordinates

k = math.sqrt(3) / 2 # Row spacing factor for a hexagonal grid (row height = k * x_step)
x_min = lourve_x - 6000
x_step = 600
y_min = lourve_y - 6000 - (int(21/k)*k*600 - 12000)/2 # Start half the rows below the Louvre so one row passes through it
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(lourve_x, lourve_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

In [6]:
print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
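The equal-spacing property of the hexagonal grid can be checked with a small standalone sketch. The helpers `hex_grid` and `nearest_distances` are hypothetical names introduced for illustration; the grid uses the same construction as above (a half-step offset on even rows, and a row height of `step * sqrt(3) / 2`) on a small 5 x 5 patch.

```python
import math

def hex_grid(n_rows, n_cols, step):
    """Build a small hexagonal grid with the same offset rule as the notebook."""
    k = math.sqrt(3) / 2  # row height as a fraction of the horizontal step
    points = []
    for i in range(n_rows):
        y = i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0
        for j in range(n_cols):
            points.append((j * step + x_offset, y))
    return points

def nearest_distances(points, index, count=6):
    """Return the distances from points[index] to its `count` closest neighbors."""
    px, py = points[index]
    distances = sorted(
        math.hypot(x - px, y - py)
        for n, (x, y) in enumerate(points)
        if n != index
    )
    return distances[:count]

grid = hex_grid(5, 5, 600)
center_index = 2 * 5 + 2  # row 2, column 2: an interior cell
six_nearest = nearest_distances(grid, center_index)
print(six_nearest)
```

For an interior cell, all six nearest neighbors sit at the same distance as the horizontal step, which is what lets circles of radius 300 m tile the area evenly.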


Now that we have the coordinates of our neighborhoods, let's plot them on a Folium map.

In [7]:
map_paris = folium.Map(location=lourve_loc, zoom_start=13)
folium.Marker(lourve_loc, popup='Paris').add_to(map_paris)
for lat, lon in zip(latitudes, longitudes):
    # Draw each candidate neighborhood as a circle with a 300 m radius
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_paris)
map_paris

### TomTom API

_(https://developer.tomtom.com)_

We now use the TomTom API to get the following information:
* approximate addresses of these locations

Once we have these addresses, we will save them as our dataset.

TomTom API credentials are defined in the hidden cell below.

In [8]:
#  @hidden_cell
tom_api_key = 'hU3DWWoMpRAnFMHU8HgObuSpVk3a42tk'

In [9]:
def get_address(api_key, latitude, longitude):
    # Reverse geocode a coordinate pair; return the free-form address, or None on any failure
    try:
        url = 'https://api.tomtom.com/search/2/reverseGeocode/{}%2C{}.json?key={}'.format(latitude, longitude, api_key)
        response = requests.get(url).json()
        results = response['addresses']
        loc_address = results[0]['address']
        return loc_address['freeformAddress']
    except Exception:
        return None

addr = get_address(tom_api_key, lourve_loc[0], lourve_loc[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(lourve_loc[0], lourve_loc[1], addr))

Reverse geocoding check
-----------------------
Address of [48.8611473, 2.33802768704666] is: Cour Carrée du Louvre, Paris, 75001
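The parsing step inside `get_address` can be exercised without a network call. The sketch below is illustrative only: `extract_address` is a hypothetical helper, and `sample_response` is a hand-written dictionary that mirrors just the fields the notebook reads (`addresses`, `address`, `freeformAddress`), not a full TomTom response.

```python
def extract_address(response):
    """Pull the free-form address out of a reverse geocoding response dict.

    Returns None when the expected fields are missing, instead of raising.
    """
    try:
        return response['addresses'][0]['address']['freeformAddress']
    except (KeyError, IndexError, TypeError):
        return None

# Hand-written sample limited to the fields used above (assumed shape)
sample_response = {
    'addresses': [
        {'address': {'freeformAddress': 'Cour Carrée du Louvre, Paris, 75001'}}
    ]
}

print(extract_address(sample_response))
print(extract_address({'addresses': []}))  # no match found for the coordinates
```

Catching only the lookup errors keeps unrelated problems, such as a typo in the code, from being silently turned into `None`.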


In [10]:
print('Obtaining location addresses: Please Wait!')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(tom_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Paris', '') # We don't need the city part of the address
    addresses.append(address)
#     print(' .', end='')
print(' Done! Proceed ahead!!!')

Obtaining location addresses: Please Wait!
 Done! Proceed ahead!!!


In [11]:
addresses[120:150]

['NO ADDRESS',
 '16 Rue Blaise Desgoffe, 75006',
 '26 Rue Guynemer, 75006',
 '26 Rue Soufflot, 75005',
 '12 Rue des Écoles, 75005',
 '27 Quai Saint-Bernard, 75005',
 'NO ADDRESS',
 '10 Passage du Chantier, 75012',
 'Square Louis Majorelle, 75011',
 '6 Rue Alexandre Dumas, 75011',
 '96 Boulevard de Charonne, 75020',
 '91 Rue des Pyrénées, 75020',
 'NO ADDRESS',
 '28 Villa Molitor, 75016',
 'Voie Georges Pompidou, 75016',
 '10 Rue du Capitaine Ménard, 75015',
 '63 Rue de Lourmel, 75015',
 '4 Rue Frémicourt, 75015',
 'NO ADDRESS',
 '16 Rue Éblé, 75007',
 'Paris, 75007',
 '21B Rue du Cherche-Midi, 75006',
 '7 Rue Lobineau, 75006',
 '17 Rue Saint-Séverin, 75005',
 'NO ADDRESS',
 '14 Rue Charlemagne, 75004',
 '6 Impasse Jean Beausire, 75004',
 '86 Rue de la Roquette, 75011',
 '19 Rue de Belfort, 75011',
 '194 Boulevard de Charonne, 75020']

We now create a dataframe from these addresses and save it to a CSV file.

In [12]:
df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 39 Villa Moderne, Arcueil, 94110 | 48.808309 | 2.326956 | -429434.675 | 5484093.0 | 5992.495307 |
| 1 | 23bis Rue Vaucouleurs, Arcueil, 94110 | 48.8092 | 2.334928 | -428834.675 | 5484093.0 | 5840.3767 |
| 2 | Gentilly, 94250 | 48.81009 | 2.342902 | -428234.675 | 5484093.0 | 5747.173218 |
| 3 | Le Kremlin-Bicêtre, 94270 | 48.81098 | 2.350875 | -427634.675 | 5484093.0 | 5715.767665 |
| 4 | 21 Rue de la Convention, Le Kremlin-Bicêtre, 9... | 48.811869 | 2.358849 | -427034.675 | 5484093.0 | 5747.173218 |
| 5 | Rue Paul Andrieux, Ivry-sur-Seine, 94200 | 48.812758 | 2.366823 | -426434.675 | 5484093.0 | 5840.3767 |
| 6 | 1 Cité Pierre et Marie Curie, Ivry-sur-Seine, ... | 48.813646 | 2.374798 | -425834.675 | 5484093.0 | 5992.495307 |
| 7 | 45 Rue Fénelon, Montrouge, 92120 | 48.811532 | 2.313828 | -430334.675 | 5484612.0 | 5855.766389 |
| 8 | Square Buffalo, Montrouge, 92120 | 48.812424 | 2.321801 | -429734.675 | 5484612.0 | 5604.462508 |
| 9 | Avenue Vladimir Ilitch Lénine, Gentilly, 94250 | 48.813315 | 2.329774 | -429134.675 | 5484612.0 | 5408.326913 |


In [13]:
df_locations.to_pickle('./locations.pkl')

In [14]:
df_locations.shape

(364, 6)

In [15]:
# Save dataframe as csv file to storage
project.save_data(data=df_locations.to_csv(index=False),file_name='LourveNeighborhoods.csv',overwrite=True)

{'file_name': 'LourveNeighborhoods.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'capstoneprojectneighborhoodbattle-donotdelete-pr-klmrucy4lk8lu9',
 'asset_id': 'e8ec4557-ca29-4982-b7b5-068046f400ca'}

### Foursquare API

We now use the Foursquare API to get the following information:
* number of restaurants in each neighborhood
* number of Indian restaurants

We include only venues that have **'restaurant' in the category name**, and we make sure to detect all subcategories of the 'Indian restaurant' category, since we need information on the Indian restaurants in each neighborhood.

Foursquare credentials are defined in the hidden cell below.

In [16]:
#  @hidden_cell
foursquare_client_id = 'STGHSLDCF0PW1Z15KRRNZSEUGTFW0GMIYCGM1OSP4TMCTYV3' 
foursquare_client_secret = 'RXZP32OCZDU3MJU1DES5GRFTGGM4E42FESGXJD4NRPMZS4OK' 

In [17]:
# Category IDs corresponding to Indian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

indian_restaurant_categories = ['4bf58dd8d48988d10f941735','54135bf5e4b08f3d2429dfe5','54135bf5e4b08f3d2429dff3',
                                '54135bf5e4b08f3d2429dff5','54135bf5e4b08f3d2429dfe2','54135bf5e4b08f3d2429dff2',
                                '54135bf5e4b08f3d2429dfe1','54135bf5e4b08f3d2429dfe3','54135bf5e4b08f3d2429dfe8',
                                '54135bf5e4b08f3d2429dfe9','54135bf5e4b08f3d2429dfe6','54135bf5e4b08f3d2429dfdf',
                                '54135bf5e4b08f3d2429dfe4','54135bf5e4b08f3d2429dfe7','54135bf5e4b08f3d2429dfea',
                                '54135bf5e4b08f3d2429dfeb','54135bf5e4b08f3d2429dfed','54135bf5e4b08f3d2429dfee',
                                '54135bf5e4b08f3d2429dff4','54135bf5e4b08f3d2429dfe0','54135bf5e4b08f3d2429dfdd',
                                '54135bf5e4b08f3d2429dff6','54135bf5e4b08f3d2429dfef','54135bf5e4b08f3d2429dff0',
                                '54135bf5e4b08f3d2429dff1','54135bf5e4b08f3d2429dfde','54135bf5e4b08f3d2429dfec']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Paris', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20200120'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:
        # Any request or parsing failure yields an empty venue list
        venues = []
    return venues
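The classification rules in `is_restaurant` can be tried out on made-up input. The sketch below is a condensed restatement of those rules under a hypothetical name, `classify`; the category names and ids are invented for illustration and are not real Foursquare ids.

```python
RESTAURANT_WORDS = ('restaurant', 'diner', 'taverna', 'steakhouse')

def classify(categories, specific_ids=()):
    """Return (is_restaurant, is_specific) for a list of (name, id) pairs."""
    is_res = False
    is_specific = False
    for name, cat_id in categories:
        name = name.lower()
        if any(word in name for word in RESTAURANT_WORDS):
            is_res = True
        if 'fast food' in name:
            is_res = False  # fast food venues are excluded
        if cat_id in specific_ids:
            is_specific = True
            is_res = True   # a matching category id always counts as a restaurant
    return is_res, is_specific

indian_ids = {'id-indian'}  # hypothetical id standing in for the real category list
print(classify([('French Restaurant', 'id-french')], indian_ids))
print(classify([('Fast Food Restaurant', 'id-fastfood')], indian_ids))
print(classify([('Indian Restaurant', 'id-indian')], indian_ids))
print(classify([('Café', 'id-cafe')], indian_ids))
```

Matching Indian venues by category id rather than by name means that subcategories whose names do not contain the word 'Indian' are still detected.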

In [18]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Indian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_350.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:
    # No usable cache on disk; fall through to the Foursquare API below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Persist the results in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('indian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(indian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Restaurant data loaded.
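Because the search radius (350 m) is larger than the neighborhood radius (300 m), the same venue can be returned for several neighboring areas. A minimal sketch of the deduplication idea, using toy data and a hypothetical helper `merge_area_results`, is shown here: keying a dictionary by venue id keeps each venue once, no matter how many areas report it.

```python
def merge_area_results(area_results):
    """Merge per-area venue lists into one dict keyed by venue id."""
    unique = {}
    for venues in area_results:
        for venue_id, name in venues:
            unique[venue_id] = name  # a repeated id overwrites itself, so no duplicates
    return unique

# Toy data: venue 'v2' falls in the overlap between two neighboring areas
area_a = [('v1', 'Chez A'), ('v2', 'Chez B')]
area_b = [('v2', 'Chez B'), ('v3', 'Chez C')]

unique = merge_area_results([area_a, area_b])
total_rows = len(area_a) + len(area_b)
print('Rows returned:', total_rows, '| unique venues:', len(unique))
```

This is why the totals reported for `restaurants` count each venue once, while the per-area lists in `location_restaurants` may share venues near area borders.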


In [19]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Indian Restaurants:', len(indian_restaurants))
print('Percentage of Indian Restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 4337
Total number of Indian Restaurants: 88
Percentage of Indian Restaurants: 2.03%
Average number of restaurants in neighborhood: 10.57967032967033


In [20]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b940307f964a520296134e3', 'Quick Sushi', 48.810505853944434, 2.3286885023117065, 'C.C La Vache Noire - Niveau 2, 94110 Arcueil, France', 322, False, -429266.7242620067, 5484314.520721814)
('4fa10ccde4b008669d96d2cf', "Let's Wok", 48.81083439819573, 2.3282217979431152, 'C.C La Vache Noire - Niveau 3, 94110 Arcueil, France', 298, False, -429294.7103464196, 5484356.69063899)
('5ac7412c67af3a34ce84c04f', 'Au Bureau', 48.8105464, 2.3274329, '3 place de la Vache Noire, 94110 Arcueil, France', 251, False, -429357.8242302035, 5484334.556001987)
('4b5edc53f964a5204a9b29e3', 'Hippopotamus', 48.8106372, 2.3271332, '3 place de la Vache Noire, 94110 Arcueil, France', 259, False, -429378.0482272373, 5484348.324237996)
('566c21b4498e58e748c33670', 'Nabab Kebab', 48.810551779614364, 2.3287796974182124, 'C.C La Vache Noire - Niveau 1, 94110 Arcueil, France', 316, False, -429259.1916501798, 5484318.478938804)
('4c5fe91bde6920a14aa19464', 'Asia Room', 48

In [21]:
print('List of Indian Restaurants')
print('---------------------------')
for r in list(indian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(indian_restaurants))

List of Indian Restaurants
---------------------------
('53219c91498ef4526acab8d5', 'Chez Papa Indien', 48.81545434111303, 2.320712674355254, '90 rue Henri Ginoux, 92120 Montrouge, France', 321, True, -429757.4240626402, 5484961.420431379)
('4ba3edfdf964a5209f6e38e3', 'The Himalayan', 48.81937208717873, 2.3233279264080777, '40 avenue Henri Ginoux, 92120 Montrouge, France', 236, True, -429492.64380838524, 5485362.9896114655)
('4d27034fb818a35d974d878a', "Eat'n'Cure", 48.81838325742825, 2.329831123352051, '7 rue Danton, 92120 Montrouge, France', 202, True, -429035.49110810086, 5485172.945011821)
('4bc0dba6920eb713bfb5192c', 'Luckey', 48.82000980471855, 2.3054927587509155, '15 ter rue Danicourt, 92240 Malakoff, France', 147, True, -430785.23897125793, 5485654.655573207)
('4d7536969a296ea82bec5fa9', 'Palais de Vandan', 48.82402246706412, 2.3283274906819607, '30 rue Paul Fort, 75014 Paris, France', 188, True, -429039.76538765244, 5485816.200716887)
('4bab5aaef964a520c1a23ae3', "Saveurs d'Hi

In [22]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: Vinevitable, KitoKito, Le Drapeau de la Fidelité, La Maison du Bonheur, Les Écrivains, Pho Bo Bun, Asia W, Le Borromée
Restaurants around location 102: Aux Artistes, La Petite Bretagne, Okirama, La Forge
Restaurants around location 103: Le Petit Sommelier, Le Plomb du Cantal, Les Fauves, Kapunka, Le Plomb du Cantal, Le Ciel de Paris, Les Grillades de Buenos Aires, Black Pinky
Restaurants around location 104: Le Bistrot des Campagnes, La Rotonde, Il Barone, Le Cette, Le Relais de l'Entrecôte, Le Bistrot du Dôme, Chez Fernand, Wadja
Restaurants around location 105: Le Vin Sobre, Lithang, La Terrasse du Laurier, Coffee Club, Le Risalé, Le Val de Grace, Le Luco
Restaurants around location 106: Le Resto, Han Lim, La Grange, Crazy Pasta, Au Vieux Cèdre, La Crète, Chez Ann, Le Volcan
Restaurants around location 107: El Picaflor, Casa Hugo, Le Buisson Ardent, Hugo&Co, Kokoro, L'Étoile du Liban, Aux Portes 

We now visualize all the restaurants found in Paris. Indian restaurants are marked in red; all other restaurants are in blue.

In [23]:
map_paris = folium.Map(location=lourve_loc, zoom_start=13)
folium.Marker(lourve_loc, popup='Paris').add_to(map_paris)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_paris)
map_paris

## Methodology <a name="methodology"></a>

Now that we have all the restaurants in the area and know which of them are Indian restaurants, we analyze the data to find the best locations for a new Indian restaurant.

We follow these steps in sequence:
* count the number of restaurants in every area
* calculate the **distance to the nearest Indian restaurant from every area candidate center**
* use **heatmaps** to identify areas with few restaurants, in particular areas with few or no Indian restaurants. ( _We limit our analysis to ~6 km around the Louvre Museum._ )
* focus on the area of interest and gather favourable location points
* create **clusters of these location points** (using **k-means clustering**) that meet the requirements established in the problem statement:
    1. **No more than 2 restaurants within a radius of 250 meters**
    2. **No Indian restaurant within 700 meters**
    3. **Close to the Louvre Museum**

## Analysis <a name="analysis"></a>

We now perform some analysis to get basic information from our data. Here, we count the **number of restaurants in every area candidate**:

In [24]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 10.57967032967033


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 39 Villa Moderne, Arcueil, 94110 | 48.808309 | 2.326956 | -429434.675 | 5484093.0 | 5992.495307 | 6 |
| 1 | 23bis Rue Vaucouleurs, Arcueil, 94110 | 48.8092 | 2.334928 | -428834.675 | 5484093.0 | 5840.3767 | 2 |
| 2 | Gentilly, 94250 | 48.81009 | 2.342902 | -428234.675 | 5484093.0 | 5747.173218 | 1 |
| 3 | Le Kremlin-Bicêtre, 94270 | 48.81098 | 2.350875 | -427634.675 | 5484093.0 | 5715.767665 | 1 |
| 4 | 21 Rue de la Convention, Le Kremlin-Bicêtre, 9... | 48.811869 | 2.358849 | -427034.675 | 5484093.0 | 5747.173218 | 7 |
| 5 | Rue Paul Andrieux, Ivry-sur-Seine, 94200 | 48.812758 | 2.366823 | -426434.675 | 5484093.0 | 5840.3767 | 0 |
| 6 | 1 Cité Pierre et Marie Curie, Ivry-sur-Seine, ... | 48.813646 | 2.374798 | -425834.675 | 5484093.0 | 5992.495307 | 1 |
| 7 | 45 Rue Fénelon, Montrouge, 92120 | 48.811532 | 2.313828 | -430334.675 | 5484612.0 | 5855.766389 | 3 |
| 8 | Square Buffalo, Montrouge, 92120 | 48.812424 | 2.321801 | -429734.675 | 5484612.0 | 5604.462508 | 0 |
| 9 | Avenue Vladimir Ilitch Lénine, Gentilly, 94250 | 48.813315 | 2.329774 | -429134.675 | 5484612.0 | 5408.326913 | 6 |


Now we calculate the **distance to nearest Indian restaurant from every area candidate center** (not only those within 300m - we want distance to closest one, regardless of how distant it is).

In [25]:
distances_to_indian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)

df_locations['Distance to Indian restaurant'] = distances_to_indian_restaurant
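The nearest-neighbor search above can also be written without a fixed starting value. This is a minimal standalone sketch, not the notebook's own code: `nearest_distance` is a hypothetical helper, and it uses `min` with `default=math.inf` so that an empty list of restaurants gives infinity instead of an arbitrary cap such as 10000.

```python
import math

def nearest_distance(x, y, points):
    """Distance from (x, y) to the closest of `points`; math.inf if there are none."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

# Toy coordinates in meters
indian_xy = [(3.0, 4.0), (6.0, 8.0)]
print(nearest_distance(0.0, 0.0, indian_xy))
print(nearest_distance(0.0, 0.0, []))
```

With real data, `points` would be the `(x, y)` pairs stored at positions 7 and 8 of each restaurant tuple.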

In [26]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Indian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 39 Villa Moderne, Arcueil, 94110 | 48.808309 | 2.326956 | -429434.675 | 5484093.0 | 5992.495307 | 6 | 926.821232 |
| 1 | 23bis Rue Vaucouleurs, Arcueil, 94110 | 48.8092 | 2.334928 | -428834.675 | 5484093.0 | 5840.3767 | 2 | 1098.840159 |
| 2 | Gentilly, 94250 | 48.81009 | 2.342902 | -428234.675 | 5484093.0 | 5747.173218 | 1 | 1344.77843 |
| 3 | Le Kremlin-Bicêtre, 94270 | 48.81098 | 2.350875 | -427634.675 | 5484093.0 | 5715.767665 | 1 | 1424.854177 |
| 4 | 21 Rue de la Convention, Le Kremlin-Bicêtre, 9... | 48.811869 | 2.358849 | -427034.675 | 5484093.0 | 5747.173218 | 7 | 1668.249724 |
| 5 | Rue Paul Andrieux, Ivry-sur-Seine, 94200 | 48.812758 | 2.366823 | -426434.675 | 5484093.0 | 5840.3767 | 0 | 2062.984453 |
| 6 | 1 Cité Pierre et Marie Curie, Ivry-sur-Seine, ... | 48.813646 | 2.374798 | -425834.675 | 5484093.0 | 5992.495307 | 1 | 2425.755988 |
| 7 | 45 Rue Fénelon, Montrouge, 92120 | 48.811532 | 2.313828 | -430334.675 | 5484612.0 | 5855.766389 | 3 | 674.652171 |
| 8 | Square Buffalo, Montrouge, 92120 | 48.812424 | 2.321801 | -429734.675 | 5484612.0 | 5604.462508 | 0 | 349.93489 |
| 9 | Avenue Vladimir Ilitch Lénine, Gentilly, 94250 | 48.813315 | 2.329774 | -429134.675 | 5484612.0 | 5408.326913 | 6 | 569.423837 |


In [27]:
print('Average distance to closest Indian restaurant from each area center:', df_locations['Distance to Indian restaurant'].mean())

Average distance to closest Indian restaurant from each area center: 747.6389401832319


OK, so **on average, an Indian restaurant can be found within ~750 m** of every candidate area center. Such a large distance is to be expected, given that the cuisine is foreign to the city.

We now create a map showing the **density of restaurants** and try to extract meaningful information from it. We highlight the **borders of the Paris boroughs** on the map and add circles marking distances of 1 km, 2 km and 3 km from the Louvre.

In [28]:
paris_boroughs_url = 'https://raw.githubusercontent.com/tkedia/Coursera_Capstone/master/paris_boroughs.json'
paris_boroughs = requests.get(paris_boroughs_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [29]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

indian_latlons = [[res[2], res[3]] for res in indian_restaurants.values()]

In [30]:
from folium import plugins
from folium.plugins import HeatMap

map_paris = folium.Map(location=lourve_loc, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_paris) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
folium.Circle(lourve_loc, radius=1000, fill=False, color='white').add_to(map_paris)
folium.Circle(lourve_loc, radius=2000, fill=False, color='white').add_to(map_paris)
folium.Circle(lourve_loc, radius=3000, fill=False, color='white').add_to(map_paris)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='json').add_to(map_paris)
map_paris

In [31]:
map_paris = folium.Map(location=lourve_loc, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_paris) #cartodbpositron cartodbdark_matter
HeatMap(indian_latlons).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
folium.Circle(lourve_loc, radius=1000, fill=False, color='white').add_to(map_paris)
folium.Circle(lourve_loc, radius=2000, fill=False, color='white').add_to(map_paris)
folium.Circle(lourve_loc, radius=3000, fill=False, color='white').add_to(map_paris)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='json').add_to(map_paris)
map_paris

This map is hardly 'hot' (Indian restaurants make up only ~2.03% of all the restaurants we found). We can see a higher density of existing Indian restaurants **north, north-east and south-west** of the Louvre.

Based on the above, we now focus our analysis on the areas around the museum and south-east of the Louvre. This places our location candidates mostly in the boroughs **Louvre** and **Panthéon**.

### Louvre and Panthéon

In [32]:
roi_x_min = lourve_x - 500
roi_y_max = lourve_y - 500
roi_width = 2000
roi_height = 1000
roi_center_x = roi_x_min + 1000
roi_center_y = roi_y_max - 300
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_paris = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_paris)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris)
map_paris

In [33]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

In [34]:
print(len(roi_latitudes), 'candidate neighborhood centers generated.')

1705 candidate neighborhood centers generated.


In [35]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_indian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, indian_restaurants)
    roi_indian_distances.append(distance)
print('done.')


Generating data on location candidates... done.
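
The loops above compute one distance at a time. As a possible alternative, the same two quantities (restaurants within a radius, and distance to the nearest Indian restaurant) can be computed for all candidates at once with NumPy broadcasting. This is only a sketch under stated assumptions: the function name `nearby_counts_and_nearest` and the toy coordinates are hypothetical, inputs are plain `(x, y)` pairs in meters rather than the notebook's venue tuples, and at least one target venue is assumed to exist.

```python
import numpy as np

def nearby_counts_and_nearest(candidates_xy, venues_xy, targets_xy, radius=250.0):
    """Vectorized counts of venues within `radius` and distance to nearest target.

    Hypothetical helper for illustration; all coordinates are assumed to be in meters.
    """
    candidates = np.asarray(candidates_xy, dtype=float)
    venues = np.asarray(venues_xy, dtype=float)
    targets = np.asarray(targets_xy, dtype=float)
    # Pairwise distances via broadcasting: shape (n_candidates, n_venues)
    d_venues = np.linalg.norm(candidates[:, None, :] - venues[None, :, :], axis=2)
    d_targets = np.linalg.norm(candidates[:, None, :] - targets[None, :, :], axis=2)
    counts = (d_venues <= radius).sum(axis=1)
    nearest = d_targets.min(axis=1)  # requires at least one target
    return counts, nearest

# Toy data: two candidate centers, four restaurants, one of them Indian
candidates = [(0.0, 0.0), (1000.0, 0.0)]
restaurants_xy = [(100.0, 0.0), (0.0, 200.0), (900.0, 0.0), (300.0, 0.0)]
indian_xy = [(0.0, 200.0)]
counts, nearest = nearby_counts_and_nearest(candidates, restaurants_xy, indian_xy)
print(counts, nearest)
```

The pairwise distance matrix needs memory proportional to the number of candidates times the number of venues, so for very large inputs a spatial index would be a better fit than this dense approach.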


In [36]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Indian restaurant':roi_indian_distances})

df_roi_locations.head()

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Indian restaurant |
|---|---|---|---|---|---|---|
| 0 | 48.832851 | 2.351432 | -427184.675 | 5486508.0 | 15 | 238.192703 |
| 1 | 48.832999 | 2.352762 | -427084.675 | 5486508.0 | 19 | 173.952022 |
| 2 | 48.832795 | 2.343925 | -427734.675 | 5486595.0 | 7 | 96.148987 |
| 3 | 48.832944 | 2.345255 | -427634.675 | 5486595.0 | 8 | 131.854504 |
| 4 | 48.833092 | 2.346584 | -427534.675 | 5486595.0 | 9 | 213.369616 |


In [37]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ind_distance = np.array(df_roi_locations['Distance to Indian restaurant']>=700)
print('Locations with no Indian restaurants within 700m:', good_ind_distance.sum())

good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two restaurants nearby: 134
Locations with no Indian restaurants within 700m: 336
Locations with both conditions met: 54


In [38]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_paris = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_paris)
HeatMap(restaurant_latlons).add_to(map_paris)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris) 
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris)
map_paris

In [39]:
map_paris = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris)
map_paris

We now group these good locations into **7 clusters** using **k-means clustering**; the cluster centers serve as the centers of our candidate areas.

In [47]:
from sklearn.cluster import KMeans

number_of_clusters = 7

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_paris = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_paris)
HeatMap(restaurant_latlons).add_to(map_paris)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_paris)
folium.Marker(lourve_loc).add_to(map_paris)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_paris) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris)
map_paris

In [48]:
map_paris = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(lourve_loc).add_to(map_paris)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_paris)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_paris) 
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris)
map_paris
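
The maps above draw a 500 m circle around each cluster center. One way to check whether such circles actually cover the points assigned to each cluster is to inspect the fitted model's `labels_` and `cluster_centers_`. The sketch below does this on synthetic data, not on the notebook's `df_good_locations`; the names `toy_xys` and `toy_kmeans` and the three artificial point groups are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic groups of points, in meters, standing in for good locations
toy_xys = np.vstack([
    rng.normal(loc=(0, 0), scale=50, size=(20, 2)),
    rng.normal(loc=(2000, 0), scale=50, size=(20, 2)),
    rng.normal(loc=(0, 2000), scale=50, size=(20, 2)),
])

toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(toy_xys)

# labels_ holds the cluster index assigned to each input point
sizes = np.bincount(toy_kmeans.labels_, minlength=3)
# Distance from every point to the center of its own cluster
dists = np.linalg.norm(
    toy_xys - toy_kmeans.cluster_centers_[toy_kmeans.labels_], axis=1
)
print('cluster sizes:', sizes)
print('max member-to-center distance (m):', round(float(dists.max()), 1))
```

If the maximum member-to-center distance exceeds the drawn radius, some good locations fall outside their zone's circle, which may suggest using more clusters or a larger radius.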

Finally, we **reverse geocode** these candidate area centers to get the addresses of the favourable locations. We thus complete our analysis by discovering 7 addresses which meet our initial business requirements.

In [49]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(tom_api_key, lat, lon).replace(', Paris', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, lourve_x, lourve_y)
    print('{}{} => {:.1f}km from Louvre Museum '.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

8 Rue Agrippa d'Aubigné, 75004                     => 2.2km from Louvre Museum 
Jardin Robert Cavelier de La Salle, 75006          => 1.8km from Louvre Museum 
Batobus - Hôtel de Ville - Louvre                  => 0.3km from Louvre Museum 
Port de la Rapée, 75012                            => 3.3km from Louvre Museum 
Quai Saint-Bernard, 75005                          => 2.1km from Louvre Museum 
5 Rue Cassini, 75014                               => 2.7km from Louvre Museum 
7 Boulevard Morland, 75004                         => 2.5km from Louvre Museum 


## Results and Discussion <a name="results"></a>

Our analysis shows that Paris has a very large number of restaurants. The area around the Louvre Museum (the centre of the city) is densely covered with restaurants, with the highest concentration to the north, north-east and south-east of the Museum. In practice, we therefore shifted our focus to the south and south-east of the Museum, which is potentially the best area for the new restaurant because it is close to the area of interest and has a lower concentration of restaurants. In our analysis, we found that this area lies mostly south of the Louvre borough and in the Panthéon borough. Both areas offer strong tourist appeal and economic advantages, and lie in the heart of the city of love.

Once we had shifted our focus to these boroughs, we narrowed the search to locations with no more than two restaurants within a 250-meter radius and no Indian restaurant within 700 meters. This yielded 54 favourable locations. Finally, we clustered them into 7 zones and generated the addresses of the zone centers using reverse geocoding.

## Conclusion <a name="conclusion"></a>

We set out to identify favourable locations for a new Indian restaurant close to the Louvre Museum in Paris, France. We wanted a location close to the Louvre with few restaurants and no Indian restaurants in the vicinity. We identified the candidate neighborhoods, measured the density of restaurants in those areas, and finally analyzed the neighborhoods for points that matched our requirements, using the Tom Maps API and the Foursquare API. After identifying those locations, we clustered them into areas of interest using the k-means clustering algorithm and generated addresses for the centers of those zones.

We conclude from the analysis that many such locations lie in the Panthéon borough, with some nearby locations in the Louvre borough itself.