<h1>Battle of the Neighborhoods - Clustering the Neighborhoods of Paris</h1>
<h2>1. Introduction</h2>
<p>TripAdvisor, the travel planning and booking site, unveiled its 2018 Travelers' Choice™ award-winning destinations. This ranking recognizes the favorite destinations of travelers around the world. In 2018, Paris took 1st place in the “World” ranking, ahead of London, Rome and New York.</p>
<p>Travelers' Choice™ award winners are determined by an algorithm that assesses the quantity and quality of reviews of hotels, restaurants and attractions in destinations around the world, gathered over a 12-month period, as well as travelers' booking interest when they search the web.</p>


<h2>2. Business Problem</h2>
<p>The objective of this project is to give tourists heading to Paris relevant information about the best places to stay in the city, while also taking the best attractions into consideration. We will try to find the most interesting neighborhood to stay in when booking through Airbnb.</p>

<h2>3. Data</h2>
<p>We will be using the following Paris datasets published by Inside Airbnb:</p>
<ul>
    <li><a href="http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.csv">http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.csv</a></li>
    <li><a href="http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.geojson">http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.geojson</a></li>
</ul>

<p>We will also use the Foursquare API to retrieve the venues located near each neighborhood. Foursquare is a location data provider; for each venue it returns a name, coordinates and a category, which are the fields used in this analysis.</p>

<h2>4. Methodology</h2>

<h4>Downloading the GeoJSON file</h4>

In [1]:
!wget -q -O 'neighborhoods.geojson' http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.geojson

<h4>Importing libraries</h4>

In [3]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

In [18]:
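# Load the GeoJSON and save a CSV copy under a different name,
# so that the downloaded GeoJSON file is not overwritten
paris_neighborhood = pd.read_json('neighborhoods.geojson')
paris_neighborhood.to_csv('neighborhoods_geojson.csv', index=False)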

In [12]:
!wget -q -O 'neighborhoods.csv' http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.csv

In [14]:
df = pd.read_csv("http://data.insideairbnb.com/france/ile-de-france/paris/2020-12-14/visualisations/neighbourhoods.csv")

<h4>The neighbourhoods.csv data frame</h4>

In [15]:
df

Unnamed: 0,neighbourhood_group,neighbourhood
0,,Batignolles-Monceau
1,,Bourse
2,,Buttes-Chaumont
3,,Buttes-Montmartre
4,,Élysée
5,,Entrepôt
6,,Gobelins
7,,Hôtel-de-Ville
8,,Louvre
9,,Luxembourg


The GeoJSON file was converted to a CSV file beforehand so that each neighbourhood has a longitude and a latitude coordinate. The converted CSV file was uploaded to the Data Assets folder of this project on IBM Cloud. The following cell imports it into the Jupyter notebook:

In [41]:
# The code was removed by Watson Studio for sharing.

In [42]:
df_data_1

Unnamed: 0,X,Y,neighbourhood
0,2.306204,48.887035,Batignolles-Monceau
1,2.312969,48.854714,Palais-Bourbon
2,2.383096,48.887149,Buttes-Chaumont
3,2.337891,48.877028,Opéra
4,2.360472,48.875907,Entrepôt
5,2.362916,48.8301,Gobelins
6,2.296235,48.841495,Vaugirard
7,2.445016,48.835072,Reuilly
8,2.337345,48.861938,Louvre
9,2.332321,48.849443,Luxembourg
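
<p>The conversion from GeoJSON polygons to a single X/Y point per neighbourhood is not shown in this notebook. One possible way to do it, sketched here as an assumption rather than as the method actually used, is to take the area-weighted centroid of each polygon's outer ring. The function names and the toy feature are hypothetical, and longitude/latitude are treated as planar coordinates, which is only an approximation at city scale.</p>

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of [lon, lat] pairs (shoelace formula)."""
    area2 = 0.0  # twice the signed area of the ring
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # A degenerate ring with zero area is not handled in this sketch
    return cx / (3 * area2), cy / (3 * area2)


def feature_centroid(feature):
    """Return one row (neighbourhood, X, Y) for a GeoJSON feature."""
    geometry = feature["geometry"]
    coords = geometry["coordinates"]
    # Assumption: for a MultiPolygon, only the first polygon's outer ring is used
    outer = coords[0][0] if geometry["type"] == "MultiPolygon" else coords[0]
    lon, lat = ring_centroid(outer)
    return {"neighbourhood": feature["properties"]["neighbourhood"], "X": lon, "Y": lat}


# Toy feature (made-up square), not real Paris data
sample = {
    "type": "Feature",
    "properties": {"neighbourhood": "Toy-Quartier"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[2, 48], [3, 48], [3, 49], [2, 49], [2, 48]]],
    },
}
row = feature_centroid(sample)
print(row)
```

<p>Applying <code>feature_centroid</code> to every entry of the file's <code>features</code> list would give a table with the same columns as <code>df_data_1</code>.</p>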


<h4>Renaming X and Y columns into longitude and latitude, respectively:</h4>

In [46]:
paris_geo = df_data_1.rename(columns={'X' : 'longitude','Y' : 'latitude'})

In [47]:
paris_geo.head()

Unnamed: 0,longitude,latitude,neighbourhood
0,2.306204,48.887035,Batignolles-Monceau
1,2.312969,48.854714,Palais-Bourbon
2,2.383096,48.887149,Buttes-Chaumont
3,2.337891,48.877028,Opéra
4,2.360472,48.875907,Entrepôt


<h4>Importing geocoder</h4>

In [48]:
import geocoder
from geopy.geocoders import Nominatim

<h4>Coordinates for Paris</h4>

In [49]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


<h4>Visualizing the map of Paris</h4>

In [52]:
map_paris = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lng, lat, neighbourhood in zip(paris_geo['longitude'], paris_geo['latitude'], paris_geo['neighbourhood']):
    # lat/lng are used here so the Paris-wide latitude/longitude defined above are not overwritten
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)
    
map_paris

<h4>Foursquare API Client Credentials</h4>

In [53]:
import os

# Read the credentials from environment variables rather than hard-coding and printing
# them in a shared notebook. The variable names below are an illustrative choice.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605' # Foursquare API version
LIMIT = 100


<h4>Defining the function to get nearby venues</h4>

In [54]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=500
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
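
<p>The function above indexes directly into the response and into <code>categories[0]</code>, so one failed request or one venue without a category stops the whole loop. A minimal sketch of a more defensive parser is given below. It assumes the same response layout that <code>getNearbyVenues</code> relies on; the helper name <code>extract_venues</code>, the placeholder <code>'Uncategorized'</code> and the sample payload are hypothetical.</p>

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to an empty dict when the category list is missing or empty
        categories = venue.get("categories") or [{}]
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Uncategorized"),
        ))
    return rows


# Made-up payload with one complete venue and one venue without a category
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Cafe Exemple",
                           "location": {"lat": 48.85, "lng": 2.35},
                           "categories": [{"name": "Café"}]}},
                {"venue": {"name": "Lieu Sans Categorie",
                           "location": {"lat": 48.86, "lng": 2.36},
                           "categories": []}},
            ]
        }]
    }
}
parsed = extract_venues(sample)
print(parsed)
```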

In [55]:
paris_venues = getNearbyVenues(names=paris_geo['neighbourhood'], latitudes=paris_geo['latitude'], longitudes=paris_geo['longitude'])

Batignolles-Monceau
Palais-Bourbon
Buttes-Chaumont
Opéra
Entrepôt
Gobelins
Vaugirard
Reuilly
Louvre
Luxembourg
Élysée
Temple
Ménilmontant
Panthéon
Passy
Observatoire
Popincourt
Bourse
Buttes-Montmartre
Hôtel-de-Ville


<h4>First five venues of each neighborhood</h4>

In [57]:
paris_venues.groupby('Neighbourhood').head()


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Batignolles-Monceau,48.887035,2.306204,LeavinRoom Escape Game,48.887913,2.307708,Escape Room
1,Batignolles-Monceau,48.887035,2.306204,Le Bistrot Tocqueville,48.886365,2.309140,Diner
2,Batignolles-Monceau,48.887035,2.306204,Fratelli,48.884824,2.307728,Italian Restaurant
3,Batignolles-Monceau,48.887035,2.306204,Hôtel Gaston,48.887659,2.307825,Hotel
4,Batignolles-Monceau,48.887035,2.306204,Pizzeria d'Ampère,48.885210,2.306645,Italian Restaurant
...,...,...,...,...,...,...,...
1082,Hôtel-de-Ville,48.854664,2.357004,Aux Merveilleux de Fred,48.855686,2.356369,Dessert Shop
1083,Hôtel-de-Ville,48.854664,2.357004,Maison Européenne de la Photographie,48.855128,2.358948,Art Museum
1084,Hôtel-de-Ville,48.854664,2.357004,Pamela Popo,48.855749,2.356919,French Restaurant
1085,Hôtel-de-Ville,48.854664,2.357004,Jardin de l'Hôtel de Sens,48.853842,2.358404,Garden


In [58]:
paris_venues.shape

(1182, 7)

<h4>Grouping venues by category</h4>

In [59]:
paris_venues.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Afghan Restaurant,Popincourt,48.860048,2.378153,Afghanistan,48.862327,2.379999
African Restaurant,Opéra,48.877028,2.360472,Wally Le Saharien,48.879211,2.357852
American Restaurant,Buttes-Chaumont,48.887149,2.383096,Belushi's,48.888541,2.379129
Antique Shop,Opéra,48.877028,2.337891,Hôtel des Ventes Drouot,48.873061,2.340101
Argentinian Restaurant,Temple,48.877028,2.358704,Loco,48.873772,2.358244
...,...,...,...,...,...,...
Wine Bar,Temple,48.891966,2.378153,Willi's Wine Bar,48.891257,2.379394
Wine Shop,Temple,48.887149,2.400876,Portologia,48.885361,2.405792
Women's Store,Bourse,48.867735,2.343035,L'Appartement Sézane,48.869574,2.345060
Yoga Studio,Batignolles-Monceau,48.887035,2.306204,Espace Bikram,48.883052,2.304334


<h2>One Hot Encoding - Categories</h2>

In [60]:
paris_onehot_cat = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")
paris_onehot_cat.head()

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<h4>Adding the neighbourhood column back to the one-hot encoded data frame</h4>

In [61]:
paris_onehot_cat['Neighbourhood'] = paris_venues['Neighbourhood'] 

fixed_columns = [paris_onehot_cat.columns[-1]] + list(paris_onehot_cat.columns[:-1])
paris_onehot_cat = paris_onehot_cat[fixed_columns]

paris_onehot_cat.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,Batignolles-Monceau,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Batignolles-Monceau,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Batignolles-Monceau,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Batignolles-Monceau,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Batignolles-Monceau,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [62]:
paris_onehot_cat.shape

(1182, 199)

In [63]:
paris_grouped = paris_onehot_cat.groupby('Neighbourhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,Batignolles-Monceau,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0
1,Bourse,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.02,0.0,0.0
2,Buttes-Chaumont,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0
3,Buttes-Montmartre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0
4,Entrepôt,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0
5,Gobelins,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,...,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0
6,Hôtel-de-Ville,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
7,Louvre,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,...,0.032258,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0
8,Luxembourg,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
9,Ménilmontant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0


In [64]:
paris_grouped.shape

(20, 199)
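
<p>Each value in <code>paris_grouped</code> is the share of a neighbourhood's venues that fall into a category, because the mean of a 0/1 column is a proportion. For example, 0.016393 for Batignolles-Monceau corresponds to 1 venue out of 61. The sketch below reproduces the same calculation in plain Python on made-up rows, to show what <code>get_dummies</code> followed by <code>groupby(...).mean()</code> computes; the data and variable names are illustrative only.</p>

```python
from collections import Counter, defaultdict

# Made-up (neighbourhood, venue category) pairs, not real Foursquare data
venues = [
    ("A", "Bakery"), ("A", "Hotel"), ("A", "Bakery"), ("A", "Café"),
    ("B", "Hotel"), ("B", "Bar"),
]

# Count venues per category within each neighbourhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# One column per category seen anywhere, as one-hot encoding would create
categories = sorted({category for _, category in venues})

# Share of each category among a neighbourhood's venues (mean of the 0/1 column)
freq = {
    hood: {cat: counter[cat] / sum(counter.values()) for cat in categories}
    for hood, counter in counts.items()
}

for hood, row in freq.items():
    print(hood, row)
```

<p>Every row sums to 1, so neighbourhoods with few venues (for instance a centroid that falls inside a park) have coarser frequencies than neighbourhoods that reach the limit of 100 venues.</p>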

<h4>Top 5 venues by neighborhood</h4>

In [65]:
num_top_venues = 5

for hood in paris_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Batignolles-Monceau----
                venue  freq
0   French Restaurant  0.20
1               Hotel  0.18
2  Italian Restaurant  0.08
3                Café  0.05
4              Bakery  0.05


----Bourse----
               venue  freq
0  French Restaurant  0.13
1           Wine Bar  0.05
2       Cocktail Bar  0.05
3             Bakery  0.04
4             Bistro  0.04


----Buttes-Chaumont----
                venue  freq
0   French Restaurant  0.10
1                 Bar  0.10
2         Supermarket  0.06
3  Italian Restaurant  0.04
4  Seafood Restaurant  0.04


----Buttes-Montmartre----
               venue  freq
0                Bar  0.16
1  French Restaurant  0.14
2               Café  0.05
3         Restaurant  0.05
4  Convenience Store  0.05


----Entrepôt----
               venue  freq
0  French Restaurant  0.13
1             Bistro  0.05
2               Café  0.04
3        Coffee Shop  0.04
4              Hotel  0.04


----Gobelins----
                   venue  freq
0  Vietnam

<h4>Function to sort venues in descending order</h4>

In [66]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<h4>Top 10 venues</h4>

In [68]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = paris_grouped['Neighbourhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Batignolles-Monceau,French Restaurant,Hotel,Italian Restaurant,Bakery,Japanese Restaurant,Café,Bistro,Plaza,Restaurant,Metro Station
1,Bourse,French Restaurant,Wine Bar,Cocktail Bar,Bakery,Bistro,Italian Restaurant,Hotel,Creperie,Thai Restaurant,Pedestrian Plaza
2,Buttes-Chaumont,French Restaurant,Bar,Supermarket,Sushi Restaurant,Hotel,Italian Restaurant,Bistro,Beer Bar,Seafood Restaurant,Farmers Market
3,Buttes-Montmartre,Bar,French Restaurant,Convenience Store,Restaurant,Café,Italian Restaurant,Fast Food Restaurant,Mediterranean Restaurant,Cheese Shop,Bistro
4,Entrepôt,French Restaurant,Bistro,Hotel,Coffee Shop,Café,Bar,Indian Restaurant,Asian Restaurant,Pizza Place,Fast Food Restaurant
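
<p>The column names above are built with a three-item suffix list and an exception for everything past the third position, which is enough for ten columns but would label a 21st column "21th". A small alternative, given here only as a sketch with a hypothetical helper name <code>ordinal</code>, computes the suffix explicitly:</p>

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```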


<h2>K-Means Clustering Model</h2>

Let's take 5 clusters
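
<p>The notebook does not say how the value of 5 was chosen. One common check, sketched here on synthetic data rather than on the Paris data, is to fit k-means for several values of k and compare silhouette scores. The toy points and variable names are assumptions made for illustration; with the real data, <code>X</code> would be <code>paris_grouped_clustering</code>, and k could range from 2 to 19 since there are 20 neighbourhoods.</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("k with the highest silhouette score:", best_k)
```

<p>A higher silhouette score indicates tighter, better separated clusters. On only 20 neighbourhoods the scores can be noisy, so they are a guide rather than a rule.</p>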

In [88]:
# set number of clusters
kclusters = 5
paris_grouped_clustering = paris_grouped.drop(columns='Neighbourhood')
paris_grouped_clustering
# run k-means clustering
kmeans_paris = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.02,0.0,0.0
2,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0
4,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,...,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,...,0.032258,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0


In [107]:
kmeans_paris
kmeans_paris.labels_[0:100]

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 2, 0, 4, 0, 0, 1],
      dtype=int32)
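
<p>The label array is easier to read as cluster sizes. The short sketch below counts the labels printed above (copied by hand into a list) and reports them with the 1-based cluster names used in the "Examining Clusters" section.</p>

```python
from collections import Counter

# Labels copied from the kmeans_paris.labels_ output above (0-based)
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 2, 0, 4, 0, 0, 1]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print(f"Cluster {cluster + 1}: {size} neighbourhoods")
```

<p>Fourteen of the 20 neighbourhoods share one label and three clusters contain a single neighbourhood, which is worth keeping in mind when interpreting the clusters.</p>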

<h4>Merging the neighbourhoods_venues_sorted dataframe with the paris_geo dataframe to add longitude and latitude columns</h4>

In [92]:
# add clustering labels (0-based, as returned by k-means); run this line only once,
# because inserting a column that already exists raises an error
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans_paris.labels_)

paris_merged = paris_geo

# merge neighbourhoods_venues_sorted with paris_geo to add latitude/longitude for each neighbourhood
paris_merged = paris_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='neighbourhood')

paris_merged.head(10) # check the last columns!

Unnamed: 0,longitude,latitude,neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2.306204,48.887035,Batignolles-Monceau,1,French Restaurant,Hotel,Italian Restaurant,Bakery,Japanese Restaurant,Café,Bistro,Plaza,Restaurant,Metro Station
1,2.312969,48.854714,Palais-Bourbon,1,Hotel,French Restaurant,Café,Plaza,History Museum,Italian Restaurant,Bakery,Historic Site,Garden,Park
2,2.383096,48.887149,Buttes-Chaumont,0,French Restaurant,Bar,Supermarket,Sushi Restaurant,Hotel,Italian Restaurant,Bistro,Beer Bar,Seafood Restaurant,Farmers Market
3,2.337891,48.877028,Opéra,0,French Restaurant,Hotel,Japanese Restaurant,Bistro,Cocktail Bar,Lounge,Bar,Bakery,Restaurant,Wine Bar
4,2.360472,48.875907,Entrepôt,0,French Restaurant,Bistro,Hotel,Coffee Shop,Café,Bar,Indian Restaurant,Asian Restaurant,Pizza Place,Fast Food Restaurant
5,2.362916,48.8301,Gobelins,0,Vietnamese Restaurant,French Restaurant,Chinese Restaurant,Asian Restaurant,Hotel,Creperie,Park,Plaza,Sushi Restaurant,Cambodian Restaurant
6,2.296235,48.841495,Vaugirard,0,French Restaurant,Hotel,Italian Restaurant,Bakery,Plaza,Lebanese Restaurant,Bistro,Coffee Shop,Indian Restaurant,Supermarket
7,2.445016,48.835072,Reuilly,4,Theater,Playground,Performing Arts Venue,Botanical Garden,Comedy Club,Stadium,Bike Trail,Bike Rental / Bike Share,Tennis Stadium,Fast Food Restaurant
8,2.337345,48.861938,Louvre,0,French Restaurant,Plaza,Café,Italian Restaurant,Coffee Shop,Art Museum,Historic Site,Hotel,Udon Restaurant,Cheese Shop
9,2.332321,48.849443,Luxembourg,0,French Restaurant,Pastry Shop,Plaza,Italian Restaurant,Wine Bar,Fountain,Tea Room,Bakery,Chocolate Shop,Deli / Bodega


<h4>Dropping NaN values</h4>

In [93]:
paris_merged_fix = paris_merged.dropna(subset=['Cluster Labels'])

<h4>Cluster Map</h4>

In [95]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged_fix['latitude'], paris_merged_fix['longitude'], paris_merged_fix['neighbourhood'], paris_merged_fix['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so they index the colour list directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3>Examining Clusters</h3>

Cluster 1

In [96]:
paris_merged_fix.loc[paris_merged_fix['Cluster Labels'] == 0, paris_merged_fix.columns[[1] + list(range(5, paris_merged_fix.shape[1]))]]

Unnamed: 0,latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,48.887149,Bar,Supermarket,Sushi Restaurant,Hotel,Italian Restaurant,Bistro,Beer Bar,Seafood Restaurant,Farmers Market
3,48.877028,Hotel,Japanese Restaurant,Bistro,Cocktail Bar,Lounge,Bar,Bakery,Restaurant,Wine Bar
4,48.875907,Bistro,Hotel,Coffee Shop,Café,Bar,Indian Restaurant,Asian Restaurant,Pizza Place,Fast Food Restaurant
5,48.8301,French Restaurant,Chinese Restaurant,Asian Restaurant,Hotel,Creperie,Park,Plaza,Sushi Restaurant,Cambodian Restaurant
6,48.841495,Hotel,Italian Restaurant,Bakery,Plaza,Lebanese Restaurant,Bistro,Coffee Shop,Indian Restaurant,Supermarket
8,48.861938,Plaza,Café,Italian Restaurant,Coffee Shop,Art Museum,Historic Site,Hotel,Udon Restaurant,Cheese Shop
9,48.849443,Pastry Shop,Plaza,Italian Restaurant,Wine Bar,Fountain,Tea Room,Bakery,Chocolate Shop,Deli / Bodega
11,48.862738,Coffee Shop,Gourmet Shop,Italian Restaurant,Japanese Restaurant,Bakery,Cocktail Bar,Clothing Store,Wine Bar,Bookstore
12,48.862437,Bistro,Japanese Restaurant,Bakery,Park,Bar,Café,Mexican Restaurant,Bookstore,Food & Drink Shop
13,48.845388,Hotel,Bar,Italian Restaurant,Bakery,Museum,Wine Bar,Coffee Shop,Pub,Plaza


Cluster 2

In [97]:
paris_merged_fix.loc[paris_merged_fix['Cluster Labels'] == 1, paris_merged_fix.columns[[1] + list(range(5, paris_merged_fix.shape[1]))]]

Unnamed: 0,latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,48.887035,Hotel,Italian Restaurant,Bakery,Japanese Restaurant,Café,Bistro,Plaza,Restaurant,Metro Station
1,48.854714,French Restaurant,Café,Plaza,History Museum,Italian Restaurant,Bakery,Historic Site,Garden,Park
10,48.873209,Hotel,Bakery,Art Gallery,Spa,Department Store,Sporting Goods Shop,Jewelry Store,Café,Furniture / Home Store


Cluster 3

In [98]:
paris_merged_fix.loc[paris_merged_fix['Cluster Labels'] == 2, paris_merged_fix.columns[[1] + list(range(5, paris_merged_fix.shape[1]))]]

Unnamed: 0,latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,48.857042,Circus,Bike Rental / Bike Share,Lake,Zoo Exhibit,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Exhibit


Cluster 4

In [99]:
paris_merged_fix.loc[paris_merged_fix['Cluster Labels'] == 3, paris_merged_fix.columns[[1] + list(range(5, paris_merged_fix.shape[1]))]]

Unnamed: 0,latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,48.82969,Hotel,Food & Drink Shop,Bakery,Plaza,Pizza Place,Fast Food Restaurant,Brasserie,Bistro,Sushi Restaurant


Cluster 5

In [100]:
paris_merged_fix.loc[paris_merged_fix['Cluster Labels'] == 4, paris_merged_fix.columns[[1] + list(range(5, paris_merged_fix.shape[1]))]]

Unnamed: 0,latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,48.835072,Playground,Performing Arts Venue,Botanical Garden,Comedy Club,Stadium,Bike Trail,Bike Rental / Bike Share,Tennis Stadium,Fast Food Restaurant


<h2>5. Results and Discussion</h2>

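<p>With 5 clusters, the model places 14 of the 20 neighbourhoods in a single cluster (Cluster 1), where French restaurants, bars, bistros and cafés are the most common venues; Gobelins stands out within it for its Vietnamese, Chinese and other Asian restaurants. Cluster 2 groups Batignolles-Monceau, Palais-Bourbon and Élysée, where hotels rank among the two most common venue categories. Each of the three remaining clusters contains one neighbourhood: Passy (Cluster 3) and Reuilly (Cluster 5) are characterized by outdoor and entertainment venues such as a lake, a zoo exhibit, a botanical garden, a stadium and theaters, while Observatoire forms Cluster 4. Overall, the results suggest that Paris is a multicultural city. It has restaurants that cater to all kinds of tastes, including but not limited to Italian, Middle-Eastern, Asian and French cuisines. There are gourmet shops and wine bars for a sophisticated outing, as well as bistros and cafés for taking it easy. Aside from restaurants, Paris is home to stores, museums, stadiums and hotels, along with parks and gardens for those who want to relax.</p>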

<h2>6. Conclusion</h2>

<p>The objective of this capstone project was to analyze data about the city of Paris and report on the most interesting places for a tourist to visit and stay. Paris offers all kinds of places for all kinds of people. The clustering lets us visualize which kinds of venues each neighborhood offers, so that visitors can pick a neighborhood that matches their interests, whether art and culture, sports or nature.</p>