# Capstone Project. Paris: Restaurants and Movies

## i. Introduction

As part of my Data Science Capstone Project, in these sometimes dark and uncertain times, I have decided to explore Paris venues and look for ways to help them improve their customer experience, building on the following idea:

<i>When I was living in Paris for my studies, after watching "Midnight in Paris" by Woody Allen, I had the idea of visiting all the cool places in the city that appeared in my favourite movies and taking photos there. That is how I came up with the initial idea for this project.</i>

Who might be interested in this project?

- Venue owners (mainly restaurants in our case)
- Marketing agencies
- City tourism department
- Customers 

Note that this project is a pilot: if the idea proves viable, it could be scaled up to any city or venue type.

Below is a link to the Project Description notebook published in my GitHub repository.

https://github.com/zhanikey/github-capstone/blob/master/ParisMoviesProjectDescription.ipynb

## ii. Data Usage

The project uses the following data:

1. French government open dataset to get neighbourhoods and their locations

https://www.data.gouv.fr/en/datasets/arrondissements-1/#_

2. Foursquare open API for fetching the exact location and addresses of the venues

https://ru.foursquare.com/developers/login?continue=%2Fdevelopers%2Fapps

3. Kaggle open dataset listing the movies filmed in Paris with their exact locations

https://www.kaggle.com/alhadiboublenza/movies-filmed-in-paris

4. Additional data from open sources to extend the list of movies, for example:

https://en.wikipedia.org/w/index.php?title=Category:Films_shot_in_Paris&pagefrom=Lucy%0ALucy+%282014+film%29#mw-pages
   

https://www.imdb.com/search/title/?locations=Paris,%20France&ref_=adv_prv

## iii. Methodology

The structure of our work will be as follows:

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 2>

1. Locating the main neighbourhood clusters in order to find the most visited restaurants in each area and learn what local people like

2. Exploring the neighbourhoods in Paris

3. Analyzing each neighbourhood that we have found

4. Clustering the neighbourhoods in an attempt to identify patterns

5. Creating a map of the above-mentioned clusters   
</font>
</div>

In this project, we will use the Foursquare API to explore neighborhoods in Paris. We will use the explore function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. We will use the k-means clustering algorithm to complete this task. Also, we will use the Folium library to visualize the neighborhoods in Paris and their emerging clusters.
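A minimal sketch of the clustering step, assuming a neighborhood-by-category frequency table like the one built later in the Work Zone. The four neighborhoods, three categories, and their values are invented for illustration, and `n_clusters=2` is used only because the toy table is tiny (the project itself uses 5 clusters).

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency table: each row is a neighborhood, each column the share of
# its venues in one category. All values are made up for illustration.
freq = pd.DataFrame(
    {
        'Neighborhood': ['A', 'B', 'C', 'D'],
        'French Restaurant': [0.30, 0.28, 0.05, 0.04],
        'Asian Restaurant': [0.02, 0.03, 0.25, 0.22],
        'Bar': [0.10, 0.12, 0.08, 0.09],
    }
)

# k-means runs on the numeric category columns only
features = freq.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)

# attach the cluster label to each neighborhood
freq.insert(0, 'Cluster Labels', kmeans.labels_)
print(freq[['Neighborhood', 'Cluster Labels']])
```

Neighborhoods with similar venue mixes (A and B lean French, C and D lean Asian) end up sharing a label; the label numbers themselves are arbitrary.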

## iv. Results

- We have fetched the Open Data dataset of Paris neighborhoods
- We have fetched the dataset of movies filmed in Paris
- We have transformed all our data into dataframes
- We have created a map of Paris with the neighborhoods superimposed on it
- We have used the Foursquare API to retrieve up to 100 venues within a 500-meter radius of each neighborhood, together with their categories
- We have one-hot encoded the venue categories and, for each neighborhood, calculated the mean frequency of occurrence of each category
- We have also calculated the frequency of each venue category for each neighborhood
- We have used k-means to group our neighborhoods into 5 clusters and listed their top 10 venues
- We have examined each cluster
- We have used the movies dataframe to create a map layer and added it to our existing Paris clusters map

## v. Observations and Recommendations

If you are deciding to open a restaurant, our analysis helps you know <i>what</i> cuisine is likely to be more popular <i>where</i>.
Most of the clusters have French Restaurant as their most common venue category; the 4th cluster is the exception. We limited our movies dataframe to 100 entries, but we can still clearly see that many movies were filmed very close to our cluster points.

## vi. Conclusion

This project demonstrates how a dataframe can be combined with geographical data using Python. We used Folium to build our maps, and the Foursquare API provided the venue data for our analysis. Since the data may not always be precise, I treated this project mainly as an opportunity to develop my skills and apply them directly to a practical task. As I continue to build new skills, I hope to create further notebooks using more advanced techniques and statistical methods.

# Work Zone 

# Part I. Paris Neighborhoods and venues

Before we get the data and start exploring it, let's install and import all the dependencies we will need.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy (skip if it is already installed)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium (skip if it is already installed)
import folium # map rendering library

print('Libraries imported.')


### 1. Download Data

Paris is divided into 20 neighbourhoods, called arrondissements, and luckily the city council provides us with the necessary data and locations.

https://www.data.gouv.fr/en/datasets/arrondissements-1/#_

In [124]:
!wget -q -O 'parisarr.json' https://www.data.gouv.fr/en/datasets/r/4765fe48-35fd-4536-b029-4727380ce23c
print('Data downloaded!')

Data downloaded!


Let's load the data.

In [125]:
with open('parisarr.json') as json_data:
    parisarr_data = json.load(json_data)

In [127]:
parisarr_data

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Polygon',
    'coordinates': [[[2.3962365763098292, 48.85415458748718],
      [2.39707503544599, 48.85308233164173],
      [2.397117501448112, 48.85302801982591],
      [2.3971727739650133, 48.85295732609968],
      [2.397693577586153, 48.852291211748145],
      [2.398372562482884, 48.85142278081672],
      [2.398432636842917, 48.851345942817474],
      [2.39843491343761, 48.851339031956975],
      [2.398437794439281, 48.85133029035795],
      [2.398443683453921, 48.85131242917116],
      [2.398471440861602, 48.85122824104696],
      [2.398712222802474, 48.85049791691401],
      [2.398726932176968, 48.850453300238804],
      [2.398731318425771, 48.85043999658338],
      [2.398742159800161, 48.85040711139916],
      [2.398754036634924, 48.85037176539315],
      [2.398758580768257, 48.85035824128862],
      [2.398778630020184, 48.85029856660312],
      [2.398953002953572, 48.84977956761976],
      ...

In [131]:
arrond_data = parisarr_data['features']
arrond_data[0]

{'type': 'Feature',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[2.3962365763098292, 48.85415458748718],
    [2.39707503544599, 48.85308233164173],
    [2.397117501448112, 48.85302801982591],
    [2.3971727739650133, 48.85295732609968],
    [2.397693577586153, 48.852291211748145],
    [2.398372562482884, 48.85142278081672],
    [2.398432636842917, 48.851345942817474],
    [2.39843491343761, 48.851339031956975],
    [2.398437794439281, 48.85133029035795],
    [2.398443683453921, 48.85131242917116],
    [2.398471440861602, 48.85122824104696],
    [2.398712222802474, 48.85049791691401],
    [2.398726932176968, 48.850453300238804],
    [2.398731318425771, 48.85043999658338],
    [2.398742159800161, 48.85040711139916],
    [2.398754036634924, 48.85037176539315],
    [2.398758580768257, 48.85035824128862],
    [2.398778630020184, 48.85029856660312],
    [2.398953002953572, 48.84977956761976],
    [2.399237132438608, 48.84892142919408],
    [2.399263732241717, 48.84884109492937],
    

Now that we have loaded our data, let's transform it into a dataframe.

In [168]:
# define the dataframe columns
column_names = ['C_AR', 'C_ARINSEE', 'L_AR', 'L_AROFF', 'LATITUDE', 'LONGITUDE'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [169]:
neighborhoods

| | C_AR | C_ARINSEE | L_AR | L_AROFF | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|


Now we loop through the features in order to fill our dataframe.

In [170]:
rows = []
for data in arrond_data:
    props = data['properties']
    # geom_x_y holds [latitude, longitude] for the arrondissement center
    neighborhood_lat, neighborhood_lon = props['geom_x_y']

    rows.append({'C_AR': props['c_ar'],
                 'C_ARINSEE': props['c_arinsee'],
                 'L_AR': props['l_ar'],
                 'L_AROFF': props['l_aroff'],
                 'LATITUDE': neighborhood_lat,
                 'LONGITUDE': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
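`json_normalize` is imported in the first cell but not used in this part of the notebook. As an alternative to the loop above, the property dictionaries can be flattened in one call. This is a minimal sketch assuming each feature's `properties` holds the keys the loop reads (`c_ar`, `c_arinsee`, `l_ar`, `l_aroff`, `geom_x_y`) and that `geom_x_y` is `[latitude, longitude]`, as the resulting table suggests; the two toy features reuse values from that table instead of the downloaded file. It calls `pd.json_normalize`, a top-level pandas function since version 1.0.

```python
import pandas as pd

# Two toy features mimicking parisarr_data['features']; only the property keys
# read by the loop are included (values copied from the resulting table).
features = [
    {'type': 'Feature',
     'geometry': {'type': 'Polygon', 'coordinates': []},
     'properties': {'c_ar': 11, 'c_arinsee': 75111, 'l_ar': '11ème Ardt',
                    'l_aroff': 'Popincourt', 'geom_x_y': [48.859059, 2.380058]}},
    {'type': 'Feature',
     'geometry': {'type': 'Polygon', 'coordinates': []},
     'properties': {'c_ar': 13, 'c_arinsee': 75113, 'l_ar': '13ème Ardt',
                    'l_aroff': 'Gobelins', 'geom_x_y': [48.828388, 2.362272]}},
]

# Flatten the properties dicts into columns (list values such as geom_x_y stay as-is)
props = pd.json_normalize([f['properties'] for f in features])

# geom_x_y holds [latitude, longitude]; split it into two columns
props['LATITUDE'] = [xy[0] for xy in props['geom_x_y']]
props['LONGITUDE'] = [xy[1] for xy in props['geom_x_y']]

# rename to the notebook's column names and keep them in the same order
neighborhoods_alt = props.rename(columns={
    'c_ar': 'C_AR', 'c_arinsee': 'C_ARINSEE', 'l_ar': 'L_AR', 'l_aroff': 'L_AROFF',
})[['C_AR', 'C_ARINSEE', 'L_AR', 'L_AROFF', 'LATITUDE', 'LONGITUDE']]
print(neighborhoods_alt)
```

On the real file, the same code would take `arrond_data` in place of `features`.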

Let's take a look at the resulting dataframe.

In [171]:
neighborhoods

| | C_AR | C_ARINSEE | L_AR | L_AROFF | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|
| 0 | 11 | 75111 | 11ème Ardt | Popincourt | 48.859059 | 2.380058 |
| 1 | 13 | 75113 | 13ème Ardt | Gobelins | 48.828388 | 2.362272 |
| 2 | 4 | 75104 | 4ème Ardt | Hôtel-de-Ville | 48.854341 | 2.35763 |
| 3 | 8 | 75108 | 8ème Ardt | Élysée | 48.872721 | 2.312554 |
| 4 | 18 | 75118 | 18ème Ardt | Buttes-Montmartre | 48.892569 | 2.348161 |
| 5 | 15 | 75115 | 15ème Ardt | Vaugirard | 48.840085 | 2.292826 |
| 6 | 3 | 75103 | 3ème Ardt | Temple | 48.862872 | 2.360001 |
| 7 | 2 | 75102 | 2ème Ardt | Bourse | 48.868279 | 2.342803 |
| 8 | 17 | 75117 | 17ème Ardt | Batignolles-Monceau | 48.887327 | 2.306777 |
| 9 | 5 | 75105 | 5ème Ardt | Panthéon | 48.844443 | 2.350715 |

Now we use the geopy library to fetch the coordinates of the city.

In [172]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


Let's now create a map with our neighborhoods.

In [323]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(neighborhoods.LATITUDE, neighborhoods.LONGITUDE, neighborhoods.L_AROFF):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_paris)
    
map_paris

Next, we will use the Foursquare API to explore the neighborhoods and segment them.

In [175]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of published notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


### 2. Explore neighborhoods

Here, we define a function that queries Foursquare for the venues around each neighborhood.

In [179]:
LIMIT = 100   # maximum number of venues returned per neighborhood
radius = 500  # search radius in meters

def getNearbyVenues(L_AROFF, LATITUDE, LONGITUDE, radius=500):
    """Return a dataframe with one row per venue found around each neighborhood center."""
    venues_list = []
    for name, lat, lng in zip(L_AROFF, LATITUDE, LONGITUDE):
        print(name)

        # create the API request; requests URL-encodes the query parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail loudly on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        for v in results:
            venue = v['venue']
            categories = venue.get('categories') or []
            venues_list.append((
                name,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                categories[0]['name'] if categories else None))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues
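Because `getNearbyVenues` needs network access and valid credentials, it can help to check the parsing logic offline. The sketch below is a hypothetical helper, `parse_explore_items`, that repeats the row extraction from the function above on a hand-written payload. The payload's nesting (`response` → `groups[0]` → `items` → `venue`) is assumed from the indexing used in the function, not taken from the Foursquare documentation; the first venue is copied from the `paris_venues` output and the second is invented to show a venue without categories.

```python
# Hypothetical helper mirroring the row extraction inside getNearbyVenues,
# split out so it can be exercised without network access or credentials.
def parse_explore_items(payload, name, lat, lng):
    """Turn one decoded explore response into a list of venue rows."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # some venues may come back without any category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Mock payload with the same nesting the helper indexes into
mock_payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Le Servan',
                           'location': {'lat': 48.861063, 'lng': 2.381244},
                           'categories': [{'name': 'Bistro'}]}},
                {'venue': {'name': 'Unnamed spot',
                           'location': {'lat': 48.86, 'lng': 2.38},
                           'categories': []}},
            ]
        }]
    }
}

rows = parse_explore_items(mock_payload, 'Popincourt', 48.859059, 2.380058)
for row in rows:
    print(row)
```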

In [180]:
paris_venues = getNearbyVenues(L_AROFF=neighborhoods['L_AROFF'],
                                   LATITUDE=neighborhoods['LATITUDE'],
                                   LONGITUDE=neighborhoods['LONGITUDE']
                                  )

Popincourt
Gobelins
Hôtel-de-Ville
Élysée
Buttes-Montmartre
Vaugirard
Temple
Bourse
Batignolles-Monceau
Panthéon
Luxembourg
Reuilly
Opéra
Buttes-Chaumont
Palais-Bourbon
Observatoire
Ménilmontant
Louvre
Entrepôt
Passy


In [181]:
print(paris_venues.shape)
paris_venues.head()

(1246, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Popincourt | 48.859059 | 2.380058 | Monsieur Antoine | 48.860365 | 2.378295 | Cocktail Bar |
| 1 | Popincourt | 48.859059 | 2.380058 | Le Servan | 48.861063 | 2.381244 | Bistro |
| 2 | Popincourt | 48.859059 | 2.380058 | Monsieur Matthieu | 48.861133 | 2.381144 | Wine Bar |
| 3 | Popincourt | 48.859059 | 2.380058 | Chez Aline | 48.857042 | 2.37864 | Sandwich Place |
| 4 | Popincourt | 48.859059 | 2.380058 | Ethiopia | 48.860833 | 2.38 | Ethiopian Restaurant |


Let's check how many venues were returned for each neighborhood.

In [182]:
paris_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Batignolles-Monceau | 59 | 59 | 59 | 59 | 59 | 59 |
| Bourse | 100 | 100 | 100 | 100 | 100 | 100 |
| Buttes-Chaumont | 43 | 43 | 43 | 43 | 43 | 43 |
| Buttes-Montmartre | 43 | 43 | 43 | 43 | 43 | 43 |
| Entrepôt | 100 | 100 | 100 | 100 | 100 | 100 |
| Gobelins | 61 | 61 | 61 | 61 | 61 | 61 |
| Hôtel-de-Ville | 100 | 100 | 100 | 100 | 100 | 100 |
| Louvre | 73 | 73 | 73 | 73 | 73 | 73 |
| Luxembourg | 38 | 38 | 38 | 38 | 38 | 38 |
| Ménilmontant | 47 | 47 | 47 | 47 | 47 | 47 |
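In the table above, Bourse, Entrepôt and Hôtel-de-Ville each return exactly 100 venues, the value of `LIMIT`, so those counts most likely reflect the request cap rather than the true number of venues in the area. A minimal sketch for flagging such neighborhoods, using a toy table in place of `paris_venues` (the neighborhood names `X` and `Y` and their counts are made up):

```python
import pandas as pd

LIMIT = 100  # same cap as in getNearbyVenues

# Toy venue table standing in for paris_venues: 'X' hits the cap, 'Y' does not
venues = pd.DataFrame({
    'Neighborhood': ['X'] * LIMIT + ['Y'] * 43,
    'Venue': ['venue {}'.format(i) for i in range(LIMIT + 43)],
})

# number of venues returned per neighborhood
counts = venues.groupby('Neighborhood').size()

# neighborhoods whose result set was probably truncated by the request limit
capped = counts[counts >= LIMIT].index.tolist()
print(counts)
print('Capped at LIMIT:', capped)
```

For capped neighborhoods, the category frequencies computed later describe only the first 100 venues returned, not the full area.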


Now let's check how many different venue categories we have.

In [183]:
print('There are {} unique categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 198 unique categories.


### 3. Analyzing each neighborhood

Now we analyze each neighborhood by one-hot encoding its venue categories.

In [185]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auvergne Restaurant,Baby Store,Bakery,Bank,Bar,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Cambodian Restaurant,Canal,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Corsican Restaurant,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gay Bar,General College & University,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Liquor Store,Lounge,Lyonese Bouchon,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,New American Restaurant,Nightclub,Noodle House,Okonomiyaki Restaurant,Optical Shop,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Resort,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Science Museum,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,Popincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Popincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Popincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
3,Popincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Popincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [187]:
paris_onehot.shape

(1246, 199)

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category.

In [188]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auvergne Restaurant,Baby Store,Bakery,Bank,Bar,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Cambodian Restaurant,Canal,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Corsican Restaurant,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gay Bar,General College & University,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Liquor Store,Lounge,Lyonese Bouchon,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,New American Restaurant,Nightclub,Noodle House,Okonomiyaki Restaurant,Optical Shop,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Resort,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Science Museum,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,Batignolles-Monceau,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.050847,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.016949,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.067797,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186441,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.152542,0.0,0.0,0.0,0.0,0.0,0.0,0.084746,0.050847,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.050847,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0
1,Bourse,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.06,0.02,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.13,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.01,0.0,0.0
2,Buttes-Chaumont,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093023,0.0,0.0,0.046512,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.093023,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.069767,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.069767,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0
3,Buttes-Montmartre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.162791,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.116279,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.046512,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0
4,Entrepôt,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.13,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.01,0.04,0.0,0.0,0.01,0.02,0.03,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0
5,Gobelins,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.196721,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.016393,0.0,0.0,0.0,0.081967,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081967,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.04918,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081967,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.229508,0.0,0.0,0.0,0.0,0.0,0.0
6,Hôtel-de-Ville,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.14,0.0,0.01,0.03,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.05,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0
7,Louvre,0.0,0.0,0.0,0.0,0.0,0.0,0.041096,0.0,0.0,0.0,0.0,0.0,0.027397,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.013699,0.0,0.027397,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.027397,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.109589,0.0,0.013699,0.027397,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.027397,0.0,0.0,0.068493,0.0,0.0,0.0,0.0,0.0,0.0,0.054795,0.082192,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.068493,0.0,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.013699,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.027397,0.027397,0.0,0.0,0.0,0.0,0.027397,0.0,0.0,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.0
8,Luxembourg,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.078947,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0
9,Ménilmontant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.106383,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.06383,0.06383,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0


In [189]:
# confirming new size

paris_grouped.shape

(20, 199)

Let's list each neighborhood along with its top 5 venues.

In [190]:
num_top_venues = 5

for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Batignolles-Monceau----
                venue  freq
0   French Restaurant  0.19
1               Hotel  0.15
2  Italian Restaurant  0.08
3                Café  0.07
4              Bakery  0.05


----Bourse----
               venue  freq
0  French Restaurant  0.13
1       Cocktail Bar  0.06
2           Wine Bar  0.06
3              Hotel  0.05
4             Bakery  0.04


----Buttes-Chaumont----
               venue  freq
0  French Restaurant  0.09
1                Bar  0.09
2        Supermarket  0.07
3              Hotel  0.07
4             Bistro  0.05


----Buttes-Montmartre----
               venue  freq
0                Bar  0.16
1  French Restaurant  0.12
2              Hotel  0.05
3         Restaurant  0.05
4  Convenience Store  0.05


----Entrepôt----
               venue  freq
0  French Restaurant  0.13
1        Coffee Shop  0.05
2              Hotel  0.05
3  Indian Restaurant  0.04
4               Café  0.04


----Gobelins----
                   venue  freq
0  Vietnamese Re
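The loop above prints each neighborhood separately. The same top-N lookup can be collected into a dictionary with `Series.nlargest`. This sketch assumes a frame shaped like `paris_grouped` (a `Neighborhood` column followed only by numeric frequency columns). `toy_grouped` and `top_venues` are hypothetical names, and the frequencies are made up.

```python
import pandas as pd

# Hypothetical stand-in for paris_grouped: one row per neighborhood,
# one column per venue category holding its relative frequency.
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bar": [0.10, 0.50],
    "Café": [0.30, 0.20],
    "Hotel": [0.60, 0.30],
})

def top_venues(grouped: pd.DataFrame, n: int = 5) -> dict:
    """Return {neighborhood: [(venue, freq), ...]} with the n most frequent venues."""
    freqs = grouped.set_index("Neighborhood")
    result = {}
    for hood, row in freqs.iterrows():
        # nlargest keeps the n highest values, sorted in descending order
        top = row.nlargest(n).round(2)
        result[hood] = list(top.items())
    return result

print(top_venues(toy_grouped, n=2))
```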

Let's put the top 10 venues of each neighborhood into a dataframe.

In [191]:
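def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep the venue frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    # names of the num_top_venues most frequent venue categories
    return row_categories_sorted.index.values[0:num_top_venues]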

In [193]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # indicators only covers 1st-3rd; use "th" for the rest (correct up to 20)
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Batignolles-Monceau,French Restaurant,Hotel,Italian Restaurant,Café,Plaza,Japanese Restaurant,Bakery,Wine Shop,Bistro,Pizza Place
1,Bourse,French Restaurant,Wine Bar,Cocktail Bar,Hotel,Bakery,Bistro,Creperie,Ice Cream Shop,Thai Restaurant,Concert Hall
2,Buttes-Chaumont,French Restaurant,Bar,Supermarket,Hotel,Seafood Restaurant,Beer Bar,Bistro,Music Store,Coffee Shop,Steakhouse
3,Buttes-Montmartre,Bar,French Restaurant,Hotel,Restaurant,Coffee Shop,Pizza Place,Convenience Store,Deli / Bodega,Seafood Restaurant,Café
4,Entrepôt,French Restaurant,Hotel,Coffee Shop,Café,Indian Restaurant,Bistro,Pizza Place,Japanese Restaurant,Bakery,Seafood Restaurant
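The `indicators` list with its fallback covers the ten columns needed here, but it would produce labels such as "21th" or "22th" if `num_top_venues` ever exceeded 20. A small helper applying the general English rule is sketched here; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11th, 12th and 13th (and 111th, 112th, ...) are exceptions: they take 'th'
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as in the notebook, for 10 top venues
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], columns[-1])
```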


### 4. Clustering neighborhoods

Let's run k-means to cluster our neighborhoods. 
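k-means alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, until the assignments stop changing. This sketch implements that loop (Lloyd's algorithm) in plain Python for illustration only. scikit-learn's `KMeans` uses k-means++ initialization by default, so its labels and cluster numbering will differ from this sketch. In our case each point would be one row of venue-category frequencies; the points used here are hypothetical.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means on a list of coordinate tuples: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
    return labels, centroids

# Two well-separated groups of hypothetical 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```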

In [195]:
# set number of clusters
kclusters = 5

# k-means needs numeric features only, so drop the name column
paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 2, 2, 2, 4, 2, 2, 2, 2], dtype=int32)

In [204]:
# add clustering labels (run once: inserting an existing column raises a ValueError)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_merged = neighborhoods

# join the top venues and cluster labels to the Paris neighborhoods data, which holds latitude/longitude for each neighborhood
paris_merged = paris_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='L_AROFF')

paris_merged.head() # check the last columns!

Unnamed: 0,C_AR,C_ARINSEE,L_AR,L_AROFF,LATITUDE,LONGITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,11,75111,11ème Ardt,Popincourt,48.859059,2.380058,2,French Restaurant,Café,Supermarket,Restaurant,Wine Bar,Pastry Shop,Italian Restaurant,Cocktail Bar,Bar,Bakery
1,13,75113,13ème Ardt,Gobelins,48.828388,2.362272,4,Vietnamese Restaurant,Asian Restaurant,French Restaurant,Chinese Restaurant,Thai Restaurant,Juice Bar,Coffee Shop,Park,Cambodian Restaurant,Cosmetics Shop
2,4,75104,4ème Ardt,Hôtel-de-Ville,48.854341,2.35763,2,French Restaurant,Ice Cream Shop,Hotel,Plaza,Italian Restaurant,Clothing Store,Pedestrian Plaza,Garden,Wine Bar,Art Gallery
3,8,75108,8ème Ardt,Élysée,48.872721,2.312554,0,French Restaurant,Hotel,Art Gallery,Spa,Theater,Plaza,Cocktail Bar,Park,Resort,Modern European Restaurant
4,18,75118,18ème Ardt,Buttes-Montmartre,48.892569,2.348161,2,Bar,French Restaurant,Hotel,Restaurant,Coffee Shop,Pizza Place,Convenience Store,Deli / Bodega,Seafood Restaurant,Café


Let's visualize the resulting clusters on a map.

In [207]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(paris_merged['LATITUDE'], paris_merged['LONGITUDE'], paris_merged['L_AROFF'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # cluster labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
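The marker colors come from matplotlib's rainbow colormap. If matplotlib is not available, a comparable palette can be built with the standard-library `colorsys` module. In this sketch `cluster_colors` is a hypothetical helper, and cluster label `i` simply indexes `palette[i]`.

```python
import colorsys

def cluster_colors(k):
    """Return k distinct hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_colors(5)
# cluster 0 -> palette[0], ..., cluster 4 -> palette[4]
print(palette)
```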

### 5. Examining clusters

Now let's examine each cluster:

<b> Cluster 0 </b>

In [214]:
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[3] + list(range(6, paris_merged.shape[1]))]]

Unnamed: 0,L_AROFF,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Élysée,0,French Restaurant,Hotel,Art Gallery,Spa,Theater,Plaza,Cocktail Bar,Park,Resort,Modern European Restaurant
8,Batignolles-Monceau,0,French Restaurant,Hotel,Italian Restaurant,Café,Plaza,Japanese Restaurant,Bakery,Wine Shop,Bistro,Pizza Place
14,Palais-Bourbon,0,Hotel,French Restaurant,Italian Restaurant,Café,Plaza,History Museum,Cocktail Bar,Coffee Shop,Art Museum,Dessert Shop
15,Observatoire,0,French Restaurant,Hotel,Bistro,Supermarket,Convenience Store,Bakery,Food & Drink Shop,Brasserie,Pizza Place,Sushi Restaurant


<b> Cluster 1 </b>

In [215]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[3] + list(range(6, paris_merged.shape[1]))]]

Unnamed: 0,L_AROFF,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Reuilly,1,Zoo Exhibit,Supermarket,Monument / Landmark,Zoo,Antique Shop,Argentinian Restaurant,Food & Drink Shop,Flower Shop,Fish & Chips Shop,Fast Food Restaurant


<b> Cluster 2 </b>

In [216]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[3] + list(range(6, paris_merged.shape[1]))]]

Unnamed: 0,L_AROFF,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Popincourt,2,French Restaurant,Café,Supermarket,Restaurant,Wine Bar,Pastry Shop,Italian Restaurant,Cocktail Bar,Bar,Bakery
2,Hôtel-de-Ville,2,French Restaurant,Ice Cream Shop,Hotel,Plaza,Italian Restaurant,Clothing Store,Pedestrian Plaza,Garden,Wine Bar,Art Gallery
4,Buttes-Montmartre,2,Bar,French Restaurant,Hotel,Restaurant,Coffee Shop,Pizza Place,Convenience Store,Deli / Bodega,Seafood Restaurant,Café
5,Vaugirard,2,Hotel,Italian Restaurant,French Restaurant,Bakery,Coffee Shop,Lebanese Restaurant,Japanese Restaurant,Wine Shop,Park,Brasserie
6,Temple,2,French Restaurant,Japanese Restaurant,Gourmet Shop,Italian Restaurant,Cocktail Bar,Art Gallery,Coffee Shop,Wine Bar,Bakery,Sandwich Place
7,Bourse,2,French Restaurant,Wine Bar,Cocktail Bar,Hotel,Bakery,Bistro,Creperie,Ice Cream Shop,Thai Restaurant,Concert Hall
9,Panthéon,2,French Restaurant,Italian Restaurant,Bakery,Science Museum,Coffee Shop,Plaza,Café,Bar,Hotel,Greek Restaurant
10,Luxembourg,2,French Restaurant,Pastry Shop,Bistro,Cocktail Bar,Fountain,Café,Lebanese Restaurant,Market,Shopping Mall,Miscellaneous Shop
12,Opéra,2,French Restaurant,Hotel,Cocktail Bar,Bakery,Bistro,Wine Bar,Japanese Restaurant,Lounge,Plaza,Bar
13,Buttes-Chaumont,2,French Restaurant,Bar,Supermarket,Hotel,Seafood Restaurant,Beer Bar,Bistro,Music Store,Coffee Shop,Steakhouse


<b> Cluster 3 </b>

In [217]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[3] + list(range(6, paris_merged.shape[1]))]]

Unnamed: 0,L_AROFF,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Passy,3,Park,Plaza,Lake,French Restaurant,Boat or Ferry,Art Museum,Bus Station,Bus Stop,Donut Shop,Fast Food Restaurant


<b> Cluster 4 </b>

In [218]:
paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[3] + list(range(6, paris_merged.shape[1]))]]

Unnamed: 0,L_AROFF,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Gobelins,4,Vietnamese Restaurant,Asian Restaurant,French Restaurant,Chinese Restaurant,Thai Restaurant,Juice Bar,Coffee Shop,Park,Cambodian Restaurant,Cosmetics Shop
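Instead of filtering `paris_merged` once per cluster, a single `groupby` can summarize all clusters at once. This sketch runs on a small excerpt copied from the tables above (`toy_merged` and `summarize_clusters` are illustrative names) and reports, per cluster, the number of neighborhoods and the most frequent 1st Most Common Venue.

```python
import pandas as pd

# Excerpt of paris_merged (only the columns used here)
toy_merged = pd.DataFrame({
    "L_AROFF": ["Élysée", "Reuilly", "Popincourt", "Bourse"],
    "Cluster Labels": [0, 1, 2, 2],
    "1st Most Common Venue": ["French Restaurant", "Zoo Exhibit",
                              "French Restaurant", "French Restaurant"],
})

def summarize_clusters(df):
    """Per cluster: number of neighborhoods and the most frequent top venue."""
    return df.groupby("Cluster Labels").agg(
        neighborhoods=("L_AROFF", "count"),
        # mode() returns the most frequent value(s); take the first one
        top_venue=("1st Most Common Venue", lambda s: s.mode().iloc[0]),
    )

summary = summarize_clusters(toy_merged)
print(summary)
```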


Now let's continue by adding the movie dataset.

# Part 2. Movies filmed in Paris

First, let's install the packages we need.

In [221]:
%pip install kaggle


To download a Kaggle dataset, we have to provide our Kaggle API credentials (username and key) and the identifier of the dataset.

In [229]:
!mkdir -p ~/.kaggle
# replace the placeholder with your own API key; never publish a real key
!echo '{"username":"zhanikey","key":"YOUR_KAGGLE_API_KEY"}' > ~/.kaggle/kaggle.json
!kaggle datasets download -d alhadiboublenza/movies-filmed-in-paris

movies-filmed-in-paris.zip: Skipping, found more recently modified local copy (use --force to force download)


In [232]:
# make the credentials file readable only by its owner (the Kaggle CLI warns otherwise)
!chmod 600 ~/.kaggle/kaggle.json
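The credentials file can also be written from Python, which avoids quoting issues with `echo` and sets the permissions in the same step. This is a sketch: `write_kaggle_credentials` is a hypothetical helper, and the usage writes to a temporary directory so that no real key is touched. By default the Kaggle CLI reads `~/.kaggle/kaggle.json`; it can also take the credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.

```python
import json
import os
import stat
import tempfile
from pathlib import Path

def write_kaggle_credentials(username, key, config_dir=Path.home() / ".kaggle"):
    """Write kaggle.json and restrict it to owner read/write (mode 600)."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    cred_path = config_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(cred_path, 0o600)  # other users must not be able to read the key
    return cred_path

# Usage sketch: write into a throwaway directory instead of the real ~/.kaggle
with tempfile.TemporaryDirectory() as tmp:
    path = write_kaggle_credentials("your-username", "YOUR_KAGGLE_API_KEY", Path(tmp) / ".kaggle")
    saved = json.loads(path.read_text())
    mode = stat.S_IMODE(path.stat().st_mode)
```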

In [240]:
# I have already unzipped the file, so the unzip command is commented out

# !unzip 'movies-filmed-in-paris.zip'

movies = pd.read_csv('tournagesdefilmsparis2011_v3.csv')

In [239]:
movies.head()

Unnamed: 0,titre,realisateur,adresse,organisme_demandeur,type_de_tournage,ardt,date_debut,date_fin,xy
0,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,RUE ROCHER/DE MADRID ET PORTALIS,BIG BAND STORY,TELEFILM,75008,2016-03-31,2016-03-31,"48.878256,2.320229"
1,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,PLACE SAINT AUGUSTIN,BIG BAND STORY,TELEFILM,75008,2016-03-31,2016-03-31,"48.875273,2.319789"
2,UNE FEMME D'EXCEPTION,ERIC GUIRADO,34 QUAI DE LA MARNE,CINETEVE,TELEFILM,75019,2016-10-05,2016-10-05,"48.890336,2.384201"
3,"ALICE NEVERS, LE JUGE EST UNE FEMME/84 ET 85",ERIC LE ROUX,RUE EMILE ZOLA,EGO PRODUCTION,SERIE TELEVISEE,75015,2016-10-12,2016-10-12,"48.846601,2.28608"
4,THE PACKAGE,CHANG KEUN CHUN,PONT NEUF,BH PARIS MEDIA PRODUCTIONS,SERIE TELEVISEE,75001,2016-10-12,2016-10-12,"48.860317,2.344139"


In [246]:
# Split the xy column ("lat,lon") into separate latitude and longitude columns

# new data frame with one column per split value
movies_xysplit = movies["xy"].str.split(",", n=1, expand=True)

# latitude from the first part
movies["LATITUDE"] = movies_xysplit[0]

# longitude from the second part
movies["LONGITUDE"] = movies_xysplit[1]

# drop the original xy column
movies.drop(columns=["xy"], inplace=True)

# display the dataframe
movies.head()

Unnamed: 0,titre,realisateur,adresse,organisme_demandeur,type_de_tournage,ardt,date_debut,date_fin,LATITUDE,LONGITUDE
0,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,RUE ROCHER/DE MADRID ET PORTALIS,BIG BAND STORY,TELEFILM,75008,2016-03-31,2016-03-31,48.878256,2.320229
1,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,PLACE SAINT AUGUSTIN,BIG BAND STORY,TELEFILM,75008,2016-03-31,2016-03-31,48.875273,2.319789
2,UNE FEMME D'EXCEPTION,ERIC GUIRADO,34 QUAI DE LA MARNE,CINETEVE,TELEFILM,75019,2016-10-05,2016-10-05,48.890336,2.384201
3,"ALICE NEVERS, LE JUGE EST UNE FEMME/84 ET 85",ERIC LE ROUX,RUE EMILE ZOLA,EGO PRODUCTION,SERIE TELEVISEE,75015,2016-10-12,2016-10-12,48.846601,2.28608
4,THE PACKAGE,CHANG KEUN CHUN,PONT NEUF,BH PARIS MEDIA PRODUCTIONS,SERIE TELEVISEE,75001,2016-10-12,2016-10-12,48.860317,2.344139
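Splitting and converting can be done in one step, which avoids the `object` dtype for the coordinates that has to be fixed later on. A sketch on two rows in the same `lat,lon` format as the `xy` column; the first is copied from the output above, the second is hypothetical and has an extra space to show why `str.strip` helps.

```python
import pandas as pd

toy = pd.DataFrame({"xy": ["48.878256,2.320229", "48.860317, 2.344139"]})

# expand=True returns one column per piece; n=1 splits at the first comma only
parts = toy["xy"].str.split(",", n=1, expand=True)
# strip stray spaces, then convert; unparseable values become NaN
toy["LATITUDE"] = pd.to_numeric(parts[0].str.strip(), errors="coerce")
toy["LONGITUDE"] = pd.to_numeric(parts[1].str.strip(), errors="coerce")
toy = toy.drop(columns=["xy"])
print(toy.dtypes)
```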


In [247]:
movies.shape

(2805, 10)

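Let's drop the columns we will not use: the requesting production company (`organisme_demandeur`) and the shooting start and end dates (`date_debut`, `date_fin`). The remaining columns hold the title (`titre`), director (`realisateur`), address (`adresse`), shoot type (`type_de_tournage`), arrondissement postal code (`ardt`) and coordinates.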

In [248]:
movies_clean = movies.drop(['organisme_demandeur', 'date_debut', 'date_fin'], axis=1)
movies_clean.head()

Unnamed: 0,titre,realisateur,adresse,type_de_tournage,ardt,LATITUDE,LONGITUDE
0,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,RUE ROCHER/DE MADRID ET PORTALIS,TELEFILM,75008,48.878256,2.320229
1,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,PLACE SAINT AUGUSTIN,TELEFILM,75008,48.875273,2.319789
2,UNE FEMME D'EXCEPTION,ERIC GUIRADO,34 QUAI DE LA MARNE,TELEFILM,75019,48.890336,2.384201
3,"ALICE NEVERS, LE JUGE EST UNE FEMME/84 ET 85",ERIC LE ROUX,RUE EMILE ZOLA,SERIE TELEVISEE,75015,48.846601,2.28608
4,THE PACKAGE,CHANG KEUN CHUN,PONT NEUF,SERIE TELEVISEE,75001,48.860317,2.344139


In [249]:
movies_clean['type_de_tournage'].unique()

array(['TELEFILM', 'SERIE TELEVISEE', 'LONG METRAGE'], dtype=object)

In [256]:
# I have added some movies manually to show how the dataset can be extended with extra filming locations

other_movies = pd.DataFrame({"titre":["INCEPTION", "DA VINCI CODE"], 
                             "realisateur":["Christopher Nolan", "Ron Howard"],
                             "adresse":["Pont de Bir-Hakeim", "2 Rue Palatine"],
                             "type_de_tournage":["LONG METRAGE","LONG METRAGE"],
                             "ardt":["75015", "75006"],
                             "LATITUDE":[48.8555961, 48.8510095],
                             "LONGITUDE":[2.285403, 2.3328204]
                            })

In [259]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
movies_clean = pd.concat([movies_clean, other_movies], ignore_index=True)
movies_clean.head()

Unnamed: 0,titre,realisateur,adresse,type_de_tournage,ardt,LATITUDE,LONGITUDE
0,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,RUE ROCHER/DE MADRID ET PORTALIS,TELEFILM,75008,48.878256,2.320229
1,COUP DE FOUDRE A JAIPUR,ARNAULD MERCADIER,PLACE SAINT AUGUSTIN,TELEFILM,75008,48.875273,2.319789
2,UNE FEMME D'EXCEPTION,ERIC GUIRADO,34 QUAI DE LA MARNE,TELEFILM,75019,48.890336,2.384201
3,"ALICE NEVERS, LE JUGE EST UNE FEMME/84 ET 85",ERIC LE ROUX,RUE EMILE ZOLA,SERIE TELEVISEE,75015,48.846601,2.28608
4,THE PACKAGE,CHANG KEUN CHUN,PONT NEUF,SERIE TELEVISEE,75001,48.860317,2.344139


In [260]:
movies_clean.shape

(2807, 7)

The dataset contains not only feature films (`LONG METRAGE`) but also TV films (`TELEFILM`) and TV series (`SERIE TELEVISEE`). Let's keep only the feature films.

In [284]:
is_movie = movies_clean['type_de_tournage'] == "LONG METRAGE"
movies_only = movies_clean[is_movie]
movies_only.head()

Unnamed: 0,titre,realisateur,adresse,type_de_tournage,ardt,LATITUDE,LONGITUDE
14,GC5,GUILLAUME CANET,AVENUE RUYSDAEL,LONG METRAGE,75008,48.878173,2.310065
15,IRIS/EX CHAOS,JALIL LESPERT,61 QUAI DE GRENELLE,LONG METRAGE,75015,48.8495,2.282467
16,GC5,GUILLAUME CANET,ROUTE DE SEVRES,LONG METRAGE,75016,48.843191,2.257667
17,MON POUSSIN,FREDERIC FORESTIER,12 AVENUE TRUDAINE,LONG METRAGE,75009,48.881483,2.343411
18,SI J'ETAIS UN HOMME,AUDREY DANA,RUE PETIT,LONG METRAGE,75019,48.885369,2.385804


In [285]:
movies_only.shape

(1869, 7)

In [286]:
movies_only = movies_only.dropna()
movies_only.shape

(1781, 7)

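There are still 1,781 filming locations, so for the sake of this demonstration let's limit the analysis to the first 100 rows. Note that each row is a filming location rather than a movie, so several rows can belong to the same movie.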

In [361]:
movies_limit = 100
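# .copy() makes an independent dataframe, so later column assignments do not trigger SettingWithCopyWarning
movies_only_limited = movies_only.iloc[0:movies_limit, :].copy()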
movies_only_limited.shape

(100, 7)
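Because one movie can have several filming locations (GC5 appears in rows 14 and 16), the first 100 rows contain fewer than 100 distinct movies, and they simply follow the file order. This sketch, on the five rows shown earlier, counts distinct titles with `nunique` and takes a reproducible random sample with `DataFrame.sample` as an alternative to slicing the first rows; `locations` is an illustrative name.

```python
import pandas as pd

locations = pd.DataFrame({
    "titre": ["GC5", "IRIS/EX CHAOS", "GC5", "MON POUSSIN", "SI J'ETAIS UN HOMME"],
    "adresse": ["AVENUE RUYSDAEL", "61 QUAI DE GRENELLE", "ROUTE DE SEVRES",
                "12 AVENUE TRUDAINE", "RUE PETIT"],
})

n_locations = len(locations)
n_movies = locations["titre"].nunique()  # number of distinct titles

# a fixed random_state makes the sample reproducible between runs
sample = locations.sample(n=3, random_state=42)
print(n_locations, n_movies)
```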

In [362]:
movies_only_limited.dtypes

titre               object
realisateur         object
adresse             object
type_de_tournage    object
ardt                object
LATITUDE            object
LONGITUDE           object
dtype: object

In [363]:
# remove stray spaces, then convert to float; values that cannot be parsed become NaN
movies_only_limited['LONGITUDE'] = pd.to_numeric(movies_only_limited['LONGITUDE'].str.replace(' ',''), errors='coerce')
movies_only_limited['LATITUDE'] = pd.to_numeric(movies_only_limited['LATITUDE'].str.replace(' ',''), errors='coerce')
movies_only_limited.dtypes

titre                object
realisateur          object
adresse              object
type_de_tournage     object
ardt                 object
LATITUDE            float64
LONGITUDE           float64
dtype: object
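With `errors='coerce'`, `pd.to_numeric` turns every value it cannot parse into `NaN` instead of raising an error, so it is worth checking how many coordinates were lost. A small sketch on hypothetical strings:

```python
import pandas as pd

raw = pd.Series(["48.878256", " 48.8495", "not a number", None])
# strip surrounding spaces, then convert; invalid or missing entries become NaN
coords = pd.to_numeric(raw.str.strip(), errors="coerce")
n_invalid = int(coords.isna().sum())
print(n_invalid)
```

Rows with missing coordinates can then be dropped with `dropna` before plotting.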

In [364]:
movies_only_limited.shape

(100, 7)

In [365]:
movies_only_limited.head()

Unnamed: 0,titre,realisateur,adresse,type_de_tournage,ardt,LATITUDE,LONGITUDE
14,GC5,GUILLAUME CANET,AVENUE RUYSDAEL,LONG METRAGE,75008,48.878173,2.310065
15,IRIS/EX CHAOS,JALIL LESPERT,61 QUAI DE GRENELLE,LONG METRAGE,75015,48.8495,2.282467
16,GC5,GUILLAUME CANET,ROUTE DE SEVRES,LONG METRAGE,75016,48.843191,2.257667
17,MON POUSSIN,FREDERIC FORESTIER,12 AVENUE TRUDAINE,LONG METRAGE,75009,48.881483,2.343411
18,SI J'ETAIS UN HOMME,AUDREY DANA,RUE PETIT,LONG METRAGE,75019,48.885369,2.385804


In [366]:
# We have already defined Paris coordinates so let's just use them.

# create map and display it
movies_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of Paris
movies_map

Now let's add the first 100 filming locations to the cluster map, so that the neighborhood clusters and the movie locations appear together.

In [368]:
import folium
from folium import plugins

# instantiate a marker cluster object for the filming locations
movies_mark = plugins.MarkerCluster().add_to(map_clusters)

# loop through the dataframe and add each filming location to the marker cluster
for lat, lng, label in zip(movies_only_limited.LATITUDE, movies_only_limited.LONGITUDE, movies_only_limited.titre):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=folium.Popup(label, parse_html=True),
    ).add_to(movies_mark)

# display map
map_clusters

Here we go! The map now shows our neighborhood clusters together with the locations where the movies were filmed. How can we use it? When opening a restaurant, we can see at a glance which nearby places might attract movie-loving customers. And since the neighborhoods are grouped into clusters by their venue mix, the map should also help us choose a location that diversifies the local offer.
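To go beyond eyeballing the map, one could list the filming locations within a given distance of a neighborhood centre. This sketch assumes that straight-line (great-circle) distance computed with the haversine formula is good enough at city scale; `haversine_km` and `locations_within` are hypothetical helpers, the coordinates are taken from the tables above, and the 4 km radius is arbitrary.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def locations_within(center, locations, radius_km):
    """Return the titles of locations within radius_km of center (lat, lon)."""
    return [title for title, lat, lon in locations
            if haversine_km(center[0], center[1], lat, lon) <= radius_km]

gobelins = (48.828388, 2.362272)  # neighborhood centre from paris_merged
shoots = [("INCEPTION", 48.8555961, 2.285403),
          ("DA VINCI CODE", 48.8510095, 2.3328204),
          ("MON POUSSIN", 48.881483, 2.343411)]
nearby = locations_within(gobelins, shoots, radius_km=4)
print(nearby)
```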