**BUSINESS PROBLEM**

An entrepreneur would like to open a café specializing in cheesecakes in Dubai. With its diverse population and its status as a tourist hub, Dubai is considered a great choice for café and restaurant owners looking to start or expand a business. The entrepreneur wants the café to be affordable and easily accessible to the public. 
Given the price level and the need for easy public access, the café should be located in a Dubai community with good footfall from both residents and tourists. 
As a resident of Dubai for 20 years, I intend to identify the most suitable communities for the café using unsupervised machine learning combined with first-hand knowledge of the city. 
Although this business problem is specific to one café owner, the model can also be applied by any entrepreneur looking to open a new restaurant or café. 

**DATA**

To build this model, 3 different data sources will be used:

1)	List of Communities in Dubai
https://en.wikipedia.org/wiki/List_of_communities_in_Dubai

This data will be retrieved from the URL by web scraping with the pandas package in Python.

2)	Geospatial data of the Communities in Dubai from the above list
The latitude and longitude of the communities in Dubai will be retrieved using the geopy package in Python.
This data will then be merged with the data obtained from Wikipedia to create the base data. 

3)	Top Venues per Community
The top venues per community will be retrieved through the Foursquare API, using the data collected in points 1 and 2 as base data.

**METHODOLOGY**

After obtaining the base data (the list of communities along with their coordinates), we use the Foursquare API to segment the communities and find the most common venue categories in each one. 

We then use K-means clustering to group the communities and provide a shortlist of suitable areas for opening the café. 

We also use the Folium library for map visualizations.

First, we import all the libraries required for the project.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


**For our first data source, the Wikipedia page listing all the communities in Dubai, we scrape the table using the read_html function from the pandas library.**

In [2]:
dubai_raw = pd.read_html("https://en.wikipedia.org/wiki/List_of_communities_in_Dubai")

In [3]:
dubai = pd.DataFrame(dubai_raw[0])

In [4]:
dubai.head()

Unnamed: 0,Community Number,Community (English),Community (Arabic),Area(km2),Population(2000),Population density(/km2)
0,126.0,Abu Hail,أبو هيل,1.27 km²,21414.0,"16,861.4/km²"
1,711.0,Al Awir First,العوير الأولى,,,
2,721.0,Al Awir Second,العوير الثانية,,,
3,333.0,Al Bada,البدع,0.82 km²,18816.0,22946/km²
4,122.0,Al Baraha,البراحة,1.104 km²,7823.0,"7,086/km²"


We now rename the columns and drop all rows that do not have a population density. We also drop columns that do not help build our case. One community has no community number; since it is an important one, we keep the row and fill in the missing number (345 in the code below) instead of dropping it.

In [5]:
dubai.dropna(subset=['Population density(/km2)'], inplace=True)
dubai['Community Number'].fillna(345, inplace=True)
dubai['Community Number'] = dubai['Community Number'].astype(np.int64)
dubai.drop(columns=['Community (Arabic)', 'Area(km2)', 'Population(2000)'], inplace=True)
dubai.rename(columns={"Community Number": "Community Code", "Community (English)": "Community Name", "Population density(/km2)": "Pop Density"}, inplace=True)
dubai

Unnamed: 0,Community Code,Community Name,Pop Density
0,126,Abu Hail,"16,861.4/km²"
3,333,Al Bada,22946/km²
4,122,Al Baraha,"7,086/km²"
11,114,Al Buteen,"33,771/km²"
12,113,Al Dhagaya,"21,451/km²"
13,214,Al Garhoud,"1,116.5/km²"
15,313,"Al Hamriya, Dubai","20,890/km²"
16,131,Al Hamriya Port,93.25/km²
17,322,Al Hudaiba,"9,165/km²"
18,326,Al Jaddaf,409.5/km²
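
The `Pop Density` column is still text at this point (for example `16,861.4/km²`), so it cannot be sorted or compared numerically. A minimal sketch of one way to convert it, assuming every value follows the `number/km²` pattern seen in the table; `parse_density` is a hypothetical helper, not part of the original notebook:

```python
import re

def parse_density(text):
    """Convert a string such as '16,861.4/km²' to a float (people per km²).

    Hypothetical helper: assumes the value starts with a number, optionally
    with thousands separators. Returns None if no leading number is found.
    """
    match = re.match(r"\s*([\d,]+(?:\.\d+)?)", str(text))
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

# Sample strings copied from the table output.
samples = ["16,861.4/km²", "22946/km²", "93.25/km²"]
densities = [parse_density(s) for s in samples]
print(densities)
```

In the notebook this could be applied with something like `dubai['Pop Density'].map(parse_density)`.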


In [6]:
dubai.reset_index(drop=True, inplace=True)
dubai

Unnamed: 0,Community Code,Community Name,Pop Density
0,126,Abu Hail,"16,861.4/km²"
1,333,Al Bada,22946/km²
2,122,Al Baraha,"7,086/km²"
3,114,Al Buteen,"33,771/km²"
4,113,Al Dhagaya,"21,451/km²"
5,214,Al Garhoud,"1,116.5/km²"
6,313,"Al Hamriya, Dubai","20,890/km²"
7,131,Al Hamriya Port,93.25/km²
8,322,Al Hudaiba,"9,165/km²"
9,326,Al Jaddaf,409.5/km²


In [7]:
dubai.shape

(91, 3)

After cleaning the dataset, we have 91 communities in Dubai instead of 130. From first-hand knowledge of the city, the discarded areas were mostly outskirts or industrial zones and not a good fit for a café startup, so we proceed with these 91 communities for our analysis.

**For the next step, we use the geopy package in Python to get the geographic coordinates of these areas.**

In [8]:
# define the dataframe columns
column_names = ['Community Code', 'Community Name', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighbors = pd.DataFrame(columns=column_names)

neighbors

Unnamed: 0,Community Code,Community Name,Latitude,Longitude


In [9]:
district = dubai ['Community Code']
name = dubai ['Community Name']
location = None
latitude = None
longitude = None

for data in range(0, len(dubai)):
    dt = district[data]
    nm = name[data]
    
    geolocator = Nominatim(user_agent="myGeocoder")
    location = geolocator.geocode('{}'.format(dt))
    latitude = location.latitude
    longitude = location.longitude

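    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    new_row = pd.DataFrame([{'Community Code': dt, 'Community Name': nm, 'Latitude': latitude, 'Longitude': longitude}])
    neighbors = pd.concat([neighbors, new_row], ignore_index=True)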

In [10]:
neighbors

Unnamed: 0,Community Code,Community Name,Latitude,Longitude
0,126,Abu Hail,46.376973,15.045851
1,333,Al Bada,51.593324,3.718116
2,122,Al Baraha,46.165274,14.306747
3,114,Al Buteen,46.337306,15.422497
4,113,Al Dhagaya,46.389777,15.57035
5,214,Al Garhoud,61.802715,22.396534
6,313,"Al Hamriya, Dubai",51.592529,3.718355
7,131,Al Hamriya Port,46.363668,14.30954
8,322,Al Hudaiba,35.695442,139.815453
9,326,Al Jaddaf,55.6079,11.260773


In [11]:
neighbors.shape

(91, 4)

Now that we have all the coordinates, we merge them with our base data.

In [12]:
dubai = dubai.merge(neighbors, left_on="Community Code", right_on="Community Code", how="inner")
dubai = dubai.drop(columns = 'Community Name_y')
dubai = dubai.rename(columns={'Community Name_x': 'Community Name'})
dubai.head()

Unnamed: 0,Community Code,Community Name,Pop Density,Latitude,Longitude
0,126,Abu Hail,"16,861.4/km²",46.376973,15.045851
1,333,Al Bada,22946/km²,51.593324,3.718116
2,122,Al Baraha,"7,086/km²",46.165274,14.306747
3,114,Al Buteen,"33,771/km²",46.337306,15.422497
4,113,Al Dhagaya,"21,451/km²",46.389777,15.57035


With the community coordinates in place, we use the geopy library to get the coordinates of Dubai itself.

In [13]:
address = 'Dubai, AE'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dubai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dubai are 25.0750095, 55.18876088183319.


We now create a map of all communities superimposed on top of the city.

In [14]:
# create map of Dubai using latitude and longitude values
map_dubai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, name in zip(dubai['Latitude'], dubai['Longitude'], dubai['Community Name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dubai)  
    
map_dubai

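From the above map, we can see that the geocoder has not placed the communities correctly: the returned coordinates (for example, latitude 46.376973 for Abu Hail) lie far outside Dubai. Note that the geocoding loop passes the numeric community code, not the community name, to the geocoder. I therefore used Google to look up the coordinates of all 91 communities manually and saved them in a CSV file.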
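
In the geocoding loop, the query passed to `geolocator.geocode` is the community code, which may explain the misplaced points. A hedged sketch of a more defensive lookup follows: it queries by community name with the city appended and rejects any result outside a rough bounding box around Dubai. The function `geocode_in_dubai`, the suffix, and the box limits are illustrative assumptions; `geocode` stands for any callable shaped like `geolocator.geocode` (it returns an object with `latitude` and `longitude`, or `None`).

```python
from collections import namedtuple

# Rough bounding box around Dubai (assumed values, for illustration only).
LAT_RANGE = (24.6, 25.5)
LNG_RANGE = (54.8, 56.0)

def geocode_in_dubai(name, geocode, suffix=", Dubai, United Arab Emirates"):
    """Return (lat, lng) for a community name, or None if the result is
    missing or falls outside the bounding box."""
    location = geocode(name + suffix)
    if location is None:
        return None
    lat, lng = location.latitude, location.longitude
    inside = (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
              and LNG_RANGE[0] <= lng <= LNG_RANGE[1])
    return (lat, lng) if inside else None

# Stand-in for geolocator.geocode so the sketch runs without network access.
Location = namedtuple("Location", ["latitude", "longitude"])
fake_results = {
    "Abu Hail, Dubai, United Arab Emirates": Location(25.27651, 55.34592),
    "Al Hudaiba, Dubai, United Arab Emirates": Location(35.695442, 139.815453),
}

def fake_geocode(query):
    return fake_results.get(query)

found = geocode_in_dubai("Abu Hail", fake_geocode)
rejected = geocode_in_dubai("Al Hudaiba", fake_geocode)
missing = geocode_in_dubai("Unknown Place", fake_geocode)
print(found, rejected, missing)
```

With geopy, the real `geolocator.geocode` could be passed in place of `fake_geocode`; communities that come back as `None` would still need a manual lookup.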

In [15]:
geo = pd.read_csv('Dubai.csv')
geo.head()

Unnamed: 0,Community Code,Community Name,Latitude,Longitude
0,126,Abu Hail,25.27651,55.34592
1,333,Al Bada,25.21977,55.26466
2,122,Al Baraha,25.28292,55.31806
3,114,Al Buteen,25.26858,55.29829
4,113,Al Dhagaya,25.27185,55.29893


We join this data to our main file to get the final base table.

In [16]:
dubai.drop(columns = ['Latitude', 'Longitude'], inplace=True)
dubai.head()

Unnamed: 0,Community Code,Community Name,Pop Density
0,126,Abu Hail,"16,861.4/km²"
1,333,Al Bada,22946/km²
2,122,Al Baraha,"7,086/km²"
3,114,Al Buteen,"33,771/km²"
4,113,Al Dhagaya,"21,451/km²"


In [17]:
dubai = dubai.merge(geo, left_on = ["Community Code", "Community Name"] , right_on = ["Community Code", "Community Name"], how = "left")
dubai.head()

Unnamed: 0,Community Code,Community Name,Pop Density,Latitude,Longitude
0,126,Abu Hail,"16,861.4/km²",25.27651,55.34592
1,333,Al Bada,22946/km²,25.21977,55.26466
2,122,Al Baraha,"7,086/km²",25.28292,55.31806
3,114,Al Buteen,"33,771/km²",25.26858,55.29829
4,113,Al Dhagaya,"21,451/km²",25.27185,55.29893
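
Because this is a left join on both code and name, any community whose spelling differs between the Wikipedia table and the hand-built CSV would silently receive empty coordinates. A small sketch of a pre-merge check, using plain Python and made-up sample keys; `unmatched_keys` is a hypothetical helper:

```python
def unmatched_keys(left_keys, right_keys):
    """Return the (code, name) pairs in left_keys with no exact match in right_keys."""
    right = set(right_keys)
    return [key for key in left_keys if key not in right]

# Illustrative keys only; the second name is deliberately misspelled on the CSV side.
wiki_keys = [(126, "Abu Hail"), (333, "Al Bada"), (122, "Al Baraha")]
csv_keys = [(126, "Abu Hail"), (333, "Al Badaa"), (122, "Al Baraha")]

print(unmatched_keys(wiki_keys, csv_keys))
```

With the dataframes in this notebook, the key lists could be built with `list(zip(dubai['Community Code'], dubai['Community Name']))` and the same expression on `geo`.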


We now superimpose these coordinates over the map of Dubai.

In [18]:
# create map of Dubai using latitude and longitude values
map_dubai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, name in zip(dubai['Latitude'], dubai['Longitude'], dubai['Community Name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dubai)  
    
map_dubai

**All the coordinates now fall within the city, so we move on to segmenting the communities through the Foursquare API.**

In [19]:
CLIENT_ID = '5AND1ENQGIAOVMSM5UN4RRB0HFE5S1B5UUVGHGVD0MBAORFZ' # your Foursquare ID
CLIENT_SECRET = '1DP41VG0O0IOZS1QSFXZY43LOZSYPIJ1HU0UG5PNNJ3XP5ND' # your Foursquare Secret
LIMIT = 30
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5AND1ENQGIAOVMSM5UN4RRB0HFE5S1B5UUVGHGVD0MBAORFZ
CLIENT_SECRET:1DP41VG0O0IOZS1QSFXZY43LOZSYPIJ1HU0UG5PNNJ3XP5ND


In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Community', 
                  'Community Latitude', 
                  'Community Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
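
The list comprehension in `getNearbyVenues` assumes every returned item has at least one category. A hedged sketch of a more defensive row builder follows; `venue_row` and the `'Uncategorized'` fallback are assumptions for illustration, and the sample items only mimic the fields that `getNearbyVenues` reads from the response.

```python
def venue_row(community, lat, lng, item, fallback="Uncategorized"):
    """Flatten one response item into the 7-field row used by getNearbyVenues.

    Falls back to a placeholder category when 'categories' is empty or absent.
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else fallback
    return (
        community, lat, lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )

# Hand-written items shaped like the fields read above (not real API output).
items = [
    {"venue": {"name": "For You Cafe",
               "location": {"lat": 25.27889, "lng": 55.347699},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed Kiosk",
               "location": {"lat": 25.2766, "lng": 55.3461},
               "categories": []}},
]

rows = [venue_row("Abu Hail", 25.27651, 55.34592, item) for item in items]
for row in rows:
    print(row)
```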

In [30]:
dubai_venues = getNearbyVenues(names=dubai['Community Name'],
                                   latitudes=dubai['Latitude'],
                                   longitudes=dubai['Longitude']
                                  )

Abu Hail
Al Bada
Al Baraha
Al Buteen
Al Dhagaya
Al Garhoud
Al Hamriya, Dubai
Al Hamriya Port
Al Hudaiba
Al Jaddaf
Al Jafiliya
Al Karama
Al Khabisi
Al Kifaf
Al Mamzar
Al Manara
Al Mankhool
Al Mina
Al Mizhar First
Al Mizhar Second
Al Muraqqabat
Al Murar
Al Muteena
Al Nahda First
Al Nahda Second
Al Nasr, Dubai
Al Quoz First
Al Quoz Industrial First
Al Quoz Industrial Fourth
Al Quoz Industrial Second
Al Quoz Industrial Third
Al Quoz Second
Al Quoz Third
Al Ras
Al Rashidiya
Al Rigga
Al Sabkha
Al Safa First
Al Safa Second
Al Safouh First
Al Safouh Second
Al Satwa
Al Shindagha
Al Twar First
Al Twar Second
Al Twar Third
Al Warqa'a Fifth
Al Warqa'a First
Al Warqa'a Fourth
Al Warqa'a Second
Al Warqa'a Third
Al Wasl
Al Waheda
Ayal Nasir
Business Bay
Bu Kadra
Downtown Dubai
Hor Al Anz
Hor Al Anz East
Jebel Ali 1
Jebel Ali 2
Jebel Ali Industrial
Jumeira First
Jumeira Second
Jumeira Third
Mirdif
Muhaisanah Fourth
Muhaisanah Second
Muhaisanah Third
Muhaisnah First
Nad Al Hammar
Nadd Al Shiba Fourth
N

In [31]:
print(dubai_venues.shape)
dubai_venues.head()

(1113, 7)


Unnamed: 0,Community,Community Latitude,Community Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abu Hail,25.27651,55.34592,Pizza & Pizza,25.276561,55.347293,Pizza Place
1,Abu Hail,25.27651,55.34592,Al Zowar Cafateria (كافتريا الزوار),25.275098,55.346817,Burrito Place
2,Abu Hail,25.27651,55.34592,For You Cafe,25.27889,55.347699,Café
3,Abu Hail,25.27651,55.34592,Baqer Mohebi Supermarket,25.277732,55.350696,Convenience Store
4,Abu Hail,25.27651,55.34592,KFC,25.278837,55.344576,Fast Food Restaurant


In [32]:
dubai_venues.groupby('Community').count()

Community,Community Latitude,Community Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abu Hail,19,19,19,19,19,19
Al Bada,11,11,11,11,11,11
Al Baraha,8,8,8,8,8,8
Al Buteen,18,18,18,18,18,18
Al Dhagaya,15,15,15,15,15,15
Al Garhoud,30,30,30,30,30,30
Al Hamriya Port,3,3,3,3,3,3
"Al Hamriya, Dubai",28,28,28,28,28,28
Al Hudaiba,30,30,30,30,30,30
Al Jaddaf,4,4,4,4,4,4


In [33]:
print('There are {} unique categories.'.format(len(dubai_venues['Venue Category'].unique())))

There are 192 unique categories.


After grouping the venues per community, we see a total of 192 unique venue categories returned for the 91 communities queried.

**Now, we move on to our last step: Clustering to find the optimum list of communities for our cafe owner**

In [34]:
# one hot encoding
dubai_onehot = pd.get_dummies(dubai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dubai_onehot['Community'] = dubai_venues['Community'] 

#move neighborhood column to the first column
fixed_columns = ["Community"] + list(dubai_onehot.columns.difference(['Community']))
dubai_onehot = dubai_onehot[fixed_columns]

dubai_onehot.head()

Unnamed: 0,Community,ATM,Afghan Restaurant,African Restaurant,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Basketball Court,Beach,Beach Bar,Bed & Breakfast,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cupcake Shop,Currency Exchange,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,German Restaurant,Gift Shop,Gluten-free Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Iraqi Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lounge,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mobility Store,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,New American Restaurant,Night Market,Nightclub,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Pakistani Restaurant,Park,Pedestrian Plaza,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Post Office,Pub,Racetrack,Record Shop,Residential Building (Apartment / Condo),Resort,Restaurant,Roof Deck,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shawarma Place,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Syrian Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Track,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Water Park,Wine Bar,Women's Store,Yoga Studio,Zoo Exhibit
0,Abu Hail,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abu Hail,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abu Hail,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abu Hail,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abu Hail,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now, we group the rows by community and take the mean of each one-hot column, which gives the frequency of each venue category per community.

In [35]:
dubai_grouped = dubai_onehot.groupby('Community').mean().reset_index()
dubai_grouped.head()

Unnamed: 0,Community,ATM,Afghan Restaurant,African Restaurant,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Basketball Court,Beach,Beach Bar,Bed & Breakfast,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cupcake Shop,Currency Exchange,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,German Restaurant,Gift Shop,Gluten-free Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Iraqi Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lounge,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mobility Store,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,New American Restaurant,Night Market,Nightclub,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Pakistani Restaurant,Park,Pedestrian Plaza,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Post Office,Pub,Racetrack,Record Shop,Residential Building (Apartment / Condo),Resort,Restaurant,Roof Deck,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shawarma Place,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Syrian Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Track,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Water Park,Wine Bar,Women's Store,Yoga Studio,Zoo Exhibit
0,Abu Hail,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al Bada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0
2,Al Baraha,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Buteen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0
4,Al Dhagaya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0
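
Taking the mean of the one-hot columns within a community is the same as computing each category's share of that community's venues. A small pandas-free sketch of that equivalence, using made-up venue categories; `category_shares` is a hypothetical helper:

```python
from collections import Counter, defaultdict

def category_shares(pairs):
    """Map each community to {category: share of that community's venues}."""
    counts = defaultdict(Counter)
    for community, category in pairs:
        counts[community][category] += 1
    return {
        community: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for community, counter in counts.items()
    }

# Illustrative (community, venue category) pairs, not the real Foursquare results.
pairs = [
    ("A", "Café"), ("A", "Café"), ("A", "Hotel"), ("A", "Bakery"),
    ("B", "Coffee Shop"), ("B", "Café"),
]

shares = category_shares(pairs)
print(shares["A"])
print(shares["B"])
```

In the table above, Abu Hail's Café value of 0.210526 corresponds to 4 of its 19 venues.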


We now list the 5 most common venue categories in each community.

In [37]:
num_top_venues = 5

for hood in dubai_grouped['Community']:
    print("----"+hood+"----")
    temp = dubai_grouped[dubai_grouped['Community'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abu Hail----
               venue  freq
0               Café  0.21
1     Clothing Store  0.11
2  Convenience Store  0.11
3         Restaurant  0.05
4       Optical Shop  0.05


----Al Bada----
           venue  freq
0           Café  0.18
1    Coffee Shop  0.18
2       Boutique  0.09
3  Women's Store  0.09
4   Burger Joint  0.09


----Al Baraha----
                       venue  freq
0                      Hotel  0.25
1        American Restaurant  0.12
2  Middle Eastern Restaurant  0.12
3         Turkish Restaurant  0.12
4                Coffee Shop  0.12


----Al Buteen----
                       venue  freq
0  Middle Eastern Restaurant  0.11
1                     Market  0.11
2       Fast Food Restaurant  0.06
3                       Café  0.06
4                Coffee Shop  0.06


----Al Dhagaya----
                  venue  freq
0         Shopping Mall  0.13
1  Fast Food Restaurant  0.07
2    Miscellaneous Shop  0.07
3                 Beach  0.07
4  Pakistani Restaurant  0.07


--

           venue  freq
0  Boat or Ferry   1.0
1           Pool   0.0
2  Movie Theater   0.0
3  Moving Target   0.0
4      Multiplex   0.0


----Al Twar First----
              venue  freq
0       Coffee Shop   0.4
1        Hookah Bar   0.1
2             Diner   0.1
3   Airport Service   0.1
4  Airport Terminal   0.1


----Al Twar Second----
         venue  freq
0         Café  0.33
1  Pizza Place  0.33
2  Coffee Shop  0.33
3          ATM  0.00
4    Multiplex  0.00


----Al Twar Third----
                     venue  freq
0              Zoo Exhibit   0.5
1                   Tunnel   0.5
2  North Indian Restaurant   0.0
3            Movie Theater   0.0
4            Moving Target   0.0


----Al Waheda----
                       venue  freq
0                       Park   0.4
1  Middle Eastern Restaurant   0.2
2                      Beach   0.2
3                 Restaurant   0.2
4                        ATM   0.0


----Al Warqa'a First----
               venue  freq
0          Nightclub   1.

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
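
Because the sort always returns `num_top_venues` entries, communities with only a few venues are padded with zero-frequency categories (as in the printed lists, where entries such as `ATM 0.00` appear). A hedged sketch of a variant that keeps only categories that actually occur; `top_nonzero_categories` is a hypothetical helper that works on a plain dict rather than a dataframe row:

```python
def top_nonzero_categories(freqs, n):
    """Return up to n category names with frequency > 0, highest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]

# Frequencies shaped like one row of dubai_grouped
# (values taken from the Al Twar Second printout).
row = {"ATM": 0.0, "Café": 0.33, "Coffee Shop": 0.33,
       "Multiplex": 0.0, "Pizza Place": 0.33}

print(top_nonzero_categories(row, 5))
```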

We now create a new dataframe with the top 10 venue categories per community.

In [40]:

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Community']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
communities_venues_sorted = pd.DataFrame(columns=columns)
communities_venues_sorted['Community'] = dubai_grouped['Community']

for ind in np.arange(dubai_grouped.shape[0]):
    communities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dubai_grouped.iloc[ind, :], num_top_venues)

communities_venues_sorted.head()

Unnamed: 0,Community,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abu Hail,Café,Clothing Store,Convenience Store,Indian Restaurant,Furniture / Home Store,Bakery,Burrito Place,Fast Food Restaurant,Restaurant,Shoe Store
1,Al Bada,Café,Coffee Shop,Restaurant,Women's Store,Burger Joint,Beach,Grocery Store,Boutique,Gym / Fitness Center,Food & Drink Shop
2,Al Baraha,Hotel,Middle Eastern Restaurant,American Restaurant,Turkish Restaurant,Coffee Shop,Lounge,Café,Zoo Exhibit,Farmers Market,Food & Drink Shop
3,Al Buteen,Middle Eastern Restaurant,Market,Ice Cream Shop,Electronics Store,Miscellaneous Shop,Flower Shop,Café,Fast Food Restaurant,Farmers Market,Jewelry Store
4,Al Dhagaya,Shopping Mall,Art Gallery,Flower Shop,Market,Beach,Fast Food Restaurant,Farmers Market,Juice Bar,Miscellaneous Shop,Jewelry Store
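
The column names above rely on a three-item suffix list plus a fallback, which is fine for ten venues but would mislabel positions such as 21 or 22 if `num_top_venues` were raised. As a sketch only, a small helper (the name `ordinal` is hypothetical and not part of the original notebook) can generate the same headers for any count:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 always take 'th', regardless of their last digit.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10

# Same column layout as in the notebook: 'Community' followed by ranked venues.
columns = ["Community"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```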


**K-Means Clustering to cluster the communities into 5 different clusters.**

In [62]:
# set number of clusters
kclusters = 5

# drop the name column so that only the numeric venue frequencies are clustered
dubai_grouped_clustering = dubai_grouped.drop(columns='Community')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=6).fit(dubai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 3, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 1, 1, 3, 3, 2, 3, 3, 1,
       3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 3, 0, 3, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3,
       1, 3, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3], dtype=int32)
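
The raw label array makes it hard to judge how evenly the communities are spread across clusters. A minimal sketch, using only the standard library and the labels copied from the output above, tallies the size of each cluster:

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above.
labels = [
    1, 1, 3, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 1, 1, 3, 3, 2, 3, 3, 1,
    3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 3, 0, 3, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3,
    1, 3, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3,
]

# Count how many communities received each cluster label.
cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print("Cluster {}: {} communities".format(cluster, size))
```

On these labels the tally gives 58 communities in cluster 1 and 21 in cluster 3, while clusters 0, 2 and 4 hold 2, 1 and 1 communities respectively. The labels run from 0 to 4, which matters when filtering by cluster later.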

In [None]:
dubai.rename(columns = {'Community Name': 'Community'}, inplace=True)
dubai.head()

In [50]:
# add clustering labels
communities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_.astype("int"))

dubai_merged = dubai

# join the sorted venues onto dubai to add cluster labels and top venues alongside latitude/longitude for each community
dubai_merged = dubai_merged.join(communities_venues_sorted.set_index('Community'), on='Community').dropna()

dubai_merged.head() # check the last columns!

Unnamed: 0,Community Code,Community,Pop Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,126,Abu Hail,"16,861.4/km²",25.27651,55.34592,1.0,Café,Clothing Store,Convenience Store,Indian Restaurant,Furniture / Home Store,Bakery,Burrito Place,Fast Food Restaurant,Restaurant,Shoe Store
1,333,Al Bada,22946/km²,25.21977,55.26466,1.0,Café,Coffee Shop,Restaurant,Women's Store,Burger Joint,Beach,Grocery Store,Boutique,Gym / Fitness Center,Food & Drink Shop
2,122,Al Baraha,"7,086/km²",25.28292,55.31806,3.0,Hotel,Middle Eastern Restaurant,American Restaurant,Turkish Restaurant,Coffee Shop,Lounge,Café,Zoo Exhibit,Farmers Market,Food & Drink Shop
3,114,Al Buteen,"33,771/km²",25.26858,55.29829,1.0,Middle Eastern Restaurant,Market,Ice Cream Shop,Electronics Store,Miscellaneous Shop,Flower Shop,Café,Fast Food Restaurant,Farmers Market,Jewelry Store
4,113,Al Dhagaya,"21,451/km²",25.27185,55.29893,1.0,Shopping Mall,Art Gallery,Flower Shop,Market,Beach,Fast Food Restaurant,Farmers Market,Juice Bar,Miscellaneous Shop,Jewelry Store
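
The Pop Density column in the merged table is stored as text, with and without thousands separators and always followed by `/km²`. If it were to be used numerically, for instance to compare communities within one cluster, it would first need to be parsed. The sketch below assumes the values follow the patterns visible in the table; `parse_density` is a hypothetical helper, not part of the original notebook.

```python
import re

def parse_density(text):
    """Convert a string such as '16,861.4/km²' to a float (people per km²)."""
    # Capture the leading number, allowing thousands separators and a decimal part.
    match = re.match(r"\s*([\d,]+(?:\.\d+)?)", text)
    if match is None:
        return None  # value did not start with a number
    return float(match.group(1).replace(",", ""))

# Sample strings taken from the Pop Density column shown above.
samples = ["16,861.4/km²", "22946/km²", "7,086/km²"]
densities = [parse_density(s) for s in samples]
print(densities)  # [16861.4, 22946.0, 7086.0]
```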


*Visualizing the clusters*

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dubai_merged['Latitude'], dubai_merged['Longitude'], dubai_merged['Community'], dubai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

From the above map, we see that most communities belong to Cluster 1 or Cluster 3. To understand what distinguishes the clusters, we now look at the most frequent top venue category within each one.

In [54]:
dubai_merged.loc[dubai_merged['Cluster Labels'] == 1, dubai_merged.columns[[1] + list(range(5, dubai_merged.shape[1]))]]["1st Most Common Venue"].value_counts()[0:10]

Café                         9
Indian Restaurant            9
Coffee Shop                  6
Fast Food Restaurant         4
Middle Eastern Restaurant    3
Furniture / Home Store       3
Market                       2
Cosmetics Shop               2
Cafeteria                    1
Grocery Store                1
Name: 1st Most Common Venue, dtype: int64

In [55]:
dubai_merged.loc[dubai_merged['Cluster Labels'] == 3, dubai_merged.columns[[1] + list(range(5, dubai_merged.shape[1]))]]["1st Most Common Venue"].value_counts()[0:10]

Hotel                        12
Middle Eastern Restaurant     5
Park                          1
Café                          1
Boat or Ferry                 1
Indian Restaurant             1
Name: 1st Most Common Venue, dtype: int64

In [56]:
dubai_merged.loc[dubai_merged['Cluster Labels'] == 2, dubai_merged.columns[[1] + list(range(5, dubai_merged.shape[1]))]]["1st Most Common Venue"].value_counts()[0:10]

Burger Joint    1
Name: 1st Most Common Venue, dtype: int64

In [57]:
dubai_merged.loc[dubai_merged['Cluster Labels'] == 4, dubai_merged.columns[[1] + list(range(5, dubai_merged.shape[1]))]]["1st Most Common Venue"].value_counts()[0:10]

Cafeteria    1
Name: 1st Most Common Venue, dtype: int64

In [58]:
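# note: k-means labels run from 0 to kclusters - 1 (0 to 4 here), so no community
# has label 5 and the result below is empty; cluster 0 is not inspected by this cell
dubai_merged.loc[dubai_merged['Cluster Labels'] == 5, dubai_merged.columns[[1] + list(range(5, dubai_merged.shape[1]))]]["1st Most Common Venue"].value_counts()[0:10]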

Series([], Name: 1st Most Common Venue, dtype: int64)
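
The per-cluster cells above repeat one expression with a different label each time. Since k-means labels run from 0 to `kclusters - 1`, looping over the labels actually present avoids querying a label that does not exist. The sketch below uses a small invented dataframe as a stand-in for `dubai_merged`; `top_venue_counts` is a hypothetical helper name.

```python
import pandas as pd

def top_venue_counts(merged, label, n=10):
    """Count '1st Most Common Venue' values among communities in one cluster."""
    in_cluster = merged["Cluster Labels"] == label
    return merged.loc[in_cluster, "1st Most Common Venue"].value_counts().head(n)

# Hypothetical stand-in for dubai_merged; the rows are invented.
toy = pd.DataFrame({
    "Community": ["A", "B", "C", "D"],
    "Cluster Labels": [1.0, 1.0, 3.0, 0.0],
    "1st Most Common Venue": ["Café", "Café", "Hotel", "Park"],
})

# Summarise every cluster label that actually occurs in the data.
summaries = {
    label: top_venue_counts(toy, label)
    for label in sorted(toy["Cluster Labels"].unique())
}
for label, counts in summaries.items():
    print("Cluster", int(label))
    print(counts)
```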

**From the above, the communities in Cluster 1 are the strongest candidates for opening a café. Café is tied with Indian Restaurant as the most frequent top venue category in this cluster, and similar categories such as Coffee Shop also appear among its top venues.**

**So any community in Cluster 1 could be a good fit for our café owner's new store.**