## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction/Business Problem: <a name="introduction"></a>
Mumbai is a city known for its nightlife with loads of popular venues spread across the city which are frequented by youngsters, party goers and celebrities alike. An entrepreneur is interested in starting a venture in this lucrative space and wants to open a Gastropub in the city. He would like to identify an appropriate location in the city to start this venture based on the venues in the locality and their popularity. This will form our business problem and we will use data from Foursquare API and other sources to explore the Nightlife venues in city to arrive at an appropriate location.

In [76]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

## Data Used: <a name="data"></a>
We will use the dataset obtained from website http://creativecommons.org/licenses/by/3.0/ which contains the latitude longitude data for India at a pincode level. We will subset the above dataset in order to obtain latitude & longitude only for Mumbai at a pincode level which is our area of interest for this exercise.

In [77]:
columns = ['country code','postal code','Neighborhood','State' ,'admin code1','District','admin code2','admin name3','admin code3',
'latitude','longitude','accuracy']
df = pd.read_csv("IN.txt",delimiter = '\t',names=columns, usecols=['postal code','Neighborhood','State','District','latitude','longitude'])
df.head(10)

Unnamed: 0,postal code,Neighborhood,State,District,latitude,longitude
0,744101,Marine Jetty,Andaman & Nicobar Islands,South Andaman,11.6667,92.75
1,744101,Port Blair,Andaman & Nicobar Islands,South Andaman,11.6667,92.75
2,744101,N.S.Building,Andaman & Nicobar Islands,South Andaman,11.6667,92.75
3,744102,Haddo,Andaman & Nicobar Islands,South Andaman,11.6833,92.7167
4,744102,Chatham,Andaman & Nicobar Islands,South Andaman,11.7,92.6667
5,744102,Herbertabad,Andaman & Nicobar Islands,South Andaman,11.7167,92.6167
6,744102,Delanipur,Andaman & Nicobar Islands,South Andaman,11.7,92.6667
7,744102,Radio Colony,Andaman & Nicobar Islands,South Andaman,11.7,92.6667
8,744103,Minnie Bay,Andaman & Nicobar Islands,South Andaman,11.6651,92.7121
9,744103,Brijgunj,Andaman & Nicobar Islands,South Andaman,11.6651,92.7121


In [78]:
Mumbai_df = df[df['District'] == 'Mumbai'].reset_index(drop=True)
Mumbai_df.head(10)

Unnamed: 0,postal code,Neighborhood,State,District,latitude,longitude
0,400001,Mumbai G.P.O.,Maharashtra,Mumbai,18.938536,72.836334
1,400001,Bazargate,Maharashtra,Mumbai,18.938536,72.836334
2,400001,Town Hall (Mumbai),Maharashtra,Mumbai,18.938536,72.836334
3,400001,Tajmahal,Maharashtra,Mumbai,18.938536,72.836334
4,400001,Stock Exchange,Maharashtra,Mumbai,18.938536,72.836334
5,400001,M.P.T.,Maharashtra,Mumbai,18.938536,72.836334
6,400002,Kalbadevi,Maharashtra,Mumbai,18.948366,72.825935
7,400002,S. C. Court,Maharashtra,Mumbai,18.948366,72.825935
8,400002,Ramwadi,Maharashtra,Mumbai,18.948366,72.825935
9,400002,Thakurdwar,Maharashtra,Mumbai,18.948366,72.825935


As seen above, multiple localities have a common pincode. We will combine this data at a pincode level and bring the dataframe in format which contains columns:

- postal code	
- Neighborhood	
- State	
- District	
- latitude	
- longitude

In [79]:
dedupe_mum_df = Mumbai_df.drop_duplicates(['postal code']).drop(columns = ['Neighborhood']).reset_index(drop=True)
dedupe_mum_df.head(10)

Unnamed: 0,postal code,State,District,latitude,longitude
0,400001,Maharashtra,Mumbai,18.938536,72.836334
1,400002,Maharashtra,Mumbai,18.948366,72.825935
2,400003,Maharashtra,Mumbai,18.95,72.8333
3,400004,Maharashtra,Mumbai,18.95,72.8167
4,400005,Maharashtra,Mumbai,18.9069,72.8106
5,400006,Maharashtra,Mumbai,18.95,72.7833
6,400007,Maharashtra,Mumbai,18.9667,72.8167
7,400008,Maharashtra,Mumbai,18.96714,72.828659
8,400009,Maharashtra,Mumbai,18.958296,72.838943
9,400010,Maharashtra,Mumbai,18.970188,72.845963


In [80]:
Mumbai_df_grp = Mumbai_df.groupby(by = ['postal code'])['Neighborhood'].apply(','.join).reset_index()

In [81]:
pd.set_option('display.max_colwidth', -1)
Mumbai_df_grp.head()

Unnamed: 0,postal code,Neighborhood
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T."
1,400002,"Kalbadevi,S. C. Court,Ramwadi,Thakurdwar"
2,400003,"Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid"
3,400004,"Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug"
4,400005,"Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini"


In [82]:
Mumbai_df_grp = Mumbai_df_grp.merge(dedupe_mum_df,how = 'left',on = 'postal code',validate='1:1',suffixes = (False,False))

In [83]:
print('Total Neighborhoods found in Mumbai: ' + str(Mumbai_df_grp.shape[0]))

Total Neighborhoods found in Mumbai: 89


In [84]:
Mumbai_df_grp.head(10)

Unnamed: 0,postal code,Neighborhood,State,District,latitude,longitude
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",Maharashtra,Mumbai,18.938536,72.836334
1,400002,"Kalbadevi,S. C. Court,Ramwadi,Thakurdwar",Maharashtra,Mumbai,18.948366,72.825935
2,400003,"Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid",Maharashtra,Mumbai,18.95,72.8333
3,400004,"Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug",Maharashtra,Mumbai,18.95,72.8167
4,400005,"Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini",Maharashtra,Mumbai,18.9069,72.8106
5,400006,Malabar Hill,Maharashtra,Mumbai,18.95,72.7833
6,400007,"Tardeo,Grant Road,S V Marg,N.S.Patkar Marg,Bharat Nagar (Mumbai)",Maharashtra,Mumbai,18.9667,72.8167
7,400008,"Falkland Road,Mumbai Central,M A Marg,J.J.Hospital,Kamathipura",Maharashtra,Mumbai,18.96714,72.828659
8,400009,"Noor Baug,Princess Dock,Chinchbunder",Maharashtra,Mumbai,18.958296,72.838943
9,400010,"Mazgaon,Mazgaon Road,V K Bhavan,Mazgaon Dock,Dockyard Road",Maharashtra,Mumbai,18.970188,72.845963


Let us now use the folium library to visualize the different neighborhoods from the dataframe **Mumbai_df_grp** on a map basis the latitude and longitude data.

In [85]:
address = 'Mumbai'

geolocator = Nominatim(user_agent="mum_explorer",timeout=50)
location = geolocator.geocode(address)
lat_mum = location.latitude
long_mum = location.longitude
print('The geograpical coordinate of Mumbai are {}, {}.'.format(lat_mum, long_mum))

The geograpical coordinate of Mumbai are 18.9387711, 72.8353355.


In [255]:
# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[lat_mum, long_mum], zoom_start=10)

# add markers to map
for lat, lng, postal_code, neighborhood in zip(Mumbai_df_grp['latitude'], Mumbai_df_grp['longitude'], Mumbai_df_grp['postal code'], Mumbai_df_grp['Neighborhood']):
    label = '{}, {}'.format(lat,lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

## Foursquare API
We will now use the Foursquare API to explore the venues falling in the **‘Nightlife’** category around various neighbourhoods in the city.

In [89]:
import requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

In [253]:
CLIENT_ID = 'enter here' # your Foursquare ID
CLIENT_SECRET = 'enter here' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: enter here
CLIENT_SECRET:enter here


In [58]:
# Loop through neighborhooods to fetch venues in 'Nightlife' category
limit = 100
radius = 1000
category_id = '4d4b7105d754a06376d81259'


venue_list = []
for index, rows in Mumbai_df_grp.iterrows():
    print(rows['Neighborhood'])
    lat = rows['latitude']
    long = rows['longitude']
    url = 'https://api.foursquare.com/v2//venues/explore?&ll={},{}&categoryId={}&radius={}&limit={}&client_id={}&client_secret={}&v={}'\
    .format(lat,long,category_id,radius,limit,CLIENT_ID, CLIENT_SECRET, VERSION)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    for v in results:
        l = [rows['postal code'],
        rows['Neighborhood'],
        rows['latitude'],
        rows['longitude'],
        v['venue']['name'],
        v['venue']['id'],
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']]
        venue_list.append(l)    

Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.
Kalbadevi,S. C. Court,Ramwadi,Thakurdwar
Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid
Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug
Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini
Malabar Hill
Tardeo,Grant Road,S V Marg,N.S.Patkar Marg,Bharat Nagar (Mumbai)
Falkland Road,Mumbai Central,M A Marg,J.J.Hospital,Kamathipura
Noor Baug,Princess Dock,Chinchbunder
Mazgaon,Mazgaon Road,V K Bhavan,Mazgaon Dock,Dockyard Road
Agripada,Chinchpokli,Jacob Circle,Haines Road
Parel,Chamarbaug,Haffkin Institute,Lal Baug,Parel Naka,Parel Rly Work Shop,BEST STaff Quarters
Delisle Road
Naigaon (Mumbai),Dadar,Dadar Colony
Sewri
Mahim,Mahim Bazar,Mori Road,Kapad Bazar,Mahim East
Dharavi Road,Dharavi
Worli,Worli Naka
Matunga
Marine Lines,Churchgate,Central Building
Nariman Point,New Yogakshema
Sion,Raoli Camp,Transit Camp,Chunabhatti
Nehru Nagar (Mumbai)
Prabhadevi,New Prabhadevi Road
Cumballa Sea Face,Gowalia Tank

In [97]:
Mumbai_venues_df = pd.DataFrame(venue_list)
Mumbai_venues_df.columns = ['postal code',
              'Neighborhood', 
              'Neighborhood Latitude', 
              'Neighborhood Longitude', 
              'Venue', 
              'id',
              'Venue Latitude', 
              'Venue Longitude', 
              'Venue Category']

Mumbai_venues_df.to_csv('Mumbai_venues_df.csv',index=False)
# Mumbai_venues_df = pd.read_csv('Mumbai_venues_df.csv')
Mumbai_venues_df.head(10)

Unnamed: 0,postal code,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,id,Venue Latitude,Venue Longitude,Venue Category
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Town House Cafe,5263e1ba11d265711e8024bf,18.93855,72.833464,Bar
1,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,The Clearing House,5810c12738faa0b87f99d404,18.935328,72.838263,Lounge
2,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,5 Spice,4be18fe88815b713ef9e6406,18.933491,72.835955,Chinese Restaurant
3,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Press Club Mumbai,4dca583cae607b31c0bdf19c,18.940721,72.832469,Bar
4,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Sunlight Bar,4e19a3221f6eb9559885364c,18.944448,72.829234,Bar
5,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Bottle bar,517830f4498e1f75ba5bb195,18.930246,72.833423,Lounge
6,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,The Bar Terminal,567ba37f498e1d6a14d30821,18.930338,72.833421,Cocktail Bar
7,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Spices & Flavours,4fd842aad5fb0913de97e8c4,18.930272,72.833417,Indian Restaurant
8,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,S2 Restobar,4e468e46fa76a07fde5a95fd,18.938573,72.833671,Bar
9,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,Mustafa juice,4f22da48e4b0008740ad4dd6,18.935902,72.833929,Sake Bar


## Methodology <a name="methodology"></a>
Now that the data collected and cleaned we can discuss the methodology and the approach we are going to take to solve our problem i.e. recommend an optimum location for opening a Gastropub/Lounge.

We will look at density of top 10 most popular venue catrgories in each neighborhood and then run a k-means clustering     algorithm to divide the neighborhoods into clusters. We will then analyse each of these clusters to arrive at the optimum location for starting our own gastropub/lounge.

Let us first analyze the data that we have collected and cleaned in the previous steps.

## Data Analysis <a name="analysis"></a>

In [98]:
# Total venues found & total unique categories found
print("Total nightlife venues found: " +str(Mumbai_venues_df.shape[0]))
print("Number of unique venue categories found: " +str(len(Mumbai_venues_df['Venue Category'].unique())))
print("Number of neighborhoods: " +str(len(Mumbai_venues_df['Neighborhood'].unique())))

Total nightlife venues found: 726
Number of unique venue categories found: 43
Number of neighborhoods: 78


From the above analysis we see that a total 726 venues belonging to 43 distinct nightlife categories were found in Mumbai neighborhoods. Also out of the 89 neighborhoods in **Mumbai_df_grp** dataset we could find nightlife venues in only 78 neighborhoods. This might be due to lack of data in Foursquare about nightlife venues in these localities.

We will later cluster these 11 neighborhoods into a separate cluster as we will not be able to make any decisions regarding these neighborhoods due to lack of data. Let's see which neighborhoods fall in this cluster.

In [99]:
#Neighborhoods where we could not find any veneues for our category:
Mumbai_df_grp[~Mumbai_df_grp['Neighborhood'].isin(Mumbai_venues_df['Neighborhood'])].reset_index(drop=True)

Unnamed: 0,postal code,Neighborhood,State,District,latitude,longitude
0,400006,Malabar Hill,Maharashtra,Mumbai,18.95,72.7833
1,400013,Delisle Road,Maharashtra,Mumbai,18.9448,72.8524
2,400063,"Sharma Estate,Goregaon East",Maharashtra,Mumbai,19.1624,72.8694
3,400082,"Mulund Colony,Bhandup Complex",Maharashtra,Mumbai,19.1247,72.9488
4,400083,"Kannamwar Nagar,Tagore Nagar",Maharashtra,Mumbai,19.1247,72.9488
5,400084,Barve Nagar,Maharashtra,Mumbai,19.1247,72.9488
6,400085,BARC,Maharashtra,Mumbai,19.1247,72.9488
7,400086,"Ghatkopar West,Rifle Range,Sahakar Bhavan",Maharashtra,Mumbai,19.1247,72.9488
8,400087,"Sandeepany Sadhanalya,NITIE",Maharashtra,Mumbai,19.1247,72.9488
9,400088,"Trombay,T.F.Donar,Govandi",Maharashtra,Mumbai,19.0333,72.9333


Let us also have a look at the count of various venue categories present in our data. We can see from the data below that Gastropubs fall in top 10 venue categories. 

Also other categories like bars and lounges might be saturated and we may face stiff competition if we enter into these categories. A category like Gastropub which is popular yet not saturated can give us a nice niche to operate in.

In [227]:
#Let us have a look at all the categories that have been found in our neighborhoods
pd.set_option('display.max_rows', 500)
pd.value_counts(Mumbai_venues_df['Venue Category'])

Bar                                189
Lounge                             139
Pub                                58 
Nightclub                          42 
Hotel Bar                          39 
Hookah Bar                         35 
Cocktail Bar                       21 
Gastropub                          20 
Sports Bar                         16 
Restaurant                         16 
Café                               16 
Brewery                            14 
Hotel                              14 
Indian Restaurant                  13 
Wine Bar                           10 
Beer Garden                        9  
Italian Restaurant                 9  
Whisky Bar                         8  
Asian Restaurant                   5  
Mediterranean Restaurant           5  
Tapas Restaurant                   5  
Karaoke Bar                        3  
Bistro                             3  
Gym                                3  
Coffee Shop                        3  
Chinese Restaurant       

**Part 1: Top 10 venue categories in each neighborhood**
- In the very first step we will encode our venue categories into numeric data using one hot encoding.
- Then we will find the average number of each venue categories present in each neighborhood.
- Basis the average number of venue categories per neighborhood we will arrive at top 10 categories in each neighborhood


In [101]:
# one hot encoding
mumbai_onehot = pd.get_dummies(Mumbai_venues_df[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = Mumbai_venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.shape

(726, 44)

In [228]:
#calculating the mean of venue categories in each location
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped.head(10)

Unnamed: 0,Neighborhood,Asian Restaurant,Athletics & Sports,Bar,Beer Bar,Beer Garden,Bistro,Brewery,Café,Campground,...,Pub,Restaurant,Sake Bar,Seafood Restaurant,Speakeasy,Sports Bar,Tapas Restaurant,Tea Room,Whisky Bar,Wine Bar
0,"Agripada,Chinchpokli,Jacob Circle,Haines Road",0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Antop Hill,C G S Colony,B P T Colony,Wadala Truck Terminal",0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Anushakti Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
3,"Audit Bhavan,Kherwadi,Bandra(East),B.N. Bhavan,Government Colony",0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,...,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Azad Nagar (Mumbai),Andheri",0.0,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bandra West,0.0,0.0,0.28125,0.0,0.03125,0.03125,0.0,0.03125,0.0,...,0.15625,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
6,Bhandup East,0.0,0.0,0.275862,0.0,0.034483,0.0,0.0,0.0,0.0,...,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Borivali,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Borivali West,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
9,Chakala Midc,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0


In [128]:
#Function for returning top 10 venue categories in each neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [129]:
mumbai_grouped.shape

(78, 44)

In [130]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

In [131]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Agripada,Chinchpokli,Jacob Circle,Haines Road",Bar,Campground,Hookah Bar,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
1,"Antop Hill,C G S Colony,B P T Colony,Wadala Truck Terminal",Bar,Hotel,Gastropub,Lounge,Pub,Cocktail Bar,Grocery Store,Food,Event Space,Dive Bar
2,Anushakti Nagar,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
3,"Audit Bhavan,Kherwadi,Bandra(East),B.N. Bhavan,Government Colony",Bar,Gastropub,Indian Restaurant,Lounge,Pub,Dim Sum Restaurant,Wine Bar,Coffee Shop,Grocery Store,Food
4,"Azad Nagar (Mumbai),Andheri",Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar


**Part 2: Clustering**

We now begin the steps for clustering our data. 

- We use the average number of venue categories per location for our clustering excercise
- We will then merge clustered data & top 10 venue categories data in order to draw conclusions about popularity of venue categories in different clusters

In [113]:
from sklearn.cluster import KMeans

In [136]:
# set number of clusters
kclusters = 3

mumbai_grouped_clustering = mumbai_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 0, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 1, 2, 2,
       2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 1])

In [239]:
neighborhoods_venues_sorted.drop(['Cluster Labels'],axis=1,inplace=True)

In [241]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head(10)
mumbai_clustered_final = Mumbai_df_grp

# merge mumbai_grouped with mumbai_data to add latitude/longitude for each neighborhood
mumbai_clustered_final = mumbai_clustered_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
mumbai_clustered_final.drop(['State','District'],inplace=True,axis=1)
mumbai_clustered_final.head()

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,2.0,Bar,Lounge,Chinese Restaurant,Indian Restaurant,Sake Bar,Cocktail Bar,Coffee Shop,Grocery Store,Gastropub,Food
1,400002,"Kalbadevi,S. C. Court,Ramwadi,Thakurdwar",18.948366,72.825935,2.0,Lounge,Bar,Gastropub,Wine Bar,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
2,400003,"Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid",18.95,72.8333,2.0,Bar,Hookah Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
3,400004,"Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug",18.95,72.8167,2.0,Hookah Bar,Indian Restaurant,Gastropub,Lounge,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
4,400005,"Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini",18.9069,72.8106,2.0,Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar


In [242]:
#Create a separate cluster for neighborhoods where no venue was found via Foursquare API
mumbai_clustered_final.loc[mumbai_clustered_final['Cluster Labels'].isna(),'Cluster Labels']=3

In [243]:
# Let us evaluate the number of neihborhoods in each cluster
pd.value_counts(mumbai_clustered_final['Cluster Labels'])

2.0    58
1.0    13
3.0    11
0.0    7 
Name: Cluster Labels, dtype: int64

Now we will plot these clusters on a map in order to visualize their locations and analyzing each cluster in order to make our final decision regarding the location of the Gastropub.

In [244]:
map_clusters = folium.Map(location=[lat_mum, long_mum], zoom_start=11)
color_map = ['red','green','blue','yellow']
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mumbai_clustered_final['latitude'], mumbai_clustered_final['longitude'], mumbai_clustered_final['Neighborhood'], mumbai_clustered_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    int(cluster)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color_map[int(cluster)],
        fill=True,
        fill_color=color_map[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [246]:
mumbai_clustered_final[mumbai_clustered_final['Cluster Labels'] == 0]

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
78,400093,Chakala Midc,19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
79,400094,Anushakti Nagar,19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
80,400095,"Kharodi,Ins Hamla",19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
81,400096,Seepz,19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
82,400097,"Malad East,Rani Sati Marg",19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
83,400098,Vidyanagari,19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar
84,400099,"Sahar P & T Colony,International Airport,Sahargaon,Airport (Mumbai)",19.2355,72.8468,0.0,Lounge,Sports Bar,Restaurant,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space,Dive Bar


In [245]:
mumbai_clustered_final[mumbai_clustered_final['Cluster Labels'] == 1]

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,400010,"Mazgaon,Mazgaon Road,V K Bhavan,Mazgaon Dock,Dockyard Road",18.970188,72.845963,1.0,Hotel Bar,Bar,Beer Garden,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
10,400011,"Agripada,Chinchpokli,Jacob Circle,Haines Road",18.9833,72.8333,1.0,Bar,Campground,Hookah Bar,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
15,400016,"Mahim,Mahim Bazar,Mori Road,Kapad Bazar,Mahim East",19.0333,72.85,1.0,Bar,Cocktail Bar,Hotel Bar,Food,Café,Coffee Shop,Gym,Grocery Store,Gastropub,Event Space
16,400017,"Dharavi Road,Dharavi",19.05,72.8667,1.0,Bar,Nightclub,Cocktail Bar,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar,Dim Sum Restaurant
17,400018,"Worli,Worli Naka",19.0167,72.8167,1.0,Nightclub,Bar,Café,Cocktail Bar,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar
18,400019,Matunga,19.0333,72.85,1.0,Bar,Cocktail Bar,Hotel Bar,Food,Café,Coffee Shop,Gym,Grocery Store,Gastropub,Event Space
21,400022,"Sion,Raoli Camp,Transit Camp,Chunabhatti",19.0333,72.85,1.0,Bar,Cocktail Bar,Hotel Bar,Food,Café,Coffee Shop,Gym,Grocery Store,Gastropub,Event Space
51,400064,"Malad West Dely,Orlem,Liberty Garden,Malad",19.197,72.845,1.0,Hotel Bar,Bar,Grocery Store,Wine Bar,Coffee Shop,Gym,Gastropub,Food,Event Space,Dive Bar
55,400068,"Dahisar,Ketkipada,Dahisar RS",19.2565,72.8733,1.0,Hotel Bar,Bar,Karaoke Bar,Dive Bar,Wine Bar,Coffee Shop,Grocery Store,Gastropub,Food,Event Space
57,400070,"Kurla North,Kurla,Netajinagar",19.0713,72.883,1.0,Bar,Karaoke Bar,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar


In [247]:
mumbai_clustered_final[mumbai_clustered_final['Cluster Labels'] == 2]

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,2.0,Bar,Lounge,Chinese Restaurant,Indian Restaurant,Sake Bar,Cocktail Bar,Coffee Shop,Grocery Store,Gastropub,Food
1,400002,"Kalbadevi,S. C. Court,Ramwadi,Thakurdwar",18.948366,72.825935,2.0,Lounge,Bar,Gastropub,Wine Bar,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
2,400003,"Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid",18.95,72.8333,2.0,Bar,Hookah Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
3,400004,"Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug",18.95,72.8167,2.0,Hookah Bar,Indian Restaurant,Gastropub,Lounge,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
4,400005,"Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini",18.9069,72.8106,2.0,Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar
6,400007,"Tardeo,Grant Road,S V Marg,N.S.Patkar Marg,Bharat Nagar (Mumbai)",18.9667,72.8167,2.0,Nightclub,Event Space,Brewery,Restaurant,Lounge,Cocktail Bar,Grocery Store,Gastropub,Food,Dive Bar
7,400008,"Falkland Road,Mumbai Central,M A Marg,J.J.Hospital,Kamathipura",18.96714,72.828659,2.0,Bar,Wine Bar,Beer Garden,Cocktail Bar,Hookah Bar,Nightclub,Bistro,Brewery,Café,Campground
8,400009,"Noor Baug,Princess Dock,Chinchbunder",18.958296,72.838943,2.0,Bar,Hookah Bar,Cocktail Bar,Hotel Bar,Brewery,Dim Sum Restaurant,Gym,Grocery Store,Gastropub,Food
11,400012,"Parel,Chamarbaug,Haffkin Institute,Lal Baug,Parel Naka,Parel Rly Work Shop,BEST STaff Quarters",19.0,72.8333,2.0,Lounge,Bar,Pub,Brewery,Nightclub,Mediterranean Restaurant,Mexican Restaurant,Hotel Bar,Cocktail Bar,Molecular Gastronomy Restaurant
13,400014,"Naigaon (Mumbai),Dadar,Dadar Colony",19.0201,72.8381,2.0,Bar,Lounge,Hotel,Gastropub,Pub,Cocktail Bar,Grocery Store,Food,Event Space,Dive Bar


In [248]:
mumbai_clustered_final[mumbai_clustered_final['Cluster Labels'] == 3]

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,400006,Malabar Hill,18.95,72.7833,3.0,,,,,,,,,,
12,400013,Delisle Road,18.9448,72.8524,3.0,,,,,,,,,,
50,400063,"Sharma Estate,Goregaon East",19.1624,72.8694,3.0,,,,,,,,,,
68,400082,"Mulund Colony,Bhandup Complex",19.1247,72.9488,3.0,,,,,,,,,,
69,400083,"Kannamwar Nagar,Tagore Nagar",19.1247,72.9488,3.0,,,,,,,,,,
70,400084,Barve Nagar,19.1247,72.9488,3.0,,,,,,,,,,
71,400085,BARC,19.1247,72.9488,3.0,,,,,,,,,,
72,400086,"Ghatkopar West,Rifle Range,Sahakar Bhavan",19.1247,72.9488,3.0,,,,,,,,,,
73,400087,"Sandeepany Sadhanalya,NITIE",19.1247,72.9488,3.0,,,,,,,,,,
74,400088,"Trombay,T.F.Donar,Govandi",19.0333,72.9333,3.0,,,,,,,,,,


## Results & Discussion <a name="results"></a>

From the above analysis we see that 58 out of 89 neighborhoods fall in cluster 2. Cluster 1 & 0 has 13 & 7 neighborhoods respectively. As pointed out earlier Cluster 3 is a cluster created to put together all the neighborhoods where we could not find any venue ionformation and hence this will not be used for making any decisions.

For deciding the optimum location of our Gastropub business we will evaluate the clusters from 2 perspectives: a.) Location b.) Popularity of our venue category
- **Location:** From map we can see that cluster 2 is spread quite uniformly throughout the city making it a favourable cluster for opening our business. This gives us an multiple options in the city and we can select a location that suits our budget. The places in South Mumbai will tend to be much more costly than those in the Western suburbs.
- **Popularity:** Based on the information available for the top 10 venue categories we can see that Gastropubs feature predominantly in cluster 2 & cluster 0. However cluster 0 has only 7 neighborhoods and locations are very near to each other (they have same lat longs in our data and are represented by the single red dot in the cluster map). Choosing cluster 0  will limit our options and also we cannot rely on the data as the venue information for these locations is also very limited.

From the above analysis we can say that **cluster 2** neighborhoods can be good potential candidates for opening a **Gastropub** given the location options and the popularity of our venue category. Specifically we would recommend locations in cluster 2 where Gastropubs are present in top 10 locations.
Thus we will end up with **46 locations** to choose from in **cluster 2** as shown below. The final decision will be made in discussion with the stakeholders and additional considerations like location cost, local permits, zoning details etc. will come into play which have not been considered for our analysis.

In [249]:
test = [mumbai_clustered_final[col].str.contains("Gastropub", na=False) for col in mumbai_clustered_final.iloc[:,7:]]

In [252]:
pref_loc = mumbai_clustered_final.loc[np.column_stack(test).any(axis=1)]
pref_loc[pref_loc['Cluster Labels'] == 2].reset_index(drop = True)

Unnamed: 0,postal code,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,"Mumbai G.P.O.,Bazargate,Town Hall (Mumbai),Tajmahal,Stock Exchange,M.P.T.",18.938536,72.836334,2.0,Bar,Lounge,Chinese Restaurant,Indian Restaurant,Sake Bar,Cocktail Bar,Coffee Shop,Grocery Store,Gastropub,Food
1,400002,"Kalbadevi,S. C. Court,Ramwadi,Thakurdwar",18.948366,72.825935,2.0,Lounge,Bar,Gastropub,Wine Bar,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
2,400003,"Mandvi (Mumbai),Null Bazar,B.P.Lane,Masjid",18.95,72.8333,2.0,Bar,Hookah Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space
3,400004,"Girgaon,Opera House,Ambewadi (Mumbai),Charni Road,Chaupati,Madhavbaug",18.95,72.8167,2.0,Hookah Bar,Indian Restaurant,Gastropub,Lounge,Coffee Shop,Gym,Grocery Store,Food,Event Space,Dive Bar
4,400005,"Colaba,Holiday Camp,V.W.T.C.,Colaba Bazar,Asvini",18.9069,72.8106,2.0,Bar,Lounge,Wine Bar,Coffee Shop,Gym,Grocery Store,Gastropub,Food,Event Space,Dive Bar
5,400007,"Tardeo,Grant Road,S V Marg,N.S.Patkar Marg,Bharat Nagar (Mumbai)",18.9667,72.8167,2.0,Nightclub,Event Space,Brewery,Restaurant,Lounge,Cocktail Bar,Grocery Store,Gastropub,Food,Dive Bar
6,400009,"Noor Baug,Princess Dock,Chinchbunder",18.958296,72.838943,2.0,Bar,Hookah Bar,Cocktail Bar,Hotel Bar,Brewery,Dim Sum Restaurant,Gym,Grocery Store,Gastropub,Food
7,400014,"Naigaon (Mumbai),Dadar,Dadar Colony",19.0201,72.8381,2.0,Bar,Lounge,Hotel,Gastropub,Pub,Cocktail Bar,Grocery Store,Food,Event Space,Dive Bar
8,400021,"Nariman Point,New Yogakshema",18.9274,72.8241,2.0,Bar,Café,Gastropub,Wine Bar,Nightclub,Lounge,Hotel,Karaoke Bar,Indian Restaurant,Pub
9,400025,"Prabhadevi,New Prabhadevi Road",19.0164,72.8294,2.0,Bar,Dive Bar,Restaurant,Pub,Wine Bar,Cocktail Bar,Grocery Store,Gastropub,Food,Event Space


In [251]:
#Let us plot the above preffered locations on the map
map_clusters = folium.Map(location=[lat_mum, long_mum], zoom_start=11)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pref_loc['latitude'], pref_loc['longitude'], pref_loc['Neighborhood'], pref_loc['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    int(cluster)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Conclusion <a name="conclusion"></a>

We started with a problem statement of finding an optimum location for starting a business in **'Nightlife'** category in Mumbai. From the data we concluded that we will be entering **Gastropub** category which gives us a unique niche. Also basis the clustering analysis and popular venues across locations we have concluded that we will open the Gastropub in **cluster 2**. This will give us a shortlist of **45 locations** across the city to choose from.