# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

### "Predicting ideal location for setting up a new supermarket in the city of Mumbai."

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## 1. Introduction: Business Problem <a name="introduction"></a>

**Problem description:** Mumbai, situated in Maharashtra, India, is a populous city of nearly 21 million people packed into an area of just 603.4 km². The aim of this project is to find a reasonably densely populated area in which to open a new supermarket. To that end, Mumbai City as a whole will be segmented and clustered, and each area will be analysed using Foursquare location data.

**Background discussion:** Mumbai, also called Bombay, is the capital of the state of Maharashtra and the most populous city in India. As the 4th most populous city in the world and one of the most populous urban regions in the world, Mumbai had a metro population of about 20,185,064 in 2019. The city is considered a melting pot because of the many migrants who relocate there for employment opportunities. With a workforce readily available, the major obstacle to opening a supermarket is the tedious process of acquiring a large plot of land, which the population explosion makes harder. Assuming that the stakeholders involved have the influence, power and funds to readily acquire a piece of land of their choice, this project will help them find an optimum location for the supermarket. The key influencing factors are:
1.	How populated is the area? A reasonably populated area means a larger client base.
2.	Is there another supermarket in the location of interest? Less competition, or none at all, is preferable.
3.	Is the area of interest in a plush locality? This targets customers with higher incomes, for a greater profit margin and more revenue.

Keeping the above key points in mind, an exploratory analysis of Mumbai City will be undertaken to identify a suitable location for the supermarket.


## 2. Data <a name="data"></a>

#### Data description and how it will be used to solve the problem:

1.	**Statistical information on Mumbai’s population and its density**. http://worldpopulationreview.com/world-cities/mumbai-population/
2.	**A map rendering a detailed classification of population density in Mumbai, with key highlights:** Census data reveal that population density varies noticeably from area to area, and small-area census data do a better job of depicting where the crowded neighbourhoods are. In this map, areas of highest density exceed 30,000 persons per square kilometer, very high density areas exceed 7,000 persons per square kilometer, and high density areas exceed 5,200 persons per square kilometer. The last categories break at 3,330 persons per square kilometer and 1,500 persons per square kilometer.
https://www.arcgis.com/home/webmap/viewer.html?webmap=d39c316b61364e86918fe566aaccf54e
3.	**A complete list of boroughs and neighbourhoods in Mumbai city:** This data will be used to construct a dataset comprising the boroughs, the neighbourhoods, and the latitude and longitude coordinates of each neighbourhood. https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai
4.	**Latitude and longitude coordinates of various neighbourhoods in Mumbai city:** This data will be used to construct a dataset comprising the boroughs, the neighbourhoods, and the latitude and longitude coordinates of each neighbourhood.
https://www.latlong.net/place/mumbai-maharashtra-india-27236.html
5.	**Data pertaining to venues at a specific location:** Foursquare location data will be used for all our location data requirements.
https://foursquare.com/
6.	**Data on real estate pricing for each neighbourhood of Mumbai City:** This data will be used to identify and categorize customers on the basis of their income. The real estate pricing of neighbourhoods will be a key indicator of that.
https://www.makaan.com/price-trends/property-rates-for-buy-in-mumbai 


## 3. Methodology <a name="methodology"></a>

1. COMPILING DATASET: The process of shortlisting the most suitable areas for a supermarket starts with compiling a dataset of basic information: the name of the borough, the neighbourhood, its latitude and longitude coordinates, population density and real estate pricing. It's interesting to note that the data on population density and real estate pricing was available on the internet in the form of categories, namely Highest, Very High, High, Low and Very Low. However, each category has a well-defined range, as seen in the coming section. This dataset will be labelled "mumbai".

2. EXPLORE 'MUMBAI' DATASET: The next step is to explore the dataset for Mumbai City, first by visualizing all of its neighbourhoods on a map. We use the folium library to render the map.

3. FIND EXISTING SUPERMARKETS: All of Mumbai City will be scanned for supermarkets and their locations, with the help of Foursquare. We then observe which borough has the fewest supermarkets.

4. FILTER A BOROUGH: After closely observing the boroughs, we select a suitable borough based on population density, which we want to be in the higher range. The borough should also have the fewest supermarkets, to avoid unwanted competition.

5. EXPLORE THE BOROUGH: Explore and visualize the borough, check out each neighbourhood and find its popular venues. The Foursquare location data will be extracted in this step.

6. CLUSTER NEIGHBOURHOODS: The next step is to cluster the neighbourhoods by running the k-means clustering algorithm. This groups similar neighbourhoods together and separates dissimilar ones.

7. OBSERVE AND PREDICT: The clusters will then be observed to identify the neighbourhood traits most conducive to starting a new supermarket.
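
Step 6 can be tried on a toy table before touching real data. This is a minimal sketch: the venue-frequency values and the choice of two clusters are illustrative assumptions, not results from this project.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-category frequencies per neighbourhood (each row sums to 1).
toy = pd.DataFrame(
    {
        "Neighbourhood": ["A", "B", "C", "D"],
        "Indian Restaurant": [0.8, 0.7, 0.1, 0.0],
        "Café": [0.2, 0.3, 0.1, 0.2],
        "Train Station": [0.0, 0.0, 0.8, 0.8],
    }
)

# k-means works on numeric features only, so drop the name column.
features = toy.drop(columns="Neighbourhood")

# n_clusters=2 is an arbitrary choice for this sketch.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
toy["Cluster"] = kmeans.labels_
print(toy[["Neighbourhood", "Cluster"]])
```

Neighbourhoods with a similar venue mix (A and B, C and D) should end up with the same cluster label; the label numbers themselves carry no meaning.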


##### Downloading the required dependencies

In [40]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; the pandas.io.json path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Load and explore the dataset for Mumbai City

The rubrics defined for parameters **"Population Density"** and **"Average real estate pricing"** for Mumbai City are as follows:

In [41]:
rubrics = pd.read_csv('range.csv')
rubrics

| | Parameter | Highest | Very High | High | Low | Very Low |
|---|---|---|---|---|---|---|
| 0 | Population Density/sq.ft | > 30000 | >7000 | >5200 | 3300 | 1500 |
| 1 | Average real estate pricing/sq.ft | >80000 | >60000 | >40000 | >20000 | <20000 |
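
Because the five labels are ordinal, one option is to store them as an ordered categorical so that filters such as "at least Very High" can be written directly. The sketch below assumes the label order given in the rubric; the neighbourhood names are placeholders.

```python
import pandas as pd

# Category order taken from the rubric above, lowest to highest.
LEVELS = ["Very Low", "Low", "High", "Very High", "Highest"]

# Hypothetical rows; neighbourhood names are placeholders.
toy = pd.DataFrame(
    {
        "Neighbourhood": ["N1", "N2", "N3"],
        "Population Density": ["Highest", "Low", "Very High"],
    }
)

toy["Population Density"] = pd.Categorical(
    toy["Population Density"], categories=LEVELS, ordered=True
)

# Keep neighbourhoods whose density is at least "Very High".
dense = toy[toy["Population Density"] >= "Very High"]
print(dense["Neighbourhood"].tolist())
```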


Loading the main dataset, which uses the above categories:

In [42]:
mumbai = pd.read_csv('mumbai.csv')
mumbai

| | Borough | Neighbourhood | Latitude | Longitude | Population Density | Average real estate pricing |
|---|---|---|---|---|---|---|
| 0 | Western Suburb | Andheri | 19.120371 | 72.848043 | Highest | High |
| 1 | Western Suburb | Bandra | 19.054979 | 72.84022 | Very High | Very High |
| 2 | Western Suburb | Borivali | 19.228739 | 72.856877 | Very High | Low |
| 3 | Western Suburb | Dahisar | 19.257178 | 72.857536 | Very High | Low |
| 4 | Western Suburb | Goregaon | 18.153715 | 73.295064 | Very High | Low |
| 5 | Western Suburb | Jogeshwari | 19.134899 | 72.84882 | Very High | Low |
| 6 | Western Suburb | Juhu | 19.107021 | 72.827528 | Very High | High |
| 7 | Western Suburb | Kandivali west | 19.20838 | 72.842227 | Highest | Low |
| 8 | Western Suburb | Khar | 19.072458 | 72.833707 | Very High | Very High |
| 9 | Western Suburb | Malad | 19.184677 | 72.835807 | Very High | Highest |
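
The coordinates listed for Goregaon (18.153715, 73.295064) sit well away from the other Western Suburb rows, which all fall near latitude 19.1 to 19.3 and longitude 72.8. A quick sanity check along the following lines could flag such rows before mapping. The bounding box limits are rough assumptions chosen for illustration, not official city boundaries.

```python
import pandas as pd

# Rough bounding box for Greater Mumbai; these limits are assumptions
# chosen for illustration, not official boundaries.
LAT_MIN, LAT_MAX = 18.85, 19.35
LNG_MIN, LNG_MAX = 72.75, 73.05

# Three rows copied from the table above.
sample = pd.DataFrame(
    {
        "Neighbourhood": ["Andheri", "Goregaon", "Juhu"],
        "Latitude": [19.120371, 18.153715, 19.107021],
        "Longitude": [72.848043, 73.295064, 72.827528],
    }
)

# True where both coordinates fall inside the box.
inside = sample["Latitude"].between(LAT_MIN, LAT_MAX) & sample["Longitude"].between(
    LNG_MIN, LNG_MAX
)

# Rows outside the box deserve a manual look.
suspect = sample[~inside]
print(suspect["Neighbourhood"].tolist())
```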


In [43]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(mumbai['Borough'].unique()),
        mumbai.shape[0]
    )
)

The dataframe has 4 boroughs and 37 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Mumbai City.

In [44]:
address = 'mumbai,india'

geolocator = Nominatim(user_agent="mum_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai City are 18.9387711, 72.8353355.
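
geopy's `geocode` returns `None` when it cannot resolve an address, in which case `location.latitude` raises an `AttributeError`. A small guard makes that failure explicit. The helper name `to_lat_lng` is hypothetical, and a `SimpleNamespace` stands in for a real geopy result so the sketch runs without network access.

```python
from types import SimpleNamespace


def to_lat_lng(location):
    """Return (lat, lng) from a geocoder result, or raise if nothing was found."""
    if location is None:
        raise ValueError("Geocoder returned no result for the given address")
    return (location.latitude, location.longitude)


# Stand-in for a successful result; values copied from the output above.
found = SimpleNamespace(latitude=18.9387711, longitude=72.8353355)
print(to_lat_lng(found))
```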


#### Create a map of Mumbai City with neighborhoods superimposed on top.

In [45]:
# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(mumbai['Latitude'], mumbai['Longitude'], mumbai['Borough'], mumbai['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

Let's simplify the above map and segment and cluster only the neighbourhoods in South Mumbai, because we aim to locate the new supermarket in a populous area. *South Mumbai is the borough with the largest number of neighbourhoods in the "Highest" population density category.* So let's slice the original dataframe and create a new dataframe of the South Mumbai data.

In [46]:
SouthMumbai = mumbai[mumbai['Borough'] == 'South Mumbai'].reset_index(drop=True)
SouthMumbai.head()

| | Borough | Neighbourhood | Latitude | Longitude | Population Density | Average real estate pricing |
|---|---|---|---|---|---|---|
| 0 | South Mumbai | Antop Hill | 19.020761 | 72.865256 | Highest | Low |
| 1 | South Mumbai | Byculla | 18.976622 | 72.832794 | Highest | High |
| 2 | South Mumbai | Colaba | 18.915091 | 72.825969 | Highest | Very High |
| 3 | South Mumbai | Dadar | 19.023823 | 72.839427 | Highest | High |
| 4 | South Mumbai | Fort | 18.933267 | 72.834515 | Very High | Low |
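
The choice of South Mumbai rests on a count of "Highest" density neighbourhoods per borough. A minimal sketch of that count follows, using a hypothetical five-row table shaped like the `mumbai` dataframe ("Other Borough" is a placeholder name); run against the full dataframe, the same expression would show the actual ranking.

```python
import pandas as pd

# Hypothetical subset shaped like the 'mumbai' dataframe.
toy = pd.DataFrame(
    {
        "Borough": [
            "South Mumbai",
            "South Mumbai",
            "Western Suburb",
            "Western Suburb",
            "Other Borough",
        ],
        "Population Density": ["Highest", "Highest", "Highest", "Very High", "Very High"],
    }
)

# Count 'Highest' neighbourhoods per borough, largest first.
highest_counts = (
    toy[toy["Population Density"] == "Highest"]
    .groupby("Borough")
    .size()
    .sort_values(ascending=False)
)
print(highest_counts)
```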


Let's get the geographical coordinates of South Mumbai.

In [47]:
address = 'south mumbai, india'

geolocator = Nominatim(user_agent="mum_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of South Mumbai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of South Mumbai are 18.9387711, 72.8353355.


In [48]:
# create map of South Mumbai using latitude and longitude values
map_SouthMumbai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(SouthMumbai['Latitude'], SouthMumbai['Longitude'],SouthMumbai['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_SouthMumbai)  
    
map_SouthMumbai

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [49]:
CLIENT_ID = 'PPV43AY4P4IHA1EJW4KMVJ5QZ00Y2VBSPL15TVUVMS0PLC3T' # your Foursquare ID
CLIENT_SECRET = 'Q4E0CB1KWKOZ42NOJ2E5YRX1JSWX2TTVMK42ZTTCAARPL4XP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PPV43AY4P4IHA1EJW4KMVJ5QZ00Y2VBSPL15TVUVMS0PLC3T
CLIENT_SECRET:Q4E0CB1KWKOZ42NOJ2E5YRX1JSWX2TTVMK42ZTTCAARPL4XP
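
Hard-coding and printing credentials exposes them to anyone who reads the notebook. One common alternative is to read them from environment variables. The sketch below is a suggestion rather than part of the original workflow; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are assumptions.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping, failing if any are missing."""
    # The variable names below are assumptions made for this sketch.
    keys = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return env[keys[0]], env[keys[1]]


# Demonstration with an in-memory mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)

# Confirm that something was loaded without echoing the secret itself.
print("Credentials loaded:", bool(client_id and client_secret))
```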


#### Let's check for *malls* and *supermarkets* in Mumbai City

In [50]:
url1 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, 18.9387711, 72.8353355, VERSION,'supermarket',30000,100)
url1

'https://api.foursquare.com/v2/venues/search?client_id=PPV43AY4P4IHA1EJW4KMVJ5QZ00Y2VBSPL15TVUVMS0PLC3T&client_secret=Q4E0CB1KWKOZ42NOJ2E5YRX1JSWX2TTVMK42ZTTCAARPL4XP&ll=18.9387711,72.8353355&v=20180605&query=supermarket&radius=30000&limit=100'
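
A long `format` string is easy to get wrong when parameters are added or reordered. As an alternative sketch, the same query string can be assembled from a dictionary with the standard library's `urlencode`. The function name `build_search_url` is hypothetical and the credentials are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the search URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes values, so the comma in 'll' becomes %2C.
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials; substitute real values when making a request.
demo_url = build_search_url(
    "ID", "SECRET", 18.9387711, 72.8353355, "20180605", "supermarket", 30000, 100
)

# Parse the query string back to confirm each parameter survived.
parsed = parse_qs(urlsplit(demo_url).query)
print(parsed["query"], parsed["ll"])
```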

In [51]:
results1 = requests.get(url1).json()
results1

{'meta': {'code': 200, 'requestId': '5c947c58351e3d4c7f93da40'},
 'response': {'venues': [{'id': '4df04eeed4c04d0392c58011',
    'name': 'Suryodaya',
    'location': {'crossStreet': 'Opp. Churchgate Station',
     'lat': 18.93356949853369,
     'lng': 72.82728152818353,
     'labeledLatLngs': [{'label': 'display',
       'lat': 18.93356949853369,
       'lng': 72.82728152818353}],
     'distance': 1026,
     'cc': 'IN',
     'city': 'Mumbai',
     'state': 'Mahārāshtra',
     'country': 'India',
     'formattedAddress': ['Opp. Churchgate Station',
      'Mumbai',
      'Mahārāshtra',
      'India']},
    'categories': [{'id': '4d954b0ea243a5684a65b473',
      'name': 'Convenience Store',
      'pluralName': 'Convenience Stores',
      'shortName': 'Convenience Store',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/conveniencestore_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1553235032',
    'hasPerk': False},
   {'id': '4d0f1d3cba3

In [52]:
# assign relevant part of JSON to venues
venues1 = results1['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues1)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '4d954b0ea243a5684a65b473', 'name': 'C...",False,4df04eeed4c04d0392c58011,,IN,Mumbai,India,Opp. Churchgate Station,1026,"[Opp. Churchgate Station, Mumbai, Mahārāshtra,...","[{'label': 'display', 'lat': 18.93356949853369...",18.933569,72.827282,,Mahārāshtra,Suryodaya,v-1553235032
1,"[{'id': '4bf58dd8d48988d1f6941735', 'name': 'D...",False,4d0f1d3cba378cfa7e436f93,Kohinoor City,IN,Mumbai,India,"Off LBS Marg, Kurla West",16808,"[Kohinoor City (Off LBS Marg, Kurla West), Mum...","[{'label': 'display', 'lat': 19.08212271265058...",19.082123,72.885507,400070.0,Mahārāshtra,Dhanraj Supermarket,v-1553235032
2,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",False,4b9b37cbf964a52082fa35e3,"Haware Parekh Chamber, Sion Trombay Road, Chembur",IN,Mumbai,India,Opposite Union Park,14418,"[Haware Parekh Chamber, Sion Trombay Road, Che...","[{'label': 'display', 'lat': 19.049532, 'lng':...",19.049532,72.906343,400071.0,Mahārāshtra,Ratna Supermarket,v-1553235032
3,"[{'id': '4bf58dd8d48988d1ff941735', 'name': 'M...",False,50166a26e4b080ff6c3f66c9,Shop 6+7 Beach View Appts.,IN,Mumbai,India,"77 Chimbai Rd, Bandra West",13324,"[Shop 6+7 Beach View Appts. (77 Chimbai Rd, Ba...","[{'label': 'display', 'lat': 19.05796130691874...",19.057961,72.823686,400050.0,Mahārāshtra,Society Supermarket,v-1553235032
4,"[{'id': '4bf58dd8d48988d118951735', 'name': 'G...",False,4e394081e4cd799aaeee9825,Dadar,IN,Mumbai,India,Ranade Road,9164,"[Dadar (Ranade Road), Mumbai 4000028, Mahārāsh...","[{'label': 'display', 'lat': 19.02096631437195...",19.020966,72.840254,4000028.0,Mahārāshtra,Sarvodaya Supermarket ¤,v-1553235032


In [53]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not trigger SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Suryodaya,Convenience Store,,IN,Mumbai,India,Opp. Churchgate Station,1026,"[Opp. Churchgate Station, Mumbai, Mahārāshtra,...","[{'label': 'display', 'lat': 18.93356949853369...",18.933569,72.827282,,Mahārāshtra,4df04eeed4c04d0392c58011
1,Dhanraj Supermarket,Department Store,Kohinoor City,IN,Mumbai,India,"Off LBS Marg, Kurla West",16808,"[Kohinoor City (Off LBS Marg, Kurla West), Mum...","[{'label': 'display', 'lat': 19.08212271265058...",19.082123,72.885507,400070.0,Mahārāshtra,4d0f1d3cba378cfa7e436f93
2,Ratna Supermarket,Supermarket,"Haware Parekh Chamber, Sion Trombay Road, Chembur",IN,Mumbai,India,Opposite Union Park,14418,"[Haware Parekh Chamber, Sion Trombay Road, Che...","[{'label': 'display', 'lat': 19.049532, 'lng':...",19.049532,72.906343,400071.0,Mahārāshtra,4b9b37cbf964a52082fa35e3
3,Society Supermarket,Miscellaneous Shop,Shop 6+7 Beach View Appts.,IN,Mumbai,India,"77 Chimbai Rd, Bandra West",13324,"[Shop 6+7 Beach View Appts. (77 Chimbai Rd, Ba...","[{'label': 'display', 'lat': 19.05796130691874...",19.057961,72.823686,400050.0,Mahārāshtra,50166a26e4b080ff6c3f66c9
4,Sarvodaya Supermarket ¤,Grocery Store,Dadar,IN,Mumbai,India,Ranade Road,9164,"[Dadar (Ranade Road), Mumbai 4000028, Mahārāsh...","[{'label': 'display', 'lat': 19.02096631437195...",19.020966,72.840254,4000028.0,Mahārāshtra,4e394081e4cd799aaeee9825
5,aastha supermarket,Department Store,Dadar west,IN,,India,,8464,"[Dadar west, India]","[{'label': 'display', 'lat': 19.01474391920790...",19.014744,72.831905,,,4f437d20e4b03c8ff6865902
6,Dhanraj's Supermarket,Shopping Mall,Kohinoor City,IN,Mumbai,India,Kurla West,8905,"[Kohinoor City (Kurla West), Mumbai 400070, Ma...","[{'label': 'display', 'lat': 19.01847117947931...",19.018471,72.842666,400070.0,Mahārāshtra,4c5ab1f5d3aee21ea03b6b55
7,naaz supermarket,Grocery Store,"Wadala, Mumbai, Maharashtra",IN,Mumbai,India,,9738,"[Wadala, Mumbai, Maharashtra, Mumbai, Mahārāsh...","[{'label': 'display', 'lat': 19.02216416818391...",19.022164,72.863284,,Mahārāshtra,4f143717e4b0a6ade3032801
8,A P Mani Supermarket,Department Store,Amul Apartments,IN,Mumbai,India,"St. Anthony Road, Chembur",14858,"[Amul Apartments (St. Anthony Road, Chembur), ...","[{'label': 'display', 'lat': 19.05504247965011...",19.055042,72.904662,400071.0,Mahārāshtra,517532e7e4b016400fc27f39
9,Chandan Supermarket,Supermarket,Sanduwadi,IN,Mumbai,India,,14357,"[Sanduwadi, Mumbai 400071, Mahārāshtra, India]","[{'label': 'display', 'lat': 19.053839, 'lng':...",19.053839,72.896954,400071.0,Mahārāshtra,5b951e470a08ab002c81e33e


There are a total of 50 supermarkets in Mumbai City. On closely observing the above dataframe, one sees that the majority of supermarkets are in the 'Western Suburb' borough. This supports our hypothesis that 'South Mumbai' could be the ideal place to set up the new supermarket.
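
The borough comparison above is made by eye. One way to make it reproducible is to assign each venue to its nearest neighbourhood by great-circle distance and count venues per borough. The sketch below uses three neighbourhoods and three venues copied from the tables above, which is far too small a sample to support any conclusion; nearest-centre assignment is itself an assumption, not the method used in this notebook.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# (borough, neighbourhood, lat, lng) rows copied from the tables above.
neighbourhoods = [
    ("Western Suburb", "Bandra", 19.054979, 72.84022),
    ("South Mumbai", "Dadar", 19.023823, 72.839427),
    ("South Mumbai", "Fort", 18.933267, 72.834515),
]

# (name, lat, lng) of venues copied from the search results above.
venues = [
    ("Suryodaya", 18.933569, 72.827282),
    ("Society Supermarket", 19.057961, 72.823686),
    ("Sarvodaya Supermarket", 19.020966, 72.840254),
]


def nearest_borough(lat, lng):
    """Borough of the neighbourhood centre closest to the given point."""
    return min(neighbourhoods, key=lambda n: haversine_km(lat, lng, n[2], n[3]))[0]


# Tally venues per borough.
counts = {}
for name, lat, lng in venues:
    borough = nearest_borough(lat, lng)
    counts[borough] = counts.get(borough, 0) + 1
print(counts)
```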

#### Let's explore the first neighbourhood in our SouthMumbai dataframe.

Getting the neighbourhood's name

In [54]:
SouthMumbai.loc[0, 'Neighbourhood']

'Antop Hill'

Getting the neighbourhood's latitude and longitude values

In [55]:
neighbourhood_latitude = SouthMumbai.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = SouthMumbai.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = SouthMumbai.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Antop Hill are 19.0207608, 72.8652556.


#### Now, let's get the top 100 venues that are in Antop Hill within a radius of 500 meters.

First, let's create the GET request URL.

In [56]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighbourhood_latitude, neighbourhood_longitude, VERSION, 500,100)
url



'https://api.foursquare.com/v2/venues/explore?client_id=PPV43AY4P4IHA1EJW4KMVJ5QZ00Y2VBSPL15TVUVMS0PLC3T&client_secret=Q4E0CB1KWKOZ42NOJ2E5YRX1JSWX2TTVMK42ZTTCAARPL4XP&ll=19.0207608,72.8652556&v=20180605&radius=500&limit=100'

Send the GET request and examine the results

In [57]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c947c58dd579725dd2d3c7e'},
 'response': {'headerLocation': 'Mumbai',
  'headerFullLocation': 'Mumbai',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 19.025260804500007,
    'lng': 72.87000660474271},
   'sw': {'lat': 19.016260795499996, 'lng': 72.86050459525728}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e52243b483bf6d4ed335f00',
       'name': 'Club House',
       'location': {'address': 'Dosti Acres',
        'crossStreet': 'Antop Hill',
        'lat': 19.023104073169748,
        'lng': 72.8658144678318,
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.023104073169748,
          'lng': 72.8658144678318}],
        'distance': 267,
        'cc': 'IN',
      

From the Foursquare lab, we know that all the information is in the items key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [58]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [59]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Club House | Gym / Fitness Center | 19.023104 | 72.865814 |
| 1 | Wadala Skywalk | Trail | 19.018421 | 72.864422 |
| 2 | Palkhi Restaurant | Indian Restaurant | 19.022163 | 72.863034 |
| 3 | Wadala gate no. 4 | Bus Station | 19.022335 | 72.862039 |


And how many venues were returned by Foursquare?

In [60]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


## Exploring neighbourhoods in South Mumbai

#### Let's create a function to repeat the same process for all the neighbourhoods in South Mumbai

In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
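
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which raises an `IndexError` if a venue comes back with an empty category list. A more defensive way to flatten one item is sketched below. The helper name `venue_row` is hypothetical, and the second sample item is invented to exercise the empty case.

```python
def venue_row(name, lat, lng, item):
    """Flatten one 'explore' item into a tuple, tolerating missing categories."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# Item shaped like the JSON shown earlier.
with_cat = {
    "venue": {
        "name": "Club House",
        "location": {"lat": 19.023104, "lng": 72.865814},
        "categories": [{"name": "Gym / Fitness Center"}],
    }
}
# Invented item with no category, to exercise the fallback.
without_cat = {
    "venue": {
        "name": "Unnamed venue",
        "location": {"lat": 19.0, "lng": 72.8},
        "categories": [],
    }
}

print(venue_row("Antop Hill", 19.020761, 72.865256, with_cat))
print(venue_row("Antop Hill", 19.020761, 72.865256, without_cat))
```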

#### Creating a new dataframe called SouthMumbai_venues.

In [62]:
SouthMumbai_venues = getNearbyVenues(names=SouthMumbai['Neighbourhood'],
                                   latitudes=SouthMumbai['Latitude'],
                                   longitudes=SouthMumbai['Longitude']
                                  )


Antop Hill
Byculla
Colaba
Dadar
Fort
Girgaon
Kalbadevi
Kamathipura
Matunga
Parel
Tardeo


#### Let's check the size of the resulting dataframe

In [63]:
print(SouthMumbai_venues.shape)
SouthMumbai_venues.head()

(230, 7)


| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Antop Hill | 19.020761 | 72.865256 | Club House | 19.023104 | 72.865814 | Gym / Fitness Center |
| 1 | Antop Hill | 19.020761 | 72.865256 | Wadala Skywalk | 19.018421 | 72.864422 | Trail |
| 2 | Antop Hill | 19.020761 | 72.865256 | Palkhi Restaurant | 19.022163 | 72.863034 | Indian Restaurant |
| 3 | Antop Hill | 19.020761 | 72.865256 | Wadala gate no. 4 | 19.022335 | 72.862039 | Bus Station |
| 4 | Byculla | 18.976622 | 72.832794 | Persian Darbar | 18.976055 | 72.833643 | Indian Restaurant |


Let's check how many venues were returned for each neighborhood

In [64]:
SouthMumbai_venues.groupby('Neighbourhood').count()

| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Antop Hill | 4 | 4 | 4 | 4 | 4 | 4 |
| Byculla | 10 | 10 | 10 | 10 | 10 | 10 |
| Colaba | 15 | 15 | 15 | 15 | 15 | 15 |
| Dadar | 42 | 42 | 42 | 42 | 42 | 42 |
| Fort | 46 | 46 | 46 | 46 | 46 | 46 |
| Girgaon | 35 | 35 | 35 | 35 | 35 | 35 |
| Kalbadevi | 9 | 9 | 9 | 9 | 9 | 9 |
| Kamathipura | 9 | 9 | 9 | 9 | 9 | 9 |
| Matunga | 34 | 34 | 34 | 34 | 34 | 34 |
| Parel | 14 | 14 | 14 | 14 | 14 | 14 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [65]:
print('There are {} unique categories.'.format(len(SouthMumbai_venues['Venue Category'].unique())))

There are 80 unique categories.


## 4. Analysis <a name="analysis"></a>

#### Analyze Each Neighborhood

In [66]:
# one hot encoding
SouthMumbai_onehot = pd.get_dummies(SouthMumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
SouthMumbai_onehot['Neighbourhood'] = SouthMumbai_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [SouthMumbai_onehot.columns[-1]] + list(SouthMumbai_onehot.columns[:-1])
SouthMumbai_onehot = SouthMumbai_onehot[fixed_columns]

SouthMumbai_onehot.head()

Unnamed: 0,Neighbourhood,Arcade,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Bookstore,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Cosmetics Shop,Department Store,Dessert Shop,Donut Shop,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Truck,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,History Museum,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Jewelry Store,Juice Bar,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Music Venue,Opera House,Outdoors & Recreation,Park,Parsi Restaurant,Pharmacy,Pizza Place,Platform,Playground,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Tea Room,Tennis Court,Thai Restaurant,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Women's Store,Zoo
0,Antop Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Antop Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,Antop Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Antop Hill,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Byculla,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [67]:
SouthMumbai_onehot.shape

(230, 81)
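
The shape follows from the encoding: one row per venue (230) and one column per distinct category (80) plus the `Neighbourhood` column. The toy example below, with made-up venues, shows the same pattern on a small scale, together with the per-neighbourhood mean taken in the following step.

```python
import pandas as pd

# Made-up venues for two placeholder neighbourhoods.
toy_venues = pd.DataFrame(
    {
        "Neighbourhood": ["A", "A", "B", "B", "B"],
        "Venue Category": ["Café", "Bakery", "Café", "Café", "Park"],
    }
)

# One 0/1 indicator column per distinct category.
onehot = pd.get_dummies(
    toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int
)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# 5 venues, 3 categories + 'Neighbourhood' -> (5, 4).
print(onehot.shape)

# Averaging the indicators per neighbourhood gives each category's share of venues.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```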

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [68]:
SouthMumbai_grouped = SouthMumbai_onehot.groupby('Neighbourhood').mean().reset_index()
SouthMumbai_grouped

Unnamed: 0,Neighbourhood,Arcade,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Bookstore,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Cosmetics Shop,Department Store,Dessert Shop,Donut Shop,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Truck,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,History Museum,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Jewelry Store,Juice Bar,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Music Venue,Opera House,Outdoors & Recreation,Park,Parsi Restaurant,Pharmacy,Pizza Place,Platform,Playground,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Tea Room,Tennis Court,Thai Restaurant,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Women's Store,Zoo
0,Antop Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
1,Byculla,0.0,0.0,0.1,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
2,Colaba,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.133333,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0
3,Dadar,0.02381,0.0,0.0,0.0,0.02381,0.047619,0.02381,0.0,0.02381,0.0,0.02381,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.190476,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.02381,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.0,0.02381,0.02381,0.0
4,Fort,0.0,0.021739,0.021739,0.021739,0.021739,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.065217,0.0,0.021739,0.021739,0.021739,0.021739,0.0,0.0,0.043478,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.217391,0.0,0.043478,0.0,0.0,0.021739,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.0,0.021739,0.021739,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0
5,Girgaon,0.0,0.0,0.028571,0.0,0.028571,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.057143,0.0,0.0,0.0,0.085714,0.114286,0.028571,0.0,0.057143,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0
6,Kalbadevi,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Kamathipura,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Matunga,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.029412,0.0,0.029412,0.029412,0.029412,0.029412,0.029412,0.088235,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.117647,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0
9,Parel,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0


#### Let's confirm the new size

In [69]:
SouthMumbai_grouped.shape

(11, 81)
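
The 81 columns are the neighbourhood name plus 80 venue categories, and each row holds the share of a neighbourhood's venues that fall in each category. Here is a minimal sketch of that calculation, assuming the frequencies are per-neighbourhood category counts divided by the number of venues. The `category_frequencies` helper and the four-venue list are hypothetical, chosen only to be consistent with the 0.25 values in the Antop Hill row.

```python
from collections import Counter


def category_frequencies(categories):
    """Return the share of venues in each category (hypothetical helper)."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Assumed venue list: one venue in each of the four categories
# that have a non-zero frequency in the Antop Hill row.
antop_hill = ["Bus Station", "Gym / Fitness Center", "Indian Restaurant", "Trail"]

freqs = category_frequencies(antop_hill)
for category, freq in sorted(freqs.items()):
    print(f"{category}: {freq:.2f}")
```

Because every venue belongs to exactly one category, the frequencies in a row always add up to 1, which is a quick sanity check on the grouped table.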

#### Let's print each neighborhood along with the top 5 most common venues

In [70]:
num_top_venues = 5

for hood in SouthMumbai_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = SouthMumbai_grouped[SouthMumbai_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Antop Hill----
                  venue  freq
0                 Trail  0.25
1     Indian Restaurant  0.25
2  Gym / Fitness Center  0.25
3           Bus Station  0.25
4                Arcade  0.00


----Byculla----
               venue  freq
0  Indian Restaurant   0.3
1                Zoo   0.1
2     History Museum   0.1
3   Asian Restaurant   0.1
4           Pharmacy   0.1


----Colaba----
                venue  freq
0  Chinese Restaurant  0.13
1                Café  0.07
2    Toy / Game Store  0.07
3                 Spa  0.07
4         Pizza Place  0.07


----Dadar----
                venue  freq
0   Indian Restaurant  0.19
1  Chinese Restaurant  0.10
2      Ice Cream Shop  0.07
3       Movie Theater  0.05
4                 Bar  0.05


----Fort----
                  venue  freq
0     Indian Restaurant  0.22
1    Seafood Restaurant  0.07
2                  Café  0.07
3  Fast Food Restaurant  0.07
4                Lounge  0.04


----Girgaon----
                venue  freq
0   Indian 

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [71]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
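
Several venues often share the same frequency (in Byculla, seven categories tie at 0.1). pandas' default sort gives no ordering guarantee among equal values, so tied venues can appear in a different order in the top-5 printout and in the top-10 table. As a sketch of one way to make the ranking reproducible, the hypothetical `top_venues` helper below breaks ties alphabetically. It uses a plain dictionary with Byculla's non-zero frequencies rather than the dataframe row.

```python
# Non-zero frequencies for Byculla, copied from the grouped table above.
byculla = {
    "Asian Restaurant": 0.1,
    "Bakery": 0.1,
    "Bar": 0.1,
    "History Museum": 0.1,
    "Indian Restaurant": 0.3,
    "Park": 0.1,
    "Pharmacy": 0.1,
    "Zoo": 0.1,
}


def top_venues(freqs, n):
    """Rank by descending frequency, then by name to break ties (hypothetical helper)."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


top5 = top_venues(byculla, 5)
print(top5)
```

The alphabetical tie-break is an arbitrary choice; its only purpose is that repeated runs give the same list.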

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [72]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
SouthMumbai_venues_sorted = pd.DataFrame(columns=columns)
SouthMumbai_venues_sorted['Neighbourhood'] = SouthMumbai_grouped['Neighbourhood']

for ind in np.arange(SouthMumbai_grouped.shape[0]):
    SouthMumbai_venues_sorted.iloc[ind, 1:] = return_most_common_venues(SouthMumbai_grouped.iloc[ind, :], num_top_venues)

SouthMumbai_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Antop Hill,Indian Restaurant,Trail,Gym / Fitness Center,Bus Station,Zoo,German Restaurant,Donut Shop,Fast Food Restaurant,Flower Shop,Food
1,Byculla,Indian Restaurant,Zoo,Bar,Park,History Museum,Pharmacy,Bakery,Asian Restaurant,Hotel,Hostel
2,Colaba,Chinese Restaurant,Coffee Shop,Brewery,Hotel,German Restaurant,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Pizza Place,Restaurant
3,Dadar,Indian Restaurant,Chinese Restaurant,Ice Cream Shop,Movie Theater,Bar,Café,Playground,Food & Drink Shop,Maharashtrian Restaurant,Liquor Store
4,Fort,Indian Restaurant,Café,Fast Food Restaurant,Seafood Restaurant,Dessert Shop,Irani Cafe,Lounge,Bookstore,Coffee Shop,Cocktail Bar
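
The column labels above rely on a three-item suffix list with a fallback to "th", which is enough for ten columns but would label a 21st column "21th". If more columns were ever needed, a small helper could build the suffix instead. The `ordinal` function below is a hypothetical sketch, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, for example 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) always take "th".
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```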


#### Cluster Neighborhoods

In [73]:
# set number of clusters
kclusters = 3

SouthMumbai_grouped_clustering = SouthMumbai_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(SouthMumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 0, 0, 0, 0, 2, 0, 0], dtype=int32)
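
Before interpreting the clusters, it helps to see how many neighbourhoods fall into each one. The sketch below tallies the ten labels printed above with `collections.Counter`. It uses only those ten values, so the eleventh neighbourhood is not counted.

```python
from collections import Counter

# First ten cluster labels, copied from the output above.
labels = [1, 0, 0, 0, 0, 0, 0, 2, 0, 0]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(f"cluster label {cluster}: {size} neighbourhood(s)")
```

One label covers most of the neighbourhoods, so the split is very uneven, and most of the later comparison happens inside that one large cluster.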

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [74]:
# add clustering labels
SouthMumbai_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

SouthMumbai_merged = SouthMumbai

# merge the data to add latitude/longitude for each neighborhood
SouthMumbai_merged= SouthMumbai_merged.join(SouthMumbai_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

SouthMumbai_merged # check the last columns!

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Population Density,Average real estate pricing,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,South Mumbai,Antop Hill,19.020761,72.865256,Highest,Low,1,Indian Restaurant,Trail,Gym / Fitness Center,Bus Station,Zoo,German Restaurant,Donut Shop,Fast Food Restaurant,Flower Shop,Food
1,South Mumbai,Byculla,18.976622,72.832794,Highest,High,0,Indian Restaurant,Zoo,Bar,Park,History Museum,Pharmacy,Bakery,Asian Restaurant,Hotel,Hostel
2,South Mumbai,Colaba,18.915091,72.825969,Highest,Very High,0,Chinese Restaurant,Coffee Shop,Brewery,Hotel,German Restaurant,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Pizza Place,Restaurant
3,South Mumbai,Dadar,19.023823,72.839427,Highest,High,0,Indian Restaurant,Chinese Restaurant,Ice Cream Shop,Movie Theater,Bar,Café,Playground,Food & Drink Shop,Maharashtrian Restaurant,Liquor Store
4,South Mumbai,Fort,18.933267,72.834515,Very High,Low,0,Indian Restaurant,Café,Fast Food Restaurant,Seafood Restaurant,Dessert Shop,Irani Cafe,Lounge,Bookstore,Coffee Shop,Cocktail Bar
5,South Mumbai,Girgaon,18.954317,72.817908,Highest,Low,0,Indian Restaurant,Ice Cream Shop,Juice Bar,Café,Italian Restaurant,Harbor / Marina,Pizza Place,Sandwich Place,Indie Movie Theater,Gym
6,South Mumbai,Kalbadevi,18.949258,72.827938,Highest,High,0,Indian Restaurant,Arts & Crafts Store,Jewelry Store,Food,Cheese Shop,Café,Snack Place,Zoo,Grocery Store,Flower Shop
7,South Mumbai,Kamathipura,18.965322,72.826435,Highest,Low,2,Department Store,Fast Food Restaurant,Arts & Crafts Store,Hotel,Chinese Restaurant,Asian Restaurant,Middle Eastern Restaurant,Hostel,History Museum,Harbor / Marina
8,South Mumbai,Matunga,19.027436,72.850147,Highest,High,0,Indian Restaurant,Bar,Café,Fast Food Restaurant,Snack Place,Vegetarian / Vegan Restaurant,Train Station,Coffee Shop,Cosmetics Shop,Department Store
9,South Mumbai,Parel,19.009482,72.837661,Highest,Very High,0,Bakery,Coffee Shop,Dessert Shop,Sandwich Place,Train Station,Women's Store,Jewelry Store,Indian Restaurant,Pizza Place,Food & Drink Shop


Finally, let's visualize the resulting clusters

In [75]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(SouthMumbai_merged['Latitude'], SouthMumbai_merged['Longitude'], SouthMumbai_merged['Neighbourhood'], SouthMumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
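
The map only needs one distinct colour per cluster label. As an alternative sketch that uses only the standard library instead of a matplotlib colormap, the hypothetical `cluster_palette` helper below spaces the hues evenly around the colour wheel and looks each marker's colour up by its label.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hex colours (hypothetical helper)."""
    palette = []
    for k in range(n_clusters):
        # Evenly spaced hues with fixed saturation and brightness.
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.9, 0.9)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


palette = cluster_palette(3)

# First ten cluster labels, copied from the k-means output above.
labels = [1, 0, 0, 0, 0, 0, 0, 2, 0, 0]
marker_colors = [palette[label] for label in labels]
print(palette)
```

Indexing the palette directly by the label keeps the lookup valid whatever number of clusters is chosen.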

#### Examine Clusters

#### Cluster 1

In [76]:
SouthMumbai_merged.loc[SouthMumbai_merged['Cluster Labels'] == 0, SouthMumbai_merged.columns[[1] + list(range(3, SouthMumbai_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Longitude,Population Density,Average real estate pricing,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Byculla,72.832794,Highest,High,0,Indian Restaurant,Zoo,Bar,Park,History Museum,Pharmacy,Bakery,Asian Restaurant,Hotel,Hostel
2,Colaba,72.825969,Highest,Very High,0,Chinese Restaurant,Coffee Shop,Brewery,Hotel,German Restaurant,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Pizza Place,Restaurant
3,Dadar,72.839427,Highest,High,0,Indian Restaurant,Chinese Restaurant,Ice Cream Shop,Movie Theater,Bar,Café,Playground,Food & Drink Shop,Maharashtrian Restaurant,Liquor Store
4,Fort,72.834515,Very High,Low,0,Indian Restaurant,Café,Fast Food Restaurant,Seafood Restaurant,Dessert Shop,Irani Cafe,Lounge,Bookstore,Coffee Shop,Cocktail Bar
5,Girgaon,72.817908,Highest,Low,0,Indian Restaurant,Ice Cream Shop,Juice Bar,Café,Italian Restaurant,Harbor / Marina,Pizza Place,Sandwich Place,Indie Movie Theater,Gym
6,Kalbadevi,72.827938,Highest,High,0,Indian Restaurant,Arts & Crafts Store,Jewelry Store,Food,Cheese Shop,Café,Snack Place,Zoo,Grocery Store,Flower Shop
8,Matunga,72.850147,Highest,High,0,Indian Restaurant,Bar,Café,Fast Food Restaurant,Snack Place,Vegetarian / Vegan Restaurant,Train Station,Coffee Shop,Cosmetics Shop,Department Store
9,Parel,72.837661,Highest,Very High,0,Bakery,Coffee Shop,Dessert Shop,Sandwich Place,Train Station,Women's Store,Jewelry Store,Indian Restaurant,Pizza Place,Food & Drink Shop


For our analysis we can ignore **Fort** and **Girgaon**, as their real estate pricing is "Low" and we are targeting high-end, plush localities. The following table summarizes our observations for cluster 1.


   
| Neighbourhood | Key venues | Population Density | Average real estate pricing | Key observations |
|---|---|---|---|---|
| Byculla | History Museum, Hotel, Park, Zoo, Beach | Highest | High | Consists of inter-neighbourhood popular venues. The place attracts large numbers of people from all over Mumbai/India. |
| Colaba | Indian, German, Chinese, Mediterranean and Middle Eastern restaurants | Highest | Very High | Consists of inter-neighbourhood popular venues. The place attracts a reasonable number of people. |
| Dadar | Movie theatre, arcade | Highest | High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |
| Kalbadevi | Indian Restaurant, Fast Food Restaurant | Highest | High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |
| Matunga | Departmental Store, Train station | Highest | High | Has an existing popular departmental store. |
| Parel | Food court, Pizza place | Highest | Very High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |

#### Cluster 2

In [77]:
SouthMumbai_merged.loc[SouthMumbai_merged['Cluster Labels'] == 1, SouthMumbai_merged.columns[[1] + list(range(3, SouthMumbai_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Longitude,Population Density,Average real estate pricing,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Antop Hill,72.865256,Highest,Low,1,Indian Restaurant,Trail,Gym / Fitness Center,Bus Station,Zoo,German Restaurant,Donut Shop,Fast Food Restaurant,Flower Shop,Food


Since the average real estate pricing for Antop Hill is low, we might as well ignore it too.

#### Cluster 3

In [79]:
SouthMumbai_merged.loc[SouthMumbai_merged['Cluster Labels'] == 2, SouthMumbai_merged.columns[[1] + list(range(3, SouthMumbai_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Longitude,Population Density,Average real estate pricing,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Kamathipura,72.826435,Highest,Low,2,Department Store,Fast Food Restaurant,Arts & Crafts Store,Hotel,Chinese Restaurant,Asian Restaurant,Middle Eastern Restaurant,Hostel,History Museum,Harbor / Marina
10,Tardeo,72.822151,Highest,Highest,2,Fast Food Restaurant,Department Store,Indian Restaurant,Smoke Shop,Ice Cream Shop,Train Station,Hotel,Restaurant,Movie Theater,Donut Shop


Both of the above-mentioned neighbourhoods have Department Store among their most popular venues. This means more competition for the supermarket. Hence, we ignore these neighbourhoods too.

## 5. Results and Discussion <a name="results"></a>

The aim of this project was to determine an ideal location to set up a new supermarket given a set of prerequisites. They are summarised as follows:
1.	The location should be densely populated, so that the supermarket has a large client base.
2.	The location should not have competitors in the form of existing departmental stores, malls, supermarkets, etc.
3.	The location should be plush and high end to target customers with high income. This information is sought from the real estate pricing, which is a key indicator of residents' income group.

Keeping these prerequisites in mind, the overall count of large-scale supermarkets in Mumbai City was derived using Foursquare location data. It was observed that the majority of supermarkets were in the ‘Western borough’ of Mumbai city, while the population density was highest in the ‘South Mumbai’ borough. Both observations supported the idea that the new supermarket should be located in ‘South Mumbai’, but the question was where. This necessitated a detailed analysis of all neighbourhoods in ‘South Mumbai’. The top 10 venues for each neighbourhood were extracted and the neighbourhoods were clustered as follows:

|Cluster no.|Details|Cluster Name|
|-----|----|----|
|Cluster 1|Neighbourhoods with restaurants as their most popular venue|Food Centric Venues|
|Cluster 2|Neighbourhood with smoke joint and hostel|Students centric venue|
|Cluster 3|Neighbourhoods with Departmental stores as their most popular venue|Consumer goods centric venues|

The obvious focus was on cluster 1, namely “Food Centric Venues”. A detailed analysis of the neighbourhoods in cluster 1, along with the resulting conclusion, is as follows:

| Neighbourhood | Key venues | Population Density | Average real estate pricing | Key observations |
|---|---|---|---|---|
| Byculla | History Museum, Hotel, Park, Zoo, Beach | Highest | High | Consists of inter-neighbourhood popular venues. The place attracts large numbers of people from all over Mumbai/India. |
| Colaba | Indian, German, Chinese, Mediterranean and Middle Eastern restaurants | Highest | Very High | Consists of inter-neighbourhood popular venues. The place attracts a reasonable number of people. |
| Dadar | Movie theatre, arcade | Highest | High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |
| Kalbadevi | Indian Restaurant, Fast Food Restaurant | Highest | High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |
| Matunga | Departmental Store, Train station | Highest | High | Has an existing popular departmental store. |
| Parel | Food court, Pizza place | Highest | Very High | Consists of intra-neighbourhood popular venues and hence attracts the nearby crowd. |

**From the above discussion, “Byculla” and “Colaba” seem to be the ideal places to set up a new supermarket.**

The choice between the two is left to the stakeholders involved. The key difference between the two most suitable neighbourhoods is:

|Neighbourhood| Difference| 
|---|---|
|Byculla|Real estate pricing is ‘High’ (40,000 to 60,000 per sq.ft)|
|Colaba|Real estate pricing is ‘Very High’ (60,000 to 80,000 per sq.ft)|

Depending on the amount of funds the stakeholders are ready to invest, either of the neighbourhoods can be selected as an ideal location for setting up the new supermarket.
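
The three prerequisites listed in this section can be sketched as a simple filter. The records below copy the population density and real estate pricing from the merged table, plus a flag for whether Department Store appears among a neighbourhood's top 10 venues. The `shortlist` function and the two category sets are hypothetical names used only for this illustration, and treating "High" and above as plush is an assumption about where the cut-off lies.

```python
# (name, population density, real estate pricing, Department Store in top 10?)
neighbourhoods = [
    ("Antop Hill", "Highest", "Low", False),
    ("Byculla", "Highest", "High", False),
    ("Colaba", "Highest", "Very High", False),
    ("Dadar", "Highest", "High", False),
    ("Fort", "Very High", "Low", False),
    ("Girgaon", "Highest", "Low", False),
    ("Kalbadevi", "Highest", "High", False),
    ("Kamathipura", "Highest", "Low", True),
    ("Matunga", "Highest", "High", True),
    ("Parel", "Highest", "Very High", False),
    ("Tardeo", "Highest", "Highest", True),
]

DENSE = {"Very High", "Highest"}           # assumed density cut-off
UPMARKET = {"High", "Very High", "Highest"}  # assumed pricing cut-off


def shortlist(rows):
    """Keep dense, upmarket neighbourhoods without a competing department store."""
    keep = []
    for name, density, pricing, has_department_store in rows:
        if density in DENSE and pricing in UPMARKET and not has_department_store:
            keep.append(name)
    return keep


candidates = shortlist(neighbourhoods)
print(candidates)
```

This filter only narrows the list to the neighbourhoods compared in the table above. The final preference for Byculla and Colaba rests on the qualitative "Key observations" column, which the sketch does not encode.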

## 6. Conclusion <a name="conclusion"></a>

This project demonstrated that an ideal location for a supermarket can be arrived at given certain requirements. First, the city of interest was decided, which in our case was Mumbai. Then a data frame was built consisting of the borough, neighbourhood, latitude, longitude, real estate pricing and population density of each neighbourhood in Mumbai. An exploratory analysis was conducted to filter out the most suitable borough and then the most suitable neighbourhoods, given that our aim was to target areas of high population density and high average real estate pricing. It was thus concluded that the neighbourhoods **“Byculla”** and **“Colaba”** were the most suitable, as they met our specific requirements.