# Capstone project - The Battle of Neighborhoods (Week 2)

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## 1) Introduction: Business Problem<a name="introduction"></a>

In this project we try to find an optimal location for a shopping mall. The report is aimed at property developers and business owners interested in opening a **shopping mall** in **Pune**, India.

We will try to detect **locations that are not already crowded with shopping malls**. We are also particularly interested in **areas with no shopping malls in the vicinity**. Provided the first two conditions are met, we also prefer locations **as close to the city center as possible**.

## 2) Data <a name="data"></a>

The following data sources are needed to extract or generate the required information:
* neighborhood names are read from an Excel file
* the latitude and longitude of each neighborhood are obtained with the `geocoder` package (ArcGIS provider)
* the number and location of shopping malls around every neighborhood are obtained using the **Foursquare API**
* the coordinates of the Pune city center are obtained with the **Nominatim** geocoder from `geopy`

#### 2.1   Import required libraries

In [4]:
#!conda install -c conda-forge beautifulsoup4 --yes
#!conda install -c conda-forge geopy --yes
!pip install geocoder
!pip install folium
import numpy as np # library to handle data in arrays

import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json  # library to handle JSON files
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import geocoder  # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


#### 2.2 Retrieving neighborhood data for Pune, India from an Excel file

In [5]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighborhood
0,Ambegaon
1,Aundh
2,Baner
3,Bavdhan Khurd
4,Bavdhan Budruk


In [6]:
df = df_data_0 # storing to a new dataframe with name 'df' just for convenience 

In [7]:
df.head()

Unnamed: 0,Neighborhood
0,Ambegaon
1,Aundh
2,Baner
3,Bavdhan Khurd
4,Bavdhan Budruk


In [8]:
# check the shape: one row per neighborhood and a single 'Neighborhood' column
df.shape

(82, 1)

#### 2.3 Finding coordinates of every Neighborhood

In [9]:
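# define a function to get coordinates

def get_latlng(neighborhood, max_tries=5):
    # initialize the result to None
    lat_lng_coords = None
    tries = 0
    # retry until coordinates are returned, but give up after max_tries
    # so that a name the geocoder cannot resolve does not loop forever
    while lat_lng_coords is None and tries < max_tries:
        g = geocoder.arcgis('{}, Pune, India'.format(neighborhood))
        lat_lng_coords = g.latlng
        tries += 1
    # fall back to [None, None] so the coordinates list keeps two columns
    return lat_lng_coords if lat_lng_coords is not None else [None, None]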

In [10]:
# call the function to get the coordinates, store in a new list called 'coordinates'
coordinates = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]

In [11]:
coordinates

[[19.00496000000004, 73.94583000000006],
 [18.563450000000046, 73.81227000000007],
 [18.548200000000065, 73.77316000000008],
 [18.511100000000056, 73.77773000000008],
 [18.51827000000003, 73.76557000000008],
 [18.576020000000028, 73.77983000000006],
 [18.733490000000074, 74.28288000000003],
 [18.471870000000024, 73.86336000000006],
 [18.499220000000037, 73.75316000000004],
 [18.495100000000036, 73.72124000000008],
 [18.46628000000004, 73.85326000000003],
 [18.57856000000004, 73.89264000000003],
 [18.447020000000066, 73.80757000000006],
 [18.509650000000022, 73.83124000000004],
 [18.473650000000077, 73.97473000000008],
 [18.522320000000036, 73.89712000000003],
 [18.502530000000036, 73.92706000000004],
 [18.479790000000037, 73.83075000000008],
 [18.49150000000003, 73.82172000000008],
 [18.578450000000032, 73.87489000000005],
 [18.447320000000047, 73.86405000000008],
 [18.561140000000023, 73.85300000000007],
 [18.544620000000066, 73.93922000000003],
 [18.43825000000004, 73.89895000000007]

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [14]:
# check the neighborhoods and the coordinates
print(df.shape)
df

(82, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Ambegaon,19.00496,73.94583
1,Aundh,18.56345,73.81227
2,Baner,18.5482,73.77316
3,Bavdhan Khurd,18.5111,73.77773
4,Bavdhan Budruk,18.51827,73.76557
5,Balewadi,18.57602,73.77983
6,Bhamburde,18.73349,74.28288
7,Bibvewadi,18.47187,73.86336
8,Bhugaon,18.49922,73.75316
9,Bhukum,18.4951,73.72124


In [15]:
# save the DataFrame as CSV file
df.to_csv("new_df.csv", index=False)

In [16]:
# get the coordinates of 'Pune' city in India
address = 'Pune, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Pune, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Pune, India are 18.521428, 73.8544541.
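
Two of the geocoded points listed above, Ambegaon (latitude 19.00) and Bhamburde (longitude 74.28), lie noticeably far from the city coordinates. The following is a minimal sketch of a sanity check, assuming a simple latitude/longitude box around the center is an acceptable filter. The half-width of 0.3 degrees and the helper name `outside_box` are illustrative assumptions and are not part of the original notebook.

```python
# Coordinates copied from the outputs above
PUNE_CENTER = (18.521428, 73.8544541)
SAMPLE = {
    "Ambegaon": (19.00496, 73.94583),
    "Aundh": (18.56345, 73.81227),
    "Baner": (18.5482, 73.77316),
    "Bhamburde": (18.73349, 74.28288),
}

def outside_box(point, center, half_width_deg=0.3):
    """Return True if point differs from center by more than half_width_deg
    in latitude or in longitude (half_width_deg is an assumed threshold)."""
    lat, lng = point
    return (abs(lat - center[0]) > half_width_deg
            or abs(lng - center[1]) > half_width_deg)

# names whose geocoded point falls outside the box and may need a manual check
suspect = [name for name, point in SAMPLE.items() if outside_box(point, PUNE_CENTER)]
print(suspect)
```

Points flagged this way are not necessarily wrong; they are only candidates for a manual look before venue data is requested for them.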


## 3) Methodology <a name="methodology"></a>

Now that we have the coordinates of all neighborhoods, we first plot them on a map using the folium library.
Then we use the Foursquare API to retrieve up to 100 venues within a 5 km radius of each neighborhood.

We analyze each neighborhood using one-hot encoding of the venue categories.
Finally, we cluster the neighborhoods with k-means on the normalized number (frequency) of shopping malls among each neighborhood's venues, and then examine each cluster.
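
The normalized number used in the analysis is the mean of a one-hot column, which equals the share of a neighborhood's returned venues that fall in a given category. Below is a minimal plain-Python sketch of that arithmetic. The neighborhoods `A` and `B`, their venues, and the helper name `category_frequency` are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical (neighborhood, category) pairs standing in for rows of venues_df
venues = [
    ("A", "Shopping Mall"), ("A", "Café"), ("A", "ATM"), ("A", "Café"),
    ("B", "ATM"), ("B", "Hotel"),
]

def category_frequency(rows, category):
    """Share of each neighborhood's venues that belong to `category`."""
    totals = Counter(n for n, _ in rows)                 # venues per neighborhood
    hits = Counter(n for n, c in rows if c == category)  # matching venues
    return {n: hits[n] / totals[n] for n in totals}

mall_freq = category_frequency(venues, "Shopping Mall")
print(mall_freq)  # {'A': 0.25, 'B': 0.0}
```

By the same arithmetic, a neighborhood with 100 returned venues and a frequency of 0.02 has two venues in the category.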

#### 3.1 Create a map of Pune city along with the neighborhoods

In [17]:
# create map of Pune using latitude and longitude values
# (named map_pune to avoid shadowing Python's built-in map)
map_pune = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_pune)

map_pune

#### 3.2 Using Foursquare API to explore Neighborhoods in Pune City

In [239]:
# define Foursquare Credentials and Version
CLIENT_ID = 'MY_CLIENT_ID'
CLIENT_SECRET = 'MY_CLIENT_SECRET'
VERSION = '20180604'

In [21]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
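
The loop above indexes `venue['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back without a category. Below is a minimal sketch of a more defensive extraction step. It runs on a hand-written sample that only mimics the fields the notebook reads. The sample items and the helper name `extract_venues` are hypothetical and are not an actual Foursquare response.

```python
def extract_venues(neighborhood, lat, lng, items):
    """Flatten explore-style items into tuples, skipping incomplete venues."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # incomplete record: skip instead of raising
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Hand-written sample items (hypothetical data)
sample_items = [
    {"venue": {"name": "Mall X", "location": {"lat": 18.56, "lng": 73.81},
               "categories": [{"name": "Shopping Mall"}]}},
    {"venue": {"name": "No Category", "location": {"lat": 18.57, "lng": 73.82},
               "categories": []}},
]

rows = extract_venues("Aundh", 18.56345, 73.81227, sample_items)
print(rows)
```

Skipping incomplete records is one possible design choice; recording them with a placeholder category would be another, depending on whether total venue counts should include them.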

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(4603, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ambegaon,19.00496,73.94583,My Idea Store,19.007062,73.949491,Mobile Phone Shop
1,Ambegaon,19.00496,73.94583,Axis Bank ATM,19.00098,73.944656,ATM
2,Ambegaon,19.00496,73.94583,Go cheese world,18.995747,73.944337,Museum
3,Ambegaon,19.00496,73.94583,Axis Bank ATM,19.00798,73.927422,ATM
4,Ambegaon,19.00496,73.94583,Axis Bank ATM,19.00798,73.927422,ATM


#### 3.3 Let's check number of venues for each Neighborhood

In [23]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Akurdi,5,5,5,5,5,5
Ambegaon,8,8,8,8,8,8
Ambi,6,6,6,6,6,6
Aundh,100,100,100,100,100,100
Balewadi,100,100,100,100,100,100
Baner,100,100,100,100,100,100
Bavdhan Budruk,72,72,72,72,72,72
Bavdhan Khurd,75,75,75,75,75,75
Bhamburde,1,1,1,1,1,1
Bhosari,41,41,41,41,41,41


#### 3.4 Finding out unique categories of venues

In [24]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 156 unique categories.


In [25]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Mobile Phone Shop', 'ATM', 'Museum', 'Indian Restaurant',
       'Asian Restaurant', 'Bookstore', 'Shopping Mall',
       'English Restaurant', 'Dessert Shop', 'Coffee Shop', 'Donut Shop',
       'Ice Cream Shop', 'Gym', 'Multiplex', 'Lounge', 'Clothing Store',
       'Chinese Restaurant', 'Bakery', 'Mexican Restaurant',
       'Chocolate Shop', 'Hotel', 'Brewery', 'Italian Restaurant',
       'Malay Restaurant', 'South Indian Restaurant', 'Jewelry Store',
       'BBQ Joint', 'Snack Place', 'Breakfast Spot',
       'Fast Food Restaurant', 'Bistro',
       'Molecular Gastronomy Restaurant', 'Vegetarian / Vegan Restaurant',
       'Nightclub', 'Trail', 'Café', 'Theme Park', 'Other Great Outdoors',
       'Motorcycle Shop', 'Restaurant', 'Sandwich Place',
       'French Restaurant', 'Bar', 'Punjabi Restaurant', 'Pizza Place',
       'Seafood Restaurant', 'Beer Garden', 'Golf Course',
       'Department Store', 'Stadium', 'Food Court', 'Burger Joint',
       'American Restaurant', 

In [26]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

## 4) Analysis <a name="analysis"></a>

#### 4.1 Analyzing each Neighborhood

In [27]:
# one hot encoding
onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

print(onehot.shape)
onehot.head()

(4603, 157)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Service,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beach Bar,Bed & Breakfast,Beer Garden,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Cafeteria,Café,Chaat Place,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Lake,Lounge,Maharashtrian Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Park,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Resort,Rest Area,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Southern / Soul Food Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Toll Booth,Toll Plaza,Town,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Shop,Yoga Studio,Zoo
0,Ambegaon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ambegaon,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ambegaon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ambegaon,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ambegaon,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### 4.2 Group rows by neighborhood and take the mean frequency of occurrence of each category

In [28]:
df_grouped = onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(df_grouped.shape)
df_grouped

(78, 157)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Service,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beach Bar,Bed & Breakfast,Beer Garden,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Cafeteria,Café,Chaat Place,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Lake,Lounge,Maharashtrian Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Park,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Resort,Rest Area,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Southern / Soul Food Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Toll Booth,Toll Plaza,Town,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Shop,Yoga Studio,Zoo
0,Akurdi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0
1,Ambegaon,0.375,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ambi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aundh,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.01,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.13,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0
4,Balewadi,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.06,0.0,0.14,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
5,Baner,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.05,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.05,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.14,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bavdhan Budruk,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.013889,0.041667,0.0,0.013889,0.0,0.0,0.013889,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,0.0,0.0,0.138889,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.083333,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.013889,0.0,0.152778,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.027778,0.013889,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bavdhan Khurd,0.0,0.0,0.013333,0.0,0.0,0.0,0.026667,0.0,0.013333,0.04,0.0,0.013333,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.0,0.146667,0.0,0.0,0.026667,0.0,0.0,0.0,0.053333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.04,0.0,0.0,0.026667,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.026667,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.026667,0.0,0.12,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.013333,0.026667,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0
8,Bhamburde,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bhosari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.097561,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.02439,0.0,0.243902,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.097561,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
len(df_grouped[df_grouped["Shopping Mall"] > 0])

47

In [30]:
df_grouped["Shopping Mall"][df_grouped["Shopping Mall"]==0].count()

31

#### 4.3 Create a new DataFrame for Shopping Mall data only and name it as 'df_mall'

In [31]:
df_mall = df_grouped[["Neighborhoods","Shopping Mall"]]

In [32]:
df_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Akurdi,0.0
1,Ambegaon,0.0
2,Ambi,0.0
3,Aundh,0.02
4,Balewadi,0.01


#### 4.4 Cluster neighborhoods using k-means clustering

In [33]:
# set number of clusters
clusters = 4

clustering = df_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=clusters, random_state=0).fit(clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 0, 3, 3, 1, 1, 1, 0], dtype=int32)
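
Because the clustering input is a single column (the mall frequency), k-means here amounts to splitting values on a number line. Below is a minimal sketch of Lloyd's algorithm in one dimension, written in plain Python for illustration. It is not scikit-learn's implementation, which among other things chooses its initial centers differently. The helper name `kmeans_1d` and the starting centers are arbitrary assumptions.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalars: assign to nearest center, then recompute means."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the nearest center for every value
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # update step: mean of each cluster (keep the old center if a cluster is empty)
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers

# Mall frequencies taken from tables in this notebook
freqs = [0.0, 0.0, 0.01, 0.01, 0.02, 0.02, 0.042553, 0.076923]
labels, centers = kmeans_1d(freqs, centers=[0.0, 0.01, 0.025, 0.06])
print(labels)
print(centers)
```

The cluster numbers themselves are arbitrary: a different initialization can produce the same groups under different labels, which is why the label values in the notebook output carry no ordering.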

In [34]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
df_merged = df_mall.copy()

# add clustering labels
df_merged["Cluster Labels"] = kmeans.labels_

In [35]:
df_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
df_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Akurdi,0.0,1
1,Ambegaon,0.0,1
2,Ambi,0.0,1
3,Aundh,0.02,0
4,Balewadi,0.01,3


In [36]:
# merge 'merged df' with original 'df' to add latitude/longitude for each neighborhood
df_merged = df_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(df_merged.shape)
df_merged.head() # check the last columns!

(78, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Akurdi,0.0,1,18.76408,73.69573
1,Ambegaon,0.0,1,19.00496,73.94583
2,Ambi,0.0,1,18.7593,73.66972
3,Aundh,0.02,0,18.56345,73.81227
4,Balewadi,0.01,3,18.57602,73.77983


In [37]:
# sort the results by Cluster Labels
print(df_merged.shape)
df_merged.sort_values(["Cluster Labels"], inplace=True)
df_merged

(78, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
65,Undri,0.036364,0,18.45427,73.91788
42,Mohammed Wadi,0.03,0,18.47867,73.91594
21,Dhayari,0.034483,0,18.44702,73.80757
49,Pashan,0.02,0,18.53674,73.7929
72,Wadgaon Sheri,0.02,0,18.53789,73.93267
25,Ghorpadi,0.022727,0,18.52232,73.89712
26,Hadapsar,0.02,0,18.50253,73.92706
27,Hingne Khurd,0.020833,0,18.47979,73.83075
28,Hinjawadi,0.023529,0,18.59142,73.73895
45,Nanded,0.032258,0,18.45642,73.792


In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.3)

# set color scheme for the clusters
x = np.arange(clusters)
ys = [i+x+(i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [39]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

#### 4.5 Exploring clusters

##### Cluster 1

In [40]:
df1= df_merged.loc[df_merged['Cluster Labels'] == 0]
df1

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
65,Undri,0.036364,0,18.45427,73.91788
42,Mohammed Wadi,0.03,0,18.47867,73.91594
21,Dhayari,0.034483,0,18.44702,73.80757
49,Pashan,0.02,0,18.53674,73.7929
72,Wadgaon Sheri,0.02,0,18.53789,73.93267
25,Ghorpadi,0.022727,0,18.52232,73.89712
26,Hadapsar,0.02,0,18.50253,73.92706
27,Hingne Khurd,0.020833,0,18.47979,73.83075
28,Hinjawadi,0.023529,0,18.59142,73.73895
45,Nanded,0.032258,0,18.45642,73.792


In [41]:
print("Cluster 1 details:")
print("    Total vicinities-",df1["Shopping Mall"].count())
print("    Vicinities without Shopping Malls",df1["Shopping Mall"][df1["Shopping Mall"]==0].count())

Cluster 1 details:
    Total vicinities- 30
    Vicinities without Shopping Malls 0


##### Cluster 2

In [42]:
df2=df_merged.loc[df_merged['Cluster Labels'] == 1]
df2

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
48,Parvati,0.0,1,18.48696,73.85006
46,Panmala,0.0,1,18.87647,73.89708
59,Somatne,0.0,1,18.6954,73.68223
60,Sus,0.0,1,18.5467,73.75113
55,Pirangut,0.0,1,18.51123,73.68317
54,Pimpri,0.0,1,18.3471,74.2051
43,Moshi,0.0,1,18.67612,73.84952
61,Talawade,0.0,1,18.89533,73.7441
62,Talegaon,0.0,1,18.73413,73.68334
57,Ravet,0.0,1,18.64513,73.73638


In [43]:
print("Cluster 2 details:")
print("    Total vicinities-",df2["Shopping Mall"].count())
print("    Vicinities without Shopping Malls",df2["Shopping Mall"][df2["Shopping Mall"]==0].count())

Cluster 2 details:
    Total vicinities- 31
    Vicinities without Shopping Malls 31


##### Cluster 3

In [44]:
df3=df_merged.loc[df_merged['Cluster Labels'] == 2]
df3

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
22,Dighi,0.076923,2,18.61522,73.87241
36,Kondhwa,0.052632,2,18.43825,73.89895
32,Kasarwadi,0.042553,2,18.60263,73.82435


In [45]:
print("Cluster 3 details:")
print("    Total vicinities-",df3["Shopping Mall"].count())
print("    Vicinities without Shopping Malls",df3["Shopping Mall"][df3["Shopping Mall"]==0].count())

Cluster 3 details:
    Total vicinities- 3
    Vicinities without Shopping Malls 0


##### Cluster 4

In [46]:
df4=df_merged.loc[df_merged['Cluster Labels'] == 3]
df4

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
71,Vishrantwadi,0.01,3,18.55533,73.87492
38,Kothrud,0.01,3,18.50517,73.80245
63,Tathawade,0.010309,3,18.62548,73.75069
58,Shivane,0.011628,3,18.46781,73.78897
40,Manjri,0.01,3,18.48194,73.865618
37,Koregaon Park,0.01,3,18.53533,73.89382
31,Karve Nagar,0.01,3,18.4915,73.82172
29,Kalas,0.01,3,18.57845,73.87489
23,Erandwane,0.01,3,18.50965,73.83124
20,Dhanori,0.010989,3,18.57856,73.89264


In [47]:
print("Cluster 4 details:")
print("    Total vicinities-",df4["Shopping Mall"].count())
print("    Vicinities without Shopping Malls",df4["Shopping Mall"][df4["Shopping Mall"]==0].count())

Cluster 4 details:
    Total vicinities- 14
    Vicinities without Shopping Malls 0
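
The four near-identical summary cells above can be condensed into one loop, which keeps the printed cluster number tied to the label being summarized. Below is a minimal sketch in plain Python over a handful of rows copied from the tables above. It uses only a subset of the rows, so the counts differ from the full results.

```python
from collections import defaultdict

# (neighborhood, mall frequency, cluster label) copied from the tables above
rows = [
    ("Undri", 0.036364, 0), ("Pashan", 0.02, 0),
    ("Parvati", 0.0, 1), ("Sus", 0.0, 1), ("Moshi", 0.0, 1),
    ("Dighi", 0.076923, 2),
    ("Kothrud", 0.01, 3), ("Kalas", 0.01, 3),
]

# collect the frequencies of each cluster label
by_cluster = defaultdict(list)
for name, freq, label in rows:
    by_cluster[label].append(freq)

summary = {}
for label in sorted(by_cluster):
    freqs = by_cluster[label]
    summary[label] = {
        "total": len(freqs),
        "without_malls": sum(1 for f in freqs if f == 0),
    }
    # the report numbers clusters from 1, while k-means labels start at 0
    print("Cluster {} details:".format(label + 1))
    print("    Total vicinities-", summary[label]["total"])
    print("    Vicinities without Shopping Malls", summary[label]["without_malls"])
```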


## 5) Results and Discussion <a name="results"></a>

Most of the shopping malls are concentrated in the central area of Pune city. Cluster 1 is the largest group of neighborhoods with malls (30 neighborhoods), followed by cluster 4 (14) and cluster 3 (3). By contrast, none of the 31 neighborhoods in cluster 2 has a shopping mall among its returned venues. These neighborhoods represent high-potential areas for opening new shopping malls, since there is little to no competition from existing malls.

Meanwhile, shopping malls in cluster 1 likely face intense competition because of the high concentration of malls there. From another perspective, this shows that the oversupply of shopping malls is mostly in the central area of the city, while the suburbs still have very few. This project therefore recommends that property developers use these findings to open new shopping malls in cluster 2 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that sets them apart can also consider neighborhoods in cluster 3, which has moderate competition. Lastly, developers are advised to avoid neighborhoods in cluster 1, which already have a high concentration of shopping malls and intense competition.
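
The introduction also asks for locations as close to the city center as possible, a criterion the clustering itself does not use. Below is a minimal sketch of how cluster 2 neighborhoods could be ranked by great-circle (haversine) distance to the Pune coordinates obtained in section 2.3. The helper name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this sketch, and only a few cluster 2 rows are included.

```python
from math import radians, sin, cos, asin, sqrt

PUNE_CENTER = (18.521428, 73.8544541)  # from the Nominatim lookup in section 2.3

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# A few cluster 2 neighborhoods copied from the table in section 4.5
cluster2 = {
    "Parvati": (18.48696, 73.85006),
    "Sus": (18.5467, 73.75113),
    "Moshi": (18.67612, 73.84952),
    "Talegaon": (18.73413, 73.68334),
}

# nearest to the city center first
ranked = sorted(cluster2, key=lambda name: haversine_km(cluster2[name], PUNE_CENTER))
for name in ranked:
    print("{:10s} {:5.1f} km".format(name, haversine_km(cluster2[name], PUNE_CENTER)))
```

Straight-line distance is only a rough proxy for accessibility. Travel time along major roads, one of the additional factors left to stakeholders, may rank the same neighborhoods differently.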

## 6) Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Pune, preferably close to the center, with few shopping malls, in order to help stakeholders narrow down the search for an optimal location for a new shopping mall. Using Foursquare data, we computed the frequency of shopping malls among the venues returned for each neighborhood, and then clustered the neighborhoods with k-means on that frequency to form four zones of interest. The neighborhoods in the cluster with no shopping malls can serve as starting points for the final exploration by stakeholders.

The final decision on the optimal shopping mall location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the population of each location, proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.