# IBM Applied Data Science Capstone Course by Coursera
### Week 5 Final Report
**_Opening a New Shopping Mall in Karachi, Pakistan_**
- Build a DataFrame of neighborhoods in Karachi, Pakistan by web scraping a Wikipedia category page
- Get the geographical coordinates of the neighborhoods
- Obtain venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new shopping mall
***
### 1. Import libraries

In [1]:
```python
import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!pip install geopy
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

#!pip install geocoder
import geocoder

import requests  # library to handle HTTP requests
import json  # library to handle JSON files
from pandas import json_normalize  # transform JSON into a pandas DataFrame (the old pandas.io.json path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means from scikit-learn for the clustering stage
from sklearn.cluster import KMeans

#!pip install folium
import folium  # map rendering library

from bs4 import BeautifulSoup  # HTML parsing for the Wikipedia scrape
print('Libraries imported.')
```

Libraries imported.


### 2. Scrape data from the Wikipedia page into a DataFrame

In [2]:
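```python
# send the GET request and stop early on an HTTP error
response = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Karachi", timeout=30)
response.raise_for_status()
data = response.text
soup = BeautifulSoup(data, 'html.parser')
```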

In [3]:
```python
# create a list to store neighborhood data
neighborhoodList = []
```

In [4]:
```python
# append each name listed in the first "mw-category" block to the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.get_text())
```

In [5]:
```python
# create a new DataFrame from the list
k_df = pd.DataFrame({"Neighborhood": neighborhoodList})
k_df.head()
```

| | Neighborhood |
|---|---|
| 0 | Abbas Town |
| 1 | Abbasi Shaheed |
| 2 | Abdul Rehman Goth |
| 3 | Abdullah Goth |
| 4 | Abidabad |


In [6]:
```python
# check the number of rows and columns of the dataframe
k_df.shape
```

(200, 1)
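The scrape returned exactly 200 rows. MediaWiki category pages list at most 200 members per page by default and link to the remainder with a "next page" link, so this count may reflect the page limit rather than the full category. The sketch below is one way to follow those links. It swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra packages; `CategoryPageParser`, `scrape_all_pages`, and the `fetch` callback are hypothetical names introduced here, and the sketch assumes the link text is literally "next page", as on English Wikipedia category pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CategoryPageParser(HTMLParser):
    """Collects <li> texts inside div.mw-category and the first "next page" link (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.next_href = None
        self._div_depth = 0   # > 0 while inside a div whose class list contains "mw-category"
        self._in_li = False
        self._li_text = []
        self._in_a = False
        self._a_href = None
        self._a_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1          # nested div inside the category block
            elif "mw-category" in (attrs.get("class") or "").split():
                self._div_depth = 1
        elif tag == "li" and self._div_depth:
            self._in_li = True
            self._li_text = []
        elif tag == "a":
            self._in_a = True
            self._a_href = attrs.get("href")
            self._a_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "li" and self._in_li:
            self.names.append("".join(self._li_text).strip())
            self._in_li = False
        elif tag == "a" and self._in_a:
            # keep only the first "next page" link (it appears above and below the list)
            if self.next_href is None and "".join(self._a_text).strip() == "next page":
                self.next_href = self._a_href
            self._in_a = False

    def handle_data(self, data):
        if self._in_li:
            self._li_text.append(data)
        if self._in_a:
            self._a_text.append(data)


def scrape_all_pages(start_url, fetch):
    """Follow "next page" links from start_url; fetch(url) must return the page HTML as a string."""
    names, url = [], start_url
    while url:
        parser = CategoryPageParser()
        parser.feed(fetch(url))
        parser.close()
        names.extend(parser.names)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return names
```

With network access, `fetch` could be `lambda u: requests.get(u, timeout=30).text`. Unlike the notebook's `[0]` index, this collects every `mw-category` block, so any subcategory entries on the page would need to be filtered out.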

### 3. Get the geographical coordinates

In [7]:
```python
# define a function to get coordinates, retrying a bounded number of times
def get_latlng(neighborhood, max_tries=5):
    lat_lng_coords = None
    tries = 0
    # retry until ArcGIS returns coordinates or the attempts run out (avoids an infinite loop)
    while lat_lng_coords is None and tries < max_tries:
        g = geocoder.arcgis('{}, Karachi, Pakistan'.format(neighborhood))
        lat_lng_coords = g.latlng
        tries += 1
    return lat_lng_coords  # None if every attempt failed

# call the function to get the coordinates and store them in a list
# (commented out because the results are cached in coordsK.csv below)
#coords = [get_latlng(neighborhood) for neighborhood in k_df["Neighborhood"].tolist()]
#coords
```

In [11]:
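```python
#df_coords = pd.DataFrame([c if c is not None else [None, None] for c in coords], columns=['Latitude', 'Longitude'])
# save the downloaded coordinates to a CSV file so geocoding only has to run once
#df_coords.to_csv('coordsK.csv', index=False)
df_coords = pd.read_csv('coordsK.csv')
```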
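The cell above reuses coordinates saved in `coordsK.csv` instead of geocoding again. A minimal sketch of that load-or-compute pattern, assuming a geocoding callback that returns `[lat, lng]` or `None` (for example the `get_latlng` function above). `load_or_geocode` is a hypothetical helper, and its staleness check only compares row counts, not the neighborhood names themselves.

```python
import os

import pandas as pd


def load_or_geocode(names, geocode_fn, cache_path="coordsK.csv"):
    """Return a Latitude/Longitude DataFrame aligned with `names`, geocoding only if the cache is missing or stale."""
    if os.path.exists(cache_path):
        cached = pd.read_csv(cache_path)
        if len(cached) == len(names):      # crude staleness check: same number of rows
            return cached
    rows = []
    for name in names:
        coords = geocode_fn(name)          # expected: [lat, lng], or None on failure
        rows.append(list(coords) if coords else [float("nan"), float("nan")])
    df_coords = pd.DataFrame(rows, columns=["Latitude", "Longitude"])
    df_coords.to_csv(cache_path, index=False)
    return df_coords
```

Usage would look like `df_coords = load_or_geocode(k_df["Neighborhood"].tolist(), get_latlng)`; failed lookups become `NaN` rows instead of breaking the DataFrame constructor.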

In [12]:
```python
df_coords.shape
```

(200, 2)

In [13]:
```python
# add the coordinates to the original dataframe (both share the same 0..199 row index)
k_df['Latitude'] = df_coords['Latitude']
k_df['Longitude'] = df_coords['Longitude']
```

In [14]:
```python
print(k_df.shape)
k_df.head()
```

(200, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Abbas Town | 24.9056 | 67.0822 |
| 1 | Abbasi Shaheed | 24.9056 | 67.0822 |
| 2 | Abdul Rehman Goth | 25.44306 | 66.04722 |
| 3 | Abdullah Goth | 24.987 | 66.9298 |
| 4 | Abidabad | 24.9179 | 66.9816 |


In [15]:
```python
# drop geocoding outliers: keep only points inside a rough bounding box around Karachi
k_df = k_df[(k_df['Latitude'] > 24.45) & (k_df['Latitude'] < 25.4) &
            (k_df['Longitude'] > 66.4) & (k_df['Longitude'] < 67.4)]
k_df.shape
```


(162, 3)
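Several neighborhoods in this notebook's tables share exactly the same coordinates (Abbas Town and Abbasi Shaheed both resolve to 24.9056, 67.0822, as do many cluster 0 rows later), which suggests the geocoder fell back to a generic location instead of resolving those names. The bounding-box filter above cannot catch this, because the fallback point lies inside Karachi. A minimal pandas sketch for flagging such rows; `flag_shared_coordinates`, `shared_coordinate_counts`, and the `SharedCoords` column are hypothetical names.

```python
import pandas as pd


def flag_shared_coordinates(df, lat_col="Latitude", lng_col="Longitude"):
    """Add a boolean SharedCoords column: True when another row has the identical (lat, lng) pair."""
    out = df.copy()
    out["SharedCoords"] = out.duplicated(subset=[lat_col, lng_col], keep=False)
    return out


def shared_coordinate_counts(df, lat_col="Latitude", lng_col="Longitude"):
    """Count how many rows share each (lat, lng) pair, most common first."""
    return df.groupby([lat_col, lng_col]).size().sort_values(ascending=False)
```

Flagged neighborhoods could be re-geocoded with a more specific query or dropped before the venue search, since they would otherwise all receive the same venue list.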

In [16]:
```python
# save the cleaned DataFrame as a CSV file
k_df.to_csv("karachi_df.csv", index=False)
```

### 4. Create a map of Karachi, Pakistan with neighborhoods superimposed on top

In [17]:
```python
# get the coordinates of Karachi, Pakistan
address = 'Karachi, Pakistan'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Karachi, Pakistan are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Karachi, Pakistan are 25.1446897, 67.1847767315734.


In [20]:

```python
# create map of Karachi using latitude and longitude values
map_k = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per neighborhood
for lat, lng, neighborhood in zip(k_df['Latitude'], k_df['Longitude'], k_df['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_k)

map_k
```

### 5. Use the Foursquare API to explore the neighborhoods


In [21]:
```python
import os

# define Foursquare credentials and API version
# credentials are read from environment variables (names chosen for this notebook)
# so they are not stored in, or printed by, the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
```


**Now, let's get the top 100 venues within a radius of 2,000 meters of each neighborhood's coordinates.**

In [22]:
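```python
radius = 2000  # search radius in meters
LIMIT = 100    # maximum number of venues returned per request

venues = []

for lat, lng, neighborhood in zip(k_df['Latitude'], k_df['Longitude'], k_df['Neighborhood']):

    # query parameters for the v2 "explore" endpoint; requests URL-encodes them
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
    }

    # make the GET request and stop on an HTTP error instead of failing later with a KeyError
    response = requests.get("https://api.foursquare.com/v2/venues/explore", params=params, timeout=30)
    response.raise_for_status()
    results = response.json()["response"]['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    for venue in results:
        categories = venue['venue']['categories']
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            categories[0]['name'] if categories else None))  # some venues have no category
```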
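The loop above assumes the explore response nests venues under `response.groups[0].items`, each item holding a `venue` object with `name`, `location.lat`, `location.lng`, and `categories`. A sketch of the same extraction as a pure function, so it can be checked against a hand-written payload without calling the API. `parse_explore_items` is a hypothetical helper, and the payload in the test is an illustrative fixture shaped after the fields the loop reads, not real API output.

```python
def parse_explore_items(payload, neighborhood, lat, lng):
    """Turn one explore-endpoint JSON payload into (Neighborhood, Latitude, Longitude, VenueName,
    VenueLatitude, VenueLongitude, VenueCategory) tuples; returns [] if the expected keys are missing."""
    try:
        items = payload["response"]["groups"][0]["items"]
    except (KeyError, IndexError, TypeError):
        return []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            neighborhood,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,  # uncategorized venues get None
        ))
    return rows
```

It could replace the inner `for venue in results` loop, passing the decoded JSON together with the neighborhood name and its coordinates.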

In [23]:
```python
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()
```

(3315, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Abbas Town | 24.9056 | 67.0822 | 14th Street Pizza | 24.910596 | 67.096607 | Pizza Place |
| 1 | Abbas Town | 24.9056 | 67.0822 | Pie in the Sky | 24.912653 | 67.083871 | Bakery |
| 2 | Abbas Town | 24.9056 | 67.0822 | Aga Khan Sports & Rehabilitation Center | 24.892157 | 67.079419 | Gym / Fitness Center |
| 3 | Abbas Town | 24.9056 | 67.0822 | Dunkin' | 24.904756 | 67.0789 | Donut Shop |
| 4 | Abbas Town | 24.9056 | 67.0822 | Bismillah Roll & BBQ | 24.92221 | 67.084872 | BBQ Joint |
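The request asks for venues within `radius = 2000` meters of each neighborhood point. One way to sanity-check that on the returned rows is the haversine great-circle distance between `Latitude`/`Longitude` and `VenueLatitude`/`VenueLongitude`. A sketch with NumPy, assuming a spherical Earth with a mean radius of 6,371 km (an approximation that is adequate at this scale); `haversine_m` is a hypothetical helper name.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between points given in decimal degrees (scalars or arrays)."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
```

Applied column-wise, `haversine_m(venues_df["Latitude"], venues_df["Longitude"], venues_df["VenueLatitude"], venues_df["VenueLongitude"])` gives one distance per venue, and counting values above `radius` shows whether any results fall outside the requested circle. For the first row above (Abbas Town to 14th Street Pizza) it comes to roughly 1.56 km.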


In [24]:
```python
# check how many venues were returned for each neighborhood
venues_df.groupby(["Neighborhood"]).count()
```

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Abbas Town | 25 | 25 | 25 | 25 | 25 | 25 |
| Abbasi Shaheed | 25 | 25 | 25 | 25 | 25 | 25 |
| Abidabad | 2 | 2 | 2 | 2 | 2 | 2 |
| Abu Zar Ghaffari | 25 | 25 | 25 | 25 | 25 | 25 |
| Abyssinia Lines | 49 | 49 | 49 | 49 | 49 | 49 |
| Afridi Colony | 25 | 25 | 25 | 25 | 25 | 25 |
| Agra Taj Colony | 3 | 3 | 3 | 3 | 3 | 3 |
| Ahsanabad | 2 | 2 | 2 | 2 | 2 | 2 |
| Aisha Manzil | 25 | 25 | 25 | 25 | 25 | 25 |
| Akhtar Colony | 25 | 25 | 25 | 25 | 25 | 25 |


In [25]:
```python
# find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(venues_df['VenueCategory'].nunique()))
```

There are 121 unique categories.


In [26]:
```python
# check whether the results contain "Shopping Mall"
venues_df[venues_df['VenueCategory'] == "Shopping Mall"]
```

| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 93 | Abyssinia Lines | 24.86071 | 67.05436 | Dolmen Mall | 24.876835 | 67.06261 | Shopping Mall |
| 381 | Bahadurabad | 24.8842 | 67.0677 | Dolmen Mall | 24.876835 | 67.06261 | Shopping Mall |
| 569 | Bath Island | 24.82905 | 67.02918 | Gulf Way Shopping Centre | 24.833267 | 67.034029 | Shopping Mall |
| 627 | Bhutta Village | 24.92671 | 67.03437 | Marx | 24.925458 | 67.034197 | Shopping Mall |
| 630 | Bhutta Village | 24.92671 | 67.03437 | Dolmen Mall | 24.9356 | 67.040476 | Shopping Mall |
| 776 | Central Jacob Lines | 24.86814 | 67.03673 | Star City Mall | 24.864073 | 67.025822 | Shopping Mall |
| 777 | Central Jacob Lines | 24.86814 | 67.03673 | Atrium Mall | 24.856162 | 67.030271 | Shopping Mall |
| 1192 | Delhi Colony | 24.83136 | 67.0451 | Gulf Way Shopping Centre | 24.833267 | 67.034029 | Shopping Mall |
| 1500 | Garden East | 24.8808 | 67.03 | Star City Mall | 24.864073 | 67.025822 | Shopping Mall |
| 1665 | Gole Market Nazimabad | 24.91521 | 67.0308 | Marx | 24.925458 | 67.034197 | Shopping Mall |
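In the rows shown, the same mall is returned for several neighborhoods: Dolmen Mall at (24.876835, 67.06261) appears for both Abyssinia Lines and Bahadurabad, and Gulf Way Shopping Centre for both Bath Island and Delhi Colony, because the 2,000 m search circles overlap. Counting rows therefore overstates the number of distinct malls. A minimal pandas sketch that collapses the results to one row per mall, keyed on name plus venue coordinates (so the two Dolmen Mall locations stay separate); `unique_malls` is a hypothetical helper.

```python
import pandas as pd


def unique_malls(venues, category="Shopping Mall"):
    """One row per distinct venue (name + coordinates) in `category`, with the neighborhoods that returned it."""
    malls = venues[venues["VenueCategory"] == category]
    grouped = malls.groupby(["VenueName", "VenueLatitude", "VenueLongitude"])["Neighborhood"]
    return pd.DataFrame({
        "Neighborhoods": grouped.agg(lambda s: ", ".join(sorted(s.unique()))),
        "NeighborhoodCount": grouped.nunique(),
    }).reset_index()
```

The resulting `VenueLatitude`/`VenueLongitude` columns could also be used to place map markers at the malls themselves rather than at neighborhood centers.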


### 6. Analyze Each Neighborhood

In [27]:
```python
# one-hot encoding of venue categories (dtype=int keeps 0/1 values on newer pandas, which defaults to bool)
k_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
k_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [k_onehot.columns[-1]] + list(k_onehot.columns[:-1])
k_onehot = k_onehot[fixed_columns]

print(k_onehot.shape)
k_onehot.head()
```

(3315, 122)


Unnamed: 0,Neighborhoods,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Beach,Bistro,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Buffet,Building,Burger Joint,Bus Station,Business Service,Cafeteria,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Gas Station,General Entertainment,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hospital,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Lawyer,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Night Market,Outdoor Sculpture,Pakistani Restaurant,Park,Performing Arts Venue,Persian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Resort,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Social Club,Spa,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Tea Room,Thai Restaurant,Theater,Theme Park,Train Station,Video Store,Women's Store,Yoga Studio
0,Abbas Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbas Town,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbas Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abbas Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abbas Town,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [29]:
```python
# find out whether shopping malls already exist in the neighborhoods (one row per mall venue)
m_df = k_onehot[k_onehot["Shopping Mall"] > 0]
m_df.head()
```

Unnamed: 0,Neighborhoods,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Beach,Bistro,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Buffet,Building,Burger Joint,Bus Station,Business Service,Cafeteria,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Gas Station,General Entertainment,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hospital,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Lawyer,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Night Market,Outdoor Sculpture,Pakistani Restaurant,Park,Performing Arts Venue,Persian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Resort,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Social Club,Spa,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Tea Room,Thai Restaurant,Theater,Theme Park,Train Station,Video Store,Women's Store,Yoga Studio
93,Abyssinia Lines,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
381,Bahadurabad,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
569,Bath Island,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
627,Bhutta Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
630,Bhutta Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [30]:
```python
# create a dataframe of neighborhoods that have a shopping mall
m_df = m_df[["Neighborhoods", "Shopping Mall"]]
m_df.head()
```

| | Neighborhoods | Shopping Mall |
|---|---|---|
| 93 | Abyssinia Lines | 1 |
| 381 | Bahadurabad | 1 |
| 569 | Bath Island | 1 |
| 627 | Bhutta Village | 1 |
| 630 | Bhutta Village | 1 |


In [33]:
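```python
# average the one-hot columns per neighborhood: each value is the share of that neighborhood's venues in a category
k_grouped = k_onehot.groupby(["Neighborhoods"]).mean().reset_index()
print(k_grouped.shape)
# number of neighborhoods with at least one shopping mall among their venues
len(k_grouped[k_grouped["Shopping Mall"] > 0])
```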

(151, 122)


11

In [34]:
```python
# keep each neighborhood's shopping-mall frequency for clustering
k_mall = k_grouped[["Neighborhoods", "Shopping Mall"]]
k_mall.head()
```

| | Neighborhoods | Shopping Mall |
|---|---|---|
| 0 | Abbas Town | 0.0 |
| 1 | Abbasi Shaheed | 0.0 |
| 2 | Abidabad | 0.0 |
| 3 | Abu Zar Ghaffari | 0.0 |
| 4 | Abyssinia Lines | 0.020408 |
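Each value in `k_mall` is the share of a neighborhood's returned venues that are shopping malls, because averaging one-hot columns within a group yields category frequencies. For Abyssinia Lines, 1 shopping mall among its 49 venues gives 1/49 ≈ 0.020408, matching the table above. A small sketch of the same computation on a toy input; apart from that one mall, the venue mix in the test is made up for illustration, and `category_frequencies` is a hypothetical helper.

```python
import pandas as pd


def category_frequencies(venues, neighborhood_col="Neighborhood", category_col="VenueCategory"):
    """Per-neighborhood share of venues in each category (one-hot encode, then average)."""
    onehot = pd.get_dummies(venues[category_col]).astype(float)
    onehot[neighborhood_col] = venues[neighborhood_col].to_numpy()
    return onehot.groupby(neighborhood_col).mean()
```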


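### 7. Cluster Neighborhoods

In [35]:
```python
# set number of clusters
kclusters = 3

# cluster on the shopping-mall frequency only
k_clustering = k_mall.drop(columns=["Neighborhoods"])

# run k-means clustering (n_init set explicitly because its default changed in recent scikit-learn versions)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(k_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
```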

array([0, 0, 0, 0, 2, 0, 0, 0, 0, 0])

In [36]:
```python
# create a new dataframe with each neighborhood's shopping-mall frequency and its cluster label
k_merged = k_mall.copy()

# add clustering labels
k_merged["Cluster Labels"] = kmeans.labels_
```

In [37]:
```python
k_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
k_merged.head()
```

| | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | Abbas Town | 0.0 | 0 |
| 1 | Abbasi Shaheed | 0.0 | 0 |
| 2 | Abidabad | 0.0 | 0 |
| 3 | Abu Zar Ghaffari | 0.0 | 0 |
| 4 | Abyssinia Lines | 0.020408 | 2 |


In [38]:
```python
# add each neighborhood's coordinates from k_df
k_merged = k_merged.join(k_df.set_index("Neighborhood"), on="Neighborhood")
print(k_merged.shape)
k_merged.head()
```

(151, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbas Town | 0.0 | 0 | 24.9056 | 67.0822 |
| 1 | Abbasi Shaheed | 0.0 | 0 | 24.9056 | 67.0822 |
| 2 | Abidabad | 0.0 | 0 | 24.9179 | 66.9816 |
| 3 | Abu Zar Ghaffari | 0.0 | 0 | 24.9056 | 67.0822 |
| 4 | Abyssinia Lines | 0.020408 | 2 | 24.86071 | 67.05436 |


In [39]:
```python
# sort the results by Cluster Labels
print(k_merged.shape)
k_merged.sort_values(["Cluster Labels"], inplace=True)
k_merged
```

(151, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbas Town | 0.0 | 0 | 24.9056 | 67.0822 |
| 96 | Islamnagar, Karachi | 0.0 | 0 | 24.9056 | 67.0822 |
| 97 | Ittehad Town | 0.0 | 0 | 24.9541 | 66.959 |
| 98 | Jafar-e-Tayyar | 0.0 | 0 | 24.9056 | 67.0822 |
| 99 | Jahanabad (Karachi) | 0.0 | 0 | 24.9056 | 67.0822 |
| 100 | Jalalabad, Karachi | 0.0 | 0 | 24.9056 | 67.0822 |
| 101 | Jamali Colony | 0.0 | 0 | 24.9056 | 67.0822 |
| 95 | Ibrahim Hyderi | 0.0 | 0 | 24.9056 | 67.0822 |
| 102 | Jamshed Quarters | 0.0 | 0 | 24.9056 | 67.0822 |
| 104 | Jut Line | 0.0 | 0 | 24.9151 | 67.1888 |


In [40]:
```python
# create map of clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(k_merged['Latitude'], k_merged['Longitude'], k_merged['Neighborhood'], k_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
```

In [41]:
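```python
# rename the column to match k_df (reassign instead of inplace, since m_df was derived from a slice)
m_df = m_df.rename(columns={"Neighborhoods": "Neighborhood"})
```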

In [42]:
```python
# add coordinates to the shopping mall rows
m_merged = m_df.join(k_df.set_index("Neighborhood"), on="Neighborhood")
print(m_merged.shape)
m_merged.head()
```

(14, 4)


| | Neighborhood | Shopping Mall | Latitude | Longitude |
|---|---|---|---|---|
| 93 | Abyssinia Lines | 1 | 24.86071 | 67.05436 |
| 381 | Bahadurabad | 1 | 24.8842 | 67.0677 |
| 569 | Bath Island | 1 | 24.82905 | 67.02918 |
| 627 | Bhutta Village | 1 | 24.92671 | 67.03437 |
| 630 | Bhutta Village | 1 | 24.92671 | 67.03437 |


In [46]:
```python
# create map of neighborhoods that have a shopping mall
map_m = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(m_merged['Latitude'], m_merged['Longitude'], m_merged['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_m)

map_m
```

In [48]:

```python
# create map of shopping malls and clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# neighborhood markers, colored by cluster label
for lat, lon, poi, cluster in zip(k_merged['Latitude'], k_merged['Longitude'], k_merged['Neighborhood'], k_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1.0).add_to(map_clusters)

# larger translucent markers for neighborhoods with a shopping mall
for lat, lng, neighborhood in zip(m_merged['Latitude'], m_merged['Longitude'], m_merged['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='blue',
        fill_opacity=0.5).add_to(map_clusters)
map_clusters
```

### 8. Examine Clusters

In [50]:
```python
k_merged.loc[k_merged['Cluster Labels'] == 0]
```

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbas Town | 0.0 | 0 | 24.9056 | 67.0822 |
| 96 | Islamnagar, Karachi | 0.0 | 0 | 24.9056 | 67.0822 |
| 97 | Ittehad Town | 0.0 | 0 | 24.9541 | 66.959 |
| 98 | Jafar-e-Tayyar | 0.0 | 0 | 24.9056 | 67.0822 |
| 99 | Jahanabad (Karachi) | 0.0 | 0 | 24.9056 | 67.0822 |
| 100 | Jalalabad, Karachi | 0.0 | 0 | 24.9056 | 67.0822 |
| 101 | Jamali Colony | 0.0 | 0 | 24.9056 | 67.0822 |
| 95 | Ibrahim Hyderi | 0.0 | 0 | 24.9056 | 67.0822 |
| 102 | Jamshed Quarters | 0.0 | 0 | 24.9056 | 67.0822 |
| 104 | Jut Line | 0.0 | 0 | 24.9151 | 67.1888 |


In [51]:
```python
k_merged.loc[k_merged['Cluster Labels'] == 1]
```

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 143 | Musa Colony | 0.083333 | 1 | 24.9202 | 67.0483 |
| 68 | Gole Market Nazimabad | 0.076923 | 1 | 24.91521 | 67.0308 |
| 24 | Bhutta Village | 0.090909 | 1 | 24.92671 | 67.03437 |
| 60 | Garden East | 0.111111 | 1 | 24.8808 | 67.03 |


In [52]:
```python
k_merged.loc[k_merged['Cluster Labels'] == 2]
```

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 30 | Central Jacob Lines | 0.039216 | 2 | 24.86814 | 67.03673 |
| 47 | Delhi Colony | 0.012195 | 2 | 24.83136 | 67.0451 |
| 17 | Bahadurabad | 0.014925 | 2 | 24.8842 | 67.0677 |
| 79 | Gulshan-e-Iqbal | 0.03125 | 2 | 24.9222 | 67.09 |
| 4 | Abyssinia Lines | 0.020408 | 2 | 24.86071 | 67.05436 |
| 110 | Kehkashan | 0.011905 | 2 | 24.8303 | 67.0292 |
| 21 | Bath Island | 0.012658 | 2 | 24.82905 | 67.02918 |
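Because k-means runs here on a single feature (the shopping-mall frequency), each cluster corresponds to a contiguous range of that value, and the clusters above do not overlap: cluster 2 spans about 0.012 to 0.039 and cluster 1 about 0.077 to 0.111. A compact way to see those ranges instead of scanning each table, sketched with pandas; `summarize_clusters` is a hypothetical helper, and the test uses the rows shown above.

```python
import pandas as pd


def summarize_clusters(df, label_col="Cluster Labels", value_col="Shopping Mall"):
    """Count and min/mean/max of the clustering feature for each cluster label."""
    return df.groupby(label_col)[value_col].agg(["count", "min", "mean", "max"])
```

Called as `summarize_clusters(k_merged)`, it returns one row per cluster.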


#### Observations:
All 11 neighborhoods with a shopping mall among their returned venues fall into clusters 1 and 2: cluster 1 has the highest shopping-mall frequency (about 0.077 to 0.111 of venues) and cluster 2 a lower one (about 0.012 to 0.039). None of the venues returned for cluster 0 neighborhoods is a shopping mall. Cluster 0 therefore represents an opportunity: new shopping malls there would face little to no competition from existing malls, whereas malls in clusters 1 and 2 likely already compete with one another. The maps above also show very few shopping malls in the suburbs. This project therefore recommends that property developers capitalize on these findings and open new shopping malls in cluster 0 neighborhoods, where competition is lowest.