### Introduction/Business Problem
A client is interested in opening a bakery in the city of Kochi, India. Opening a bakery presents challenges that differ from those of other businesses because competition is high. To minimise competition, Data Science and Machine Learning tools are used to identify the cluster of neighborhoods in Kochi that has the fewest existing bakeries.

### Data

A list of neighborhoods in Kochi, India is available on Wikipedia at https://en.wikipedia.org/wiki/Category:Suburbs_of_Kochi. A DataFrame of these neighborhoods can be built by scraping the page with the __BeautifulSoup__ library.

### Methodology

Once the DataFrame of neighborhoods has been built by scraping the Wikipedia page with the __BeautifulSoup__ library, each neighborhood name is converted into latitude and longitude values using the geocoder library. With these coordinates, the __Foursquare API__ is called to explore the venues around each neighborhood. The explore endpoint returns the venue categories found in each neighborhood, and these categories are the features used to group the neighborhoods into clusters. The __k-means__ clustering algorithm then splits the neighborhoods into three clusters according to how common bakeries are among their venues: High, Medium, Low. Finally, the __Folium__ library is used to visualize the neighborhoods and their clusters on a map of Kochi.

### 0. Install & Import Libraries

In [87]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geocoder --yes

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [89]:
import geocoder

In [90]:
import numpy as np 
import pandas as pd 
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json 

In [91]:
from geopy.geocoders import Nominatim 

import requests 
from bs4 import BeautifulSoup 

from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

import folium 

print("Libraries imported.")

Libraries imported.


### 1. Download and Explore Dataset - Scrape data from Wikipedia page into a DataFrame

In [92]:
df = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Kochi").text

In [93]:
soup = BeautifulSoup(df, 'html.parser')

In [94]:
neighborhood = []
# Each <li> inside the first "mw-category" block is one neighborhood
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhood.append(row.text)
loc_df = pd.DataFrame({"Neighborhood": neighborhood})
loc_df.head()

Unnamed: 0,Neighborhood
0,Alangad
1,Angamaly
2,Aroor
3,Chellanam
4,Chendamangalam


In [95]:
loc_df.shape

(44, 1)

### 2. Get Latitude & Longitude of the Neighborhoods

In [96]:
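def get_latlng(neighborhood):
    """Return [latitude, longitude] for a neighborhood in Kochi via ArcGIS."""
    lat_lng_coords = None
    # Retry until the geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Kochi, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords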

In [97]:
coords = [ get_latlng(neighborhood) for neighborhood in loc_df["Neighborhood"].tolist() ]

In [98]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [99]:
loc_df['Latitude'] = df_coords['Latitude']
loc_df['Longitude'] = df_coords['Longitude']

In [100]:
loc_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Alangad,10.8475,76.43609
1,Angamaly,10.20366,76.38268
2,Aroor,9.93599,76.26145
3,Chellanam,9.83526,76.27029
4,Chendamangalam,10.17292,76.23346


### 3. Use Foursquare API to explore the venues in Neighborhoods

In [101]:
# The code was removed by Watson Studio for sharing.

Your Foursquare credentials


In [102]:
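radius = 2000  # search radius passed to the API
LIMIT = 100    # maximum number of venues requested per neighborhood

venues = []

for lat, long, neighborhood in zip(loc_df['Latitude'], loc_df['Longitude'], loc_df['Neighborhood']):
    # Build the explore request for this neighborhood's coordinates
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        LIMIT)

    # Venue items are nested under response -> groups[0] -> items
    results = requests.get(url).json()["response"]['groups'][0]['items']
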
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
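
The request loop above indexes straight into the JSON response, so a venue with an empty `categories` list or a response without `groups` would raise an exception. The sketch below shows one defensive way to build the URL and flatten a response. It only assumes the response shape already used above (`response` → `groups[0]` → `items` → `venue`); the helper names `build_explore_url` and `extract_venues`, the `"Unknown"` fallback, and the second sample venue are made up for illustration.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit) -> str:
    """Assemble the explore URL with properly encoded query parameters."""
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    })
    return EXPLORE_URL + "?" + query


def extract_venues(payload: Dict[str, Any]) -> List[Tuple[Any, Any, Any, str]]:
    """Flatten a response dict into (name, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Fall back to a placeholder when a venue has no category.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample payload; the second venue is hypothetical.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Carnival Cinemas",
                   "location": {"lat": 10.195147, "lng": 76.386157},
                   "categories": [{"name": "Multiplex"}]}},
        {"venue": {"name": "Example Venue",
                   "location": {"lat": 10.19, "lng": 76.38},
                   "categories": []}},
    ]}]}
}

print(build_explore_url("YOUR_ID", "YOUR_SECRET", "YYYYMMDD", 10.20366, 76.38268, 2000, 100))
print(extract_venues(sample_payload))
```

The same effect can be had by passing a dictionary to the `params` argument of `requests.get` instead of formatting the URL by hand.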

In [103]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(811, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Angamaly,10.20366,76.38268,Carnival Cinemas,10.195147,76.386157,Multiplex
1,Angamaly,10.20366,76.38268,Carnival Cinemas Multiplex,10.195266,76.386193,Multiplex
2,Angamaly,10.20366,76.38268,Angamally Bus Stand,10.196622,76.385227,Bus Station
3,Angamaly,10.20366,76.38268,Saravana Bhavan,10.195313,76.38404,Indian Restaurant
4,Angamaly,10.20366,76.38268,Elite Palazzo,10.189762,76.38608,Hotel


In [104]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Angamaly,5,5,5,5,5,5
Aroor,7,7,7,7,7,7
Chendamangalam,5,5,5,5,5,5
"Chengamanad, Ernakulam district",4,4,4,4,4,4
Cheranallur,37,37,37,37,37,37
Chilavannoor,63,63,63,63,63,63
Choornikkara,6,6,6,6,6,6
Chottanikkara,7,7,7,7,7,7
Edathala,3,3,3,3,3,3
Fort Kochi,48,48,48,48,48,48


In [105]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))
venues_df['VenueCategory'].unique()[:50]

There are 120 unique categories.


array(['Multiplex', 'Bus Station', 'Indian Restaurant', 'Hotel',
       'Fried Chicken Joint', 'Indie Movie Theater', 'Restaurant',
       'Fast Food Restaurant', 'Hotel Bar', 'Light Rail Station',
       'Airport', 'Historic Site', 'River', 'Boat or Ferry', 'Astrologer',
       'Resort', 'Asian Restaurant', 'Comfort Food Restaurant', 'Bakery',
       'Shopping Mall', 'Multicuisine Indian Restaurant', 'Burger Joint',
       'Convenience Store', 'Juice Bar', 'Donut Shop', 'Coffee Shop',
       'Snack Place', 'Café', 'Ice Cream Shop', 'Thai Restaurant',
       'Pizza Place', 'Electronics Store', 'Middle Eastern Restaurant',
       'Clothing Store', 'Southern / Soul Food Restaurant', 'Arcade',
       'Vegetarian / Vegan Restaurant', 'American Restaurant',
       'Food Court', 'Gym / Fitness Center', 'French Restaurant',
       'Nightclub', 'Stadium', 'Athletics & Sports', 'Sandwich Place',
       'Chinese Restaurant', 'Motorcycle Shop', 'Park', 'Dhaba', 'Bar'],
      dtype=object)

In [108]:

loc_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")


loc_onehot['Neighborhoods'] = venues_df['Neighborhood'] 


fixed_columns = [loc_onehot.columns[-1]] + list(loc_onehot.columns[:-1])
loc_onehot = loc_onehot[fixed_columns]

print(loc_onehot.shape)
loc_onehot.head()

(811, 121)


Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Asian Restaurant,Astrologer,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beach,Bed & Breakfast,Boat or Ferry,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Electronics Store,Fast Food Restaurant,Fish Market,Fishing Store,Flea Market,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kerala Restaurant,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Neighborhood,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Pool,Portuguese Restaurant,Punjabi Restaurant,Recreation Center,Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant
0,Angamaly,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Angamaly,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Angamaly,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Angamaly,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Angamaly,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [109]:
loc_grouped = loc_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(loc_grouped.shape)
loc_grouped.head()

(40, 121)


Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Asian Restaurant,Astrologer,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beach,Bed & Breakfast,Boat or Ferry,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Electronics Store,Fast Food Restaurant,Fish Market,Fishing Store,Flea Market,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kerala Restaurant,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Neighborhood,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Pool,Portuguese Restaurant,Punjabi Restaurant,Recreation Center,Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant
0,Angamaly,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aroor,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Chendamangalam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Chengamanad, Ernakulam district",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cheranallur,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.054054,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.027027,0.081081,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.135135,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027
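
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues in each category, not a count. For example, "Chengamanad, Ernakulam district" has 4 venues and a Bakery value of 0.25, which corresponds to one bakery. The sketch below reproduces the same calculation in plain Python on made-up data; the neighborhood names `A` and `B` and the helper `category_share` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) pairs, made up for illustration.
venues = [
    ("A", "Bakery"), ("A", "Hotel"), ("A", "Hotel"), ("A", "Café"),
    ("B", "Bus Station"), ("B", "Hotel"),
]

# Count venues per category within each neighborhood.
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1


def category_share(neighborhood: str, category: str) -> float:
    """Fraction of a neighborhood's venues that fall in `category`."""
    total = sum(counts[neighborhood].values())
    return counts[neighborhood][category] / total


print(category_share("A", "Bakery"))  # 0.25 (1 of 4 venues)
print(category_share("B", "Bakery"))  # 0.0 (no bakeries among 2 venues)
```

Because the value is a share, a neighborhood with very few venues can score high with a single bakery, which is worth keeping in mind when reading the clusters.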


### 4. Filter the Neighborhoods based on Bakery

In [110]:
len(loc_grouped[loc_grouped["Bakery"] > 0])

16

In [111]:
loc_venue = loc_grouped[["Neighborhoods","Bakery"]]

In [112]:
loc_venue.head()

Unnamed: 0,Neighborhoods,Bakery
0,Angamaly,0.0
1,Aroor,0.0
2,Chendamangalam,0.0
3,"Chengamanad, Ernakulam district",0.25
4,Cheranallur,0.027027


### 5. Clustering Neighborhoods Using K-means

In [113]:
kclusters = 3

# Keep only the numeric Bakery column as the clustering feature
loc_clustering = loc_venue.drop(columns=["Neighborhoods"])

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(loc_clustering)

kmeans.labels_[0:10]

array([0, 0, 0, 1, 0, 2, 0, 0, 0, 0], dtype=int32)

In [114]:
loc_merged = loc_venue.copy()

loc_merged["Cluster Labels"] = kmeans.labels_

In [115]:
loc_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
loc_merged.head()

Unnamed: 0,Neighborhood,Bakery,Cluster Labels
0,Angamaly,0.0,0
1,Aroor,0.0,0
2,Chendamangalam,0.0,0
3,"Chengamanad, Ernakulam district",0.25,1
4,Cheranallur,0.027027,0


In [116]:
loc_merged = loc_merged.join(loc_df.set_index("Neighborhood"), on="Neighborhood")

print(loc_merged.shape)
loc_merged.head() 

(40, 5)


Unnamed: 0,Neighborhood,Bakery,Cluster Labels,Latitude,Longitude
0,Angamaly,0.0,0,10.20366,76.38268
1,Aroor,0.0,0,9.93599,76.26145
2,Chendamangalam,0.0,0,10.17292,76.23346
3,"Chengamanad, Ernakulam district",0.25,1,10.15354,76.34068
4,Cheranallur,0.027027,0,10.039888,76.300583


In [117]:
loc_merged.sort_values(["Cluster Labels"], inplace=True)
loc_merged.head()

Unnamed: 0,Neighborhood,Bakery,Cluster Labels,Latitude,Longitude
0,Angamaly,0.0,0,10.20366,76.38268
36,Varappuzha,0.0,0,10.08261,76.27041
35,Vallarpadam,0.0,0,9.99789,76.24981
33,Twenty20 Kizhakkambalam,0.0,0,10.04626,76.40411
30,Thrikkakkara,0.010989,0,10.01736,76.31637


### 6. Visualising the clusters on a Map using Folium

In [118]:
address = 'Kochi, India'

geolocator = Nominatim(user_agent="Kochi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kochi are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kochi are 9.9633864, 76.2536614.


In [119]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(loc_merged['Latitude'], loc_merged['Longitude'], loc_merged['Neighborhood'], loc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
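
The marker color above is looked up as `rainbow[cluster-1]`. For cluster 0 this is index -1, which Python resolves to the last color, so every cluster still gets a distinct color, but indexing by the label directly is easier to follow. The sketch below builds one hex color per cluster with the standard-library `colorsys` module, used here as a stand-in for the matplotlib colormap; `cluster_palette` and its saturation and brightness values are illustrative choices.

```python
import colorsys
from typing import List


def cluster_palette(n_clusters: int) -> List[str]:
    """Return one hex color per cluster, with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        hue = i / n_clusters
        r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(3)
# Index directly by cluster label: label 0 -> palette[0], and so on.
for label in range(3):
    print(label, palette[label])
```

With such a list, the marker arguments could be written as `color=palette[cluster]` and `fill_color=palette[cluster]`.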

### 7. Examining the clusters

In [120]:
loc_merged.loc[loc_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Bakery,Cluster Labels,Latitude,Longitude
0,Angamaly,0.0,0,10.20366,76.38268
36,Varappuzha,0.0,0,10.08261,76.27041
35,Vallarpadam,0.0,0,9.99789,76.24981
33,Twenty20 Kizhakkambalam,0.0,0,10.04626,76.40411
30,Thrikkakkara,0.010989,0,10.01736,76.31637
29,Thiruvankulam,0.0,0,9.94635,76.36746
26,Pathalam,0.0,0,9.93599,76.26145
25,Palluruthy,0.0,0,9.91642,76.27567
22,Nedumbassery,0.0,0,10.15669,76.3778
21,Mundamveli,0.0,0,9.93069,76.25317


In [121]:
loc_merged.loc[loc_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Bakery,Cluster Labels,Latitude,Longitude
34,Vaduthala,0.428571,1,10.0183,76.27587
3,"Chengamanad, Ernakulam district",0.25,1,10.15354,76.34068


In [122]:
loc_merged.loc[loc_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Bakery,Cluster Labels,Latitude,Longitude
37,Vazhakkala,0.04878,2,10.01789,76.32906
32,Thrippunithura,0.058824,2,9.94124,76.3469
31,Thrikkakkara South,0.055556,2,10.03324,76.32519
27,Thammanam,0.042553,2,9.98557,76.3113
28,Thevara,0.05,2,9.94209,76.29839
24,Pachalam,0.090909,2,10.0035,76.28123
23,Nettoor,0.08,2,9.92726,76.31181
20,Mulavukad,0.1,2,9.99896,76.26169
18,Maradu,0.04,2,9.94051,76.32395
12,Karanakodam,0.044776,2,9.988453,76.303426
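
The integer labels that k-means assigns are arbitrary, so label order does not by itself say which cluster has the most bakeries. In the tables above, cluster 1 holds the largest Bakery values (0.25 and 0.428571) and cluster 2 the middle ones. A minimal sketch of ranking clusters by their mean feature value follows; `name_clusters_by_mean` is a hypothetical helper, and the sample values are a few rows copied from the tables above.

```python
from collections import defaultdict
from typing import Dict, Sequence


def name_clusters_by_mean(
    values: Sequence[float],
    labels: Sequence[int],
    names: Sequence[str] = ("Low", "Medium", "High"),
) -> Dict[int, str]:
    """Map each cluster label to a name, ordered by the cluster's mean value.

    Assumes the number of distinct labels equals len(names).
    """
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        sizes[label] += 1
    means = {label: sums[label] / sizes[label] for label in sums}
    # Sort labels from the smallest cluster mean to the largest.
    ordered = sorted(means, key=means.get)
    return {label: names[rank] for rank, label in enumerate(ordered)}


# A few (Bakery share, cluster label) rows taken from the tables above.
bakery_share = [0.0, 0.010989, 0.428571, 0.25, 0.04878, 0.1]
cluster_label = [0, 0, 1, 1, 2, 2]

print(name_clusters_by_mean(bakery_share, cluster_label))
```

On these rows the mapping is 0 → Low, 2 → Medium, 1 → High.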


### 8. Conclusion

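It can be concluded that the neighborhoods in the cluster with label 0 are the best locations to open a bakery, because bakeries make up the smallest share of their venues (0 to about 1% in the rows shown). Neighborhoods in the cluster with label 2 have a medium share of bakeries (about 4% to 10%), while the neighborhoods in the cluster with label 1 have the highest share (25% and about 43%). Note that the clustering feature is the fraction of venues that are bakeries, not the absolute number of bakeries.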