# Capstone Final Project

### Analysis of restaurants in the neighbourhoods of Colombo, Sri Lanka, to advise restaurant owners on where to open new restaurants

* Build a dataframe of the neighborhoods of Colombo, Sri Lanka, by web scraping a Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain venue data for the neighborhoods from the Foursquare API
* Explore and cluster the neighborhoods

Import libraries

In [24]:

# install packages that are not in the base environment (notebook shell commands)
!conda install -c conda-forge geocoder --yes
!conda install -c conda-forge folium=0.5.0 --yes

import numpy as np # library to handle arrays

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# library to handle JSON files
import json

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# geocoding library (the ArcGIS provider is used below)
import geocoder

# library to handle requests
import requests

# library for web scraping
from bs4 import BeautifulSoup

# transform JSON into a pandas dataframe
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# map plotting library
import folium

print('Libraries imported.')


In [28]:
data = requests.get('https://en.wikipedia.org/wiki/Category:Suburbs_of_Colombo').text
# webscrape wikipedia page using beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
neighborhoodList = []
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)
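The BeautifulSoup call above collects the text of every `<li>` inside the first `<div class="mw-category">`. As a rough illustration of what that selection does, here is a stdlib-only sketch using `html.parser`. The class `CategoryListParser` and the sample HTML are hypothetical stand-ins. The real Wikipedia markup may differ, so treat this as an explanation of the selection logic, not a replacement scraper.

```python
from html.parser import HTMLParser


class CategoryListParser(HTMLParser):
    """Collect the text of each <li> inside the first <div class="mw-category">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside the mw-category div (counts nested divs)
        self.in_li = False   # True while inside an <li> of that div
        self.done = False    # set once the first mw-category div has closed
        self.items = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth > 0:
                self.div_depth += 1          # nested div, e.g. mw-category-group
            elif "mw-category" in classes:
                self.div_depth = 1           # entered the category listing
        elif tag == "li" and self.div_depth > 0:
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.done = True             # ignore any later mw-category divs

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data           # text of the <li>, including nested <a>


# Hypothetical, trimmed-down markup in the shape of a MediaWiki category page.
SAMPLE_HTML = """
<div id="mw-pages">
  <div class="mw-category mw-category-columns">
    <div class="mw-category-group"><h3>A</h3>
      <ul><li><a href="/wiki/Athurugiriya">Athurugiriya</a></li></ul>
    </div>
    <div class="mw-category-group"><h3>B</h3>
      <ul><li><a href="/wiki/Bambalapitiya">Bambalapitiya</a></li>
          <li><a href="/wiki/Battaramulla">Battaramulla</a></li></ul>
    </div>
  </div>
</div>
<div class="mw-category"><ul><li>Not collected</li></ul></div>
"""

parser = CategoryListParser()
parser.feed(SAMPLE_HTML)
neighborhood_list = [item.strip() for item in parser.items]
print(neighborhood_list)
```

Like `find_all(...)[0]`, the parser stops after the first matching div. This is why the trailing "Not collected" item is ignored.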

In [29]:
# create a new DataFrame from the list
col_df = pd.DataFrame({"Neighborhood": neighborhoodList})
col_df.head()

|   | Neighborhood |
|---|---|
| 0 | Athurugiriya |
| 1 | Bambalapitiya |
| 2 | Battaramulla |
| 3 | Batuwatta |
| 4 | Bloemendhal |


In [30]:
# print the number of rows of the dataframe
col_df.shape

(65, 1)

Latitude and longitude of the Colombo neighbourhoods

In [38]:
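# function to get the latitude and longitude of a Colombo neighbourhood
# (ArcGIS occasionally returns no result, so retry a bounded number of times instead of looping forever)
def get_latlng(neighborhood, max_tries=10):
    for _ in range(max_tries):
        geo = geocoder.arcgis('{}, Colombo, SriLanka'.format(neighborhood))
        if geo.latlng is not None:
            return geo.latlng
    raise ValueError('No coordinates found for {}'.format(neighborhood))

coords = [get_latlng(neighborhood) for neighborhood in col_df["Neighborhood"].tolist()]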

coords

[[6.871710000000064, 79.99736000000007],
 [6.904660000000035, 79.85480000000007],
 [6.905200000000036, 79.91554000000008],
 [6.856881411183795, 79.89399987308153],
 [6.954920000000072, 79.86670000000004],
 [6.840800000000058, 79.90441000000004],
 [6.915870000000041, 79.87760000000003],
 [6.9094800000000305, 79.86924000000005],
 [6.931940000000054, 79.84555000000006],
 [6.9705600000000345, 79.91224000000005],
 [6.851320000000044, 79.86590000000007],
 [6.840470000000039, 79.87824000000006],
 [6.937220000000025, 79.88221000000004],
 [6.94148000000007, 79.84664000000004],
 [6.946550000000059, 79.87034000000006],
 [6.887580000000071, 79.86255000000006],
 [6.876463499999996, 79.935723],
 [6.936210000000074, 79.85844000000003],
 [6.931940000000054, 79.84555000000006],
 [6.856959882086081, 79.87853264931098],
 [6.935580000000073, 79.98416000000003],
 [6.7816400000000385, 79.98748000000006],
 [6.866550000000075, 79.87646000000007],
 [6.842750000000024, 79.87202000000008],
 [6.98237000000006, 79
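In the coordinate list above, rows 8 and 18 are identical (`[6.931940000000054, 79.84555000000006]`), and row 8 is Colombo itself. This may mean the geocoder fell back to a city-level match for the neighbourhood in row 18. Below is a minimal sketch for flagging shared coordinates, assuming plain `[lat, lng]` pairs. The helper `find_shared_coordinates` and the label "row 18" are hypothetical, since that neighbourhood's name is not shown above.

```python
from collections import defaultdict


def find_shared_coordinates(names, coords, ndigits=5):
    """Return {(lat, lng): [names...]} for coordinates shared by two or more names.

    Coordinates are rounded to `ndigits` decimal places before comparison.
    """
    groups = defaultdict(list)
    for name, (lat, lng) in zip(names, coords):
        groups[(round(lat, ndigits), round(lng, ndigits))].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Three pairs copied from the output above; "row 18" stands in for a name not shown.
names = ["Bambalapitiya", "Colombo", "row 18"]
coords = [
    [6.904660000000035, 79.85480000000007],
    [6.931940000000054, 79.84555000000006],
    [6.931940000000054, 79.84555000000006],
]
shared = find_shared_coordinates(names, coords)
print(shared)
```

Any neighbourhood flagged this way may deserve a manual check before its venues are attributed to it.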

In [41]:
# dataframe to load the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

col_df['Latitude'] = df_coords['Latitude']
col_df['Longitude'] = df_coords['Longitude']
print(col_df.shape)
col_df

(65, 3)


|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Athurugiriya | 6.87171 | 79.99736 |
| 1 | Bambalapitiya | 6.90466 | 79.8548 |
| 2 | Battaramulla | 6.9052 | 79.91554 |
| 3 | Batuwatta | 6.856881 | 79.894 |
| 4 | Bloemendhal | 6.95492 | 79.8667 |
| 5 | Boralesgamuwa | 6.8408 | 79.90441 |
| 6 | Borella | 6.91587 | 79.8776 |
| 7 | Cinnamon Gardens | 6.90948 | 79.86924 |
| 8 | Colombo | 6.93194 | 79.84555 |
| 9 | Dalugama | 6.97056 | 79.91224 |


In [42]:
col_df.to_csv("col_df.csv", index=False)

Map of Colombo with neighbourhoods

In [45]:
address = 'Colombo, SriLanka'

geolocator = Nominatim(user_agent="my-application")
loc = geolocator.geocode(address)
lat = loc.latitude
long = loc.longitude
print('The geographical coordinates of Colombo, Sri Lanka are {}, {}.'.format(lat, long))

The geographical coordinates of Colombo, Sri Lanka are 6.9218124, 79.8655608840961.


In [48]:
map_col = folium.Map(location=[lat, long], zoom_start=11)

# add a circle marker per neighbourhood; separate loop variables keep lat/long pointing at the city centre
for nb_lat, nb_lng, neighborhood in zip(col_df['Latitude'], col_df['Longitude'], col_df['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker([nb_lat, nb_lng], radius=3, popup=label, color='blue', fill=True,
                        fill_color='#3170cc', fill_opacity=0.6).add_to(map_col)

map_col

In [49]:
map_col.save('map_col.html')

Explore the neighbourhoods of Colombo, Sri Lanka, through the Foursquare API

In [78]:
# Foursquare credentials: replace the placeholders with your own keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # API version date

In [79]:
radius = 500  # search radius in metres around each neighbourhood's coordinates
LIMIT = 100   # maximum number of venues to return per neighbourhood
venues = []
for nb_lat, nb_lng, neighborhood in zip(col_df['Latitude'], col_df['Longitude'], col_df['Neighborhood']):
    params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION,
              'll': '{},{}'.format(nb_lat, nb_lng), 'radius': radius, 'limit': LIMIT}
    response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
    results = response.json()['response']['groups'][0]['items']
    for item in results:
        venue = item['venue']
        if not venue['categories']:
            continue  # skip uncategorised venues instead of raising IndexError
        venues.append((neighborhood, nb_lat, nb_lng, venue['name'], venue['location']['lat'],
                       venue['location']['lng'], venue['categories'][0]['name']))
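The loop above relies on the response layout `response` → `groups[0]` → `items[]` → `venue`. The sketch below shows that parsing step offline, run against a hand-written sample payload. `parse_explore_items` is a hypothetical helper, and the payload is trimmed to the fields the notebook reads. The first two venues mirror the Athurugiriya rows in the `venues_df.head()` output. The third venue, which has an empty `categories` list, is invented to show the skip.

```python
def parse_explore_items(payload, neighborhood, lat, lng):
    """Turn one explore response into (Neighborhood, Latitude, Longitude, VenueName,
    VenueLatitude, VenueLongitude, VenueCategory) tuples, skipping uncategorised venues."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # avoids IndexError on categories[0]
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Vithanage Stores",
               "location": {"lat": 6.870551, "lng": 79.998906},
               "categories": [{"name": "Convenience Store"}]}},
    {"venue": {"name": "Walgama Junction",
               "location": {"lat": 6.869385, "lng": 79.994476},
               "categories": [{"name": "Intersection"}]}},
    {"venue": {"name": "Uncategorised Place",   # hypothetical venue with no category
               "location": {"lat": 6.8710, "lng": 79.9970},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample_payload, "Athurugiriya", 6.87171, 79.99736)
for row in rows:
    print(row)
```

Keeping the parsing in a pure function like this makes it testable without network access or API keys.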

In [80]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head()

(997, 7)


|   | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Athurugiriya | 6.87171 | 79.99736 | Vithanage Stores | 6.870551 | 79.998906 | Convenience Store |
| 1 | Athurugiriya | 6.87171 | 79.99736 | Pasindu Auto Parts & P.A. Products (Pvt) Ltd. | 6.873748 | 79.99484 | Auto Workshop |
| 2 | Athurugiriya | 6.87171 | 79.99736 | Walgama Junction | 6.869385 | 79.994476 | Intersection |
| 3 | Bambalapitiya | 6.90466 | 79.8548 | 99X Technology | 6.90562 | 79.854946 | IT Services |
| 4 | Bambalapitiya | 6.90466 | 79.8548 | The Cake Factory | 6.905188 | 79.856409 | Dessert Shop |


In [54]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Athurugiriya | 3 | 3 | 3 | 3 | 3 | 3 |
| Bambalapitiya | 40 | 40 | 40 | 40 | 40 | 40 |
| Battaramulla | 11 | 11 | 11 | 11 | 11 | 11 |
| Batuwatta | 7 | 7 | 7 | 7 | 7 | 7 |
| Boralesgamuwa | 5 | 5 | 5 | 5 | 5 | 5 |
| Borella | 16 | 16 | 16 | 16 | 16 | 16 |
| Cinnamon Gardens | 41 | 41 | 41 | 41 | 41 | 41 |
| Colombo | 51 | 51 | 51 | 51 | 51 | 51 |
| Dalugama | 2 | 2 | 2 | 2 | 2 | 2 |
| Dehiwala | 22 | 22 | 22 | 22 | 22 | 22 |
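`groupby(...).count()` repeats the same number in every column because it counts non-null values per column. `venues_df.groupby('Neighborhood').size()` would return one count per neighbourhood. These counts matter later. With n venues, every category frequency in the one-hot averages is a multiple of 1/n. This is why Dalugama (2 venues) shows values of 0.5 in the frequency table further down. The sketch below uses the counts copied from the table above. The `MIN_VENUES` threshold is a hypothetical addition, not part of the original analysis.

```python
# Venue counts per neighbourhood, copied from the groupby output above.
venue_counts = {
    "Athurugiriya": 3, "Bambalapitiya": 40, "Battaramulla": 11, "Batuwatta": 7,
    "Boralesgamuwa": 5, "Borella": 16, "Cinnamon Gardens": 41, "Colombo": 51,
    "Dalugama": 2, "Dehiwala": 22,
}

# With n venues, every category frequency computed later is a multiple of 1/n.
resolution = {name: 1 / n for name, n in venue_counts.items()}

MIN_VENUES = 5  # hypothetical threshold for "too few venues to characterise"
sparse = sorted(name for name, n in venue_counts.items() if n < MIN_VENUES)

print(resolution["Dalugama"], resolution["Athurugiriya"])
print(sparse)
```

Neighbourhoods flagged as sparse have coarse feature vectors, which can pull them into clusters for reasons unrelated to their actual venue mix.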


In [81]:
print('There are {} unique categories.'.format(venues_df['VenueCategory'].nunique()))

There are 154 unique categories.


In [82]:
# first 50 of the venue categories
venues_df['VenueCategory'].unique()[:50]

array(['Convenience Store', 'Auto Workshop', 'Intersection',
       'IT Services', 'Dessert Shop', 'Restaurant', 'Seafood Restaurant',
       'Chinese Restaurant', 'Coffee Shop', 'Bakery', 'Boutique', 'Café',
       'Middle Eastern Restaurant', "Men's Store", 'Casino',
       'Fast Food Restaurant', 'Sandwich Place', 'Optical Shop', 'Office',
       'Jewelry Store', 'Bookstore', 'Grocery Store', 'Lingerie Store',
       'Sri Lankan Restaurant', 'Sporting Goods Shop', 'Nightclub',
       'Clothing Store', 'Japanese Restaurant', 'Bar', 'Food Court',
       'Electronics Store', 'Lake', 'Cocktail Bar', 'Department Store',
       'Italian Restaurant', 'Asian Restaurant', 'Mobile Phone Shop',
       'Supermarket', 'Playground', 'Print Shop', 'Shopping Mall',
       "Women's Store", 'Gym', 'Pizza Place', 'Hotel', 'Cosmetics Shop',
       'Movie Theater', 'Park', 'Vegetarian / Vegan Restaurant',
       'Theater'], dtype=object)

In [83]:
# check whether the generic "Restaurant" category is present
"Restaurant" in venues_df['VenueCategory'].unique()

True

In [84]:
# one-hot encode the venue categories: one 0/1 column per category
colombo_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add the neighbourhood name back as a column
colombo_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move the neighbourhood column to the front
fixed_columns = list(colombo_onehot.columns[-1:]) + list(colombo_onehot.columns[:-1])
colombo_onehot = colombo_onehot[fixed_columns]

print(colombo_onehot.shape)
colombo_onehot.head()

(997, 155)


,Neighborhoods,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Beach Bar,Bed & Breakfast,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Camera Store,Candy Store,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Dance Studio,Department Store,Design Studio,Dessert Shop,Diner,Donut Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Fondue Restaurant,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Lingerie Store,Lounge,Malay Restaurant,Market,Men's Store,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Multiplex,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Print Shop,Pub,Record Shop,Recording Studio,Recreation Center,Resort,Rest Area,Restaurant,Road,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Water Park,Women's Store,Zoo
0,Athurugiriya,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Athurugiriya,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Athurugiriya,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bambalapitiya,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bambalapitiya,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [85]:
# the mean of each one-hot column is that category's share of the neighbourhood's venues
col_com = colombo_onehot.groupby(["Neighborhoods"]).mean().reset_index()
print(col_com.shape)
col_com

(62, 155)


,Neighborhoods,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Beach Bar,Bed & Breakfast,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Camera Store,Candy Store,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Dance Studio,Department Store,Design Studio,Dessert Shop,Diner,Donut Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Fondue Restaurant,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Lingerie Store,Lounge,Malay Restaurant,Market,Men's Store,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Multiplex,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Print Shop,Pub,Record Shop,Recording Studio,Recreation Center,Resort,Rest Area,Restaurant,Road,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Water Park,Women's Store,Zoo
0,Athurugiriya,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bambalapitiya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.025,0.075,0.025,0.0,0.075,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.05,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Battaramulla,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Batuwatta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Boralesgamuwa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0
5,Borella,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Cinnamon Gardens,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.02439,0.121951,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.04878,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0
7,Colombo,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.078431,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.156863,0.019608,0.019608,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.098039,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.078431,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.039216,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0
8,Dalugama,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dehiwala,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.045455,0.136364,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.090909,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0
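The two cells above first one-hot encode each venue's category and then average the rows per neighbourhood. The result is each category's share of that neighbourhood's venues. For example, Athurugiriya has 3 venues in 3 different categories, so each gets 1/3 ≈ 0.333333, as in row 0 above. Neighbourhoods for which Foursquare returned no venues never appear in the one-hot table. This is likely why `col_com` has 62 rows while `col_df` has 65. For instance, Bloemendhal is absent from the venue counts. The stdlib sketch below reproduces the idea on the five venues from `venues_df.head()`. Because it sees only two of Bambalapitiya's 40 venues, its Bambalapitiya shares differ from the real row. `category_frequencies` is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict


def category_frequencies(venues):
    """venues: list of (neighborhood, category) pairs.

    Mirrors get_dummies + groupby().mean(): one-hot encode each venue's category,
    then average the one-hot rows per neighbourhood.
    """
    categories = sorted({category for _, category in venues})
    sums = defaultdict(lambda: [0] * len(categories))
    counts = defaultdict(int)
    for neighborhood, category in venues:
        onehot = [1 if category == c else 0 for c in categories]   # one row of the dummy table
        sums[neighborhood] = [s + v for s, v in zip(sums[neighborhood], onehot)]
        counts[neighborhood] += 1
    return {nb: dict(zip(categories, [s / counts[nb] for s in row]))
            for nb, row in sums.items()}


# The first five venues from venues_df.head() above.
venues = [
    ("Athurugiriya", "Convenience Store"),
    ("Athurugiriya", "Auto Workshop"),
    ("Athurugiriya", "Intersection"),
    ("Bambalapitiya", "IT Services"),
    ("Bambalapitiya", "Dessert Shop"),
]
freqs = category_frequencies(venues)
print(freqs["Athurugiriya"])
print(freqs["Bambalapitiya"])
```

Each neighbourhood's shares sum to 1. This puts small and large neighbourhoods on the same scale before clustering.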


In [86]:
len(col_com[col_com["Restaurant"] > 0])

25

In [87]:
col_rest = col_com[["Neighborhoods","Restaurant"]]
col_rest.head()

|   | Neighborhoods | Restaurant |
|---|---|---|
| 0 | Athurugiriya | 0.0 |
| 1 | Bambalapitiya | 0.05 |
| 2 | Battaramulla | 0.0 |
| 3 | Batuwatta | 0.0 |
| 4 | Boralesgamuwa | 0.2 |


k-means clustering of the neighborhoods into 3 clusters

In [88]:
kclusters = 3

# cluster on the category-frequency columns only
col_clustering = col_com.drop(columns=["Neighborhoods"])
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(col_clustering)
kmeans.labels_[0:10]

array([1, 1, 1, 0, 0, 1, 1, 1, 1, 1], dtype=int32)
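`KMeans` alternates two steps, known as Lloyd's algorithm. First it assigns each neighbourhood's frequency vector to the nearest centroid. Then it moves each centroid to the mean of its members, repeating until no assignment changes. Below is a pure-Python sketch on hypothetical 2-D points. It is not scikit-learn's implementation. It uses a deterministic farthest-first initialisation in place of the library's default k-means++, and a single run rather than `n_init` restarts. Its labels are therefore not comparable to the notebook's.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: label each point with its nearest centroid
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are final
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical 2-D points forming three well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0), (9.1, 0.1)]
labels, centroids = kmeans(points, k=3)
print(labels)
```

Label numbers are arbitrary: only the grouping matters. The same is true of the cluster labels 0, 1 and 2 in the notebook.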

In [89]:
col_group = col_rest.copy()
col_group["Cluster Labels"] = kmeans.labels_
col_group.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
col_group = col_group.join(col_df.set_index("Neighborhood"), on="Neighborhood")
print(col_group.shape)
col_group.head()

(62, 5)


|   | Neighborhood | Restaurant | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Athurugiriya | 0.0 | 1 | 6.87171 | 79.99736 |
| 1 | Bambalapitiya | 0.05 | 1 | 6.90466 | 79.8548 |
| 2 | Battaramulla | 0.0 | 1 | 6.9052 | 79.91554 |
| 3 | Batuwatta | 0.0 | 0 | 6.856881 | 79.894 |
| 4 | Boralesgamuwa | 0.2 | 0 | 6.8408 | 79.90441 |


In [90]:
col_group.sort_values(["Cluster Labels"], inplace=True)
col_group

|   | Neighborhood | Restaurant | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 54 | Sri Jayawardenepura Kotte | 0.0 | 0 | 6.88396 | 79.9008 |
| 28 | Kotahena | 0.0 | 0 | 6.95915 | 79.86807 |
| 3 | Batuwatta | 0.0 | 0 | 6.856881 | 79.894 |
| 4 | Boralesgamuwa | 0.2 | 0 | 6.8408 | 79.90441 |
| 27 | Koswatte | 0.142857 | 0 | 6.90192 | 79.89408 |
| 26 | Kolonnawa | 0.0 | 0 | 6.93263 | 79.88886 |
| 41 | Narahenpita | 0.166667 | 0 | 6.89826 | 79.87981 |
| 52 | Ratmalana | 0.0 | 0 | 6.81447 | 79.87795 |
| 42 | Nawala | 0.125 | 1 | 6.89323 | 79.89003 |
| 40 | Mount-Lavinia | 0.171429 | 1 | 6.83981 | 79.86683 |


In [91]:
# centre the map on Colombo using the Nominatim result from earlier
map_clusters = folium.Map(location=[loc.latitude, loc.longitude], zoom_start=11)

# one colour per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for nb_lat, nb_lng, poi, cluster in zip(col_group['Latitude'], col_group['Longitude'], col_group['Neighborhood'], col_group['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    # cluster labels run 0..kclusters-1, so they index the colour list directly
    folium.CircleMarker([nb_lat, nb_lng], radius=3, popup=label, color=rainbow[cluster], fill=True,
                        fill_color=rainbow[cluster], fill_opacity=0.6).add_to(map_clusters)

map_clusters
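The map cell picks one colour per cluster from `cm.rainbow` and looks it up by cluster label. Below is a matplotlib-free sketch of the same idea using the standard `colorsys` module. `cluster_palette` is hypothetical, and its hue range (0 to 0.8) is an arbitrary choice, so its colours will not match `cm.rainbow` exactly. The point it illustrates is that k evenly spaced hues give k distinguishable colours. Labels from `KMeans` run from 0 to k-1 and can index the palette directly.

```python
import colorsys


def cluster_palette(k, max_hue=0.8):
    """Return k hex colours with hues evenly spaced from 0 (red) to max_hue."""
    palette = []
    for i in range(k):
        hue = max_hue * i / max(k - 1, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


kclusters = 3
palette = cluster_palette(kclusters)
# map each cluster label to its colour
colour_for_label = {label: palette[label] for label in range(kclusters)}
print(colour_for_label)
```

Stopping the hue short of 1.0 avoids wrapping back to red, which would make the first and last clusters look alike.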

In [92]:
map_clusters.save('map_clusters.html')

Verify clusters

In [93]:
col_group.loc[col_group['Cluster Labels'] == 0]

|   | Neighborhood | Restaurant | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 54 | Sri Jayawardenepura Kotte | 0.0 | 0 | 6.88396 | 79.9008 |
| 28 | Kotahena | 0.0 | 0 | 6.95915 | 79.86807 |
| 3 | Batuwatta | 0.0 | 0 | 6.856881 | 79.894 |
| 4 | Boralesgamuwa | 0.2 | 0 | 6.8408 | 79.90441 |
| 27 | Koswatte | 0.142857 | 0 | 6.90192 | 79.89408 |
| 26 | Kolonnawa | 0.0 | 0 | 6.93263 | 79.88886 |
| 41 | Narahenpita | 0.166667 | 0 | 6.89826 | 79.87981 |
| 52 | Ratmalana | 0.0 | 0 | 6.81447 | 79.87795 |


In [94]:
col_group.loc[col_group['Cluster Labels'] == 1]

|   | Neighborhood | Restaurant | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 42 | Nawala | 0.125 | 1 | 6.89323 | 79.89003 |
| 40 | Mount-Lavinia | 0.171429 | 1 | 6.83981 | 79.86683 |
| 39 | Moratuwa | 0.0 | 1 | 6.77891 | 79.8831 |
| 0 | Athurugiriya | 0.0 | 1 | 6.87171 | 79.99736 |
| 43 | Nugegoda | 0.066667 | 1 | 6.87706 | 79.89292 |
| 36 | Maradana | 0.0 | 1 | 6.92473 | 79.8657 |
| 35 | Maligawatta | 0.142857 | 1 | 6.93366 | 79.87185 |
| 37 | Mattakkuliya | 0.0 | 1 | 6.9723 | 79.87532 |
| 44 | Pamankada | 0.0 | 1 | 6.87954 | 79.86937 |
| 47 | Pelawatte | 0.0 | 1 | 6.8932 | 79.93719 |


In [95]:
col_group.loc[col_group['Cluster Labels'] == 2]

|   | Neighborhood | Restaurant | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 38 | Modara | 0.0 | 2 | 6.96673 | 79.87141 |


## Conclusion

As observed, almost 85% of the restaurants are in the neighborhoods of the second cluster (Cluster Labels = 1) and approximately 13% are in the first cluster (Cluster Labels = 0). The third cluster (Cluster Labels = 2) contains a single neighborhood, Modara. Its Restaurant frequency is 0.0, meaning Foursquare returned no venues in the Restaurant category there. This presents a business opportunity for new restaurants in the third cluster: there is negligible competition from other restaurants, and residents would not need to travel to distant areas to eat out. Restaurants in the second cluster, by contrast, face intense competition because of the high concentration of restaurants there. In other words, restaurants are concentrated in one part of the city while other neighbourhoods still have very few. This analysis therefore recommends that restaurant owners use these findings to open new restaurants in third-cluster neighborhoods and expand their business. Restaurants with distinctive cuisines that stand out from the competition could also open in first-cluster neighborhoods, where competition is moderate. Lastly, restaurant owners are advised to avoid second-cluster neighborhoods, which already have many restaurants and face intense competition.
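The cluster percentages above are not derived in any cell shown. Here is one way they could be reproduced. It assumes that a cluster's share is its total count of venues in the Restaurant category divided by the overall total. Multiplying a neighbourhood's Restaurant frequency by its venue count recovers that neighbourhood's restaurant count, and those counts are then summed per cluster. `restaurant_share_by_cluster` is hypothetical. The input is a five-neighbourhood subset whose cluster label, Restaurant frequency and venue count all appear in the tables above. Its output is an illustration, not the full result.

```python
from collections import defaultdict


def restaurant_share_by_cluster(rows):
    """rows: (neighborhood, cluster_label, restaurant_frequency, venue_count) tuples.

    frequency x venue_count recovers each neighbourhood's number of venues in the
    "Restaurant" category; shares are those counts summed per cluster over the total.
    """
    per_cluster = defaultdict(int)
    for _, cluster, frequency, venue_count in rows:
        per_cluster[cluster] += round(frequency * venue_count)  # round away display rounding
    total = sum(per_cluster.values())
    if total == 0:
        return {}
    return {cluster: count / total for cluster, count in sorted(per_cluster.items())}


rows = [
    ("Athurugiriya", 1, 0.0, 3),
    ("Bambalapitiya", 1, 0.05, 40),
    ("Battaramulla", 1, 0.0, 11),
    ("Batuwatta", 0, 0.0, 7),
    ("Boralesgamuwa", 0, 0.2, 5),
]
shares = restaurant_share_by_cluster(rows)
print(shares)
```

Run over all 62 neighbourhoods, the same calculation would show whether the 85% and 13% figures hold. It also makes clear that only the generic Restaurant category is counted.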