## IBM Applied Data Science Capstone Course by Coursera

## Week 5 Final Code

### Opening a New Pizza Outlet in Bangalore, Karnataka

* Build a dataframe of neighborhoods in Bangalore, Karnataka by web scraping the data from Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new pizza shop

In [1]:
#import libraries
import requests
import lxml.html as lh
import pandas as pd
from bs4 import BeautifulSoup
import urllib3
import geocoder # import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium

In [2]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Scrap data from Wikipedia page into a DataFrame

In [3]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore').text

In [4]:
soup = BeautifulSoup(website_url,'lxml')
#print(soup.prettify())

In [5]:
boroughs_spans = soup.findAll('span',{'class':'mw-headline'})

In [6]:
tables = soup.findAll('table',{'class':'wikitable sortable'})

In [12]:
columns = ['Borough', 'Neighbourhood']
bangalore_data=pd.DataFrame(columns=columns)

In [13]:
borough = []
neighbourhood = []
for index in range(0,len(tables)):
    tbody = tables[index].findAll('td')
    for ind in range(0,len(tbody),3):
        borough.append((str(boroughs_spans[index].get('id'))))
        neighbourhood.append(tbody[ind].find('a').get('title'))
Dict = {'Borough':borough,'Neighbourhood':neighbourhood}

In [14]:
bangalore_data=pd.DataFrame(Dict)

In [15]:
bangalore_data.head(100)

Unnamed: 0,Borough,Neighbourhood
0,Central,Bangalore Cantonment
1,Central,Domlur
2,Central,Indiranagar
3,Central,Jeevanbheemanagar
4,Central,Malleswaram
5,Central,Bengaluru Pete
6,Central,Sadashivanagar
7,Central,Seshadripuram
8,Central,"Shivajinagar, Bangalore"
9,Central,Ulsoor


#### Get the geographical coordinates

In [16]:
def get_latlng(post_code):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Bengaluru, Karnataka'.format(post_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [17]:
get_latlng('Sadashivanagar')

[13.014820000000043, 77.57771000000008]

In [18]:
coords = [ get_latlng(post_code) for post_code in bangalore_data['Neighbourhood'].tolist() ]
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
bangalore_data['Latitude'] = df_coords['Latitude']
bangalore_data['Longitude'] = df_coords['Longitude']
bangalore_data.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Central,Bangalore Cantonment,12.97568,77.60538
1,Central,Domlur,12.9433,77.65603
2,Central,Indiranagar,12.97393,77.6439
3,Central,Jeevanbheemanagar,12.96605,77.65765
4,Central,Malleswaram,13.0063,77.568289


#### Find how many boroughs and neighborhoods

In [21]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(bangalore_data['Borough'].unique()),
        bangalore_data.shape[0]
    )
)

The dataframe has 8 boroughs and 65 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Bangalore, Karnataka

In [22]:
address = 'Bangalore, Karnataka'

geolocator = Nominatim(user_agent="bang_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Bangalore are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Bangalore are 12.9791198, 77.5912997.


#### Create a map of Bangalore to visualize the neighbourhoods

In [23]:
# create map of Toronto using latitude and longitude values
map_bangalore = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bangalore_data['Latitude'], bangalore_data['Longitude'], bangalore_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup= label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bangalore)  
    
map_bangalore

#### Utilizing the Foursquare API to explore the neighborhoods and segment them.

In [24]:
LIMIT = 100

CLIENT_ID = '15XSZFYFKU3DNVVTSTJYH3VHO0B3DP33IQBDBKGYZ3ICIELG' # your Foursquare ID
CLIENT_SECRET = '05QPNWNRZUQD1C5L2ZH5Q4E4CB4JCCI4ZPOEQGDN12JCVBNU' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 15XSZFYFKU3DNVVTSTJYH3VHO0B3DP33IQBDBKGYZ3ICIELG
CLIENT_SECRET:05QPNWNRZUQD1C5L2ZH5Q4E4CB4JCCI4ZPOEQGDN12JCVBNU


### Explore Neighborhoods in Bangalore
#### Let's create a function to repeat the same process to all the neighborhoods in Bangalore

In [166]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### The code to run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [167]:
bangalore_venues = getNearbyVenues(names=bangalore_data['Neighbourhood'],
                                   latitudes=bangalore_data['Latitude'],
                                   longitudes=bangalore_data['Longitude']
                                  )

Bangalore Cantonment
Domlur
Indiranagar
Jeevanbheemanagar
Malleswaram
Bengaluru Pete
Sadashivanagar
Seshadripuram
Shivajinagar, Bangalore
Ulsoor
Vasanth Nagar
Bellandur
CV Raman Nagar
Hoodi
Krishnarajapuram
Mahadevapura, Bangalore
Marathahalli
Varthur
Whitefield, Bangalore
Banaswadi
HBR Layout
Horamavu
Kalyan Nagar
Kammanahalli
Lingarajapuram
Ramamurthy Nagar
Hebbal
Jalahalli
Mathikere
Peenya
R. T. Nagar
Vidyaranyapura
Yelahanka
Yeshwanthpur
Bommanahalli
Bommasandra
BTM Layout
Electronic City
HSR Layout
Koramangala
Madiwala
Banashankari
Basavanagudi
Girinagar
J. P. Nagar
Jayanagar, Bangalore
Kumaraswamy Layout
Padmanabhanagar
Uttarahalli
Anjanapura
Arekere
Begur, Bangalore
Gottigere
Hulimavu
Kothnur
Basaveshwaranagar
Kamakshipalya
Kengeri
Mahalakshmi Layout
Nagarbhavi
Nandini Layout
Nayandahalli
Rajajinagar
Rajarajeshwari Nagar, Bangalore
Vijayanagar, Bangalore


#### Dimensions and Contents of the new Dataset

In [168]:
print(bangalore_venues.shape)
bangalore_venues.head()

(1784, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bangalore Cantonment,12.97568,77.60538,The 13th Floor,12.975364,77.604995,Lounge
1,Bangalore Cantonment,12.97568,77.60538,M.G Road Boulevard,12.975771,77.603979,Plaza
2,Bangalore Cantonment,12.97568,77.60538,Blossom Book House,12.975042,77.604813,Bookstore
3,Bangalore Cantonment,12.97568,77.60538,Matteo,12.974496,77.607115,Café
4,Bangalore Cantonment,12.97568,77.60538,Starbucks,12.974436,77.607308,Coffee Shop


#### checking how many venues were returned for each neighborhood

In [169]:
bangalore_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Anjanapura,2,2,2,2,2,2
Arekere,37,37,37,37,37,37
BTM Layout,45,45,45,45,45,45
Banashankari,45,45,45,45,45,45
Banaswadi,33,33,33,33,33,33
Bangalore Cantonment,100,100,100,100,100,100
Basavanagudi,42,42,42,42,42,42
Basaveshwaranagar,24,24,24,24,24,24
"Begur, Bangalore",4,4,4,4,4,4
Bellandur,39,39,39,39,39,39


#### Finding out how many unique categories can be curated from all the returned venues

In [170]:
print('There are {} uniques categories.'.format(len(bangalore_venues['Venue Category'].unique())))

There are 197 uniques categories.


##  Analyze Each Neighborhood

In [171]:
bangalore_venues['Venue Category'].unique()

array(['Lounge', 'Plaza', 'Bookstore', 'Café', 'Coffee Shop',
       'Pizza Place', 'Toy / Game Store', 'Pub', 'American Restaurant',
       'Indian Restaurant', 'Music Store', 'Ice Cream Shop', 'Steakhouse',
       'Burger Joint', 'Andhra Restaurant', 'Afghan Restaurant',
       'Chinese Restaurant', 'Breakfast Spot', 'Cricket Ground',
       'Bubble Tea Shop', 'Brewery', 'Donut Shop', 'Road',
       'Arts & Crafts Store', 'Dessert Shop', 'Hotel', 'Gym',
       'Fast Food Restaurant', 'Deli / Bodega', 'Italian Restaurant',
       'Bar', 'South Indian Restaurant', 'Furniture / Home Store',
       "Women's Store", 'Korean Restaurant', 'Clothing Store',
       'Jewelry Store', 'Market', 'Spa', 'Paella Restaurant',
       'Sandwich Place', 'Cosmetics Shop', 'Mexican Restaurant',
       'Multiplex', 'Candy Store', 'Juice Bar', 'Bakery', "Men's Store",
       'Pet Store', 'Boutique', 'Asian Restaurant', 'German Restaurant',
       'Arcade', 'Chocolate Shop', 'Snack Place',
       'Multicuis

#### Performing One Hot Encoding

In [174]:
# one hot encoding
bangalore_onehot = pd.get_dummies(bangalore_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bangalore_onehot['Neighborhood'] = bangalore_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bangalore_onehot.columns[134]] + list(bangalore_onehot.columns[:134]) + list(bangalore_onehot.columns[135:])
bangalore_onehot = bangalore_onehot[fixed_columns]
#print(bangalore_onehot.columns.get_loc("Neighborhood"))
bangalore_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Australian Restaurant,BBQ Joint,Badminton Court,Bakery,Bar,Basketball Court,Beer Bar,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Campground,Candy Store,Caribbean Restaurant,Chaat Place,Chettinad Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Dive Bar,Donut Shop,Dumpling Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Hyderabadi Restaurant,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karnataka Restaurant,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Nightclub,North Indian Restaurant,Office,Optical Shop,Outdoor Supply Store,Outlet Store,Paella Restaurant,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Resort,Restaurant,Road,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Szechuan Restaurant,Tea Room,Tennis Court,Thai Restaurant,Toll Booth,Toy / Game Store,Trail,Train Station,Tram Station,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Wine Bar,Wings Joint,Women's Store
0,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [175]:
# Checking the size
bangalore_onehot.shape

(1784, 197)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [176]:
bangalore_grouped = bangalore_onehot.groupby('Neighborhood').mean().reset_index()
bangalore_grouped.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Australian Restaurant,BBQ Joint,Badminton Court,Bakery,Bar,Basketball Court,Beer Bar,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Campground,Candy Store,Caribbean Restaurant,Chaat Place,Chettinad Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Dive Bar,Donut Shop,Dumpling Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Hyderabadi Restaurant,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karnataka Restaurant,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Nightclub,North Indian Restaurant,Office,Optical Shop,Outdoor Supply Store,Outlet Store,Paella Restaurant,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Resort,Restaurant,Road,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Szechuan Restaurant,Tea Room,Tennis Court,Thai Restaurant,Toll Booth,Toy / Game Store,Trail,Train Station,Tram Station,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Wine Bar,Wings Joint,Women's Store
0,Anjanapura,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arekere,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.108108,0.0,0.135135,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BTM Layout,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.133333,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
3,Banashankari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.133333,0.022222,0.177778,0.0,0.0,0.0,0.044444,0.0,0.0,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Banaswadi,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.151515,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [177]:
# Checking the new size created
bangalore_grouped.shape

(64, 197)

#### Create a new DataFrame for Pizza Place data only

In [181]:
for item in bangalore_venues['Venue Category'].unique():
    if item != "Neighborhood":
        print(item+": %i" %len(bangalore_grouped[bangalore_grouped["Pizza Place"] > 0]))

Lounge: 36
Plaza: 36
Bookstore: 36
Café: 36
Coffee Shop: 36
Pizza Place: 36
Toy / Game Store: 36
Pub: 36
American Restaurant: 36
Indian Restaurant: 36
Music Store: 36
Ice Cream Shop: 36
Steakhouse: 36
Burger Joint: 36
Andhra Restaurant: 36
Afghan Restaurant: 36
Chinese Restaurant: 36
Breakfast Spot: 36
Cricket Ground: 36
Bubble Tea Shop: 36
Brewery: 36
Donut Shop: 36
Road: 36
Arts & Crafts Store: 36
Dessert Shop: 36
Hotel: 36
Gym: 36
Fast Food Restaurant: 36
Deli / Bodega: 36
Italian Restaurant: 36
Bar: 36
South Indian Restaurant: 36
Furniture / Home Store: 36
Women's Store: 36
Korean Restaurant: 36
Clothing Store: 36
Jewelry Store: 36
Market: 36
Spa: 36
Paella Restaurant: 36
Sandwich Place: 36
Cosmetics Shop: 36
Mexican Restaurant: 36
Multiplex: 36
Candy Store: 36
Juice Bar: 36
Bakery: 36
Men's Store: 36
Pet Store: 36
Boutique: 36
Asian Restaurant: 36
German Restaurant: 36
Arcade: 36
Chocolate Shop: 36
Snack Place: 36
Multicuisine Indian Restaurant: 36
Udupi Restaurant: 36
Karnataka R

In [182]:
bangalore_pizza = bangalore_grouped[["Neighborhood","Pizza Place"]]

In [183]:
bangalore_pizza.head()

Unnamed: 0,Neighborhood,Pizza Place
0,Anjanapura,0.0
1,Arekere,0.054054
2,BTM Layout,0.044444
3,Banashankari,0.066667
4,Banaswadi,0.060606


## Cluster Neighborhoods

In [184]:
# set number of clusters
kclusters = 3

bl_clustering = bangalore_pizza.drop(["Neighborhood"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 0, 0, 1, 1, 0, 1, 0])

#### Let's create a new dataframe that includes the cluster.

In [185]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
bl_merged = bangalore_pizza.copy()

# add clustering labels
bl_merged["Cluster Labels"] = kmeans.labels_

In [186]:
bl_merged.head()

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels
0,Anjanapura,0.0,1
1,Arekere,0.054054,0
2,BTM Layout,0.044444,0
3,Banashankari,0.066667,0
4,Banaswadi,0.060606,0


In [187]:
# merge bangalore_grouped with bangalore_data to add latitude/longitude for each neighborhood
bl_merged = bl_merged.join(bangalore_data.set_index("Neighbourhood"), on="Neighborhood")

print(bl_merged.shape)
bl_merged.head() # check the last columns!

(64, 6)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Borough,Latitude,Longitude
0,Anjanapura,0.0,1,Southern_suburbs,12.8581,77.55906
1,Arekere,0.054054,0,Southern_suburbs,12.88564,77.59669
2,BTM Layout,0.044444,0,South-Eastern,12.91489,77.61004
3,Banashankari,0.066667,0,Southern,12.92228,77.56986
4,Banaswadi,0.060606,0,North-Eastern,13.028473,77.631892


In [188]:
# sort the results by Cluster Labels
print(bl_merged.shape)
bl_merged.sort_values(["Cluster Labels"], inplace=True)
bl_merged.head()

(64, 6)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Borough,Latitude,Longitude
18,HBR Layout,0.1,0,North-Eastern,13.02642,77.62432
41,Marathahalli,0.073171,0,Eastern,12.95467,77.70752
33,Kothnur,0.125,0,Southern_suburbs,13.06434,77.64853
42,Mathikere,0.05,0,Northern,13.03236,77.55865
46,Padmanabhanagar,0.047619,0,Southern,12.91547,77.55311


## Visualizing the Resultant Map
#### Finally, let's visualize the resulting clusters

In [189]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bl_merged['Latitude'], bl_merged['Longitude'], bl_merged['Neighborhood'], bl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Saving the Map

In [190]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## Examine Clusters
Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. 

#### Cluster 0

In [191]:
bl_merged.loc[bl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Borough,Latitude,Longitude
18,HBR Layout,0.1,0,North-Eastern,13.02642,77.62432
41,Marathahalli,0.073171,0,Eastern,12.95467,77.70752
33,Kothnur,0.125,0,Southern_suburbs,13.06434,77.64853
42,Mathikere,0.05,0,Northern,13.03236,77.55865
46,Padmanabhanagar,0.047619,0,Southern,12.91547,77.55311
27,Jeevanbheemanagar,0.058824,0,Central,12.96605,77.65765
48,R. T. Nagar,0.105263,0,Northern,13.02447,77.59587
50,"Rajarajeshwari Nagar, Bangalore",0.117647,0,Western,12.93162,77.52699
20,Hebbal,0.1,0,Northern,13.04969,77.58951
19,HSR Layout,0.075,0,South-Eastern,12.91216,77.6449


#### Cluster 1

In [192]:
bl_merged.loc[bl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Borough,Latitude,Longitude
58,Vasanth Nagar,0.027397,1,Central,12.99073,77.58856
40,Malleswaram,0.027778,1,Central,13.0063,77.568289
43,Nagarbhavi,0.0,1,Western,12.95624,77.50933
44,Nandini Layout,0.0,1,Western,13.0148,77.5389
45,Nayandahalli,0.0,1,Western,12.94205,77.52101
49,Rajajinagar,0.012658,1,Western,13.00543,77.55682
60,"Vijayanagar, Bangalore",0.0,1,Western,13.07596,77.65241
52,Sadashivanagar,0.0,1,Central,13.01482,77.57771
53,Seshadripuram,0.0125,1,Central,12.99357,77.57989
39,Mahalakshmi Layout,0.0,1,Western,13.01636,77.54481


#### Cluster 2

In [193]:
bl_merged.loc[bl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Borough,Latitude,Longitude
12,Bommasandra,0.166667,2,South-Eastern,12.81754,77.67879
51,Ramamurthy Nagar,0.25,2,North-Eastern,13.02378,77.67787


#### Observations:

* Most of the pizza shops are concentrated in the Northern and Eastern area of Bangalore city, with the highest number in cluster 1 and moderate number in cluster 2. 
* On the other hand, cluster 1 is having the lowest no of pizza places in the neighborhoods. 
* This represents a great opportunity and high potential areas in opening new pizza palace as there is very little to no competition from existing pizza places. 
* Meanwhile, pizza palace in cluster 0 are likely suffering from intense competition due to oversupply and high concentration of pizza shops. 
* From another perspective, this also shows that the oversupply of pizza shops mostly happened in the northern, western and south-eastern areas of the city, with the central and southern area still have very few pizza shops. 
* Therefore, this project recommends pizza shops investors to capitalize on these findings to open new outlets in neighborhoods in cluster 1 with little to no competition which is the central and southern parts of bangalore. 
* Pizza shops with unique selling propositions to stand out from the competition can also open new pizza shops in neighborhoods in cluster 2 with moderate competition. 
* Lastly, pizza investors are advised to avoid neighborhoods in cluster 1 which already have high concentration of pizza shops and suffering from intense competition.