# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

Bangalore, officially known as Bengaluru, is the capital of the Indian state of Karnataka. It has a population of about 10 million and a metropolitan population of about 8.52 million, making it the third most populous city and fifth most populous urban agglomeration in India. Located in southern India, Bangalore is known for its pleasant climate throughout the year and is often called the Silicon Valley of India. In this project, I explore the city of Bangalore to identify neighborhoods that suit people who are planning to relocate, based on the amenities available around each location.


## Data Section <a name="data"></a>



In this project, I use the Python library pgeocode (https://github.com/symerio/pgeocode), which supports high-performance offline querying of GPS coordinates, region names, and municipality names from postal codes. Distances between postal codes, as well as general distance queries, are also supported. The underlying GeoNames database includes postal codes for 83 countries. I used the private `_index_postal_codes` method, which creates a data frame of the unique postal codes of a given country. The data frame consists of the following columns:

- country code: ISO country code, 2 characters
- postal_code: postal code
- place_name: place name (e.g., district or city)
- state_name: name of the first-order subdivision (state)
- state_code: code of the first-order subdivision (state)
- county_name: name of the second-order subdivision (county/province)
- county_code: code of the second-order subdivision (county/province)
- community_name: name of the third-order subdivision (community)
- community_code: code of the third-order subdivision (community)
- latitude: estimated latitude (WGS84)
- longitude: estimated longitude (WGS84)
- accuracy: accuracy of lat/lng from 1=estimated to 6=centroid

I will use this data to shortlist candidate locations and examine the facilities available around them.

I will also use Foursquare location data to find the most common venues in a given area of Bengaluru, and the Python folium library to visualize Bengaluru and its neighborhoods on a map.
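
To make the distance queries mentioned above concrete, the following is a minimal sketch of a great-circle (haversine) distance between two postal-code centroids. The function name `haversine_km` and the Earth radius constant are assumptions made for illustration and are not part of pgeocode's API; the two coordinate pairs are the ones reported for postal codes 560001 and 560004 later in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# coordinates reported for postal codes 560001 and 560004
distance = haversine_km(12.9914, 77.592244, 12.9833, 77.5833)
print('Approximate distance: {:.2f} km'.format(distance))
```

Because the coordinates are estimated centroids, such distances are only a rough guide to how far apart two postal areas are.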


In [4]:
!pip install pgeocode

Collecting pgeocode
  Downloading https://files.pythonhosted.org/packages/86/44/519e3db3db84acdeb29e24f2e65991960f13464279b61bde5e9e96909c9d/pgeocode-0.2.1-py2.py3-none-any.whl
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1


In [5]:
import pgeocode
import pandas as pd
import requests

from geopy.geocoders import Nominatim 

!conda install -c conda-forge folium=0.5.0  
import folium 
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans


In [6]:
nomi = pgeocode.Nominatim('in')
nomi.query_postal_code("560076")

postal_code                                                  560076
country code                                                     IN
place_name        Mico Layout, JP Nagar VIII phase, Mount St Jos...
state_name                                                Karnataka
state_code                                                       19
county_name                                               Bengaluru
county_code                                                     583
community_name                                      Bangalore South
community_code                                                  NaN
latitude                                                    12.9833
longitude                                                   77.5833
accuracy                                                          4
Name: 0, dtype: object

### Let's get the different areas of Bengaluru

In [7]:
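# nomi already holds the Indian postal-code data; build the table of unique postal codes
# (_index_postal_codes is a private pgeocode method and may change between versions)
india = nomi._index_postal_codes()
karnataka = india[india.state_name == "Karnataka"]
# .copy() avoids pandas' SettingWithCopyWarning when this frame is modified later
bengaluru = karnataka[karnataka.county_name == "Bengaluru"].copy()
bengaluru.head()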

Unnamed: 0,country code,postal_code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
10165,IN,560001,"HighCourt, Vasanthanagar, Mahatma Gandhi Road,...",Karnataka,19,Bengaluru,583.0,Bangalore North,,12.9914,77.592244,3
10166,IN,560002,"Bangalore City, Bangalore Corporation Building",Karnataka,19,Bengaluru,583.0,Bangalore North,,13.2257,77.58435,4
10167,IN,560003,"Malleswaram, Palace Guttahalli, Swimming Pool ...",Karnataka,19,Bengaluru,583.0,Bangalore North,,13.2257,77.56735,4
10168,IN,560004,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",Karnataka,19,Bengaluru,583.0,Bangalore South,,12.9833,77.5833,1
10169,IN,560005,Fraser Town,Karnataka,19,Bengaluru,583.0,Bangalore North,,12.991,77.5843,1


In [8]:
bengaluru.shape

(107, 12)

In [9]:
bengaluru.reset_index(inplace = True)
bengaluru.columns

Index(['index', 'country code', 'postal_code', 'place_name', 'state_name',
       'state_code', 'county_name', 'county_code', 'community_name',
       'community_code', 'latitude', 'longitude', 'accuracy'],
      dtype='object')

### Selecting the required columns

In [10]:
# keep only the columns needed for mapping and clustering; .copy() makes later edits safe
bengaluru = bengaluru[['postal_code','place_name', 'community_name', 'latitude', 'longitude']].copy()
bengaluru.head(10)

Unnamed: 0,postal_code,place_name,community_name,latitude,longitude
0,560001,"HighCourt, Vasanthanagar, Mahatma Gandhi Road,...",Bangalore North,12.9914,77.592244
1,560002,"Bangalore City, Bangalore Corporation Building",Bangalore North,13.2257,77.58435
2,560003,"Malleswaram, Palace Guttahalli, Swimming Pool ...",Bangalore North,13.2257,77.56735
3,560004,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",Bangalore South,12.9833,77.5833
4,560005,Fraser Town,Bangalore North,12.991,77.5843
5,560006,"Training Command IAF, J.C.Nagar",Bangalore North,12.991,77.5843
6,560007,"Air Force Hospital, Agram",Bangalore North,12.991,77.5843
7,560008,"H.A.L II Stage H.O, Hulsur Bazaar",Bangalore North,12.991,77.5843
8,560009,"K. G. Road, Bangalore Dist Offices Bldg",Bangalore North,12.991,77.5843
9,560010,"Industrial Estate (Bangalore), Rajajinagar H.O...",Bangalore North,12.9604,77.5673
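
The preview shows that several postal codes share exactly the same estimated coordinates (for example 12.991, 77.5843), so they will overlap on a map and would likely return identical venue lists from a location query. A minimal sketch for counting such duplicates, using a hand-copied subset of the rows above (the frame name `preview` is hypothetical):

```python
import pandas as pd

# rows copied from the preview above (postal codes 560004 to 560010)
preview = pd.DataFrame({
    'postal_code': ['560004', '560005', '560006', '560007', '560008', '560009', '560010'],
    'latitude': [12.9833, 12.991, 12.991, 12.991, 12.991, 12.991, 12.9604],
    'longitude': [77.5833, 77.5843, 77.5843, 77.5843, 77.5843, 77.5843, 77.5673],
})

# count how many postal codes share each (latitude, longitude) pair
shared = (preview.groupby(['latitude', 'longitude'])
                 .size()
                 .reset_index(name='n_postal_codes')
                 .sort_values('n_postal_codes', ascending=False))
print(shared)
```

Running the same grouping on the full `bengaluru` frame would show how many distinct locations the 107 postal codes actually cover.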


In [11]:
bengaluru.community_name.unique()

array(['Bangalore North', 'Bangalore South', 'Bangalore', 'Bellandur',
       'Anekal', 'Hosakote', nan, 'Bengaluru', 'Bangaloresouth',
       'Bg North'], dtype=object)

### Normalizing the community names

In [12]:
bengaluru.loc[bengaluru.community_name == 'Bangaloresouth', 'community_name'] = 'Bangalore South'
bengaluru.loc[bengaluru.community_name == 'Bg North', 'community_name'] = 'Bangalore North'
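
When more spelling variants turn up, a single mapping can be easier to maintain than one `.loc` assignment per variant. The following is a small sketch of that alternative on hypothetical sample data (the names `communities` and `canonical` are illustrative only):

```python
import pandas as pd

# hypothetical sample containing the spelling variants seen in the output above
communities = pd.Series(['Bangalore North', 'Bangaloresouth', 'Bg North', 'Bangalore South', None])

# map each variant spelling to its canonical name; unmapped values are left unchanged
canonical = {'Bangaloresouth': 'Bangalore South', 'Bg North': 'Bangalore North'}
normalized = communities.replace(canonical)
print(normalized.value_counts(dropna=False))
```

Missing community names stay missing, so they still need to be handled separately before grouping.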

In [13]:
bengaluru.columns

Index(['postal_code', 'place_name', 'community_name', 'latitude', 'longitude'], dtype='object')

### Getting the Latitude and Longitude of Bengaluru

In [14]:
address = 'Bengaluru, KA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bengaluru are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bengaluru are 12.9791198, 77.5912997.


In [15]:
# create map of Bengaluru using latitude and longitude values
map_Bengaluru = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, place, community in zip(bengaluru['latitude'], bengaluru['longitude'], bengaluru['place_name'], bengaluru['community_name']):
    label = '{}, {}'.format(community, place)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Bengaluru)
    
map_Bengaluru

### Getting the count of available postal codes in each Community

In [16]:
bengaluru.groupby("community_name").count()

Unnamed: 0_level_0,postal_code,place_name,latitude,longitude
community_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Anekal,4,4,4,4
Bangalore,6,6,6,6
Bangalore North,59,59,59,59
Bangalore South,33,33,33,33
Bellandur,1,1,1,1
Bengaluru,1,1,1,1
Hosakote,2,2,2,2


## Methodology <a name="methodology"></a>

In this section, I use the Foursquare API to get the most common venues in a given community of Bengaluru, and the Python folium library to visualize Bengaluru and its places on a map.

Because I am interested in Bangalore South, the rest of the analysis focuses on that community alone.

In [17]:
# As I am interested in Bangalore South, I will use only its postal codes for the clustering
blr_south = bengaluru[bengaluru['community_name'] == 'Bangalore South'].reset_index(drop=True)
blr_south.head()

Unnamed: 0,postal_code,place_name,community_name,latitude,longitude
0,560004,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",Bangalore South,12.9833,77.5833
1,560011,"Madhavan Park, Jayangar III Block",Bangalore South,12.9604,77.5673
2,560018,"Chamrajpet (Bangalore), Goripalya SO",Bangalore South,13.2257,77.57115
3,560026,"Deepanjalinagar, Governmemnt Electric Factory,...",Bangalore South,12.9996,77.6359
4,560027,"Wilson Garden, Shanthinagar, Sampangiramnagar",Bangalore South,13.2257,77.588467


In [18]:
# Let's plot the Bangalore South map
address = 'Bangalore South,KA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangalore South are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bangalore South are 12.862467899999999, 77.56089325971044.


In [19]:
# create map of Bangalore South using latitude and longitude values
map_blrSouth = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, place, community in zip(blr_south['latitude'], blr_south['longitude'], blr_south['place_name'], blr_south['community_name']):
    label = '{}, {}'.format(community, place)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_blrSouth)
    
map_blrSouth

## Foursquare
Now that we have the location coordinates, let's use the Foursquare API to get information about venues in the Bangalore South neighborhoods.

In [20]:
# The code was removed by Watson Studio for sharing.

Your credentials:
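
Since the credentials cell was removed for sharing, here is a hedged sketch of how the constants used by `getNearbyVenues` could be defined without hard-coding secrets. The environment variable names are hypothetical, and the `VERSION` and `LIMIT` values are assumptions, not the values used in the original run.

```python
import os

# hypothetical environment variable names; set them before starting the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# assumed values: the API version is a YYYYMMDD date string, the limit caps venues per request
VERSION = '20180605'
LIMIT = 100

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```

Reading the secrets from the environment keeps them out of the notebook when it is shared.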


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Collect venues within `radius` metres of each location into one DataFrame."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (Foursquare v2 "explore" endpoint)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into a single table
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
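
The function above indexes straight into the response, so a location with no result group or a venue without a category would raise an error. The following is a minimal sketch of a more defensive parser. The helper `extract_venues` and the `sample` payload are hypothetical; the payload is shaped only after the fields that `getNearbyVenues` reads and is not a complete Foursquare response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to a placeholder when a venue has no category
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((venue['name'], venue['location']['lat'], venue['location']['lng'], category))
    return rows


# hypothetical payload, shaped only after the fields used in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Taj West End', 'location': {'lat': 12.984572, 'lng': 77.584893},
               'categories': [{'name': 'Hotel'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 12.98, 'lng': 77.58},
               'categories': []}},
]}]}}

venue_rows = extract_venues(sample)
print(venue_rows)
```

An empty or malformed response then yields an empty list instead of stopping the whole loop.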

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive additional information from the raw data. First, let's find the available venues in Bangalore South:

In [22]:
blrSouth_venues = getNearbyVenues(names=blr_south['place_name'],
                                   latitudes=blr_south['latitude'],
                                   longitudes=blr_south['longitude']
                                  )

Gavipuram Extension, Mavalli, Basavanagudi H.O, Thyagarajnagar, Pampamahakavi Road
Madhavan Park, Jayangar III Block
Chamrajpet (Bangalore), Goripalya SO
Deepanjalinagar, Governmemnt Electric Factory, Nayandahalli
Wilson Garden, Shanthinagar, Sampangiramnagar
Adugodi
Koramangala I Block, Koramangala, St. John's Medical College, Agara
Carmelram
Jayanagar H.O, Jayanagar East, Tilaknagar (Bangalore)
Viveknagar (Bangalore), Austin Town
Bidrahalli, Mundur, Bhattarahalli, Virgonagar
Ashoknagar (Bangalore), State Bank Of Mysore Colony, Dasarahalli(Srinagar), Banashankari
Chickpet
Bengaluru Vishwavidyalaya, Mallathahalli
Rv Niketan
Chikkalasandra, Subramanyapura
Doddakallasandra, Konanakunte
Whitefield, EPIP
Madivala, Bommanahalli (Bangalore), Singasandra
Tyagrajnagar, B Sk II Stage, Padmanabhnagar
Ramohalli, Kumbalgodu Gollahalli, Kumbalagodu
Mico Layout, JP Nagar VIII phase, Mount St Joseph, Hulimavu, Bannerghatta Road
J P Nagar, JP Nagar III Phase, Yelachenahalli
Kathriguppe, Banashankari I

In [23]:
blrSouth_venues.head(20)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Taj West End,12.984572,77.584893,Hotel
1,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Bangalore Turf Club,12.983914,77.58314,Racetrack
2,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Masala Klub,12.984993,77.585115,Indian Restaurant
3,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Mynt,12.984629,77.584989,Coffee Shop
4,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Aromas Of South,12.984895,77.580985,Indian Restaurant
5,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,The Blue Bar,12.984872,77.583973,Hotel Bar
6,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Blue Ginger,12.984804,77.584045,Vietnamese Restaurant
7,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Kabab Studio,12.985325,77.579465,Indian Restaurant
8,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Gold Finch Hotel,12.9853,77.579475,Hotel
9,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",12.9833,77.5833,Sana-di-ge,12.985438,77.579605,Seafood Restaurant


In [24]:
# count the unique venue categories
print('There are {} unique categories.'.format(len(blrSouth_venues['Venue Category'].unique())))

There are 48 unique categories.


In [25]:
# Prepare the features for clustering the neighborhoods
# one-hot encode the venue categories
blrSouth_onehot = pd.get_dummies(blrSouth_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
blrSouth_onehot['Neighborhood'] = blrSouth_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [blrSouth_onehot.columns[-1]] + list(blrSouth_onehot.columns[:-1])
blrSouth_onehot = blrSouth_onehot[fixed_columns]

# average the indicators per neighborhood: each value is the share of that category among the neighborhood's venues
blrSouth_grouped = blrSouth_onehot.groupby('Neighborhood').mean().reset_index()
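
To see what the one-hot encoding and the group mean produce, here is a small worked example on hypothetical data (the frame `venues` and its values are invented for illustration). Each resulting number is the fraction of a neighborhood's venues that fall in a category, so every row sums to 1.

```python
import pandas as pd

# hypothetical venues for two neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Hotel', 'Hotel', 'Bakery', 'ATM', 'ATM', 'Pharmacy'],
})

# one indicator column per category (dtype=float keeps the means numeric on any pandas version)
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of the indicators is the share of each category within a neighborhood
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

Neighborhood A has four venues, two of them hotels, so its Hotel share is 0.5; neighborhood B has no hotels, so its Hotel share is 0.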

In [26]:
blrSouth_grouped.head(20)

Unnamed: 0,Neighborhood,ATM,Andhra Restaurant,Art Gallery,Asian Restaurant,Bakery,Bistro,Boutique,Bus Station,Café,...,Restaurant,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,South Indian Restaurant,Supermarket,Theme Park Ride / Attraction,Vietnamese Restaurant,Yoga Studio
0,Adugodi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Bengaluru Vishwavidyalaya, Mallathahalli",0.0,0.076923,0.076923,0.0,0.0,0.0,0.076923,0.0,0.153846,...,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bidrahalli, Mundur, Bhattarahalli, Virgonagar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0
3,Carmelram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chickpet,0.0,0.076923,0.076923,0.0,0.0,0.0,0.076923,0.0,0.153846,...,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Chikkalasandra, Subramanyapura",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Deepanjalinagar, Governmemnt Electric Factory,...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
7,"Doddakallasandra, Konanakunte",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Electronics City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0


In [27]:
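# Top common venues

def return_most_common_venues(row, num_top_venues):
    """Return the names of the `num_top_venues` most frequent categories in a grouped row."""
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]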
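
Note that when a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is zero, in no meaningful order. A hedged sketch of a variant that ignores zero-frequency categories (the helper `top_nonzero_categories` is hypothetical and not part of the original notebook):

```python
import pandas as pd


def top_nonzero_categories(frequencies, num_top_venues):
    """Return up to `num_top_venues` category names, skipping categories with zero frequency."""
    nonzero = frequencies[frequencies > 0]
    return nonzero.sort_values(ascending=False).index.tolist()[:num_top_venues]


# hypothetical frequencies for one neighborhood with only three venue categories
row = pd.Series({'ATM': 0.0, 'Hotel': 0.5, 'Bakery': 0.3, 'Farm': 0.0, 'Pharmacy': 0.2})
top = top_nonzero_categories(row, num_top_venues=5)
print(top)
```

The returned list can be shorter than `num_top_venues`, which makes it clear when a neighborhood simply has few venues.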

In [28]:
import numpy as np
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        # 1st, 2nd, 3rd
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        # 4th to 20th all take the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
blrSouth_venues_sorted = pd.DataFrame(columns=columns)
blrSouth_venues_sorted['Neighborhood'] = blrSouth_grouped['Neighborhood']

for ind in np.arange(blrSouth_grouped.shape[0]):
    blrSouth_venues_sorted.iloc[ind, 1:] = return_most_common_venues(blrSouth_grouped.iloc[ind, :], num_top_venues)

blrSouth_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Adugodi,Farm,Yoga Studio,Vietnamese Restaurant,Hotel,Hookah Bar,Furniture / Home Store,Fish & Chips Shop,Fast Food Restaurant,Dumpling Restaurant,...,Dance Studio,Convenience Store,Coffee Shop,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro,Bakery
1,"Bengaluru Vishwavidyalaya, Mallathahalli",Indian Restaurant,Café,Hotel,Hotel Pool,Dance Studio,Andhra Restaurant,Art Gallery,Boutique,Restaurant,...,Furniture / Home Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Dumpling Restaurant,Department Store,Chinese Restaurant,Convenience Store,Coffee Shop,Clothing Store
2,"Bidrahalli, Mundur, Bhattarahalli, Virgonagar",Hotel,Indian Restaurant,Seafood Restaurant,Hotel Bar,Vietnamese Restaurant,Juice Bar,Coffee Shop,Racetrack,Hotel Pool,...,Boutique,Andhra Restaurant,Art Gallery,Fish & Chips Shop,Asian Restaurant,Fast Food Restaurant,Bakery,Farm,Dumpling Restaurant,Department Store
3,Carmelram,Convenience Store,Yoga Studio,Vietnamese Restaurant,Hotel,Hookah Bar,Furniture / Home Store,Fish & Chips Shop,Fast Food Restaurant,Farm,...,Department Store,Dance Studio,Coffee Shop,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro,Bakery
4,Chickpet,Indian Restaurant,Café,Hotel,Hotel Pool,Dance Studio,Andhra Restaurant,Art Gallery,Boutique,Restaurant,...,Furniture / Home Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Dumpling Restaurant,Department Store,Chinese Restaurant,Convenience Store,Coffee Shop,Clothing Store
5,"Chikkalasandra, Subramanyapura",ATM,Pizza Place,Fish & Chips Shop,Pharmacy,Hookah Bar,Furniture / Home Store,Fast Food Restaurant,Farm,Dumpling Restaurant,...,Dance Studio,Convenience Store,Coffee Shop,Hotel Bar,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro
6,"Deepanjalinagar, Governmemnt Electric Factory,...",Clothing Store,Fast Food Restaurant,Shopping Mall,Multiplex,Coffee Shop,Hookah Bar,Furniture / Home Store,Fish & Chips Shop,Farm,...,Department Store,Dance Studio,Convenience Store,Yoga Studio,Hotel Bar,Chinese Restaurant,Café,Bus Station,Boutique,Bistro
7,"Doddakallasandra, Konanakunte",ATM,Pizza Place,Fish & Chips Shop,Pharmacy,Hookah Bar,Furniture / Home Store,Fast Food Restaurant,Farm,Dumpling Restaurant,...,Dance Studio,Convenience Store,Coffee Shop,Hotel Bar,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro
8,Electronics City,Farm,Yoga Studio,Vietnamese Restaurant,Hotel,Hookah Bar,Furniture / Home Store,Fish & Chips Shop,Fast Food Restaurant,Dumpling Restaurant,...,Dance Studio,Convenience Store,Coffee Shop,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro,Bakery
9,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",Hotel,Indian Restaurant,Seafood Restaurant,Hotel Bar,Vietnamese Restaurant,Juice Bar,Coffee Shop,Racetrack,Hotel Pool,...,Boutique,Andhra Restaurant,Art Gallery,Fish & Chips Shop,Asian Restaurant,Fast Food Restaurant,Bakery,Farm,Dumpling Restaurant,Department Store


In [29]:
# set number of clusters
kclusters = 20

# keep only the numeric category frequencies as clustering features
blrSouth_grouped_clustering = blrSouth_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(blrSouth_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:24]


array([ 2, 13, 16,  5, 13,  4,  9,  4,  2, 16,  7,  6,  0,  3,  2, 10,  2,
       16, 12,  3, 13,  1], dtype=int32)
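
The label array above covers 22 neighborhoods but contains only 13 distinct labels, even though `kclusters = 20` was requested, which suggests that k is larger than the data supports. One common way to pick a smaller k is to compare silhouette scores across candidate values. The following is a minimal sketch on synthetic data (the matrix `X` is hypothetical; in the notebook it would be `blrSouth_grouped_clustering`), and it is not the method used in the original analysis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# hypothetical feature matrix: three tight, well-separated groups of four points each
offsets = np.array([[0.1, 0.1], [0.1, -0.1], [-0.1, 0.1], [-0.1, -0.1]])
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([center + offsets for center in centers])

# fit k-means for several candidate k values and score each clustering
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print('Silhouette scores:', scores)
print('Best k:', best_k)
```

A higher silhouette score means points sit closer to their own cluster than to neighboring ones; on real venue data the differences between candidate k values are usually much less clear-cut than in this toy case.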

In [30]:
# add clustering labels
blrSouth_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

blrSouth_merged = blr_south

blrSouth_merged = blrSouth_merged.join(blrSouth_venues_sorted.set_index('Neighborhood'), on='place_name')

blrSouth_merged.head(10) 

Unnamed: 0,postal_code,place_name,community_name,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,560004,"Gavipuram Extension, Mavalli, Basavanagudi H.O...",Bangalore South,12.9833,77.5833,16.0,Hotel,Indian Restaurant,Seafood Restaurant,Hotel Bar,...,Boutique,Andhra Restaurant,Art Gallery,Fish & Chips Shop,Asian Restaurant,Fast Food Restaurant,Bakery,Farm,Dumpling Restaurant,Department Store
1,560011,"Madhavan Park, Jayangar III Block",Bangalore South,12.9604,77.5673,10.0,Fast Food Restaurant,Print Shop,Indian Restaurant,Shop & Service,...,Dance Studio,Convenience Store,Coffee Shop,Yoga Studio,Hotel,Chinese Restaurant,Café,Bus Station,Boutique,Bistro
2,560018,"Chamrajpet (Bangalore), Goripalya SO",Bangalore South,13.2257,77.57115,,,,,,...,,,,,,,,,,
3,560026,"Deepanjalinagar, Governmemnt Electric Factory,...",Bangalore South,12.9996,77.6359,9.0,Clothing Store,Fast Food Restaurant,Shopping Mall,Multiplex,...,Department Store,Dance Studio,Convenience Store,Yoga Studio,Hotel Bar,Chinese Restaurant,Café,Bus Station,Boutique,Bistro
4,560027,"Wilson Garden, Shanthinagar, Sampangiramnagar",Bangalore South,13.2257,77.588467,,,,,,...,,,,,,,,,,
5,560030,Adugodi,Bangalore South,13.2257,77.575,2.0,Farm,Yoga Studio,Vietnamese Restaurant,Hotel,...,Dance Studio,Convenience Store,Coffee Shop,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro,Bakery
6,560034,"Koramangala I Block, Koramangala, St. John's M...",Bangalore South,13.0685,77.6415,0.0,ATM,Supermarket,Furniture / Home Store,Dumpling Restaurant,...,Department Store,Dance Studio,Convenience Store,Clothing Store,Hotel Bar,Café,Bus Station,Boutique,Bistro,Bakery
7,560035,Carmelram,Bangalore South,13.0108,77.7494,5.0,Convenience Store,Yoga Studio,Vietnamese Restaurant,Hotel,...,Department Store,Dance Studio,Coffee Shop,Clothing Store,Chinese Restaurant,Café,Bus Station,Boutique,Bistro,Bakery
8,560041,"Jayanagar H.O, Jayanagar East, Tilaknagar (Ban...",Bangalore South,12.9221,77.582033,6.0,Indian Restaurant,Coffee Shop,Pizza Place,Chinese Restaurant,...,Hookah Bar,Asian Restaurant,Bistro,South Indian Restaurant,Restaurant,Dumpling Restaurant,Fish & Chips Shop,Andhra Restaurant,Farm,Art Gallery
9,560047,"Viveknagar (Bangalore), Austin Town",Bangalore South,13.005,77.3214,,,,,,...,,,,,,,,,,


In [32]:
# neighborhoods for which no venues were returned have no cluster label (NaN);
# drop them instead of relabeling them as cluster 0, which is a real cluster
blrSouth_merged = blrSouth_merged.dropna(subset=['Cluster Labels']).copy()
blrSouth_merged['Cluster Labels'] = blrSouth_merged['Cluster Labels'].astype('int')

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, labeling each one with its place name and cluster
for lat, lon, poi, cluster in zip(blrSouth_merged['latitude'], blrSouth_merged['longitude'], blrSouth_merged['place_name'], blrSouth_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results and Discussion <a name="results"></a>

In summary, Foursquare returned 48 unique venue categories for Bangalore South. From these, I built a table listing the top 20 venue categories for each neighbourhood. Because several neighbourhoods share the same common venue categories, I used the K-means algorithm, one of the most common clustering methods in unsupervised learning, to group similar neighbourhoods. I concluded the study by visualizing the neighbourhoods and their cluster assignments on a map of Bengaluru.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas with the facilities and amenities needed for everyday living, in order to help stakeholders narrow down the search for an optimal location for someone relocating to Bengaluru. Using venue details from Foursquare, we first identified the venue categories around each location and then extracted the unique categories for further analysis. The neighbourhoods were then clustered by their venue profiles to group areas with similar common venues.

The final decision on a location will be made by stakeholders based on the specific characteristics of locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.