# The Battle of Neighborhoods 

# Introduction

Visakhapatnam also known as Vizag is the proposed executive capital of the Indian state of Andhra Pradesh. It is also the most populated and largest city of Andhra Pradesh. It is the second largest city in the east coast of India after Chennai and also the fourth largest city in South India. It is one of the four smart cities of Andhra Pradesh selected under Smart Cities Mission. With an estimated output of $43.5 billion, the city is the ninth largest contributor to India's overall GDP as of 2016. The city is home to some reputed Central and State educational institutions.The city is a major tourist destination and is particularly known for its beaches,Buddhist sites and natural beauty. It has been nicknamed as the "City of Destiny".The main aim of the project is to study the neighborhoods of Visakhapatnam to determine possible locations for starting a restaurant. This project can be useful for business owners and entrepreneurs who are looking to invest in a restaurant in a smart city like Visakhapatnam. The main objective of this project is to analyze appropriate data and find recommendations for the stakeholders.

# Data :

## Data Collection

The data required for this project is as follows and has been collected from
multiple sources. The following data is required for the project:
1) Neighborhood data of Visakhapatnam
2) Geographical Coordinates of Visakhapatnam and all neighborhoods in Visakhapatnam
3) Venue data for neighborhoods in Visakhapatnam

## Neighborhood Data

The data of the neighborhoods in Visakhapatnam was scraped from
https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam.We
will use web scraping techniques to extract the data from the Wikipedia page, with the help of Python requests and beautifulsoup packages. Then we will get the geographical coordinates of the neighborhoods using Python Geocoder package which will give us the latitude and longitude coordinates of the neighborhoods.After that, we will use Foursquare API to get the venue data for those neighborhoods.Foursquare API will provide many categories of the venue data,we are particularly interested in the Restaurant category in order to help us to solve the business problem put forward. This is a project that will make use of many data science skills, from web scraping (Wikipedia), working with API (Foursquare),data cleaning, data wrangling, to machine learning (K-means clustering) and map visualization (Folium).

# Methodology

#### Importing all the necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
!pip install geocoder
import pandas as pd # library for data analsysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
!pip install folium
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
[K     |████████████████████████████████| 98 kB 9.7 MB/s  eta 0:00:01
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
[K     |████████████████████████████████| 94 kB 6.3 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Libraries imported.


## Webscraping and Refining Data

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [6]:
#  create a new DataFrame from the list
vi_df = pd.DataFrame({"Neighborhood": neighborhoodList})

vi_df.head()

Unnamed: 0,Neighborhood
0,Abidnagar
1,Adarsh Nagar
2,Adavivaram
3,Aganampudi
4,Akkayyapalem


In [7]:
vi_df.shape

(127, 1)

In [8]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Visakhapatnam, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return lat_lng_coords

In [9]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in vi_df["Neighborhood"].tolist()]

In [10]:
coords

[[17.73786000000007, 83.29888000000005],
 [17.763910000000067, 83.33169000000004],
 [17.785830000000033, 83.25242000000009],
 [17.689040000000034, 83.13988000000006],
 [17.734210000000076, 83.29713000000004],
 [17.708720000000028, 83.20904000000007],
 [17.720230000000072, 83.29757000000006],
 [17.68975000000006, 83.00223000000005],
 [17.877720000000068, 83.30459000000008],
 [17.596290000000067, 83.20243000000005],
 [17.768430000000023, 83.31107000000003],
 [17.72276000000005, 83.31078000000008],
 [17.565500000000043, 82.98174000000006],
 [17.743340000000046, 83.31052000000005],
 [17.81253052012206, 83.4078489258925],
 [17.889380000000074, 83.45031000000006],
 [17.70595000000003, 83.19796000000008],
 [17.719840000000033, 83.26278000000008],
 [17.727250000000026, 83.31334000000004],
 [17.69327000000004, 83.29237000000006],
 [17.681190000000072, 83.19786000000005],
 [17.719840000000033, 83.26278000000008],
 [17.726720000000057, 83.33061000000004],
 [17.80147000000005, 83.22367000000008],


In [11]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
vi_df['Latitude'] = df_coords['Latitude']
vi_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates
print(vi_df.shape)
vi_df.head()

(127, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Abidnagar,17.73786,83.29888
1,Adarsh Nagar,17.76391,83.33169
2,Adavivaram,17.78583,83.25242
3,Aganampudi,17.68904,83.13988
4,Akkayyapalem,17.73421,83.29713


In [14]:
# get the coordinates of visakhapatnam
address = 'Visakhapatnam, India'

geolocator = Nominatim(user_agent="Visakhapatnam")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Visakhapatnam, India {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Visakhapatnam, India 17.7231276, 83.3012842.


In [15]:
# create map of visakhapatnam using latitude and longitude values
map_vi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(vi_df['Latitude'], vi_df['Longitude'], vi_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vi)  
    
map_vi

In [None]:
# save the map as HTML file
#map_vi.save('/home/user/map_vi.html')

## Using the Foursquare API to explore the neighborhoods

In [16]:
# @hidden_cell
CLIENT_ID = 'JMFTJR2AELSZSHLGBAGPXSDVMYOAE3GOPOL4HCZAHKAM3DOU' # your Foursquare ID
CLIENT_SECRET = 'KSEXDD5TVCCR1O5IQ4W5AZLYKJC4MXASEK4REGP5UFGJTX0N' # your Foursquare Secret
VERSION = '20180604'
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: JMFTJR2AELSZSHLGBAGPXSDVMYOAE3GOPOL4HCZAHKAM3DOU
CLIENT_SECRET:KSEXDD5TVCCR1O5IQ4W5AZLYKJC4MXASEK4REGP5UFGJTX0N


In [17]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(vi_df['Latitude'], vi_df['Longitude'], vi_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [18]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(10)

(2330, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abidnagar,17.73786,83.29888,Sai Ram Parlour,17.726339,83.303465,Indian Restaurant
1,Abidnagar,17.73786,83.29888,cafe coffee day,17.730015,83.314734,Café
2,Abidnagar,17.73786,83.29888,Tandoori Inn Restaurent,17.727051,83.302709,Indian Restaurant
3,Abidnagar,17.73786,83.29888,Deepak Punjabi Dhaba,17.723782,83.309922,Indian Restaurant
4,Abidnagar,17.73786,83.29888,Pizza Hut,17.72665,83.305531,Pizza Place
5,Abidnagar,17.73786,83.29888,Kinnera Kameswari,17.735294,83.31758,Multiplex
6,Abidnagar,17.73786,83.29888,Spencer Shopping Mall,17.730279,83.314701,Shopping Mall
7,Abidnagar,17.73786,83.29888,Bez krishna,17.727828,83.303613,Vegetarian / Vegan Restaurant
8,Abidnagar,17.73786,83.29888,Hill View Park,17.74477,83.308626,Park
9,Abidnagar,17.73786,83.29888,Gupta Brothers Books,17.725467,83.303528,Bookstore


## Lets check the number of venues returned for each neighbourhood

In [19]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abidnagar,13,13,13,13,13,13
Adarsh Nagar,6,6,6,6,6,6
Adavivaram,1,1,1,1,1,1
Aganampudi,8,8,8,8,8,8
Akkayyapalem,17,17,17,17,17,17
Akkireddypalem,6,6,6,6,6,6
Allipuram,41,41,41,41,41,41
Anakapalle,3,3,3,3,3,3
Anandapuram,1,1,1,1,1,1
Arilova,3,3,3,3,3,3


## Let's find out how many unique categories can be curated from all the returned venues

In [20]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 108 unique categories.


In [21]:
venues_df['VenueCategory'].unique()#displays all the category names

array(['Indian Restaurant', 'Café', 'Pizza Place', 'Multiplex',
       'Shopping Mall', 'Vegetarian / Vegan Restaurant', 'Park',
       'Bookstore', 'Platform', 'Stadium', 'Volleyball Court',
       'Moving Target', 'Bus Station', 'Mountain', 'Historic Site',
       'Beach', 'IT Services', 'Airport Food Court', 'Train Station',
       'Food', 'Dessert Shop', 'Pharmacy', 'Port', 'Drive-in Theater',
       'Ice Cream Shop', 'Indie Movie Theater', 'Hotel',
       'Fast Food Restaurant', 'Italian Restaurant', 'Restaurant',
       'Clothing Store', 'Electronics Store', 'Tea Room',
       'Asian Restaurant', 'Lake', 'Golf Course',
       'Multicuisine Indian Restaurant', 'Breakfast Spot', 'Juice Bar',
       'Coffee Shop', 'Food Court', 'Convenience Store', 'Boutique',
       "Men's Store", 'Sandwich Place', 'Arts & Entertainment',
       'Movie Theater', 'Spa', 'Dhaba', 'ATM', 'American Restaurant',
       'Snack Place', 'Bar', 'Performing Arts Venue', 'Pier', 'Resort',
       'Scenic Looko

In [22]:
# check if the results contain "Restaurant"
"Restaurant" in venues_df['VenueCategory'].unique()

True

# Analyzing each neighborhood

In [23]:
# one hot encoding
vi_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vi_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vi_onehot.columns[-1]] + list(vi_onehot.columns[:-1])
vi_onehot = vi_onehot[fixed_columns]

print(vi_onehot.shape)
vi_onehot.head(20)

(2330, 109)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Food Court,American Restaurant,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bus Station,Business Service,Cafeteria,Café,Campground,Candy Store,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Drive-in Theater,Dry Cleaner,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Truck,Furniture / Home Store,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hockey Arena,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Men's Store,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Port,Pub,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Smoke Shop,Snack Place,Spa,Stadium,Steakhouse,Supermarket,Surf Spot,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Warehouse Store,Women's Store
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
8,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
vi_grouped = vi_onehot.groupby(["Neighborhoods"]).sum().reset_index()
#df.group/by("state")["last_name"].count()

print(vi_grouped.shape)
vi_grouped

(115, 109)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Food Court,American Restaurant,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bus Station,Business Service,Cafeteria,Café,Campground,Candy Store,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Drive-in Theater,Dry Cleaner,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Truck,Furniture / Home Store,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hockey Arena,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Men's Store,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Port,Pub,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Smoke Shop,Snack Place,Spa,Stadium,Steakhouse,Supermarket,Surf Spot,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Warehouse Store,Women's Store
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0
1,Adarsh Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Adavivaram,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aganampudi,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0
4,Akkayyapalem,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,2,1,0,0,0
5,Akkireddypalem,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Allipuram,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,2,6,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,2,3,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,2,1,0,0,0
7,Anakapalle,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
8,Anandapuram,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Arilova,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
#a=vi_grouped['Neighborhoods'][0]

In [25]:
len((vi_grouped[vi_grouped["Restaurant"] > 0]))
# now we want to select a good location where the no of restaurants are less so that our chances of setting up one at that location should be good

32

## Create a dataframe for Restaurant data only

In [26]:
vi_rest = vi_grouped[["Neighborhoods","Restaurant"]]

In [27]:
vi_rest

Unnamed: 0,Neighborhoods,Restaurant
0,Abidnagar,0
1,Adarsh Nagar,0
2,Adavivaram,0
3,Aganampudi,0
4,Akkayyapalem,0
5,Akkireddypalem,0
6,Allipuram,1
7,Anakapalle,0
8,Anandapuram,0
9,Arilova,0


## Clustering the Neighborhoods

Running k-means to cluster the neighborhoods in Visakhapatnam into 3 clusters.

In [28]:
# set number of clusters
kclusters = 3

vi_clustering = vi_rest.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vi_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 2, 0, 0, 0], dtype=int32)

In [29]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
vi_merged = vi_rest.copy()

# add clustering labels
vi_merged["Cluster Labels"] = kmeans.labels_

In [30]:
vi_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
vi_merged.head(10)

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,Abidnagar,0,0
1,Adarsh Nagar,0,0
2,Adavivaram,0,0
3,Aganampudi,0,0
4,Akkayyapalem,0,0
5,Akkireddypalem,0,0
6,Allipuram,1,2
7,Anakapalle,0,0
8,Anandapuram,0,0
9,Arilova,0,0


In [31]:
#Add latitude and longitude values by using the join operation(the new dataframe with the old dataframe containing the latitude and longitude values)
#vi_merged = vi_merged.join(vi_df.set_index("Neighborhood"), on="Neighborhood")

#print(vi_merged.shape)
vi_merged['Latitude'] = df_coords['Latitude']
vi_merged['Longitude'] = df_coords['Longitude']
vi_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Abidnagar,0,0,17.73786,83.29888
1,Adarsh Nagar,0,0,17.76391,83.33169
2,Adavivaram,0,0,17.78583,83.25242
3,Aganampudi,0,0,17.68904,83.13988
4,Akkayyapalem,0,0,17.73421,83.29713


In [32]:
# sorting the results by Cluster Labels
print(vi_merged.shape)
vi_merged.sort_values(["Cluster Labels"], inplace=True)
vi_merged

(115, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Abidnagar,0,0,17.73786,83.29888
71,Peda Waltair,0,0,17.74261,83.18834
70,Parawada,0,0,17.74269,83.30041
68,One Town (Visakhapatnam),0,0,17.67263,83.19407
67,Nidigattu,0,0,17.71984,83.26278
66,Nathayyapalem,0,0,17.73854,83.33626
65,Narava,0,0,17.74794,83.26313
64,Narasimha Nagar,0,0,17.69859,83.22464
63,Naidu Thota,0,0,17.70228,83.2102
62,Nadupuru,0,0,17.74051,83.24869


In [33]:
vi_merged["Restaurant"].max()

5

## Visualizing the resulting clusters

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vi_merged['Latitude'], vi_merged['Longitude'], vi_merged['Neighborhood'], vi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# save the map as HTML file
#map_clusters.save('/home/user/map_clusters.html')

# Examine Clusters

###### Cluster 0

In [35]:
vi_merged.loc[vi_merged['Cluster Labels'] == 0]
#len(vi_merged.loc[vi_merged['Cluster Labels'] == 0])# -132 neighbourhoods/places in this cluster 0

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Abidnagar,0,0,17.73786,83.29888
71,Peda Waltair,0,0,17.74261,83.18834
70,Parawada,0,0,17.74269,83.30041
68,One Town (Visakhapatnam),0,0,17.67263,83.19407
67,Nidigattu,0,0,17.71984,83.26278
66,Nathayyapalem,0,0,17.73854,83.33626
65,Narava,0,0,17.74794,83.26313
64,Narasimha Nagar,0,0,17.69859,83.22464
63,Naidu Thota,0,0,17.70228,83.2102
62,Nadupuru,0,0,17.74051,83.24869


###### Cluster 1

In [36]:
vi_merged.loc[vi_merged['Cluster Labels'] == 1]
#len(vi_merged.loc[vi_merged['Cluster Labels'] == 1])

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
10,Asilmetta,5,1,17.76843,83.31107
111,Waltair Main Road,5,1,17.7968,83.21119
105,"VIP Road, Visakhapatnam",3,1,17.71927,83.19642
112,Waltair Uplands,5,1,17.71191,83.29994
95,"Siripuram, Visakhapatnam",5,1,17.70327,83.30316
45,Kirlampudi Layout,4,1,17.70027,83.30373
81,Rama Talkies Road,5,1,17.66259,83.15983
54,Maharanipeta,4,1,17.68831,83.12011
38,Jagadamba Centre,4,1,17.73361,83.27486
82,"Ramnagar, Visakhapatnam",4,1,17.8199,83.20574


###### Cluster 2

In [37]:
vi_merged.loc[vi_merged['Cluster Labels'] == 2]
#len(vi_merged.loc[vi_merged['Cluster Labels'] == 2])

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
51,Maddilapalem,1,2,17.720637,83.331556
39,Jalari Peta,2,2,17.74557,83.22813
28,Dwaraka Nagar,1,2,17.61405,83.16261
27,Duvvada,1,2,17.63842,83.11695
26,Dondaparthy,2,2,17.718192,83.317072
6,Allipuram,1,2,17.72023,83.29757
89,Sankara Matam Road,2,2,17.52307,82.99121
99,Suryabagh,2,2,17.76962,83.36437
21,Chinna Waltair,2,2,17.71984,83.26278
84,Relli Veedhi,2,2,17.73563,83.32231


# Conclusion

By analyzing the Clusters above we can say that a good number of Restaurants are concentrated in Cluster 1 (Purple) followed by a moderate number of restaurants in Cluster 2 (Green).Cluster 0(Red) with no restaurants provides a suitable opportunity to set up a new restaurant.Meanwhile, restaurants in cluster 2 are likely suffering from competition due to the existing restaurants from Cluster 1. Therefore this project recommends the restaurant investors to capitalize on these findings to open new restaurants in neighborhoods of Cluster 0 with no competition.Moreover the Neighborhoods in Cluster 0 are good residential areas which attract a lot of customers.Lastly, investors are advised to avoid neighborhoods in cluster 1, which have a high concentration of restaurants and suffer from intense competition.