### Introduction/Business Problem
Use data science and machine learning to identify the best locations (clusters of neighborhoods) for opening a __bakery__ in Kochi, India.

### Data
A list of neighborhoods in Kochi, India is available on Wikipedia at https://en.wikipedia.org/wiki/Category:Suburbs_of_Kochi. A DataFrame of these neighborhoods can be built by scraping that page with the __BeautifulSoup__ library.

### Methodology
Once the DataFrame of neighborhoods has been built by scraping the Wikipedia page with __BeautifulSoup__, each neighborhood name is converted into latitude and longitude values using the geocoder library. With these coordinates, the __Foursquare API__ is called to explore the neighborhoods: its explore function returns the venues near each neighborhood, and the resulting venue categories are used as the feature for grouping neighborhoods. The __k-means__ clustering algorithm then groups the neighborhoods into three clusters based on the number of bakeries: High, Medium, and Low. Finally, the __Folium__ library is used to visualize the neighborhoods in Kochi, India and their clusters on a map.
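The clustering step can be sketched in isolation as follows. This is a minimal illustration, not the notebook's actual pipeline: the neighborhood names `A` to `G` and their bakery frequencies are made-up values, and ranking clusters by centroid to assign the Low/Medium/High labels is an assumption about how the labels could be attached.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical share of venues that are bakeries in each neighborhood.
# These values are invented for illustration only.
bakery_freq = {
    "A": 0.00, "B": 0.01, "C": 0.00,
    "D": 0.10, "E": 0.12,
    "F": 0.28, "G": 0.30,
}

names = list(bakery_freq)
# k-means expects a 2-D array: one row per neighborhood, one column per feature.
X = np.array([bakery_freq[n] for n in names]).reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Cluster ids are arbitrary, so rank them by centroid value
# to obtain stable Low / Medium / High labels.
order = np.argsort(kmeans.cluster_centers_.ravel())
rank_of_cluster = {int(cluster_id): rank for rank, cluster_id in enumerate(order)}
level_names = ["Low", "Medium", "High"]
levels = {
    name: level_names[rank_of_cluster[int(label)]]
    for name, label in zip(names, kmeans.labels_)
}

for name in names:
    print(name, levels[name])
```

Because k-means numbers its clusters arbitrarily, mapping the ids through the sorted centroids keeps the meaning of each label the same from run to run.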

### 0. Install & Import Libraries

In [87]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geocoder --yes

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [89]:
import geocoder

In [90]:
import numpy as np 
import pandas as pd 
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json 

In [91]:
from geopy.geocoders import Nominatim 

import requests 
from bs4 import BeautifulSoup 

from pandas import json_normalize  # pandas >= 1.0; older releases expose it as pandas.io.json.json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

import folium 

print("Libraries imported.")

Libraries imported.


### 1. Download and Explore Dataset - Scrape Data from Wikipedia Page into a DataFrame

In [92]:
df = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Kochi").text

In [93]:
soup = BeautifulSoup(df, 'html.parser')

In [94]:
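# Collect the text of every list item in the first category listing on the page
neighborhood = []
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhood.append(row.text)

# One row per neighborhood name
loc_df = pd.DataFrame({"Neighborhood": neighborhood})
loc_df.head()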

Unnamed: 0,Neighborhood
0,Alangad
1,Angamaly
2,Aroor
3,Chellanam
4,Chendamangalam


In [95]:
loc_df.shape

(44, 1)
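The scraping logic can be tried offline against a small inline HTML snippet. The markup below is a simplified, hypothetical stand-in for the category page (the real page may be structured differently), and the empty-result guard is an addition that the cells above do not have.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the category page markup.
html = """
<div class="mw-category mw-category-columns">
  <div class="mw-category-group"><h3>A</h3>
    <ul>
      <li><a href="/wiki/Alangad">Alangad</a></li>
      <li><a href="/wiki/Angamaly">Angamaly</a></li>
    </ul>
  </div>
  <div class="mw-category-group"><h3>C</h3>
    <ul>
      <li><a href="/wiki/Chellanam">Chellanam</a></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_="mw-category" also matches elements that carry additional classes.
containers = soup.find_all("div", class_="mw-category")

names = []
if containers:  # avoid an IndexError if the page layout changes
    for item in containers[0].find_all("li"):
        names.append(item.get_text(strip=True))

print(names)
```

Checking that `containers` is non-empty before indexing gives a clearer failure mode than an `IndexError` if the listing is ever missing.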

### 2. Get Latitude & Longitude of the Neighborhoods

In [96]:
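def get_latlng(neighborhood, max_attempts=5):
    """Return [latitude, longitude] for a neighborhood, retrying a bounded number of times."""
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Kochi, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # Fail loudly instead of looping forever when the geocoder never returns a result
    raise RuntimeError('Could not geocode {!r} after {} attempts'.format(neighborhood, max_attempts))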

In [97]:
coords = [ get_latlng(neighborhood) for neighborhood in loc_df["Neighborhood"].tolist() ]

In [98]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [99]:
loc_df['Latitude'] = df_coords['Latitude']
loc_df['Longitude'] = df_coords['Longitude']

In [100]:
loc_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Alangad,10.8475,76.43609
1,Angamaly,10.20366,76.38268
2,Aroor,9.93599,76.26145
3,Chellanam,9.83526,76.27029
4,Chendamangalam,10.17292,76.23346
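Geocoders sometimes match a name to a different place, so it may be worth checking how far each result lies from the city. The sketch below uses the five rows shown above; the city-center coordinates and the 50 km threshold are assumptions chosen for illustration, not values from the notebook.

```python
import math

# Assumed approximate coordinates of central Kochi and an assumed cut-off distance.
KOCHI_CENTER = (9.9312, 76.2673)
MAX_DISTANCE_KM = 50

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# The first five geocoded rows from the output above.
rows = [
    ("Alangad", 10.8475, 76.43609),
    ("Angamaly", 10.20366, 76.38268),
    ("Aroor", 9.93599, 76.26145),
    ("Chellanam", 9.83526, 76.27029),
    ("Chendamangalam", 10.17292, 76.23346),
]

# Neighborhoods whose geocoded point is farther from the center than the threshold.
suspicious = [
    name for name, lat, lon in rows
    if haversine_km(KOCHI_CENTER[0], KOCHI_CENTER[1], lat, lon) > MAX_DISTANCE_KM
]

print(suspicious)
```

With these assumed values, Alangad is the only row flagged, which may indicate that its coordinates belong to a different place with a similar name and are worth verifying before clustering.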
