# IBM Applied Data Science Capstone Course
### Week 5 Final Report
**Opening New Public Parks in Delhi, India**
- Build a dataframe of neighborhoods in Delhi, India by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open Public Parks
***
### 1. Import libraries

In [5]:
import numpy as np 

import pandas as pd 
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge geocoder --yes
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup 

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium
import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from Wikipedia page into a DataFrame

In [6]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Delhi").text

In [7]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [8]:
# create a list to store neighborhood data
neighborhoodList = []

In [9]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [10]:
# create a new DataFrame from the list
df = pd.DataFrame({"Neighborhood": neighborhoodList})

df.head()

Unnamed: 0,Neighborhood
0,Neighbourhoods of Delhi
1,Ashok Nagar (Delhi)
2,Ashok Vihar
3,Ashram Chowk
4,Babarpur


In [11]:
# print the number of rows of the dataframe
df.shape

(136, 1)
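
The first scraped row, "Neighbourhoods of Delhi", appears to be the category's overview entry rather than a neighborhood, and several names carry Wikipedia disambiguation suffixes such as "(Delhi)" or ", Delhi". The sketch below shows one possible cleanup step. The helper `clean_neighborhood_names` and its `skip` parameter are hypothetical and not part of the original notebook; applying it would change the row counts reported in the later cells.

```python
import re

def clean_neighborhood_names(raw_names, skip=("Neighbourhoods of Delhi",)):
    """Drop non-neighborhood entries and strip Wikipedia disambiguation suffixes.

    Hypothetical helper: the notebook itself keeps the scraped names as they are.
    """
    cleaned = []
    for name in raw_names:
        name = name.strip()
        if not name or name in skip:
            continue
        # remove a trailing "(Delhi)" or ", Delhi" left over from the article title
        name = re.sub(r"\s*\(Delhi\)$|,\s*Delhi$", "", name)
        cleaned.append(name)
    return cleaned

# names taken from the scraped output shown in this notebook
sample = ["Neighbourhoods of Delhi", "Ashok Nagar (Delhi)", "Ashok Vihar", "Badarpur, Delhi"]
cleaned_sample = clean_neighborhood_names(sample)
print(cleaned_sample)
```

Because the geocoding step appends ", Delhi, India" to every name anyway, stripping the suffix should not remove information the lookup relies on.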

### 3. Get the geographical coordinates

In [12]:
# define a function to get coordinates
def get_latlng(neighborhood, max_tries=5):
    # retry a bounded number of times so a failing lookup cannot loop forever
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Delhi, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up and return missing values instead of blocking the notebook
    return [None, None]


In [13]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]

In [14]:
coords

[[28.523450000000025, 77.26178000000004],
 [28.692230000000052, 77.30124000000006],
 [28.69037000000003, 77.17609000000004],
 [28.710597501792023, 77.32696517369723],
 [28.50738000000007, 77.30346000000003],
 [28.504338758607044, 77.30056383652112],
 [28.652164790882363, 77.12971244261769],
 [28.800590000000057, 77.03473000000008],
 [28.549540000000036, 77.18167000000005],
 [28.699880000000064, 77.25906000000003],
 [28.595060000000046, 77.18573000000004],
 [28.656270000000063, 77.23232000000007],
 [28.538400000000024, 77.24832000000004],
 [28.634100000000046, 77.21689000000003],
 [28.634100000000046, 77.21689000000003],
 [28.60761000000008, 77.08714000000003],
 [28.65457890544559, 77.23339989939495],
 [28.62832000000003, 77.24727000000007],
 [28.605710000000045, 77.08217000000008],
 [28.560590000000047, 77.24678000000006],
 [28.57298000000003, 77.23357000000004],
 [28.591510000000028, 77.12945000000008],
 [28.594855843590793, 77.16728911486429],
 [28.684700000000078, 77.32774000000006]
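
Two consecutive entries in the list above share exactly the same coordinates (28.6341, 77.21689), which may mean the geocoder fell back to a generic location for at least one of them. A minimal sketch of a check for such collisions follows. The helper `find_shared_coordinates` and the placeholder labels `row 13` to `row 15` are assumptions for illustration, since the output does not show which neighborhoods those rows belong to.

```python
def find_shared_coordinates(names, coords, ndigits=5):
    """Group names by rounded (lat, lng) and return only groups with more than one name."""
    groups = {}
    for name, (lat, lng) in zip(names, coords):
        key = (round(lat, ndigits), round(lng, ndigits))
        groups.setdefault(key, []).append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

# three coordinate pairs copied from the printed list; labels are placeholders
sample_names = ["row 13", "row 14", "row 15"]
sample_coords = [
    [28.634100000000046, 77.21689000000003],
    [28.634100000000046, 77.21689000000003],
    [28.60761000000008, 77.08714000000003],
]
shared = find_shared_coordinates(sample_names, sample_coords)
print(shared)
```

In the notebook this could be called with `df["Neighborhood"].tolist()` and `coords`, and any flagged neighborhoods checked by hand.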

In [15]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [16]:
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [17]:
# check the neighborhoods and the coordinates
print(df.shape)
df

(136, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Neighbourhoods of Delhi,28.52345,77.26178
1,Ashok Nagar (Delhi),28.69223,77.30124
2,Ashok Vihar,28.69037,77.17609
3,Ashram Chowk,28.710598,77.326965
4,Babarpur,28.50738,77.30346
5,"Badarpur, Delhi",28.504339,77.300564
6,Bali Nagar,28.652165,77.129712
7,Bawana,28.80059,77.03473
8,Ber Sarai,28.54954,77.18167
9,Bhajanpura,28.69988,77.25906


### 4. Create a map of Delhi with neighborhoods superimposed on top

In [18]:
address = 'Delhi, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Delhi, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Delhi, India are 28.6517178, 77.2219388.


In [19]:
# create map of Delhi using latitude and longitude values
map_dl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_dl)  
    
map_dl

In [20]:
# save the map as HTML file
map_dl.save('map_dl.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [21]:
# define Foursquare credentials and version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
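
Credentials that are typed into a cell and printed are easy to leak when the notebook is shared. One common alternative, sketched here, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are assumptions for illustration, not part of the original notebook or of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping, or raise if either is missing."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep only the first few characters readable when printing a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# demo with a stand-in mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-1234",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(CLIENT_ID))
print("CLIENT_SECRET: " + mask(CLIENT_SECRET))
```

Calling `load_foursquare_credentials()` with no argument reads from the real environment.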


**Now, let's get the top 100 venues within a radius of 20,000 meters of each neighborhood.**

In [22]:
radius = 20000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
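
The loop above indexes straight into the response, so a venue without categories or a response without groups would raise an exception and stop the whole run. The following is a minimal sketch of a more defensive version of the same two steps: building the URL and flattening the items. The helpers `build_explore_url` and `extract_venues` are hypothetical; the JSON layout is assumed to be the one the loop already relies on (`response` → `groups[0]` → `items` → `venue`).

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the query string with urlencode so values are escaped consistently."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)

def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore response into row tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]  # empty list -> placeholder
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"),
                     categories[0].get("name", "Unknown")))
    return rows

# stand-in payload: the first venue mirrors a row from this notebook,
# the second is an invented venue with no category
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Starbucks",
               "location": {"lat": 28.534053, "lng": 77.243059},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 28.53, "lng": 77.24},
               "categories": []}},
]}]}}
rows = extract_venues(payload, "Neighbourhoods of Delhi", 28.52345, 77.26178)
url = build_explore_url("id", "secret", "20180605", 28.52345, 77.26178, 20000, 100)
print(rows)
```

An empty or malformed payload yields an empty list instead of an exception, so one bad response does not discard the venues already collected.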

In [23]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(13462, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Neighbourhoods of Delhi,28.52345,77.26178,Starbucks,28.534053,77.243059,Coffee Shop
1,Neighbourhoods of Delhi,28.52345,77.26178,The Big Chill Cafe,28.552728,77.241923,Italian Restaurant
2,Neighbourhoods of Delhi,28.52345,77.26178,The Big Chill Cafe,28.528201,77.217748,Restaurant
3,Neighbourhoods of Delhi,28.52345,77.26178,"Fitness First Platinum, Select City Walk",28.528615,77.218361,Gym / Fitness Center
4,Neighbourhoods of Delhi,28.52345,77.26178,Select Citywalk,28.528678,77.219136,Shopping Mall


**Let's check how many venues were returned for each neighborhood**

In [24]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Ashok Nagar (Delhi),100,100,100,100,100,100
Ashok Vihar,100,100,100,100,100,100
Ashram Chowk,100,100,100,100,100,100
Babarpur,100,100,100,100,100,100
"Badarpur, Delhi",100,100,100,100,100,100
Bali Nagar,100,100,100,100,100,100
Bawana,40,40,40,40,40,40
Ber Sarai,100,100,100,100,100,100
Bhajanpura,100,100,100,100,100,100
Chanakyapuri,100,100,100,100,100,100
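
Almost every neighborhood reaches the limit of 100 venues, which is not surprising with a search radius of 20,000 meters: circles that large overlap heavily, so nearby neighborhoods can receive largely the same venues. As a rough illustration, the sketch below computes the great-circle distance between two neighborhoods from the coordinate table using the haversine formula. The function `haversine_km` and the mean Earth radius of 6371 km are assumptions added here, not part of the original analysis.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lng1, lat2, lng2, earth_radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# coordinates copied from the neighborhood table in this notebook
ashok_vihar = (28.69037, 77.17609)
bawana = (28.80059, 77.03473)
distance_km = haversine_km(*ashok_vihar, *bawana)
print(round(distance_km, 1))
```

Even Bawana, the one neighborhood listed above with only 40 venues, lies less than 20 km from Ashok Vihar, so each search circle contains the other neighborhood's center.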


**Let's find out how many unique categories can be curated from all the returned venues**

In [25]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 97 unique categories.


In [26]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Coffee Shop', 'Italian Restaurant', 'Restaurant',
       'Gym / Fitness Center', 'Shopping Mall', 'Bakery', 'Lounge',
       'Indian Restaurant', 'Dessert Shop', 'Food Court',
       'American Restaurant', 'Thai Restaurant', 'Hotel', 'Café',
       'Monument / Landmark', 'Multiplex', 'Market', 'Temple',
       'Asian Restaurant', 'Japanese Restaurant', 'Indie Movie Theater',
       'Park', 'South Indian Restaurant', 'Comfort Food Restaurant',
       'Ice Cream Shop', 'Historic Site', 'Golf Course', 'Irani Cafe',
       'Chinese Restaurant', 'Movie Theater', 'Art Museum',
       'Cocktail Bar', 'French Restaurant', 'Stadium', 'Speakeasy',
       'Sculpture Garden', 'North Indian Restaurant', 'Art Gallery',
       'Other Nightlife', 'Gastropub', 'Hindu Temple', 'Deli / Bodega',
       'Pool', 'Athletics & Sports', 'Plaza', 'Boutique', 'Palace',
       'Food & Drink Shop', 'Clothing Store', 'Food Truck'], dtype=object)

In [27]:
# check if the results contain "Park"
"Park" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [28]:
# one hot encoding
dl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dl_onehot.columns[-1]] + list(dl_onehot.columns[:-1])
dl_onehot = dl_onehot[fixed_columns]

print(dl_onehot.shape)
dl_onehot.head()

(13462, 98)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bed & Breakfast,Big Box Store,Bistro,Boutique,Bowling Alley,Breakfast Spot,Brewery,Café,Castle,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Falafel Restaurant,Farm,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,Golf Course,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Lounge,Market,Mediterranean Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Park,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Restaurant,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Stadium,Temple,Tex-Mex Restaurant,Thai Restaurant,Tibetan Restaurant,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Whisky Bar
0,Neighbourhoods of Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Neighbourhoods of Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Neighbourhoods of Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Neighbourhoods of Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Neighbourhoods of Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category**

In [29]:
dl_grouped = dl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(dl_grouped.shape)
dl_grouped

(136, 98)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bed & Breakfast,Big Box Store,Bistro,Boutique,Bowling Alley,Breakfast Spot,Brewery,Café,Castle,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Falafel Restaurant,Farm,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,Golf Course,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Lounge,Market,Mediterranean Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Park,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Restaurant,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Stadium,Temple,Tex-Mex Restaurant,Thai Restaurant,Tibetan Restaurant,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Whisky Bar
0,Ashok Nagar (Delhi),0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.02,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.12,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.13,0.0,0.01,0.15,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.0,0.01,0.03,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
1,Ashok Vihar,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.03,0.03,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.14,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.12,0.0,0.01,0.17,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.02,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
2,Ashram Chowk,0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.02,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.12,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.13,0.0,0.01,0.15,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.0,0.01,0.03,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.03,0.01,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
3,Babarpur,0.01,0.0,0.02,0.01,0.03,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.02,0.0,0.01,0.07,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.1,0.0,0.02,0.09,0.01,0.01,0.01,0.04,0.02,0.0,0.03,0.04,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.04,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
4,"Badarpur, Delhi",0.01,0.0,0.02,0.01,0.03,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.02,0.0,0.01,0.07,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.09,0.0,0.01,0.09,0.0,0.01,0.01,0.05,0.02,0.0,0.03,0.05,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.05,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
5,Bali Nagar,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.03,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.1,0.0,0.01,0.02,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.17,0.0,0.01,0.13,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.02,0.01,0.02,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
6,Bawana,0.025,0.0,0.0,0.0,0.0,0.025,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.075,0.0,0.0,0.025,0.0,0.05,0.1,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.0,0.0,0.0,0.0,0.025,0.05,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0
7,Ber Sarai,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.0,0.01,0.0,0.01,0.08,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.13,0.0,0.01,0.12,0.0,0.0,0.01,0.05,0.01,0.0,0.03,0.03,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.04,0.01,0.05,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0
8,Bhajanpura,0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.01,0.03,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.13,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.14,0.0,0.01,0.17,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
9,Chanakyapuri,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.08,0.0,0.01,0.01,0.01,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.14,0.01,0.01,0.12,0.0,0.0,0.01,0.04,0.01,0.01,0.02,0.03,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.04,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
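
Taking the mean of a one-hot column per neighborhood is the same as dividing the number of venues in that category by the total number of venues returned for the neighborhood. For example, Bawana has 40 venues, so each of its venues contributes 1/40 = 0.025. The sketch below reproduces that computation without pandas on invented toy data; the function `category_frequencies` and the neighborhoods `A` and `B` are hypothetical.

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """rows: iterable of (neighborhood, category). Returns {neighborhood: {category: share}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }

# toy data, not taken from the Foursquare results
toy_rows = [
    ("A", "Park"), ("A", "Café"), ("A", "Café"), ("A", "Hotel"),
    ("B", "Hotel"), ("B", "Hotel"),
]
freq = category_frequencies(toy_rows)
print(freq["A"]["Park"], freq["B"].get("Park", 0.0))
```

A consequence worth keeping in mind is that these values are shares, not counts: a neighborhood with few returned venues can show a high Park share with a single park.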


In [30]:
len(dl_grouped[dl_grouped["Park"] > 0])

135

**Create a new DataFrame for Park data only**

In [31]:
dl_park = dl_grouped[["Neighborhoods","Park"]]

In [32]:
dl_park.head()

Unnamed: 0,Neighborhoods,Park
0,Ashok Nagar (Delhi),0.01
1,Ashok Vihar,0.01
2,Ashram Chowk,0.01
3,Babarpur,0.01
4,"Badarpur, Delhi",0.01


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Delhi into 3 clusters.

In [33]:
# set number of clusters
kclusters = 3

dl_clustering = dl_park.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 0, 1, 0, 1, 0], dtype=int32)
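
Because the clustering input is a single column (the Park frequency), k-means here reduces to grouping scalar values around three centers. The sketch below is a plain-Python version of Lloyd's algorithm on scalars, included only to show the idea. It is not scikit-learn's implementation: the function `kmeans_1d` and its deterministic initialization are assumptions, and its label numbering will generally differ from the labels `KMeans` assigns.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; returns (centers, labels)."""
    distinct = sorted(set(values))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct values")
    # deterministic start: k evenly spaced picks from the sorted distinct values
    if k == 1:
        centers = [distinct[0]]
    else:
        centers = [distinct[round(i * (len(distinct) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iters):
        # assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Park shares of the kind that appear in this notebook (0.01, 0.02, about 0.045)
park_shares = [0.01, 0.01, 0.02, 0.02, 0.01, 0.045455]
centers, labels = kmeans_1d(park_shares, k=3)
print(labels)
```

With only a handful of distinct values in the column, the clusters essentially correspond to those values, which is worth remembering when interpreting the cluster labels.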

In [34]:
# create a new dataframe that includes the Park frequency and the cluster label for each neighborhood
dl_merged = dl_park.copy()

# add clustering labels
dl_merged["Cluster Labels"] = kmeans.labels_

In [35]:
dl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
dl_merged.head()

Unnamed: 0,Neighborhood,Park,Cluster Labels
0,Ashok Nagar (Delhi),0.01,1
1,Ashok Vihar,0.01,1
2,Ashram Chowk,0.01,1
3,Babarpur,0.01,1
4,"Badarpur, Delhi",0.01,1


In [36]:
# merge dataframes to add latitude/longitude for each neighborhood
dl_merged = dl_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(dl_merged.shape)
dl_merged.head() # check the last columns!

(136, 5)


Unnamed: 0,Neighborhood,Park,Cluster Labels,Latitude,Longitude
0,Ashok Nagar (Delhi),0.01,1,28.69223,77.30124
1,Ashok Vihar,0.01,1,28.69037,77.17609
2,Ashram Chowk,0.01,1,28.710598,77.326965
3,Babarpur,0.01,1,28.50738,77.30346
4,"Badarpur, Delhi",0.01,1,28.504339,77.300564


In [37]:
# sort the results by Cluster Labels
print(dl_merged.shape)
dl_merged.sort_values(["Cluster Labels"], inplace=True)
dl_merged

(136, 5)


Unnamed: 0,Neighborhood,Park,Cluster Labels,Latitude,Longitude
67,Meera Bagh,0.02,0,28.66121,77.0869
103,"Rani Bagh, Delhi",0.02,0,28.68584,77.13188
104,"Rohini, Delhi",0.02,0,28.64809,77.12852
64,Mayapuri,0.02,0,28.62334,77.12096
63,Malviya Nagar (Delhi),0.02,0,28.6341,77.21689
61,Mahipalpur,0.02,0,28.54842,77.13636
106,"Sadar Bazaar, Delhi",0.02,0,28.592,77.12099
107,Safdarjung (Delhi),0.02,0,28.569641,77.196495
108,Sagar Pur,0.02,0,28.60588,77.09552
57,Lodhi Colony,0.02,0,28.58476,77.22534


**Finally, let's visualize the resulting clusters**

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dl_merged['Latitude'], dl_merged['Longitude'], dl_merged['Neighborhood'], dl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
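
The color setup above relies on matplotlib's `rainbow` colormap. As an alternative that needs only the standard library, the sketch below builds one hex color per cluster label from evenly spaced hues. This is a swapped-in technique, not what the notebook does; the function `cluster_palette` and the saturation and brightness values are assumptions.

```python
import colorsys

def cluster_palette(k):
    """Return k hex colors with evenly spaced hues, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(3)
# look colors up by label directly, so label 0 maps to the first color
color_for = {label: palette[label] for label in range(3)}
print(color_for)
```

The resulting strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way as the hex values produced by `colors.rgb2hex`.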

In [39]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [40]:
dl_merged.loc[dl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Park,Cluster Labels,Latitude,Longitude
67,Meera Bagh,0.02,0,28.66121,77.0869
103,"Rani Bagh, Delhi",0.02,0,28.68584,77.13188
104,"Rohini, Delhi",0.02,0,28.64809,77.12852
64,Mayapuri,0.02,0,28.62334,77.12096
63,Malviya Nagar (Delhi),0.02,0,28.6341,77.21689
61,Mahipalpur,0.02,0,28.54842,77.13636
106,"Sadar Bazaar, Delhi",0.02,0,28.592,77.12099
107,Safdarjung (Delhi),0.02,0,28.569641,77.196495
108,Sagar Pur,0.02,0,28.60588,77.09552
57,Lodhi Colony,0.02,0,28.58476,77.22534


#### Cluster 1

In [41]:
dl_merged.loc[dl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Park,Cluster Labels,Latitude,Longitude
129,Urdu Bazaar,0.01,1,28.649888,77.235128
132,Vivek Vihar subdivision,0.01,1,28.64642,77.30615
93,Pandav Nagar,0.01,1,28.61458,77.27574
130,Vasant Kunj,0.01,1,28.53152,77.1502
116,Shahdara district,0.01,1,28.6539,77.29641
112,Sarita Vihar,0.01,1,28.55038,77.28341
111,Sarai Kale Khan,0.01,1,28.58518,77.26346
126,Sriniwaspuri,0.01,1,28.56568,77.25733
105,Roop Nagar,0.01,1,28.68372,77.19747
109,Saket (Delhi),0.01,1,28.708,77.04971


#### Cluster 2

In [42]:
dl_merged.loc[dl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Park,Cluster Labels,Latitude,Longitude
78,Narela,0.045455,2,28.83979,77.07696


#### Observations:
Most of the parks are concentrated in the New Delhi and South Delhi areas of the city, where mainly VIPs, politicians and businessmen live; these neighborhoods fall under Cluster 0 (Park frequency of 0.02). Congested areas (Cluster 1, Park frequency of 0.01), which are home to lower- and middle-class residents, have very few parks. The rural Cluster 2 contains only Narela. Its Park frequency of about 0.045 is higher, but it is not a multiple of 0.01, which indicates that fewer than 100 venues were returned for Narela, so the value likely reflects sparse venue data rather than an abundance of parks.

The government should therefore build public parks in Clusters 1 and 2. New parks would help the residents of these areas cope with pollution, provide playgrounds for children, and add trees that benefit the local climate. Rising pollution in Delhi is a serious concern, and building parks could help reduce it.