# Battle of Neighbourhood - Harrow (London)

### 1. Introduction

When you consider the demography of London, you will see a rainbow: a community of many colours and many ethnic groups. From south of the River Thames across to the north, London's diversity is a strength that businesses from all walks of life have taken advantage of.

In the London Borough of Harrow, 63.8% of the population is from BME (Black and Minority Ethnic) communities, with the largest group being of Indian ethnicity at 26.4% (specifically those from Gujarat and South India). What the borough lacks, however, is a high-end Indian restaurant that offers not only the food but also the ambience and service to match. Among the array of restaurants in Harrow, a fine restaurant that meets this demand for Indian cuisines and flavours is notably absent in certain areas of interest.

To find a good location for a restaurant that meets this need, the Harrow area is explored through clustering and segmentation based on borough coordinates, the Indian population of each borough, and proximity to supplies.

### 2. Data Section

##### 2.1. DATA SETS: WIKI PAGES

**Wiki page – List of London boroughs**

The London area consists of 32 boroughs and the “City of London”. Our data (boroughs, area, latitude, longitude, etc.) comes from the wiki page: https://en.wikipedia.org/wiki/List_of_London_boroughs
The focus of this project is the neighborhoods within the London Borough of Harrow.


**Wiki page – Indian community of London**

The percentage of the Indian population in each London borough is collected from the wiki page: https://en.wikipedia.org/wiki/Indian_community_of_London
The BeautifulSoup package is used to scrape the needed data from Wikipedia.

##### 2.2. DATA SETS: FOURSQUARE API

The Foursquare API is used to obtain venues in the Harrow area together with their geographical location data. These are used to explore the neighborhoods of Harrow.

The venues within each neighborhood, such as restaurants and nearby amenities, are compared across neighborhoods. Accessibility and ease of supply are also considered as they relate to venues.
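
As a rough illustration of how such a request can be assembled, the sketch below builds a venue-explore URL from a parameter dictionary using only the standard library. The endpoint and parameter names are the ones used in the request code later in this notebook; the helper name `build_explore_url` and the placeholder credentials are assumptions made for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=30):
    """Return the explore URL for venues around (lat, lng). Hypothetical helper."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes every value, so no manual string formatting is needed
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, not real keys
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180604', 51.5795852, -0.3530692)
print(example_url)
```

Building the query from a dictionary keeps each parameter visible on its own line and leaves the escaping to the standard library.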

##### 2.3. DATA SETS: GEOCODER PACKAGE

To obtain the latitude and longitude of each location, the geopy package is used with the Nominatim geocoder.

These coordinates are used to create a new data frame that is used subsequently for the Harrow areas.


##### 2.4. PYTHON LIBRARY FILES

The following libraries are used:
<br>• Pandas – Library for data analysis
<br>• NumPy – Library to handle data in a vectorized manner
<br>• JSON – Library to handle JSON files
<br>• Folium – Map rendering library
<br>• Matplotlib – Python plotting module
<br>• Geopy – To retrieve location data
<br>• Requests – Library to handle HTTP requests
<br>• Sklearn – Python machine learning library


##### 2.5. FOLIUM LIBRARY

The Folium visualization library is used to visualize the cluster distribution of the Harrow neighborhoods on an interactive Leaflet map. An extensive comparative analysis of two randomly picked neighborhoods would be carried out to derive the desired insights from the outcomes using Python’s scientific libraries Pandas, NumPy and Scikit-learn.

### 3. Data Preparation

In [5]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
   
# transforming a JSON file into a pandas dataframe
import json # library to handle JSON files
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [11]:
import csv
import os
import platform

from bs4 import BeautifulSoup
import requests

In [7]:
CLIENT_ID = 'JHN451BKG4FJOAJR0RZ5ONIGTYNOMGP1RDNVROCZDVA3S2HN' # your Foursquare ID
CLIENT_SECRET = 'U4FLNGFTDCQO5ZBPIBM1P5OHRX0YR3YQTANKU12XRHYZXTEF' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JHN451BKG4FJOAJR0RZ5ONIGTYNOMGP1RDNVROCZDVA3S2HN
CLIENT_SECRET:U4FLNGFTDCQO5ZBPIBM1P5OHRX0YR3YQTANKU12XRHYZXTEF
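
Printing credentials, as above, also stores them in the notebook output. A minimal sketch of one alternative, assuming the keys are exported as environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices for this example:

```python
import os

def load_credentials(env=None):
    """Read the Foursquare credentials from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_credentials({'FOURSQUARE_CLIENT_ID': 'abcd1234',
                                         'FOURSQUARE_CLIENT_SECRET': 'wxyz5678'})
print('CLIENT_ID: ' + mask(demo_id))
```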


#### <u> London_Indian_Population </u>

In [26]:
fname1 = 'London'
fil2 = 'London_Indian_Population'
fl2 = os.path.join(fname1, fil2) + '.csv'
print(fl2)
ind_population = pd.read_csv(fl2, sep=',')
ind_population.head()

London/London_Indian_Population.csv


Unnamed: 0,Community,Percentage of Indian population
0,Harrow,26.4
1,Hounslow,19.0
2,Brent,18.6
3,Redbridge,16.4
4,Ealing,14.3


#### <u> London boroughs Data </u>

In [18]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
wikipedia_page = requests.get(wikipedia_link)

soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]

# Collect one cleaned record per table row, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
records = []
for row in rows[1:]:
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in row.find_all('td')]
    if len(values) == len(columns):  # skip rows that do not match the header
        records.append(values)
popLn = pd.DataFrame(records, columns = columns)

popLn.head(5)

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,25
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,20
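
The Co-ordinates column above mixes a degrees-minutes-seconds form with a decimal form in one string. Should numeric borough coordinates be needed, one possible approach is to pull out the decimal pair with a regular expression. This is a sketch only: the function name `parse_decimal_coords` is an assumption, and the pattern is written against the cell format shown in the table.

```python
import re

# Matches the decimal pair, e.g. "51.5607°N 0.1557°E"; the degrees-minutes-seconds
# part is skipped because its degree values have no decimal point.
_DECIMAL_COORDS = re.compile(r'(\d+\.\d+)°([NS])\s+(\d+\.\d+)°([EW])')

def parse_decimal_coords(cell):
    """Return (latitude, longitude) as signed floats, or None if no decimal pair is found."""
    match = _DECIMAL_COORDS.search(cell)
    if match is None:
        return None
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) if ns == 'N' else -float(lat)
    longitude = float(lon) if ew == 'E' else -float(lon)
    return latitude, longitude

print(parse_decimal_coords('51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W /...'))
```

Western longitudes and southern latitudes are returned as negative numbers, matching the convention used by the geocoder output later in this notebook.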


In [19]:
#Data Clean Up
popLn = popLn[['Borough','Area (sq mi)','Population (2013 est)[1]']].reset_index(drop=True)
# Remove trailing footnote markers such as "[note 1]" or "[2]" from the borough names
popLn['Borough'] = popLn['Borough'].str.replace(r'\s*\[[^\]]*\]\s*$', '', regex=True)
popLn

Unnamed: 0,Borough,Area (sq mi),Population (2013 est)[1]
0,Barking and Dagenham,13.93,194352
1,Barnet,33.49,369088
2,Bexley,23.38,236687
3,Brent,16.7,317264
4,Bromley,57.97,317899
5,Camden,8.4,229719
6,Croydon,33.41,372752
7,Ealing,21.44,342494
8,Enfield,31.74,320524
9,Greenwich,18.28,264008

In [20]:
popLn.columns = ['Borough','Area','Population']

# Strip thousands separators, then convert both columns to numeric types
popLn["Population"] = popLn["Population"].str.replace(",","").astype(float)
popLn["Area"] = pd.to_numeric(popLn["Area"])


#### <u> London borough with Max Area & Population  </u>

In [22]:
print('Borough with Maximum Population : ' )
print(popLn.loc[popLn['Population'].idxmax()])
print('' )
print('Borough with Maximum Area : ' )
print(popLn.loc[popLn['Area'].idxmax()])

Borough with Maximum Population : 
Borough       Croydon
Area            33.41
Population     372752
Name: 6, dtype: object

Borough with Maximum Area : 
Borough       Bromley
Area            57.97
Population     317899
Name: 4, dtype: object


In [33]:
# Look up the borough with the highest share of Indian population
top = ind_population.loc[ind_population['Percentage of Indian population'].idxmax()]
print('Borough with Maximum Indian Population : {} - {}%'.format(
    top['Community'], top['Percentage of Indian population']))

Borough with Maximum Indian Population : Harrow - 26.4%


Based on the highest percentage of Indian population, we have selected Harrow for our analysis.

### <u> London borough & Neighborhood Data exploration </u>

The data scraped from the Wikipedia page for the Greater London area is shown below:

The BeautifulSoup package is used to scrape the needed data from Wikipedia.

In [35]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
wikipedia_page = requests.get(wikipedia_link)

soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]

# Collect one cleaned record per table row, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
records = []
for row in rows[1:]:
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in row.find_all('td')]
    if len(values) == len(columns):  # skip rows that do not match the header
        records.append(values)
df = pd.DataFrame(records, columns = columns)

df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [1]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[2]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[2],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[2],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


### <u> Data Clean Up </u>

In [38]:
df.columns = ['Location','Borough','Post-town','Postcode','Dial-code','Ref']
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Postcode'))
df = df[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)
print (df.shape)
df.head()

(638, 4)


Unnamed: 0,Location,Borough,Postcode,Post-town
0,Abbey Wood,"Bexley, Greenwich",SE2,LONDON
1,Acton,"Ealing, Hammersmith and Fulham",W3,LONDON
2,Acton,"Ealing, Hammersmith and Fulham",W4,LONDON
3,Addington,Croydon,CR0,CROYDON
4,Addiscombe,Croydon,CR0,CROYDON
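
The `split` / `stack` / `join` chain in the clean-up cell turns a row that lists several postcodes into one row per postcode. A plain-Python sketch of the same idea, using a made-up helper name and two rows taken from the scraped table:

```python
def explode_postcodes(rows):
    """Yield one record per (location, postcode) pair, stripping stray whitespace."""
    for row in rows:
        for postcode in row['Postcode'].split(','):
            record = dict(row)  # copy so the input rows are left untouched
            record['Postcode'] = postcode.strip()
            yield record

rows = [
    {'Location': 'Acton', 'Borough': 'Ealing, Hammersmith and Fulham', 'Postcode': 'W3, W4'},
    {'Location': 'Addington', 'Borough': 'Croydon', 'Postcode': 'CR0'},
]
exploded = list(explode_postcodes(rows))
for record in exploded:
    print(record['Location'], record['Postcode'])
```

Stripping inside the loop removes the leading space that a plain split leaves on values such as ` W4`; the notebook applies the same clean-up later with `Ln.Postcode.str.strip()`.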


In [39]:
#Ln = df[df['Post-town'].str.contains('LONDON')]
#Ln = df[df['Borough'].str.contains('Croydon' )]
#Ln = df[df['Borough'].isin(['Croydon', 'Bromley']) ]
Ln = df[df['Borough'].isin(['Harrow']) ]
Ln = Ln[['Location','Borough','Postcode','Post-town']].reset_index(drop=True)
Ln.Postcode = Ln.Postcode.str.strip()
print(Ln.shape)
Ln.head()

(14, 4)


Unnamed: 0,Location,Borough,Postcode,Post-town
0,Belmont,Harrow,HA3,"HARROW, STANMORE"
1,Belmont,Harrow,HA7,"HARROW, STANMORE"
2,Harrow,Harrow,HA1,HARROW
3,Harrow on the Hill,Harrow,HA1,HARROW
4,Harrow Weald,Harrow,HA3,HARROW


In [40]:
crLn = Ln.reset_index(drop=True)
crLn.columns = ['Neighborhood','Borough','Postcode','Post-town']
print(crLn.shape)
crLn.loc[crLn['Neighborhood'].isin(['Widmore (also Widmore Green)'])]='Widmore'
crLn
#crLn.head()

(14, 4)


Unnamed: 0,Neighborhood,Borough,Postcode,Post-town
0,Belmont,Harrow,HA3,"HARROW, STANMORE"
1,Belmont,Harrow,HA7,"HARROW, STANMORE"
2,Harrow,Harrow,HA1,HARROW
3,Harrow on the Hill,Harrow,HA1,HARROW
4,Harrow Weald,Harrow,HA3,HARROW
5,Hatch End,Harrow,HA5,PINNER
6,North Harrow,Harrow,HA1,HARROW
7,North Harrow,Harrow,HA2,HARROW
8,Pinner,Harrow,HA5,PINNER
9,Rayners Lane,Harrow,HA5,PINNER


#### <u> Geocoder - to obtain coordinates for each area </u>

In [42]:
def get_latlon(loc_ln):
    geolocator = Nominatim(user_agent="Ln_explorer")
    location1 = geolocator.geocode(loc_ln)
    latitude = location1.latitude
    longitude = location1.longitude
    co_ordin = [latitude, longitude]
    #print(loc_ln)
    return co_ordin

In [43]:
addington = get_latlon('addington')
WestHarrow = get_latlon('West Harrow')
print(addington)
print(WestHarrow)

[51.9509273, -0.9212331]
[51.5795852, -0.3530692]


In [44]:
locations = crLn['Neighborhood']
coordinates = [get_latlon(name) for name in locations.tolist()]

In [45]:
crLnd = crLn
# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
crLnd_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
crLnd['Latitude'] = crLnd_coordinates['Latitude']
crLnd['Longitude'] = crLnd_coordinates['Longitude']

crLnd

Unnamed: 0,Neighborhood,Borough,Postcode,Post-town,Latitude,Longitude
0,Belmont,Harrow,HA3,"HARROW, STANMORE",45.877912,5.652908
1,Belmont,Harrow,HA7,"HARROW, STANMORE",45.877912,5.652908
2,Harrow,Harrow,HA1,HARROW,51.596769,-0.337275
3,Harrow on the Hill,Harrow,HA1,HARROW,51.57927,-0.336656
4,Harrow Weald,Harrow,HA3,HARROW,51.604786,-0.340485
5,Hatch End,Harrow,HA5,PINNER,51.60844,-0.373548
6,North Harrow,Harrow,HA1,HARROW,51.585162,-0.363176
7,North Harrow,Harrow,HA2,HARROW,51.585162,-0.363176
8,Pinner,Harrow,HA5,PINNER,51.596871,-0.377014
9,Rayners Lane,Harrow,HA5,PINNER,51.576714,-0.3703


In [46]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(crLnd['Borough'].unique()),
        crLnd.shape[0]
    )
)

The dataframe has 1 boroughs and 14 neighborhoods.


In [47]:
lnBeck = crLnd
lnBeck.loc[13, 'Neighborhood']
neighborhood_latitude = lnBeck.loc[13, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = lnBeck.loc[13, 'Longitude'] # neighborhood longitude value

neighborhood_name = lnBeck.loc[13, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                              neighborhood_longitude))

Latitude and longitude values of West Harrow are 51.5795852, -0.3530692.


##### <u> Let’s explore the top venues (at most LIMIT = 30) within a 2000 metre radius of West Harrow. First, we create the GET request URL and store it in `url`. </u>

In [48]:
radius = 2000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cbbec199fb6b776fcf95781'},
 'response': {'headerLocation': 'West Harrow',
  'headerFullLocation': 'West Harrow, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 41,
  'suggestedBounds': {'ne': {'lat': 51.59758521800001,
    'lng': -0.32415766151483744},
   'sw': {'lat': 51.56158518199998, 'lng': -0.3819807384851626}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd1ab0b046076b074307271',
       'name': 'West Harrow Park',
       'location': {'address': 'The Ridgeway',
        'lat': 51.577553917349476,
        'lng': -0.354538592936466,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.577553917349476,
          'lng': -0.354538592936466}],
        'distance': 247,
        'postalCode': 

In [49]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [51]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,West Harrow Park,Park,51.577554,-0.354539
1,Harrow Recreation Ground,Park,51.585777,-0.345683
2,Twist Ice Cream,Ice Cream Shop,51.586708,-0.361651
3,Nando's,Portuguese Restaurant,51.581665,-0.333119
4,Kebab Land,Middle Eastern Restaurant,51.580034,-0.335987
5,the chocolate room,Coffee Shop,51.58097,-0.333788
6,Saravanna Bhavan,Indian Restaurant,51.572855,-0.371405
7,Waitrose & Partners,Supermarket,51.567635,-0.352037
8,The Dolls House,Café,51.571992,-0.338525
9,The Gym,Gym / Fitness Center,51.586388,-0.361284


In [52]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

30 venues were returned by Foursquare.


#### <u> Top 5 venue categories in West Harrow </u>

In [53]:
nearby_venues_unique = nearby_venues['categories'].value_counts().to_frame(name='Count')

nearby_venues_unique.head(5)

Unnamed: 0,Count
Indian Restaurant,5
Pub,3
Ice Cream Shop,2
Portuguese Restaurant,2
Supermarket,2


In [54]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])
    
    return nearby_venues

In [55]:
london_venues = getNearbyVenues(names=crLnd['Neighborhood'],
                                   latitudes=crLnd['Latitude'],
                                   longitudes=crLnd['Longitude']
                                  )

Belmont
Belmont
Harrow
Harrow on the Hill
Harrow Weald
Hatch End
North Harrow
North Harrow
Pinner
Rayners Lane
South Harrow
Stanmore
Wealdstone
West Harrow


In [56]:
print(london_venues.shape)
london_venues.head()

(357, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harrow,51.596769,-0.337275,Everest Lounge,51.594233,-0.332192,Indian Restaurant
1,Harrow,51.596769,-0.337275,Waitrose & Partners,51.605329,-0.339664,Supermarket
2,Harrow,51.596769,-0.337275,Harrow Recreation Ground,51.585777,-0.345683,Park
3,Harrow,51.596769,-0.337275,The Bombay Central,51.60454,-0.339608,Indian Restaurant
4,Harrow,51.596769,-0.337275,Shree Krishna Vada Pav (SKVP),51.587331,-0.332211,Indian Restaurant


In [57]:
london_venues.groupby('Neighborhood').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Harrow,30,30,30,30,30,30
Harrow Weald,29,29,29,29,29,29
Harrow on the Hill,30,30,30,30,30,30
Hatch End,30,30,30,30,30,30
North Harrow,60,60,60,60,60,60
Pinner,30,30,30,30,30,30
Rayners Lane,30,30,30,30,30,30
South Harrow,30,30,30,30,30,30
Stanmore,28,28,28,28,28,28
Wealdstone,30,30,30,30,30,30


In [58]:
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 48 unique categories.


In [59]:
london_venues_unique_count = london_venues['Venue Category'].value_counts().to_frame(name='Count')
london_venues_unique_count.head(10)

Unnamed: 0,Count
Indian Restaurant,41
Coffee Shop,34
Pub,29
Supermarket,23
Park,21
Gym / Fitness Center,18
Café,18
Grocery Store,15
Sandwich Place,14
Fast Food Restaurant,10


In [62]:
address = 'Harrow'

geolocator = Nominatim(user_agent="ln_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Harrow are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Harrow are 51.5967688, -0.33727515543507.


In [63]:
map_london = folium.Map(location = [latitude, longitude], zoom_start = 12)
map_london

In [64]:
lnBeck_df = lnBeck.loc[lnBeck['Neighborhood'] == 'Harrow'] 

In [65]:
# Adding markers to map
for lat, lng, borough, loc in zip(crLnd['Latitude'], 
                                  crLnd['Longitude'],
                                  crLnd['Borough'],
                                  crLnd['Neighborhood']):
    label = '{} - {}'.format(loc, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)  
    
display(map_london)

In [67]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighborhood'] = london_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]
print(london_onehot.shape)
london_onehot.head()


(357, 49)


Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bar,Beer Store,Bookstore,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Deli / Bodega,Fast Food Restaurant,Fish & Chips Shop,Forest,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Karaoke Bar,Mediterranean Restaurant,Middle Eastern Restaurant,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pizza Place,Portuguese Restaurant,Pub,Restaurant,Rugby Pitch,Sandwich Place,Scenic Lookout,Seafood Restaurant,Sports Bar,Steakhouse,Supermarket,Thai Restaurant,Vegetarian / Vegan Restaurant
0,Harrow,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Harrow,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
2,Harrow,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Harrow,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Harrow,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [68]:
london_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()
london_grouped.shape

(11, 49)
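
Taking the mean of the one-hot columns per neighborhood gives the share of each category among that neighborhood's venues. For instance, a value of 0.133333 for a neighborhood with 30 venues corresponds to 4 venues of that category (4 / 30). A small plain-Python sketch of the same computation, with made-up venue rows and a hypothetical function name:

```python
from collections import Counter, defaultdict

def category_shares(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

# Made-up rows: (neighborhood, venue category)
venues = [('A', 'Indian Restaurant'), ('A', 'Pub'), ('A', 'Indian Restaurant'), ('A', 'Park'),
          ('B', 'Pub'), ('B', 'Pub')]
shares = category_shares(venues)
print(shares['A']['Indian Restaurant'])
```

Unlike the one-hot table, this sketch simply omits categories that a neighborhood does not have instead of storing them as 0.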

In [69]:
london_grouped.loc[london_grouped['Indian Restaurant'] != 0]

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bar,Beer Store,Bookstore,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Deli / Bodega,Fast Food Restaurant,Fish & Chips Shop,Forest,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Karaoke Bar,Mediterranean Restaurant,Middle Eastern Restaurant,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pizza Place,Portuguese Restaurant,Pub,Restaurant,Rugby Pitch,Sandwich Place,Scenic Lookout,Seafood Restaurant,Sports Bar,Steakhouse,Supermarket,Thai Restaurant,Vegetarian / Vegan Restaurant
0,Harrow,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.066667,0.033333,0.133333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.133333,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.033333,0.066667,0.0,0.0
1,Harrow Weald,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.068966,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.034483,0.0,0.068966,0.068966,0.068966,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.103448,0.034483,0.0,0.0,0.034483,0.0,0.0,0.034483,0.034483,0.0,0.034483,0.068966,0.068966,0.034483,0.0
2,Harrow on the Hill,0.0,0.033333,0.066667,0.0,0.033333,0.0,0.033333,0.033333,0.1,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.1,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.066667,0.0,0.0,0.033333,0.1,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0
4,North Harrow,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.1,0.0,0.0,0.033333,0.133333,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.1,0.0,0.033333,0.0,0.1,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333
6,Rayners Lane,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.1,0.0,0.066667,0.0,0.0,0.033333,0.233333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333,0.033333,0.066667,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333
7,South Harrow,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.066667,0.033333,0.0,0.0,0.033333,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.133333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.1,0.0,0.033333
8,Stanmore,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.035714,0.071429,0.0,0.035714,0.035714,0.0,0.0,0.0,0.035714,0.0,0.071429,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.107143,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.071429,0.035714,0.035714,0.0,0.035714,0.035714,0.0,0.0,0.035714,0.071429,0.035714,0.0,0.0
9,Wealdstone,0.0,0.033333,0.066667,0.0,0.0,0.0,0.066667,0.033333,0.133333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.133333,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.033333,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0
10,West Harrow,0.0,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.066667,0.0,0.0,0.066667,0.166667,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,0.066667,0.1,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0


In [70]:
num_top_venues = 10

for hood in london_grouped['Neighborhood']:
    #print("----"+hood+"----")
    temp = london_grouped[london_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    #print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    #print('\n')

In [71]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [81]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = london_grouped['Neighborhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harrow,Coffee Shop,Indian Restaurant,Supermarket,Sandwich Place,Pub,Café,Grocery Store,Bar,Irish Pub,Fast Food Restaurant
1,Harrow Weald,Park,Supermarket,Steakhouse,Grocery Store,Gym,Gym / Fitness Center,Indian Restaurant,Coffee Shop,Museum,Golf Course
2,Harrow on the Hill,Coffee Shop,Pub,Indian Restaurant,Park,Supermarket,Bar,Fast Food Restaurant,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant
3,Hatch End,Supermarket,Italian Restaurant,Pub,Café,Grocery Store,Coffee Shop,Pizza Place,Performing Arts Venue,Karaoke Bar,Deli / Bodega
4,North Harrow,Coffee Shop,Indian Restaurant,Gym / Fitness Center,Park,Pub,Café,Pizza Place,Grocery Store,Ice Cream Shop,Italian Restaurant
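
The column labels above rely on a three-item suffix list with a fallback to "th", which is enough for ten columns. A more general sketch, should more columns ever be needed (for example 11th or 21st), could look like the following; both function names are assumptions for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(shares, n):
    """Return the n category names with the highest share, ties broken by name."""
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]

labels = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
print(labels)
print(top_categories({'Pub': 0.1, 'Coffee Shop': 0.13, 'Park': 0.1}, 2))
```

Breaking ties by name makes the ranking reproducible when several categories have the same frequency, which is common with at most 30 venues per neighborhood.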


In [82]:
london_grouped_clustering = london_grouped.drop(columns='Neighborhood')

In [83]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 1, 2, 4, 2, 0, 0, 3, 1], dtype=int32)
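
For readers unfamiliar with what `KMeans` does, the following is a minimal plain-Python sketch of Lloyd's algorithm, the standard assign-and-update iteration behind k-means. It is not scikit-learn's implementation: it starts from the first k points rather than using k-means++ initialisation, and the function names and toy data are assumptions for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its points; keep it in place if it has none."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def lloyd_kmeans(points, k, max_iter=100):
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels, centroids

# Toy data: two obvious groups in two dimensions
points = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
labels, centroids = lloyd_kmeans(points, k=2)
print(labels)
```

With only 11 neighborhoods and `kclusters = 5`, at least one cluster holds two or fewer neighborhoods, so the labels are best read as a coarse grouping.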

In [84]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = crLnd

# match/merge Harrow data with latitude/longitude for each neighborhood
london_merged_latlong = london_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')
#cols = ['Cluster Labels']
#london_merged_latlong[cols] = (london_merged_latlong[cols]//1)
london_merged_latlong.head()

Unnamed: 0,Neighborhood,Borough,Postcode,Post-town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Belmont,Harrow,HA3,"HARROW, STANMORE",45.877912,5.652908,,,,,,,,,,,
1,Belmont,Harrow,HA7,"HARROW, STANMORE",45.877912,5.652908,,,,,,,,,,,
2,Harrow,Harrow,HA1,HARROW,51.596769,-0.337275,1.0,Coffee Shop,Indian Restaurant,Supermarket,Sandwich Place,Pub,Café,Grocery Store,Bar,Irish Pub,Fast Food Restaurant
3,Harrow on the Hill,Harrow,HA1,HARROW,51.57927,-0.336656,1.0,Coffee Shop,Pub,Indian Restaurant,Park,Supermarket,Bar,Fast Food Restaurant,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant
4,Harrow Weald,Harrow,HA3,HARROW,51.604786,-0.340485,3.0,Park,Supermarket,Steakhouse,Grocery Store,Gym,Gym / Fitness Center,Indian Restaurant,Coffee Shop,Museum,Golf Course


In [98]:
london_merged_latlong=london_merged_latlong.dropna()
london_merged_latlong.head()

Unnamed: 0,Neighborhood,Borough,Postcode,Post-town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Harrow,Harrow,HA1,HARROW,51.596769,-0.337275,1.0,Coffee Shop,Indian Restaurant,Supermarket,Sandwich Place,Pub,Café,Grocery Store,Bar,Irish Pub,Fast Food Restaurant
3,Harrow on the Hill,Harrow,HA1,HARROW,51.57927,-0.336656,1.0,Coffee Shop,Pub,Indian Restaurant,Park,Supermarket,Bar,Fast Food Restaurant,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant
4,Harrow Weald,Harrow,HA3,HARROW,51.604786,-0.340485,3.0,Park,Supermarket,Steakhouse,Grocery Store,Gym,Gym / Fitness Center,Indian Restaurant,Coffee Shop,Museum,Golf Course
5,Hatch End,Harrow,HA5,PINNER,51.60844,-0.373548,2.0,Supermarket,Italian Restaurant,Pub,Café,Grocery Store,Coffee Shop,Pizza Place,Performing Arts Venue,Karaoke Bar,Deli / Bodega
6,North Harrow,Harrow,HA1,HARROW,51.585162,-0.363176,4.0,Coffee Shop,Indian Restaurant,Gym / Fitness Center,Park,Pub,Café,Pizza Place,Grocery Store,Ice Cream Shop,Italian Restaurant
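
The empty Belmont rows and the float cluster labels (`1.0`, `3.0`) in the tables above are what a left join produces when a neighborhood has no matching venue row. Belmont's listed coordinates (45.877912, 5.652908) lie well outside London, which may be why no venues were matched for it. The sketch below reproduces the effect on two hypothetical miniature tables (`coords` and `venues` are illustrative names, with values abridged from the output above) and shows one way to restore integer labels after dropping the unmatched rows.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables being joined.
coords = pd.DataFrame({
    "Neighborhood": ["Belmont", "Harrow", "Harrow Weald"],
    "Postcode": ["HA3", "HA1", "HA3"],
})
venues = pd.DataFrame({
    "Neighborhood": ["Harrow", "Harrow Weald"],
    "Cluster Labels": [1, 3],
    "1st Most Common Venue": ["Coffee Shop", "Park"],
})

# Left join on the neighborhood name: Belmont has no venue row, so its
# venue columns become NaN and the label column is upcast to float.
merged = coords.join(venues.set_index("Neighborhood"), on="Neighborhood")
unmatched = merged[merged["Cluster Labels"].isna()]["Neighborhood"].tolist()

# Drop only the rows without a cluster label, then cast the labels back to int.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)

print(unmatched)
print(cleaned)
```

Listing the unmatched neighborhoods before dropping them makes it easier to spot rows that were lost to bad coordinates rather than to a genuine lack of venues.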


In [97]:
fname3 = 'London'
fil3 = 'london_merged_latlong'
fl3 = os.path.join(fname3, fil3) + '.csv'
print(fl3)
london_merged_latlong.to_csv(fl3, sep=',', encoding='utf-8', index=False)

London/london_merged_latlong.csv


In [107]:
fname3 = 'London'
fil3 = 'london_merged_latlong_edited'
fl3 = os.path.join(fname3, fil3) + '.csv'
print(fl3)
london_merged_latlong = pd.read_csv(fl3, sep=',')
london_merged_latlong

London/london_merged_latlong_edited.csv


Unnamed: 0,Neighborhood,Borough,Postcode,Post-town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harrow,Harrow,HA1,HARROW,51.596769,-0.337275,1,Coffee Shop,Indian Restaurant,Supermarket,Sandwich Place,Pub,Café,Grocery Store,Bar,Irish Pub,Fast Food Restaurant
1,Harrow on the Hill,Harrow,HA1,HARROW,51.57927,-0.336656,1,Coffee Shop,Pub,Indian Restaurant,Park,Supermarket,Bar,Fast Food Restaurant,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant
2,Harrow Weald,Harrow,HA3,HARROW,51.604786,-0.340485,3,Park,Supermarket,Steakhouse,Grocery Store,Gym,Gym / Fitness Center,Indian Restaurant,Coffee Shop,Museum,Golf Course
3,Hatch End,Harrow,HA5,PINNER,51.60844,-0.373548,2,Supermarket,Italian Restaurant,Pub,Café,Grocery Store,Coffee Shop,Pizza Place,Performing Arts Venue,Karaoke Bar,Deli / Bodega
4,North Harrow,Harrow,HA1,HARROW,51.585162,-0.363176,4,Coffee Shop,Indian Restaurant,Gym / Fitness Center,Park,Pub,Café,Pizza Place,Grocery Store,Ice Cream Shop,Italian Restaurant
5,North Harrow,Harrow,HA2,HARROW,51.585162,-0.363176,4,Coffee Shop,Indian Restaurant,Gym / Fitness Center,Park,Pub,Café,Pizza Place,Grocery Store,Ice Cream Shop,Italian Restaurant
6,Pinner,Harrow,HA5,PINNER,51.596871,-0.377014,2,Italian Restaurant,Coffee Shop,Gym / Fitness Center,Café,Pizza Place,Supermarket,Pub,Karaoke Bar,Museum,Park
7,Rayners Lane,Harrow,HA5,PINNER,51.576714,-0.3703,0,Indian Restaurant,Grocery Store,Pub,Gym / Fitness Center,Fast Food Restaurant,Coffee Shop,Park,Supermarket,Sandwich Place,Café
8,South Harrow,Harrow,HA2,HARROW,51.564652,-0.352221,0,Indian Restaurant,Pub,Supermarket,Coffee Shop,Grocery Store,Vegetarian / Vegan Restaurant,North Indian Restaurant,Bar,Café,Fast Food Restaurant
9,Stanmore,Harrow,HA7,STANMORE,51.617421,-0.309511,3,Indian Restaurant,Golf Course,Coffee Shop,Pizza Place,Steakhouse,Sports Bar,Gas Station,Fast Food Restaurant,Supermarket,Italian Restaurant


In [102]:
# create map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged_latlong['Latitude'], london_merged_latlong['Longitude'], london_merged_latlong['Neighborhood'], london_merged_latlong['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

display(map_clusters)

In [104]:
# Cluster 1
london_merged_latlong.loc[london_merged_latlong['Cluster Labels'] == 0, london_merged_latlong.columns[[1] + list(range(5, london_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Harrow,-0.3703,0,Indian Restaurant,Grocery Store,Pub,Gym / Fitness Center,Fast Food Restaurant,Coffee Shop,Park,Supermarket,Sandwich Place,Café
8,Harrow,-0.352221,0,Indian Restaurant,Pub,Supermarket,Coffee Shop,Grocery Store,Vegetarian / Vegan Restaurant,North Indian Restaurant,Bar,Café,Fast Food Restaurant


In [105]:
# Cluster 2
london_merged_latlong.loc[london_merged_latlong['Cluster Labels'] == 1, london_merged_latlong.columns[[1] + list(range(5, london_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harrow,-0.337275,1,Coffee Shop,Indian Restaurant,Supermarket,Sandwich Place,Pub,Café,Grocery Store,Bar,Irish Pub,Fast Food Restaurant
1,Harrow,-0.336656,1,Coffee Shop,Pub,Indian Restaurant,Park,Supermarket,Bar,Fast Food Restaurant,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant
10,Harrow,-0.329476,1,Coffee Shop,Indian Restaurant,Supermarket,Bar,Sandwich Place,Fast Food Restaurant,Café,Pub,Portuguese Restaurant,Ice Cream Shop
11,Harrow,-0.353069,1,Indian Restaurant,Pub,Park,Coffee Shop,Supermarket,Gym / Fitness Center,Ice Cream Shop,Portuguese Restaurant,Gym,Irish Pub


In [124]:
# Cluster 3
london_merged_latlong.loc[london_merged_latlong['Cluster Labels'] == 2, london_merged_latlong.columns[[1] + list(range(5, london_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Harrow,-0.336656,2,Coffee Shop,Clothing Store,Portuguese Restaurant,Gym,Pizza Place,Women's Store,Middle Eastern Restaurant,Bar,Bookstore,Department Store
3,Harrow,-0.373548,2,Deli / Bodega,Chinese Restaurant,Burger Joint,Grocery Store,Furniture / Home Store,Italian Restaurant,Pizza Place,Pub,Greek Restaurant,Seafood Restaurant
6,Harrow,-0.377014,2,Italian Restaurant,Coffee Shop,Wine Bar,Pizza Place,Café,Bus Stop,Sandwich Place,Pub,Bookstore,Supermarket
8,Harrow,-0.352221,2,Furniture / Home Store,Fast Food Restaurant,Supermarket,Portuguese Restaurant,Indian Restaurant,Coffee Shop,Metro Station,Park,Pizza Place,Bakery
9,Harrow,-0.309511,2,Coffee Shop,Gas Station,Italian Restaurant,Pizza Place,Platform,Portuguese Restaurant,Indian Restaurant,Sandwich Place,Supermarket,Middle Eastern Restaurant


In [125]:
# Cluster 4
london_merged_latlong.loc[london_merged_latlong['Cluster Labels'] == 3, london_merged_latlong.columns[[1] + list(range(5, london_merged_latlong.shape[1]))]]


Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Harrow,-0.353069,3,Café,Park,Indian Restaurant,Grocery Store,Women's Store,Gas Station,Furniture / Home Store,French Restaurant,Food & Drink Shop,Fast Food Restaurant


In [128]:
# Cluster 5
london_merged_latlong.loc[london_merged_latlong['Cluster Labels'] == 4, london_merged_latlong.columns[[1] + list(range(5, london_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Harrow,-0.340485,4,Indian Restaurant,Park,Thai Restaurant,Supermarket,Grocery Store,Beer Store,Deli / Bodega,Gas Station,Furniture / Home Store,French Restaurant
7,Harrow,-0.3703,4,Indian Restaurant,Grocery Store,Coffee Shop,Pizza Place,Sandwich Place,Pub,Café,Lawyer,Fast Food Restaurant,Betting Shop


## Conclusion

The clusters highlight the following about Harrow:
1.	Coffee shops, Indian restaurants and Italian restaurants are popular venue types across Harrow.
2.	For Indian restaurants specifically, Rayners Lane, Harrow Weald and South Harrow rank highest among the neighborhoods.
3.	Harrow and Wealdstone rank second for Indian restaurants and are close to Rayners Lane, whereas Harrow Weald is comparatively far from these three neighborhoods.

Based on the analysis of the available data, the neighborhoods below are listed in priority order for opening an Indian restaurant:
1.	Rayners Lane
2.	Harrow Weald
3.	South Harrow
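
One way to make a ranking like this reproducible is to sort neighborhoods by the position of "Indian Restaurant" in their most-common-venue list. The sketch below is an illustration under assumptions, not the method used to produce the list above: the helper `indian_rank` is hypothetical, and the input is only the top five venues for four neighborhoods, copied from the merged table earlier in this section.

```python
# Top five venues per neighborhood, copied from the merged table above.
top_venues = {
    "Harrow": ["Coffee Shop", "Indian Restaurant", "Supermarket",
               "Sandwich Place", "Pub"],
    "Hatch End": ["Supermarket", "Italian Restaurant", "Pub",
                  "Café", "Grocery Store"],
    "Rayners Lane": ["Indian Restaurant", "Grocery Store", "Pub",
                     "Gym / Fitness Center", "Fast Food Restaurant"],
    "South Harrow": ["Indian Restaurant", "Pub", "Supermarket",
                     "Coffee Shop", "Grocery Store"],
}


def indian_rank(venues, category="Indian Restaurant"):
    """Return the 1-based position of the category, or None if it is absent."""
    return venues.index(category) + 1 if category in venues else None


ranks = {name: indian_rank(venues) for name, venues in top_venues.items()}

# Neighborhoods where the category appears come first, ordered by rank;
# ties are broken alphabetically, and absent ones go last.
ordered = sorted(
    ranks,
    key=lambda n: (ranks[n] is None, ranks[n] if ranks[n] is not None else 0, n),
)
print(ordered)
```

A rank of this kind only reflects how common the category is among returned venues. It says nothing about quality or price level, which is where the additional data sources would help.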

A few additional data sources would have given more insight into the best location:
1.	Real estate data
2.	Crime data
3.	Income per capita
4.	Traffic data
5.	Broader venue exploration with Foursquare
6.	Ratings and feedback for the existing restaurants within the clusters
