#Battle of the Neighbourhoods - Introduction

###**Description of the problem and a discussion of the background**
After my bachelor's degree I worked in London for two years and absolutely loved the city. There are restaurants everywhere, with a huge range of cuisines to choose from, which makes it hard for an entrepreneur to choose a good location for a new venue. On top of that, rent in the city centre is very expensive, which makes opening a new restaurant or venue there harder still. It might be more attractive to open a venue in one of the boroughs outside the centre, where rent is cheaper, while the brand is being built up.

The challenge then is: which borough should an entrepreneur open a new venue in?

This study aims to answer that question by analysing data from the different boroughs of London.

###**Description of the data and how it will be used to solve the problem**
The data that will be used for this project will be as follows:

- Data about the boroughs in London from [Wikipedia](https://en.wikipedia.org/wiki/List_of_London_boroughs)
- Data about average rent in London from the [UK Government website](https://www.gov.uk/government/statistics/private-rental-market-summary-statistics-april-2018-to-march-2019)
- Geographical data: using geopy the coordinates for each borough will be found
- Venue data: Foursquare will be used to collect data about the existing venues in each borough

After the data is collected, a k-means clustering method will be used to cluster the boroughs and visualise this data on a map.

First, the required libraries are imported.

In [100]:
import pandas as pd
import numpy as np

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.distance import great_circle

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

import json # library to handle JSON files

import requests # library to handle requests
from requests import get

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

import seaborn as sns

from pandas import json_normalize # transform JSON file into a pandas dataframe

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported')

Libraries imported


<h1>Battle of the Neighbourhoods - Data Importing and Cleaning</h1>

Now the borough data will be scraped from Wikipedia, loaded into a dataframe, saved as a csv and used for the analysis in this project. <br>
Some more libraries, specifically those for scraping (Beautiful Soup and urllib.request), need to be imported as well.<br>
Following this, the data will be cleaned before continuing to the analysis.

In [101]:
#scraping wikipedia data

import urllib.request #importing the library we use to open URLs

url = "https://en.wikipedia.org/wiki/List_of_London_boroughs" 
# specify which URL/web page we are going to be scraping

page = urllib.request.urlopen(url)
# open the url using urllib.request and put the HTML into the page variable

from bs4 import BeautifulSoup
# import the BeautifulSoup library so we can parse HTML and XML documents

soup = BeautifulSoup(page, "lxml")
# parse the HTML from our URL into the BeautifulSoup parse tree format

soup.title #this shows the title of the page that was parsed

<title>List of London boroughs - Wikipedia</title>

In [102]:
right_table=soup.find('table', class_='wikitable sortable') #this specifies the table we want to scrape

In [103]:
#this code loops through the rows of the Wikipedia table and collects each of the ten columns in its own list
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
H=[]
I=[]
J=[]

for row in right_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==10:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))
        D.append(cells[3].find(text=True))
        E.append(cells[4].find(text=True))
        F.append(cells[5].find(text=True))
        G.append(cells[6].find(text=True))
        H.append(cells[7].find(text=True))
        I.append(cells[8].find(text=True))
        J.append(cells[9].find(text=True))

In [104]:
#transferring the scraped data to a pandas data frame
df=pd.DataFrame(A,columns=['Borough'])
df['Inner']=B
df['Status']=C
df['Local authority']=D
df['Political control']=E
df['HQ']=F
df["Area (sq m)"]=G
df['Population']=H
df['Co-oordinates']=I
df['# on map']=J

In [105]:
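df.head #this is what the raw scraped data looks like (without parentheses this displays the bound method, i.e. the whole frame, rather than the first five rows)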

<bound method NDFrame.head of                    Borough     Inner Status  ... Population Co-oordinates # on map
0     Barking and Dagenham        \n     \n  ...  194,352\n    51°33′39″N     25\n
1                   Barnet        \n     \n  ...  369,088\n    51°37′31″N     31\n
2                   Bexley        \n     \n  ...  236,687\n    51°27′18″N     23\n
3                    Brent        \n     \n  ...  317,264\n    51°33′32″N     12\n
4                  Bromley        \n     \n  ...  317,899\n    51°24′14″N     20\n
5                   Camden         Y     \n  ...  229,719\n    51°31′44″N     11\n
6                  Croydon        \n     \n  ...  372,752\n    51°22′17″N     19\n
7                   Ealing        \n     \n  ...  342,494\n    51°30′47″N     13\n
8                  Enfield        \n     \n  ...  320,524\n    51°39′14″N     30\n
9                Greenwich         Y  Royal  ...  264,008\n    51°29′21″N     22\n
10                 Hackney         Y     \n  ...  257,379

In [106]:
#removing the "\n" at the end of some of the row items
#regex=False treats '\n' as a literal newline character on every pandas version
df["Area (sq m)"] = df["Area (sq m)"].str.replace('\n', '', regex=False)
df['Population'] = df['Population'].str.replace('\n', '', regex=False)
df['Co-oordinates'] = df['Co-oordinates'].str.replace('\n', '', regex=False)
df['# on map'] = df['# on map'].str.replace('\n', '', regex=False)

df.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,HQ,Area (sq m),Population,Co-oordinates,# on map
0,Barking and Dagenham,\n,\n,Barking and Dagenham London Borough Council,Labour,Town Hall,13.93,194352,51°33′39″N,25
1,Barnet,\n,\n,Barnet London Borough Council,Conservative,Barnet House,33.49,369088,51°37′31″N,31
2,Bexley,\n,\n,Bexley London Borough Council,Conservative,Civic Offices,23.38,236687,51°27′18″N,23
3,Brent,\n,\n,Brent London Borough Council,Labour,Brent Civic Centre,16.7,317264,51°33′32″N,12
4,Bromley,\n,\n,Bromley London Borough Council,Conservative,Civic Centre,57.97,317899,51°24′14″N,20


In [107]:
from google.colab import files #upload the csv file that the scraped data was saved to
uploaded = files.upload()

Saving London_data.csv to London_data (5).csv


In [108]:
import io
LD = pd.read_csv(io.BytesIO(uploaded['London_data.csv']))

In [109]:
#cleaning up the borough names and defining the boroughs that are in inner London
LD = LD.drop(['Status','Local authority','Political control','Headquarters','Nr. in map'], axis=1)
LD['Inner'].replace(np.nan,'0', inplace=True)
LD['Borough'].replace('Barking and Dagenham [note 1]','Barking and Dagenham', inplace=True)
LD['Borough'].replace('Greenwich [note 2]','Greenwich', inplace=True)
LD['Borough'].replace('Hammersmith and Fulham [note 4]','Hammersmith and Fulham', inplace=True)
Inn = ['Camden','Greenwich','Hackney','Hammersmith and Fulham','Islington','Kensington and Chelsea','Lewisham','Lambeth','Southwark','Tower Hamlets','Wandsworth','Westminster']
LD

Unnamed: 0,Borough,Inner,Area (sq mi),Population (2013 est)[1],Co-ordinates
0,Barking and Dagenham,0,13.93,194352,51.5541°N 0.1340°E
1,Barnet,0,33.49,369088,51.6050°N 0.2076°W
2,Bexley,0,23.38,236687,51.4519°N 0.1172°E
3,Brent,0,16.7,317264,51.5673°N 0.2711°W
4,Bromley,0,57.97,317899,51.5673°N 0.2711°W
5,Camden,1,8.4,229719,51.5290°N 0.1255°W
6,Croydon,0,33.41,372752,51.3827°N 0.0985°W
7,Ealing,0,21.44,342494,51.5250°N 0.3414°W
8,Enfield,0,31.74,320524,51.6623°N 0.1181°W
9,Greenwich,1,18.28,264008,51.4892°N 0.0648°E
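
The `Co-ordinates` column read from the CSV already holds each borough's position as text such as `51.5541°N 0.1340°E`. As a minimal sketch, assuming every entry follows that `<lat>°N|S <lon>°E|W` pattern, the text could be converted into signed decimal degrees directly. The helper name `parse_coordinates` is hypothetical and not part of the original notebook.

```python
import re

# Matches text such as "51.5541°N 0.1340°E" (assumed format, taken from the table above).
_COORD_RE = re.compile(r"^\s*([\d.]+)°([NS])\s+([\d.]+)°([EW])\s*$")


def parse_coordinates(text):
    """Return (latitude, longitude) in signed decimal degrees."""
    match = _COORD_RE.match(text)
    if match is None:
        raise ValueError("unrecognised coordinate format: {!r}".format(text))
    lat, ns, lon, ew = match.groups()
    # south and west are negative by convention
    latitude = float(lat) * (1 if ns == "N" else -1)
    longitude = float(lon) * (1 if ew == "E" else -1)
    return latitude, longitude


print(parse_coordinates("51.4519°N 0.1172°E"))  # Bexley row
print(parse_coordinates("51.5673°N 0.2711°W"))  # Brent row
```

In the notebook this could be applied with `LD['Co-ordinates'].apply(parse_coordinates)`, which needs no network lookup.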


In [110]:
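#adding separate latitude and longitude columns by geocoding each borough name
#note: a bare name such as 'Bexley' or 'Brent' is ambiguous and can resolve to a place outside the UK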
geolocator = Nominatim(user_agent="London_explorer")
LD['Co-ordinates']= LD['Borough'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
LD[['Latitude', 'Longitude']] = LD['Co-ordinates'].apply(pd.Series)
LD

Unnamed: 0,Borough,Inner,Area (sq mi),Population (2013 est)[1],Co-ordinates,Latitude,Longitude
0,Barking and Dagenham,0,13.93,194352,"(51.5541171, 0.15050434261994267)",51.554117,0.150504
1,Barnet,0,33.49,369088,"(51.65309, -0.2002261)",51.65309,-0.200226
2,Bexley,0,23.38,236687,"(39.9692378, -82.936864)",39.969238,-82.936864
3,Brent,0,16.7,317264,"(32.9373463, -87.1647184)",32.937346,-87.164718
4,Bromley,0,57.97,317899,"(51.4028046, 0.0148142)",51.402805,0.014814
5,Camden,1,8.4,229719,"(39.9448402, -75.1198911)",39.94484,-75.119891
6,Croydon,0,33.41,372752,"(51.3713049, -0.101957)",51.371305,-0.101957
7,Ealing,0,21.44,342494,"(51.5126553, -0.3051952)",51.512655,-0.305195
8,Enfield,0,31.74,320524,"(51.6520851, -0.0810175)",51.652085,-0.081018
9,Greenwich,1,18.28,264008,"(52.0367323, 1.168934)",52.036732,1.168934
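
Several rows above resolve to places far from London: Bexley and Brent, for example, come back with longitudes of about -82.9 and -87.2. A likely cause is that a bare borough name is ambiguous. The sketch below qualifies each query before geocoding. The suffix `", London, UK"` and the helper name `geocode_boroughs` are assumptions, and the geocoder is passed in as a plain callable so the idea can be tried without network access.

```python
def geocode_boroughs(names, geocode, suffix=", London, UK"):
    """Geocode each name with a qualifying suffix.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` (as geopy's `Nominatim.geocode` does),
    or None when nothing is found.
    """
    coordinates = {}
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            # keep the gap visible instead of failing on `.latitude`
            coordinates[name] = (None, None)
        else:
            coordinates[name] = (location.latitude, location.longitude)
    return coordinates


# Stand-in geocoder for demonstration only (hypothetical, no network call).
class _FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def _fake_geocode(query):
    # the Bexley values are the ones listed in the CSV's Co-ordinates column
    known = {"Bexley, London, UK": _FakeLocation(51.4519, 0.1172)}
    return known.get(query)


result = geocode_boroughs(["Bexley", "Nowhere"], _fake_geocode)
print(result)
```

With geopy the corresponding call would be `geocode_boroughs(LD['Borough'], geolocator.geocode)`; the results are still worth checking against the coordinates in the CSV.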


In [111]:
#renaming the columns to make it more readable
LD = LD.rename(columns={"Area (sq mi)": 'Area', "Population (2013 est)[1]":'Population'})
LD.head()

Unnamed: 0,Borough,Inner,Area,Population,Co-ordinates,Latitude,Longitude
0,Barking and Dagenham,0,13.93,194352,"(51.5541171, 0.15050434261994267)",51.554117,0.150504
1,Barnet,0,33.49,369088,"(51.65309, -0.2002261)",51.65309,-0.200226
2,Bexley,0,23.38,236687,"(39.9692378, -82.936864)",39.969238,-82.936864
3,Brent,0,16.7,317264,"(32.9373463, -87.1647184)",32.937346,-87.164718
4,Bromley,0,57.97,317899,"(51.4028046, 0.0148142)",51.402805,0.014814


In [112]:
from google.colab import files
uploaded = files.upload()

Saving London_rent.csv to London_rent (5).csv


In [113]:
#importing the average rent for each borough
import io
borough_rent = pd.read_csv(io.BytesIO(uploaded['London_rent.csv'])) #importing the average rent per borough as a dataframe
borough_rent.head()

average_rent = borough_rent["Average Rent"]

LD_rent = pd.concat([LD,average_rent], axis = 1)
LD_rent

Unnamed: 0,Borough,Inner,Area,Population,Co-ordinates,Latitude,Longitude,Average Rent
0,Barking and Dagenham,0,13.93,194352,"(51.5541171, 0.15050434261994267)",51.554117,0.150504,1192
1,Barnet,0,33.49,369088,"(51.65309, -0.2002261)",51.65309,-0.200226,1548
2,Bexley,0,23.38,236687,"(39.9692378, -82.936864)",39.969238,-82.936864,1084
3,Brent,0,16.7,317264,"(32.9373463, -87.1647184)",32.937346,-87.164718,1578
4,Bromley,0,57.97,317899,"(51.4028046, 0.0148142)",51.402805,0.014814,1318
5,Camden,1,8.4,229719,"(39.9448402, -75.1198911)",39.94484,-75.119891,2427
6,Croydon,0,33.41,372752,"(51.3713049, -0.101957)",51.371305,-0.101957,1112
7,Ealing,0,21.44,342494,"(51.5126553, -0.3051952)",51.512655,-0.305195,1484
8,Enfield,0,31.74,320524,"(51.6520851, -0.0810175)",51.652085,-0.081018,1325
9,Greenwich,1,18.28,264008,"(52.0367323, 1.168934)",52.036732,1.168934,1380
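
`pd.concat(..., axis=1)` lines the two frames up by row index, so each rent value lands next to the right borough only if both files list the boroughs in the same order. Below is a minimal sketch of a key-based alternative, assuming the rent file also has a `Borough` column (the source does not confirm this). The toy frames reuse three rent values from the table above.

```python
import pandas as pd

boroughs = pd.DataFrame({"Borough": ["Barnet", "Bexley", "Brent"],
                         "Area": [33.49, 23.38, 16.7]})
# Same boroughs, deliberately listed in a different order.
rent = pd.DataFrame({"Borough": ["Brent", "Barnet", "Bexley"],
                     "Average Rent": [1578, 1548, 1084]})

# Positional concat pairs row 0 with row 0, whatever the names are.
by_position = pd.concat([boroughs, rent["Average Rent"]], axis=1)

# Merging on the key pairs rows by borough name instead;
# validate raises if a borough appears more than once on either side.
by_name = pd.merge(boroughs, rent, on="Borough", how="left", validate="one_to_one")

print(by_position)
print(by_name)
```

In the toy data the positional version gives Barnet the rent that belongs to Brent, while the merge keeps each value with its borough.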


In [114]:
LD_rent.shape

(32, 8)

Now I'm going to remove the boroughs that are in the inner part of the city, so as to exclude any borough where the rent will be too high for the entrepreneurs to pay.

In [115]:
LDR = LD_rent[LD_rent.Inner != 1].copy() #copy so that later column edits do not act on a slice of LD_rent

In [116]:
LDR.shape

(20, 8)

You can see that the 12 inner-city boroughs were removed, leaving only the outer boroughs of London for the analysis.

In [118]:
LDR.dtypes #checking the type of data in the columns

Borough          object
Inner            object
Area            float64
Population       object
Co-ordinates     object
Latitude        float64
Average Rent      int64
Longitude       float64
dtype: object

In [119]:
LDR = LDR.drop(['Inner'], axis=1) #dropping inner as we don't need it anymore
LDR = LDR.replace(',','', regex=True) #removing the comma from the population to convert it to float
LDR["Population"] = pd.to_numeric(LDR["Population"], downcast="float") #converting population to float
LDR["Average Rent"] = pd.to_numeric(LDR["Average Rent"], downcast="float") #converting rent to float
LDR.dtypes

Borough          object
Area            float64
Population      float32
Co-ordinates     object
Latitude        float64
Average Rent    float32
Longitude       float64
dtype: object

In [120]:
LDR = LDR.rename(columns={"Average Rent": 'Average_Rent'})
LDR_fin = LDR #final data set to be used for the analysis

Now that the data is all cleaned up, the analysis can be performed.

<h1>Battle of the Neighbourhoods - Methodology and Data Analysis</h1>

First we use the geopy library to get the coordinates of London, so that we can create a map and visualise the data.

In [121]:
address = 'London'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [122]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(LDR['Latitude'], LDR['Longitude'], LDR['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='Red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

Now that we have the boroughs on a map, it's time to get the venues from Foursquare. Because each borough covers a large area, a fairly large search radius (5000 m) will be used.

In [123]:
#access foursquare
import os

# credentials are read from environment variables instead of being hard-coded in the notebook;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials loaded')

Credentials loaded


In [124]:
radius = 5000
LIMIT = 100

def getVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue_Lat', 
                  'Venue_Long', 
                  'Venue_Category']
    
    return(nearby_venues)
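
`getVenues` indexes straight into the JSON, so a response without `groups`, or a venue without any category, raises an exception part-way through the loop. The sketch below shows one defensive way to pull the same seven fields out of a response dictionary. The helper name `extract_venue_rows` is hypothetical, and the response shape is the one the function above relies on, not a statement about the current Foursquare API.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response dict into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Uncategorised"
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample payload for illustration, not real API output.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Café", "location": {"lat": 51.55, "lng": 0.15},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category Place", "location": {"lat": 51.56, "lng": 0.16},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Barking and Dagenham", 51.554117, 0.150504, sample)
print(rows)
print(extract_venue_rows("Empty", 0.0, 0.0, {}))
```

A borough that returns nothing then contributes an empty list, and the venue counts per borough show where that happened.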

In [126]:
Brgh_Venues = getVenues(names=LDR_fin['Borough'],
                        latitudes=LDR_fin['Latitude'],
                        longitudes=LDR_fin['Longitude'])

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Croydon
Ealing
Enfield
Haringey
Harrow
Havering
Hillingdon
Hounslow
Kingston upon Thames
Merton
Newham
Redbridge
Richmond upon Thames
Sutton
Waltham Forest


Let's count the venues in each borough to get a sense of the scale of the data.

In [127]:
Brgh_Venues.groupby('Borough').count()

Borough,Latitude,Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category
Barking and Dagenham,98,98,98,98,98,98
Barnet,100,100,100,100,100,100
Bexley,83,83,83,83,83,83
Brent,16,16,16,16,16,16
Bromley,100,100,100,100,100,100
Croydon,100,100,100,100,100,100
Ealing,100,100,100,100,100,100
Enfield,100,100,100,100,100,100
Haringey,100,100,100,100,100,100
Harrow,100,100,100,100,100,100


In [128]:
London_Brgh_onehot = pd.get_dummies(Brgh_Venues[['Venue_Category']], prefix="", prefix_sep="")
mid =  Brgh_Venues['Borough']

London_Brgh_onehot.insert(0, 'Borough', mid)

London_Brgh_onehot.head()

Unnamed: 0,Borough,ATM,African Restaurant,Airfield,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Canal Lock,Caribbean Restaurant,Cave,Chinese Restaurant,Chocolate Shop,...,Salad Place,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Smoke Shop,Soccer Field,Soccer Stadium,South Indian Restaurant,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [129]:
Brgh_grouped = London_Brgh_onehot.groupby('Borough').mean().reset_index()
Brgh_grouped

Unnamed: 0,Borough,ATM,African Restaurant,Airfield,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Canal Lock,Caribbean Restaurant,Cave,Chinese Restaurant,Chocolate Shop,...,Salad Place,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Smoke Shop,Soccer Field,Soccer Stadium,South Indian Restaurant,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.020408,0.0,0.05102,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,...,0.0,0.010204,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102041,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.010204,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.024096,0.012048,0.012048,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.024096,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.036145,0.0,...,0.0,0.024096,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.012048,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.024096,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0
3,Brent,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.0,...,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
6,Ealing,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0
7,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Haringey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.11,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.04,0.07,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,Harrow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.01,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
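
Taking the mean of the one-hot columns per borough gives the share of that borough's venues that fall into each category. For Barking and Dagenham, with 98 venues, the Supermarket value of 0.102041 is consistent with 10 venues out of 98. The same arithmetic in plain Python, on a made-up list of categories (illustrative data, not taken from Foursquare):

```python
from collections import Counter, defaultdict

# (borough, category) pairs; toy data for illustration only
venues = [
    ("A", "Pub"), ("A", "Pub"), ("A", "Park"), ("A", "Café"),
    ("B", "Supermarket"), ("B", "Park"),
]

# count how often each category occurs within each borough
counts = defaultdict(Counter)
for borough, category in venues:
    counts[borough][category] += 1

# share of each category within its borough: count / total venues in the borough
shares = {
    borough: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for borough, counter in counts.items()
}

print(shares["A"])
print(round(10 / 98, 6))
```

Because the values are shares rather than counts, a borough with few returned venues (such as Brent, with 16) gets large frequencies from only a handful of venues.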


In [130]:
num_top_venues = 5

for brgh in Brgh_grouped['Borough']:
    print("_________"+brgh+"________")
    temp = Brgh_grouped[Brgh_grouped['Borough'] == brgh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

_________Barking and Dagenham________
           venue  freq
0    Supermarket  0.10
1           Park  0.09
2  Grocery Store  0.07
3    Coffee Shop  0.06
4            Pub  0.06


_________Barnet________
           venue  freq
0            Pub  0.14
1    Coffee Shop  0.12
2           Café  0.06
3  Grocery Store  0.05
4           Park  0.05


_________Bexley________
            venue  freq
0     Pizza Place  0.07
1     Coffee Shop  0.06
2  Ice Cream Shop  0.05
3            Park  0.04
4  Discount Store  0.04


_________Brent________
                  venue  freq
0  Fast Food Restaurant  0.25
1        Sandwich Place  0.12
2     Convenience Store  0.12
3           Gas Station  0.06
4                Lawyer  0.06


_________Bromley________
           venue  freq
0            Pub  0.12
1  Grocery Store  0.08
2    Coffee Shop  0.08
3           Park  0.06
4    Pizza Place  0.05


_________Croydon________
           venue  freq
0            Pub  0.11
1           Park  0.09
2    Coffee Shop  0.07
3

Let's put this into a dataframe

In [131]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions beyond 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brgh_venues_sorted = pd.DataFrame(columns=columns)
brgh_venues_sorted['Borough'] = Brgh_grouped['Borough']

for ind in np.arange(Brgh_grouped.shape[0]):
    brgh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Brgh_grouped.iloc[ind, :], num_top_venues)

brgh_venues_sorted.head(8)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,Supermarket,Park,Grocery Store,Pub,Coffee Shop
1,Barnet,Pub,Coffee Shop,Café,Park,Grocery Store
2,Bexley,Pizza Place,Coffee Shop,Ice Cream Shop,Chinese Restaurant,Discount Store
3,Brent,Fast Food Restaurant,Convenience Store,Sandwich Place,Pizza Place,Lawyer
4,Bromley,Pub,Coffee Shop,Grocery Store,Park,Gym / Fitness Center
5,Croydon,Pub,Park,Coffee Shop,Grocery Store,Pizza Place
6,Ealing,Pub,Park,Coffee Shop,Botanical Garden,Pizza Place
7,Enfield,Coffee Shop,Turkish Restaurant,Pub,Supermarket,Park


Because there are only 20 boroughs, a small number of clusters (5) will be used.

In [132]:
kclusters = 5
brgh_grouped_clustering = Brgh_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brgh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 0, 4, 2, 4, 4, 0, 4, 0, 4], dtype=int32)
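
K-means alternates between two steps: assign every row to its nearest centroid, then move each centroid to the mean of its rows, until the assignments stop changing. The sketch below spells this out in plain Python on toy 2-D points. It illustrates the idea with fixed starting centroids and is not scikit-learn's implementation, which among other things chooses its starting centroids differently.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its points (unchanged if it has none)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why the notebook fixes `random_state=0` to make its labels reproducible.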

Now let's merge the dataframes

In [133]:
# add clustering labels
brgh_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

# merge LDR_fin with brgh_venues_sorted to add area, population, rent and coordinates for each borough
Borough_merged = pd.merge(LDR_fin,brgh_venues_sorted, on='Borough')
Borough_merged

Unnamed: 0,Borough,Area,Population,Co-ordinates,Latitude,Average_Rent,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,13.93,194352.0,"(51.5541171, 0.15050434261994267)",51.554117,1192.0,0.150504,4,Supermarket,Park,Grocery Store,Pub,Coffee Shop
1,Barnet,33.49,369088.0,"(51.65309, -0.2002261)",51.65309,1548.0,-0.200226,0,Pub,Coffee Shop,Café,Park,Grocery Store
2,Bexley,23.38,236687.0,"(39.9692378, -82.936864)",39.969238,1084.0,-82.936864,4,Pizza Place,Coffee Shop,Ice Cream Shop,Chinese Restaurant,Discount Store
3,Brent,16.7,317264.0,"(32.9373463, -87.1647184)",32.937346,1578.0,-87.164718,2,Fast Food Restaurant,Convenience Store,Sandwich Place,Pizza Place,Lawyer
4,Bromley,57.97,317899.0,"(51.4028046, 0.0148142)",51.402805,1318.0,0.014814,4,Pub,Coffee Shop,Grocery Store,Park,Gym / Fitness Center
5,Croydon,33.41,372752.0,"(51.3713049, -0.101957)",51.371305,1112.0,-0.101957,4,Pub,Park,Coffee Shop,Grocery Store,Pizza Place
6,Ealing,21.44,342494.0,"(51.5126553, -0.3051952)",51.512655,1484.0,-0.305195,0,Pub,Park,Coffee Shop,Botanical Garden,Pizza Place
7,Enfield,31.74,320524.0,"(51.6520851, -0.0810175)",51.652085,1325.0,-0.081018,4,Coffee Shop,Turkish Restaurant,Pub,Supermarket,Park
8,Haringey,11.42,263386.0,"(51.587929849999995, -0.10541010599099046)",51.58793,1513.0,-0.10541,0,Café,Pub,Coffee Shop,Turkish Restaurant,Park
9,Harrow,19.49,243372.0,"(51.596827149999996, -0.33731605402671094)",51.596827,1396.0,-0.337316,4,Indian Restaurant,Coffee Shop,Pub,Supermarket,Café


And now let's map the clusters

In [134]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, rent, pop in zip(Borough_merged['Latitude'],
                                  Borough_merged['Longitude'],
                                  Borough_merged['Borough'],
                                  Borough_merged['Cluster Label'],
                                  Borough_merged['Average_Rent'],
                                  Borough_merged['Population']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + " " + "Rent " + str(rent) + " " + "Population " + str(pop), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=25,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

After seeing the clusters, it is worth looking into each one in a bit more detail to see which is best suited for opening a venue.

In [135]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 0, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Barnet,33.49,369088.0,51.65309,-0.200226,0,Pub,Coffee Shop,Café,Park,Grocery Store
6,Ealing,21.44,342494.0,51.512655,-0.305195,0,Pub,Park,Coffee Shop,Botanical Garden,Pizza Place
8,Haringey,11.42,263386.0,51.58793,-0.10541,0,Café,Pub,Coffee Shop,Turkish Restaurant,Park
12,Hounslow,21.61,262407.0,51.468613,-0.361347,0,Pub,Park,Garden,Coffee Shop,Supermarket
13,Kingston upon Thames,14.38,166793.0,51.409627,-0.306262,0,Café,Pub,Park,Garden,Gastropub
14,Merton,14.52,203223.0,51.41087,-0.188097,0,Pub,Park,Coffee Shop,Café,Bar
15,Newham,13.98,318227.0,51.53,0.029318,0,Park,Pub,Café,Bar,Restaurant
16,Redbridge,21.78,288272.0,51.57632,0.04541,0,Pub,Park,Coffee Shop,Restaurant,Italian Restaurant
17,Richmond upon Thames,22.17,191365.0,51.440553,-0.307639,0,Pub,Park,Garden,Café,Coffee Shop


A very British picture: pubs and cafés are the most popular venues, and people who don't fancy a visit to either can enjoy nature in the parks and gardens. This cluster is not of much interest, as it is already well supplied with pubs and cafés, so there is plenty for people to do here.

In [136]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 1, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Havering,43.35,242080.0,51.004361,-2.337475,1,Airfield,IT Services,Electronics Store,Food & Drink Shop,Yoga Studio


The population here is relatively small compared to some of the other boroughs, and the main point of interest appears to be an airfield. The area is also quite large. It does not look like the most popular place for people to come to enjoy a beer or a coffee.
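The remark about a small population spread over a large area can be made concrete by comparing population density. The sketch below is illustrative only: the figures are copied from the tables in this section, the Area column is assumed to be in square miles (as in the Wikipedia list), and the names `boroughs` and `density` are hypothetical.

```python
# (population, area) copied from the cluster tables in this section.
# Assumption: Area is in square miles, as in the Wikipedia list of boroughs.
boroughs = {
    "Havering": (242080.0, 43.35),
    "Barnet": (369088.0, 33.49),
    "Haringey": (263386.0, 11.42),
    "Newham": (318227.0, 13.98),
}

# Population density: people per unit of area.
density = {name: pop / area for name, (pop, area) in boroughs.items()}

# Print the boroughs from least to most densely populated.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name}: {value:,.0f} people per unit of area")
```

Of the four boroughs compared here, Havering has the lowest density, which is consistent with the observation above.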

In [137]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 2, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Brent,16.7,317264.0,32.937346,-87.164718,2,Fast Food Restaurant,Convenience Store,Sandwich Place,Pizza Place,Lawyer


This borough might be interesting, as there is no pub or café among its top venues yet but clearly some other food places. If the area is up-and-coming, it might be worth opening a local pub here.

In [138]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 3, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
18,Sutton,16.93,195914.0,30.567295,-100.643236,3,Convenience Store,Hotel,Pizza Place,Steakhouse,Mexican Restaurant


A great picture! There is no pub yet, but there is a hotel and clearly an interest in food venues. With a good area and population, Sutton might be a good choice for a new venue of a similar sort.

In [139]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 4, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,13.93,194352.0,51.554117,0.150504,4,Supermarket,Park,Grocery Store,Pub,Coffee Shop
2,Bexley,23.38,236687.0,39.969238,-82.936864,4,Pizza Place,Coffee Shop,Ice Cream Shop,Chinese Restaurant,Discount Store
4,Bromley,57.97,317899.0,51.402805,0.014814,4,Pub,Coffee Shop,Grocery Store,Park,Gym / Fitness Center
5,Croydon,33.41,372752.0,51.371305,-0.101957,4,Pub,Park,Coffee Shop,Grocery Store,Pizza Place
7,Enfield,31.74,320524.0,51.652085,-0.081018,4,Coffee Shop,Turkish Restaurant,Pub,Supermarket,Park
9,Harrow,19.49,243372.0,51.596827,-0.337316,4,Indian Restaurant,Coffee Shop,Pub,Supermarket,Café
11,Hillingdon,44.67,286806.0,51.542519,-0.448335,4,Pub,Supermarket,Coffee Shop,Indian Restaurant,Bar
19,Waltham Forest,14.99,265797.0,42.37564,-71.2358,4,Italian Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Ice Cream Shop


From the last cluster there are also a few boroughs that might be of interest: Bromley, Croydon and Waltham Forest. The first two already have a pub as their most common venue, closely followed by a coffee shop and a park respectively, and some other food places, so this might be interesting!
Waltham Forest is probably one of the most interesting, as it has lots of restaurants but no pub yet, and a sizeable population.
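One caveat when reading the cluster tables: some rows list coordinates far from London, which lies at roughly 51.5° N, 0.1° W. Brent, for example, is shown at latitude 32.94 and longitude -87.16. Since venues are collected around these coordinates, the venue lists for such rows may describe a different place with the same name. The following is a minimal sketch of a sanity check, assuming a deliberately generous bounding box for Greater London; the box limits and the helper name `in_greater_london` are illustrative assumptions and not part of the original notebook. The coordinates are copied from the tables above.

```python
# Rough bounding box for Greater London.
# Assumption: these limits are approximate and deliberately generous.
LAT_MIN, LAT_MAX = 51.2, 51.8
LON_MIN, LON_MAX = -0.6, 0.4

# (borough, latitude, longitude) copied from the cluster tables above.
coords = [
    ("Barnet", 51.65309, -0.200226),
    ("Havering", 51.004361, -2.337475),
    ("Brent", 32.937346, -87.164718),
    ("Sutton", 30.567295, -100.643236),
    ("Bexley", 39.969238, -82.936864),
    ("Bromley", 51.402805, 0.014814),
    ("Waltham Forest", 42.37564, -71.2358),
]


def in_greater_london(lat, lon):
    """Return True if the point falls inside the rough bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# Collect the boroughs whose coordinates fall outside the box.
suspect = [name for name, lat, lon in coords if not in_greater_london(lat, lon)]
print(suspect)
```

Rows flagged this way might be worth geocoding again with a more specific query (for instance the borough name followed by "London, UK") before their venue lists are interpreted.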

#Battle of the Neighbourhoods - Results & Discussion

###Results
The results of the above analysis and clustering can be summarised as follows:

1. The most popular social venues outside of the Inner London boroughs are pubs and coffee shops.
2. Residents of northern boroughs appear more inclined to visit pubs, whereas those in southern boroughs appear more likely to shop and to socialise at home.
3. In many boroughs, a restaurant serving an international cuisine (for example Turkish, Indian or Italian) is among the five most common venues.
4. Rent price does not appear to be much of a factor for going out: demand does not seem to be affected by the difference in costs.

###Discussion

Looking at the data, Waltham Forest, Bromley, Enfield and Sutton are the best places outside of Central London in which to open a new venue. However, a lot of information is not taken into account and cannot be obtained from Foursquare Developer:

1. Rent in Bromley and Enfield is slightly higher (approx. £200 per month), so this will need to be taken into account. However, demand here might be higher, and therefore it might be worth the extra cost. This can only be determined by visiting the boroughs.
2. The ethnic make-up of a given borough can influence the popularity of a given cuisine.
3. Proximity to the Inner London boroughs and better transport links allow people to travel to neighbouring boroughs, which affects the measurements.
4. Many small venues are not registered on Foursquare and are marketed by word of mouth, so they are not taken into account.

Regardless, the analysis provided an insight into what people like and opt for when it comes to going out in their own neighbourhoods.

#Battle of the Neighbourhoods - Conclusion

###Conclusion

To conclude, it has been great to have a go at solving a real-life problem using available data. The problem at hand was choosing where to open a venue in London. To address it, some frequently used Python libraries were used to clean and analyse the data, the Foursquare API was used to explore each borough, and the results were mapped to visualise them.
