# Simplifying your house hunting in London

<hr>

## Introduction

House hunting in London is one of the most stressful things you can do in the UK.<br>
Where you live in London has a huge impact on the amount of money you need to spend on purchasing or renting a property. For instance, properties at the most prestigious addresses, such as central London, Kensington, Chelsea or the Docklands, can’t be found for much less than six figures (purchase price). However, if you move to some of the outer areas of London, property prices become much more affordable.<br>
Wherever you choose to live in London, you’ll be well served by the excellent public transport system. The underground train system, known as ‘The Tube’, covers the whole of the city. Although it can get crowded during the rush to and from work in the morning and evening, in general it’s pleasant, safe and efficient to use.<br>
This means that it’s perfectly feasible to choose to live in the outer reaches of London – even in the open spaces of leafy suburbs such as Richmond, Kew, Blackheath or Harrow – and still be able to commute into central London easily and quickly.<br>

Websites like Zoopla and Rightmove help you search for a house, but narrowing down the area to look in is a manual task, and they don't show nearby attractions or the distinct areas within a neighborhood.<br>
To search for a house according to your preferences about the neighborhood, you need to browse the internet and gather the information yourself. It is a time-consuming and tiring process.<br>

To help with this process, this project was made for people who are searching for a new house, so that they can identify or narrow down the search area according to their preferences about the neighborhood.
The focus is to provide relevant, quality information to support the decision when choosing the best place to live.<br>

<i>*The project works specifically with the Lewisham council in London, to keep the scope simple and give an overview of the approach. Full coverage of London can be addressed in a future project.</i>

## Data

Two datasets are used for the project:

<b>1 - London Lewisham council postcodes </b>

The data, which contains all postcodes in the Lewisham council in London, is downloaded from <a href="https://www.doogal.co.uk/AdministrativeAreas.php?district=E09000023">doogal.co.uk</a>. Within this dataset, the main focus is the Lee Green ward, to minimize the calls to the Foursquare API.

The Lewisham council postcodes data has many columns, of which only Postcode, Longitude, Latitude and Ward are of interest. A snapshot of the data is shown below:

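```python
import pandas as pd

# Download the Lewisham postcodes CSV from doogal.co.uk into a DataFrame
link = 'https://www.doogal.co.uk/AdministrativeAreasCSV.ashx?district=E09000023'
Lewisham_Data = pd.read_csv(link)

# Show the first five rows
Lewisham_Data.head()
```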

| | Postcode | In Use? | Latitude | Longitude | Easting | Northing | Grid Ref | Ward | Parish | Introduced | Terminated | Altitude | Country | Last Updated | Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BR1 4BY | Yes | 51.417289 | -0.001741 | 539050 | 170591 | TQ390705 | Downham | Lewisham, unparished area | 1980-01-01 | | 35 | England | 2018-11-15 | Within the building of the matched address clo... |
| 1 | BR1 4DN | Yes | 51.418996 | -0.002156 | 539016 | 170780 | TQ390707 | Downham | Lewisham, unparished area | 1980-01-01 | | 35 | England | 2018-11-15 | Within the building of the matched address clo... |
| 2 | BR1 4EY | Yes | 51.418477 | 0.005042 | 539518 | 170736 | TQ395707 | Downham | Lewisham, unparished area | 1980-01-01 | | 50 | England | 2018-11-15 | Within the building of the matched address clo... |
| 3 | BR1 4FD | Yes | 51.421083 | -0.002194 | 539007 | 171012 | TQ390710 | Downham | Lewisham, unparished area | 2010-01-01 | | 33 | England | 2018-11-15 | Within the building of the matched address clo... |
| 4 | BR1 4JG | Yes | 51.419403 | -0.000728 | 539114 | 170828 | TQ391708 | Downham | Lewisham, unparished area | 1980-01-01 | | 40 | England | 2018-11-15 | Within the building of the matched address clo... |


<b>2 - Foursquare location data</b> 

The Foursquare API enables developers to build applications that interact with the Foursquare platform.
Venue data from Foursquare is used in combination with the postcode dataset above.

The Foursquare API limits the explore endpoint to 99,500 regular calls/day on a Personal Account, which is why the project focuses only on the Lee Green part of Lewisham.

## Methodology

The methodology is divided into three main parts: data wrangling/cleansing, exploratory analysis and machine learning.

<b>1 - Data Wrangling/Cleansing:</b>

The data obtained from doogal.co.uk for the Lewisham council is reduced to a few columns only (Postcode, Latitude, Longitude and Ward). To distinguish between locations, every postcode in Lewisham is divided into two major parts, District and Sector. A new column containing the Sector is created and added to the cleansed data. The data is then filtered to the Lee Green ward to minimize the calls to the Foursquare API.
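The cleansing step can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the helper names `split_postcode` and `add_sector_and_filter` are hypothetical, the sector is assumed to be the outward code plus the first digit of the inward code (for example `SE12 8`), and the two `SE12` rows are made-up sample data.

```python
def split_postcode(postcode):
    """Split a UK postcode into (district, sector).

    Assumption: the inward code is always the last three characters,
    so 'SE12 8AA' gives district 'SE12' and sector 'SE12 8'.
    """
    compact = postcode.replace(" ", "").upper()
    outward, inward = compact[:-3], compact[-3:]
    return outward, f"{outward} {inward[0]}"


def add_sector_and_filter(rows, ward):
    """Keep only rows in the given ward and add District and Sector fields."""
    result = []
    for row in rows:
        if row["Ward"] != ward:
            continue
        district, sector = split_postcode(row["Postcode"])
        result.append({**row, "District": district, "Sector": sector})
    return result


# First row is taken from the snapshot above; the SE12 rows are illustrative only.
rows = [
    {"Postcode": "BR1 4BY", "Latitude": 51.417289, "Longitude": -0.001741, "Ward": "Downham"},
    {"Postcode": "SE12 8AA", "Latitude": 51.45, "Longitude": 0.01, "Ward": "Lee Green"},
    {"Postcode": "SE12 0AB", "Latitude": 51.44, "Longitude": 0.02, "Ward": "Lee Green"},
]

lee_green = add_sector_and_filter(rows, "Lee Green")
sectors = sorted({row["Sector"] for row in lee_green})
print(sectors)
```

The same `split_postcode` function could be applied to the `Postcode` column of the pandas DataFrame to build the new Sector column.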

<b>2 - Exploratory Analysis:</b>

The Lewisham data has 18 unique wards. It is filtered down to a single ward of interest, Lee Green. As explained in the cleansing step, each postcode is made up of two components, district and sector.

All the sectors are plotted on a map. Five clusters are then created from the venue categories provided by the Foursquare API, after which the top five venues in each sector can be explored.
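As a rough sketch of the last step, the snippet below ranks venue categories per sector by how often they occur. It assumes that "top five venues" means the five most frequent venue categories; the function name `top_categories` and the sample `(sector, category)` pairs are hypothetical.

```python
from collections import Counter, defaultdict


def top_categories(venues, n=5):
    """Return the n most frequent venue categories for each sector."""
    counts = defaultdict(Counter)
    for sector, category in venues:
        counts[sector][category] += 1
    return {
        sector: [category for category, _ in counter.most_common(n)]
        for sector, counter in counts.items()
    }


# Illustrative (sector, category) pairs, not real Foursquare results.
venues = [
    ("SE12 8", "Pub"),
    ("SE12 8", "Coffee Shop"),
    ("SE12 8", "Pub"),
    ("SE12 0", "Park"),
    ("SE12 0", "Grocery Store"),
    ("SE12 0", "Park"),
]

top = top_categories(venues)
for sector, categories in top.items():
    print(sector, categories)
```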

Since the objective is to ease the search for people with preferences about nearby venues, the project explores every sector in the Lee Green ward with the Foursquare API. For each sector it retrieves venue names, geographical coordinates and categories within a 500 m radius, capped at 100 venues per sector.
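The request described above might look like the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and a response nested as `response.groups[0].items`; both should be checked against the current Foursquare documentation. The helper names, the credential placeholders and the sample payload are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumption: legacy v2 explore endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build an explore request for venues within `radius` metres of a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (assumed value)
        "ll": f"{lat},{lng}",
        "radius": radius,        # 500 m around the sector
        "limit": limit,          # at most 100 venues per sector
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload, sector):
    """Flatten an explore response into one row per venue."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "Sector": sector,
            "Venue": venue["name"],
            "Latitude": venue["location"]["lat"],
            "Longitude": venue["location"]["lng"],
            "Category": categories[0].get("name"),
        })
    return rows


# Hand-written payload in the assumed response shape, for illustration only.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {
            "name": "Example Cafe",
            "location": {"lat": 51.4501, "lng": 0.0102},
            "categories": [{"name": "Coffee Shop"}],
        }},
    ]}]}
}

url = build_explore_url(51.45, 0.01, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
venue_rows = parse_venues(sample_payload, "SE12 8")
print(url)
print(venue_rows)
```

Making one such request per sector, rather than per postcode, keeps the number of calls well inside the daily limit.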


<b>3 - Machine Learning:</b>

K-means clustering is used after the exploratory analysis to categorise sectors into homogeneous clusters based on their venue categories. It segments the data into groups whose members are similar to each other but different from the members of other groups, in terms of the occurrence of venue categories.
For this dataset, K-means clustering appends another column holding the cluster number, so that similar sectors are grouped together.
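The idea can be illustrated with a small, dependency-free sketch of Lloyd's algorithm, the standard K-means procedure. It is not the project's implementation (a library such as scikit-learn's `KMeans` would normally be used), it assumes each sector is described by the relative frequency of its venue categories, and it uses made-up data with `k=2` instead of the project's 5 clusters.

```python
import random
from collections import Counter


def category_frequencies(venues):
    """Turn (sector, category) pairs into one frequency vector per sector."""
    sectors = sorted({sector for sector, _ in venues})
    categories = sorted({category for _, category in venues})
    counts = {sector: Counter() for sector in sectors}
    for sector, category in venues:
        counts[sector][category] += 1
    vectors = []
    for sector in sectors:
        total = sum(counts[sector].values())
        vectors.append([counts[sector][c] / total for c in categories])
    return sectors, vectors


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Illustrative data: two pub-heavy sectors and two park-heavy sectors.
venues = (
    [("SE12 0", "Pub")] * 2 + [("SE12 0", "Coffee Shop")]
    + [("SE12 8", "Pub")] * 3 + [("SE12 8", "Coffee Shop")]
    + [("SE12 9", "Park")] * 2 + [("SE12 9", "Coffee Shop")]
    + [("SE13 5", "Park")] * 3 + [("SE13 5", "Coffee Shop")]
)

sectors, vectors = category_frequencies(venues)
labels = kmeans(vectors, k=2)
clusters = dict(zip(sectors, labels))  # the appended "cluster number" per sector
print(clusters)
```

The cluster numbers themselves are arbitrary; only the grouping of sectors is meaningful.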

## Results

K-means clustering produced 5 clusters. The maps below show the sectors extracted for Lee Green and the resulting clusters:

<img src="1.PNG" alt="Lee Green and all the sectors extracted.">

<img src="2.PNG" alt="Lee Green and the 5 clusters from K-means Clustering.">

## Discussion

### Analysing the clusters, we can see that there are 5 very different kinds of areas in Lee Green.

  * Cluster 0. Contains houses close to Platforms, Train Stations, Cafés, Grocery Stores, Gyms/Fitness Centers, etc.
  * Cluster 1. Contains houses close to Pubs, Asian Restaurants, Supermarkets, Indian Restaurants, etc.
  * Cluster 2. Contains houses close to Cafés, Pubs, Grocery Stores, Coffee Shops, Middle Eastern Restaurants, Fish & Chips Shops, etc.
  * Cluster 3. Contains houses close to Fast Food Restaurants, Veterinarians, Hotels, etc.
  * Cluster 4. Contains houses close to Rental Car Locations, Parks, Shopping Plazas, etc.

The clusters above segment the data into 5 groups that are internally homogeneous in terms of venue categories while being heterogeneous with respect to one another. For example, someone who wants to live close to pubs and cafes may find Cluster 1 a great choice. For someone whose main interest is easy access to platforms and train stations, the sectors in Cluster 0 are the most suitable.

## Conclusion

As demonstrated in the Results and Discussion sections, clustering the area by venue category and then filtering by cluster to narrow down the search area can greatly simplify the house hunting process. <br>
House hunters can then focus their search by weighing the various trade-offs. Those who are undecided can efficiently compare areas and choose the one that suits them best.