# Capstone Data Science Project - The Battle of Neighborhoods (Both the weeks combined)

## Description

### Discussion of the Background

According to Bloomberg News, the London housing market is in a rut. It currently faces a number of headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall as much as 30 percent in the event of a disorderly exit from the European Union. More specifically, four overlooked cracks suggest that the London market may be in worse shape than many realize: hidden price falls, record-low sales, a homebuilder exodus and tax hikes aimed at overseas buyers of homes in England and Wales.

### Discussion of the Problem

In this scenario, it is urgent to adopt machine learning tools to help homebuyers in London make wise and effective decisions. The business problem we pose is therefore: how can we support homebuyers in purchasing a suitable property in London in this uncertain economic and financial climate? To solve this problem, we will cluster London neighborhoods in order to recommend venues, together with the current average price of real estate, where homebuyers can make an investment. We will recommend profitable venues according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

### Data used

Data on London properties and the prices paid for them were extracted from the HM Land Registry (http://landregistry.data.gov.uk/). The following fields make up the address data included in Price Paid Data: Postcode; PAON (Primary Addressable Object Name), typically the house number or name; SAON (Secondary Addressable Object Name), present where there is a sub-building, for example when the building is divided into flats; Street; Locality; Town/City; District; County. To explore and target recommended locations according to the presence of amenities and essential facilities, we will access venue data through the Foursquare API and arrange it as a dataframe for visualization. By merging the property and price paid data from the HM Land Registry with the data on amenities and essential facilities surrounding those properties from the Foursquare API, we will be able to recommend profitable real estate investments.

### Method Description

1. Collect the data.
2. Explore the data and try to understand it.
3. Prepare and process the data.
4. Model the data.

### Step 1: Let's Collect the data

Importing the required libraries and downloading the data 

In [56]:
import pandas as pd
import numpy as np
import datetime as dt
import os 
import json
!conda install -c conda-forge geopy --yes
#Conversion Latitude and Longitude
from geopy.geocoders import Nominatim 
# Used to handle requests
import requests
# Transformation of json to pandas (top-level import, available from pandas 1.0 onwards)
from pandas import json_normalize
# For plotting
import matplotlib.cm as cm
import matplotlib.colors as colors
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Solving environment: done

# All requested packages already installed.



In [57]:
# Reading the Data from the source
df_ppd = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv")

### Step 2: Exploring the Data and Trying to Understand it

Displaying the first 5 rows using head

In [58]:
df_ppd.head(5)

Unnamed: 0,{79A74E21-D11E-1289-E053-6B04A8C01627},770000,2018-09-25 00:00,SK7 1AR,D,N,F,5,Unnamed: 8,OAK MEADOW,BRAMHALL,STOCKPORT,STOCKPORT.1,GREATER MANCHESTER,A,A.1
0,{79A74E21-D11F-1289-E053-6B04A8C01627},253500,2018-09-24 00:00,M6 8GQ,D,N,F,1,,RIVINGTON ROAD,,SALFORD,SALFORD,GREATER MANCHESTER,A,A
1,{79A74E21-D120-1289-E053-6B04A8C01627},231950,2018-09-28 00:00,WA3 2UE,D,Y,F,35,,STONEACRE CLOSE,LOWTON,WARRINGTON,WIGAN,GREATER MANCHESTER,A,A
2,{79A74E21-D121-1289-E053-6B04A8C01627},112500,2018-08-29 00:00,OL6 6RJ,S,N,F,102,,THORNFIELD GROVE,,ASHTON-UNDER-LYNE,TAMESIDE,GREATER MANCHESTER,A,A
3,{79A74E21-D122-1289-E053-6B04A8C01627},184995,2018-06-15 00:00,M46 0TW,S,Y,F,37,,THREADNEEDLE PLACE,ATHERTON,MANCHESTER,WIGAN,GREATER MANCHESTER,A,A
4,{79A74E21-D123-1289-E053-6B04A8C01627},214995,2018-09-28 00:00,M28 3XS,D,Y,L,9,,MARPLE GARDENS,WORSLEY,MANCHESTER,SALFORD,GREATER MANCHESTER,A,A


In [59]:
df_ppd.shape

(1029749, 16)

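The dataset has 1,029,749 rows and 16 columns. Note that the file has no header row, so pandas has used the first record as column names in the preview above; the columns are given proper names in Step 3.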
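Because the source file carries no header row, one option is to pass the column names at read time so that the first record is kept as data. The following is a minimal sketch under that assumption; `read_price_paid` is a hypothetical helper, the column names are the ones assigned in Step 3 of this notebook, and the two sample records are copied from the preview above as an in-memory stand-in for the full file.

```python
import io

import pandas as pd

# Column names as assigned in Step 3 of this notebook.
PPD_COLUMNS = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New',
               'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town_City',
               'District', 'County', 'PPD_Cat_Type', 'Record_Status']


def read_price_paid(source):
    """Read a header-less Price Paid CSV, naming the columns and parsing the date."""
    return pd.read_csv(source, header=None, names=PPD_COLUMNS,
                       parse_dates=['Date_Transfer'])


# Two records from the preview above, held in memory instead of downloading the file.
sample_csv = io.StringIO(
    "{79A74E21-D11E-1289-E053-6B04A8C01627},770000,2018-09-25 00:00,SK7 1AR,D,N,F,5,,"
    "OAK MEADOW,BRAMHALL,STOCKPORT,STOCKPORT,GREATER MANCHESTER,A,A\n"
    "{79A74E21-D11F-1289-E053-6B04A8C01627},253500,2018-09-24 00:00,M6 8GQ,D,N,F,1,,"
    "RIVINGTON ROAD,,SALFORD,SALFORD,GREATER MANCHESTER,A,A\n"
)
sample = read_price_paid(sample_csv)
print(sample[['Price', 'Date_Transfer', 'Street', 'Town_City']])
```

With the names supplied up front, the later renaming step would become unnecessary and the row count would include the first record.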

### Step 3:  Prepare the Data and Process the Data

At this stage, we prepare our dataset for the modeling process and decide on the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps:

1. Rename the columns.
2. Format the date column.
3. Sort the data by date of sale.
4. Select the data for the city of London only.
5. Make a list of street names in London.
6. Calculate the street-wise average price of the properties.
7. Read the street-wise coordinates into a dataframe, eliminating the recurring word London from individual names.
8. Join the data to find the coordinates of locations that fit the client's budget.
9. Plot the recommended locations on a map of London along with current market prices.

In [60]:
df_ppd.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON',
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer']) # Parse the date column in one vectorised call
df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True) # Drop transactions completed before 2016
df_ppd.sort_values(by=['Date_Transfer'], ascending=[False], inplace=True) # Sort by sale date, newest first

In [61]:
df_ppd_london = df_ppd.query("Town_City == 'LONDON'")
streets = df_ppd_london['Street'].unique().tolist() # List of street names in London
df_grp_price = df_ppd_london.groupby(['Street'])['Price'].mean().reset_index() # Average price per street
df_grp_price.columns = ['Street', 'Avg_Price']
df_affordable = df_grp_price.query("(Avg_Price >= 2200000) & (Avg_Price <= 2500000)") # Streets whose average price falls within the budget limits

In [62]:
df_affordable # Affordable street with average price

Unnamed: 0,Street,Avg_Price
196,ALBION SQUARE,2.450000e+06
390,ANHALT ROAD,2.435000e+06
405,ANSDELL TERRACE,2.250000e+06
422,APPLEGARTH ROAD,2.400000e+06
855,BARONSMEAD ROAD,2.375000e+06
981,BEAUCLERC ROAD,2.480000e+06
1102,BELVEDERE DRIVE,2.340000e+06
1215,BICKENHALL STREET,2.208500e+06
1253,BIRCHLANDS AVENUE,2.217000e+06
1553,BRAMPTON GROVE,2.456875e+06
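The budget filter above hard-codes its two limits. As a rough sketch of how the same selection could be parameterised, the hypothetical helper `streets_in_budget` below takes the limits as arguments and returns an independent copy, which also avoids pandas' chained-assignment warning when columns are added later. The first three rows are copied from the table above; `EXAMPLE LANE` is made up to show a street that falls outside the budget.

```python
import pandas as pd


def streets_in_budget(prices, low, high):
    """Return streets whose average price lies in [low, high], cheapest first."""
    # Series.between is inclusive on both ends by default.
    mask = prices['Avg_Price'].between(low, high)
    # .copy() yields an independent frame that is safe to modify afterwards.
    return prices.loc[mask].sort_values('Avg_Price').copy()


prices = pd.DataFrame({
    'Street': ['ALBION SQUARE', 'ANHALT ROAD', 'ANSDELL TERRACE', 'EXAMPLE LANE'],
    'Avg_Price': [2450000.0, 2435000.0, 2250000.0, 900000.0],
})
affordable = streets_in_budget(prices, 2200000, 2500000)
print(affordable)
```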


In [63]:
from geopy.geocoders import Nominatim
# k-means for the clustering stage
from sklearn.cluster import KMeans

In [64]:
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")

index: 196
item: Street       ALBION SQUARE
Avg_Price         2.45e+06
Name: 196, dtype: object
item.Street only: ALBION SQUARE
index: 390
item: Street       ANHALT ROAD
Avg_Price      2.435e+06
Name: 390, dtype: object
item.Street only: ANHALT ROAD
index: 405
item: Street       ANSDELL TERRACE
Avg_Price           2.25e+06
Name: 405, dtype: object
item.Street only: ANSDELL TERRACE
index: 422
item: Street       APPLEGARTH ROAD
Avg_Price            2.4e+06
Name: 422, dtype: object
item.Street only: APPLEGARTH ROAD
index: 855
item: Street       BARONSMEAD ROAD
Avg_Price          2.375e+06
Name: 855, dtype: object
item.Street only: BARONSMEAD ROAD
index: 981
item: Street       BEAUCLERC ROAD
Avg_Price          2.48e+06
Name: 981, dtype: object
item.Street only: BEAUCLERC ROAD
index: 1102
item: Street       BELVEDERE DRIVE
Avg_Price           2.34e+06
Name: 1102, dtype: object
item.Street only: BELVEDERE DRIVE
index: 1215
item: Street       BICKENHALL STREET
Avg_Price           2.2085e+06
Name: 1215, dtype: object
item.Street only: BICKENHALL STREET
...

In [65]:
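geolocator = Nominatim(user_agent="london_housing_notebook") # recent geopy releases require an explicit user agent; this name is a placeholder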


In [66]:
df_affordable['city_coord'] = df_affordable['Street'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df_affordable

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,Street,Avg_Price,city_coord
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.28939323910353)"
390,ANHALT ROAD,2.435000e+06,"(51.4803164, -0.1668011)"
405,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)"
422,APPLEGARTH ROAD,2.400000e+06,"(53.7486539, -0.3266704)"
855,BARONSMEAD ROAD,2.375000e+06,"(51.4773147, -0.239457)"
981,BEAUCLERC ROAD,2.480000e+06,"(30.2114523, -81.6179807)"
1102,BELVEDERE DRIVE,2.340000e+06,"(38.0728178, -78.4587964)"
1215,BICKENHALL STREET,2.208500e+06,"(51.5212014, -0.1589082)"
1253,BIRCHLANDS AVENUE,2.217000e+06,"(51.4483941, -0.1604676)"
1553,BRAMPTON GROVE,2.456875e+06,"(51.5899607, -0.3185249)"
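Several coordinates in the table above fall far from London (for example, latitude -41.27 for ALBION SQUARE), which suggests that a bare street name is ambiguous for the geocoder. A possible mitigation, sketched below under stated assumptions, is to qualify each query with the city and to discard results outside a rough bounding box. The box limits are approximate values chosen for illustration, not an official boundary, and `build_query`, `in_london`, `geocode_street` and `fake_geocode` are hypothetical names. `geocode_street` accepts any callable with the same shape as `geolocator.geocode`, so the sketch runs without network access.

```python
import math
from types import SimpleNamespace

# Approximate bounding box for Greater London (an assumption for illustration).
LONDON_BBOX = {'min_lat': 51.28, 'max_lat': 51.70, 'min_lon': -0.52, 'max_lon': 0.34}


def build_query(street, city='London', country='UK'):
    """Qualify a bare street name so the geocoder searches in the intended city."""
    return '{}, {}, {}'.format(street, city, country)


def in_london(lat, lon, bbox=LONDON_BBOX):
    """Check whether a coordinate pair lies inside the rough London bounding box."""
    return (bbox['min_lat'] <= lat <= bbox['max_lat']
            and bbox['min_lon'] <= lon <= bbox['max_lon'])


def geocode_street(geocode, street):
    """Return (lat, lon) for a street, or (nan, nan) if not found or outside the box.

    `geocode` is any callable mapping a query string to an object with
    .latitude and .longitude attributes, or to None when nothing is found.
    """
    location = geocode(build_query(street))
    if location is None or not in_london(location.latitude, location.longitude):
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)


def fake_geocode(query):
    """Stand-in geocoder for this sketch; it knows a single street."""
    known = {'ANHALT ROAD, London, UK': SimpleNamespace(latitude=51.4803164,
                                                       longitude=-0.1668011)}
    return known.get(query)


print(geocode_street(fake_geocode, 'ANHALT ROAD'))
print(geocode_street(fake_geocode, 'UNKNOWN STREET'))
```

In the notebook, `geolocator.geocode` could be passed in place of `fake_geocode`. Returning NaN for unresolved streets keeps the dataframe aligned and makes it easy to drop those rows before mapping; geopy's `RateLimiter` helper may also be worth considering when geocoding many streets in a row.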


In [67]:
df_affordable[['Latitude', 'Longitude']] = df_affordable['city_coord'].apply(pd.Series)
df_affordable

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]


Unnamed: 0,Street,Avg_Price,city_coord,Latitude,Longitude
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.28939323910353)",-41.273758,173.289393
390,ANHALT ROAD,2.435000e+06,"(51.4803164, -0.1668011)",51.480316,-0.166801
405,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)",51.499890,-0.189103
422,APPLEGARTH ROAD,2.400000e+06,"(53.7486539, -0.3266704)",53.748654,-0.326670
855,BARONSMEAD ROAD,2.375000e+06,"(51.4773147, -0.239457)",51.477315,-0.239457
981,BEAUCLERC ROAD,2.480000e+06,"(30.2114523, -81.6179807)",30.211452,-81.617981
1102,BELVEDERE DRIVE,2.340000e+06,"(38.0728178, -78.4587964)",38.072818,-78.458796
1215,BICKENHALL STREET,2.208500e+06,"(51.5212014, -0.1589082)",51.521201,-0.158908
1253,BIRCHLANDS AVENUE,2.217000e+06,"(51.4483941, -0.1604676)",51.448394,-0.160468
1553,BRAMPTON GROVE,2.456875e+06,"(51.5899607, -0.3185249)",51.589961,-0.318525


In [68]:
df = df_affordable.drop(columns=['city_coord'])
df

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
196,ALBION SQUARE,2.450000e+06,-41.273758,173.289393
390,ANHALT ROAD,2.435000e+06,51.480316,-0.166801
405,ANSDELL TERRACE,2.250000e+06,51.499890,-0.189103
422,APPLEGARTH ROAD,2.400000e+06,53.748654,-0.326670
855,BARONSMEAD ROAD,2.375000e+06,51.477315,-0.239457
981,BEAUCLERC ROAD,2.480000e+06,30.211452,-81.617981
1102,BELVEDERE DRIVE,2.340000e+06,38.072818,-78.458796
1215,BICKENHALL STREET,2.208500e+06,51.521201,-0.158908
1253,BIRCHLANDS AVENUE,2.217000e+06,51.448394,-0.160468
1553,BRAMPTON GROVE,2.456875e+06,51.589961,-0.318525


In [69]:
address = 'London, UK'
geolocator = Nominatim(user_agent="london_housing_notebook") # placeholder user agent, required by recent geopy releases
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinate of London : {}, {}.'.format(latitude, longitude))


The coordinate of London : 51.5073219, -0.1276474.


In [70]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=11) # Map creation
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [71]:
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare Client ID, read from the environment (variable name is an assumption)
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Client Secret; keep credentials out of the notebook
VERSION = '20181206'

### Step 4:  Data Modeling

We will analyze neighborhoods to recommend real estate where home buyers can make an investment. We will then recommend profitable venues according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

After exploring the dataset and gaining insights into it, we are ready to use a clustering methodology to analyze the properties. We will use the k-means clustering technique because it is fast and efficient in terms of computational cost, is highly flexible to account for changes in the London real estate market and is, for this purpose, sufficiently accurate.
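To make the technique concrete, here is a minimal NumPy sketch of the standard k-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat until nothing changes. It is an illustration on made-up 2-D points, not the code used in this notebook, which relies on scikit-learn's `KMeans`. The function name `kmeans_lloyd` and the choice to pass the initial centroids explicitly are assumptions made to keep the example deterministic.

```python
import numpy as np


def kmeans_lloyd(points, init_centroids, n_iter=100):
    """Run Lloyd's algorithm from the given starting centroids."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of hypothetical points.
toy_points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
toy_labels, toy_centroids = kmeans_lloyd(toy_points, init_centroids=toy_points[[0, 3]])
print(toy_labels)
print(toy_centroids)
```

Because the distances are Euclidean, features on very different scales (such as prices in the millions next to coordinates in degrees) would let the largest-valued feature dominate the result, so scaling the inputs is usually worth considering.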

In [72]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    """Query Foursquare for up to LIMIT venues within `radius` metres of each street."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Let requests build and encode the query string
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # GET request; fail early on HTTP errors rather than on a missing key
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant fields for each venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Street', 'Street Latitude', 'Street Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

    return nearby_venues
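The function above indexes straight into the JSON response, so a venue without a category or a response without groups would raise an exception partway through the loop. As a hedged sketch of a more defensive parser, the hypothetical helper `extract_venues` below reads the same keys as the notebook but tolerates missing ones. The sample payload is made up and mimics only the fields used here; it is not a full Foursquare response.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # A venue may carry no category at all; record None instead of failing.
        category = categories[0].get('name') if categories else None
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return rows


# Hypothetical payload containing only the keys this notebook reads.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Pub',
               'location': {'lat': 51.5, 'lng': -0.12},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 51.51, 'lng': -0.13},
               'categories': []}},
]}]}}

print(extract_venues(sample_payload))
print(extract_venues({}))
```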

In [73]:
# Displaying new dataframe : location_venues
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

ALBION SQUARE
ANHALT ROAD
ANSDELL TERRACE
APPLEGARTH ROAD
BARONSMEAD ROAD
BEAUCLERC ROAD
BELVEDERE DRIVE
BICKENHALL STREET
BIRCHLANDS AVENUE
BRAMPTON GROVE
BRIARDALE GARDENS
BROOKWAY
BURBAGE ROAD
BURY WALK
CALLCOTT STREET
CAMPDEN HILL ROAD
CAMPION ROAD
CANNING PLACE
CARLISLE ROAD
CARLTON GARDENS
CARLYLE COURT
CHALCOT SQUARE
CHARLES LANE
CHELSEA CRESCENT
CHESTER CLOSE NORTH
CHEYNE COURT
CHEYNE ROW
CHISWICK MALL
CITY ROAD
CLARENDON STREET
CLONCURRY STREET
COLBECK MEWS
COLLEGE CRESCENT
CORNWALL TERRACE MEWS
COURT LANE GARDENS
CRESCENT GROVE
DALEBURY ROAD
DEWHURST ROAD
DORIA ROAD
DOWNSHIRE HILL
DUCHESS WALK
ECCLESTON SQUARE MEWS
EGBERT STREET
EGERTON PLACE
ELM PARK ROAD
FLORAL STREET
FRANK DIXON WAY
FULTON MEWS
GERARD ROAD
GERRARD ROAD
GIRDLERS ROAD
GLOUCESTER CRESCENT
GORDON PLACE
GRAFTON SQUARE
GRAHAM TERRACE
HARMAN DRIVE
HARRIS STREET
HAVANNAH STREET
HAZLEWELL ROAD
HEREFORD MEWS
HERONDALE AVENUE
HIGHGATE HIGH STREET
HIGHWOOD HILL
HILLGATE PLACE
HOLLYCROFT AVENUE
HOLLYWOOD MEWS
HONEYWELL

In [74]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ALBION SQUARE,-41.273758,173.289393,The Free House,-41.273340,173.287364,Bar
1,ALBION SQUARE,-41.273758,173.289393,The Indian Cafe,-41.273308,173.286530,Indian Restaurant
2,ALBION SQUARE,-41.273758,173.289393,Queen's Gardens,-41.273671,173.291383,Park
3,ALBION SQUARE,-41.273758,173.289393,Urban,-41.274355,173.286317,New American Restaurant
4,ALBION SQUARE,-41.273758,173.289393,Fish Stop,-41.276010,173.289592,Fish & Chips Shop
5,ALBION SQUARE,-41.273758,173.289393,Deville Cafe,-41.271941,173.285535,Beer Garden
6,ALBION SQUARE,-41.273758,173.289393,Fresh Choice,-41.272194,173.287218,Supermarket
7,ALBION SQUARE,-41.273758,173.289393,The Bridge Street Collective,-41.272520,173.285517,Café
8,ALBION SQUARE,-41.273758,173.289393,Mango,-41.274460,173.285345,Indian Restaurant
9,ALBION SQUARE,-41.273758,173.289393,Hopgood's,-41.274749,173.283831,Restaurant


In [75]:
location_venues.groupby('Street').count() # Count

Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ALBION SQUARE,28,28,28,28,28,28
ANHALT ROAD,17,17,17,17,17,17
ANSDELL TERRACE,43,43,43,43,43,43
APPLEGARTH ROAD,4,4,4,4,4,4
BARONSMEAD ROAD,13,13,13,13,13,13
BEAUCLERC ROAD,5,5,5,5,5,5
BELVEDERE DRIVE,3,3,3,3,3,3
BICKENHALL STREET,59,59,59,59,59,59
BIRCHLANDS AVENUE,11,11,11,11,11,11
BRAMPTON GROVE,1,1,1,1,1,1


In [76]:
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))
# Count the number of unique venue categories.

There are 337 unique categories.


In [77]:
location_venues.shape

(4365, 7)

In [78]:
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="") # One-hot encode the venue categories
venues_onehot['Street'] = location_venues['Street'] # Add the Street column back to the dataframe
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1]) # Move Street to the first column
venues_onehot = venues_onehot[fixed_columns]
venues_onehot.head()

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Volleyball Court,Warehouse Store,Waterfront,Weight Loss Center,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [79]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Volleyball Court,Warehouse Store,Waterfront,Weight Loss Center,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,ALBION SQUARE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.035714,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
1,ANHALT ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.058824,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
2,ANSDELL TERRACE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.023256,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
3,APPLEGARTH ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
4,BARONSMEAD ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
5,BEAUCLERC ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
6,BELVEDERE DRIVE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
7,BICKENHALL STREET,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.016949,0.0,0.016949,0.016949,0.00000
8,BIRCHLANDS AVENUE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
9,BRAMPTON GROVE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.0,0.00,0.000000,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.00000
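The values in the grouped table are relative frequencies: averaging the one-hot columns per street gives the share of that street's venues in each category. A small worked example on made-up data (the street names and venues below are hypothetical) may make this easier to see.

```python
import pandas as pd

# Hypothetical venues: three near one street, one near another.
venues = pd.DataFrame({
    'Street': ['A STREET', 'A STREET', 'A STREET', 'B ROAD'],
    'Venue Category': ['Pub', 'Pub', 'Café', 'Park'],
})

# One indicator column per category; dtype=float keeps the later mean numeric.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Street', venues['Street'])

# The mean of 0/1 indicators per street is the share of venues in each category.
frequencies = onehot.groupby('Street').mean()
print(frequencies.round(2))
```

Here `A STREET` has two pubs out of three venues, so its `Pub` frequency is 2/3, and each row sums to 1. A street with a single venue, such as BRAMPTON GROVE in the counts above, ends up with one category at 1.0 and all others at 0.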


In [80]:
london_grouped.shape

(149, 338)

In [81]:
# Top 5 venue categories around each street
num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALBION SQUARE----
               venue  freq
0               Café  0.21
1                Pub  0.07
2         Restaurant  0.07
3  Indian Restaurant  0.07
4                Bar  0.07


----ANHALT ROAD----
                  venue  freq
0                   Pub  0.24
1         Grocery Store  0.12
2  Gym / Fitness Center  0.06
3   Japanese Restaurant  0.06
4                 Plaza  0.06


----ANSDELL TERRACE----
                venue  freq
0          Restaurant  0.09
1           Juice Bar  0.07
2  Italian Restaurant  0.07
3               Hotel  0.07
4      Clothing Store  0.07


----APPLEGARTH ROAD----
          venue  freq
0           Pub  0.25
1     Nightclub  0.25
2        Casino  0.25
3           Bar  0.25
4  Noodle House  0.00


----BARONSMEAD ROAD----
               venue  freq
0    Nature Preserve  0.08
1  Food & Drink Shop  0.08
2      Movie Theater  0.08
3        Coffee Shop  0.08
4               Park  0.08


----BEAUCLERC ROAD----
                 venue  freq
0                  S

In [82]:
# Return the most common venue categories near each street
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10
indicators = ['st', 'nd', 'rd']
columns = ['Street']
for ind in np.arange(num_top_venues):
    # 1st, 2nd and 3rd take their own suffix; every later rank here takes 'th'
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind+1, suffix))

In [83]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

In [84]:
venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALBION SQUARE,Café,Coffee Shop,Restaurant,Bar,Pub,Indian Restaurant,Beer Garden,Museum,Fish & Chips Shop,Art Gallery
1,ANHALT ROAD,Pub,Grocery Store,Cocktail Bar,Art Gallery,Gym / Fitness Center,Diner,Garden,Pier,French Restaurant,English Restaurant
2,ANSDELL TERRACE,Restaurant,Hotel,Juice Bar,Clothing Store,Italian Restaurant,Bakery,Pub,Garden,Indian Restaurant,Chinese Restaurant
3,APPLEGARTH ROAD,Pub,Bar,Nightclub,Casino,Coworking Space,Filipino Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant
4,BARONSMEAD ROAD,Movie Theater,Coffee Shop,Bookstore,Nature Preserve,Farmers Market,Thai Restaurant,Breakfast Spot,Park,Restaurant,Food & Drink Shop
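Two details of the ranking are worth spelling out. First, the suffix logic in the notebook is sufficient for ten columns but would mislabel ranks such as 21 or 22 if more columns were requested. Second, streets with few venues still receive ten "most common" categories, so the trailing entries have zero frequency and their order is arbitrary (APPLEGARTH ROAD has only 4 venues, for instance). The sketch below shows one possible way to handle both; `ordinal` and `top_categories` are hypothetical helpers, and the frequencies are the ANHALT ROAD values printed earlier in this notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(frequencies, n):
    """Return up to n categories with non-zero frequency, most frequent first.

    Ties are broken alphabetically so the result is reproducible.
    """
    present = [(name, freq) for name, freq in frequencies.items() if freq > 0]
    ranked = sorted(present, key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:n]]


anhalt_road = {'Pub': 0.24, 'Grocery Store': 0.12, 'Plaza': 0.06,
               'Japanese Restaurant': 0.06, 'Zoo': 0.0}

print([ordinal(i) for i in (1, 2, 3, 4, 11, 21)])
print(top_categories(anhalt_road, 10))
```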


In [85]:
venues_sorted.shape

(149, 11)

In [86]:
london_grouped.shape

(149, 338)

In [87]:
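london_grouped=df # from here on london_grouped holds Street, Avg_Price, Latitude and Longitude, not the venue frequencies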

In [88]:
# Clustering the properties into 5 clusters
kclusters = 5
london_grouped_clustering = london_grouped.drop(columns='Street')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering) # k-means clustering
kmeans.labels_[0:50]

array([1, 3, 0, 3, 2, 1, 2, 0, 0, 1, 3, 3, 3, 1, 2, 2, 1, 3, 0, 1, 4, 4,
       3, 1, 1, 0, 3, 4, 1, 0, 3, 2, 3, 2, 2, 4, 3, 3, 2, 0, 1, 2, 4, 0,
       4, 0, 0, 4, 0, 0], dtype=int32)
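Since `london_grouped` was reassigned to `df` before this cell, k-means here is fitted on `Avg_Price`, `Latitude` and `Longitude` rather than on the venue frequencies. If the intent is to group streets by their surrounding amenities, one possible approach is to fit on the frequency table and then join the labels back to the price table by street name, because the two tables do not have the same number of rows (149 versus 162). The sketch below illustrates this on made-up data; all street names, frequencies and prices are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-frequency profiles, one row per street.
profiles = pd.DataFrame({
    'Street': ['A STREET', 'B ROAD', 'C LANE', 'D WAY'],
    'Pub':  [0.6, 0.5, 0.0, 0.1],
    'Park': [0.0, 0.1, 0.7, 0.6],
    'Café': [0.4, 0.4, 0.3, 0.3],
})

# Fit on the frequency columns only.
features = profiles.drop(columns='Street')
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = pd.DataFrame({'Street': profiles['Street'], 'Cluster Labels': model.labels_})

# Hypothetical price table with one street that returned no venues.
prices = pd.DataFrame({
    'Street': ['A STREET', 'B ROAD', 'C LANE', 'D WAY', 'E ROW'],
    'Avg_Price': [2250000.0, 2300000.0, 2400000.0, 2450000.0, 2200000.0],
})

# Join by street name; streets without venues keep a missing label.
merged = prices.merge(labels, on='Street', how='left')
print(merged)
```

Merging on `Street` rather than assigning `labels_` positionally keeps each label attached to the right street even when some streets are absent from the venue table.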

In [89]:
london_grouped_clustering=df # alias of df (not a copy), so columns added below also appear in df
london_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
196,ALBION SQUARE,2450000.0,-41.273758,173.289393
390,ANHALT ROAD,2435000.0,51.480316,-0.166801
405,ANSDELL TERRACE,2250000.0,51.49989,-0.189103
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457


In [90]:
london_grouped_clustering.shape

(162, 4)

In [91]:
df.shape

(162, 4)

In [92]:
london_grouped_clustering.dtypes
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [93]:
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [94]:
london_grouped_clustering['Cluster Labels'] = kmeans.labels_ # Add the cluster labels
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street') # Add the top venue categories for each street
london_grouped_clustering.head(30) 

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,ALBION SQUARE,2450000.0,-41.273758,173.289393,1,Café,Coffee Shop,Restaurant,Bar,Pub,Indian Restaurant,Beer Garden,Museum,Fish & Chips Shop,Art Gallery
390,ANHALT ROAD,2435000.0,51.480316,-0.166801,3,Pub,Grocery Store,Cocktail Bar,Art Gallery,Gym / Fitness Center,Diner,Garden,Pier,French Restaurant,English Restaurant
405,ANSDELL TERRACE,2250000.0,51.49989,-0.189103,0,Restaurant,Hotel,Juice Bar,Clothing Store,Italian Restaurant,Bakery,Pub,Garden,Indian Restaurant,Chinese Restaurant
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667,3,Pub,Bar,Nightclub,Casino,Coworking Space,Filipino Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457,2,Movie Theater,Coffee Shop,Bookstore,Nature Preserve,Farmers Market,Thai Restaurant,Breakfast Spot,Park,Restaurant,Food & Drink Shop
981,BEAUCLERC ROAD,2480000.0,30.211452,-81.617981,1,Spa,Pizza Place,Automotive Shop,Harbor / Marina,Zoo,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1102,BELVEDERE DRIVE,2340000.0,38.072818,-78.458796,2,Pool,Playground,Auto Workshop,Zoo,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory
1215,BICKENHALL STREET,2208500.0,51.521201,-0.158908,0,Café,Pizza Place,Hotel,Restaurant,Bakery,Movie Theater,Gastropub,Garden,Bar,Italian Restaurant
1253,BIRCHLANDS AVENUE,2217000.0,51.448394,-0.160468,0,French Restaurant,Pub,Bakery,Coffee Shop,Gym / Fitness Center,Chinese Restaurant,Train Station,Lake,Brewery,Dance Studio
1553,BRAMPTON GROVE,2456875.0,51.589961,-0.318525,1,Home Service,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm


In [95]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11) # Map creation
# One colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster], # index the palette by the label itself
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [96]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
405,2250000.0,Restaurant,Hotel,Juice Bar,Clothing Store,Italian Restaurant,Bakery,Pub,Garden,Indian Restaurant,Chinese Restaurant
1215,2208500.0,Café,Pizza Place,Hotel,Restaurant,Bakery,Movie Theater,Gastropub,Garden,Bar,Italian Restaurant
1253,2217000.0,French Restaurant,Pub,Bakery,Coffee Shop,Gym / Fitness Center,Chinese Restaurant,Train Station,Lake,Brewery,Dance Studio
2225,2200000.0,,,,,,,,,,
2638,2250000.0,Restaurant,Pharmacy,Seafood Restaurant,Bank,Bakery,Grocery Store,Supermarket,Café,Outdoor Supply Store,Coffee Shop


In [97]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,2450000.0,Café,Coffee Shop,Restaurant,Bar,Pub,Indian Restaurant,Beer Garden,Museum,Fish & Chips Shop,Art Gallery
981,2480000.0,Spa,Pizza Place,Automotive Shop,Harbor / Marina,Zoo,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1553,2456875.0,Home Service,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm
1980,2492500.0,Supermarket,English Restaurant,Gym,Rental Car Location,Café,Coffee Shop,Fast Food Restaurant,Hardware Store,American Restaurant,Pub
2136,2461000.0,Soccer Field,Spa,Windmill,Bus Station,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant


In [98]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
855,2375000.0,Movie Theater,Coffee Shop,Bookstore,Nature Preserve,Farmers Market,Thai Restaurant,Breakfast Spot,Park,Restaurant,Food & Drink Shop
1102,2340000.0,Pool,Playground,Auto Workshop,Zoo,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory
2068,2375000.0,Pub,Hotel,Grocery Store,Park,Ice Cream Shop,Indian Restaurant,Pizza Place,Yoga Studio,Hostel,Bookstore
2129,2379652.7,Pub,Grocery Store,Indian Restaurant,Bakery,Park,Coffee Shop,Yoga Studio,Hotel,Ice Cream Shop,Pizza Place
2944,2367500.0,Hotel,Pub,Garden,Café,Coffee Shop,Italian Restaurant,Chinese Restaurant,Bar,Mediterranean Restaurant,Residential Building (Apartment / Condo)


In [99]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
390,2435000.0,Pub,Grocery Store,Cocktail Bar,Art Gallery,Gym / Fitness Center,Diner,Garden,Pier,French Restaurant,English Restaurant
422,2400000.0,Pub,Bar,Nightclub,Casino,Coworking Space,Filipino Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant
1632,2397132.0,Gym / Fitness Center,Italian Restaurant,Grocery Store,Health & Beauty Service,Coffee Shop,Breakfast Spot,Cricket Ground,Fast Food Restaurant,Ethiopian Restaurant,Event Space
1797,2400000.0,Art Gallery,Zoo,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market
1914,2445000.0,Bar,Grocery Store,Dance Studio,Athletics & Sports,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant


In [100]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2242,2300000.0,Farm,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farmers Market
2406,2286679.0,Pub,Café,Italian Restaurant,Bar,Coffee Shop,Convenience Store,Park,French Restaurant,Breakfast Spot,Belgian Restaurant
2686,2287500.0,Pub,Art Museum,Brewery,Gift Shop,Gym / Fitness Center,Creperie,Filipino Restaurant,Exhibit,Factory,Costume Shop
3377,2298000.0,Hotel,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm
4285,2265000.0,Pub,Shopping Mall,Farm,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant


## Results section

First, although the London housing market may be stuck, it remains an "evergreen" for property investment. We discuss our results from two perspectives.

The first is by neighborhood or London area. West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) may be considered highly profitable places to buy real estate, given the amenities and essential facilities around them, such as elementary schools, high schools, hospitals and grocery stores. Interestingly, South-West London (Wandsworth, Balham) and North London (Islington) are emerging as the next elite areas, with a wide range of amenities and facilities. Accordingly, a buyer might target under-priced real estate in these areas as an investment.

The second is by the five clusters we produced. Although all clusters offer a good range of facilities and amenities, we found two main patterns. The first pattern, Clusters 0, 2 and 4, may suit home buyers who want to live in 'green' areas with parks and waterfronts. The second pattern, Clusters 1 and 3, may suit people who love pubs, theaters and soccer.
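
One way to check which venue categories drive each of the two patterns described above is to tally the most common venues per cluster. The sketch below is a minimal, plain-Python illustration of that idea. The `rows` sample and the helper `top_venues_by_cluster` are hypothetical, loosely modelled on the cluster tables, and are not taken from the notebook.

```python
from collections import Counter

# Hypothetical sample: (cluster label, most common venue categories).
# These rows are illustrative only, not the notebook's actual data.
rows = [
    (2, ["Movie Theater", "Coffee Shop", "Park"]),
    (2, ["Pub", "Hotel", "Park"]),
    (3, ["Pub", "Bar", "Nightclub"]),
    (3, ["Bar", "Grocery Store", "Dance Studio"]),
]


def top_venues_by_cluster(rows, n=3):
    """Return the n most frequent venue categories for each cluster label."""
    counts = {}
    for label, venues in rows:
        # One Counter per cluster accumulates venue frequencies.
        counts.setdefault(label, Counter()).update(venues)
    return {label: counter.most_common(n) for label, counter in counts.items()}


summary = top_venues_by_cluster(rows)
for label in sorted(summary):
    print(label, summary[label])
```

Run against the full clustered table, a tally like this would make it easier to judge whether a cluster leans toward parks and green space or toward pubs and nightlife.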

## Concluding the Report

To summarize, according to Bloomberg News, the London housing market is stuck. It faces several headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall as much as 30 percent in the event of a disorderly exit from the European Union. In this scenario, it is urgent to adopt machine learning tools to help homebuyers in London make wise and effective decisions. The business problem we posed was therefore: how can we support homebuyers in purchasing suitable real estate in London in this uncertain economic and financial scenario?

To solve this problem, we clustered London neighborhoods in order to recommend venues, together with the current average price of real estate, where homebuyers can invest. We recommended profitable venues according to the amenities and essential facilities surrounding them, such as elementary schools, high schools, hospitals and grocery stores.

First, we gathered data on London properties and the relative price paid from the HM Land Registry (http://landregistry.data.gov.uk/). To explore and target recommended locations according to the presence of amenities and essential facilities, we also accessed data through the FourSquare API and arranged them as a dataframe for visualization. By merging the HM Land Registry price data with the FourSquare data on amenities and essential facilities surrounding the properties, we were able to recommend profitable real estate investments.

Second, the Methodology section comprised four stages: 1. Collect Inspection Data; 2. Explore and Understand Data; 3. Data preparation and preprocessing; 4. Modeling. In the modeling stage, we used the k-means clustering technique because it is fast and efficient in terms of computational cost, is flexible enough to account for changes in the London real estate market, and is accurate.

Finally, we concluded that although the London housing market may be stuck, it remains an "evergreen" for property investment. We discussed our results from two perspectives. First, we examined them by neighborhood or London area. Although West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) may be considered highly profitable places to buy real estate, given the surrounding amenities and essential facilities such as elementary schools, high schools, hospitals and grocery stores, South-West London (Wandsworth, Balham) and North London (Islington) are emerging as the next elite areas, with a wide range of amenities and facilities. Accordingly, a buyer might target under-priced real estate in these areas as an investment. Second, we analyzed our results according to the five clusters we produced. While Clusters 0, 2 and 4 may suit home buyers who want to live in 'green' areas with parks and waterfronts, Clusters 1 and 3 may suit people who love pubs, theaters and soccer.
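
The k-means step mentioned above alternates between assigning each neighborhood to its nearest centroid and moving each centroid to the mean of its members. The notebook's own clustering call is not shown in this section, so the following is only a minimal pure-Python sketch of that loop (Lloyd's algorithm). The function `kmeans`, its parameters, and the two-feature sample `points` (share of parks, share of pubs) are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical venue-frequency vectors: (share of parks, share of pubs).
points = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.10),  # 'green' neighborhoods
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.95),  # pub-heavy neighborhoods
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On well-separated data like this sample, the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initialization.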