# Capstone Project - The Battle of Neighborhoods!

Install and import required packages

In [15]:
# install the Google Trends API
# !pip install pytrends

# install the Daft Listings API
!pip install daftlistings

# install the Daft Scraper API
!pip install daft-scraper==1.2.7

# install geopandas, geopy
!pip install geopandas
!pip install geopy

# install folium
!pip install folium

# install matplotlib
!pip install matplotlib

# install pandas profiling
!pip install pandas-profiling==2.7.1

Collecting daft-scraper==1.2.7
  Downloading daft_scraper-1.2.7-py3-none-any.whl (59 kB)
Installing collected packages: daft-scraper
  Attempting uninstall: daft-scraper
    Found existing installation: daft-scraper 1.3.0
    Uninstalling daft-scraper-1.3.0:
      Successfully uninstalled daft-scraper-1.3.0
Successfully installed daft-scraper-1.2.7


In [1]:
# python packages
import pprint
import requests
import geopandas
import pyproj as pp
import numpy as np
import pandas as pd
import datetime
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# Google Trends API packages
from pytrends.request import TrendReq

# Daft listings API packages
from daftlistings import Daft, RentType, SortOrder, SortType, MapVisualization, SaleType
from joblib import Parallel, delayed
import time

# Daft Scraper API packages
from daft_scraper.search import DaftSearch, SearchType
from daft_scraper.search.options import (PropertyType, PropertyTypesOption, AdState, AdStateOption)
from daft_scraper.search.options_location import Location, LocationsOption

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Pandas Profiling
import pandas_profiling as pd_prof

# sklearn import for DBSCAN clustering
from sklearn.cluster import DBSCAN

## 1. Introduction
This section outlines the background to the business problem that I will try to solve in this capstone project.

The primary focus of this project is the city of Dublin and its 22 district areas.  

This project attempts the following analyses, each with a particular target audience in mind:  
1) **House Renting**: Finding an apartment to rent in Dublin city is very challenging given the housing crisis. The target audience here is people looking for rental apartments in the city. The aim is to filter properties by the user's preferences for apartment characteristics, neighborhood, price and the crime rate in the neighborhood where the property is located.  
2) **Neighborhood Clustering**: The approach here is to cluster the districts of Dublin city based on the venues and venue categories present in each district, and to visualize the result. This gives a sense of how districts differ in terms of places, amenities and transport routes and, most importantly, whether distance from the city centre plays a role in driving this.  
3) **Google Trends**: This data would act as one of the features in a regression analysis that predicts the rent price of each apartment. The hypothesis is that Google Trends interest in searches for apartments to rent in a particular neighborhood affects rental prices there. The analysis in the subsequent report tests this hypothesis.  
4) **Crimes**: This data would act as an additional filter for users looking to rent an apartment and would also feed into the clustering of districts planned in point 2 above. It would be interesting to use visualization techniques again to find out whether crime is related to the geographical attributes of a particular neighborhood.  

Overall, the aim is to help people looking for rentals in Dublin city filter neighborhoods and properties based on their preferences and on other local factors that drive their decision making.  
Beyond that, the visualization techniques used to analyse the different datasets could help stakeholders with government planning and business marketing decisions, and give general readers some insight into their own city!

## 2. Data
This section describes the data sources used for this assignment, with sample examples of each.

### 2) Daft Listings API
As seen below, this is a very useful API (https://github.com/AnthonyBloomer/daftlistings/) that is simple to use and quick to get up to speed with.  
The sample example below shows a search using the API to get all furnished 3-bed rental apartments in Dublin city with a maximum price of 2800 EUR.  
We fetch all such listings and build a dataframe containing the useful features of each property: `price`, `facilities`, `formalised_address`, `num_bedrooms`, `num_bathrooms`, `latitude` and `longitude`.  
This data helps us recommend properties to the targeted end user, and the geographical coordinates help us analyse the data visually.  
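
Once the listings are in a dataframe with the columns named above, the preference filter could look something like the following sketch. The records here are made-up placeholders rather than real Daft.ie listings, and `filter_listings` is a hypothetical helper, not part of the daftlistings API.

```python
import pandas as pd

# Placeholder records shaped like the features listed above (not real listings).
listings = [
    {"price": 2500, "facilities": ["Parking", "Dishwasher"], "formalised_address": "Example Street, Dublin 2",
     "num_bedrooms": 3, "num_bathrooms": 2, "latitude": 53.34, "longitude": -6.25},
    {"price": 3100, "facilities": ["Parking"], "formalised_address": "Sample Road, Dublin 4",
     "num_bedrooms": 3, "num_bathrooms": 2, "latitude": 53.33, "longitude": -6.23},
    {"price": 2200, "facilities": ["Parking"], "formalised_address": "Demo Avenue, Dublin 8",
     "num_bedrooms": 2, "num_bathrooms": 1, "latitude": 53.34, "longitude": -6.29},
    {"price": 2700, "facilities": ["Dishwasher"], "formalised_address": "Test Lane, Dublin 1",
     "num_bedrooms": 3, "num_bathrooms": 1, "latitude": 53.35, "longitude": -6.26},
]
listings_df = pd.DataFrame(listings)


def filter_listings(df, max_price, min_bedrooms, required_facilities=()):
    """Keep rows within budget, with enough bedrooms and all required facilities."""
    mask = (df["price"] <= max_price) & (df["num_bedrooms"] >= min_bedrooms)
    for facility in required_facilities:
        # each 'facilities' cell is assumed to hold a list of strings
        mask &= df["facilities"].apply(lambda items, f=facility: f in items)
    return df[mask].reset_index(drop=True)


matches = filter_listings(listings_df, max_price=2800, min_bedrooms=3,
                          required_facilities=["Parking"])
print(matches[["formalised_address", "price", "num_bedrooms"]])
```

Keeping the filter as a plain function over the dataframe means further preferences, such as a crime-rate threshold per neighborhood, could be added later as extra mask terms.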

### 4) Foursquare Places API
Finally, the last part follows an approach similar to the one taken in previous weeks of this course, where we analysed different neighborhoods in Toronto, Canada.  
The challenge here is to list the districts that make up Dublin city and obtain their respective geographical coordinates using the Nominatim geolocator.  
The sample code given below shows how we plan to construct the final dataframe, where each row is an individual venue together with its attributes, including its geolocation coordinates.  
One-hot encoding can be used to obtain features representing the distribution of venue types, as well as the most popular and dominant venue types, in each district of Dublin city.  
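
The geocoding step mentioned above might be sketched as follows. This is only an illustration under stated assumptions: the query format `"<district>, Dublin, Ireland"`, the `user_agent` string and the helper names are hypothetical, and the one-second pause reflects the public Nominatim service's limit of one request per second.

```python
import time


def district_query(district, city="Dublin", country="Ireland"):
    """Build the free-text query sent to the geocoder (assumed format)."""
    return f"{district}, {city}, {country}"


def geocode_districts(districts, user_agent="dublin_capstone_example", pause_seconds=1.0):
    """Return one dict per district with its latitude and longitude (None if not found)."""
    # imported here so district_query can be used without geopy installed
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    rows = []
    for district in districts:
        location = geolocator.geocode(district_query(district))
        rows.append({
            "Neighborhood": district,
            "Latitude": location.latitude if location is not None else None,
            "Longitude": location.longitude if location is not None else None,
        })
        time.sleep(pause_seconds)  # stay within the public service's rate limit
    return rows
```

Calling `pd.DataFrame(geocode_districts(["Dublin 1", "Dublin 2"]))` would then give a frame with `Neighborhood`, `Latitude` and `Longitude` columns, the same column names that appear in the merged dataframe later in this notebook.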

In [85]:
print('There are {} unique categories.'.format(len(dublin_venues['Venue Category'].unique())))

There are 192 unique categories.


In [86]:
# one hot encoding
dublin_onehot = pd.get_dummies(dublin_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dublin_onehot['Neighborhood'] = dublin_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(dublin_onehot)
cols.insert(0, cols.pop(cols.index('Neighborhood')))
dublin_onehot = dublin_onehot.loc[:, cols]

dublin_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [87]:
dublin_onehot.shape

(1646, 193)

In [89]:
dublin_grouped = dublin_onehot.groupby('Neighborhood').mean().reset_index()
dublin_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Ballinteer,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.013158,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.026316,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158,0.052632,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.105263,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Blackrock,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.028571,0.014286,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.014286,0.0,0.0,0.028571,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0
2,Clondalkin,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Dublin 1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.1,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
4,Dublin 10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.137931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Dublin 11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.096774,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Dublin 12,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.051282,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.128205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dublin 13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dublin 14,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.06,0.0,0.06,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
9,Dublin 15,0.012987,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.051948,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.025974,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.038961,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.012987,0.012987,0.012987,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [90]:
dublin_grouped.shape

(25, 193)

In [91]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [95]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dublin_grouped['Neighborhood']

for ind in np.arange(dublin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dublin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ballinteer,Supermarket,Café,Pub,Coffee Shop,Clothing Store,Department Store,Gym,Furniture / Home Store,Italian Restaurant,Park
1,Blackrock,Pub,Café,Train Station,Park,Coffee Shop,Shopping Mall,Supermarket,Bar,Thai Restaurant,Italian Restaurant
2,Clondalkin,Hotel,Bar,Convenience Store,Coffee Shop,Restaurant,Supermarket,Chinese Restaurant,Light Rail Station,Gym,Golf Course
3,Dublin 1,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
4,Dublin 10,Supermarket,Pub,Park,Gym,Hotel,Coffee Shop,Fast Food Restaurant,Hardware Store,Bowling Alley,Café
5,Dublin 11,Supermarket,Park,Convenience Store,Pub,Grocery Store,Sandwich Place,Tram Station,Sporting Goods Shop,Breakfast Spot,Chinese Restaurant
6,Dublin 12,Supermarket,Park,Convenience Store,Fast Food Restaurant,Tram Station,Coffee Shop,Grocery Store,Hardware Store,Shopping Mall,Motorcycle Shop
7,Dublin 13,Seafood Restaurant,Pub,Café,Fish Market,Ice Cream Shop,Harbor / Marina,Golf Course,Coffee Shop,Bar,Breakfast Spot
8,Dublin 14,Supermarket,Pub,Coffee Shop,Clothing Store,Café,Department Store,Pizza Place,Restaurant,Discount Store,Pharmacy
9,Dublin 15,Coffee Shop,Supermarket,Clothing Store,Italian Restaurant,Train Station,Fast Food Restaurant,Furniture / Home Store,Pub,Asian Restaurant,Sporting Goods Shop


In [101]:
# set number of clusters
kclusters = 3

dublin_grouped_clustering = dublin_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dublin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 1, 2,
       0, 1, 2], dtype=int32)
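
A quick way to see how balanced the three clusters are is to count the labels. The sketch below copies the 25 labels printed above into an array (in the notebook itself `kmeans.labels_` would be used directly) and counts them with `np.bincount`.

```python
import numpy as np

# labels copied from the kmeans.labels_ output shown above
labels = np.array([1, 1, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 1, 2,
                   0, 1, 2])

# number of neighborhoods assigned to each cluster id
cluster_sizes = np.bincount(labels, minlength=3)

for cluster_id, size in enumerate(cluster_sizes):
    print(f"Cluster {cluster_id}: {size} neighborhoods")
```

If one cluster held almost all neighborhoods, that could be a hint to revisit the choice of `kclusters`.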

In [100]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dublin_merged = dublin_df

# join the sorted venues onto dublin_merged so each neighborhood keeps its latitude/longitude
dublin_merged = dublin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# neighborhoods with no match in neighborhoods_venues_sorted get the placeholder label 3
dublin_merged['Cluster Labels'] = dublin_merged['Cluster Labels'].fillna(3).astype(int)

dublin_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dublin 1,53.352488,-6.256646,0,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
1,Dublin 2,53.33894,-6.252713,0,Café,Coffee Shop,Park,Hotel,Plaza,Pub,Restaurant,Cocktail Bar,Grocery Store,Bakery
2,Dublin 3,53.361223,-6.185467,1,Café,Pub,Boat or Ferry,Beach,Park,Scenic Lookout,Convenience Store,Train Station,Restaurant,Port
3,Dublin 4,53.327507,-6.227486,0,Pub,Café,Restaurant,Coffee Shop,Hotel,Park,Grocery Store,Gastropub,Pizza Place,Plaza
4,Dublin 5,53.383454,-6.181923,2,Supermarket,Grocery Store,Train Station,Convenience Store,Fast Food Restaurant,Pub,Shopping Mall,Café,Pizza Place,Bus Stop


In [131]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dublin_merged['Latitude'], dublin_merged['Longitude'], dublin_merged['Neighborhood'], dublin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters
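
One thing to watch in the map cell: the merge step fills missing cluster labels with the placeholder value 3, while the `rainbow` list holds only `kclusters` colors, so indexing it with label 3 would fail if any neighborhood ended up without a cluster. A minimal sketch of a palette that reserves one extra color for that placeholder follows; `cluster_palette` is a hypothetical helper.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(n_labels):
    """Return n_labels hex colors evenly spaced along the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_labels))
    return [colors.rgb2hex(c) for c in rgba]


kclusters = 3
# one color per k-means cluster, plus one for the placeholder label 3
palette = cluster_palette(kclusters + 1)
print(palette)
```

With this palette, `palette[cluster]` is defined for every label from 0 to 3.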

In [2]:
# exploring the daft scraper API for the latest version of Daft.ie

In [2]:
# call to the API for fetching all SALE AGREED properties in Dublin
options = [
    PropertyTypesOption([PropertyType.ALL]),
    LocationsOption([Location.DUBLIN_COUNTY]),
    AdStateOption(AdState.AGREED)
]

api = DaftSearch(SearchType.SALE)
listings = api.search(options)

In [3]:
print(len(listings))

cnt_price = 0
cnt_abr_price = 0

for listing in listings:
    if hasattr(listing, 'price'):
        cnt_price += 1
    if hasattr(listing, 'abbreviatedPrice'):
        cnt_abr_price += 1

print(cnt_price, cnt_abr_price)

2920
2848 2920
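
The counts above show that every listing carries an `abbreviatedPrice` but some lack a numeric `price`. One possible way to fill the gap is to parse strings such as `€745k` into numbers. This is a sketch only: `parse_abbreviated_price` is a hypothetical helper, and support for an `m` (millions) suffix is an assumption, since only `k` values appear in the listing output in this notebook.

```python
import re
import pandas as pd

# suffix multipliers; 'm' is an assumed format not seen in the sample output
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}
_PRICE_PATTERN = re.compile(r"€\s*(\d[\d,]*(?:\.\d+)?)\s*([kKmM]?)")


def parse_abbreviated_price(text):
    """Convert strings like '€745k' to a float; return None if the text is not a price."""
    if not isinstance(text, str):
        return None
    match = _PRICE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIERS.get(match.group(2).lower(), 1)


# small placeholder frame mirroring the two price columns
sample = pd.DataFrame({
    "price": [None, 345000.0],
    "abbreviatedPrice": ["€745k", "€345k"],
})
sample["price_filled"] = sample["price"].fillna(
    sample["abbreviatedPrice"].map(parse_abbreviated_price)
)
print(sample)
```

Because the abbreviated value is rounded, a price recovered this way is only approximate and may be worth flagging in a separate column.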


In [4]:
test_df2 = pd.DataFrame([vars(f) for f in listings])

In [5]:
test_df2.head()

Unnamed: 0,_id,sections,seoFriendlyPath,category,seller,propertyType,daftShortcode,featuredLevel,publishDate,media,state,abbreviatedPrice,point,seoTitle,numBedrooms,saleType,ber,title,pageBranding,label,newHome,url,price,numBathrooms,floorArea,propertySize,sticker,priceHistory
0,2554301,"[Property, New Homes, Houses]",/new-home-for-sale/lynton-old-connaught-avenue...,New Homes,"{'name': 'Derek Byrne', 'branch': 'Derek Byrne...",Houses,9190876,FEATURED,1601485470000,"{'totalImages': 5, 'hasVideo': False, 'hasBroc...",SALE_AGREED,€745k,"{'point_type': 'Point', 'coordinates': [-6.119...","Lynton, Old Connaught Avenue, Bray, Co. Wicklow",5.0,[For Sale],{'rating': 'A3'},"Lynton, Old Connaught Avenue, Bray, Co. Wicklow",{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,SALE AGREED,"{'totalUnitTypes': 1, 'subUnits': [{'id': 2554...",https://www.daft.ie//new-home-for-sale/lynton-...,,,,,,
1,2611416,"[Property, Residential, House, Semi-Detached H...",/for-sale/semi-detached-house-51-bramblefield-...,Buy,"{'name': 'Thomas Fitzpatrick', 'branch': 'Sher...",Semi-D,12791647,FEATURED,1615021385000,"{'totalImages': 12, 'hasVideo': False, 'hasBro...",SALE_AGREED,€345k,"{'point_type': 'Point', 'coordinates': [-6.414...","51 Bramblefield Crescent, Clonee, Dublin 15",3.0,[For Sale],"{'epi': '224.69 kWh/m2/yr', 'rating': 'C3', 'c...","51 Bramblefield Crescent, Clonee, Dublin 15",{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,SALE AGREED,,https://www.daft.ie//for-sale/semi-detached-ho...,345000.0,3.0,"{'unit': 'METRES_SQUARED', 'value': '101'}",101 m²,,
2,2609624,"[Property, Residential, House, Semi-Detached H...",/for-sale/semi-detached-house-1-moatfield-park...,Buy,"{'alternativePhone': '087 333 3823', 'name': '...",Semi-D,12787240,FEATURED,1613984495000,"{'totalImages': 21, 'hasVideo': True, 'hasBroc...",SALE_AGREED,€380k,"{'point_type': 'Point', 'coordinates': [-6.193...","1 Moatfield Park, Coolock, Artane, Dublin 5",3.0,[For Sale],"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","1 Moatfield Park, Coolock, Artane, Dublin 5",{'standardLogo': 'https://photos.cdn.dsch.ie/M...,SALE AGREED,,https://www.daft.ie//for-sale/semi-detached-ho...,380000.0,1.0,"{'unit': 'METRES_SQUARED', 'value': '91'}",91 m²,,
3,1118178,"[Property, New Homes, Houses]",/new-home-for-sale/robswall-by-hollybrook-home...,New Homes,"{'alternativePhone': '01 634 2466', 'name': 'C...",Houses,941204,FEATURED,1605640420000,"{'totalImages': 11, 'hasVideo': False, 'hasBro...",SALE_AGREED,€680k,"{'point_type': 'Point', 'coordinates': [-6.134...","Robswall by Hollybrook Homes, Coast Road, Mala...",2.0,[For Sale],{'rating': 'A3'},"Robswall by Hollybrook Homes, Coast Road, Mala...",{'standardLogo': 'https://photos.cdn.dsch.ie/M...,SALE AGREED,"{'totalUnitTypes': 1, 'subUnits': [{'id': 1190...",https://www.daft.ie//new-home-for-sale/robswal...,,,,,Easy Commute,
4,2930456,"[Property, Residential, House, Terraced House]",/for-sale/terraced-house-9-maolbuille-road-gla...,Buy,"{'name': 'Jennifer Byrne', 'branch': 'Sherry F...",Terrace,13895826,PREMIUM,1615021401000,"{'totalImages': 13, 'hasVideo': False, 'hasBro...",SALE_AGREED,€385k,"{'point_type': 'Point', 'coordinates': [-6.268...","9 Maolbuille Road, Glasnevin, Dublin 11",3.0,[For Sale],"{'epi': '364.68 kWh/m2/yr', 'rating': 'E2', 'c...","9 Maolbuille Road, Glasnevin, Dublin 11",{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,SALE AGREED,,https://www.daft.ie//for-sale/terraced-house-9...,385000.0,2.0,"{'unit': 'METRES_SQUARED', 'value': '87'}",87 m²,,


In [6]:
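# only keep columns of interest; .copy() gives an independent DataFrame,
# so later column assignments do not trigger SettingWithCopyWarning
final_df = test_df2[['title', 'propertyType', 'category', 'numBedrooms', 'numBathrooms', 'price', 'abbreviatedPrice', 'ber', 'point', 'publishDate', 'seller', 'floorArea']].copy()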

In [7]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea
0,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow",Houses,New Homes,5.0,,,€745k,{'rating': 'A3'},"{'point_type': 'Point', 'coordinates': [-6.119...",1601485470000,"{'name': 'Derek Byrne', 'branch': 'Derek Byrne...",
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Semi-D,Buy,3.0,3.0,345000.0,€345k,"{'epi': '224.69 kWh/m2/yr', 'rating': 'C3', 'c...","{'point_type': 'Point', 'coordinates': [-6.414...",1615021385000,"{'name': 'Thomas Fitzpatrick', 'branch': 'Sher...","{'unit': 'METRES_SQUARED', 'value': '101'}"
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.193...",1613984495000,"{'alternativePhone': '087 333 3823', 'name': '...","{'unit': 'METRES_SQUARED', 'value': '91'}"
3,"Robswall by Hollybrook Homes, Coast Road, Mala...",Houses,New Homes,2.0,,,€680k,{'rating': 'A3'},"{'point_type': 'Point', 'coordinates': [-6.134...",1605640420000,"{'alternativePhone': '01 634 2466', 'name': 'C...",
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Terrace,Buy,3.0,2.0,385000.0,€385k,"{'epi': '364.68 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.268...",1615021401000,"{'name': 'Jennifer Byrne', 'branch': 'Sherry F...","{'unit': 'METRES_SQUARED', 'value': '87'}"


In [8]:
# derive the neighbourhood from the title: split on commas and take the sixth token,
# falling back to the last available token when the title has fewer parts
new = final_df["title"].str.split(",", n=6, expand=True)

for fallback in [4, 3, 2, 1, 0]:
    new[5] = new[5].fillna(new[fallback])
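
The fallback chain above can also be sketched more directly. This is a minimal alternative, assuming the neighbourhood is always the text after the final comma (which is not quite what the cell above does for titles with seven or more parts); the toy titles are copied from the listings shown earlier.

```python
import pandas as pd

# Toy titles in the same "street, area, district" shape as the listings above.
titles = pd.Series([
    "51 Bramblefield Crescent, Clonee, Dublin 15",
    "1 Moatfield Park, Coolock, Artane, Dublin 5",
    "Dublin 7",
])

# Take the text after the final comma and trim the surrounding whitespace.
last_token = titles.str.rsplit(",", n=1).str[-1].str.strip()
print(last_token.tolist())  # ['Dublin 15', 'Dublin 5', 'Dublin 7']
```

Stripping the whitespace at this point would also remove the leading space that shows up in the neighbourhood names later on.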

In [9]:
final_df["neighbourhood"]= new[5] 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [10]:
# drop neighbourhoods that have fewer than n listings
col = 'neighbourhood'
n = 10
final_df = final_df[final_df.groupby(col)[col].transform('count').ge(n)]

# drop listings whose title only resolves to the vague ' Co. Dublin'
vague_n = final_df[final_df['neighbourhood'] == ' Co. Dublin'].index
final_df.drop(vague_n , inplace=True)
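
To see how the group-size filter behaves, here is a small sketch on made-up data. The column names match the notebook; the threshold `min_listings` and the toy rows are assumptions for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "neighbourhood": ["Dublin 1", "Dublin 1", "Dublin 1", "Dublin 2", "Dublin 3", "Dublin 3"],
    "price": [250000, 300000, 275000, 400000, 350000, 360000],
})

min_listings = 2  # hypothetical threshold for this toy data

# transform('count') returns, for every row, the size of that row's group,
# so the comparison yields a row-aligned boolean mask.
group_sizes = toy.groupby("neighbourhood")["neighbourhood"].transform("count")
kept = toy[group_sizes.ge(min_listings)]
print(kept["neighbourhood"].unique().tolist())  # ['Dublin 1', 'Dublin 3']
```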

In [11]:
final_df.groupby(['neighbourhood']).size()

neighbourhood
 Dublin 1      64
 Dublin 10     32
 Dublin 11    154
 Dublin 12    132
 Dublin 13     97
 Dublin 14    126
 Dublin 15    238
 Dublin 16     78
 Dublin 17     14
 Dublin 18    100
 Dublin 2      32
 Dublin 20     25
 Dublin 22     60
 Dublin 24    127
 Dublin 3     118
 Dublin 4     123
 Dublin 5     124
 Dublin 6     124
 Dublin 6W     38
 Dublin 7     115
 Dublin 8     152
 Dublin 9     127
dtype: int64

In [12]:
len(final_df)

2200

In [13]:
# fill missing price values with figures parsed from the abbreviatedPrice column
final_df['val'] = final_df['abbreviatedPrice'].str.replace('€', '', regex=False)
final_df['val'] = final_df['val'].str.replace('+', '', regex=False)
final_df['val'] = final_df['val'].str.replace('POA', '0', regex=False)

# strip the k/M suffix to get the number, then multiply by 1e3 or 1e6 accordingly
multiplier = final_df['val'].str.extract(r'[\d\.]+([kM]+)', expand=False).map({'k': 10**3, 'M': 10**6}).fillna(1)
final_df['val'] = final_df['val'].replace(r'[kM]+$', '', regex=True).astype(float) * multiplier

final_df['price'] = final_df['price'].fillna(final_df['val'])
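
The same parsing rule can be written as a small standalone function, which is easier to test on individual strings. This is only a sketch: the function name `parse_abbreviated_price` is hypothetical, and it assumes the abbreviated prices only ever use the `€`, `+`, `k`, `M` and `POA` conventions handled above.

```python
import re

_SUFFIX = {"k": 1_000, "M": 1_000_000}

def parse_abbreviated_price(text):
    """Convert strings such as '€345k' or '€1.2M+' to a float; 'POA' maps to 0."""
    cleaned = text.replace("€", "").replace("+", "").strip()
    if cleaned == "POA":
        return 0.0
    match = re.fullmatch(r"([\d.]+)([kM]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognised price format: {text!r}")
    number, suffix = match.groups()
    # An empty suffix means the figure is already in euro.
    return float(number) * _SUFFIX.get(suffix, 1)

print(parse_abbreviated_price("€345k"))  # 345000.0
```

It could then be applied with `final_df['abbreviatedPrice'].map(parse_abbreviated_price)`, provided the column has no missing values.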



In [14]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea,neighbourhood,val
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Semi-D,Buy,3.0,3.0,345000.0,€345k,"{'epi': '224.69 kWh/m2/yr', 'rating': 'C3', 'c...","{'point_type': 'Point', 'coordinates': [-6.414...",1615021385000,"{'name': 'Thomas Fitzpatrick', 'branch': 'Sher...","{'unit': 'METRES_SQUARED', 'value': '101'}",Dublin 15,345000.0
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.193...",1613984495000,"{'alternativePhone': '087 333 3823', 'name': '...","{'unit': 'METRES_SQUARED', 'value': '91'}",Dublin 5,380000.0
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Terrace,Buy,3.0,2.0,385000.0,€385k,"{'epi': '364.68 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.268...",1615021401000,"{'name': 'Jennifer Byrne', 'branch': 'Sherry F...","{'unit': 'METRES_SQUARED', 'value': '87'}",Dublin 11,385000.0
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Apartment,Buy,2.0,2.0,395000.0,€395k,"{'epi': '40.56 kWh/m2/yr', 'rating': 'C3', 'co...","{'point_type': 'Point', 'coordinates': [-6.237...",1615021364000,"{'name': 'Emma Curran', 'branch': 'Sherry Fitz...","{'unit': 'METRES_SQUARED', 'value': '56'}",Dublin 4,395000.0
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...","{'point_type': 'Point', 'coordinates': [-6.392...",1615021423000,"{'name': 'Ronan Healy', 'branch': 'Sherry Fitz...","{'unit': 'METRES_SQUARED', 'value': '71'}",Dublin 22,195000.0


In [15]:
# remove rows with 0 price
zero_val = final_df[final_df['val'] == 0].index
final_df.drop(zero_val , inplace=True)

final_df.groupby(['val']).size()

val
75000.0       1
95000.0       1
129000.0      1
130000.0      1
135000.0      2
139000.0      2
140000.0      5
145000.0      1
150000.0      5
159000.0      1
160000.0      7
165000.0      5
169000.0      2
170000.0      8
175000.0      9
179000.0      3
180000.0     10
185000.0      9
188000.0      1
189000.0      2
190000.0     12
194000.0      1
195000.0     17
197000.0      1
198000.0      1
199000.0     13
200000.0     23
205000.0      3
209000.0      3
210000.0     17
212000.0      1
215000.0     21
219000.0      1
220000.0     28
223000.0      1
225000.0     49
229000.0      7
230000.0     27
235000.0     37
238000.0      1
239000.0      2
240000.0     25
245000.0     20
249000.0      9
250000.0     67
255000.0      8
259000.0      5
260000.0     34
265000.0     25
267000.0      1
269000.0      5
270000.0     25
275000.0     73
279000.0      5
280000.0     19
285000.0     43
286000.0      1
288000.0      1
289000.0      2
290000.0     22
295000.0     60
298000.0      1
2990

In [16]:
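# expand the seller dict column into one column per key
test_df3 = pd.concat([final_df.drop(['seller'], axis=1), final_df['seller'].apply(pd.Series)], axis=1)

# likewise expand the floorArea dict into 'unit' and 'value' columns
test_df3 = pd.concat([test_df3.drop(['floorArea'], axis=1), test_df3['floorArea'].apply(pd.Series)], axis=1)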

In [17]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,neighbourhood,val,name,branch,phone,squareLogo,backgroundColour,address,licenceNumber,showContactForm,sellerType,sellerId,standardLogo,alternativePhone,profileImage,phoneWhenToCall,0,unit,value
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Semi-D,Buy,3.0,3.0,345000.0,€345k,"{'epi': '224.69 kWh/m2/yr', 'rating': 'C3', 'c...","{'point_type': 'Point', 'coordinates': [-6.414...",1615021385000,Dublin 15,345000.0,Thomas Fitzpatrick,Sherry FitzGerald Clonee,01-801 8090,https://photos.cdn.dsch.ie/NDkxZjhiMzNkOWQwZDU...,#003764,"Unit 2 Abbey House,\r\nMain Street,\r\nClonee,...",2183,True,BRANDED_AGENT,10888,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,101
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.193...",1613984495000,Dublin 5,380000.0,Avril Ward MIPAV MMCEPI,Delaney Estates,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,True,BRANDED_AGENT,453,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,087 333 3823,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,,,METRES_SQUARED,91
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Terrace,Buy,3.0,2.0,385000.0,€385k,"{'epi': '364.68 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.268...",1615021401000,Dublin 11,385000.0,Jennifer Byrne,Sherry FitzGerald Drumcondra,01 837 3737,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#0f3a5d,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,2183,True,BRANDED_AGENT,2655,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,https://photos.cdn.dsch.ie/M2FjOGQ4NDVlNTI0MDM...,,,METRES_SQUARED,87
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Apartment,Buy,2.0,2.0,395000.0,€395k,"{'epi': '40.56 kWh/m2/yr', 'rating': 'C3', 'co...","{'point_type': 'Point', 'coordinates': [-6.237...",1615021364000,Dublin 4,395000.0,Emma Curran,Sherry FitzGerald Ballsbridge,01 269 8888,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003866,"176 Pembroke Road, \r\nBallsbridge, \r\nDublin...",2183,True,BRANDED_AGENT,9990,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,56
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...","{'point_type': 'Point', 'coordinates': [-6.392...",1615021423000,Dublin 22,195000.0,Ronan Healy,Sherry FitzGerald Tallaght,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,,2183,True,BRANDED_AGENT,10821,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,71


In [18]:
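# areas not given in square metres are multiplied by 4046.86 (the number of m² in an acre);
# note that METRES_SQUARED values are left as strings, so the column ends up with mixed types
test_df3['value'] = test_df3['value'].where(test_df3['unit'] == 'METRES_SQUARED', test_df3['value'].astype(float) * 4046.86)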
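
One way to avoid the mixed string/float column is to convert to numbers before applying the unit rule. The sketch below uses toy data; the unit label `"ACRES"` and the column name `area_m2` are assumptions for illustration, and the conversion factor is the one used in the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "unit": ["METRES_SQUARED", "ACRES", None],  # "ACRES" is an assumed label
    "value": ["101", "0.5", None],
})

SQ_M_PER_ACRE = 4046.86  # same factor as the cell above

# Convert everything to float first; unparseable or missing entries become NaN.
area = pd.to_numeric(toy["value"], errors="coerce")

# Keep square-metre values as they are and scale everything else.
toy["area_m2"] = area.where(toy["unit"] == "METRES_SQUARED", area * SQ_M_PER_ACRE)
print(toy["area_m2"])
```

With a purely numeric column, `groupby` and sorting treat 100 and 121.4058 on the same scale instead of listing floats and strings separately.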

In [19]:
test_df3.groupby(['value']).size()

value
121.4058               1
161.8744               1
202.34300000000002     3
404.68600000000004     2
445.1546               1
607.029                1
809.3720000000001      1
1011.715               1
1214.058               1
3480.2996000000003     1
3601.7054000000003     1
4046.86                1
6596.3818              1
100                   29
101                   17
102                   24
103                   19
104                    9
105                   24
106                   10
107                   18
108                   12
109                   14
110                   15
111                   12
112                   11
113                    4
114                    5
115                    9
116                   14
117                    5
118                    9
119                    3
120                   16
121                    4
122                    8
123                    7
124                   12
125                   13
126                

In [20]:
# split out the point column into (long, lat) values as 2 new columns
test_df3 = pd.concat([test_df3.drop(['point'], axis=1), test_df3['point'].apply(pd.Series)], axis=1)

test_df3[['longitude','latitude']] = pd.DataFrame(test_df3.coordinates.tolist(), index=test_df3.index)
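
As the sample rows show, the `coordinates` list stores longitude first and latitude second. A compact sketch of the same split on toy data, using two points copied from the listings above:

```python
import pandas as pd

toy = pd.DataFrame({
    "point": [
        {"point_type": "Point", "coordinates": [-6.414363, 53.405304]},
        {"point_type": "Point", "coordinates": [-6.193117, 53.388871]},
    ]
})

# Pull the [longitude, latitude] pair out of each dict and spread it over two columns.
coords = pd.DataFrame(
    toy["point"].map(lambda p: p["coordinates"]).tolist(),
    columns=["longitude", "latitude"],
    index=toy.index,
)
toy = toy.join(coords)
print(toy[["longitude", "latitude"]])
```

Passing `index=toy.index` keeps the new columns aligned even when the original index has gaps, as it does after the earlier row drops.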

In [23]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,publishDate,neighbourhood,val,name,branch,phone,squareLogo,backgroundColour,address,licenceNumber,showContactForm,sellerType,sellerId,standardLogo,alternativePhone,profileImage,phoneWhenToCall,0,unit,value,point_type,coordinates,longitude,latitude,0.1,code,epi,rating
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Semi-D,Buy,3.0,3.0,345000.0,€345k,1615021385000,Dublin 15,345000.0,Thomas Fitzpatrick,Sherry FitzGerald Clonee,01-801 8090,https://photos.cdn.dsch.ie/NDkxZjhiMzNkOWQwZDU...,#003764,"Unit 2 Abbey House,\r\nMain Street,\r\nClonee,...",2183,True,BRANDED_AGENT,10888,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,101,Point,"[-6.414363, 53.405304]",-6.414363,53.405304,,102214400,224.69 kWh/m2/yr,C3
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,1613984495000,Dublin 5,380000.0,Avril Ward MIPAV MMCEPI,Delaney Estates,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,True,BRANDED_AGENT,453,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,087 333 3823,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,,,METRES_SQUARED,91,Point,"[-6.193117, 53.388871]",-6.193117,53.388871,,113454078,379.75 kWh/m2/yr,E2
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Terrace,Buy,3.0,2.0,385000.0,€385k,1615021401000,Dublin 11,385000.0,Jennifer Byrne,Sherry FitzGerald Drumcondra,01 837 3737,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#0f3a5d,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,2183,True,BRANDED_AGENT,2655,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,https://photos.cdn.dsch.ie/M2FjOGQ4NDVlNTI0MDM...,,,METRES_SQUARED,87,Point,"[-6.268219, 53.386685]",-6.268219,53.386685,,111143335,364.68 kWh/m2/yr,E2
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Apartment,Buy,2.0,2.0,395000.0,€395k,1615021364000,Dublin 4,395000.0,Emma Curran,Sherry FitzGerald Ballsbridge,01 269 8888,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003866,"176 Pembroke Road, \r\nBallsbridge, \r\nDublin...",2183,True,BRANDED_AGENT,9990,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,56,Point,"[-6.237199, 53.338738]",-6.237199,53.338738,,113483481,40.56 kWh/m2/yr,C3
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,1615021423000,Dublin 22,195000.0,Ronan Healy,Sherry FitzGerald Tallaght,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,,2183,True,BRANDED_AGENT,10821,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,71,Point,"[-6.392492, 53.32423]",-6.392492,53.32423,,101436293,178.87 kWh/m2/yr,C2


In [None]:
# similarly split the ber column to fetch the rating
test_df3 = pd.concat([test_df3.drop(['ber'], axis=1), test_df3['ber'].apply(pd.Series)], axis=1)

In [25]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,publishDate,neighbourhood,val,name,branch,phone,squareLogo,backgroundColour,address,licenceNumber,showContactForm,sellerType,sellerId,standardLogo,alternativePhone,profileImage,phoneWhenToCall,0,unit,value,point_type,coordinates,longitude,latitude,0.1,code,epi,rating
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Semi-D,Buy,3.0,3.0,345000.0,€345k,1615021385000,Dublin 15,345000.0,Thomas Fitzpatrick,Sherry FitzGerald Clonee,01-801 8090,https://photos.cdn.dsch.ie/NDkxZjhiMzNkOWQwZDU...,#003764,"Unit 2 Abbey House,\r\nMain Street,\r\nClonee,...",2183,True,BRANDED_AGENT,10888,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,101,Point,"[-6.414363, 53.405304]",-6.414363,53.405304,,102214400,224.69 kWh/m2/yr,C3
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,1613984495000,Dublin 5,380000.0,Avril Ward MIPAV MMCEPI,Delaney Estates,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,True,BRANDED_AGENT,453,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,087 333 3823,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,,,METRES_SQUARED,91,Point,"[-6.193117, 53.388871]",-6.193117,53.388871,,113454078,379.75 kWh/m2/yr,E2
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Terrace,Buy,3.0,2.0,385000.0,€385k,1615021401000,Dublin 11,385000.0,Jennifer Byrne,Sherry FitzGerald Drumcondra,01 837 3737,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#0f3a5d,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,2183,True,BRANDED_AGENT,2655,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,https://photos.cdn.dsch.ie/M2FjOGQ4NDVlNTI0MDM...,,,METRES_SQUARED,87,Point,"[-6.268219, 53.386685]",-6.268219,53.386685,,111143335,364.68 kWh/m2/yr,E2
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Apartment,Buy,2.0,2.0,395000.0,€395k,1615021364000,Dublin 4,395000.0,Emma Curran,Sherry FitzGerald Ballsbridge,01 269 8888,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003866,"176 Pembroke Road, \r\nBallsbridge, \r\nDublin...",2183,True,BRANDED_AGENT,9990,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,56,Point,"[-6.237199, 53.338738]",-6.237199,53.338738,,113483481,40.56 kWh/m2/yr,C3
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,1615021423000,Dublin 22,195000.0,Ronan Healy,Sherry FitzGerald Tallaght,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,,2183,True,BRANDED_AGENT,10821,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,,,,,METRES_SQUARED,71,Point,"[-6.392492, 53.32423]",-6.392492,53.32423,,101436293,178.87 kWh/m2/yr,C2


In [26]:
# keep only the columns needed for the analysis; .copy() avoids SettingWithCopyWarning on later assignments
final_df2 = test_df3[['title', 'neighbourhood', 'propertyType', 'numBedrooms', 'numBathrooms', 'value', 'val', 'rating', 'sellerId', 'longitude', 'latitude', 'publishDate']].copy()

In [27]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Dublin 15,Semi-D,3.0,3.0,101,345000.0,C3,10888,-6.414363,53.405304,1615021385000
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Dublin 11,Terrace,3.0,2.0,87,385000.0,E2,2655,-6.268219,53.386685,1615021401000
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Dublin 4,Apartment,2.0,2.0,56,395000.0,C3,9990,-6.237199,53.338738,1615021364000
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,,71,195000.0,C2,10821,-6.392492,53.32423,1615021423000


In [28]:
final_df2[final_df2['numBathrooms'].isnull()].groupby(['propertyType']).size()

propertyType
Apartment    12
Bungalow      2
Detached      5
Houses        2
Semi-D        4
Site         16
Terrace      10
dtype: int64

In [29]:
final_df2.groupby(['rating']).size()

rating
A2         13
A3         25
B1          9
B2         52
B3        133
C1        171
C2        191
C3        211
D1        258
D2        274
E1        187
E2        158
F         142
G         129
SI_666     56
dtype: int64

In [30]:
# handle missing values by replacing them with sentinel constants that cannot occur in real data
final_df2['numBedrooms'] = final_df2['numBedrooms'].fillna(-1)

final_df2['numBathrooms'] = final_df2['numBathrooms'].fillna(-1)

final_df2['rating'] = final_df2['rating'].fillna('ZZZ')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [31]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Dublin 15,Semi-D,3.0,3.0,101,345000.0,C3,10888,-6.414363,53.405304,1615021385000
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Dublin 11,Terrace,3.0,2.0,87,385000.0,E2,2655,-6.268219,53.386685,1615021401000
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Dublin 4,Apartment,2.0,2.0,56,395000.0,C3,9990,-6.237199,53.338738,1615021364000
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1615021423000


In [32]:
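# map only the first 100 listings; .copy() avoids writing to a slice of final_df2
mapping_df = final_df2[:100].copy()

# give each neighbourhood an integer id, used below to pick its marker colour
mapping_df['n_neigh'] = mapping_df.groupby('neighbourhood').ngroup()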

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [33]:
mapping_df.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate,n_neigh
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Dublin 15,Semi-D,3.0,3.0,101,345000.0,C3,10888,-6.414363,53.405304,1615021385000,6
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000,16
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Dublin 11,Terrace,3.0,2.0,87,385000.0,E2,2655,-6.268219,53.386685,1615021401000,2
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Dublin 4,Apartment,2.0,2.0,56,395000.0,C3,9990,-6.237199,53.338738,1615021364000,15
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1615021423000,12


In [34]:
# set color scheme for the neighbourhoods
num_neigh = mapping_df['neighbourhood'].nunique()

x = np.arange(num_neigh)
ys = [i + x + (i*x)**2 for i in range(num_neigh)]

colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [35]:
address = 'Dublin, Ireland'

geolocator = Nominatim(user_agent="dublin_locator")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dublin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dublin are 53.3497645, -6.2602732.


In [36]:
# create map of Dublin using latitude and longitude values
map_dublin = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, title, n_neigh in zip(mapping_df['latitude'], mapping_df['longitude'], mapping_df['title'], mapping_df['n_neigh']):
    label = '{}'.format(title)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color=rainbow[n_neigh],
        fill=True,
        fill_color=rainbow[n_neigh],
        fill_opacity=0.7,
        parse_html=False).add_to(map_dublin)  
    
map_dublin

In [37]:
final_df2 = final_df2.rename(columns={'value': 'floorArea', 'val': 'price'})  # give the area and price columns descriptive names

final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate
1,"51 Bramblefield Crescent, Clonee, Dublin 15",Dublin 15,Semi-D,3.0,3.0,101,345000.0,C3,10888,-6.414363,53.405304,1615021385000
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
4,"9 Maolbuille Road, Glasnevin, Dublin 11",Dublin 11,Terrace,3.0,2.0,87,385000.0,E2,2655,-6.268219,53.386685,1615021401000
5,"19 Pembroke Square, The Lansdowne Block, Grand...",Dublin 4,Apartment,2.0,2.0,56,395000.0,C3,9990,-6.237199,53.338738,1615021364000
6,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1615021423000


In [38]:
# drop rows that still contain NaN in any remaining column
final_df2 = final_df2.dropna()

In [39]:
final_df2['pricePerBedroom'] = final_df2['price'] / final_df2['numBedrooms']
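
Because missing bedroom counts were replaced with -1 earlier, a plain division can produce negative values (and infinite ones for a count of 0). A minimal sketch of a guarded version on toy data, assuming such rows should simply get NaN:

```python
import pandas as pd

toy = pd.DataFrame({
    "price": [345000.0, 250000.0, 120000.0],
    "numBedrooms": [3.0, -1.0, 0.0],  # -1 is the missing-value sentinel used above
})

# Mask non-positive bedroom counts so the division yields NaN instead of
# a negative or infinite price per bedroom.
bedrooms = toy["numBedrooms"].where(toy["numBedrooms"] > 0)
toy["pricePerBedroom"] = toy["price"] / bedrooms
print(toy)
```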

In [40]:
final_df2['avgPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform('mean')

In [41]:
final_df2['medianPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform('median')

In [42]:
final_df2['deltaAvgPrice'] = final_df2['avgPriceNeighbourhood'] - final_df2['price']

In [43]:
final_df2['deltaMedianPrice'] = final_df2['medianPriceNeighbourhood'] - final_df2['price']
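
The sign convention of the delta columns is worth spelling out: the neighbourhood figure minus the listing price, so a positive delta means the listing is cheaper than its neighbourhood. A small sketch on made-up prices:

```python
import pandas as pd

toy = pd.DataFrame({
    "neighbourhood": ["Dublin 1", "Dublin 1", "Dublin 7", "Dublin 7"],
    "price": [140000.0, 240000.0, 250000.0, 450000.0],
})

# transform('mean') broadcasts each group's mean back onto its rows.
toy["avgPriceNeighbourhood"] = toy.groupby("neighbourhood")["price"].transform("mean")
toy["deltaAvgPrice"] = toy["avgPriceNeighbourhood"] - toy["price"]
print(toy["deltaAvgPrice"].tolist())  # [50000.0, -50000.0, 100000.0, -100000.0]
```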

In [44]:
dict_north_south = {'Dublin 1':'N', 'Dublin 10':'S', 'Dublin 11':'N', 'Dublin 12':'S', 'Dublin 13':'N', 'Dublin 14':'S', 'Dublin 15':'N', 
                    'Dublin 16':'S', 'Dublin 17':'N', 'Dublin 18':'S', 'Dublin 2':'S', 'Dublin 20':'S', 'Dublin 22':'S', 'Dublin 24':'S', 
                    'Dublin 3':'N', 'Dublin 4':'S', 'Dublin 5':'N', 'Dublin 6':'S', 'Dublin 6W':'S', 'Dublin 7':'N', 'Dublin 8':'S', 'Dublin 9':'N'}

In [45]:
final_df2["neighbourhood"] = final_df2["neighbourhood"].str.strip()
final_df2['dublinNorthSouth']=final_df2['neighbourhood'].map(dict_north_south)

final_df2.groupby('dublinNorthSouth').size()

dublinNorthSouth
N    843
S    932
dtype: int64

In [46]:
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great-circle distance in kilometres between two points
    on the earth (specified in decimal degrees).
    Arguments may be scalars or arrays that broadcast against each other.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km approximates the mean Earth radius (about 6371 km)
    return km
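
A quick sanity check for a haversine implementation is that one degree of latitude should span exactly radius × π / 180. The sketch below restates the formula in a self-contained function (the name `haversine_km` and the `radius_km` parameter are illustrative, not part of the notebook) and checks that property along with broadcasting against a scalar point.

```python
import numpy as np

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6367):
    """Great-circle distance in km; inputs in decimal degrees, scalars or arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# One degree of latitude along a meridian.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
expected = 6367 * np.pi / 180

# Arrays of points against a single reference point broadcast element-wise.
distances = haversine_km(np.array([0.0, 0.0]), np.array([0.0, 1.0]), 0.0, 0.0)
print(one_degree, expected, distances.shape)
```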

In [47]:
final_df2['distToCity'] = haversine_np(final_df2['longitude'], final_df2['latitude'], -6.2580, 53.3531)

In [48]:
final_df2.isna().sum()

title                       0
neighbourhood               0
propertyType                0
numBedrooms                 0
numBathrooms                0
floorArea                   0
price                       0
rating                      0
sellerId                    0
longitude                   0
latitude                    0
publishDate                 0
pricePerBedroom             0
avgPriceNeighbourhood       0
medianPriceNeighbourhood    0
deltaAvgPrice               0
deltaMedianPrice            0
dublinNorthSouth            0
distToCity                  0
dtype: int64

In [49]:
# publishDate is a Unix timestamp in milliseconds
final_df2['date'] = pd.to_datetime(final_df2['publishDate'], unit='ms')
final_df2['daysSincePublished'] = (pd.to_datetime("now") - final_df2['date']).dt.days
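
Because the cell above measures against the current time, its result changes on every run. A sketch with a fixed reference date makes the calculation reproducible; the `reference` timestamp here is a hypothetical snapshot date, and the two millisecond values are copied from the listings above.

```python
import pandas as pd

publish_ms = pd.Series([1613984495000, 1614866853000])
reference = pd.Timestamp("2021-03-06")  # hypothetical snapshot date

published = pd.to_datetime(publish_ms, unit="ms")

# .dt.days keeps only the whole-day part of each timedelta.
days_since = (reference - published).dt.days
print(days_since.tolist())  # [11, 1]
```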

In [50]:
zero_days = final_df2[final_df2['daysSincePublished'] == 0].index
final_df2.drop(zero_days , inplace=True)

final_df2.groupby(['daysSincePublished']).size()

daysSincePublished
1      20
2       7
3      16
4      11
5      13
7       3
8       8
9       7
10     17
11      5
12      9
13      1
14      6
15      6
16     12
17      4
18      8
19     11
21      5
22     14
23      4
24     16
25     12
26      4
28      2
29     11
30     13
31     10
32      4
33      8
34      2
35      4
36      6
37      9
38      9
39      9
40      8
42      5
43      8
44      8
45     28
46     21
47      4
48      2
50     13
51     15
52      5
53     15
54      6
55      1
57      3
58      2
59     11
60     14
61     32
62      2
64      1
65      1
66      1
67      2
70      9
72      8
73     16
74     15
75      4
77      9
78     10
79     10
80     22
81     26
82     10
83      1
84     10
85      8
86      9
87     17
88     18
89      8
91      7
92     19
93     14
94     26
95     21
96      8
98      2
99     11
100    20
101    15
102    13
103    11
105     4
106     9
107    13
108    21
109    20
110    28
112    15
113    11
1

In [51]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate,pricePerBedroom,avgPriceNeighbourhood,medianPriceNeighbourhood,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,date,daysSincePublished
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000,126666.666667,396177.570093,380000.0,16177.570093,0.0,N,5.857172,2021-02-22 09:01:35,12
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000,165000.0,507338.235294,475000.0,12338.235294,-20000.0,S,8.366932,2021-02-24 13:26:35,9
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000,250000.0,377430.107527,350000.0,127430.107527,100000.0,N,1.387431,2021-02-18 10:46:29,16
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,1614866853000,140000.0,316105.263158,300000.0,176105.263158,160000.0,N,0.536232,2021-03-04 14:07:33,1
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,1614866844000,240000.0,316105.263158,300000.0,76105.263158,60000.0,N,0.528618,2021-03-04 14:07:24,1


In [52]:
final_df3 = final_df2[['title','neighbourhood','propertyType','numBedrooms','numBathrooms','floorArea','price','rating','sellerId','longitude','latitude','pricePerBedroom','deltaAvgPrice','deltaMedianPrice','dublinNorthSouth','distToCity','daysSincePublished']]

In [53]:
final_df3.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1


In [54]:
# below is where we make use of the Foursquare API

In [55]:
CLIENT_ID = 'WV2XS4MH5YRWGHLTCFT4CKR4SRWNHWAF3JHWHNN4MKEQWTL3' # your Foursquare ID
CLIENT_SECRET = 'QEWWIOG0M3BT4V0YPNSKJY521MDUBHBYWBXCJFZ0452KP3OT' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
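
Credentials are usually better kept out of the notebook itself. One common approach, sketched below, is to read them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions for illustration, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from a mapping of environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None

# Demonstration with a plain dict standing in for the real environment.
client_id, client_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
)
print(client_id)
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read from the real environment.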

In [56]:
# 1. Food 4d4b7105d754a06374d81259
# 2. Outdoors & Recreation 4d4b7105d754a06377d81259
# 3. Shop & Service 4d4b7105d754a06378d81259

In [57]:
food_cat_ids = ['4bf58dd8d48988d16d941735','4bf58dd8d48988d128941735','4bf58dd8d48988d1e0931735','4bf58dd8d48988d110941735',
                '4bf58dd8d48988d149941735','4bf58dd8d48988d1fa931735','4bf58dd8d48988d1c4941735','4bf58dd8d48988d145941735',
                '4bf58dd8d48988d11b941735','4bf58dd8d48988d16e941735','4bf58dd8d48988d1c5941735','4bf58dd8d48988d143941735',
                '4bf58dd8d48988d1ce941735','4bf58dd8d48988d10e951735','4bf58dd8d48988d1c9941735','4bf58dd8d48988d1ca941735',
                '4bf58dd8d48988d142941735','4bf58dd8d48988d11e941735','4bf58dd8d48988d16a941735','52e81612bcbc57f1066b79f1',
                '4bf58dd8d48988d155941735','4bf58dd8d48988d1f9941735','4bf58dd8d48988d10f941735','4bf58dd8d48988d1cc941735']

recreation_cat_ids = ['4bf58dd8d48988d163941735','4bf58dd8d48988d1e6941735','4bf58dd8d48988d176941735','4bf58dd8d48988d137941735',
                     '4bf58dd8d48988d164941735','4bf58dd8d48988d1e4931735','4bf58dd8d48988d1e0941735','4bf58dd8d48988d12d951735',
                     '4bf58dd8d48988d1e2941735','4bf58dd8d48988d165941735','56aa371be4b08b9a8d57353e','58daa1558bbb0b01f18ec1fd',
                     '4deefb944765f83613cdba6e','4e74f6cabd41c4836eac4c31','56aa371be4b08b9a8d573562','4bf58dd8d48988d15e941735']

shop_cat_ids = ['52f2ab2ebcbc57f1066b8b46','4bf58dd8d48988d103951735','4bf58dd8d48988d1f6941735','4bf58dd8d48988d1fd941735',
               '4d954b0ea243a5684a65b473','4bf58dd8d48988d114951735','4bf58dd8d48988d112951735','4bf58dd8d48988d118951735',
               '4bf58dd8d48988d1f2941735','5032833091d4c4b30a586d60','52dea92d3cf9994f4e043dbb','4bf58dd8d48988d10f951735',
               '4bf58dd8d48988d1f8941735','4bf58dd8d48988d122951735','4bf58dd8d48988d106951735','4bf58dd8d48988d108951735']

In [58]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, limit=100):
    """Count food, recreation and shop venues within `radius` metres of each property.

    Returns a DataFrame with one row per property: title, numFood, numRecreation, numShop.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        num_food = 0
        num_recreation = 0
        num_shop = 0
        for v in results:
            categories = v['venue']['categories']
            if not categories:
                continue  # skip venues that carry no category
            # classify each venue by its first listed category
            cat_id = categories[0]['id']
            num_food += 1 if cat_id in food_cat_ids else 0
            num_recreation += 1 if cat_id in recreation_cat_ids else 0
            num_shop += 1 if cat_id in shop_cat_ids else 0

        # keep only the per-category counts for this property
        venues_list.append([name, num_food, num_recreation, num_shop])

    nearby_venues = pd.DataFrame(venues_list,
                                 columns=['title', 'numFood', 'numRecreation', 'numShop'])

    return nearby_venues
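
The counting step inside `getNearbyVenues` can be checked without calling the API. The sketch below uses a hypothetical helper, `count_venue_categories`, which is not part of the notebook. It tallies venues by their first listed category and skips venues whose `categories` list is empty. The sample items and category IDs are made up and only mimic the nesting the notebook reads (`item['venue']['categories'][0]['id']`).

```python
def count_venue_categories(items, food_ids, recreation_ids, shop_ids):
    """Count venues whose first listed category falls in each ID collection."""
    # sets make the membership tests cheap when the ID lists grow
    food_ids, recreation_ids, shop_ids = set(food_ids), set(recreation_ids), set(shop_ids)
    counts = {'numFood': 0, 'numRecreation': 0, 'numShop': 0}
    for item in items:
        categories = item.get('venue', {}).get('categories') or []
        if not categories:
            continue  # nothing to classify
        cat_id = categories[0].get('id')
        if cat_id in food_ids:
            counts['numFood'] += 1
        if cat_id in recreation_ids:
            counts['numRecreation'] += 1
        if cat_id in shop_ids:
            counts['numShop'] += 1
    return counts


# Made-up items and IDs, for illustration only
sample_items = [
    {'venue': {'categories': [{'id': 'food-1'}]}},
    {'venue': {'categories': [{'id': 'park-1'}]}},
    {'venue': {'categories': []}},
    {'venue': {'categories': [{'id': 'food-2'}]}},
]
sample_counts = count_venue_categories(
    sample_items, ['food-1', 'food-2'], ['park-1'], ['shop-1'])
print(sample_counts)
```

Keeping the tally separate from the HTTP request makes it possible to test the classification logic on a saved response before spending API quota.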

In [60]:
dublin_venues = getNearbyVenues(names=final_df3['title'],
                                   latitudes=final_df3['latitude'],
                                   longitudes=final_df3['longitude'])

In [61]:
dublin_venues.shape

(1253, 4)

In [62]:
dublin_venues.head()

Unnamed: 0,title,numFood,numRecreation,numShop
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",50,17,11
1,"150 Broadford Rise, Ballinteer, Dublin 16",46,9,14
2,"Apartment 172, Block C, Dublin 7",51,11,5
3,"1 Fitzgibbon Lane, Dublin 1",57,10,5
4,"18A Fitzgibbon Street, Dublin 1",57,10,5


In [63]:
dublin_venues2 = final_df3.join(dublin_venues.set_index('title'), on='title')

dublin_venues2.shape

(1285, 20)
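
The join returns 1285 rows although `dublin_venues` has only 1253, which is consistent with some `title` values appearing more than once: every left row then matches every right row with the same title. The notebook does not confirm this cause, so treat the following as a sketch of the mechanism on made-up frames (`listings` and `venues` stand in for `final_df3` and `dublin_venues`).

```python
import pandas as pd

# Toy stand-ins for final_df3 and dublin_venues (values are made up)
listings = pd.DataFrame({'title': ['A', 'A', 'B'], 'price': [100, 100, 200]})
venues = pd.DataFrame({'title': ['A', 'A', 'B'], 'numFood': [5, 5, 7]})

# Joining on a non-unique key pairs every left 'A' with every right 'A'
naive = listings.join(venues.set_index('title'), on='title')

# De-duplicating the lookup table first keeps one output row per listing
venues_unique = venues.drop_duplicates(subset='title')
clean = listings.join(venues_unique.set_index('title'), on='title')

print(len(naive), len(clean))
```

De-duplicating the lookup side before joining avoids the row inflation; calling `drop_duplicates()` afterwards, as the notebook does, only removes rows that are identical in every column.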

In [64]:
dublin_venues2 = dublin_venues2.drop_duplicates()

In [65]:
profile = pd_prof.ProfileReport(dublin_venues2) 
profile.to_file("output.html")

In [2]:
# add meaningful and useful features
# **1. delta from avg price for that neighbourhood = (avg_price_neighbourhood - price)
# **2. delta from median price for that neighbourhood = (median_price_neighbourhood - price)
# **3. north_south column 1/2 or north/south = dict_north_south {'Dublin 1': 'N', 'Dublin 2': 'S', ...}
# **4. days since ad published = difference between 2 epochs (today - publishDate)
# **5. distance from city centre = great-circle distance between the two lat/long pairs (Haversine formula)
# 6. commute time to city centre by {walking/cycling/train/bus}
# 7. categorical column for price ranges {bins}
# 8. calculated field from 7 => num of properties in that neighbourhood for that price range
# ***9. num_ {Pharmacies, Supermarkets, Restaurants, Cafes, Parks} in 5 km radius
# **10. price per bedroom = (price / numBedrooms)
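
Feature 5 in the list above relies on the Haversine formula. A minimal sketch follows, assuming a spherical Earth with a mean radius of 6371 km. The `CITY_CENTRE` coordinates are a hypothetical reference point, not the one used to produce `distToCity` in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical city-centre reference point (not taken from the notebook)
CITY_CENTRE = (53.3498, -6.2603)

# Distance from one listing's coordinates to the assumed centre
dist = haversine_km(53.388871, -6.193117, *CITY_CENTRE)
print(round(dist, 2))
```

The result depends on the chosen reference point, so values computed this way will only match the `distToCity` column if the same centre is used.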

In [66]:
dublin_venues2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12,50,17,11
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9,46,9,14
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16,51,11,5
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1,57,10,5
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1,57,10,5
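
The `pricePerBedroom`, `deltaAvgPrice` and `deltaMedianPrice` columns shown above follow the definitions in the feature list (neighbourhood statistic minus the listing price). One way to compute them is with `groupby(...).transform`, sketched here on made-up rows; the notebook's own implementation is not shown in this section and may differ.

```python
import pandas as pd

# Made-up listings, for illustration only
df = pd.DataFrame({
    'neighbourhood': ['Dublin 1', 'Dublin 1', 'Dublin 7'],
    'price': [140000.0, 240000.0, 250000.0],
    'numBedrooms': [1.0, 1.0, 2.0],
})

# transform() broadcasts each neighbourhood statistic back onto its rows
by_area = df.groupby('neighbourhood')['price']
df['deltaAvgPrice'] = by_area.transform('mean') - df['price']
df['deltaMedianPrice'] = by_area.transform('median') - df['price']
df['pricePerBedroom'] = df['price'] / df['numBedrooms']

print(df)
```

With this sign convention a positive delta marks a listing priced below its neighbourhood average or median.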


In [67]:
print('There are {} unique property types.'.format(len(dublin_venues2['propertyType'].unique())))
print('There are {} unique BER ratings.'.format(len(dublin_venues2['rating'].unique())))

There are 9 unique property types.
There are 16 unique BER ratings.


In [68]:
dublin_venues2.shape

(1246, 20)

In [69]:
# one hot encoding
dublin_onehot2 = pd.get_dummies(dublin_venues2[['propertyType']], prefix="", prefix_sep="")

In [70]:
dublin_onehot2.head()

Unnamed: 0,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse
2,0,0,0,0,0,1,0,0,0
11,0,0,0,0,0,1,0,0,0
13,1,0,0,0,0,0,0,0,0
14,0,0,0,0,0,0,0,1,0
15,0,0,0,0,0,0,0,1,0


In [71]:
dublin_onehot3 = pd.get_dummies(dublin_venues2[['rating']], prefix="", prefix_sep="")

In [72]:
dublin_onehot3.head()

Unnamed: 0,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ
2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
11,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
13,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
14,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
15,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [73]:
test_df4 = pd.concat([dublin_venues2, dublin_onehot2], axis=1)

In [75]:
test_df4 = pd.concat([test_df4, dublin_onehot3], axis=1)
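
The two `get_dummies` calls and two `concat` calls above can also be written as a single call. The sketch below is an alternative, not the notebook's method: it passes `columns=` and a `prefix` list so the indicator columns stay distinguishable by origin. The prefixes `type` and `ber` are illustrative choices. Note that this form drops the original `propertyType` and `rating` columns, whereas the notebook keeps them.

```python
import pandas as pd

# Made-up rows using category values that appear in the notebook's output
df = pd.DataFrame({
    'propertyType': ['Semi-D', 'Apartment', 'Terrace'],
    'rating': ['E2', 'C1', 'G'],
})

# One call encodes both columns; prefixes keep the two groups apart
encoded = pd.get_dummies(df, columns=['propertyType', 'rating'],
                         prefix=['type', 'ber'], dtype=int)

print(encoded.columns.tolist())
```

Prefixes matter once two categorical columns could share a level name, because unprefixed dummies would then produce duplicate column names.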

In [76]:
test_df4.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12,50,17,11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9,46,9,14,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16,51,11,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [154]:
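from sklearn.cluster import DBSCAN

# cluster on size features; DBSCAN assigns the label -1 to noise points
test_df4["labels_1"] = DBSCAN(eps=0.01, min_samples=10).fit(test_df4[["numBedrooms", "numBathrooms", "floorArea"]].values).labels_
print(test_df4["labels_1"].unique())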

[-1  0  1  2  3  4  5  6  7  8  9 10]


In [155]:
test_df4.groupby('labels_1').size()

labels
-1     1103
 0       12
 1       13
 2       10
 3       15
 4       21
 5       14
 6       11
 7       14
 8       12
 9       11
 10      10
dtype: int64
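
Most rows (1103 of 1246) end up labelled as noise here. A likely reason, assuming the three features are integer-valued as they appear in the rows shown, is that `eps=0.01` is smaller than the smallest possible non-zero gap between two listings, so only listings with identical values are neighbours. Under that assumption the clustering reduces to grouping exact duplicates, which the hypothetical helper below reproduces with a `Counter`.

```python
from collections import Counter


def exact_match_clusters(rows, min_samples=10):
    """Label rows the way a tiny-eps DBSCAN would on integer-valued features.

    Identical rows form a cluster when there are at least `min_samples`
    of them; every other row is labelled -1 (noise).
    """
    keys = [tuple(r) for r in rows]
    counts = Counter(keys)
    cluster_ids = {}
    labels = []
    for key in keys:
        if counts[key] < min_samples:
            labels.append(-1)
        else:
            # number clusters in order of first appearance
            labels.append(cluster_ids.setdefault(key, len(cluster_ids)))
    return labels


# Made-up (numBedrooms, numBathrooms, floorArea) rows
rows = [(2.0, 1.0, 63)] * 3 + [(3.0, 1.0, 91)] + [(2.0, 1.0, 63)]
labels = exact_match_clusters(rows, min_samples=4)
print(labels)
```

If the intent is to group listings of similar rather than identical size, scaling the features and choosing a larger `eps` would be the usual next step.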

In [157]:
noisy_labels = test_df4[test_df4['labels_1'] == -1].index
mapping_df2 = test_df4.drop(noisy_labels)

mapping_df2.groupby(['labels_1']).size()

labels
0     12
1     13
2     10
3     15
4     21
5     14
6     11
7     14
8     12
9     11
10    10
dtype: int64

In [158]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters: one rainbow color per cluster label
num_clusters = 11
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, title, cluster in zip(mapping_df2['latitude'], mapping_df2['longitude'], mapping_df2['title'], mapping_df2['labels_1']):
    label = folium.Popup(str(title) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)

map_clusters

In [159]:
# investigating an interesting cluster nicely spread across the map
test_df4[test_df4.labels_1 == 10]

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ,labels,labels_2
675,"Apartment 63, Saint Peters Square, Phibsboroug...",Dublin 7,Apartment,2.0,1.0,63,265000.0,D2,607,-6.272357,53.361633,132500.0,112430.107527,85000.0,N,1.343799,37,56,12,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,10,-1
789,"41 The Rectory, Stepaside, Dublin 18",Dublin 18,Apartment,2.0,1.0,63,260000.0,C2,2615,-6.219662,53.255898,130000.0,259395.061728,165000.0,S,11.097536,94,23,7,15,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,10,-1
822,"Apartment 26, The Cedars, Herbert Park Lane, B...",Dublin 4,Apartment,2.0,1.0,63,425000.0,C3,329,-6.231967,53.327321,212500.0,282593.220339,70000.0,S,3.345127,59,51,15,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,10,-1
981,"44 Deerpark Drive, Kiltipper, Dublin 24",Dublin 24,Apartment,2.0,1.0,63,185000.0,D1,10948,-6.37623,53.270464,92500.0,96532.608696,90000.0,S,12.08068,88,28,4,22,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,10,-1
1160,"29 Green Isle Court, Clondalkin, Dublin 22",Dublin 22,Apartment,2.0,1.0,63,200000.0,D1,8891,-6.404264,53.310064,100000.0,42820.0,27000.0,S,10.820576,92,39,3,36,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,10,-1
1203,"23 The Park, Kingswood Heights, Kingswood, Dub...",Dublin 24,Bungalow,2.0,1.0,63,285000.0,G,463,-6.369618,53.304563,142500.0,-3467.391304,-10000.0,S,9.163264,88,40,5,31,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,10,-1
1586,"Apartment 131, Inishtrahull Custom House Harbo...",Dublin 1,Apartment,2.0,1.0,63,350000.0,C3,10659,-6.246135,53.350963,175000.0,-33894.736842,-50000.0,N,0.822056,127,54,13,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,10,8
1589,"71 Addison Drive, Glasnevin, Dublin 11",Dublin 11,Apartment,2.0,1.0,63,295000.0,C2,9172,-6.276927,53.376131,147500.0,4732.758621,-13000.0,N,2.850493,70,57,11,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,10,-1
2302,"9 Kearn's Court, Kearn's Place, Kilmainham, Du...",Dublin 8,Apartment,2.0,1.0,63,275000.0,C2,1413,-6.305082,53.341177,137500.0,66384.0,25000.0,S,3.392727,89,49,12,6,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,10,-1
2745,"3 Kilmainham Orchard, Turvey Avenue, Kilmainha...",Dublin 8,Apartment,2.0,1.0,63,250000.0,E1,2621,-6.311155,53.341144,125000.0,91384.0,50000.0,S,3.768185,112,52,9,6,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,10,4


In [None]:
# ["longitude", "latitude"] -> location based clustering
# ["numFood", "numRecreation", "numShop", "distToCity"] -> neighbourhood based clustering

In [180]:
test_df4["labels_2"] = DBSCAN(eps=0.01, min_samples=10).fit(test_df4[["longitude", "latitude"]].values).labels_
print(test_df4["labels_2"].unique())

[ 0  1 -1  2  3  4  5  6  8  7  9]


In [181]:
test_df4.groupby('labels_2').size()

labels_2
-1    144
 0    818
 1     50
 2     35
 3     56
 4     56
 5     42
 6     12
 7     14
 8     10
 9      9
dtype: int64
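
Because this run clusters on raw longitude and latitude, `eps=0.01` is measured in degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. The sketch below converts that threshold to kilometres, assuming a spherical Earth of radius 6371 km and a latitude of 53.35 degrees as a rough value for Dublin; both figures are assumptions rather than values from the notebook.

```python
import math

# km spanned by one degree of arc on a sphere of radius 6371 km (assumed)
KM_PER_DEGREE = 6371.0 * math.pi / 180


def eps_degrees_to_km(eps_deg, latitude_deg):
    """Return (north-south km, east-west km) covered by eps_deg at a latitude."""
    north_south = eps_deg * KM_PER_DEGREE
    # meridians converge towards the poles, shrinking east-west distances
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = eps_degrees_to_km(0.01, 53.35)  # 53.35 is an approximate latitude
print(round(ns_km, 2), round(ew_km, 2))
```

The neighbourhood radius is therefore wider north to south than east to west. If an isotropic radius in kilometres is preferred, scikit-learn's `DBSCAN` also accepts `metric='haversine'` with `[lat, lon]` inputs in radians and `eps` expressed as a distance divided by the Earth's radius.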

In [182]:
noisy_labels = test_df4[test_df4['labels_2'] == -1].index
mapping_df2 = test_df4.drop(noisy_labels)

mapping_df2.groupby(['labels_2']).size()

labels_2
0    818
1     50
2     35
3     56
4     56
5     42
6     12
7     14
8     10
9      9
dtype: int64

In [184]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters: one rainbow color per cluster label
num_clusters = 10
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, title, cluster in zip(mapping_df2['latitude'], mapping_df2['longitude'], mapping_df2['title'], mapping_df2['labels_2']):
    label = folium.Popup(str(title) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)

map_clusters

In [200]:
test_df4["labels_3"] = DBSCAN(eps=0.01, min_samples=5).fit(test_df4[["numFood", "numRecreation", "numShop"]].values).labels_
print(test_df4["labels_3"].unique())

[-1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37]


In [201]:
test_df4.groupby('labels_3').size()

labels_3
-1     991
 0       6
 1       9
 2       6
 3       9
 4       9
 5       6
 6       5
 7       6
 8       8
 9       6
 10     10
 11     14
 12     10
 13      5
 14      5
 15      8
 16     11
 17      8
 18      5
 19      5
 20      5
 21      5
 22      5
 23      6
 24      7
 25      5
 26      7
 27      5
 28      5
 29      5
 30      5
 31      5
 32      7
 33      6
 34      5
 35      8
 36      8
 37      5
dtype: int64

In [202]:
noisy_labels = test_df4[test_df4['labels_3'] == -1].index
mapping_df2 = test_df4.drop(noisy_labels)

mapping_df2.groupby(['labels_3']).size()

labels_3
0      6
1      9
2      6
3      9
4      9
5      6
6      5
7      6
8      8
9      6
10    10
11    14
12    10
13     5
14     5
15     8
16    11
17     8
18     5
19     5
20     5
21     5
22     5
23     6
24     7
25     5
26     7
27     5
28     5
29     5
30     5
31     5
32     7
33     6
34     5
35     8
36     8
37     5
dtype: int64

In [203]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters: one rainbow color per cluster label
num_clusters = 38
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, title, cluster in zip(mapping_df2['latitude'], mapping_df2['longitude'], mapping_df2['title'], mapping_df2['labels_3']):
    label = folium.Popup(str(title) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)

map_clusters

In [204]:
# investigating a cluster from Dublin 7, Smithfield/Stoneybatter
test_df4[test_df4.labels_3 == 2]

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ,labels,labels_2,labels_3
53,"27 Smithfield Gate, Redcow Lane, Smithfield, D...",Dublin 1,Apartment,3.0,2.0,75,350000.0,D2,7483,-6.277353,53.349858,116666.666667,-33894.736842,-50000.0,N,1.333301,1,51,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1,0,2
804,"48 Glenbeigh Road, Cabra, Dublin 7",Dublin 7,Terrace,3.0,1.0,73,375000.0,G,1413,-6.295414,53.357974,125000.0,2430.107527,-25000.0,N,2.539896,61,51,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,-1,0,2
1372,"54 Old Cabra Road, Cabra, Dublin 7",Dublin 7,Semi-D,3.0,1.0,90,535000.0,G,841,-6.291896,53.359204,178333.333333,-157569.892473,-185000.0,N,2.348214,116,51,10,5,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,2
1933,"6 Saint Bricin's Park, Arbour Hill, Dublin 7",Dublin 7,Terrace,3.0,2.0,72,395000.0,E2,1087,-6.290769,53.349727,131666.666667,-17569.892473,-45000.0,N,2.205688,51,51,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,-1,0,2
1965,"113 Oxmantown Road, Stoneybatter, Dublin 7",Dublin 7,Terrace,2.0,1.0,66,405000.0,E1,1087,-6.289532,53.352742,202500.0,-27569.892473,-55000.0,N,2.091859,88,51,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,-1,0,2
2470,"18 Ostman Place, Stoneybatter, Dublin 7",Dublin 7,Terrace,2.0,1.0,62,435000.0,ZZZ,1087,-6.285966,53.352548,217500.0,-57569.892473,-85000.0,N,1.85597,127,51,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1,0,2


In [205]:
test_df4.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ,labels,labels_2,labels_3
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12,50,17,11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,-1,0,-1
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9,46,9,14,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,-1,0,-1
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16,51,11,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,-1,0,0
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,-1,0,1
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1,0,1


In [207]:
test_df4['price_decile'] = pd.qcut(test_df4['price'], 10, labels=False)

In [208]:
test_df4.groupby('price_decile').size()

price_decile
0    127
1    125
2    146
3    118
4    107
5    137
6    129
7    127
8    105
9    125
dtype: int64
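
The decile sizes above range from 105 to 146 rather than being equal. `pd.qcut` places its bin edges at quantiles, so bins come out equal only when values are distinct; repeated prices (asking prices tend to sit on round numbers) all fall on the same side of an edge. That explanation is an inference, not something the notebook states. The sketch below shows the effect on a small made-up series and uses `retbins=True` to inspect the edges.

```python
import pandas as pd

# Made-up prices with one repeated value
prices = pd.Series([100, 200, 200, 300, 400, 500, 600, 700])

# labels=False returns integer bin codes; retbins=True also returns the edges
quartile, edges = pd.qcut(prices, 4, labels=False, retbins=True)
sizes = quartile.value_counts().sort_index()

print(edges)
print(sizes.tolist())
```

Inspecting `edges` for the real `price` column would show which price points absorb the ties.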

In [210]:
test_df4.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ,labels,labels_2,labels_3,price_decile
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12,50,17,11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,-1,0,-1,5
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9,46,9,14,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,-1,0,-1,7
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16,51,11,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,-1,0,0,2
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,-1,0,1,0
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1,0,1,1


In [211]:
def get_price_bucket(row):
    """Map a row's price decile to a bucket: 0-2 Low, 3-6 Medium, 7-9 High."""
    if row['price_decile'] < 3:
        return 'Low'
    elif row['price_decile'] < 7:
        return 'Medium'
    return 'High'

In [212]:
test_df4['price_bucket'] = test_df4.apply(get_price_bucket, axis=1)

In [213]:
test_df4.groupby('price_bucket').size()

price_bucket
High      357
Low       398
Medium    491
dtype: int64
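
The same Low/Medium/High mapping can be expressed without a row-wise `apply`. The following sketch uses `pd.cut` on the decile codes as an alternative to the notebook's `get_price_bucket`; the bin edges `[-1, 2, 6, 9]` are chosen here so that the right-inclusive bins reproduce the thresholds `< 3` and `< 7`.

```python
import pandas as pd

# Decile codes 0..9, as produced by pd.qcut(..., 10, labels=False)
deciles = pd.Series(range(10), name='price_decile')

# Right-inclusive bins: (-1, 2] -> Low, (2, 6] -> Medium, (6, 9] -> High
buckets = pd.cut(deciles, bins=[-1, 2, 6, 9], labels=['Low', 'Medium', 'High'])

print(buckets.tolist())
```

`pd.cut` returns an ordered categorical, so a later `groupby` lists the buckets as Low, Medium, High instead of alphabetically.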

In [218]:
test_df4.groupby('price_bucket')['price'].mean()

price_bucket
High      641689.075630
Low       229434.673367
Medium    348855.397149
Name: price, dtype: float64

In [219]:
test_df4.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Terrace,Townhouse,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ,labels,labels_2,labels_3,price_decile,price_bucket
2,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,16177.570093,0.0,N,5.857172,12,50,17,11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,-1,0,-1,5,Medium
11,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,12338.235294,-20000.0,S,8.366932,9,46,9,14,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,-1,0,-1,7,High
13,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,127430.107527,100000.0,N,1.387431,16,51,11,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,-1,0,0,2,Low
14,"1 Fitzgibbon Lane, Dublin 1",Dublin 1,Terrace,1.0,1.0,60,140000.0,G,11902,-6.256604,53.357853,140000.0,176105.263158,160000.0,N,0.536232,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,-1,0,1,0,Low
15,"18A Fitzgibbon Street, Dublin 1",Dublin 1,Terrace,1.0,1.0,53,240000.0,D2,11902,-6.2566,53.357783,240000.0,76105.263158,60000.0,N,0.528618,1,57,10,5,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1,0,1,1,Low


In [221]:
test_df4.to_csv('output.csv', index=False)