# Capstone Project - The Battle of Neighborhoods!

Install and import required packages

In [15]:
# install the Google Trends API
# !pip install pytrends

# install the Daft Listings API
!pip install daftlistings

# install the Daft Scraper API
!pip install daft-scraper==1.2.7

# install geopandas, geopy
!pip install geopandas
!pip install geopy

# install folium
!pip install folium

# install matplotlib
!pip install matplotlib

# install pandas profiling
!pip install pandas-profiling==2.7.1

Collecting daft-scraper==1.2.7
  Downloading daft_scraper-1.2.7-py3-none-any.whl (59 kB)
[K     |████████████████████████████████| 59 kB 1.4 MB/s eta 0:00:011
Installing collected packages: daft-scraper
  Attempting uninstall: daft-scraper
    Found existing installation: daft-scraper 1.3.0
    Uninstalling daft-scraper-1.3.0:
      Successfully uninstalled daft-scraper-1.3.0
Successfully installed daft-scraper-1.2.7


In [69]:
# python packages
import pprint
import requests
import geopandas
import pyproj as pp
import numpy as np
import pandas as pd
import datetime
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Google Trends API packages
from pytrends.request import TrendReq

# Daft listings API packages
from daftlistings import Daft, RentType, SortOrder, SortType, MapVisualization, SaleType
from joblib import Parallel, delayed
import time

# Daft Scraper API packages
from daft_scraper.search import DaftSearch, SearchType
from daft_scraper.search.options import (PropertyType, PropertyTypesOption, AdState, AdStateOption)
from daft_scraper.search.options_location import Location, LocationsOption

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Pandas Profiling
import pandas_profiling as pd_prof

# sklearn import for DBSCAN clustering
from sklearn.cluster import DBSCAN

## 1. Introduction
This section outlines a general background for the Business Problem that I'll be trying to solve as part of the capstone project.

The primary focus for this project would be on the city Dublin and its 22 different District areas.  

This project tries to achieve the following analyses for the respective target audience in mind:  
1) **House Renting**: Finding an apartment to rent in Dublin city is very challenging given the housing crisis. The target audience in this case is people looking for rental apartments in the city. The attempt here is to filter out properties based on user preferences for apartment characteristics, neighborhood choices, pricing and crime rate in the neighborhood in which the property is situated.  
2) **Neighborhood Clustering**: The approach here is to use visualization techniques to cluster districts within Dublin city using clustering techniques based on the venues and venue categories present in different districts. We can get a sense of how different districts are oriented within the city in terms of different places, amenities, transport routes and most importantly whether distance from the city centre plays a role in driving this.  
3) **Google Trends**: This data would act as one of the features where we try to do regerssion analysis for predicting the rent price for each apartment. The hypothesis would be that google trends for a search for an apartment to rent in a particular neighborhood would affect the pricing for the rentals. The analysis performed in the subsequent report would test this hypothesis.  
4) **Crimes**: This data would act as additional filtering for users looking to rent an apartment as well as drive the clustering of the districts as planned in point 2 above. It would be intersting to use visualizatin techniques again to find out if crimes are related to the geograhphical attributes of a particular neighborhood.    

Overall the aim is to aid people looking for rentals in Dublin city and help them filter out neighborhoods and properties based on their preferences as well as other local factors driving their decision making.  
Apart from that, the visualiztion techniques used for analysing different datasets would help certain stakeholders make decisions in terms of government planning, business marketing decisions as well as general readers looking for some insights of their own city! 

## 2. Data
This section defines the different data sources as well as their sample examples that have been used for this assignment.

### 2) Daft Listings API
As seen below, this is a very useful API (https://github.com/AnthonyBloomer/daftlistings/) yet simple to use and get upto speed.  
The sample example below shows a search using the API to get all listings in "Dublin city for rental 3-bed apartments with a max price of 2800EUR and furnished".  
We fetch all such listings and build a dataframe containing all the useful features for each property which as seen below would consist of <price', 'facilities', 'formalised_address', 'num_bedrooms', 'num_bathrooms', 'latitude', 'longitude'>  
This data would help us recommend properties to the targeted end-user as well as the geographical  coordinates would help us visually analyse the data in question.  

### 4) Foursquare Places API
Finally, the last part involves a similar approach taken during the previous weeks in this course where we had analysed different neighborhoods in Toronto, Canada.  
The challenge here is to obtain different districts comprising within Dublin City and obtain their respectice geographical coordinates using Nominatim geolocator.  
The sample code given below shows how we plan to construct the final dataframe where each row would be an individual venue along-with the attributes of each of the venues including their geolcation coordinates.  
OneHotEncoding can be used to get a feature representing distribution of different types of venues as well as the most popular and dominating venue type in each of the districts within Dublin city.  

In [85]:
print('There are {} uniques categories.'.format(len(dublin_venues['Venue Category'].unique())))

There are 192 uniques categories.


In [86]:
# one hot encoding
dublin_onehot = pd.get_dummies(dublin_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dublin_onehot['Neighborhood'] = dublin_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(dublin_onehot)
cols.insert(0, cols.pop(cols.index('Neighborhood')))
dublin_onehot = dublin_onehot.loc[:, cols]

dublin_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [87]:
dublin_onehot.shape

(1646, 193)

In [89]:
dublin_grouped = dublin_onehot.groupby('Neighborhood').mean().reset_index()
dublin_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Ballinteer,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.013158,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.026316,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158,0.052632,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.105263,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Blackrock,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.028571,0.014286,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.014286,0.0,0.0,0.028571,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0
2,Clondalkin,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Dublin 1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.1,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
4,Dublin 10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.137931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Dublin 11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.096774,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Dublin 12,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.051282,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.128205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dublin 13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dublin 14,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.06,0.0,0.06,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
9,Dublin 15,0.012987,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.051948,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.025974,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.038961,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.012987,0.012987,0.012987,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [90]:
dublin_grouped.shape

(25, 193)

In [91]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [95]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dublin_grouped['Neighborhood']

for ind in np.arange(dublin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dublin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ballinteer,Supermarket,Café,Pub,Coffee Shop,Clothing Store,Department Store,Gym,Furniture / Home Store,Italian Restaurant,Park
1,Blackrock,Pub,Café,Train Station,Park,Coffee Shop,Shopping Mall,Supermarket,Bar,Thai Restaurant,Italian Restaurant
2,Clondalkin,Hotel,Bar,Convenience Store,Coffee Shop,Restaurant,Supermarket,Chinese Restaurant,Light Rail Station,Gym,Golf Course
3,Dublin 1,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
4,Dublin 10,Supermarket,Pub,Park,Gym,Hotel,Coffee Shop,Fast Food Restaurant,Hardware Store,Bowling Alley,Café
5,Dublin 11,Supermarket,Park,Convenience Store,Pub,Grocery Store,Sandwich Place,Tram Station,Sporting Goods Shop,Breakfast Spot,Chinese Restaurant
6,Dublin 12,Supermarket,Park,Convenience Store,Fast Food Restaurant,Tram Station,Coffee Shop,Grocery Store,Hardware Store,Shopping Mall,Motorcycle Shop
7,Dublin 13,Seafood Restaurant,Pub,Café,Fish Market,Ice Cream Shop,Harbor / Marina,Golf Course,Coffee Shop,Bar,Breakfast Spot
8,Dublin 14,Supermarket,Pub,Coffee Shop,Clothing Store,Café,Department Store,Pizza Place,Restaurant,Discount Store,Pharmacy
9,Dublin 15,Coffee Shop,Supermarket,Clothing Store,Italian Restaurant,Train Station,Fast Food Restaurant,Furniture / Home Store,Pub,Asian Restaurant,Sporting Goods Shop


In [101]:
# set number of clusters
kclusters = 3

dublin_grouped_clustering = dublin_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dublin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 1, 2,
       0, 1, 2], dtype=int32)

In [100]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dublin_merged = dublin_df

# merge dublin_grouped with dublin_merged to add latitude/longitude for each neighborhood
dublin_merged = dublin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
dublin_merged['Cluster Labels'].fillna(3.0, inplace=True)
dublin_merged['Cluster Labels'] = dublin_merged['Cluster Labels'].astype(int)

dublin_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dublin 1,53.352488,-6.256646,0,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
1,Dublin 2,53.33894,-6.252713,0,Café,Coffee Shop,Park,Hotel,Plaza,Pub,Restaurant,Cocktail Bar,Grocery Store,Bakery
2,Dublin 3,53.361223,-6.185467,1,Café,Pub,Boat or Ferry,Beach,Park,Scenic Lookout,Convenience Store,Train Station,Restaurant,Port
3,Dublin 4,53.327507,-6.227486,0,Pub,Café,Restaurant,Coffee Shop,Hotel,Park,Grocery Store,Gastropub,Pizza Place,Plaza
4,Dublin 5,53.383454,-6.181923,2,Supermarket,Grocery Store,Train Station,Convenience Store,Fast Food Restaurant,Pub,Shopping Mall,Café,Pizza Place,Bus Stop


In [131]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dublin_merged['Latitude'], dublin_merged['Longitude'], dublin_merged['Neighborhood'], dublin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters

In [2]:
# exploring the daft scraper API for the latest version of Daft.ie

In [2]:
# call to the API for fetching all SALE AGREED properties in Dublin
options = [
    PropertyTypesOption([PropertyType.ALL]),
    LocationsOption([Location.DUBLIN_COUNTY]),
    AdStateOption(AdState.AGREED)
]

api = DaftSearch(SearchType.SALE)
listings = api.search(options)

In [3]:
print(len(listings))

cnt_price = 0
cnt_abr_price = 0

for listing in listings:
    if hasattr(listing, 'price'):
        cnt_price += 1
    if hasattr(listing, 'abbreviatedPrice'):
        cnt_abr_price += 1

print(cnt_price, cnt_abr_price)

2980
2898 2980


In [4]:
test_df2 = pd.DataFrame([vars(f) for f in listings])

In [5]:
test_df2.head()

Unnamed: 0,featuredLevel,state,price,publishDate,seoTitle,title,_id,category,propertyType,point,seoFriendlyPath,abbreviatedPrice,saleType,seller,numBedrooms,sections,media,daftShortcode,ber,numBathrooms,floorArea,label,pageBranding,propertySize,url,newHome,sticker,priceHistory
0,FEATURED,SALE_AGREED,380000.0,1613984495000,"1 Moatfield Park, Coolock, Artane, Dublin 5","1 Moatfield Park, Coolock, Artane, Dublin 5",2609624,Buy,Semi-D,"{'coordinates': [-6.193117, 53.388871], 'point...",/for-sale/semi-detached-house-1-moatfield-park...,€380k,[For Sale],"{'address': '50 St. Brigid's Road, Artane, D...",3.0,"[Property, Residential, House, Semi-Detached H...","{'hasVirtualTour': True, 'totalImages': 21, 'i...",12787240,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...",1.0,"{'unit': 'METRES_SQUARED', 'value': '91'}",SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,91 m²,https://www.daft.ie//for-sale/semi-detached-ho...,,,
1,FEATURED,SALE_AGREED,,1601485470000,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow","Lynton, Old Connaught Avenue, Bray, Co. Wicklow",2554301,New Homes,Houses,"{'coordinates': [-6.119600389636076, 53.210655...",/new-home-for-sale/lynton-old-connaught-avenue...,€745k,[For Sale],"{'phone': '0879370896', 'squareLogo': 'https:/...",5.0,"[Property, New Homes, Houses]","{'hasVirtualTour': False, 'totalImages': 5, 'i...",9190876,{'rating': 'A3'},,,SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,,https://www.daft.ie//new-home-for-sale/lynton-...,"{'totalUnitTypes': 1, 'subUnits': [{'id': 2554...",,
2,FEATURED,SALE_AGREED,415000.0,1614449109000,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22","44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",2876390,Buy,Semi-D,"{'coordinates': [-6.42237, 53.295096], 'point_...",/for-sale/semi-detached-house-44-ardsolus-brow...,€415k,[For Sale],"{'licenceNumber': '002183', 'phone': '01 414 0...",4.0,"[Property, Residential, House, Semi-Detached H...","{'hasVirtualTour': False, 'totalImages': 14, '...",13559528,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...",3.0,"{'unit': 'METRES_SQUARED', 'value': '132'}",SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,132 m²,https://www.daft.ie//for-sale/semi-detached-ho...,,Viewing Advised,
3,FEATURED,SALE_AGREED,,1605640420000,"Robswall by Hollybrook Homes, Coast Road, Mala...","Robswall by Hollybrook Homes, Coast Road, Mala...",1118178,New Homes,Houses,"{'coordinates': [-6.134997708957428, 53.442447...",/new-home-for-sale/robswall-by-hollybrook-home...,€680k,[For Sale],"{'address': '20-21 Upper Pembroke Street, Dub...",2.0,"[Property, New Homes, Houses]","{'hasVirtualTour': True, 'totalImages': 11, 'i...",941204,{'rating': 'A3'},,,SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,,https://www.daft.ie//new-home-for-sale/robswal...,"{'totalUnitTypes': 1, 'subUnits': [{'id': 1190...",Easy Commute,
4,PREMIUM,SALE_AGREED,195000.0,1614449090000,"54 Thornfield Square, Watery Lane, Clondalkin,...","54 Thornfield Square, Watery Lane, Clondalkin,...",2624144,Buy,Apartment,"{'coordinates': [-6.392492, 53.32423], 'point_...",/for-sale/apartment-54-thornfield-square-water...,€195k,[For Sale],"{'licenceNumber': '002183', 'phone': '01 414 0...",2.0,"[Property, Residential, Apartment]","{'hasVirtualTour': False, 'totalImages': 8, 'i...",13115090,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...",,"{'unit': 'METRES_SQUARED', 'value': '71'}",SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,71 m²,https://www.daft.ie//for-sale/apartment-54-tho...,,,


In [6]:
# only keep columns of interest
final_df = test_df2[['title', 'propertyType', 'category', 'numBedrooms', 'numBathrooms', 'price', 'abbreviatedPrice', 'ber', 'point', 'publishDate', 'seller', 'floorArea']]

In [7]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'coordinates': [-6.193117, 53.388871], 'point...",1613984495000,"{'address': '50 St. Brigid's Road, Artane, D...","{'unit': 'METRES_SQUARED', 'value': '91'}"
1,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow",Houses,New Homes,5.0,,,€745k,{'rating': 'A3'},"{'coordinates': [-6.119600389636076, 53.210655...",1601485470000,"{'phone': '0879370896', 'squareLogo': 'https:/...",
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'coordinates': [-6.42237, 53.295096], 'point_...",1614449109000,"{'licenceNumber': '002183', 'phone': '01 414 0...","{'unit': 'METRES_SQUARED', 'value': '132'}"
3,"Robswall by Hollybrook Homes, Coast Road, Mala...",Houses,New Homes,2.0,,,€680k,{'rating': 'A3'},"{'coordinates': [-6.134997708957428, 53.442447...",1605640420000,"{'address': '20-21 Upper Pembroke Street, Dub...",
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...","{'coordinates': [-6.392492, 53.32423], 'point_...",1614449090000,"{'licenceNumber': '002183', 'phone': '01 414 0...","{'unit': 'METRES_SQUARED', 'value': '71'}"


In [8]:
# logic to fetch the neighbourhood for each row depending on the number of tokens as part of the title split
new = final_df["title"].str.split(",", n = 6, expand = True) 

new[5].fillna(new[4], inplace=True)
new[5].fillna(new[3], inplace=True)
new[5].fillna(new[2], inplace=True)
new[5].fillna(new[1], inplace=True)
new[5].fillna(new[0], inplace=True)

In [9]:
final_df["neighbourhood"]= new[5] 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [10]:
# filter out neighbourhoods with low cardinality
col = 'neighbourhood'
n = 10
final_df = final_df[final_df.groupby(col)[col].transform('count').ge(n)]

vague_n = final_df[final_df['neighbourhood'] == ' Co. Dublin'].index
final_df.drop(vague_n , inplace=True)

In [11]:
final_df.groupby(['neighbourhood']).size()

neighbourhood
 Dublin 1      60
 Dublin 10     28
 Dublin 11    154
 Dublin 12    145
 Dublin 13    105
 Dublin 14    127
 Dublin 15    233
 Dublin 16     75
 Dublin 17     13
 Dublin 18    106
 Dublin 2      31
 Dublin 20     23
 Dublin 22     63
 Dublin 24    130
 Dublin 3     122
 Dublin 4     116
 Dublin 5     128
 Dublin 6     127
 Dublin 6W     36
 Dublin 7     119
 Dublin 8     158
 Dublin 9     141
dtype: int64

In [12]:
len(final_df)

2240

In [13]:
# logic to replace 0 price values with corresponding figures from abbreviatedPrice column
final_df['val'] = final_df['abbreviatedPrice'].str.replace('€','')
final_df['val'] = final_df['val'].str.replace('+','')
final_df['val'] = final_df['val'].str.replace('POA','0')

final_df.val = (final_df.val.replace(r'[kM]+$', '', regex=True).astype(float) * final_df.val.str.extract(r'[\d\.]+([kM]+)', expand=False).fillna(1).replace(['k','M'], [10**3, 10**6]).astype(int))

final_df.price.fillna(final_df.val, inplace=True)

  This is separate from the ipykernel package so we can avoid doing imports until


In [14]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea,neighbourhood,val
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'coordinates': [-6.193117, 53.388871], 'point...",1613984495000,"{'address': '50 St. Brigid's Road, Artane, D...","{'unit': 'METRES_SQUARED', 'value': '91'}",Dublin 5,380000.0
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'coordinates': [-6.42237, 53.295096], 'point_...",1614449109000,"{'licenceNumber': '002183', 'phone': '01 414 0...","{'unit': 'METRES_SQUARED', 'value': '132'}",Dublin 22,415000.0
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...","{'coordinates': [-6.392492, 53.32423], 'point_...",1614449090000,"{'licenceNumber': '002183', 'phone': '01 414 0...","{'unit': 'METRES_SQUARED', 'value': '71'}",Dublin 22,195000.0
9,"150 Broadford Rise, Ballinteer, Dublin 16",Semi-D,Buy,3.0,2.0,495000.0,€495k,"{'epi': '205.19 kWh/m2/yr', 'rating': 'C3', 'c...","{'coordinates': [-6.261108, 53.27783], 'point_...",1614173195000,"{'address': '5 Lower Main Street, Dundrum, D...","{'unit': 'METRES_SQUARED', 'value': '102'}",Dublin 16,495000.0
10,"11 Garthy Wood, Knocklyon, Dublin 16",Apartment,Buy,2.0,2.0,325000.0,€325k,"{'rating': 'C1', 'code': '109986448'}","{'coordinates': [-6.318023, 53.281929], 'point...",1614449150000,"{'licenceNumber': '002183', 'phone': '01 495 1...","{'unit': 'METRES_SQUARED', 'value': '62'}",Dublin 16,325000.0


In [15]:
# remove rows with 0 price
zero_val = final_df[final_df['val'] == 0].index
final_df.drop(zero_val , inplace=True)

final_df.groupby(['val']).size()

val
75000.0       1
95000.0       1
135000.0      3
139000.0      2
140000.0      3
145000.0      1
150000.0      6
159000.0      1
160000.0      6
165000.0      5
169000.0      2
170000.0      8
175000.0      7
179000.0      2
180000.0      9
185000.0      9
189000.0      2
190000.0     16
194000.0      1
195000.0     15
197000.0      1
198000.0      1
199000.0     14
200000.0     24
205000.0      3
209000.0      2
210000.0     17
212000.0      1
215000.0     23
219000.0      1
220000.0     26
223000.0      1
225000.0     48
229000.0      6
230000.0     27
235000.0     37
238000.0      1
239000.0      2
240000.0     24
245000.0     20
249000.0     11
250000.0     69
255000.0     10
259000.0      4
260000.0     36
265000.0     26
267000.0      1
269000.0      6
270000.0     24
275000.0     74
279000.0      4
280000.0     23
285000.0     44
286000.0      1
288000.0      1
289000.0      2
290000.0     22
295000.0     66
298000.0      1
299000.0      7
300000.0     39
305000.0      3
3060

In [16]:
test_df3 = pd.concat([final_df.drop(['seller'], axis=1), final_df['seller'].apply(pd.Series)], axis=1)

test_df3 = pd.concat([test_df3.drop(['floorArea'], axis=1), test_df3['floorArea'].apply(pd.Series)], axis=1)

In [17]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,neighbourhood,val,address,licenceNumber,profileImage,phone,squareLogo,backgroundColour,name,sellerType,alternativePhone,standardLogo,branch,showContactForm,sellerId,phoneWhenToCall,0,unit,value
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...","{'coordinates': [-6.193117, 53.388871], 'point...",1613984495000,Dublin 5,380000.0,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,Avril Ward MIPAV MMCEPI,BRANDED_AGENT,087 333 3823,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,Delaney Estates,True,453,,,METRES_SQUARED,91
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'coordinates': [-6.42237, 53.295096], 'point_...",1614449109000,Dublin 22,415000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,132
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...","{'coordinates': [-6.392492, 53.32423], 'point_...",1614449090000,Dublin 22,195000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,71
9,"150 Broadford Rise, Ballinteer, Dublin 16",Semi-D,Buy,3.0,2.0,495000.0,€495k,"{'epi': '205.19 kWh/m2/yr', 'rating': 'C3', 'c...","{'coordinates': [-6.261108, 53.27783], 'point_...",1614173195000,Dublin 16,495000.0,"5 Lower Main Street,\r\nDundrum,\r\nDublin 14",1756,https://photos.cdn.dsch.ie/MGQ2N2JiYmVkOWY3YmI...,01 2984695,https://photos.cdn.dsch.ie/ODhlM2JhYjgwYjQ2NDJ...,#1d3456,Robert Finnegan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/NTg4MDY3OWYwNjQwY2U...,Vincent Finnegan,True,49,,,METRES_SQUARED,102
10,"11 Garthy Wood, Knocklyon, Dublin 16",Apartment,Buy,2.0,2.0,325000.0,€325k,"{'rating': 'C1', 'code': '109986448'}","{'coordinates': [-6.318023, 53.281929], 'point...",1614449150000,Dublin 16,325000.0,,2183,,01 495 1111,https://photos.cdn.dsch.ie/YWMxNGZkZjBkNzVhNWI...,#003560,Jack Ryan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Templeogue,True,2606,,,METRES_SQUARED,62


In [18]:
test_df3['value'] = test_df3['value'].where(test_df3['unit'] == 'METRES_SQUARED', test_df3['value'].astype(float) * 4046.86)

In [19]:
test_df3.groupby(['value']).size()

value
121.4058               2
161.8744               1
202.34300000000002     2
404.68600000000004     2
445.1546               1
607.029                1
809.3720000000001      1
1011.715               1
1214.058               1
3480.2996000000003     1
3601.7054000000003     1
4046.86                1
6596.3818              1
100                   31
101                   14
102                   27
103                   18
104                   10
105                   26
106                   11
107                   19
108                   12
109                   13
110                   14
111                   11
112                   12
113                    7
114                    5
115                   10
116                   13
117                    6
118                   10
119                    3
120                   19
121                    6
122                    7
123                    6
124                   13
125                   14
126                

In [20]:
# split out the point column into (long, lat) values as 2 new columns
test_df3 = pd.concat([test_df3.drop(['point'], axis=1), test_df3['point'].apply(pd.Series)], axis=1)

test_df3[['longitude','latitude']] = pd.DataFrame(test_df3.coordinates.tolist(), index=test_df3.index)

In [21]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,publishDate,neighbourhood,val,address,licenceNumber,profileImage,phone,squareLogo,backgroundColour,name,sellerType,alternativePhone,standardLogo,branch,showContactForm,sellerId,phoneWhenToCall,0,unit,value,coordinates,point_type,longitude,latitude
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,"{'epi': '379.75 kWh/m2/yr', 'rating': 'E2', 'c...",1613984495000,Dublin 5,380000.0,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,Avril Ward MIPAV MMCEPI,BRANDED_AGENT,087 333 3823,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,Delaney Estates,True,453,,,METRES_SQUARED,91,"[-6.193117, 53.388871]",Point,-6.193117,53.388871
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...",1614449109000,Dublin 22,415000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,132,"[-6.42237, 53.295096]",Point,-6.42237,53.295096
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,"{'epi': '178.87 kWh/m2/yr', 'rating': 'C2', 'c...",1614449090000,Dublin 22,195000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,71,"[-6.392492, 53.32423]",Point,-6.392492,53.32423
9,"150 Broadford Rise, Ballinteer, Dublin 16",Semi-D,Buy,3.0,2.0,495000.0,€495k,"{'epi': '205.19 kWh/m2/yr', 'rating': 'C3', 'c...",1614173195000,Dublin 16,495000.0,"5 Lower Main Street,\r\nDundrum,\r\nDublin 14",1756,https://photos.cdn.dsch.ie/MGQ2N2JiYmVkOWY3YmI...,01 2984695,https://photos.cdn.dsch.ie/ODhlM2JhYjgwYjQ2NDJ...,#1d3456,Robert Finnegan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/NTg4MDY3OWYwNjQwY2U...,Vincent Finnegan,True,49,,,METRES_SQUARED,102,"[-6.261108, 53.27783]",Point,-6.261108,53.27783
10,"11 Garthy Wood, Knocklyon, Dublin 16",Apartment,Buy,2.0,2.0,325000.0,€325k,"{'rating': 'C1', 'code': '109986448'}",1614449150000,Dublin 16,325000.0,,2183,,01 495 1111,https://photos.cdn.dsch.ie/YWMxNGZkZjBkNzVhNWI...,#003560,Jack Ryan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Templeogue,True,2606,,,METRES_SQUARED,62,"[-6.318023, 53.281929]",Point,-6.318023,53.281929


In [22]:
# similarly split the ber column to fetch the rating
test_df3 = pd.concat([test_df3.drop(['ber'], axis=1), test_df3['ber'].apply(pd.Series)], axis=1)

In [23]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,publishDate,neighbourhood,val,address,licenceNumber,profileImage,phone,squareLogo,backgroundColour,name,sellerType,alternativePhone,standardLogo,branch,showContactForm,sellerId,phoneWhenToCall,0,unit,value,coordinates,point_type,longitude,latitude,0.1,code,epi,rating
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Semi-D,Buy,3.0,1.0,380000.0,€380k,1613984495000,Dublin 5,380000.0,"50 St. Brigid's Road,\r\nArtane,\r\nDublin 5",2604,https://photos.cdn.dsch.ie/M2Y2NzBlODY4MGM0ODA...,01 805 8031,https://photos.cdn.dsch.ie/NzcwM2ZiOTgxY2M4NDU...,#00367c,Avril Ward MIPAV MMCEPI,BRANDED_AGENT,087 333 3823,https://photos.cdn.dsch.ie/M2VjZjFmOTI1OGMyNGY...,Delaney Estates,True,453,,,METRES_SQUARED,91,"[-6.193117, 53.388871]",Point,-6.193117,53.388871,,113454078,379.75 kWh/m2/yr,E2
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,1614449109000,Dublin 22,415000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,132,"[-6.42237, 53.295096]",Point,-6.42237,53.295096,,111685541,48.93 kWh/m2/yr,A2
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Apartment,Buy,2.0,,195000.0,€195k,1614449090000,Dublin 22,195000.0,,2183,,01 414 0004,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,#003764,Ronan Healy,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Tallaght,True,10821,,,METRES_SQUARED,71,"[-6.392492, 53.32423]",Point,-6.392492,53.32423,,101436293,178.87 kWh/m2/yr,C2
9,"150 Broadford Rise, Ballinteer, Dublin 16",Semi-D,Buy,3.0,2.0,495000.0,€495k,1614173195000,Dublin 16,495000.0,"5 Lower Main Street,\r\nDundrum,\r\nDublin 14",1756,https://photos.cdn.dsch.ie/MGQ2N2JiYmVkOWY3YmI...,01 2984695,https://photos.cdn.dsch.ie/ODhlM2JhYjgwYjQ2NDJ...,#1d3456,Robert Finnegan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/NTg4MDY3OWYwNjQwY2U...,Vincent Finnegan,True,49,,,METRES_SQUARED,102,"[-6.261108, 53.27783]",Point,-6.261108,53.27783,,106065576,205.19 kWh/m2/yr,C3
10,"11 Garthy Wood, Knocklyon, Dublin 16",Apartment,Buy,2.0,2.0,325000.0,€325k,1614449150000,Dublin 16,325000.0,,2183,,01 495 1111,https://photos.cdn.dsch.ie/YWMxNGZkZjBkNzVhNWI...,#003560,Jack Ryan,BRANDED_AGENT,,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,Sherry FitzGerald Templeogue,True,2606,,,METRES_SQUARED,62,"[-6.318023, 53.281929]",Point,-6.318023,53.281929,,109986448,,C1


In [24]:
# filter out unwanted columns
final_df2 = test_df3[['title', 'neighbourhood', 'propertyType', 'numBedrooms', 'numBathrooms', 'value', 'val', 'rating', 'sellerId', 'longitude', 'latitude', 'publishDate']]

In [25]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1614449109000
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,,71,195000.0,C2,10821,-6.392492,53.32423,1614449090000
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000
10,"11 Garthy Wood, Knocklyon, Dublin 16",Dublin 16,Apartment,2.0,2.0,62,325000.0,C1,2606,-6.318023,53.281929,1614449150000


In [26]:
final_df2[final_df2['numBathrooms'].isnull()].groupby(['propertyType']).size()

propertyType
Apartment    11
Bungalow      2
Detached      5
Houses        2
Semi-D        6
Site         17
Studio        1
Terrace      12
dtype: int64

In [27]:
final_df2.groupby(['rating']).size()

rating
A2         13
A3         26
B1          9
B2         55
B3        132
C1        177
C2        194
C3        211
D1        264
D2        269
E1        193
E2        157
F         148
G         123
SI_666     57
dtype: int64

In [28]:
# handle missing values, replacing with a non-existent constant values
final_df2['numBedrooms'] = final_df2['numBedrooms'].fillna(-1)

final_df2['numBathrooms'] = final_df2['numBathrooms'].fillna(-1)

final_df2['rating'] = final_df2['rating'].fillna('ZZZ')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [29]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1614449109000
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1614449090000
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000
10,"11 Garthy Wood, Knocklyon, Dublin 16",Dublin 16,Apartment,2.0,2.0,62,325000.0,C1,2606,-6.318023,53.281929,1614449150000


In [30]:
mapping_df = final_df2[:100]

mapping_df['n_neigh'] = mapping_df.groupby('neighbourhood').ngroup()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [31]:
mapping_df.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate,n_neigh
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000,15
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1614449109000,11
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1614449090000,11
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000,7
10,"11 Garthy Wood, Knocklyon, Dublin 16",Dublin 16,Apartment,2.0,2.0,62,325000.0,C1,2606,-6.318023,53.281929,1614449150000,7


In [32]:
# set color scheme for the neighbourhoods
num_neigh = mapping_df.apply(pd.Series.nunique)['neighbourhood']

x = np.arange(num_neigh)
ys = [i + x + (i*x)**2 for i in range(num_neigh)]

colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [33]:
address = 'Dublin, Ireland'

geolocator = Nominatim(user_agent="dublin_locator")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Dublin are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Dublin are 53.3497645, -6.2602732.


In [34]:
# create map of Dublin using latitude and longitude values
map_dublin = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, title, n_neigh in zip(mapping_df['latitude'], mapping_df['longitude'], mapping_df['title'], mapping_df['n_neigh']):
    label = '{}'.format(title)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color=rainbow[n_neigh],
        fill=True,
        fill_color=rainbow[n_neigh],
        fill_opacity=0.7,
        parse_html=False).add_to(map_dublin)  
    
map_dublin

In [35]:
final_df2 = final_df2.rename({'value': 'floorArea', 'val': 'price'}, axis=1)  # new method

final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000
2,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1614449109000
4,"54 Thornfield Square, Watery Lane, Clondalkin,...",Dublin 22,Apartment,2.0,-1.0,71,195000.0,C2,10821,-6.392492,53.32423,1614449090000
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000
10,"11 Garthy Wood, Knocklyon, Dublin 16",Dublin 16,Apartment,2.0,2.0,62,325000.0,C1,2606,-6.318023,53.281929,1614449150000


In [36]:
# remove rows with NaN for newly added columns
final_df2 = final_df2.dropna()

In [37]:
final_df2['pricePerBedroom'] = final_df2['price'] / final_df2['numBedrooms']

In [38]:
final_df2['avgPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform(np.average)

In [39]:
final_df2['medianPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform(np.median)

In [40]:
final_df2['deltaAvgPrice'] = final_df2['avgPriceNeighbourhood'] - final_df2['price']

In [41]:
final_df2['deltaMedianPrice'] = final_df2['medianPriceNeighbourhood'] - final_df2['price']

In [42]:
dict_north_south = {'Dublin 1':'N', 'Dublin 10':'S', 'Dublin 11':'N', 'Dublin 12':'S', 'Dublin 13':'N', 'Dublin 14':'S', 'Dublin 15':'N', 
                    'Dublin 16':'S', 'Dublin 17':'N', 'Dublin 18':'S', 'Dublin 2':'S', 'Dublin 20':'S', 'Dublin 22':'S', 'Dublin 24':'S', 
                    'Dublin 3':'N', 'Dublin 4':'S', 'Dublin 5':'N', 'Dublin 6':'S', 'Dublin 6W':'S', 'Dublin 7':'N', 'Dublin 8':'S', 'Dublin 9':'N'}

In [43]:
final_df2["neighbourhood"] = final_df2["neighbourhood"].str.strip()
final_df2['dublinNorthSouth']=final_df2['neighbourhood'].map(dict_north_south)

final_df2.groupby('dublinNorthSouth').size()

dublinNorthSouth
N    863
S    946
dtype: int64

In [44]:
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    All args must be of equal length.    
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km

In [45]:
final_df2['distToCity'] = haversine_np(final_df2['longitude'], final_df2['latitude'], -6.2580, 53.3531)

In [46]:
final_df2.isna().sum()

title                       0
neighbourhood               0
propertyType                0
numBedrooms                 0
numBathrooms                0
floorArea                   0
price                       0
rating                      0
sellerId                    0
longitude                   0
latitude                    0
publishDate                 0
pricePerBedroom             0
avgPriceNeighbourhood       0
medianPriceNeighbourhood    0
deltaAvgPrice               0
deltaMedianPrice            0
dublinNorthSouth            0
distToCity                  0
dtype: int64

In [47]:
final_df2['date'] = pd.to_datetime(final_df2['publishDate'], unit='ms')
final_df2['daysSincePublished'] = pd.to_datetime("now") - final_df2['date']
final_df2['daysSincePublished'] = final_df2['daysSincePublished'].apply(lambda x: x.days)

In [48]:
zero_days = final_df2[final_df2['daysSincePublished'] == 0].index
final_df2.drop(zero_days , inplace=True)

final_df2.groupby(['daysSincePublished']).size()

daysSincePublished
1       4
2       8
3      14
4       6
5      11
6       1
8      12
9      10
10      3
11      8
12     10
13      1
15     14
16      5
17     10
18     10
19      8
22      8
23     13
24     12
25      7
26     12
27      1
28      2
29      6
30      9
31     11
32      5
33     11
35      2
36      9
37      8
38     24
39     22
40      8
41      2
43     11
44     13
45      8
46     10
47     11
48      2
49      1
50      2
51      2
52      8
53     13
54     35
55      1
56      2
58      2
60      3
63      9
65      3
66     13
67     23
68      6
70      1
71     16
72      9
73     11
74     34
75     17
77      4
78     13
79      8
80     14
81     13
82     19
83      1
85     15
86     22
87     29
88     21
89     12
90      1
92      3
93     18
94     24
95     11
96     17
98      1
99      9
100    16
101    12
102    20
103    33
104     6
106    25
107     9
108    19
109    18
110    16
112     4
113    12
114    23
115    25
116    23
1

In [49]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate,pricePerBedroom,avgPriceNeighbourhood,medianPriceNeighbourhood,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,date,daysSincePublished
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,1613984495000,126666.666667,394794.642857,375000.0,14794.642857,-5000.0,N,5.857172,2021-02-22 09:01:35,5
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,1614173195000,165000.0,508187.5,475000.0,13187.5,-20000.0,S,8.366932,2021-02-24 13:26:35,3
11,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000,250000.0,378567.010309,350000.0,128567.010309,100000.0,N,1.387431,2021-02-18 10:46:29,9
14,"26 Manorfields Walk, Castaheany, Clonee, Dubli...",Dublin 15,Terrace,2.0,1.0,70,250000.0,D1,1186,-6.424428,53.397586,1613752599000,125000.0,338767.857143,280000.0,88767.857143,30000.0,N,12.090038,2021-02-19 16:36:39,8
16,"Ayla, 39 Beach Road, Sandymount, Dublin 4",Dublin 4,Terrace,3.0,4.0,147,950000.0,C1,893,-6.215883,53.335437,1613745094000,316666.666667,714607.142857,527500.0,-235392.857143,-422500.0,S,3.414646,2021-02-19 14:31:34,8


In [50]:
final_df3 = final_df2[['title','neighbourhood','propertyType','numBedrooms','numBathrooms','floorArea','price','rating','sellerId','longitude','latitude','pricePerBedroom','deltaAvgPrice','deltaMedianPrice','dublinNorthSouth','distToCity','daysSincePublished']]

In [51]:
final_df3.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,14794.642857,-5000.0,N,5.857172,5
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,13187.5,-20000.0,S,8.366932,3
11,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,128567.010309,100000.0,N,1.387431,9
14,"26 Manorfields Walk, Castaheany, Clonee, Dubli...",Dublin 15,Terrace,2.0,1.0,70,250000.0,D1,1186,-6.424428,53.397586,125000.0,88767.857143,30000.0,N,12.090038,8
16,"Ayla, 39 Beach Road, Sandymount, Dublin 4",Dublin 4,Terrace,3.0,4.0,147,950000.0,C1,893,-6.215883,53.335437,316666.666667,-235392.857143,-422500.0,S,3.414646,8


In [52]:
# below is where we make use of the Foursquare API

In [53]:
CLIENT_ID = 'WV2XS4MH5YRWGHLTCFT4CKR4SRWNHWAF3JHWHNN4MKEQWTL3' # your Foursquare ID
CLIENT_SECRET = 'QEWWIOG0M3BT4V0YPNSKJY521MDUBHBYWBXCJFZ0452KP3OT' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [54]:
# 1. Food 4d4b7105d754a06374d81259
# 2. Outdoors & Recreation 4d4b7105d754a06377d81259
# 3. Shop & Service 4d4b7105d754a06378d81259

In [55]:
food_cat_ids = ['4bf58dd8d48988d16d941735','4bf58dd8d48988d128941735','4bf58dd8d48988d1e0931735','4bf58dd8d48988d110941735',
                '4bf58dd8d48988d149941735','4bf58dd8d48988d1fa931735','4bf58dd8d48988d1c4941735','4bf58dd8d48988d145941735',
                '4bf58dd8d48988d11b941735','4bf58dd8d48988d16e941735','4bf58dd8d48988d1c5941735','4bf58dd8d48988d143941735',
                '4bf58dd8d48988d1ce941735','4bf58dd8d48988d10e951735','4bf58dd8d48988d1c9941735','4bf58dd8d48988d1ca941735',
                '4bf58dd8d48988d142941735','4bf58dd8d48988d11e941735','4bf58dd8d48988d16a941735','52e81612bcbc57f1066b79f1',
                '4bf58dd8d48988d155941735','4bf58dd8d48988d1f9941735','4bf58dd8d48988d10f941735','4bf58dd8d48988d1cc941735']

recreation_cat_ids = ['4bf58dd8d48988d163941735','4bf58dd8d48988d1e6941735','4bf58dd8d48988d176941735','4bf58dd8d48988d137941735',
                     '4bf58dd8d48988d164941735','4bf58dd8d48988d1e4931735','4bf58dd8d48988d1e0941735','4bf58dd8d48988d12d951735',
                     '4bf58dd8d48988d1e2941735','4bf58dd8d48988d165941735','56aa371be4b08b9a8d57353e','58daa1558bbb0b01f18ec1fd',
                     '4deefb944765f83613cdba6e','4e74f6cabd41c4836eac4c31','56aa371be4b08b9a8d573562','4bf58dd8d48988d15e941735']

shop_cat_ids = ['52f2ab2ebcbc57f1066b8b46','4bf58dd8d48988d103951735','4bf58dd8d48988d1f6941735','4bf58dd8d48988d1fd941735',
               '4d954b0ea243a5684a65b473','4bf58dd8d48988d114951735','4bf58dd8d48988d112951735','4bf58dd8d48988d118951735',
               '4bf58dd8d48988d1f2941735','5032833091d4c4b30a586d60','52dea92d3cf9994f4e043dbb','4bf58dd8d48988d10f951735',
               '4bf58dd8d48988d1f8941735','4bf58dd8d48988d122951735','4bf58dd8d48988d106951735','4bf58dd8d48988d108951735']

In [56]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
#         print(name)
        num_food = 0
        num_recreation = 0
        num_shop = 0
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        for v in results:
            num_food += 1 if (v['venue']['categories'][0]['id'] in food_cat_ids) else 0
            num_recreation += 1 if (v['venue']['categories'][0]['id'] in recreation_cat_ids) else 0
            num_shop += 1 if (v['venue']['categories'][0]['id'] in shop_cat_ids) else 0
#         print(num_food, num_recreation, num_shop)
        
        # return only relevant information for each nearby venue
        venues_list.append([
            name, 
            num_food,
            num_recreation,
            num_shop])
#     print(venues_list)

    nearby_venues = pd.DataFrame([venue_list for venue_list in venues_list])
    nearby_venues.columns = ['title', 
                  'numFood',
                  'numRecreation',
                  'numShop']
    
    return(nearby_venues)

In [62]:
dublin_venues = getNearbyVenues(names=final_df3['title'],
                                   latitudes=final_df3['latitude'],
                                   longitudes=final_df3['longitude'])

In [63]:
dublin_venues.shape

(1283, 4)

In [64]:
dublin_venues.head()

Unnamed: 0,title,numFood,numRecreation,numShop
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",49,17,12
1,"150 Broadford Rise, Ballinteer, Dublin 16",47,9,15
2,"Apartment 172, Block C, Dublin 7",53,12,4
3,"26 Manorfields Walk, Castaheany, Clonee, Dubli...",46,8,25
4,"Ayla, 39 Beach Road, Sandymount, Dublin 4",48,19,4


In [65]:
dublin_venues2 = final_df3.join(dublin_venues.set_index('title'), on='title')

dublin_venues2.shape

(1313, 20)

In [66]:
dublin_venues2 = dublin_venues2.drop_duplicates()

In [67]:
profile = pd_prof.ProfileReport(dublin_venues2) 
profile.to_file("output.html")

Summarize dataset:   0%|          | 0/33 [00:00<?, ?it/s]

Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]

  cmap.set_bad(cmap_bad)


Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]

Export report to file:   0%|          | 0/1 [00:00<?, ?it/s]

In [2]:
# add meaningful and useful features
# **1. delta from avg price for that neighbourhood = (avg_price_neighbourhood - price)
# **2. delta from median price for that neighbourhood = (median_proce_neighbourhood - price)
# **3. north_south column 1/2 or north/south = dict_north_south {'Dublin 1': 'N', 'Dublin 2', 'S', ...}
# **4. days since ad published = difference between 2 ephocs (publishDate - today)
# **5. distance from city center = difference between lat, long (Haversine formula)
# 6. commute time to city centre by {walking/cycling/train/bus}
# 7. categorical column for price ranges {bins}
# 8. calculated field from 7 => num of properties in that neighbourhood for that price range
# ***9. num_ {Pharmacies, Supermarkets, Restaurants, Cafes, Parks} in 5K radius
# **10. price per bedroom = (price / numBedrooms)

In [68]:
dublin_venues2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,14794.642857,-5000.0,N,5.857172,5,49,17,12
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,13187.5,-20000.0,S,8.366932,3,47,9,15
11,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,128567.010309,100000.0,N,1.387431,9,53,12,4
14,"26 Manorfields Walk, Castaheany, Clonee, Dubli...",Dublin 15,Terrace,2.0,1.0,70,250000.0,D1,1186,-6.424428,53.397586,125000.0,88767.857143,30000.0,N,12.090038,8,46,8,25
16,"Ayla, 39 Beach Road, Sandymount, Dublin 4",Dublin 4,Terrace,3.0,4.0,147,950000.0,C1,893,-6.215883,53.335437,316666.666667,-235392.857143,-422500.0,S,3.414646,8,48,19,4


In [134]:
print('There are {} unique property types.'.format(len(dublin_venues2['propertyType'].unique())))
print('There are {} unique ber ratings.'.format(len(dublin_venues2['rating'].unique())))

There are 10 unique property types.
There are 16 unique ber ratings.


In [137]:
dublin_venues2.shape

(1276, 21)

In [138]:
# one hot encoding
dublin_onehot2 = pd.get_dummies(dublin_venues2[['propertyType']], prefix="", prefix_sep="")

KeyError: "None of [Index(['rating'], dtype='object')] are in the [columns]"

In [143]:
dublin_onehot2.head()

Unnamed: 0,Apartment,Bungalow,Detached,Duplex,End of Terrace,Semi-D,Site,Studio,Terrace,Townhouse
0,0,0,0,0,0,1,0,0,0,0
9,0,0,0,0,0,1,0,0,0,0
11,1,0,0,0,0,0,0,0,0,0
14,0,0,0,0,0,0,0,0,1,0
16,0,0,0,0,0,0,0,0,1,0


In [140]:
dublin_onehot3 = pd.get_dummies(dublin_venues2[['rating']], prefix="", prefix_sep="")

In [142]:
dublin_onehot3.head()

Unnamed: 0,A2,A3,B1,B2,B3,C1,C2,C3,D1,D2,E1,E2,F,G,SI_666,ZZZ
0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
9,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
14,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
16,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [123]:
dublin_venues2["labels"] = DBSCAN(min_samples=10).fit(dublin_venues2[["numBedrooms", "numBathrooms", "floorArea"]].values).labels_
print(dublin_venues2["labels"].unique())

[-1  0  1  2  3  4  5  6  7  8  9 10]


In [124]:
dublin_venues2.groupby('labels').size()

labels
-1     1128
 0       14
 1       13
 2       15
 3       24
 4       14
 5       11
 6       13
 7       13
 8       10
 9       11
 10      10
dtype: int64

In [125]:
dublin_venues2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,labels
0,"1 Moatfield Park, Coolock, Artane, Dublin 5",Dublin 5,Semi-D,3.0,1.0,91,380000.0,E2,453,-6.193117,53.388871,126666.666667,14794.642857,-5000.0,N,5.857172,5,49,17,12,-1
9,"150 Broadford Rise, Ballinteer, Dublin 16",Dublin 16,Semi-D,3.0,2.0,102,495000.0,C3,49,-6.261108,53.27783,165000.0,13187.5,-20000.0,S,8.366932,3,47,9,15,-1
11,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,128567.010309,100000.0,N,1.387431,9,53,12,4,-1
14,"26 Manorfields Walk, Castaheany, Clonee, Dubli...",Dublin 15,Terrace,2.0,1.0,70,250000.0,D1,1186,-6.424428,53.397586,125000.0,88767.857143,30000.0,N,12.090038,8,46,8,25,0
16,"Ayla, 39 Beach Road, Sandymount, Dublin 4",Dublin 4,Terrace,3.0,4.0,147,950000.0,C1,893,-6.215883,53.335437,316666.666667,-235392.857143,-422500.0,S,3.414646,8,48,19,4,-1


In [128]:
noisy_labels = dublin_venues2[dublin_venues2['labels'] == -1].index
mapping_df2 = dublin_venues2.drop(noisy_labels)

mapping_df2.groupby(['labels']).size()

labels
0     14
1     13
2     15
3     24
4     14
5     11
6     13
7     13
8     10
9     11
10    10
dtype: int64

In [129]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters
num_clusters = 12
x = np.arange(num_clusters)
ys = [i + x + (i*x)**2 for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, title, cluster in zip(mapping_df2['latitude'], mapping_df2['longitude'], mapping_df2['title'], mapping_df2['labels']):
    label = folium.Popup(str(title) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters

In [131]:
dublin_venues2[dublin_venues2.labels == 3]

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished,numFood,numRecreation,numShop,labels
149,"Apartment 127, Block B, Lymewood Mews, Northwo...",Dublin 9,Apartment,2.0,2.0,70,275000.0,C2,3658,-6.256419,53.402848,137500.0,124886.956522,104000.0,N,5.529245,11,49,11,10,3
197,"Apartment 15, Block A, Heywood Court, Northwoo...",Dublin 9,Apartment,2.0,2.0,70,275000.0,C1,10733,-6.256384,53.401532,137500.0,124886.956522,104000.0,N,5.383078,39,53,11,9,3
292,"Apartment 21, Windmill Terrace, Clonsilla, Dub...",Dublin 15,Apartment,2.0,2.0,70,245000.0,C1,1815,-6.39992,53.380132,122500.0,93767.857143,35000.0,N,9.878185,18,43,9,24,3
838,"15 The Academy, Park West, Dublin 12",Dublin 12,Apartment,2.0,2.0,70,220000.0,C2,10949,-6.378825,53.332367,110000.0,99252.525253,75000.0,S,8.340624,43,42,8,27,3
848,"29 The Crescent, Carrickmines Manor, Carrickmi...",Dublin 18,Apartment,2.0,2.0,70,295000.0,D1,7425,-6.177179,53.248267,147500.0,222406.976744,132500.0,S,12.826569,75,49,7,20,3
858,"Apartment 2, Amberwood, Mulhuddart, Dublin 15",Dublin 15,Apartment,2.0,2.0,70,185000.0,D1,12,-6.403482,53.4042,92500.0,153767.857143,95000.0,N,11.191452,67,32,7,20,3
1305,"55 Ellensborough Lodge, Kiltipper Road, Kiltip...",Dublin 24,Terrace,2.0,2.0,70,235000.0,C2,11577,-6.370795,53.268697,117500.0,47610.526316,40000.0,S,12.0023,99,22,5,19,3
1312,"Apartment 4, The Turnpike, Ballymun, Dublin 9",Dublin 9,Apartment,2.0,2.0,70,170000.0,D1,3658,-6.264883,53.399981,85000.0,229886.956522,209000.0,N,5.2296,80,50,12,10,3
1396,"61 Cedar Brook Way, Cherry Orchard, Ballyfermo...",Dublin 10,Apartment,2.0,2.0,70,170000.0,D1,6996,-6.379153,53.336754,85000.0,54545.454545,55000.0,S,8.240145,115,43,10,25,3
1515,"65 Cruise Park Drive, Tyrrelstown, Dublin 15",Dublin 15,End of Terrace,2.0,2.0,70,235000.0,C2,2636,-6.395053,53.418973,117500.0,103767.857143,45000.0,N,11.66594,117,31,7,22,3


In [None]:
["longitude", "latitude", "distToCity"]
["numFood", "numRecreation", "numShop"]