# Capstone Project - The Battle of Neighborhoods!

Install and import required packages

In [130]:
# install the Google Trends API
# !pip install pytrends

# install the Daft Listings API
!pip install daftlistings

# install the Daft Scraper API
!pip install daft-scraper==1.3.0

# install geopandas, geopy
!pip install geopandas
!pip install geopy

# install folium
!pip install folium

# install matplotlib
!pip install matplotlib

# install pandas profiling
!pip install pandas-profiling==2.7.1



In [131]:
# python packages
import pprint
import requests
import geopandas
import pyproj as pp
import numpy as np
import pandas as pd
import datetime
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Google Trends API packages
from pytrends.request import TrendReq

# Daft listings API packages
from daftlistings import Daft, RentType, SortOrder, SortType, MapVisualization, SaleType
from joblib import Parallel, delayed
import time

# Daft Scraper API packages
from daft_scraper.search import DaftSearch, SearchType
from daft_scraper.search.options import (PropertyType, PropertyTypesOption, AdState, AdStateOption)
from daft_scraper.search.options_location import Location, LocationsOption

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Pandas Profiling
import pandas_profiling as pd_prof

## 1. Introduction
This section outlines the background to the business problem that I'll be trying to solve as part of the capstone project.

The primary focus of this project is the city of Dublin and its 22 districts.  

This project aims to deliver the following analyses, each with a target audience in mind:  
1) **House Renting**: Finding an apartment to rent in Dublin city is very challenging given the housing crisis. The target audience here is people looking for rental apartments in the city. The aim is to filter properties based on user preferences for apartment characteristics, neighborhood, price and the crime rate in the neighborhood where the property is located.  
2) **Neighborhood Clustering**: The approach here is to cluster districts within Dublin city based on the venues and venue categories present in each district, and to visualize the result. This gives a sense of how districts differ across the city in terms of places, amenities and transport routes and, most importantly, whether distance from the city centre plays a role in driving these differences.  
3) **Google Trends**: This data would act as one of the features in a regression analysis that predicts the rent price of each apartment. The hypothesis is that Google Trends search interest in renting an apartment in a particular neighborhood affects rental prices there. The analysis in the subsequent report tests this hypothesis.  
4) **Crimes**: This data would act as an additional filter for users looking to rent an apartment, and would also feed into the clustering of districts described in point 2 above. It would be interesting to use visualization techniques again to find out whether crime is related to the geographical attributes of a particular neighborhood.  

Overall, the aim is to help people looking for rentals in Dublin city filter neighborhoods and properties based on their preferences as well as other local factors driving their decision making.  
Apart from that, the visualization techniques used to analyse the different datasets could help stakeholders with government planning and business marketing decisions, and give general readers some insights into their own city! 

## 2. Data
This section describes the data sources used for this assignment, along with a sample from each.

### 2) Daft Listings API
As seen below, this API (https://github.com/AnthonyBloomer/daftlistings/) is very useful, simple to use and quick to get up to speed with.  
The sample example below shows a search using the API to get all listings for "furnished 3-bed rental apartments in Dublin city with a max price of 2800 EUR".  
We fetch all such listings and build a dataframe containing the useful features of each property, which as seen below consist of <'price', 'facilities', 'formalised_address', 'num_bedrooms', 'num_bathrooms', 'latitude', 'longitude'>.  
This data helps us recommend properties to the targeted end-user, and the geographical coordinates let us analyse the data visually.  
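
As a rough illustration of the preference filtering described above, the sketch below filters a few made-up listing records that use the same field names. The records, the `matches` helper and the thresholds are assumptions for illustration only; they are not output from the Daft Listings API.

```python
# Hypothetical listing records using the feature names listed above.
# The values are invented for illustration, not fetched from Daft.
listings = [
    {"price": 2600, "num_bedrooms": 3, "num_bathrooms": 2,
     "formalised_address": "Example Street, Dublin 4",
     "facilities": ["Parking", "Central Heating"],
     "latitude": 53.33, "longitude": -6.23},
    {"price": 3100, "num_bedrooms": 3, "num_bathrooms": 2,
     "formalised_address": "Sample Road, Dublin 2",
     "facilities": ["Dishwasher"],
     "latitude": 53.34, "longitude": -6.25},
    {"price": 1900, "num_bedrooms": 2, "num_bathrooms": 1,
     "formalised_address": "Placeholder Avenue, Dublin 8",
     "facilities": [],
     "latitude": 53.34, "longitude": -6.29},
]


def matches(listing, bedrooms, max_price):
    """Return True when a listing satisfies the user's basic preferences."""
    return listing["num_bedrooms"] == bedrooms and listing["price"] <= max_price


# Keep only 3-bed listings at or below the 2800 EUR ceiling.
shortlist = [item for item in listings if matches(item, bedrooms=3, max_price=2800)]
for item in shortlist:
    print(item["formalised_address"], item["price"])
```

The latitude and longitude fields are carried along untouched so that a shortlist like this could later be plotted on a map.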

### 4) Foursquare Places API
Finally, the last part follows a similar approach to the one taken in previous weeks of this course, where we analysed different neighborhoods in Toronto, Canada.  
The challenge here is to obtain the districts that make up Dublin City and their respective geographical coordinates using the Nominatim geolocator.  
The sample code given below shows how we plan to construct the final dataframe, where each row is an individual venue along with its attributes, including its geolocation coordinates.  
One-hot encoding can then be used to derive features representing the distribution of venue types, as well as the most popular and dominant venue type, in each district of Dublin city.  
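
To make the idea concrete, here is a minimal standard-library sketch of why averaging one-hot rows per neighborhood yields each category's share of venues. The venue pairs are made up for illustration; the notebook itself works on the real venues dataframe with pandas.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs standing in for the venues table.
venues = [
    ("Dublin 1", "Coffee Shop"),
    ("Dublin 1", "Pub"),
    ("Dublin 1", "Coffee Shop"),
    ("Dublin 1", "Café"),
    ("Dublin 5", "Supermarket"),
    ("Dublin 5", "Supermarket"),
    ("Dublin 5", "Pub"),
]

categories = sorted({category for _, category in venues})

# One-hot encode each venue: 1 in the column of its category, 0 elsewhere.
one_hot = [
    (hood, [1 if category == c else 0 for c in categories])
    for hood, category in venues
]

# Sum the one-hot rows per neighborhood and count the venues in each.
sums = defaultdict(lambda: [0] * len(categories))
counts = Counter()
for hood, row in one_hot:
    counts[hood] += 1
    sums[hood] = [a + b for a, b in zip(sums[hood], row)]

# Dividing by the venue count gives the mean, i.e. each category's share.
frequencies = {
    hood: dict(zip(categories, [total / counts[hood] for total in totals]))
    for hood, totals in sums.items()
}

# The dominant venue type is the category with the largest share.
dominant = {hood: max(freq, key=freq.get) for hood, freq in frequencies.items()}
print(frequencies["Dublin 1"])
print(dominant)
```

Because every venue contributes exactly one 1, the shares within a neighborhood always sum to 1, which makes neighborhoods with different venue counts comparable.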

In [85]:
print('There are {} unique categories.'.format(len(dublin_venues['Venue Category'].unique())))

There are 192 unique categories.


In [86]:
# one hot encoding
dublin_onehot = pd.get_dummies(dublin_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dublin_onehot['Neighborhood'] = dublin_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(dublin_onehot)
cols.insert(0, cols.pop(cols.index('Neighborhood')))
dublin_onehot = dublin_onehot.loc[:, cols]

dublin_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Dublin 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [87]:
dublin_onehot.shape

(1646, 193)

In [89]:
dublin_grouped = dublin_onehot.groupby('Neighborhood').mean().reset_index()
dublin_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Betting Shop,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Café,Canal,Canal Lock,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Convention Center,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hockey Field,Home Service,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Light Rail Station,Lighthouse,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Port,Portuguese Restaurant,Pub,Recreation Center,Rental Car Location,Restaurant,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Ballinteer,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.013158,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.026316,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158,0.052632,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.105263,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Blackrock,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.028571,0.014286,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.014286,0.0,0.0,0.028571,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0
2,Clondalkin,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Dublin 1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.1,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
4,Dublin 10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.137931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Dublin 11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.096774,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Dublin 12,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.051282,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.128205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dublin 13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dublin 14,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.06,0.0,0.06,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
9,Dublin 15,0.012987,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.051948,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.025974,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.012987,0.038961,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.012987,0.012987,0.012987,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [90]:
dublin_grouped.shape

(25, 193)

In [91]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [95]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dublin_grouped['Neighborhood']

for ind in np.arange(dublin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dublin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ballinteer,Supermarket,Café,Pub,Coffee Shop,Clothing Store,Department Store,Gym,Furniture / Home Store,Italian Restaurant,Park
1,Blackrock,Pub,Café,Train Station,Park,Coffee Shop,Shopping Mall,Supermarket,Bar,Thai Restaurant,Italian Restaurant
2,Clondalkin,Hotel,Bar,Convenience Store,Coffee Shop,Restaurant,Supermarket,Chinese Restaurant,Light Rail Station,Gym,Golf Course
3,Dublin 1,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
4,Dublin 10,Supermarket,Pub,Park,Gym,Hotel,Coffee Shop,Fast Food Restaurant,Hardware Store,Bowling Alley,Café
5,Dublin 11,Supermarket,Park,Convenience Store,Pub,Grocery Store,Sandwich Place,Tram Station,Sporting Goods Shop,Breakfast Spot,Chinese Restaurant
6,Dublin 12,Supermarket,Park,Convenience Store,Fast Food Restaurant,Tram Station,Coffee Shop,Grocery Store,Hardware Store,Shopping Mall,Motorcycle Shop
7,Dublin 13,Seafood Restaurant,Pub,Café,Fish Market,Ice Cream Shop,Harbor / Marina,Golf Course,Coffee Shop,Bar,Breakfast Spot
8,Dublin 14,Supermarket,Pub,Coffee Shop,Clothing Store,Café,Department Store,Pizza Place,Restaurant,Discount Store,Pharmacy
9,Dublin 15,Coffee Shop,Supermarket,Clothing Store,Italian Restaurant,Train Station,Fast Food Restaurant,Furniture / Home Store,Pub,Asian Restaurant,Sporting Goods Shop
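
The column labels above come from a three-element suffix list with a fallback to "th". That is fine for ten venues, but it would label a 21st column as "21th". A small helper such as the hypothetical `ordinal` below, which is not part of the original notebook, would also handle larger values:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # Numbers ending in 11, 12 or 13 are irregular and always take 'th'.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this produces the same column names as the loop in the notebook, without relying on an exception for control flow.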


In [101]:
# set number of clusters
kclusters = 3

# drop the label column so that only the numeric venue frequencies are clustered
dublin_grouped_clustering = dublin_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dublin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 1, 2,
       0, 1, 2], dtype=int32)
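
To get a quick feel for how balanced the clusters are, the labels printed above can be tallied. The sketch below copies the label array from the output into a plain list; with the fitted model at hand, `kmeans.labels_` could be passed to `Counter` instead.

```python
from collections import Counter

# Cluster labels copied from the k-means output above (one per neighborhood).
labels = [1, 1, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 1, 2,
          0, 1, 2]

# Count how many neighborhoods fall into each cluster.
cluster_sizes = Counter(labels)
for cluster in sorted(cluster_sizes):
    print(f"Cluster {cluster}: {cluster_sizes[cluster]} neighborhoods")
```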

In [100]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dublin_merged = dublin_df

# merge dublin_grouped with dublin_merged to add latitude/longitude for each neighborhood
dublin_merged = dublin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# neighborhoods without a cluster get the fallback label 3; assign the result
# back instead of calling fillna(inplace=True) on a column selection
dublin_merged['Cluster Labels'] = dublin_merged['Cluster Labels'].fillna(3).astype(int)

dublin_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dublin 1,53.352488,-6.256646,0,Coffee Shop,Café,Pub,Park,Italian Restaurant,Hotel,Bookstore,Theater,Bar,Plaza
1,Dublin 2,53.33894,-6.252713,0,Café,Coffee Shop,Park,Hotel,Plaza,Pub,Restaurant,Cocktail Bar,Grocery Store,Bakery
2,Dublin 3,53.361223,-6.185467,1,Café,Pub,Boat or Ferry,Beach,Park,Scenic Lookout,Convenience Store,Train Station,Restaurant,Port
3,Dublin 4,53.327507,-6.227486,0,Pub,Café,Restaurant,Coffee Shop,Hotel,Park,Grocery Store,Gastropub,Pizza Place,Plaza
4,Dublin 5,53.383454,-6.181923,2,Supermarket,Grocery Store,Train Station,Convenience Store,Fast Food Restaurant,Pub,Shopping Mall,Café,Pizza Place,Bus Stop


In [131]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.2)

# set color scheme for the clusters, sized to cover every label present
# (including the fallback label assigned in the fillna step)
n_colors = int(dublin_merged['Cluster Labels'].max()) + 1
colors_array = cm.rainbow(np.linspace(0, 1, n_colors))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dublin_merged['Latitude'], dublin_merged['Longitude'], dublin_merged['Neighborhood'], dublin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters

In [2]:
# exploring the daft scraper API for the latest version of Daft.ie

In [10]:
# call to the API for fetching all SALE AGREED properties in Dublin
options = [
    PropertyTypesOption([PropertyType.ALL]),
    LocationsOption([Location.DUBLIN_COUNTY]),
    AdStateOption(AdState.AGREED)
]

api = DaftSearch(SearchType.SALE)
listings = api.search(options)

In [11]:
print(len(listings))

cnt_price = 0
cnt_abr_price = 0

for listing in listings:
    if hasattr(listing, 'price'):
        cnt_price += 1
    if hasattr(listing, 'abbreviatedPrice'):
        cnt_abr_price += 1

print(cnt_price, cnt_abr_price)

3060
2980 3060


In [12]:
test_df2 = pd.DataFrame([vars(f) for f in listings])

In [13]:
test_df2.head()

Unnamed: 0,_id,point,propertyType,ber,publishDate,title,sections,featuredLevel,media,seoTitle,saleType,state,numBedrooms,category,daftShortcode,abbreviatedPrice,seller,seoFriendlyPath,newHome,label,pageBranding,url,numBathrooms,price,sticker,floorArea,priceHistory
0,2554301,"{'point_type': 'Point', 'coordinates': [-6.119...",Houses,{'rating': 'A3'},1601485470000,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow","[Property, New Homes, Houses]",FEATURED,{'images': [{'size720x480': 'https://photos.cd...,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow",[For Sale],SALE_AGREED,5.0,New Homes,9190876,€745k,{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,/new-home-for-sale/lynton-old-connaught-avenue...,"{'totalUnitTypes': 1, 'subUnits': [{'id': 2554...",SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,https://www.daft.ie//new-home-for-sale/lynton-...,,,,,
1,2876390,"{'point_type': 'Point', 'coordinates': [-6.422...",Semi-D,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...",1613905487000,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22","[Property, Residential, House, Semi-Detached H...",FEATURED,"{'images': [{'caption': 'Picture No. 01', 'siz...","44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",[For Sale],SALE_AGREED,4.0,Buy,13559528,€415k,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,/for-sale/semi-detached-house-44-ardsolus-brow...,,SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,https://www.daft.ie//for-sale/semi-detached-ho...,3.0,415000.0,Viewing Advised,"{'unit': 'METRES_SQUARED', 'value': '132'}",
2,1118178,"{'point_type': 'Point', 'coordinates': [-6.134...",Houses,{'rating': 'A3'},1605640420000,"Robswall by Hollybrook Homes, Coast Road, Mala...","[Property, New Homes, Houses]",FEATURED,{'images': [{'size720x480': 'https://photos.cd...,"Robswall by Hollybrook Homes, Coast Road, Mala...",[For Sale],SALE_AGREED,2.0,New Homes,941204,€680k,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,/new-home-for-sale/robswall-by-hollybrook-home...,"{'totalUnitTypes': 1, 'subUnits': [{'id': 1190...",SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,https://www.daft.ie//new-home-for-sale/robswal...,,,Easy Commute,,
3,2910071,"{'point_type': 'Point', 'coordinates': [-6.179...",Detached,"{'epi': '57.83 kWh/m2/yr', 'rating': 'A3', 'co...",1613131818000,"20 Clairville Lodge, Streamstown Lane, Malahid...","[Property, Residential, House, Detached House]",PREMIUM,{'images': [{'size720x480': 'https://photos.cd...,"20 Clairville Lodge, Streamstown Lane, Malahid...",[For Sale],SALE_AGREED,4.0,Buy,13765576,€1.1M,{'standardLogo': 'https://photos.cdn.dsch.ie/N...,/for-sale/detached-house-20-clairville-lodge-s...,,SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/N...,https://www.daft.ie//for-sale/detached-house-2...,4.0,1050000.0,,"{'unit': 'METRES_SQUARED', 'value': '195'}",
4,2558687,"{'point_type': 'Point', 'coordinates': [-6.426...",Semi-D,"{'epi': '231.4 kWh/m2/yr', 'rating': 'D1', 'co...",1613901818000,"7 Woodville Grove, Lucan, Co. Dublin","[Property, Residential, House, Semi-Detached H...",PREMIUM,{'images': [{'size720x480': 'https://photos.cd...,"7 Woodville Grove, Lucan, Co. Dublin",[For Sale],SALE_AGREED,4.0,Buy,12686245,€410k,{'standardLogo': 'https://photos.cdn.dsch.ie/O...,/for-sale/semi-detached-house-7-woodville-grov...,,SALE AGREED,{'standardLogo': 'https://photos.cdn.dsch.ie/O...,https://www.daft.ie//for-sale/semi-detached-ho...,3.0,410000.0,,"{'unit': 'METRES_SQUARED', 'value': '113'}",
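
Several columns above hold nested or encoded values: `point` is a dictionary with a `coordinates` list, `publishDate` looks like a Unix timestamp in milliseconds, and `floorArea` stores its value as a string. The sketch below shows one way such a record could be flattened with the standard library. It assumes `coordinates` is ordered longitude first (as in GeoJSON), which the truncated output suggests but does not confirm, and the coordinate values in the sample record are placeholders. The `flatten` helper is hypothetical.

```python
from datetime import datetime, timezone

# A record shaped like the rows above. The coordinates are placeholder
# values, since the real ones are truncated in the printed output.
record = {
    "point": {"point_type": "Point", "coordinates": [-6.42, 53.30]},
    "publishDate": 1613905487000,
    "floorArea": {"unit": "METRES_SQUARED", "value": "132"},
}


def flatten(rec):
    """Pull longitude, latitude, publish time and floor area out of a record."""
    # Assumption: coordinates are [longitude, latitude], as in GeoJSON.
    longitude, latitude = rec["point"]["coordinates"]
    # Assumption: publishDate is milliseconds since the Unix epoch.
    published = datetime.fromtimestamp(rec["publishDate"] / 1000, tz=timezone.utc)
    # floorArea is missing for some listings, so guard against non-dict values.
    area = rec.get("floorArea")
    floor_area = float(area["value"]) if isinstance(area, dict) else None
    return {
        "longitude": longitude,
        "latitude": latitude,
        "published": published,
        "floor_area_m2": floor_area,
    }


flat = flatten(record)
print(flat)
```

Applied row by row, a function like this would turn the nested columns into plain numeric and datetime columns that are easier to plot and model.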


In [14]:
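# only keep columns of interest; take an explicit copy so that later column
# assignments do not trigger SettingWithCopyWarning
final_df = test_df2[['title', 'propertyType', 'category', 'numBedrooms', 'numBathrooms', 'price', 'abbreviatedPrice', 'ber', 'point', 'publishDate', 'seller', 'floorArea']].copy()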

In [15]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea
0,"Lynton, Old Connaught Avenue, Bray, Co. Wicklow",Houses,New Homes,5.0,,,€745k,{'rating': 'A3'},"{'point_type': 'Point', 'coordinates': [-6.119...",1601485470000,{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'point_type': 'Point', 'coordinates': [-6.422...",1613905487000,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,"{'unit': 'METRES_SQUARED', 'value': '132'}"
2,"Robswall by Hollybrook Homes, Coast Road, Mala...",Houses,New Homes,2.0,,,€680k,{'rating': 'A3'},"{'point_type': 'Point', 'coordinates': [-6.134...",1605640420000,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,
3,"20 Clairville Lodge, Streamstown Lane, Malahid...",Detached,Buy,4.0,4.0,1050000.0,€1.1M,"{'epi': '57.83 kWh/m2/yr', 'rating': 'A3', 'co...","{'point_type': 'Point', 'coordinates': [-6.179...",1613131818000,{'standardLogo': 'https://photos.cdn.dsch.ie/N...,"{'unit': 'METRES_SQUARED', 'value': '195'}"
4,"7 Woodville Grove, Lucan, Co. Dublin",Semi-D,Buy,4.0,3.0,410000.0,€410k,"{'epi': '231.4 kWh/m2/yr', 'rating': 'D1', 'co...","{'point_type': 'Point', 'coordinates': [-6.426...",1613901818000,{'standardLogo': 'https://photos.cdn.dsch.ie/O...,"{'unit': 'METRES_SQUARED', 'value': '113'}"


In [16]:
# logic to fetch the neighbourhood for each row depending on the number of tokens as part of the title split
new = final_df["title"].str.split(",", n = 6, expand = True) 

new[5].fillna(new[4], inplace=True)
new[5].fillna(new[3], inplace=True)
new[5].fillna(new[2], inplace=True)
new[5].fillna(new[1], inplace=True)
new[5].fillna(new[0], inplace=True)
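
The cascade of `fillna` calls above amounts to "take token 5 of the comma-split title, or the last token when the title has fewer". A minimal sketch of that rule on a few titles from this dataset is given below. The helper name `extract_neighbourhood` is hypothetical, and unlike the cell above it strips the surrounding whitespace straight away.

```python
import pandas as pd

def extract_neighbourhood(title: str) -> str:
    """Return token 5 of a comma-split title, or the last token if there are fewer."""
    parts = title.split(",", 6)  # at most 7 tokens, mirroring n=6 in the cell above
    return parts[min(5, len(parts) - 1)].strip()

titles = pd.Series([
    "44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",
    "7 Woodville Grove, Lucan, Co. Dublin",
    "Apartment 172, Block C, Dublin 7",
])

neighbourhoods = titles.map(extract_neighbourhood)
print(neighbourhoods.tolist())
```

Titles that end in a county rather than a postal district (for example "Co. Dublin") still come through this rule, which is why they have to be filtered out separately.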

In [17]:
final_df["neighbourhood"]= new[5] 



In [18]:
# filter out neighbourhoods with low cardinality
col = 'neighbourhood'
n = 10
final_df = final_df[final_df.groupby(col)[col].transform('count').ge(n)]

# drop rows whose neighbourhood resolved only to the county-level ' Co. Dublin'
final_df = final_df[final_df['neighbourhood'] != ' Co. Dublin'].copy()
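
The low-cardinality filter relies on `groupby(...).transform('count')`, which returns a per-row group size aligned with the original index. A small sketch on toy data follows, assuming a hypothetical helper `keep_frequent` that is not part of the notebook.

```python
import pandas as pd

def keep_frequent(df: pd.DataFrame, col: str, min_count: int) -> pd.DataFrame:
    """Keep only rows whose value in `col` occurs at least `min_count` times."""
    # transform('count') broadcasts each group's size back onto its rows
    counts = df.groupby(col)[col].transform("count")
    return df[counts.ge(min_count)].copy()

toy = pd.DataFrame({"neighbourhood": ["Dublin 1", "Dublin 1", "Dublin 1", "Dublin 2"]})
frequent = keep_frequent(toy, "neighbourhood", min_count=2)
print(frequent["neighbourhood"].unique().tolist())
```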

In [19]:
final_df.groupby(['neighbourhood']).size()

neighbourhood
 Dublin 1      62
 Dublin 10     32
 Dublin 11    161
 Dublin 12    152
 Dublin 13    106
 Dublin 14    131
 Dublin 15    235
 Dublin 16     76
 Dublin 17     11
 Dublin 18    115
 Dublin 2      34
 Dublin 20     22
 Dublin 22     68
 Dublin 24    130
 Dublin 3     127
 Dublin 4     115
 Dublin 5     128
 Dublin 6     138
 Dublin 6W     35
 Dublin 7     119
 Dublin 8     150
 Dublin 9     151
dtype: int64

In [20]:
len(final_df)

2298

In [21]:
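# derive a numeric price ('val') from the abbreviatedPrice column and use it to fill missing price values
# regex=False makes sure '+' is treated as a literal character rather than a regex quantifier
final_df['val'] = final_df['abbreviatedPrice'].str.replace('€', '', regex=False)
final_df['val'] = final_df['val'].str.replace('+', '', regex=False)
final_df['val'] = final_df['val'].str.replace('POA', '0', regex=False)

# map the k/M suffix to a multiplier (1 when there is no suffix), then scale the numeric part
multiplier = final_df['val'].str.extract(r'[\d\.]+([kM]+)', expand=False).fillna(1).replace(['k', 'M'], [10**3, 10**6]).astype(int)
final_df['val'] = final_df['val'].replace(r'[kM]+$', '', regex=True).astype(float) * multiplier

final_df['price'] = final_df['price'].fillna(final_df['val'])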
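
To make the suffix handling easier to follow, the same conversion can be sketched as a plain function applied to single strings. The name `parse_abbreviated_price` is hypothetical, and treating `POA` (price on application) as 0 simply mirrors the cell above.

```python
import re

MULTIPLIERS = {"k": 1_000, "M": 1_000_000}

def parse_abbreviated_price(text: str) -> float:
    """Convert strings such as '€415k' or '€1.1M' to a number; 'POA' becomes 0."""
    cleaned = text.replace("€", "").replace("+", "").strip()
    if cleaned == "POA":
        return 0.0
    match = re.fullmatch(r"([\d.]+)([kM]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognised price format: {text!r}")
    number, suffix = match.groups()
    # no suffix means the figure is already in euro
    return float(number) * MULTIPLIERS.get(suffix, 1)

for raw in ["€415k", "€1.1M", "POA"]:
    print(raw, parse_abbreviated_price(raw))
```

Because the abbreviated price is rounded (for instance `€1.1M` for a listing priced at 1,050,000), this value is an approximation of the exact asking price.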



In [23]:
final_df.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,seller,floorArea,neighbourhood,val
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'point_type': 'Point', 'coordinates': [-6.422...",1613905487000,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,"{'unit': 'METRES_SQUARED', 'value': '132'}",Dublin 22,415000.0
6,"Apartment 172, Block C, Dublin 7",Apartment,Buy,1.0,1.0,249500.0,€250k,{'rating': 'C1'},"{'point_type': 'Point', 'coordinates': [-6.277...",1613645189000,{'standardLogo': 'https://photos.cdn.dsch.ie/N...,"{'unit': 'METRES_SQUARED', 'value': '51'}",Dublin 7,250000.0
9,"17 Sandyford Hall Crescent, Sandyford, Dublin 18",Semi-D,Buy,3.0,2.0,0.0,POA,"{'epi': '170.14 kWh/m2/yr', 'rating': 'C1', 'c...","{'point_type': 'Point', 'coordinates': [-6.215...",1613727937000,{'standardLogo': 'https://photos.cdn.dsch.ie/Z...,"{'unit': 'METRES_SQUARED', 'value': '110'}",Dublin 18,0.0
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",End of Terrace,Buy,2.0,3.0,395000.0,€395k,"{'epi': '200.68 kWh/m2/yr', 'rating': 'C3', 'c...","{'point_type': 'Point', 'coordinates': [-6.262...",1613905453000,{'standardLogo': 'https://photos.cdn.dsch.ie/Y...,"{'unit': 'METRES_SQUARED', 'value': '74'}",Dublin 9,395000.0
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Apartment,Buy,1.0,1.0,210000.0,€210k,"{'epi': '172.98 kWh/m2/yr', 'rating': 'C1', 'c...","{'point_type': 'Point', 'coordinates': [-6.310...",1612859975000,{'standardLogo': 'https://photos.cdn.dsch.ie/M...,"{'unit': 'METRES_SQUARED', 'value': '45'}",Dublin 15,210000.0


In [25]:
# remove rows with 0 price
zero_val = final_df[final_df['val'] == 0].index
final_df.drop(zero_val , inplace=True)

final_df.groupby(['val']).size()

val
75000.0       1
95000.0       1
135000.0      3
139000.0      2
140000.0      3
145000.0      1
150000.0      7
159000.0      1
160000.0      7
165000.0      5
169000.0      2
170000.0      9
175000.0      8
179000.0      2
180000.0      9
185000.0     10
189000.0      2
190000.0     18
194000.0      1
195000.0     19
197000.0      1
198000.0      1
199000.0     13
200000.0     26
205000.0      3
209000.0      2
210000.0     18
212000.0      1
215000.0     24
219000.0      1
220000.0     27
223000.0      1
225000.0     52
229000.0      8
230000.0     26
235000.0     38
238000.0      1
239000.0      3
240000.0     25
245000.0     19
249000.0     11
250000.0     72
255000.0     11
259000.0      4
260000.0     34
265000.0     28
267000.0      1
269000.0      6
270000.0     24
275000.0     74
279000.0      4
280000.0     25
285000.0     46
286000.0      1
289000.0      2
290000.0     23
295000.0     64
299000.0      8
300000.0     38
305000.0      3
306000.0      1
309000.0      1
3100

In [33]:
test_df3 = pd.concat([final_df.drop(['seller'], axis=1), final_df['seller'].apply(pd.Series)], axis=1)

test_df3 = pd.concat([test_df3.drop(['floorArea'], axis=1), test_df3['floorArea'].apply(pd.Series)], axis=1)

In [40]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,point,publishDate,neighbourhood,val,standardLogo,squareLogo,phone,licenceNumber,name,sellerType,backgroundColour,branch,sellerId,showContactForm,address,alternativePhone,profileImage,phoneWhenToCall,0,unit,value
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...","{'point_type': 'Point', 'coordinates': [-6.422...",1613905487000,Dublin 22,415000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 414 0004,2183,Ronan Healy,BRANDED_AGENT,#003764,Sherry FitzGerald Tallaght,10821,True,,,,,,METRES_SQUARED,132
6,"Apartment 172, Block C, Dublin 7",Apartment,Buy,1.0,1.0,249500.0,€250k,{'rating': 'C1'},"{'point_type': 'Point', 'coordinates': [-6.277...",1613645189000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/NTBjYWI5M2UwNjI1NWU...,https://photos.cdn.dsch.ie/NzNmNWI3MWYyNWMxNGR...,086 1976034,3048,Ken Lundy,BRANDED_AGENT,#fd5967,EARNEST,1331,True,"48-59 North King Street,\r\nSmithfield, \r\nDu...",01 8728808,,,,METRES_SQUARED,51
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",End of Terrace,Buy,2.0,3.0,395000.0,€395k,"{'epi': '200.68 kWh/m2/yr', 'rating': 'C3', 'c...","{'point_type': 'Point', 'coordinates': [-6.262...",1613905453000,Dublin 9,395000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 837 3737,2183,Elizabeth Ryan,BRANDED_AGENT,#0f3a5d,Sherry FitzGerald Drumcondra,2655,True,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,,https://photos.cdn.dsch.ie/ZjFlNzhjMDFiZGQwNGE...,,,METRES_SQUARED,74
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Apartment,Buy,1.0,1.0,210000.0,€210k,"{'epi': '172.98 kWh/m2/yr', 'rating': 'C1', 'c...","{'point_type': 'Point', 'coordinates': [-6.310...",1612859975000,Dublin 15,210000.0,https://photos.cdn.dsch.ie/MzFjMWE2NTI5ODZmYzg...,https://photos.cdn.dsch.ie/YWMwZjU1MmM4MGJkMGV...,01 829 9150,2943,Karen Carberry,BRANDED_AGENT,#192b6b,SATIS PROPERTY,4274,True,Unit 3 Phoenix Park Way\r\nPhoenix Park Raceco...,,https://photos.cdn.dsch.ie/MzcwYTRmMTBiM2RhYzg...,"Monday - Friday 8:30 am - 5:00pm, Closed for L...",,METRES_SQUARED,45
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Apartment,Buy,2.0,1.0,250000.0,€250k,"{'epi': '378.44 kWh/m2/yr', 'rating': 'E2', 'c...","{'point_type': 'Point', 'coordinates': [-6.282...",1613119918000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/YjUxZjg1YzFkYzRiN2Y...,https://photos.cdn.dsch.ie/ZTUwOGQ3NGU0NjM3OTN...,01 6712277,1642,Sales,BRANDED_AGENT,#475863,REA Fitzgerald Chambers,948,True,"9 Manor St,\r\nStoneybatter,\r\nDublin 7",,,Office Hours,,METRES_SQUARED,46


In [45]:
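# convert floor areas that are not in square metres (assumed to be acres; 1 acre = 4046.86 m2) to square metres
# note: rows already in square metres keep their original string values
test_df3['value'] = test_df3['value'].where(test_df3['unit'] == 'METRES_SQUARED', test_df3['value'].astype(float) * 4046.86)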
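
A variant of this conversion that yields a uniformly numeric column might look like the sketch below. The helper name `floor_area_m2` and the unit label `ACRES` in the toy data are assumptions made for illustration; the notebook only shows that the non-metric values are multiplied by 4046.86.

```python
import pandas as pd

SQ_METRES_PER_ACRE = 4046.86  # same factor as in the cell above

def floor_area_m2(area: pd.DataFrame) -> pd.Series:
    """Return the floor area in square metres as floats."""
    values = pd.to_numeric(area["value"], errors="coerce")
    is_m2 = area["unit"] == "METRES_SQUARED"
    # assumption: any unit other than METRES_SQUARED is acres
    return values.where(is_m2, values * SQ_METRES_PER_ACRE)

toy = pd.DataFrame({
    "unit": ["METRES_SQUARED", "ACRES"],
    "value": ["132", "0.5"],
})
areas = floor_area_m2(toy)
print(areas.tolist())
```

Converting to numbers first avoids a column that mixes strings and floats, which otherwise sorts and groups in surprising ways.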

In [47]:
test_df3.groupby(['value']).size()

value
121.4058               2
161.8744               1
202.34300000000002     2
404.68600000000004     2
445.1546               1
607.029                1
809.3720000000001      1
1011.715               1
1214.058               1
3480.2996000000003     1
3601.7054000000003     1
4046.86                1
6596.3818              1
100                   30
101                   16
102                   26
103                   20
104                   10
105                   24
106                   11
107                   20
108                   13
109                   13
110                   13
111                   14
112                   13
113                    7
114                    4
115                   12
116                   13
117                    7
118                   10
119                    4
120                   21
121                    6
122                    6
123                    7
124                   14
125                   14
126                

In [48]:
# split out the point column into (long, lat) values as 2 new columns
test_df3 = pd.concat([test_df3.drop(['point'], axis=1), test_df3['point'].apply(pd.Series)], axis=1)

test_df3[['longitude','latitude']] = pd.DataFrame(test_df3.coordinates.tolist(), index=test_df3.index)
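
The point column holds GeoJSON-style dictionaries whose `coordinates` list is ordered longitude first, then latitude. As a hedged sketch, the two coordinates can also be pulled out directly, without expanding the whole dictionary; `split_point` is a hypothetical helper name.

```python
import pandas as pd

def split_point(points: pd.Series) -> pd.DataFrame:
    """Expand GeoJSON-like point dicts into longitude/latitude columns."""
    coords = [p["coordinates"] for p in points]
    # keep the original index so the result can be joined back to the listings
    return pd.DataFrame(coords, index=points.index, columns=["longitude", "latitude"])

points = pd.Series(
    [
        {"point_type": "Point", "coordinates": [-6.42237, 53.295096]},
        {"point_type": "Point", "coordinates": [-6.277584, 53.348715]},
    ],
    index=[1, 6],
)
coords = split_point(points)
print(coords)
```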

In [49]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,ber,publishDate,neighbourhood,val,standardLogo,squareLogo,phone,licenceNumber,name,sellerType,backgroundColour,branch,sellerId,showContactForm,address,alternativePhone,profileImage,phoneWhenToCall,0,unit,value,point_type,coordinates,longitude,latitude
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,"{'epi': '48.93 kWh/m2/yr', 'rating': 'A2', 'co...",1613905487000,Dublin 22,415000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 414 0004,2183,Ronan Healy,BRANDED_AGENT,#003764,Sherry FitzGerald Tallaght,10821,True,,,,,,METRES_SQUARED,132,Point,"[-6.42237, 53.295096]",-6.42237,53.295096
6,"Apartment 172, Block C, Dublin 7",Apartment,Buy,1.0,1.0,249500.0,€250k,{'rating': 'C1'},1613645189000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/NTBjYWI5M2UwNjI1NWU...,https://photos.cdn.dsch.ie/NzNmNWI3MWYyNWMxNGR...,086 1976034,3048,Ken Lundy,BRANDED_AGENT,#fd5967,EARNEST,1331,True,"48-59 North King Street,\r\nSmithfield, \r\nDu...",01 8728808,,,,METRES_SQUARED,51,Point,"[-6.277584, 53.348715]",-6.277584,53.348715
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",End of Terrace,Buy,2.0,3.0,395000.0,€395k,"{'epi': '200.68 kWh/m2/yr', 'rating': 'C3', 'c...",1613905453000,Dublin 9,395000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 837 3737,2183,Elizabeth Ryan,BRANDED_AGENT,#0f3a5d,Sherry FitzGerald Drumcondra,2655,True,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,,https://photos.cdn.dsch.ie/ZjFlNzhjMDFiZGQwNGE...,,,METRES_SQUARED,74,Point,"[-6.262658, 53.362818]",-6.262658,53.362818
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Apartment,Buy,1.0,1.0,210000.0,€210k,"{'epi': '172.98 kWh/m2/yr', 'rating': 'C1', 'c...",1612859975000,Dublin 15,210000.0,https://photos.cdn.dsch.ie/MzFjMWE2NTI5ODZmYzg...,https://photos.cdn.dsch.ie/YWMwZjU1MmM4MGJkMGV...,01 829 9150,2943,Karen Carberry,BRANDED_AGENT,#192b6b,SATIS PROPERTY,4274,True,Unit 3 Phoenix Park Way\r\nPhoenix Park Raceco...,,https://photos.cdn.dsch.ie/MzcwYTRmMTBiM2RhYzg...,"Monday - Friday 8:30 am - 5:00pm, Closed for L...",,METRES_SQUARED,45,Point,"[-6.310559, 53.376695]",-6.310559,53.376695
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Apartment,Buy,2.0,1.0,250000.0,€250k,"{'epi': '378.44 kWh/m2/yr', 'rating': 'E2', 'c...",1613119918000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/YjUxZjg1YzFkYzRiN2Y...,https://photos.cdn.dsch.ie/ZTUwOGQ3NGU0NjM3OTN...,01 6712277,1642,Sales,BRANDED_AGENT,#475863,REA Fitzgerald Chambers,948,True,"9 Manor St,\r\nStoneybatter,\r\nDublin 7",,,Office Hours,,METRES_SQUARED,46,Point,"[-6.282134, 53.349442]",-6.282134,53.349442


In [50]:
# similarly split the ber column to fetch the rating
test_df3 = pd.concat([test_df3.drop(['ber'], axis=1), test_df3['ber'].apply(pd.Series)], axis=1)

In [51]:
test_df3.head()

Unnamed: 0,title,propertyType,category,numBedrooms,numBathrooms,price,abbreviatedPrice,publishDate,neighbourhood,val,standardLogo,squareLogo,phone,licenceNumber,name,sellerType,backgroundColour,branch,sellerId,showContactForm,address,alternativePhone,profileImage,phoneWhenToCall,0,unit,value,point_type,coordinates,longitude,latitude,0.1,code,epi,rating
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Semi-D,Buy,4.0,3.0,415000.0,€415k,1613905487000,Dublin 22,415000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 414 0004,2183,Ronan Healy,BRANDED_AGENT,#003764,Sherry FitzGerald Tallaght,10821,True,,,,,,METRES_SQUARED,132,Point,"[-6.42237, 53.295096]",-6.42237,53.295096,,111685541.0,48.93 kWh/m2/yr,A2
6,"Apartment 172, Block C, Dublin 7",Apartment,Buy,1.0,1.0,249500.0,€250k,1613645189000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/NTBjYWI5M2UwNjI1NWU...,https://photos.cdn.dsch.ie/NzNmNWI3MWYyNWMxNGR...,086 1976034,3048,Ken Lundy,BRANDED_AGENT,#fd5967,EARNEST,1331,True,"48-59 North King Street,\r\nSmithfield, \r\nDu...",01 8728808,,,,METRES_SQUARED,51,Point,"[-6.277584, 53.348715]",-6.277584,53.348715,,,,C1
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",End of Terrace,Buy,2.0,3.0,395000.0,€395k,1613905453000,Dublin 9,395000.0,https://photos.cdn.dsch.ie/YjEyMzk4NzQ4MTM2ZmN...,https://photos.cdn.dsch.ie/MTIyMGQyYjA3NmE5MTZ...,01 837 3737,2183,Elizabeth Ryan,BRANDED_AGENT,#0f3a5d,Sherry FitzGerald Drumcondra,2655,True,12 Upper Drumcondra Road\r\nDrumcondra\r\nDubl...,,https://photos.cdn.dsch.ie/ZjFlNzhjMDFiZGQwNGE...,,,METRES_SQUARED,74,Point,"[-6.262658, 53.362818]",-6.262658,53.362818,,108714726.0,200.68 kWh/m2/yr,C3
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Apartment,Buy,1.0,1.0,210000.0,€210k,1612859975000,Dublin 15,210000.0,https://photos.cdn.dsch.ie/MzFjMWE2NTI5ODZmYzg...,https://photos.cdn.dsch.ie/YWMwZjU1MmM4MGJkMGV...,01 829 9150,2943,Karen Carberry,BRANDED_AGENT,#192b6b,SATIS PROPERTY,4274,True,Unit 3 Phoenix Park Way\r\nPhoenix Park Raceco...,,https://photos.cdn.dsch.ie/MzcwYTRmMTBiM2RhYzg...,"Monday - Friday 8:30 am - 5:00pm, Closed for L...",,METRES_SQUARED,45,Point,"[-6.310559, 53.376695]",-6.310559,53.376695,,100872464.0,172.98 kWh/m2/yr,C1
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Apartment,Buy,2.0,1.0,250000.0,€250k,1613119918000,Dublin 7,250000.0,https://photos.cdn.dsch.ie/YjUxZjg1YzFkYzRiN2Y...,https://photos.cdn.dsch.ie/ZTUwOGQ3NGU0NjM3OTN...,01 6712277,1642,Sales,BRANDED_AGENT,#475863,REA Fitzgerald Chambers,948,True,"9 Manor St,\r\nStoneybatter,\r\nDublin 7",,,Office Hours,,METRES_SQUARED,46,Point,"[-6.282134, 53.349442]",-6.282134,53.349442,,100337302.0,378.44 kWh/m2/yr,E2


In [52]:
# keep only the columns needed downstream; copy to avoid SettingWithCopyWarning on later assignments
final_df2 = test_df3[['title', 'neighbourhood', 'propertyType', 'numBedrooms', 'numBathrooms', 'value', 'val', 'rating', 'sellerId', 'longitude', 'latitude', 'publishDate']].copy()

In [53]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1613905487000
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",Dublin 9,End of Terrace,2.0,3.0,74,395000.0,C3,2655,-6.262658,53.362818,1613905453000
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,1612859975000
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,1613119918000


In [54]:
final_df2[final_df2['numBathrooms'].isnull()].groupby(['propertyType']).size()

propertyType
Apartment          9
Bungalow           2
Detached           5
End of Terrace     1
Houses             1
Semi-D             5
Site              18
Studio             1
Terrace           13
dtype: int64

In [55]:
final_df2.groupby(['rating']).size()

rating
A2         11
A3         27
B1          8
B2         59
B3        139
C1        178
C2        206
C3        206
D1        269
D2        277
E1        201
E2        155
F         155
G         127
SI_666     58
dtype: int64

In [56]:
# handle missing values by replacing them with sentinel constants that do not occur in the data
final_df2['numBedrooms'] = final_df2['numBedrooms'].fillna(-1)

final_df2['numBathrooms'] = final_df2['numBathrooms'].fillna(-1)

final_df2['rating'] = final_df2['rating'].fillna('ZZZ')



In [57]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1613905487000
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",Dublin 9,End of Terrace,2.0,3.0,74,395000.0,C3,2655,-6.262658,53.362818,1613905453000
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,1612859975000
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,1613119918000


In [58]:
# take the first 100 listings for plotting; copy so a new column can be added safely
mapping_df = final_df2[:100].copy()

mapping_df['n_neigh'] = mapping_df.groupby('neighbourhood').ngroup()



In [59]:
mapping_df.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,value,val,rating,sellerId,longitude,latitude,publishDate,n_neigh
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1613905487000,11
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000,18
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",Dublin 9,End of Terrace,2.0,3.0,74,395000.0,C3,2655,-6.262658,53.362818,1613905453000,20
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,1612859975000,6
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,1613119918000,18


In [60]:
# set color scheme for the neighbourhoods: one rainbow colour per distinct neighbourhood
num_neigh = mapping_df['neighbourhood'].nunique()

colors_array = cm.rainbow(np.linspace(0, 1, num_neigh))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [61]:
address = 'Dublin, Ireland'

geolocator = Nominatim(user_agent="dublin_locator")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dublin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dublin are 53.3497645, -6.2602732.


In [62]:
# create map of Dublin using latitude and longitude values
map_dublin = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, title, n_neigh in zip(mapping_df['latitude'], mapping_df['longitude'], mapping_df['title'], mapping_df['n_neigh']):
    label = '{}'.format(title)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color=rainbow[n_neigh],
        fill=True,
        fill_color=rainbow[n_neigh],
        fill_opacity=0.7).add_to(map_dublin)

map_dublin

In [64]:
final_df2 = final_df2.rename({'value': 'floorArea', 'val': 'price'}, axis=1)  # give the derived columns descriptive names

final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate
1,"44 Ardsolus, Brownsbarn, Kingswood, Dublin 22",Dublin 22,Semi-D,4.0,3.0,132,415000.0,A2,10821,-6.42237,53.295096,1613905487000
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000
10,"1a St Brigid's Road Lower, Drumcondra, Dublin 9",Dublin 9,End of Terrace,2.0,3.0,74,395000.0,C3,2655,-6.262658,53.362818,1613905453000
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,1612859975000
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,1613119918000


In [136]:
# remove rows with NaN for newly added columns
final_df2 = final_df2.dropna()

In [65]:
final_df2['pricePerBedroom'] = final_df2['price'] / final_df2['numBedrooms']

In [68]:
final_df2['avgPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform('mean')

In [70]:
final_df2['medianPriceNeighbourhood'] = final_df2.groupby('neighbourhood')['price'].transform('median')

In [78]:
final_df2['deltaAvgPrice'] = final_df2['avgPriceNeighbourhood'] - final_df2['price']

In [80]:
final_df2['deltaMedianPrice'] = final_df2['medianPriceNeighbourhood'] - final_df2['price']
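
The delta features compare each listing with its own neighbourhood: a positive value means the listing is priced below the neighbourhood average (or median), a negative value means it is priced above. A minimal sketch on toy prices, with made-up figures:

```python
import pandas as pd

toy = pd.DataFrame({
    "neighbourhood": ["Dublin 7", "Dublin 7", "Dublin 9"],
    "price": [250000.0, 350000.0, 275000.0],
})

# transform('mean') returns the group mean aligned to every row of the group
toy["avgPriceNeighbourhood"] = toy.groupby("neighbourhood")["price"].transform("mean")
toy["deltaAvgPrice"] = toy["avgPriceNeighbourhood"] - toy["price"]
print(toy)
```

A neighbourhood with a single listing always gets a delta of zero, so the feature carries little information for sparsely populated groups.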

In [120]:
dict_north_south = {'Dublin 1':'N', 'Dublin 10':'S', 'Dublin 11':'N', 'Dublin 12':'S', 'Dublin 13':'N', 'Dublin 14':'S', 'Dublin 15':'N', 
                    'Dublin 16':'S', 'Dublin 17':'N', 'Dublin 18':'S', 'Dublin 2':'S', 'Dublin 20':'S', 'Dublin 22':'S', 'Dublin 24':'S', 
                    'Dublin 3':'N', 'Dublin 4':'S', 'Dublin 5':'N', 'Dublin 6':'S', 'Dublin 6W':'S', 'Dublin 7':'N', 'Dublin 8':'S', 'Dublin 9':'N'}

In [123]:
final_df2["neighbourhood"] = final_df2["neighbourhood"].str.strip()
final_df2['dublinNorthSouth']=final_df2['neighbourhood'].map(dict_north_south)

final_df2.groupby('dublinNorthSouth').size()

dublinNorthSouth
N    1067
S    1166
dtype: int64
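
Every entry in `dict_north_south` happens to follow a simple parity rule: odd-numbered districts are mapped to the north side and even-numbered ones (including 6W) to the south. That is an observation about the dictionary above rather than something the notebook states, and the sketch below, with the hypothetical helper `north_or_south`, only reproduces that pattern.

```python
import re

def north_or_south(district: str) -> str:
    """Classify a 'Dublin <n>' district by the parity of its number."""
    match = re.search(r"\d+", district)
    if match is None:
        raise ValueError(f"no district number in {district!r}")
    # odd -> north side, even -> south side (pattern seen in dict_north_south)
    return "N" if int(match.group()) % 2 else "S"

for name in ["Dublin 1", "Dublin 6W", "Dublin 15", "Dublin 24"]:
    print(name, north_or_south(name))
```

The explicit dictionary remains the safer choice if individual districts ever need to deviate from the pattern.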

In [125]:
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great-circle distance in kilometres between two points
    on the earth (specified in decimal degrees).
    Arguments may be scalars or arrays, as long as they broadcast against each other.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # approximate Earth radius in km (the mean radius is about 6371 km)
    return km

In [126]:
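# distance in km from each property to a fixed city-centre reference point (lon -6.2580, lat 53.3531)
final_df2['distToCity'] = haversine_np(final_df2['longitude'], final_df2['latitude'], -6.2580, 53.3531)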
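
A quick sanity check of the haversine formula: moving one degree of latitude due north should give roughly 111 km, and the distance from a point to itself should be zero. The sketch below restates the formula in a compact, self-contained form. The function name `haversine_km` and its `radius_km` parameter are illustrative; the default radius of 6367 km is the one used above.

```python
import numpy as np

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6367):
    """Great-circle distance in km; inputs in decimal degrees, scalars or arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return radius_km * 2 * np.arcsin(np.sqrt(a))

centre_lon, centre_lat = -6.2580, 53.3531  # reference point from the cell above
lons = np.array([centre_lon, centre_lon])
lats = np.array([centre_lat, centre_lat + 1.0])  # same point, and one degree north

# array inputs broadcast against the scalar reference point
distances = haversine_km(lons, lats, centre_lon, centre_lat)
print(distances)
```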

In [137]:
final_df2.isna().sum()

title                       0
neighbourhood               0
propertyType                0
numBedrooms                 0
numBathrooms                0
floorArea                   0
price                       0
rating                      0
sellerId                    0
longitude                   0
latitude                    0
publishDate                 0
pricePerBedroom             0
avgPriceNeighbourhood       0
medianPriceNeighbourhood    0
deltaAvgPrice               0
deltaMedianPrice            0
dublinNorthSouth            0
distToCity                  0
dtype: int64

In [157]:
# convert the epoch-millisecond publish timestamp to a datetime and count whole days since publication
final_df2['date'] = pd.to_datetime(final_df2['publishDate'], unit='ms')
final_df2['daysSincePublished'] = (pd.to_datetime("now") - final_df2['date']).dt.days
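
Because the cell above measures against the current time, its result changes on every run. A reproducible sketch of the same calculation against a fixed reference timestamp follows; the reference value `2021-02-21 12:00:00` is an assumption picked for illustration, and `days_since` is a hypothetical helper name.

```python
import pandas as pd

def days_since(publish_ms: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Whole days elapsed between epoch-millisecond timestamps and `reference`."""
    published = pd.to_datetime(publish_ms, unit="ms")
    # subtracting gives timedeltas; .dt.days keeps only the whole-day part
    return (reference - published).dt.days

publish_ms = pd.Series([1613645189000, 1612859975000])  # two publishDate values from the data
reference = pd.Timestamp("2021-02-21 12:00:00")         # assumed reference time
elapsed = days_since(publish_ms, reference)
print(elapsed.tolist())
```

Pinning the reference time, for example to the moment the listings were downloaded, keeps the feature stable when the notebook is re-run later.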

In [162]:
zero_days = final_df2[final_df2['daysSincePublished'] == 0].index
final_df2.drop(zero_days , inplace=True)

final_df2.groupby(['daysSincePublished']).size()

daysSincePublished
2       6
3      10
4       3
5       9
6       9
7       1
9      18
10      3
11     10
12     10
13      8
16      5
17     14
18     10
19      7
20     12
21      1
22      2
23      6
24      9
25     11
26      5
27     11
29      2
30      9
31      9
32     24
33     21
34      8
35      2
37     11
38     12
39      8
40      9
41     14
42      2
43      1
44      2
45      1
46      9
47     13
48     35
49      1
50      2
52      2
54      3
57      9
59      3
60     14
61     22
62      6
64      1
65     16
66      9
67     11
68     35
69     18
71      4
72     13
73      8
74     15
75     14
76     19
77      1
79     17
80     27
81     25
82     22
83     12
84      1
86      3
87     20
88     24
89     11
90     19
92      1
93     10
94     16
95     13
96     21
97     36
98      3
100    25
101     9
102    19
103    18
104    16
106     4
107    12
108    23
109    25
110    23
111    19
113     2
114    14
115    13
116    22
117    12
1

In [163]:
final_df2.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,publishDate,pricePerBedroom,avgPriceNeighbourhood,medianPriceNeighbourhood,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,date,daysSincePublished
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,1613645189000,250000.0,387061.403509,350000.0,137061.403509,100000.0,N,1.387431,2021-02-18 10:46:29,3
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,1612859975000,210000.0,337167.400881,285000.0,127167.400881,75000.0,N,4.361361,2021-02-09 08:39:35,12
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,1613119918000,125000.0,387061.403509,350000.0,137061.403509,100000.0,N,1.651646,2021-02-12 08:51:58,9
13,"Apartment 127, Block B, Lymewood Mews, Northwo...",Dublin 9,Apartment,2.0,2.0,70,275000.0,C2,3658,-6.256419,53.402848,1613470411000,137500.0,389529.801325,375000.0,114529.801325,100000.0,N,5.529245,2021-02-16 10:13:31,5
15,"46 Oxmantown Road, Stoneybatter, Dublin 7",Dublin 7,Terrace,2.0,-1.0,56,280000.0,F,1087,-6.291654,53.353738,1613152397000,140000.0,387061.403509,350000.0,107061.403509,70000.0,N,2.23333,2021-02-12 17:53:17,9


In [164]:
final_df3 = final_df2[['title','neighbourhood','propertyType','numBedrooms','numBathrooms','floorArea','price','rating','sellerId','longitude','latitude','pricePerBedroom','deltaAvgPrice','deltaMedianPrice','dublinNorthSouth','distToCity','daysSincePublished']]

In [166]:
final_df3.head()

Unnamed: 0,title,neighbourhood,propertyType,numBedrooms,numBathrooms,floorArea,price,rating,sellerId,longitude,latitude,pricePerBedroom,deltaAvgPrice,deltaMedianPrice,dublinNorthSouth,distToCity,daysSincePublished
6,"Apartment 172, Block C, Dublin 7",Dublin 7,Apartment,1.0,1.0,51,250000.0,C1,1331,-6.277584,53.348715,250000.0,137061.403509,100000.0,N,1.387431,3
11,"Apartment 48, Beacon, Ashtown, Dublin 15",Dublin 15,Apartment,1.0,1.0,45,210000.0,C1,4274,-6.310559,53.376695,210000.0,127167.400881,75000.0,N,4.361361,12
12,"Apartment 15, Blackhall Court, Stoneybatter, D...",Dublin 7,Apartment,2.0,1.0,46,250000.0,E2,948,-6.282134,53.349442,125000.0,137061.403509,100000.0,N,1.651646,9
13,"Apartment 127, Block B, Lymewood Mews, Northwo...",Dublin 9,Apartment,2.0,2.0,70,275000.0,C2,3658,-6.256419,53.402848,137500.0,114529.801325,100000.0,N,5.529245,5
15,"46 Oxmantown Road, Stoneybatter, Dublin 7",Dublin 7,Terrace,2.0,-1.0,56,280000.0,F,1087,-6.291654,53.353738,140000.0,107061.403509,70000.0,N,2.23333,9


In [167]:
# below is where we make use of the Foursquare API

In [168]:
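import os

# read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret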
VERSION = '20180605' # Foursquare API version

In [169]:
# 1. Food 4d4b7105d754a06374d81259
# 2. Outdoors & Recreation 4d4b7105d754a06377d81259
# 3. Shop & Service 4d4b7105d754a06378d81259

In [170]:
food_cat_ids = ['4bf58dd8d48988d16d941735','4bf58dd8d48988d128941735','4bf58dd8d48988d1e0931735','4bf58dd8d48988d110941735',
                '4bf58dd8d48988d149941735','4bf58dd8d48988d1fa931735','4bf58dd8d48988d1c4941735','4bf58dd8d48988d145941735',
                '4bf58dd8d48988d11b941735','4bf58dd8d48988d16e941735','4bf58dd8d48988d1c5941735','4bf58dd8d48988d143941735',
                '4bf58dd8d48988d1ce941735','4bf58dd8d48988d10e951735','4bf58dd8d48988d1c9941735','4bf58dd8d48988d1ca941735',
                '4bf58dd8d48988d142941735','4bf58dd8d48988d11e941735','4bf58dd8d48988d16a941735','52e81612bcbc57f1066b79f1',
                '4bf58dd8d48988d155941735','4bf58dd8d48988d1f9941735','4bf58dd8d48988d10f941735','4bf58dd8d48988d1cc941735']

recreation_cat_ids = ['4bf58dd8d48988d163941735','4bf58dd8d48988d1e6941735','4bf58dd8d48988d176941735','4bf58dd8d48988d137941735',
                     '4bf58dd8d48988d164941735','4bf58dd8d48988d1e4931735','4bf58dd8d48988d1e0941735','4bf58dd8d48988d12d951735',
                     '4bf58dd8d48988d1e2941735','4bf58dd8d48988d165941735','56aa371be4b08b9a8d57353e','58daa1558bbb0b01f18ec1fd',
                     '4deefb944765f83613cdba6e','4e74f6cabd41c4836eac4c31','56aa371be4b08b9a8d573562','4bf58dd8d48988d15e941735']

shop_cat_ids = ['52f2ab2ebcbc57f1066b8b46','4bf58dd8d48988d103951735','4bf58dd8d48988d1f6941735','4bf58dd8d48988d1fd941735',
               '4d954b0ea243a5684a65b473','4bf58dd8d48988d114951735','4bf58dd8d48988d112951735','4bf58dd8d48988d118951735',
               '4bf58dd8d48988d1f2941735','5032833091d4c4b30a586d60','52dea92d3cf9994f4e043dbb','4bf58dd8d48988d10f951735',
               '4bf58dd8d48988d1f8941735','4bf58dd8d48988d122951735','4bf58dd8d48988d106951735','4bf58dd8d48988d108951735']

In [171]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, limit=100):
    """Count food, recreation and shop venues within `radius` metres of each property."""
    # sets give constant-time membership tests for the category look-ups
    food_ids = set(food_cat_ids)
    recreation_ids = set(recreation_cat_ids)
    shop_ids = set(shop_cat_ids)

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': limit,
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        num_food = 0
        num_recreation = 0
        num_shop = 0
        for v in results:
            categories = v['venue']['categories']
            if not categories:
                continue  # skip venues that have no category assigned
            cat_id = categories[0]['id']  # classify by the first listed category
            num_food += int(cat_id in food_ids)
            num_recreation += int(cat_id in recreation_ids)
            num_shop += int(cat_id in shop_ids)

        # keep only the per-property venue counts
        venues_list.append([name, num_food, num_recreation, num_shop])

    nearby_venues = pd.DataFrame(venues_list,
                                 columns=['title', 'numFood', 'numRecreation', 'numShop'])

    return nearby_venues
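
Because the function above calls the Foursquare API once per listing, the counting step is hard to check in isolation. The sketch below pulls that step out into a pure function and runs it on a hand-made payload. The helper name `count_venue_categories` and the sample payload are hypothetical. The payload only mirrors the `venue` / `categories` / `id` keys that the function above reads, and the three example IDs are taken from the category lists defined earlier.

```python
def count_venue_categories(items, food_ids, recreation_ids, shop_ids):
    """Count venues per category group, using each venue's first listed category."""
    counts = {'numFood': 0, 'numRecreation': 0, 'numShop': 0}
    for item in items:
        categories = item.get('venue', {}).get('categories') or []
        if not categories:
            continue  # a venue without categories cannot be classified
        primary_id = categories[0]['id']
        # independent checks, mirroring the original function
        if primary_id in food_ids:
            counts['numFood'] += 1
        if primary_id in recreation_ids:
            counts['numRecreation'] += 1
        if primary_id in shop_ids:
            counts['numShop'] += 1
    return counts


# hypothetical payload shaped like the 'items' list of an explore response
sample_items = [
    {'venue': {'categories': [{'id': '4bf58dd8d48988d16d941735'}]}},  # from food_cat_ids
    {'venue': {'categories': [{'id': '4bf58dd8d48988d163941735'}]}},  # from recreation_cat_ids
    {'venue': {'categories': [{'id': '4bf58dd8d48988d1f6941735'}]}},  # from shop_cat_ids
    {'venue': {'categories': []}},                                    # no category
    {'venue': {'categories': [{'id': 'not-a-tracked-category'}]}},    # ignored
]

sample_counts = count_venue_categories(
    sample_items,
    food_ids={'4bf58dd8d48988d16d941735'},
    recreation_ids={'4bf58dd8d48988d163941735'},
    shop_ids={'4bf58dd8d48988d1f6941735'},
)
print(sample_counts)
```

Passing sets rather than lists keeps each membership test cheap, which adds up when every listing returns up to 100 venues.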

In [175]:
dublin_venues = getNearbyVenues(names=final_df3['title'],
                                   latitudes=final_df3['latitude'],
                                   longitudes=final_df3['longitude'])

In [176]:
dublin_venues.shape

(1308, 4)

In [177]:
dublin_venues.head()

Unnamed: 0,title,numFood,numRecreation,numShop
0,"Apartment 172, Block C, Dublin 7",53,13,4
1,"Apartment 48, Beacon, Ashtown, Dublin 15",48,18,3
2,"Apartment 15, Blackhall Court, Stoneybatter, D...",51,13,4
3,"Apartment 127, Block B, Lymewood Mews, Northwo...",49,11,9
4,"46 Oxmantown Road, Stoneybatter, Dublin 7",52,12,4


In [178]:
dublin_venues2 = final_df3.join(dublin_venues.set_index('title'), on='title')

dublin_venues2.shape

(1344, 20)

In [179]:
dublin_venues2 = dublin_venues2.drop_duplicates()
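
The row count grows from 1308 to 1344 across the join above. A left join can only add rows when the join key is repeated on the right-hand side, so some listings must share a `title`. A minimal sketch on toy data (the frames `listings` and `venues` are hypothetical stand-ins for `final_df3` and `dublin_venues`) shows the effect and two possible ways around it:

```python
import pandas as pd

# toy stand-ins: two listings share the title 'B'
listings = pd.DataFrame({'title': ['A', 'B', 'B'], 'price': [100, 200, 210]})
venues = pd.DataFrame({'title': ['A', 'B', 'B'], 'numFood': [5, 7, 7]})

# joining on a non-unique key pairs every 'B' listing with every 'B' venue row
inflated = listings.join(venues.set_index('title'), on='title')
print(len(inflated))  # 1 + 2 * 2 = 5 rows

# option 1: make the key unique first; validate raises if it still is not
venues_unique = venues.drop_duplicates(subset='title')
joined = listings.merge(venues_unique, on='title', how='left', validate='many_to_one')

# option 2: the venue counts were built row by row from the listings,
# so they can be attached by position instead of by title
by_position = pd.concat(
    [listings.reset_index(drop=True), venues.drop(columns='title').reset_index(drop=True)],
    axis=1,
)
print(len(joined), len(by_position))
```

Option 2 assumes both frames are still in the same row order. It has the advantage that two listings with the same title but different coordinates keep their own counts.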

In [180]:
profile = pd_prof.ProfileReport(dublin_venues2) 
profile.to_file("output.html")

In [2]:
# add meaningful and useful features
# **1. delta from avg price for that neighbourhood = (avg_price_neighbourhood - price)
# **2. delta from median price for that neighbourhood = (median_price_neighbourhood - price)
# **3. north_south column 1/2 or north/south = dict_north_south {'Dublin 1': 'N', 'Dublin 2': 'S', ...}
# **4. days since ad published = difference between 2 epochs (today - publishDate)
# **5. distance from city centre = great-circle distance between the two (lat, long) points (Haversine formula)
# 6. commute time to city centre by {walking/cycling/train/bus}
# 7. categorical column for price ranges {bins}
# 8. calculated field from 7 => num of properties in that neighbourhood for that price range
# ***9. num_ {Pharmacies, Supermarkets, Restaurants, Cafes, Parks} in 5 km radius
# **10. price per bedroom = (price / numBedrooms)
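
Several of the features listed above (1, 2, 4, 5 and 10) reduce to a few lines of pandas and NumPy. The following is a minimal sketch, not the notebook's actual implementation. The column name `neighbourhood`, the output column names, and the city-centre coordinates are assumptions made for illustration. `price`, `numBedrooms`, `publishDate`, `latitude` and `longitude` follow the names used in the notes and code above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0
CITY_CENTRE = (53.3498, -6.2603)  # assumed reference point for central Dublin


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or pandas Series in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_features(df, city_centre=CITY_CENTRE, today=None):
    """Return a copy of df with the derived columns from items 1, 2, 4, 5 and 10."""
    out = df.copy()
    by_area = out.groupby('neighbourhood')['price']  # 'neighbourhood' is an assumed column name
    out['deltaAvgPrice'] = by_area.transform('mean') - out['price']       # item 1
    out['deltaMedianPrice'] = by_area.transform('median') - out['price']  # item 2
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    out['daysSincePublished'] = (today - pd.to_datetime(out['publishDate'])).dt.days  # item 4
    out['distCityCentreKm'] = haversine_km(out['latitude'], out['longitude'], *city_centre)  # item 5
    out['pricePerBedroom'] = out['price'] / out['numBedrooms']            # item 10
    return out


# toy rows, loosely modelled on the listings shown earlier
toy = pd.DataFrame({
    'neighbourhood': ['Dublin 7', 'Dublin 7'],
    'price': [250000.0, 280000.0],
    'numBedrooms': [2.0, 2.0],
    'publishDate': ['2021-01-01', '2021-01-05'],
    'latitude': [53.349442, 53.353738],
    'longitude': [-6.282134, -6.291654],
})
features = add_features(toy, today='2021-01-11')
print(features[['deltaAvgPrice', 'pricePerBedroom', 'daysSincePublished', 'distCityCentreKm']])
```

`groupby(...).transform(...)` returns one value per original row, so the neighbourhood statistics line up with `price` without a separate merge. The distances depend on the reference point chosen for the city centre, so a different centre shifts every value.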