# Classification analysis of Airbnb listings data

The API for the open Airbnb listings dataset used in this classification analysis is available here: https://public.opendatasoft.com/explore/dataset/airbnb-reviews/api/. The goal of this notebook is a supervised machine learning task: classifying listings by their review score ratings, using the listing features as inputs. We will use Amazon SageMaker for hosting and tooling, so we begin with the necessary imports...

In [6]:
import sagemaker
from sagemaker import get_execution_role
import json
import boto3
import pandas as pd

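# Create a SageMaker session for interacting with AWS services
sess = sagemaker.Session()

# IAM role the notebook instance runs under
role = get_execution_role()
print(role)

# Default S3 bucket associated with this session
bucket = sess.default_bucket()
print(bucket)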

arn:aws:iam::232666250507:role/service-role/AmazonSageMaker-ExecutionRole-20200611T080886
sagemaker-eu-west-2-232666250507


Let's now download and unzip the listings open dataset from http://insideairbnb.com and inspect it...

In [10]:
!wget http://data.insideairbnb.com/united-kingdom/england/london/2020-04-14/data/listings.csv.gz
!gunzip listings.csv.gz

--2020-06-11 11:05:31--  http://data.insideairbnb.com/united-kingdom/england/london/2020-04-14/data/listings.csv.gz
Resolving data.insideairbnb.com (data.insideairbnb.com)... 52.216.228.74
Connecting to data.insideairbnb.com (data.insideairbnb.com)|52.216.228.74|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78560292 (75M) [application/x-gzip]
Saving to: ‘listings.csv.gz’


2020-06-11 11:05:38 (11.2 MB/s) - ‘listings.csv.gz’ saved [78560292/78560292]
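
The decompression step can also be done from Python, which helps when shell commands are not available in the environment. The following is a minimal sketch, not the notebook's own method: the helper name `gunzip_keep` and the toy archive `toy_listings.csv.gz` are hypothetical stand-ins, and unlike `gunzip` the helper leaves the original archive in place.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_keep(src, dst):
    """Decompress the gzip file `src` into `dst`, leaving `src` in place."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return Path(dst)


# Demo on a tiny stand-in archive (hypothetical, not the real listings file).
workdir = Path(tempfile.mkdtemp())
archive = workdir / "toy_listings.csv.gz"
with gzip.open(archive, "wt", encoding="utf-8") as f:
    f.write("id,name\n13913,Holiday London DB Room\n")

extracted = gunzip_keep(archive, workdir / "toy_listings.csv")
print(extracted.read_text(encoding="utf-8"))
```

Alternatively, `pd.read_csv` can read a `.gz` file directly, since by default it infers the compression from the file extension, so the explicit decompression step is optional.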



In [7]:
listings_dataf = pd.read_csv('listings.csv') 
listings_dataf.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,13913,https://www.airbnb.com/rooms/13913,20200414180850,2020-04-16,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,"Hello Everyone, I'm offering my lovely double ...",My bright double bedroom with a large window h...,business,Finsbury Park is a friendly melting pot commun...,...,f,f,moderate,f,f,2,1,1,0,0.18
1,15400,https://www.airbnb.com/rooms/15400,20200414180850,2020-04-16,Bright Chelsea Apartment. Chelsea!,Lots of windows and light. St Luke's Gardens ...,Bright Chelsea Apartment This is a bright one...,Lots of windows and light. St Luke's Gardens ...,romantic,It is Chelsea.,...,t,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
2,17402,https://www.airbnb.com/rooms/17402,20200414180850,2020-04-15,Superb 3-Bed/2 Bath & Wifi: Trendy W1,You'll have a wonderful stay in this superb mo...,"This is a wonderful very popular beautiful, sp...",You'll have a wonderful stay in this superb mo...,none,"Location, location, location! You won't find b...",...,t,f,strict_14_with_grace_period,f,f,15,15,0,0,0.38
3,17506,https://www.airbnb.com/rooms/17506,20200414180850,2020-04-16,Boutique Chelsea/Fulham Double bed 5-star ensuite,Enjoy a chic stay in this elegant but fully mo...,Enjoy a boutique London townhouse bed and brea...,Enjoy a chic stay in this elegant but fully mo...,business,Fulham is 'villagey' and residential – a real ...,...,f,f,strict_14_with_grace_period,f,f,2,0,2,0,
4,25023,https://www.airbnb.com/rooms/25023,20200414180850,2020-04-15,All-comforts 2-bed flat near Wimbledon tennis,"Large, all comforts, 2-bed flat; first floor; ...",10 mins walk to Southfields tube and Wimbledon...,"Large, all comforts, 2-bed flat; first floor; ...",none,This is a leafy residential area with excellen...,...,t,f,moderate,f,f,1,1,0,0,0.7


In [11]:
# Look up the names of the columns at positions 64, 65, 94 and 95
print(tuple(listings_dataf.columns[[64, 65, 94, 95]]))

('cleaning_fee', 'guests_included', 'license', 'jurisdiction_names')


We probably won't need these offending columns, so we simply drop them...

In [12]:
listings_dataf = listings_dataf.drop(['cleaning_fee', 'guests_included', 'license', 'jurisdiction_names'],axis=1)
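
The lookup-by-position and drop steps above can be combined so the column names are not typed twice. This is a small sketch on a toy DataFrame: the frame `toy`, its values, and the positions `[1, 2, 3, 4]` are illustrative assumptions, and only the four column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for listings_dataf; the values are made up for illustration.
toy = pd.DataFrame({
    "id": [13913, 15400],
    "cleaning_fee": ["$15.00", None],
    "guests_included": [1, 2],
    "license": [None, None],
    "jurisdiction_names": [None, None],
})

# Look up several column names by position in one go.
positions = [1, 2, 3, 4]
to_drop = list(toy.columns[positions])

# errors="ignore" makes the drop safe to repeat once the columns are gone.
trimmed = toy.drop(columns=to_drop, errors="ignore")
trimmed = trimmed.drop(columns=to_drop, errors="ignore")  # second call is a no-op
print(list(trimmed.columns))
```

Passing `errors="ignore"` is a design choice for notebooks: re-running a cell that drops columns by name would otherwise raise a `KeyError` the second time, because the columns no longer exist.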