# 1. Introduction

In this notebook I analyze the [Boston Airbnb dataset](https://www.kaggle.com/datasets/airbnb/boston) from Kaggle, following the Cross Industry Standard Process for Data Mining (CRISP-DM).

## 1.1 Business Understanding

Airbnb offers a unique platform for homeowners to lease their homes or apartments for short-term lodging, making it a popular choice among travelers due to its convenience and range of options. 

This analysis delves into the Airbnb Boston dataset, which encompasses a wide array of listings and their defining characteristics, such as property size, available amenities, neighborhood descriptions, and guest reviews.

**Analysis Questions:**

Q1. From a traveler's perspective, does a "superhost" enhance the guest experience?

Q2. What features have the most influence on the success and profitability of an Airbnb listing from an investor's standpoint?

Q3. How significantly do customer reviews influence the booking frequency of a listing?

# 2. Exploratory Data Analysis

## 2.1 Data Understanding

In [82]:
# Import packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.simplefilter(action='ignore')

pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 100)

In [83]:
# Import data
df_listings = pd.read_csv("../data/listings.csv")
df_reviews = pd.read_csv("../data/reviews.csv")

for data in [df_listings,df_reviews]:
    display(data.head(3))
    print(data.shape)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,12147973,https://www.airbnb.com/rooms/12147973,20160906204935,2016-09-07,Sunny Bungalow in the City,"Cozy, sunny, family home. Master bedroom high...",The house has an open and cozy feel at the sam...,"Cozy, sunny, family home. Master bedroom high...",none,"Roslindale is quiet, convenient and friendly. ...",,"The bus stop is 2 blocks away, and frequent. B...","You will have access to 2 bedrooms, a living r...",,Clean up and treat the home the way you'd like...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,31303940,https://www.airbnb.com/users/show/31303940,Virginia,2015-04-15,"Boston, Massachusetts, United States",We are country and city connecting in our deck...,,,,f,https://a2.muscache.com/im/pictures/5936fef0-b...,https://a2.muscache.com/im/pictures/5936fef0-b...,Roslindale,1,1,"['email', 'phone', 'facebook', 'reviews']",t,f,"Birch Street, Boston, MA 02131, United States",Roslindale,Roslindale,,Boston,MA,2131,Boston,"Boston, MA",US,United States,42.282619,-71.133068,t,House,Entire home/apt,4,1.5,2.0,3.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",,$250.00,,,,$35.00,1,$0.00,2,1125,2 weeks ago,,0,0,0,0,2016-09-06,0,,,,,,,,,,f,,,f,moderate,f,f,1,
1,3075044,https://www.airbnb.com/rooms/3075044,20160906204935,2016-09-07,Charming room in pet friendly apt,Charming and quiet room in a second floor 1910...,Small but cozy and quite room with a full size...,Charming and quiet room in a second floor 1910...,none,"The room is in Roslindale, a diverse and prima...","If you don't have a US cell phone, you can tex...",Plenty of safe street parking. Bus stops a few...,Apt has one more bedroom (which I use) and lar...,"If I am at home, I am likely working in my hom...",Pet friendly but please confirm with me if the...,https://a1.muscache.com/im/pictures/39327812/d...,https://a1.muscache.com/im/pictures/39327812/d...,https://a1.muscache.com/im/pictures/39327812/d...,https://a1.muscache.com/im/pictures/39327812/d...,2572247,https://www.airbnb.com/users/show/2572247,Andrea,2012-06-07,"Boston, Massachusetts, United States",I live in Boston and I like to travel and have...,within an hour,100%,100%,f,https://a2.muscache.com/im/users/2572247/profi...,https://a2.muscache.com/im/users/2572247/profi...,Roslindale,1,1,"['email', 'phone', 'facebook', 'linkedin', 'am...",t,t,"Pinehurst Street, Boston, MA 02131, United States",Roslindale,Roslindale,,Boston,MA,2131,Boston,"Boston, MA",US,United States,42.286241,-71.134374,t,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",,$65.00,$400.00,,$95.00,$10.00,0,$0.00,2,15,a week ago,,26,54,84,359,2016-09-06,36,2014-06-01,2016-08-13,94.0,10.0,9.0,10.0,10.0,9.0,9.0,f,,,t,moderate,f,f,1,1.3
2,6976,https://www.airbnb.com/rooms/6976,20160906204935,2016-09-07,Mexican Folk Art Haven in Boston,"Come stay with a friendly, middle-aged guy in ...","Come stay with a friendly, middle-aged guy in ...","Come stay with a friendly, middle-aged guy in ...",none,The LOCATION: Roslindale is a safe and diverse...,I am in a scenic part of Boston with a couple ...,"PUBLIC TRANSPORTATION: From the house, quick p...","I am living in the apartment during your stay,...","ABOUT ME: I'm a laid-back, friendly, unmarried...","I encourage you to use my kitchen, cooking and...",https://a2.muscache.com/im/pictures/6ae8335d-9...,https://a2.muscache.com/im/pictures/6ae8335d-9...,https://a2.muscache.com/im/pictures/6ae8335d-9...,https://a2.muscache.com/im/pictures/6ae8335d-9...,16701,https://www.airbnb.com/users/show/16701,Phil,2009-05-11,"Boston, Massachusetts, United States","I am a middle-aged, single male with a wide ra...",within a few hours,100%,88%,t,https://a2.muscache.com/im/users/16701/profile...,https://a2.muscache.com/im/users/16701/profile...,Roslindale,1,1,"['email', 'phone', 'reviews', 'jumio']",t,t,"Ardale St., Boston, MA 02131, United States",Roslindale,Roslindale,,Boston,MA,2131,Boston,"Boston, MA",US,United States,42.292438,-71.135765,t,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",""Wireless Internet"",""Air Condit...",,$65.00,$395.00,"$1,350.00",,,1,$20.00,3,45,5 days ago,,19,46,61,319,2016-09-06,41,2009-07-19,2016-08-05,98.0,10.0,9.0,10.0,10.0,9.0,10.0,f,,,f,moderate,t,f,1,0.47


(3585, 95)


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...


(68275, 6)


After going through the [data dictionary](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit?usp=sharing) provided by [Inside Airbnb](http://insideairbnb.com/data-assumptions), I got a clearer picture of the dataset and picked out the key features I'll need for answering the three analysis questions.

- The calendar dataset shows prices and availability for listings for the next year. It's more about what hosts plan to do in the future, so I'm skipping this data for my analysis.

- In the listings dataset, there's a lot of info about what each listing offers. The column **"host_is_superhost"** will be useful in answering the first question (Q1).

- The reviews dataset provides customer comments and the dates they were left. I can model the sentiment of these reviews to help answer the third question (Q3). The 7 **"review_scores"** metrics in the listings dataset will also be helpful for Q3.

- For the second question (Q2) about how popular or full places are, I would like to use price and occupancy to compare success by total revenue, but occupancy isn't tracked. Thankfully, Inside Airbnb has already looked into the issue of modeling occupancy and suggests using "a Review Rate of 50%" to approximate bookings from the number of reviews. So I'm going to use the **"reviews_per_month"** column to estimate how many times a place gets booked, as a gauge of a listing's success.
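
As a rough sketch of the Q2 idea above: with a review rate of 50%, each review stands in for about two bookings. The helper names, the choice to treat a missing `reviews_per_month` as zero bookings, and the `nights_per_booking` default are my own assumptions for illustration, not values taken from the dataset or from Inside Airbnb.

```python
REVIEW_RATE = 0.5  # share of stays that leave a review (the "Review Rate of 50%" quoted above)


def estimate_monthly_bookings(reviews_per_month, review_rate=REVIEW_RATE):
    """Approximate bookings per month from reviews per month."""
    # Assumption: a missing value (None or NaN) means no reviews yet, so 0 bookings.
    if reviews_per_month is None or reviews_per_month != reviews_per_month:
        return 0.0
    return reviews_per_month / review_rate


def estimate_monthly_revenue(price, reviews_per_month, nights_per_booking=3):
    """Revenue proxy: nightly price x nights per booking x estimated bookings.

    nights_per_booking is a hypothetical placeholder, not a measured value.
    """
    return price * nights_per_booking * estimate_monthly_bookings(reviews_per_month)


# Example with the second listing shown earlier ($65.00 per night, 1.3 reviews per month)
bookings = estimate_monthly_bookings(1.3)
revenue = estimate_monthly_revenue(65.0, 1.3)
```

Because the review rate is a constant, ranking listings by estimated bookings gives the same order as ranking by **"reviews_per_month"** directly. The scaling only matters if I want an absolute revenue figure.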







## 2.2 Data Preparation

### 2.2.2 Data Wrangling

In [84]:
clean_reviews = df_reviews.copy()
clean_listings = df_listings.copy()

clean_reviews.isna().sum() # some people left an empty review

# Get the scrape date of the dataframe, used later to calculate days elapsed from date columns
listing_last_scraped = pd.Timestamp(clean_listings.last_scraped[0])

# Drop columns with all NaN
print('dropping empty columns')
clean_listings.dropna(axis=1, how="all", inplace=True)

# Drop columns with only one unique value (no variation)
static_col = [c for c in clean_listings.columns if clean_listings[c].nunique()==1]
print('dropping static columns: {}'.format(static_col))
clean_listings.drop(static_col, axis=1, inplace=True)

# Check for columns with many NaN
(clean_listings.isna().mean()).sort_values()

# Drop sq.ft.
print('dropping square footage')
clean_listings.drop('square_feet',axis=1,inplace=True)


dropping empty columns
dropping static columns: ['scrape_id', 'last_scraped', 'experiences_offered', 'state', 'country_code', 'country', 'calendar_last_scraped', 'requires_license']
dropping square footage


In [85]:
clean_listings.sample(2)

Unnamed: 0,id,listing_url,name,summary,space,description,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,zipcode,market,smart_location,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
1054,9430774,https://www.airbnb.com/rooms/9430774,Modern Industrial Styled Loft,"Loft has great style, industrial with a chic, ...","This loft is a two-story loft. The bedroom, li...","Loft has great style, industrial with a chic, ...","The loft is located in the South End, which is...",,The MBTA silver line 5 and 4 are right around ...,You have access to the entire apartment--but t...,"Anna, my roommate, will be able to help throug...",We kindly ask that you clean up after yourselv...,https://a2.muscache.com/im/pictures/c0cdc13e-d...,https://a2.muscache.com/im/pictures/c0cdc13e-d...,https://a2.muscache.com/im/pictures/c0cdc13e-d...,https://a2.muscache.com/im/pictures/c0cdc13e-d...,46153497,https://www.airbnb.com/users/show/46153497,Rania,2015-10-09,"Boston, Massachusetts, United States","I am a recent graduate of Boston University, a...",,,,f,https://a2.muscache.com/im/pictures/b1eaa7d5-4...,https://a2.muscache.com/im/pictures/b1eaa7d5-4...,South End,1,1,"['email', 'phone', 'reviews']",t,f,"East Berkeley Street, Boston, MA 02118, United...",South End,South End,Boston,2118,Boston,"Boston, MA",42.343462,-71.064195,t,Loft,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{""Cable TV"",""Wireless Internet"",""Air Condition...",$100.00,,,,,1,$0.00,1,1125,10 months ago,0,0,0,0,1,2015-12-01,2015-12-01,100.0,10.0,10.0,10.0,10.0,10.0,10.0,f,flexible,f,f,1,0.11
593,9183638,https://www.airbnb.com/rooms/9183638,Spacious Downtown Apt + Courtyard,*NEW LISTING for the Entire Apt* (see profile...,The apartment has two levels and occupies the ...,*NEW LISTING for the Entire Apt* (see profile...,Neighborhoods: The apartment is uniquely locat...,Laundry: There is a laundromat located less th...,Walking: The apartment is walking distance fro...,You will have exclusive access to the entire a...,I will be reachable by phone/email.,-Please don't wear shoes inside the apartment ...,https://a2.muscache.com/im/pictures/630c902c-f...,https://a2.muscache.com/im/pictures/630c902c-f...,https://a2.muscache.com/im/pictures/630c902c-f...,https://a2.muscache.com/im/pictures/630c902c-f...,14007443,https://www.airbnb.com/users/show/14007443,Sarah,2014-04-07,"Boston, Massachusetts, United States",I'm a laid-back nerdy lady who came to Boston ...,within an hour,100%,97%,t,https://a2.muscache.com/im/pictures/d79fa5de-b...,https://a2.muscache.com/im/pictures/d79fa5de-b...,Chinatown,2,2,"['email', 'phone', 'facebook', 'reviews', 'jum...",t,t,"Hudson Street, Boston, MA 02111, United States",Chinatown,Chinatown,Boston,2111,Boston,"Boston, MA",42.347483,-71.060796,t,Apartment,Entire home/apt,4,2.0,2.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",$179.00,,,$300.00,$20.00,2,$20.00,1,1125,2 weeks ago,0,0,0,0,3,2016-02-15,2016-05-27,100.0,10.0,10.0,10.0,10.0,9.0,10.0,f,strict,f,f,2,0.44


In [86]:
clean_listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3585 entries, 0 to 3584
Data columns (total 82 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   id                                3585 non-null   int64  
 1   listing_url                       3585 non-null   object 
 2   name                              3585 non-null   object 
 3   summary                           3442 non-null   object 
 4   space                             2528 non-null   object 
 5   description                       3585 non-null   object 
 6   neighborhood_overview             2170 non-null   object 
 7   notes                             1610 non-null   object 
 8   transit                           2295 non-null   object 
 9   access                            2096 non-null   object 
 10  interaction                       2031 non-null   object 
 11  house_rules                       2393 non-null   object 
 12  thumbn

In [87]:
### Select features relevant to questions
features_host = ['host_is_superhost','host_about','host_response_time','host_response_rate', 'host_listings_count',
                    'host_verifications','host_has_profile_pic','host_identity_verified','host_since',
                    'calculated_host_listings_count']

features_property = ['id','name','summary','space','description','neighborhood_overview','notes','transit',
                        'access','interaction','house_rules', 'street','neighbourhood','zipcode','latitude',
                        'longitude','is_location_exact','property_type','room_type','accommodates','bathrooms',
                        'bedrooms','beds','bed_type','amenities','price','weekly_price','security_deposit',
                        'cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights']

features_traveler = ['number_of_reviews','last_review','first_review','review_scores_rating',
                        'review_scores_accuracy','review_scores_cleanliness','review_scores_checkin',
                        'review_scores_communication','review_scores_location','review_scores_value',
                        'instant_bookable','cancellation_policy','require_guest_profile_picture',
                        'require_guest_phone_verification','reviews_per_month']

features = features_host + features_property + features_traveler
clean_listings = clean_listings[features]

**Data Wrangling Checklist**
- Convert to numeric
    - host_response_time
    - host_response_rate
    - host_acceptance_rate
    - price
    - weekly_price
    - monthly_price
    - security_deposit
    - cleaning_fee
    - extra_people
- Convert to bool
    - host_is_superhost
    - host_has_profile_pic
    - host_identity_verified
    - is_location_exact
    - instant_bookable
    - require_guest_profile_picture
    - require_guest_phone_verification
- Convert to category
    - property_type
    - room_type
    - bed_type
    - cancellation_policy
- Get days from date
    - host_since
    - last_review
    - first_review
- Candidates for encoding
    - host_verifications
    - amenities
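
Here is a minimal sketch of how some of the checklist items could be handled, written as plain Python helpers so each conversion is easy to test in isolation. The function names are hypothetical, and the input formats are assumed from the sample rows shown earlier (`$1,350.00`, `88%`, `t`/`f`, `2016-08-13`, `{TV,"Wireless Internet",Kitchen}`). It does not cover `host_response_time` or the category conversions.

```python
import csv
from datetime import date


def parse_currency(value):
    """'$1,350.00' -> 1350.0; non-string (missing) values -> None."""
    if not isinstance(value, str):
        return None
    return float(value.replace("$", "").replace(",", ""))


def parse_percent(value):
    """'88%' -> 0.88; non-string (missing) values -> None."""
    if not isinstance(value, str):
        return None
    return float(value.rstrip("%")) / 100


def parse_flag(value):
    """'t' -> True, 'f' -> False, anything else -> None."""
    return {"t": True, "f": False}.get(value)


def days_since(date_string, reference):
    """Days between an ISO date string and a reference date."""
    if not isinstance(date_string, str):
        return None
    return (reference - date.fromisoformat(date_string)).days


def parse_amenities(value):
    """'{TV,"Wireless Internet",Kitchen}' -> ['TV', 'Wireless Internet', 'Kitchen']."""
    if not isinstance(value, str):
        return []
    inner = value.strip("{}")
    if not inner:
        return []
    # The csv reader takes care of the double-quoted entries that contain spaces
    return next(csv.reader([inner]))
```

In the notebook these could be applied column by column, for example `clean_listings['price'].map(parse_currency)` or `clean_listings['last_review'].map(lambda d: days_since(d, listing_last_scraped.date()))`. A vectorized pandas version would likely be faster, but at 3,585 rows the difference should be negligible.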

In [89]:
# Test cleaned dataframes
assert clean_reviews.duplicated().sum() == 0
assert clean_reviews.isna().sum().sum() == 0
assert clean_listings.duplicated().sum() == 0
assert clean_listings.isna().sum().sum() == 0

AssertionError: 