# 1. Introduction

In this notebook, I analyze the [Boston Airbnb dataset](https://www.kaggle.com/datasets/airbnb/boston) from Kaggle. The analysis follows the Cross Industry Standard Process for Data Mining (CRISP-DM).

## 1.1 Business Understanding

Airbnb offers a unique platform for homeowners to lease their homes or apartments for short-term lodging, making it a popular choice among travelers due to its convenience and range of options. 

This analysis delves into the Boston Airbnb dataset, which encompasses a wide array of listings and their defining characteristics, such as property size, available amenities, neighborhood descriptions, and guest reviews.

**Analysis Questions:**

Q1. From a traveler's perspective, does a "superhost" enhance the guest experience?

Q2. What features have the most influence on the success and profitability of an Airbnb listing from an investor's standpoint?

Q3. How significantly do customer reviews influence the booking frequency of a listing?

# 2. Exploratory Data Analysis

## 2.1 Data Understanding

In [9]:
# Import packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
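import warnings

# Suppress all warnings to keep the notebook output readable
warnings.simplefilter(action='ignore')

# Cap how many rows and columns pandas shows when displaying a DataFrame
pd.set_option('display.max_rows', 25)
pd.set_option('display.max_columns', 25)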

In [12]:
# Load the listings and reviews data
df_listings = pd.read_csv("../data/listings.csv")
df_reviews = pd.read_csv("../data/reviews.csv")

# Preview the first rows and the (rows, columns) shape of each dataset
for data in [df_listings, df_reviews]:
    display(data.head(3))
    print(data.shape)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,...,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,12147973,https://www.airbnb.com/rooms/12147973,20160906204935,2016-09-07,Sunny Bungalow in the City,"Cozy, sunny, family home. Master bedroom high...",The house has an open and cozy feel at the sam...,"Cozy, sunny, family home. Master bedroom high...",none,"Roslindale is quiet, convenient and friendly. ...",,"The bus stop is 2 blocks away, and frequent. B...",...,,,,f,,,f,moderate,f,f,1,
1,3075044,https://www.airbnb.com/rooms/3075044,20160906204935,2016-09-07,Charming room in pet friendly apt,Charming and quiet room in a second floor 1910...,Small but cozy and quite room with a full size...,Charming and quiet room in a second floor 1910...,none,"The room is in Roslindale, a diverse and prima...","If you don't have a US cell phone, you can tex...",Plenty of safe street parking. Bus stops a few...,...,10.0,9.0,9.0,f,,,t,moderate,f,f,1,1.3
2,6976,https://www.airbnb.com/rooms/6976,20160906204935,2016-09-07,Mexican Folk Art Haven in Boston,"Come stay with a friendly, middle-aged guy in ...","Come stay with a friendly, middle-aged guy in ...","Come stay with a friendly, middle-aged guy in ...",none,The LOCATION: Roslindale is a safe and diverse...,I am in a scenic part of Boston with a couple ...,"PUBLIC TRANSPORTATION: From the house, quick p...",...,10.0,9.0,10.0,f,,,f,moderate,t,f,1,0.47


(3585, 95)


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...


(68275, 6)


After going through the [data dictionary](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit?usp=sharing) provided by [Inside Airbnb](http://insideairbnb.com/data-assumptions), I got a clearer picture of the dataset and picked out the key features I'll need to answer the three analysis questions.

- The calendar dataset shows prices and availability for listings for the next year. It's more about what hosts plan to do in the future, so I'm skipping this data for my analysis.

- In the listings dataset, there's a lot of info about what each listing offers. The **"host_is_superhost"** column will be useful in answering the first question (Q1).

- The reviews dataset provides customer comments and the dates they were left. I can model the sentiment of these reviews to help answer the third question (Q3). The listings dataset also contains 7 **"review_scores"** metrics that will be helpful for Q3.

- For the second question (Q2), about what makes a listing successful and profitable, I would like to use price and occupancy to compare listings by total revenue, but occupancy isn't tracked. Thankfully, Inside Airbnb has already looked into the issue of modeling occupancy and suggests using "a Review Rate of 50%" to approximate bookings from the number of reviews. So, I'm going to use the **"reviews_per_month"** column to estimate how many times a place gets booked and use that estimate to gauge a listing's success.
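To make the 50% review-rate idea concrete, here is a minimal sketch of how bookings could be approximated from **"reviews_per_month"**. It is an illustration rather than the final method: the helper name `estimate_monthly_bookings`, the `REVIEW_RATE` constant, and the small `toy` frame (which stands in for `df_listings` and reuses the three `reviews_per_month` values from the preview above) are all assumptions introduced here.

```python
import pandas as pd

# Assumption: half of all stays leave a review, per Inside Airbnb's
# suggested "Review Rate of 50%".
REVIEW_RATE = 0.5


def estimate_monthly_bookings(reviews_per_month: pd.Series,
                              review_rate: float = REVIEW_RATE) -> pd.Series:
    """Approximate bookings per month as reviews per month / review rate.

    Listings without reviews (NaN) stay NaN instead of being treated as zero.
    """
    return reviews_per_month / review_rate


# Hypothetical stand-in for df_listings, using the three previewed rows
toy = pd.DataFrame({
    "id": [12147973, 3075044, 6976],
    "reviews_per_month": [float("nan"), 1.3, 0.47],
})

toy["est_bookings_per_month"] = estimate_monthly_bookings(toy["reviews_per_month"])
print(toy)
```

Under this assumption, a listing with 1.3 reviews per month maps to roughly 2.6 bookings per month. Listings with no reviews remain missing, so whether to drop them or treat them as zero bookings is a separate modeling decision.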





