# Introduction
Airbnb, Inc. (Airbnb) is an online *marketplace* for booking lodging (*homestays*) or holiday packages through an app. Users download the Airbnb app from Google Play or the Apple App Store, then search by property location and stay dates. The app lists the properties available at that location on those dates. The user then pays a *fee* to Airbnb, which passes it on to the property owner. Airbnb does not own the properties on offer; it acts as an intermediary between the property owner (*host*) and the user. Airbnb is headquartered in San Francisco, California, United States.

Among the properties listed on Airbnb, which factors determine a property's price?

# Methodology

SteveZheng (Steve Zheng) is a Kaggle.com contributor who uploaded the ["Airbnb price prediction"](https://www.kaggle.com/stevezhenghp/airbnb-price-prediction) dataset. The dataset consists of **74111** rows and **29** features.

In [6]:
library(tidyverse)
list.files(path = "../input")
airBNB<-read_csv("../input/airbnb-price-prediction/train.csv")

Parsed with column specification:
cols(
  .default = col_character(),
  id = col_double(),
  log_price = col_double(),
  accommodates = col_double(),
  bathrooms = col_double(),
  cleaning_fee = col_logical(),
  first_review = col_date(format = ""),
  host_has_profile_pic = col_logical(),
  host_identity_verified = col_logical(),
  host_since = col_date(format = ""),
  instant_bookable = col_logical(),
  last_review = col_date(format = ""),
  latitude = col_double(),
  longitude = col_double(),
  number_of_reviews = col_double(),
  review_scores_rating = col_double(),
  bedrooms = col_double(),
  beds = col_double()
)

See spec(...) for full column specifications.



The available features (as interpreted by the author, since the contributor provides no official documentation) are as follows:

| No | Feature | Description |
| --- | ----------- | ----------- |
| 1 | id | Property ID number |
| 2 | log_price | Rental price of the property (in log form) |
| 3 | property_type | Type of property (e.g. apartment, house) |
| 4 | room_type | Type of room rented (entire home/apt: the whole house or apartment; private room: a private room for the guest; shared room: a room shared with other guests) |
| 5 | amenities | Other facilities that come with the property |
| 6 | accommodates | Number of people who can stay in the rented property |
| 7 | bathrooms | Number of bathrooms available |
| 8 | bed_type | Type of bed provided: Real Bed, Futon, Pull-out Sofa, Couch, or Airbed |
| 9 | cancellation_policy | Cancellation terms |
| 10 | cleaning_fee | Whether a cleaning fee is charged |
| 11 | city | City where the property is located |
| 12 | description | Description of the property |
| 13 | first_review | Date of the first review |
| 14 | host_has_profile_pic | Flag for whether the *host* shows a profile picture on their Airbnb account |
| 15 | host_identity_verified | Flag for whether the *host* has been verified by Airbnb |
| 16 | host_response_rate | How responsive the *host* is to incoming inquiries |
| 17 | host_since | Date the *host* first listed a property on Airbnb |
| 18 | instant_bookable | Flag for whether the property can be booked immediately, without confirmation from the *host* |
| 19 | last_review | Date of the most recent review |
| 20 | latitude | Latitude of the property's location |
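
Because `log_price` is stored on a log scale, it can help to convert a few values back to the original price scale when reading the summaries. The sketch below assumes the natural logarithm, which the dataset does not document, so treat the base as a guess. The three inputs are the minimum, median, and maximum reported by `summary(airBNB)`; the names `log_price_summary` and `price_summary` are illustrative only.

```r
# Assumption: log_price is the natural log of the price.
# The dataset has no official documentation, so the base is not confirmed.
log_price_summary <- c(min = 0.000, median = 4.710, max = 7.600)

# Invert the log transform to return to the original price scale
price_summary <- exp(log_price_summary)

print(round(price_summary, 2))
```

Note that back-transforming the mean of `log_price` would give the geometric mean of the prices, not their arithmetic mean, which is why only order statistics are used here.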



* id = 

In [5]:
summary(airBNB)

       id             log_price     property_type       room_type        
 Min.   :     344   Min.   :0.000   Length:74111       Length:74111      
 1st Qu.: 6261964   1st Qu.:4.317   Class :character   Class :character  
 Median :12254147   Median :4.710   Mode  :character   Mode  :character  
 Mean   :11266617   Mean   :4.782                                        
 3rd Qu.:16402260   3rd Qu.:5.220                                        
 Max.   :21230903   Max.   :7.600                                        
                                                                         
  amenities          accommodates      bathrooms       bed_type        
 Length:74111       Min.   : 1.000   Min.   :0.000   Length:74111      
 Class :character   1st Qu.: 2.000   1st Qu.:1.000   Class :character  
 Mode  :character   Median : 2.000   Median :1.000   Mode  :character  
                    Mean   : 3.155   Mean   :1.235                     
                    3rd Qu.: 4.000   3rd Qu.:1.0

In [7]:
class(airBNB)

In [1]:
## Importing packages

# This R environment comes with all of CRAN and many other helpful packages preinstalled.
# You can see which packages are installed by checking out the kaggle/rstats docker image: 
# https://github.com/kaggle/docker-rstats

library(tidyverse) # metapackage with lots of helpful functions

## Running code

# In a notebook, you can run a single code cell by clicking in the cell and then hitting 
# the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script, 
# you can run code by highlighting the code you want to run and then clicking the blue arrow
# at the bottom of this window.

## Reading in files

# You can access files from datasets you've added to this kernel in the "../input/" directory.
# You can see the files added to this kernel by running the code below. 

list.files(path = "../input")

## Saving data

# If you save any files or images, these will be put in the "output" directory. You 
# can see the output directory by committing and running your kernel (using the 
# Commit & Run button) and then checking out the compiled version of your kernel.

airBNB <- read_csv("../input/airbnb-price-prediction/train.csv")
summary(airBNB)

# host_response_rate is stored as text such as "100%"; strip the "%" before converting to numeric
airBNB$host_response_rate <- as.numeric(gsub("[^[:alnum:][:blank:]+?&/\\-]", "", airBNB$host_response_rate))

# Number of days between host_since and today (the value changes on every run)
airBNB$host_length_time <- as.numeric(difftime(Sys.Date(), as.Date(airBNB$host_since), units = "days"))

# Listings without a rating are set to 0 for now
airBNB$review_scores_rating[is.na(airBNB$review_scores_rating)] <- 0
head(airBNB$review_scores_rating)

tail(airBNB$host_length_time)

# Converting the raw string directly returns NA with a coercion warning
as.numeric("100%")

linearModel <- lm(log_price ~ property_type + room_type + accommodates + bathrooms + bed_type +
                    cancellation_policy + cleaning_fee + city + host_has_profile_pic +
                    host_identity_verified + host_length_time + instant_bookable + latitude +
                    longitude + review_scores_rating + beds + bedrooms, data = airBNB)
summary(linearModel)

# Listings that have never been reviewed need to be separated from those that already have reviews
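
The closing comment of the cell above notes that listings without any review should be kept apart from reviewed ones, while the cell itself codes a missing rating as 0. One possible way to do that separation is sketched below on a toy data frame; the values are made up, and the `has_review` column and the `listings`, `reviewed`, and `unreviewed` names are hypothetical, not part of the dataset or the author's analysis.

```r
# Toy stand-in for airBNB; the values are invented for illustration only
listings <- data.frame(
  log_price = c(4.3, 4.7, 5.2, 4.9),
  number_of_reviews = c(0, 12, 3, 0),
  review_scores_rating = c(NA, 95, 88, NA)
)

# Hypothetical flag: TRUE when the listing has a rating
listings$has_review <- !is.na(listings$review_scores_rating)

# Keep the two groups apart instead of coding a missing rating as 0
reviewed   <- listings[listings$has_review, ]
unreviewed <- listings[!listings$has_review, ]

nrow(reviewed)
nrow(unreviewed)
```

With the groups split this way, `review_scores_rating` could be used as a predictor for the reviewed listings only, so that a placeholder 0 is not read as a very poor score.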

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──

✔ ggplot2 3.2.1.9000     ✔ purrr   0.3.3
✔ tibble  2.1.3          ✔ dplyr   0.8.3
✔ tidyr   1.0.0          ✔ stringr 1.4.0
✔ readr   1.3.1          ✔ forcats 0.4.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()




“NAs introduced by coercion”
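
The coercion warning above is what `as.numeric()` reports when it is given a string such as `"100%"`. The sketch below reproduces the behaviour on a few sample strings and shows one way to remove the percent sign first. The sample values in `raw_rate` are hypothetical, and `sub(..., fixed = TRUE)` is a simpler stand-in for the regular expression used in the notebook, not the author's exact code.

```r
# Hypothetical sample of host_response_rate values in their raw text form
raw_rate <- c("100%", "71%", NA, "0%")

# Direct conversion cannot parse the "%" suffix, so every value becomes NA
direct <- suppressWarnings(as.numeric(raw_rate))

# Dropping the literal "%" first gives usable numbers; missing values stay NA
cleaned <- as.numeric(sub("%", "", raw_rate, fixed = TRUE))

print(direct)
print(cleaned)
```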



Call:
lm(formula = log_price ~ property_type + room_type + accommodates + 
    bathrooms + bed_type + cancellation_policy + cleaning_fee + 
    city + host_has_profile_pic + host_identity_verified + host_length_time + 
    instant_bookable + latitude + longitude + review_scores_rating + 
    beds + bedrooms, data = airBNB)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8028 -0.2940 -0.0245  0.2581  3.9601 

Coefficients:
                                     Estimate Std. Error  t value Pr(>|t|)    
(Intercept)                        -7.346e+01  1.678e+00  -43.777  < 2e-16 ***
property_typeBed & Breakfast        1.762e-01  2.197e-02    8.022 1.06e-15 ***
property_typeBoat                   2.943e-01  5.765e-02    5.104 3.34e-07 ***
property_typeBoutique hotel         4.000e-01  5.603e-02    7.139 9.47e-13 ***
property_typeBungalow              -7.801e-03  2.457e-02   -0.317 0.750883    
property_typeCabin                 -1.311e-01  5.522e-02   -2.374 0.017587 *  
property_typ