<a href="https://colab.research.google.com/github/yoohanko98/Airbnb-recommender/blob/main/Airbnb_price_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Airbnb-price-prediction
### EE695 Final Project
Yoohan Ko, Alex Walker, Marcin Wisniowski

**Problem Statement**
> Airbnb hosts often have a hard time deciding how to price a rental under current market conditions, and the usual recommendation is to manually examine competing hosts' prices. Too high a price may mean no renters, while too low a price means lost income. The group will examine Airbnb listings in Jersey City from March to September 2020 to build models that hosts could use to set an ideal price, and will include Covid case counts as a parameter if they prove relevant.

**Implementation Plan**
> Using these data sources, it will be possible to relate the parameters of each listing to its price, and hopefully to see a correlation between Airbnb listing prices and Covid cases. From late October to early November, the group will explore the data to find variables that correlate with price. Early to late November will be spent implementing and tuning the decision tree and SVM algorithms. Late November to early December will be spent implementing the neural network algorithm. All of these models will perform supervised regression to a price value.
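The variable-screening step in the plan could be sketched as follows. This is a minimal illustration on made-up numbers, not the project's data: the toy `listings` frame and its values are hypothetical, and only the column names are borrowed from the listings file.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the listings file, the values do not
listings = pd.DataFrame({
    "accommodates": [1, 2, 3, 4, 5, 6],
    "number_of_reviews": [10, 3, 8, 1, 7, 4],
    "price": [50.0, 80.0, 120.0, 150.0, 200.0, 230.0],
})

# Pearson correlation of every feature with price, strongest first
price_corr = listings.corr()["price"].drop("price")
ranking = price_corr.abs().sort_values(ascending=False)
print(ranking)
```

Pearson correlation only captures linear relationships, so a low value does not by itself rule a feature out for the non-linear models.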

**Team Members and Task Allocation**
- All - Identify and propose important data parameters/clean data
- Yoohan Ko - Specialize in SVM algorithm, secondary in Neural Network
- Alex Walker - Specialize in Neural Network, secondary in Decision Tree
- Marcin Wisniowski - Specialize in Decision Tree, secondary in SVM



# **Index**: 
*   [Pre-processing](#pre-processing)
*   [Neural Network](#neural_network)
*   [SVM](#svm)
*   [Decision Tree](#dtc)





# **Pre-processing** <a name="pre-processing"></a>
#### **Importing the data & libraries**

In [None]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import time
from datetime import datetime


from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

In [None]:
# data is stored in git repo
url = 'https://raw.githubusercontent.com/yoohanko98/Airbnb-recommender/main/listings.csv'

dataset = pd.read_csv(url)
print(f"The dataset contains {len(dataset)} Airbnb listings")
pd.set_option('display.max_columns', len(dataset.columns)) # View all columns
pd.set_option('display.max_rows', 100)
dataset.head(3)

The dataset contains 1428 Airbnb listings


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,40669,https://www.airbnb.com/rooms/40669,20201025134822,2020-10-26,Skyy’s Lounge / Cozy,<b>The space</b><br />Skyy’s Lounge ....Everyt...,The neighborhood is very diverse & friendly sh...,https://a0.muscache.com/pictures/af7e4a45-0118...,175412,https://www.airbnb.com/users/show/175412,Skyy,2010-07-20,"Jersey City, New Jersey, United States",I am the owner of a high end Nail Salon in the...,,,,f,https://a0.muscache.com/im/users/175412/profil...,https://a0.muscache.com/im/users/175412/profil...,,2,2,"['email', 'phone', 'facebook', 'reviews']",t,f,"Jersey City, New Jersey, United States",Ward C (councilmember Richard Boggiano),,40.73742,-74.05255,Private room in condominium,Private room,2,,1 shared bath,1.0,0.0,"[""Carbon monoxide alarm"", ""Hair dryer"", ""Lugga...",$82.00,3,365,3,3,365,365,3.0,365.0,,t,28,58,88,363,2020-10-26,10,0,0,2010-09-23,2019-10-12,100.0,10.0,10.0,10.0,10.0,10.0,10.0,,f,2,0,2,0,0.08
1,63282,https://www.airbnb.com/rooms/63282,20201025134822,2020-10-26,"2bed/2bath,furnished,doorman, by NY",<b>The space</b><br />MINIMUM STAY OF 5 MONTHS...,,https://a0.muscache.com/pictures/388465/eb5f4f...,304762,https://www.airbnb.com/users/show/304762,Gil,2010-11-30,"New York, New York, United States",Very low-impact traveler. I'll treat your plac...,,,,f,https://a0.muscache.com/im/users/304762/profil...,https://a0.muscache.com/im/users/304762/profil...,,1,1,"['email', 'phone', 'jumio', 'offline_governmen...",t,t,,Ward B (councilmember Mira Prinz-Arey),,40.72813,-74.07037,Entire apartment,Entire home/apt,4,,2 baths,2.0,3.0,"[""Carbon monoxide alarm"", ""Elevator"", ""Dryer"",...","$2,000.00",150,730,150,150,730,730,150.0,730.0,,t,30,60,90,365,2020-10-26,0,0,0,,,,,,,,,,,f,1,1,0,0,
2,146144,https://www.airbnb.com/rooms/146144,20201025134822,2020-10-26,Shared Room,"<b>The space</b><br />Hi,<br />Well, this is a...",,https://a0.muscache.com/pictures/923609/cf3964...,266070,https://www.airbnb.com/users/show/266070,Patricia,2010-10-19,"Florence, Tuscany, Italy",I am Executive Director of a global health non...,,,,f,https://a0.muscache.com/im/users/266070/profil...,https://a0.muscache.com/im/users/266070/profil...,,1,1,"['email', 'phone', 'reviews', 'kba']",t,t,,Ward E (councilmember James Solomon),,40.71077,-74.03833,Shared room in apartment,Shared room,1,,,1.0,1.0,[],$200.00,2,2,2,2,2,2,2.0,2.0,,t,30,60,90,365,2020-10-26,0,0,0,,,,,,,,,,,f,1,0,0,1,


In [None]:
# Select columns
cols_to_include = ['room_type', 'bathrooms_text', 'bedrooms', 'review_scores_rating', 'number_of_reviews', 'longitude', 'latitude', 'accommodates', 'price']

def float_bathrooms_text(x):
    """Convert a bathrooms_text value such as '1 shared bath' to a float."""
    # Missing values arrive as float NaN; keep them missing so dropna() removes the row
    if isinstance(x, float):
        return None

    # Half-bath entries have no leading number
    if 'half' in x.lower():
        return 0.5
    else:
        return float(x.split(' ')[0])

testing_portion = 0.20

# Select desired fields (copy so the column assignments below do not write to a view)
dataset = dataset[cols_to_include].copy()

# Process as average
#dataset['review_scores_rating'].fillna((dataset['review_scores_rating'].mean()), inplace=True)

# Process as floats
dataset['bathrooms_text'] = dataset['bathrooms_text'].apply(float_bathrooms_text)
dataset['room_type'] = dataset['room_type'].map({'Entire home/apt': 1.0, 'Private room': 2.0,
                                                 'Shared room': 3.0, 'Hotel room': 4.0})

# Strip '$', ',' and ')' from price; a leading '(' becomes a minus sign
dataset['price'] = (dataset['price'].replace(r'[\$,)]', '', regex=True)
                    .replace(r'[(]', '-', regex=True).astype(float))

# Drop nulls
dataset = dataset.dropna()

# Separate x and y
x = dataset.drop(columns=['price'])
y = dataset['price']

# Testing and training split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=testing_portion)
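To make the two string-cleaning steps above easier to check in isolation, here is a small standard-library sketch. The helper names `parse_bathrooms` and `parse_price` are hypothetical and not part of the notebook. Most sample strings come from the rows displayed earlier, while `"Half-bath"` is an assumed example of the kind of value the `'half'` check targets.

```python
import re

def parse_bathrooms(text):
    """Turn '1 shared bath' into 1.0 and 'Half-bath' into 0.5; None stays None."""
    if text is None:
        return None
    if "half" in text.lower():
        return 0.5
    return float(text.split(" ")[0])

def parse_price(text):
    """Turn '$2,000.00' into 2000.0 by dropping '$' and thousands separators."""
    return float(re.sub(r"[$,]", "", text))

samples = ["1 shared bath", "2 baths", "Half-bath", None]
print([parse_bathrooms(s) for s in samples])   # [1.0, 2.0, 0.5, None]
print(parse_price("$82.00"), parse_price("$2,000.00"))   # 82.0 2000.0
```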

## Neural Network <a name="neural_network"></a>

In [None]:


# Preview the first rows of each split
print('X Training')
print(x_train.head())
print('\n')
print('Y Training')
print(y_train.head())
print('\n')
print('X Testing')
print(x_test.head())
print('\n')
print('Y Testing')
print(y_test.head())
print('\n')

# Fit the scaler on the training set only, then apply the same transform to both splits
scaler = StandardScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

print('X Train Scaled:')
print(x_train)
print('X Test Scaled:')
print(x_test)

X Training
      room_type  bathrooms_text  bedrooms  ...  longitude  latitude  accommodates
717         2.0             1.0       1.0  ...  -74.07642  40.71002             2
496         2.0             1.0       1.0  ...  -74.09234  40.69183             2
357         1.0             2.0       2.0  ...  -74.03596  40.72391             5
794         1.0             1.0       2.0  ...  -74.05843  40.73346             8
1223        1.0             2.0       2.0  ...  -74.04860  40.72473             4
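The scaling step fits `StandardScaler` on the training split and reuses those statistics for the test split. A minimal sketch of that behaviour on a hypothetical two-column feature matrix (the values are invented for illustration) might look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns stand in for (bedrooms, accommodates)
train = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
test = np.array([[2.0, 8.0]])

scaler = StandardScaler().fit(train)    # statistics come from the training rows only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # reuses the training mean and standard deviation

print(train_scaled.mean(axis=0))  # approximately zero per column
print(train_scaled.std(axis=0))   # approximately one per column
print(test_scaled)                # not centred: the test row is measured against training statistics
```

Fitting on the training rows alone keeps information about the test set out of the model inputs, which is why the test split is only transformed.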

In [None]:
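hidden_layers = (60, 30)  # two hidden layers with 60 and 30 units
alpha = 0.0001            # L2 regularization strength
max_iter = 7500           # upper bound on training epochs

# Configure the neural network regressor
clf = MLPRegressor(hidden_layer_sizes=hidden_layers, activation='logistic', solver='adam', verbose=True, alpha=alpha, max_iter=max_iter)

# Fit neural network
clf.fit(x_train, y_train)

# score() returns the coefficient of determination (R^2) on the test set
print(clf.score(x_test, y_test))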

Streaming output truncated to the last 5000 lines.
Iteration 2502, loss = 2435.25623199
Iteration 2503, loss = 2434.41188648
Iteration 2504, loss = 2433.55332662
Iteration 2505, loss = 2432.90590535
Iteration 2506, loss = 2432.13400488
Iteration 2507, loss = 2431.30285326
Iteration 2508, loss = 2430.38831018
Iteration 2509, loss = 2429.46840122
Iteration 2510, loss = 2428.71550122
Iteration 2511, loss = 2428.20255825
Iteration 2512, loss = 2427.07126723
Iteration 2513, loss = 2426.99139261
Iteration 2514, loss = 2425.74509763
Iteration 2515, loss = 2424.91404501
Iteration 2516, loss = 2424.00089128
Iteration 2517, loss = 2423.27342401
Iteration 2518, loss = 2422.35234874
Iteration 2519, loss = 2421.50223262
Iteration 2520, loss = 2420.70098009
Iteration 2521, loss = 2419.97898108
Iteration 2522, loss = 2419.16714769
Iteration 2523, loss = 2418.44542206
Iteration 2524, loss = 2417.43246690
Iteration 2525, loss = 2416.58996210
Iteration 2526, loss = 2415.86332048
Iteration 
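`MLPRegressor.score` reports R², which is unitless. For a pricing model it can help to also look at errors in dollars. The sketch below uses hypothetical true and predicted prices, not results from this notebook, to show how R², MAE and RMSE could be computed side by side with `sklearn.metrics`. As a rough reading of the log above, and assuming scikit-learn's reported loss is half the mean squared error plus a small L2 penalty, a loss near 2416 would correspond to a training RMSE of roughly 70 dollars.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical prices in dollars, for illustration only
y_true = np.array([82.0, 200.0, 150.0, 120.0])
y_pred = np.array([90.0, 180.0, 160.0, 110.0])

r2 = r2_score(y_true, y_pred)                       # same quantity that score() returns
mae = mean_absolute_error(y_true, y_pred)           # average absolute error in dollars
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large misses more heavily

print(f"R^2 = {r2:.3f}, MAE = ${mae:.2f}, RMSE = ${rmse:.2f}")
```

With the notebook's variables, the same calls would take `y_test` and `clf.predict(x_test)` as arguments.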



## SVM <a name="svm"></a>

## Decision Tree <a name="dtc"></a>